How to Scrape Website Data to Excel with Python (Requests + Pandas)
To scrape data from a website and export it to Excel using Python, you need three core libraries: Requests (to download the webpage), BeautifulSoup (to parse the HTML), and Pandas (to structure the data and write it to an Excel file).
1. Install Required Libraries
Run the following command in your terminal. The openpyxl engine is required by Pandas to write .xlsx files.
pip install requests pandas beautifulsoup4 openpyxl
2. Complete Web Scraping Script
This script targets a webpage, extracts product names and prices from HTML elements, structures them into a Pandas DataFrame, and exports them to an Excel spreadsheet.
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Define target URL and headers to mimic a real browser
url = "https://quotes.toscrape.com/" # Example target website
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}
# 1. Fetch the webpage content
response = requests.get(url, headers=headers)
if response.status_code == 200:
# 2. Parse HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Storage for extracted data
scraped_data = []
# Find all quote containers on the page
quotes = soup.find_all("div", class_="quote")
for quote in quotes:
text = quote.find("span", class_="text").text.strip()
author = quote.find("small", class_="author").text.strip()
# Append structured data as a dictionary
scraped_data.append({
"Quote": text,
"Author": author
})
# 3. Load data into Pandas DataFrame
df = pd.DataFrame(scraped_data)
# 4. Export DataFrame to Excel
output_file = "scraped_quotes.xlsx"
df.to_excel(output_file, index=False)
print(f"Data successfully saved to {output_file}")
else:
print(f"Failed to retrieve webpage. Status code: {response.status_code}")
3. How the Code Works
- HTTP Request:
requests.get()downloads the HTML source code. Theheadersdictionary includes aUser-Agentto prevent the server from blocking the script as a generic bot. - HTML Parsing:
BeautifulSoup(response.content, "html.parser")converts the raw HTML into a searchable tree structure. - Data Extraction:
soup.find_all()locates specific HTML tags by their classes. We loop through these elements to extract text using the.textattribute. - Excel Generation:
pd.DataFrame(scraped_data)organizes the list of dictionaries into rows and columns.df.to_excel()writes this structured data to disk. Settingindex=Falseprevents Pandas from adding an unnecessary row-index column to your spreadsheet.
Alternative: Scraping Clean HTML Tables Directly
If the target webpage contains data structured inside standard HTML <table> tags, Pandas can read and parse it directly without BeautifulSoup:
import pandas as pd
url = "https://www.w3schools.com/html/html_tables.asp"
# Read all HTML tables on the page into a list of DataFrames
tables = pd.read_html(url)
# Select the first table and save to Excel
if tables:
df = tables[0]
df.to_excel("html_table_output.xlsx", index=False)
Limitations and Best Practices
- Dynamic Content:
requestsonly fetches static HTML. If the website relies on JavaScript (React, Angular, Vue) to load data, you will need a browser automation tool like Playwright or Selenium instead of Requests. - Rate Limiting: Avoid sending too many requests in a short period. Insert delays using
time.sleep()if scraping multiple pages to prevent your IP address from being temporarily or permanently blocked.
Need this done fast? order a scraping script on Kwork.
Need help with this?
I take on freelance fixes and builds in this area.