All articles · Published 2026-06-22 · 501 words · 2 min read · EN · RU · ES

How to Scrape Website Data to Excel with Python (Requests + Pandas)

To scrape data from a website and export it to Excel using Python, you need three core libraries: Requests (to download the webpage), BeautifulSoup (to parse the HTML), and Pandas (to structure the data and write it to an Excel file).

1. Install Required Libraries

Run the following command in your terminal. The openpyxl engine is required by Pandas to write .xlsx files.

pip install requests pandas beautifulsoup4 openpyxl

2. Complete Web Scraping Script

This script targets a webpage, extracts product names and prices from HTML elements, structures them into a Pandas DataFrame, and exports them to an Excel spreadsheet.

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Define target URL and headers to mimic a real browser
url = "https://quotes.toscrape.com/"  # Example target website
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# 1. Fetch the webpage content
response = requests.get(url, headers=headers)

if response.status_code == 200:
    # 2. Parse HTML content
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Storage for extracted data
    scraped_data = []
    
    # Find all quote containers on the page
    quotes = soup.find_all("div", class_="quote")
    
    for quote in quotes:
        text = quote.find("span", class_="text").text.strip()
        author = quote.find("small", class_="author").text.strip()
        
        # Append structured data as a dictionary
        scraped_data.append({
            "Quote": text,
            "Author": author
        })
    
    # 3. Load data into Pandas DataFrame
    df = pd.DataFrame(scraped_data)
    
    # 4. Export DataFrame to Excel
    output_file = "scraped_quotes.xlsx"
    df.to_excel(output_file, index=False)
    print(f"Data successfully saved to {output_file}")

else:
    print(f"Failed to retrieve webpage. Status code: {response.status_code}")

3. How the Code Works

  • HTTP Request: requests.get() downloads the HTML source code. The headers dictionary includes a User-Agent to prevent the server from blocking the script as a generic bot.
  • HTML Parsing: BeautifulSoup(response.content, "html.parser") converts the raw HTML into a searchable tree structure.
  • Data Extraction: soup.find_all() locates specific HTML tags by their classes. We loop through these elements to extract text using the .text attribute.
  • Excel Generation: pd.DataFrame(scraped_data) organizes the list of dictionaries into rows and columns. df.to_excel() writes this structured data to disk. Setting index=False prevents Pandas from adding an unnecessary row-index column to your spreadsheet.

Alternative: Scraping Clean HTML Tables Directly

If the target webpage contains data structured inside standard HTML <table> tags, Pandas can read and parse it directly without BeautifulSoup:

import pandas as pd

url = "https://www.w3schools.com/html/html_tables.asp"

# Read all HTML tables on the page into a list of DataFrames
tables = pd.read_html(url)

# Select the first table and save to Excel
if tables:
    df = tables[0]
    df.to_excel("html_table_output.xlsx", index=False)

Limitations and Best Practices

  • Dynamic Content: requests only fetches static HTML. If the website relies on JavaScript (React, Angular, Vue) to load data, you will need a browser automation tool like Playwright or Selenium instead of Requests.
  • Rate Limiting: Avoid sending too many requests in a short period. Insert delays using time.sleep() if scraping multiple pages to prevent your IP address from being temporarily or permanently blocked.

Need this done fast? order a scraping script on Kwork.

Need help with this?

I take on freelance fixes and builds in this area.