Scrape Your First Website in Minutes with Python

December 15, 2021 2 Min Read
Codesphere

From everyone in the Codesphere Team:)

Ever needed to pull data out of a website? What would you do? Visit the pages one after the other and copy the information by hand?

Well, that works if you have a page or two. With lots of pages, though, manual extraction quickly becomes too tedious; this is where web scraping comes to the rescue!


What is web scraping?

Web scraping, as the name suggests, is a method of extracting data from web pages in an automated fashion. Scraping is super helpful for price comparisons, R&D, gathering data from social media, collecting job listings, and more.

There are many ways to scrape the web, such as online services, APIs, or writing your own script, and the last one is why we are here. This article will teach you the basics of how to scrape data from the web. Before we get into that, let’s take a quick look at why we would even want to scrape data from the web.


Why do we need web scraping?

Websites, in general, have huge quantities of information. This information is mostly unstructured or cluttered. When users visit a website they only need a small percentage of what’s available.

While they can manually access it, the process is quite cumbersome, especially when repetition is involved (given that the data is dynamic and updated frequently). Hence, the need for web scraping.  

Once the script is set up for a particular webpage, it can be executed any number of times to extract data and use it as required.

Let’s get started!


Web scraping demo

This script will extract weather data from a webpage and save it to a .csv file. We will be using the following libraries to help us with scraping and managing the extracted data:

  • Requests - This library is required to send an HTTP request to the web page. This will give us access to the HTML content of the webpage we want to scrape.
  • Beautiful Soup - This library gives us functions to help extract data from the HTML content we receive when we send an HTTP request.
  • Pandas - This library helps us manage the data that has been extracted. In this case we will use it to save our data to a .csv file.

In case you don’t have the aforementioned libraries installed, follow the commands given below to install them:

# Installing BeautifulSoup

pip install beautifulsoup4


# Installing requests

pip install requests

# Installing Pandas

pip install pandas


Writing the Code

Once you have the libraries installed, follow the steps below to scrape data from the web in Python 3:

  1. Start by importing all the libraries.
  2. Send an HTTP request to the webpage using its URL. Make sure the response code is 200 which means the request was successful.
  3. Pass the HTML from the response to BeautifulSoup, which parses it into an object we can search.
  4. From the raw HTML, extract the data we need using different selectors. The selectors used here are ‘class’ and ‘id’.
  5. Collect the extracted data in a Python dictionary and load it into a pandas DataFrame.
  6. Save the DataFrame to a .csv file. Note: We are using the utf-16BE encoding to render the degree symbol properly in the .csv file.
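
The steps above can be sketched roughly as follows. This is a minimal sketch, not the original script: the markup in SAMPLE_HTML, the id "forecast", the class names "day", "period", "desc" and "temp", and the function names fetch_html, parse_forecast and save_forecast are all made up for illustration. A real weather page will have its own structure, so inspect its HTML and adjust the selectors accordingly.

```python
# Step 1: import the libraries.
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for a real weather page, so the parsing
# logic can be tried without a network connection.
SAMPLE_HTML = """
<div id="forecast">
  <div class="day">
    <p class="period">Monday</p>
    <p class="desc">Sunny</p>
    <p class="temp">High: 21 °C</p>
  </div>
  <div class="day">
    <p class="period">Tuesday</p>
    <p class="desc">Light rain</p>
    <p class="temp">High: 17 °C</p>
  </div>
</div>
"""


def fetch_html(url):
    """Step 2: request the page and stop if the response code is not 200."""
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return response.text


def parse_forecast(html):
    """Steps 3 to 5: parse the HTML, select by id and class, build a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="forecast")       # 'id' selector (assumed id)
    days = container.find_all(class_="day")    # 'class' selector (assumed class)
    data = {
        "period": [d.find(class_="period").get_text(strip=True) for d in days],
        "description": [d.find(class_="desc").get_text(strip=True) for d in days],
        "temperature": [d.find(class_="temp").get_text(strip=True) for d in days],
    }
    return pd.DataFrame(data)


def save_forecast(frame, path):
    """Step 6: write the DataFrame to a .csv file using the utf-16BE encoding."""
    frame.to_csv(path, index=False, encoding="utf-16BE")


if __name__ == "__main__":
    # Parse the sample markup; swap in fetch_html(<your URL>) for a live page.
    print(parse_forecast(SAMPLE_HTML))
```

For a live page, you would chain the three functions, for example save_forecast(parse_forecast(fetch_html(url)), "weather.csv"), once the selectors match that page. Keeping fetching, parsing and saving in separate functions makes it easier to test the parsing on saved HTML before sending real requests.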


Once you have your code ready, you can deploy it directly to the cloud using Codesphere. Codesphere lets you avoid the hassle of configuration so that you can spend more time doing what you do best: actually coding!

Let us know what you’re going to scrape down below!

Till then, happy coding.
