Sunday, November 3, 2019

Extracting and Scraping Weather Data: Weather Forecast



We can extract data from a local weather website. In this post, we will extract the extended (seven-day) forecast from this page.

  • First, explore the page structure with Chrome DevTools.
You can open the developer tools by pressing Ctrl+Shift+I, or via View -> Developer -> Developer Tools.
Use the inspector to find the outermost element that contains all of the text for the extended forecast.

In this case, the outermost tag is a div with the id seven-day-forecast, and the whole forecast section is nested inside it.


Each forecast item (like "Tonight", "Thursday" and "Thursday Night") is contained in a div tag with the class tombstone-container.
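You can check this nesting without a browser. The minimal sketch below builds a heavily trimmed, hypothetical copy of the markup: only the id seven-day-forecast and the class tombstone-container come from the page, and everything else is simplified. It then confirms that every forecast item sits inside the outermost div. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup mirroring the nesting described above.
# The real page contains many more elements and attributes.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">Tonight</div>
  <div class="tombstone-container">Thursday</div>
  <div class="tombstone-container">Thursday Night</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = soup.find_all("div", class_="tombstone-container")

for item in items:
    # find_parent() walks up the tree and returns the first matching ancestor.
    outer = item.find_parent("div", id="seven-day-forecast")
    print(item.get_text(strip=True), "->", outer["id"])
```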





  • Download the page and parse it.
1. Download the web page containing the forecast.
2. Create a BeautifulSoup object to parse the page.
3. Find the div with the id seven-day-forecast and assign it to seven_day.
4. Inside seven_day, find each individual forecast item.
5. Extract the first forecast item, assign it to tonight, and print it.
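Here is a minimal sketch of the five steps above. A few things are assumptions and do not come from the original post:

- beautifulsoup4 is assumed to be installed.
- The download uses urllib.request from the standard library. The popular requests library would work just as well.
- The helper names download_page and parse_forecast are hypothetical.
- A small hypothetical HTML snippet stands in for the downloaded page, so the example runs offline.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def download_page(url):
    """Step 1 (hypothetical helper): fetch the raw HTML; needs network access."""
    with urlopen(url) as response:
        return response.read()


def parse_forecast(html):
    """Steps 2-4 (hypothetical helper): parse the page and collect forecast items."""
    soup = BeautifulSoup(html, "html.parser")                          # step 2
    seven_day = soup.find(id="seven-day-forecast")                     # step 3
    forecast_items = seven_day.find_all(class_="tombstone-container")  # step 4
    return seven_day, forecast_items


# Offline stand-in for the downloaded page (hypothetical, trimmed markup).
# To use the live page, pass download_page(url) instead, with url set to the
# forecast page's address.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
  <div class="tombstone-container"><p class="period-name">Thursday</p></div>
</div>
"""

seven_day, forecast_items = parse_forecast(SAMPLE_HTML)
tonight = forecast_items[0]     # step 5: the first forecast item
print(tonight.prettify())
```

BeautifulSoup accepts both the bytes returned by urlopen and a plain string, so the same parse_forecast works for the live page and for the offline sample.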


  • Extract information from the page.
1. The tonight item holds all the information for that forecast period. It contains four pieces of information:
              a) Name of the forecast period, e.g. Tonight.
              b) Detailed description of the weather, e.g. stored in the title property of the image.
              c) Short description of the weather, e.g. Mostly clear.
              d) Temperature, e.g. 49 degrees.
Let's extract the forecast period name, the short description and the temperature.
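Here is a minimal sketch of that extraction on a single, hypothetical forecast item. The class names period-name, short-desc and temp are assumptions about the page's markup, not something stated in this post, so confirm them in DevTools first. It assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical single forecast item. The class names period-name, short-desc
# and temp are assumptions about the real page's markup.
TONIGHT_HTML = """
<div class="tombstone-container">
  <p class="period-name">Tonight</p>
  <p><img src="icon.png" alt="" title="Tonight: Mostly clear, with a low around 49."></p>
  <p class="short-desc">Mostly clear</p>
  <p class="temp temp-low">Low: 49 &deg;F</p>
</div>
"""

tonight = BeautifulSoup(TONIGHT_HTML, "html.parser").find(class_="tombstone-container")

# get_text() returns only the text inside each tag, without the markup.
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
# class_="temp" also matches tags that carry extra classes, such as "temp temp-low".
temp = tonight.find(class_="temp").get_text()

print(period)
print(short_desc)
print(temp)
```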


We can extract the "title" attribute from the 'img' tag. We simply treat the BeautifulSoup tag object like a dictionary and pass in the attribute we want as the key.
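Here is a minimal sketch of the dictionary-style attribute access. The HTML snippet and its title text are hypothetical, and beautifulsoup4 is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical forecast item; the title text is made up for illustration.
TONIGHT_HTML = """
<div class="tombstone-container">
  <p class="period-name">Tonight</p>
  <p><img src="icon.png" alt="" title="Tonight: Mostly clear, with a low around 49."></p>
</div>
"""

tonight = BeautifulSoup(TONIGHT_HTML, "html.parser")
img = tonight.find("img")

# A tag exposes its attributes like a dictionary.
desc = img["title"]
print(desc)

# .get() returns a default instead of raising KeyError for a missing attribute.
print(img.get("width", "no width attribute"))
```

Indexing with img["title"] raises a KeyError if the attribute is absent. Use .get() when an attribute may be missing on some items.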






  • Extract all the information from the page.

    Now, we will extract all the information from the page. In the previous step, we extracted the information for a single forecast item, i.e. Tonight.
    First, we extract only the period names (Tonight, Thursday, and so on).

    We will also extract the short descriptions, temperatures and detailed descriptions.
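Here is a minimal sketch that collects every period name at once with a CSS selector. It uses a hypothetical trimmed page and assumes beautifulsoup4 is installed, since select() relies on its bundled soupsieve dependency. The class name period-name is an assumption about the page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed page; the period-name class is an assumption.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
  <div class="tombstone-container"><p class="period-name">Thursday</p></div>
  <div class="tombstone-container"><p class="period-name">Thursday Night</p></div>
</div>
"""

seven_day = BeautifulSoup(SAMPLE_HTML, "html.parser").find(id="seven-day-forecast")

# The CSS selector matches every .period-name nested inside a .tombstone-container.
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
print(periods)
```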
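The same selector pattern works for the other fields. The sketch below runs against a hypothetical three-item page whose descriptions and temperatures are made up. The class names short-desc and temp are assumptions about the markup, and beautifulsoup4 is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical three-item page; all text values and class names are illustrative.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <img src="icon.png" alt="" title="Tonight: Mostly clear, with a low around 49.">
    <p class="short-desc">Mostly clear</p>
    <p class="temp temp-low">Low: 49 &deg;F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday</p>
    <img src="icon.png" alt="" title="Thursday: Sunny, with a high near 63.">
    <p class="short-desc">Sunny</p>
    <p class="temp temp-high">High: 63 &deg;F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday Night</p>
    <img src="icon.png" alt="" title="Thursday Night: Partly cloudy, with a low around 50.">
    <p class="short-desc">Partly cloudy</p>
    <p class="temp temp-low">Low: 50 &deg;F</p>
  </div>
</div>
"""

seven_day = BeautifulSoup(SAMPLE_HTML, "html.parser").find(id="seven-day-forecast")

# One list per field, in page order, so the lists line up item by item.
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [img["title"] for img in seven_day.select(".tombstone-container img")]

print(short_descs)
print(temps)
print(descs)
```

Each list follows document order, so index i in every list refers to the same forecast period. That alignment is what makes it possible to combine them into a table.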






  • Combine the above data into a pandas DataFrame.

    Now, we will combine the extracted data into a DataFrame (a tabular format).
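Here is a minimal sketch of building the DataFrame. Hypothetical lists stand in for the ones extracted in the previous steps, and pandas is assumed to be installed. The column names are a hypothetical choice.

```python
import pandas as pd

# Hypothetical values standing in for the lists extracted from the page.
periods = ["Tonight", "Thursday", "Thursday Night"]
short_descs = ["Mostly clear", "Sunny", "Partly cloudy"]
temps = ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"]
descs = [
    "Tonight: Mostly clear, with a low around 49.",
    "Thursday: Sunny, with a high near 63.",
    "Thursday Night: Partly cloudy, with a low around 50.",
]

# Each dictionary key becomes a column; the lists must all have the same length.
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs,
})
print(weather)
```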



    • Let's play with the above data.
    a) You can use a regular expression and the Series.str.extract method to pull out the numeric temperature values.
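Here is a minimal sketch of the extraction. It assumes temperature strings shaped like "Low: 49 °F", which is a hypothetical format, so check it against your own data. It also assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data; the "Low:/High: NN °F" format is an assumption.
weather = pd.DataFrame({
    "period": ["Tonight", "Thursday", "Thursday Night"],
    "temp": ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"],
})

# (\d+) captures the first run of digits; expand=False returns a Series
# instead of a one-column DataFrame, so it can be cast and stored directly.
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)
print(weather["temp_num"].tolist())  # [49, 63, 50]
```

This pattern ignores a leading minus sign. If sub-zero temperatures are possible, r"(-?\d+)" keeps the sign.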


    b) You can also find the mean of the temperatures.
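Once the temperatures are numeric, the mean is a single call. Here is a minimal self-contained sketch on the same hypothetical data, assuming pandas is installed.

```python
import pandas as pd

# Hypothetical data in the assumed "Low:/High: NN °F" format.
weather = pd.DataFrame({"temp": ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"]})
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)

# Series.mean() averages the numeric column: (49 + 63 + 50) / 3.
mean_temp = weather["temp_num"].mean()
print(mean_temp)  # 54.0
```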


    c) Select the rows that occur at night.
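Here is a minimal sketch of selecting the night rows with a boolean mask. It treats a period as "night" when its name contains "night" in any letter case, which covers both "Tonight" and "Thursday Night". That rule is an assumption about how periods are named. The data are hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical data; the naming rule for night periods is an assumption.
weather = pd.DataFrame({
    "period": ["Tonight", "Thursday", "Thursday Night"],
    "temp": ["Low: 49 °F", "High: 63 °F", "Low: 50 °F"],
})

# Boolean mask: True where the period name contains "night" (case-insensitive).
is_night = weather["period"].str.contains("night", case=False)

# Indexing a DataFrame with a boolean Series keeps only the True rows.
night_rows = weather[is_night]
print(night_rows)
```

If your temperature strings use a "Low" prefix for night-time values, weather["temp"].str.contains("Low") is an alternative mask.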




    More examples of how to extract, scrape and combine data into a DataFrame will be added to this post soon. Meanwhile, make sure you understand the concepts above, and keep practising.
