BeautifulSoup

When we want to download a website and scrape data from it instead of copying the data manually, we use the requests library to download the page and BeautifulSoup to parse its HTML.

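First, download the page with the get function from the requests library (BeautifulSoup itself only parses HTML; it does not fetch pages):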

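from requests import get

# URL of a small example page used for scraping practice
url = 'http://dataquestio.github.io/web-scraping-pages/simple.html'

# Download the page; get() returns a Response object
response = get(url)

# Show the first 500 characters of the page's HTML
print(response.text[:500])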

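Explanation and next steps:

1. response is a Response object. Its status_code property tells us whether the page was downloaded successfully.

2. A status_code of 200 means the page was downloaded successfully. In general, a code starting with 2 indicates success, and a code starting with 4 or 5 indicates an error.

3. To print the content of the page, use the content property. response.content holds the raw bytes of the page, while response.text holds the same content decoded as a string.

4. Once we have seen what is inside the page, we parse it with BeautifulSoup (from bs4 import BeautifulSoup, then soup = BeautifulSoup(response.content, 'html.parser')) and look at its different parts.

5. We can get the top-level elements of the page through the children property of soup. children returns an iterator rather than a list, so we call list() on it.

6. The get_text method extracts all the text inside a tag, including the text of any nested tags.

7. To extract every instance of a tag, use the find_all method, which returns a list of the matching tags. To get only the first matching tag, use the find method instead.

8. find_all and find can also search by class and id, for example soup.find_all('p', class_='outer-text') or soup.find(id='first'). The keyword is class_ with a trailing underscore because class is a reserved word in Python.

9. We can also search for elements with CSS selectors using the select method, which returns a list of matching tags. For example, soup.select('div p') finds all p tags that are inside a div. Other selectors: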

p a: finds all a tags inside a p tag.
body p a: finds all a tags inside a p tag inside a body tag.
html body: finds all body tags inside an html tag.
p.outer-text: finds all p tags with a class of outer-text.
p#first: finds all p tags with an id of first.
body p.outer-text: finds all p tags with a class of outer-text inside a body tag.
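The parsing and search steps above can be tried without a network connection. The sketch below is a minimal example, assuming the bs4 package is installed; the HTML string is made up for illustration and stands in for response.content, with an id of first and classes such as outer-text chosen to match the selectors listed above. It uses Python's built-in "html.parser", so no extra parser such as lxml is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for response.content, so the sketch runs offline.
HTML = """<html><head><title>Example page</title></head>
<body>
<div><p id="first" class="inner-text">First paragraph.</p><p class="inner-text">Second paragraph.</p></div>
<p class="outer-text">Outer paragraph with a <a href="https://example.com">link</a>.</p>
</body></html>"""

# Parse with Python's built-in HTML parser.
soup = BeautifulSoup(HTML, "html.parser")

# children is an iterator over the direct children, so wrap it in list().
top_level = list(soup.children)

# get_text() returns all text inside a tag, including text in nested tags.
title_text = soup.find("title").get_text()

# find_all() returns every matching tag; find() returns only the first match.
all_paragraphs = [p.get_text() for p in soup.find_all("p")]
first_paragraph = soup.find("p").get_text()

# Search by class (note the trailing underscore in class_) and by id.
outer = [p.get_text() for p in soup.find_all("p", class_="outer-text")]
by_id = soup.find(id="first").get_text()

# CSS selectors via select(): p tags inside a div, a tags inside a p tag, p with id first.
div_paragraphs = [p.get_text() for p in soup.select("div p")]
links = [a["href"] for a in soup.select("p a")]
first_by_selector = soup.select("p#first")[0].get_text()

if __name__ == "__main__":
    print(top_level[0].name, title_text)
    print(all_paragraphs)
    print(outer, by_id)
    print(div_paragraphs, links, first_by_selector)
```

To run the same searches against the downloaded page, pass response.content to BeautifulSoup in place of HTML; the element names, ids, and classes will then be those of the real page.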

Below is an example that uses BeautifulSoup:

Extracting and Scraping Weather Data
