{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python BeautifulSoup Web Scraping Tutorial\n", "Learn to scrape data from the web using the Python BeautifulSoup (bs4) library. \n", "BeautifulSoup makes it easy to parse useful data out of an HTML page. \n", "First, install the library on your system by running one of the following at the command line: \n", "*pip install beautifulsoup4* or *easy_install beautifulsoup4* (*bs4* also works as the package name). \n", "See the [official BeautifulSoup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for the complete set of functions.\n", "\n", "### Import requests so we can fetch the HTML content of the webpage\n", "The output below shows that our example page has about 28k characters." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "28556\n" ] } ], "source": [ "import requests\n", "r = requests.get('https://www.usclimatedata.com/climate/united-states/us')\n", "print(len(r.text))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import BeautifulSoup, and convert your HTML into a bs4 object\n", "Now we can access specific HTML tags on the page using dot notation, much like accessing the fields of a JSON object." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "