Beautiful Soup (HTML parser)

From Infogalactic: the planetary knowledge core
Jump to: navigation, search

<templatestyles src="Module:Hatnote/styles.css"></templatestyles>

Beautiful Soup
Original author(s) Leonard Richardson
Stable release 4.4.1 / September 28, 2015; 8 years ago (2015-09-28)
Written in Python
Platform Python
Type HTML parser library, Web scraping
License Python Software Foundation License (Beautiful Soup 3 - an older version) MIT License 4+[1]
Website www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents (including having malformed markup, i.e. non-closed tags, so named after tag soup). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.[2]

It is available for Python 2.6+ and Python 3.

Code example

# anchor extraction from html document
from bs4 import BeautifulSoup
import urllib2

webpage = urllib2.urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage,'html.parser')
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))

See also

References

  1. Lua error in package.lua at line 80: module 'strict' not found.
  2. Lua error in package.lua at line 80: module 'strict' not found.


<templatestyles src="Asbox/styles.css"></templatestyles>