Beautiful Soup (HTML parser)

Beautiful Soup
Original author(s)	Leonard Richardson
Stable release	4.4.1 / September 28, 2015
Written in	Python
Platform	Python
Type	HTML parser library, Web scraping
License	Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)
Website	www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the package is named after tag soup). It creates a parse tree for each parsed page that can be used to extract data from HTML, which is useful for web scraping.^[2]

It is available for Python 2.6+ and Python 3.

Code example

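# Extract the href of every anchor in an HTML document
from bs4 import BeautifulSoup

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Fall back to '/' when an anchor has no href attribute
    print(anchor.get('href', '/'))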
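
The tolerance for malformed markup described in the introduction can also be seen without a network connection. The following is a minimal sketch, not taken from the Beautiful Soup documentation: the `broken_html` string is a made-up snippet in which the `<p>`, `<b>`, and second `<a>` tags are never closed, and it assumes the `bs4` package is installed and that Python's built-in `html.parser` backend is used.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed snippet: <p>, <b>, and the second <a> are never closed.
broken_html = '<p>Links: <a href="/one">one</a> <b>and <a href="/two">two'

# Build the parse tree; open tags are closed implicitly at the end of input.
soup = BeautifulSoup(broken_html, 'html.parser')

# Collect the link targets and link text from the parse tree.
hrefs = [a.get('href') for a in soup.find_all('a')]
texts = [a.get_text() for a in soup.find_all('a')]

print(hrefs)
print(texts)
```

Both anchors are found even though the second one is unclosed, because the search runs over the parse tree rather than the raw text.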
