Beautifulsoup4
Beautiful Soup is a Python library specialized for pulling data out of HTML and XML files.
To some extent, it can also be used to modify HTML.
Usage
from bs4 import BeautifulSoup
html = """
<!DOCTYPE html>
<html>
<head>
<title>My page</title>
</head>
<body>
<p class="title"><b>My page</b></p>
<p class="story">My sisters:
<a href="http://example.com/maria" class="sister" id="link1">Maria</a> and
<a href="http://example.com/diana" class="sister" id="link2">Diana</a>.
</p>
</body>
</html>
"""
soup = BeautifulSoup(html, 'html.parser')
# find all links in the HTML and print their href attributes
for link in soup.find_all('a'):
    print(link.get('href'))
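The introduction notes that Beautiful Soup can also modify HTML. A minimal sketch of what that might look like follows, using a trimmed-down version of the markup above. The `https` URLs, the "(updated)" text, and the variable names are made up for illustration and are not part of the original example.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup, assumed here only for illustration
html = (
    '<p class="story">My sisters: '
    '<a href="http://example.com/maria" class="sister" id="link1">Maria</a>'
    '</p>'
)

soup = BeautifulSoup(html, "html.parser")

# Look up an existing tag, then change an attribute and its text
link = soup.find("a", id="link1")
link["href"] = "https://example.com/maria"
link.string = "Maria (updated)"

# Build a new tag and place it right after the existing link
new_link = soup.new_tag("a", href="https://example.com/diana", id="link2")
new_link["class"] = "sister"
new_link.string = "Diana"
link.insert_after(new_link)

# Serialize the modified tree back to a string
modified_html = str(soup)
print(modified_html)
```

Changes are made on the parsed tree in memory; the original `html` string is left untouched, so the result has to be serialized with `str(soup)` (or `soup.prettify()`) to be used elsewhere.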
AI tools and LLMs are quite good with beautifulsoup4.