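How to get all the links in an HTML page stored in a string in Python?
A convenient way to extract links from an HTML page stored in a string is to use BeautifulSoup (the bs4 package). The html5lib parser used here is a separate package and must also be installed: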
from bs4 import BeautifulSoup
soup = BeautifulSoup(htmlStr, features="html5lib")
# Get links
soup.find_all('a')
Here is a full example displaying the links:
from bs4 import BeautifulSoup
htmlStr = '<body><a href="https://ans.wiki">AnsWiki</a><br><a href="https://fr.ans.wiki">AnsWiki French</a></body>'
soup = BeautifulSoup(htmlStr, features="html5lib")
# Get links
for link in soup.find_all('a'):
    print(link.get('href'))
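If installing third-party packages is not an option, the same links can be collected with Python's standard library. The following is only a minimal sketch using html.parser instead of BeautifulSoup; the names LinkCollector and extract_links are hypothetical helpers introduced for illustration, and html.parser is less forgiving of badly broken markup than html5lib.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_str):
    """Return the href of each <a> tag in html_str, in document order."""
    parser = LinkCollector()
    parser.feed(html_str)
    parser.close()
    return parser.links


htmlStr = '<body><a href="https://ans.wiki">AnsWiki</a><br><a href="https://fr.ans.wiki">AnsWiki French</a></body>'
print(extract_links(htmlStr))  # ['https://ans.wiki', 'https://fr.ans.wiki']
```

Anchors without an href attribute are skipped here, whereas link.get('href') in the BeautifulSoup example returns None for them.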