Question #5935   Submitted by Answiki on 02/26/2022 at 06:09:09 PM UTC

How to get all the links in an HTML page stored in a string in Python?

Answer   Submitted by Answiki on 02/26/2022 at 06:16:59 PM UTC

A common way to extract links from an HTML page stored in a string is to parse it with BeautifulSoup and collect every a tag:

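from bs4 import BeautifulSoup

# htmlStr is the string holding the HTML page; the html5lib parser is a separate package
soup = BeautifulSoup(htmlStr, features="html5lib")

# Get all <a> tags (find_all is the current name of the older findAll alias)
links = soup.find_all('a')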


Here is a full example displaying the links:

from bs4 import BeautifulSoup

htmlStr = '<body><a href="https://ans.wiki">AnsWiki</a><br><a href="https://fr.ans.wiki">AnsWiki French</a></body>'
soup = BeautifulSoup(htmlStr, features="html5lib")

# Print the href attribute of each link (None if the tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))
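
If installing BeautifulSoup is not an option, the same idea can be sketched with only the standard library. This is a minimal sketch, not part of the original answer: the names LinkCollector, extract_links, and base_url are hypothetical, and base_url is an assumed optional parameter used to resolve relative links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        # Hypothetical base URL used to turn relative links into absolute ones
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_str, base_url=""):
    """Return the list of link targets found in html_str."""
    parser = LinkCollector(base_url)
    parser.feed(html_str)
    parser.close()
    return parser.links


htmlStr = '<body><a href="https://ans.wiki">AnsWiki</a><br><a href="https://fr.ans.wiki">AnsWiki French</a></body>'
print(extract_links(htmlStr))
```

Anchors without an href attribute are skipped. The standard-library parser is less forgiving of malformed markup than html5lib, so BeautifulSoup likely remains the more robust choice for real-world pages.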

