How to find a specific data attribute from an HTML tag in BeautifulSoup4?

Question :

Is there a way to find an element using only the data attribute in html, and then grab that value?

For example, with this line inside an html doc:

<ul data-bin="Sdafdo39">

How do I retrieve Sdafdo39 by searching the entire html doc for the element that has the data-bin attribute?

Answer #1:

You can call find_all() with no arguments to get every tag in the document, then keep only the tags whose attributes include “data-bin”. Once you have those tags, reading the attribute value is a simple dictionary-style lookup:

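from bs4 import BeautifulSoup

html_doc = """<ul data-bin="Sdafdo39">"""
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
bs = BeautifulSoup(html_doc, "html.parser")
# find_all() with no arguments returns every tag; keep those that have data-bin.
print([item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs])
# ['Sdafdo39']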
Answered By: thefourtheye
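
As a variation on the answer above, the same lookup can be sketched with a CSS attribute selector. This assumes a bs4 release recent enough that select() and select_one() support the [data-bin] selector; the variable names and the extra sample markup are illustrative, not from the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption for illustration): one ul without the attribute, one with it.
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul>"""
soup = BeautifulSoup(html_doc, "html.parser")

# "[data-bin]" matches any tag that carries a data-bin attribute, whatever its value.
values = [tag["data-bin"] for tag in soup.select("[data-bin]")]
print(values)
# ['Sdafdo39']

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one("[data-bin]")
first_value = first["data-bin"] if first is not None else None
print(first_value)
# Sdafdo39
```

The selector form avoids scanning every tag in Python, and the None check keeps the lookup from raising when the attribute is absent from the document.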

Answer #2:

A slightly more precise approach:

[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]

This way, the list is built only from the ul elements that have the attribute you are looking for:

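from bs4 import BeautifulSoup

# Define the markup before parsing it.
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")
# attrs={'data-bin': True} matches ul tags that have the attribute, whatever its value.
print([item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})])
# ['Sdafdo39']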

Answered By: xecgr
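
When the tag name is not known in advance, the tag argument can be dropped so that only the attribute filter applies. Below is a minimal sketch of that idea; the helper name first_data_attr is hypothetical and not part of bs4, and it returns None rather than raising when no tag carries the attribute.

```python
from bs4 import BeautifulSoup


def first_data_attr(html, attr_name):
    """Return the value of attr_name on the first tag that has it, else None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # No tag name given: any tag with the attribute present is a match.
    tag = soup.find(attrs={attr_name: True})
    return tag[attr_name] if tag is not None else None


found = first_data_attr('<ul class="foo">foo</ul><ul data-bin="Sdafdo39"></ul>', "data-bin")
missing = first_data_attr('<ul class="foo">foo</ul>', "data-bin")
print(found)
# Sdafdo39
print(missing)
# None
```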

Answer #3:

You could solve this with gazpacho in just a couple of lines:

First, import and turn the html into a Soup object:

from gazpacho import Soup

html = """<ul data-bin="Sdafdo39">"""
soup = Soup(html)

Then you can just search for the “ul” tag and extract the data-bin attribute:

soup.find("ul").attrs["data-bin"]
# Sdafdo39
Answered By: emehex
