How to parse data-uri in python?


Question :


HTML image elements have this simplified format:

<img src='something'>

That something can be a data URI, for example:


Is there a standard way of parsing this with Python, so that I get the content type and the base64 data separately, or should I write my own parser for this?

Asked By: blueFast


Answer #1:

Split the data URI on the comma to get the base64 encoded data without the header. Call base64.b64decode to decode that to bytes. Last, write the bytes to a file.

from base64 import b64decode

data_uri = "..."

# Python 2 and Python < 3.4
header, encoded = data_uri.split(",", 1)
data = b64decode(encoded)

# Python 3.4+
# from urllib import request
# with request.urlopen(data_uri) as response:
#     data = response.read()

with open("image.png", "wb") as f:
    f.write(data)
Answered By: blueFast
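
To get the content type and the decoded bytes separately, as the question asks, the split above can be wrapped in a small helper. This is only a minimal sketch, not a standard API: split_data_uri is a hypothetical name, it assumes a lowercase "data:" scheme, and any media type parameters other than ;base64 are simply left in the returned content type. The sample GIF payload is the one printed in Answer #4.

```python
from base64 import b64decode
from urllib.parse import unquote_to_bytes


def split_data_uri(data_uri):
    """Return (content_type, data_bytes) for a data URI (hypothetical helper)."""
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header; the rest is the payload.
    header, sep, payload = data_uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separating header and payload")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397 default when the media type is omitted.
    content_type = header or "text/plain;charset=US-ASCII"
    # The payload may be percent-encoded, so unquote it in either case.
    raw = unquote_to_bytes(payload)
    data = b64decode(raw) if is_base64 else raw
    return content_type, data


gif_type, gif_bytes = split_data_uri(
    "data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs="
)
text_type, text_bytes = split_data_uri("data:,Hello%2C%20World")
print(gif_type, len(gif_bytes))
print(text_type, text_bytes)
```

Splitting on the first comma only matters because a percent-encoded (non-base64) payload may itself contain commas.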

Answer #2:

w3lib (a library used by Scrapy) has a function to parse data uris:

>>> from w3lib.url import parse_data_uri
>>> parse_data_uri('...')
ParseDataURIResult(media_type='image/png', media_type_parameters={}, data=b'\x89PNG\r\n\x1a')
Answered By: JRodDynamite

Answer #3:

Python has supported data URIs since version 3.4. Under the hood, urlopen handles them with urllib.request.DataHandler.

from urllib.request import urlopen

with urlopen(data_uri) as response:
    data = response.read()
Answered By: Mikhail Korobov
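
The response object also carries the media type, so this approach can return the content type as well as the bytes. The following is a small sketch, assuming Python 3.4+ and using the GIF payload printed in Answer #4 as sample input; the variable names are only illustrative.

```python
from urllib.request import urlopen

# Sample input: the 1x1 GIF payload shown in Answer #4's output.
data_uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs="

with urlopen(data_uri) as response:
    # The handler exposes the media type as a Content-type header.
    content_type = response.headers.get_content_type()
    data = response.read()  # already base64-decoded bytes

print(content_type, len(data))
```

No network access is involved here: the data: scheme is decoded locally.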

Answer #4:

This may help:

import re
from base64 import b64decode

from lxml import html

BASE_NAME = "image_"

source_code = """<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA...
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs=" alt="Black dot" />"""

tree = html.fromstring(source_code)

# Select the src attribute of every <img> whose src is an image data URI.
for i, image in enumerate(tree.xpath('//img[contains(@src, "data:image")]/@src')):
    image_type, image_content = image.split(',', 1)
    image_type = re.findall(r'data:image/(\w+);base64', image_type)[0]
    with open("{}{}.{}".format(BASE_NAME, i, image_type), "wb") as f:
        f.write(b64decode(image_content))
    print("[*] '{}' image found with content: {}\n".format(image_type, image_content))


[*] 'png' image found with content: iVBORw0KGgoAAAANSUhEUgAAAAUA

[*] 'gif' image found with content: R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs=

It saves every base64 image found in <img> tags, each with its own file extension.

File names are built from BASE_NAME, the auto-incrementing index provided by enumerate, and the image extension.


Answered By: bl79
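
If lxml is not available, the same idea can be sketched with only the standard library. This is an illustrative variant, not the answer's own method: DataImageCollector is a hypothetical class name, it only handles base64 payloads, and it prints the file names it would use instead of writing files.

```python
import mimetypes
from base64 import b64decode
from html.parser import HTMLParser


class DataImageCollector(HTMLParser):
    """Collect (mime_type, bytes) for each <img> with a base64 image data URI."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if not src.startswith("data:image/"):
            return  # ordinary URL, not an embedded image
        header, _, payload = src.partition(",")
        if not header.endswith(";base64"):
            return  # this sketch skips percent-encoded payloads
        mime_type = header[len("data:"):-len(";base64")]
        self.images.append((mime_type, b64decode(payload)))


html_text = (
    '<img src="data:image/gif;base64,'
    'R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs=" alt="Black dot" />'
    '<img src="photo.jpg" alt="Not embedded">'
)

collector = DataImageCollector()
collector.feed(html_text)

for i, (mime_type, data) in enumerate(collector.images):
    # Map the MIME type to a file extension; fall back to .bin if unknown.
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    print("image_{}{}".format(i, extension), len(data), "bytes")
```

Looking up the extension from the MIME type avoids relying on the subtype text being a valid file extension.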

Answer #5:

Correcting JRodDynamite’s post:

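# base64.decodestring was removed in Python 3.9; b64decode works on all versions.
from base64 import b64decode

png_arr = "..."
png_arr = png_arr.split(",")
png_arr = png_arr[1]

with open("imageToSave.png", "wb") as fh:
    fh.write(b64decode(png_arr))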

Answer #6:

from urllib import request

def download(data_uri, name):
    # urlopen decodes data: URIs itself (Python 3.4+).
    with request.urlopen(data_uri) as response:
        data = response.read()

    with open(name, "wb") as f:
        f.write(data)

Answered By: Frodo McPytel
