Pretty printing XML in Python

Posted on

Solving problem is about exposing yourself to as many situations as possible like Pretty printing XML in Python and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Pretty printing XML in Python, which can be followed any time. Take easy to follow this discuss.

Pretty printing XML in Python

What is the best way (or are the various ways) to pretty print XML in Python?

Answer #1:

import xml.dom.minidom
dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
Answered By: Ben Noland

Answer #2:

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree
x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

Check out the lxml tutorial:
http://lxml.de/tutorial.html

Answered By: 1729

Answer #3:

Another solution is to borrow this indent function, for use with the ElementTree library that’s built in to Python since 2.5.
Here’s what that would look like:

from xml.etree import ElementTree
def indent(elem, level=0):
    i = "n" + level*"  "
    j = "n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem
root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
Answered By: ade

Answer #4:

Here’s my (hacky?) solution to get around the ugly text node problem.

uglyXml = doc.toprettyxml(indent='  ')
text_re = re.compile('>ns+([^<>s].*?)ns+</', re.DOTALL)
prettyXml = text_re.sub('>g<1></', uglyXml)
print prettyXml

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.

Answered By: Nick Bolton

Answer #5:

BeautifulSoup has a easy to use prettify() method.

It indents one space per indentation level. It works much better than lxml’s pretty_print and is short and sweet.

from bs4 import BeautifulSoup
bs = BeautifulSoup(open(xml_file), 'xml')
print bs.prettify()
Answered By: ChaimG

Answer #6:

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here’s a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree
def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')
Answered By: roskakori

Answer #7:

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses an program external to python, which makes it sort of a hack.

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output
print(pretty_print_xml(data))
Answered By: Russell Silva

Answer #8:

I tried to edit “ade”s answer above, but Stack Overflow wouldn’t let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
Answered By: Joshua Richardson

Leave a Reply

Your email address will not be published.