How to un-escape a backslash-escaped string?

Question :

Suppose I have a string which is a backslash-escaped version of another string. Is there an easy way, in Python, to unescape the string? I could, for example, do:

>>> escaped_str = '"Hello,\\nworld!"'
>>> raw_str = eval(escaped_str)
>>> print(raw_str)
Hello,
world!

However, that involves passing a (possibly untrusted) string to eval(), which is a security risk. Is there a function in the standard library that takes a string and returns the unescaped string without those security implications?

Asked By: Nick


Answer #1:

>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"
Answered By: ChristopheD

Answer #2:

You can use ast.literal_eval, which is safe. From its documentation:

Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.

Like this:

>>> import ast
>>> escaped_str = '"Hello,\\nworld!"'
>>> print(ast.literal_eval(escaped_str))
Hello,
world!
Answered By: jathanism
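
To illustrate why ast.literal_eval is considered safe for this purpose, here is a minimal Python 3 sketch. The names `unescaped`, `malicious`, and `rejected` are illustrative and do not come from the answers above; the point is only that a non-literal expression is rejected rather than executed.

```python
import ast

# A backslash-escaped string literal, quotes included.
escaped_str = '"Hello,\\nworld!"'
unescaped = ast.literal_eval(escaped_str)
print(unescaped)

# Anything that is not a plain literal is refused instead of being run.
malicious = "__import__('os').getcwd()"
try:
    ast.literal_eval(malicious)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With eval(), the second string would have been executed as code; literal_eval raises ValueError instead.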

Answer #3:

All given answers will break on general Unicode strings. The following works for Python 3 in all cases, as far as I can tell:

from codecs import encode, decode
sample = u'mon€y\\nröcks'
result = decode(encode(sample, 'latin-1', 'backslashreplace'), 'unicode-escape')

In recent Python versions, this also works without the import:

sample = u'mon€y\\nröcks'
result = sample.encode('latin-1', 'backslashreplace').decode('unicode-escape')

As outlined in the comments, you can also use the literal_eval method from the ast module like so:

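import ast
sample = u'mon€y\\nröcks'
# wrap the bare text in quotes so it parses as a string literal
print(ast.literal_eval(f'"{sample}"'))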

Or like this when your string really contains a string literal (including the quotes):

import ast
sample = u'"mon€y\\nröcks"'
print(ast.literal_eval(sample))

However, if you are uncertain whether the input string uses double or single quotes as delimiters, or when you cannot assume it to be properly escaped at all, then literal_eval may raise a SyntaxError while the encode/decode method will still work.

Answered By: Jesko Hüttenhain
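
The difference described in the last paragraph can be sketched as follows, assuming Python 3.6 or later. The helper name `unescape` and the sample input are hypothetical, chosen only to show an input that contains an unescaped double quote.

```python
import ast


def unescape(text):
    """Hypothetical helper wrapping the encode/decode round trip."""
    return text.encode('latin-1', 'backslashreplace').decode('unicode-escape')


# Input with an unescaped double quote inside it.
sample = 'say "hi"\\n'

# Wrapping the text in double quotes produces an invalid literal.
try:
    ast.literal_eval(f'"{sample}"')
    literal_eval_ok = True
except SyntaxError:
    literal_eval_ok = False

# The encode/decode round trip does not care about quotes.
result = unescape(sample)
print(literal_eval_ok)
print(repr(result))
```

Here literal_eval fails because the embedded quote ends the literal early, while the round trip simply passes the quote through and converts the trailing escape into a newline.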

Answer #4:

In Python 3, str objects don’t have a decode method, so you have to go through a bytes object. ChristopheD’s answer covers Python 2.

# create a `bytes` object from a `str`
my_str = "Hello,\\nworld"
# (pick an encoding suitable for your str, e.g. 'latin1')
my_bytes = my_str.encode("utf-8")

# or directly
my_bytes = b"Hello,\\nworld"

print(my_bytes.decode("unicode_escape"))
# Hello,
# world
Answered By: asachet
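
The choice of encoding matters once the text contains non-ASCII characters, because the unicode_escape codec reads each byte as a Latin-1 character. A small Python 3 sketch of the difference follows; the variable names `via_utf8` and `via_latin1` are illustrative.

```python
text = 'röcks\\n'

# UTF-8 stores 'ö' as two bytes, and unicode_escape decodes each of
# them as a separate Latin-1 character, which mangles the text.
via_utf8 = text.encode('utf-8').decode('unicode_escape')

# Latin-1 stores 'ö' as one byte; backslashreplace turns anything
# outside Latin-1 into an escape that the decoder then restores.
via_latin1 = text.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(repr(via_utf8))
print(repr(via_latin1))
```

For pure ASCII input both routes give the same result, so the UTF-8 version above is fine for strings like "Hello,\\nworld".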
