I want to convert strings containing escaped characters to their normal form, the same way Python’s lexical parser does:

>>> escaped_str = 'One \\\'example\\\''
>>> print(escaped_str)
One \'example\'
>>> normal_str = normalize_str(escaped_str)
>>> print(normal_str)
One 'example'

Of course, the boring way would be to replace all known escape sequences one by one.

How would you implement normalize_str() in the above code?

Asked By: aligf


Answer #1:

>>> escaped_str = 'One \\\'example\\\''
>>> print escaped_str.encode('string_escape')
One \\\'example\\\'
>>> print escaped_str.decode('string_escape')
One 'example'

Several similar codecs are available, such as rot13 and hex.

The above is Python 2.x. Since you said (below, in a comment) that you’re using Python 3.x: a Unicode string object has no decode() method there, so decoding one is a bit roundabout, but it’s still possible through the codecs module. The string_escape codec is gone in Python 3; use “unicode_escape” instead:

Python 3.3a0 (default:b6aafb20e5f5, Jul 29 2011, 05:34:11) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> escaped_str = "One \\\'example\\\'"
>>> import codecs
>>> print(codecs.getdecoder("unicode_escape")(escaped_str)[0])
One 'example'
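
For Python 3, the same idea can be wrapped into the normalize_str() from the question. This is only a minimal sketch, not the one true way: unicode_escape decodes its input as Latin-1, so the encode step with backslashreplace is a workaround I'm assuming here to keep non-Latin-1 characters intact.

```python
def normalize_str(escaped):
    """Sketch: interpret backslash escapes in a str (Python 3)."""
    # unicode_escape reads bytes as Latin-1. Characters outside Latin-1 are
    # first rewritten as \uXXXX escapes by backslashreplace, so they are
    # restored by the decode step instead of raising or being mangled.
    raw = escaped.encode("latin-1", "backslashreplace")
    return raw.decode("unicode_escape")


escaped_str = 'One \\\'example\\\''
print(escaped_str)                 # One \'example\'
print(normalize_str(escaped_str))  # One 'example'
```

Note that this handles escape sequences only; it does not expect or strip surrounding quotation marks.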

Answered By: Fred Nurk

Answer #2:

I assume the question is really:

I have a string that is formatted as if it were a part of Python source code. How can I safely interpret it so that \n within the string is transformed into a newline, quotation marks are expected on either end, etc.?

Try ast.literal_eval.

>>> import ast
>>> print ast.literal_eval(raw_input())
"hi, mom.\n This is a \"weird\" string, isn't it?"
hi, mom.
 This is a "weird" string, isn't it?

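The session above is Python 2. A rough Python 3 equivalent might look like the sketch below. The helper name parse_string_literal is made up for illustration, and the isinstance check is my own addition, since ast.literal_eval will happily return numbers, lists and other literals too.

```python
import ast


def parse_string_literal(source_text):
    """Hypothetical helper: evaluate text that is a complete, quoted string literal."""
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    value = ast.literal_eval(source_text)
    if not isinstance(value, str):
        raise ValueError("input is a valid literal, but not a string")
    return value


# The input must include its surrounding quotes, exactly as in source code.
quoted = r'''"hi, mom.\n This is a \"weird\" string, isn't it?"'''
print(parse_string_literal(quoted))
```

Text without the surrounding quotes (like the escaped_str in the question) is not a valid literal and makes ast.literal_eval raise a SyntaxError, so this approach only fits input that really looks like source code.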
