I need to escape an & (ampersand) character in a string. The problem is that whenever I run
string = string.replace('&', '\&') the result is
'\\&'. An extra backslash is added to escape the original backslash. How do I remove this extra backslash?
The extra backslash is only displayed; the string itself is actually '\&':

    >>> str = '&'
    >>> new_str = str.replace('&', '\&')
    >>> new_str
    '\\&'
    >>> print new_str
    \&

Try it in a shell.
The extra backslash is not actually in the string; it is only added by the repr() function to indicate that it is a literal backslash. The Python interpreter uses the repr() function (which calls __repr__() on the object) when the result of an expression needs to be printed:

    >>> '\\'
    '\\'
    >>> print '\\'
    \
    >>> print '\\'.__repr__()
    '\\'
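As a small sketch of the same point, the snippet below compares what repr() reports with what the string really holds. The variable name s is only illustrative, and print() is written as a function so that the snippet also runs on Python 3.

```python
# A one-character string holding a single backslash.
s = '\\'

print(len(s))   # the string holds exactly one character
print(repr(s))  # the escaped form the interactive prompt would echo
print(s)        # the actual contents: a single backslash

# repr() returns source-style text: quote, two backslashes, quote.
assert repr(s) == "'\\\\'"
assert len(repr(s)) == 4
```

The doubled backslash exists only in the text that repr() produces, never in s itself.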
Python treats \ in a string literal in a special way.
This is so you can type
'\n' to mean newline or
'\t' to mean tab.
Since '\&' doesn't mean anything special to Python, instead of causing an error, the Python lexical analyser leaves the backslash in place, as if you had typed '\\&'.
Really, it is better to write
'\\&' or r'\&' instead of '\&'. The
r here means raw string, and means that \
isn't treated specially unless it comes right before the quote character that opened the string.
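A brief sketch of the two recommended spellings, applied to the replace() call from the question. The sample text and the variable names are made up for illustration.

```python
text = 'fish & chips'  # hypothetical input

# Both spellings denote the same two characters: a backslash and an ampersand.
escaped_pair = '\\&'
raw_pair = r'\&'
assert escaped_pair == raw_pair
assert len(raw_pair) == 2

result = text.replace('&', r'\&')
print(result)        # fish \& chips
print(repr(result))  # 'fish \\& chips'
```

Either spelling avoids relying on how Python handles an unrecognized escape such as '\&'; newer Python 3 releases emit a warning for that form.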
In the interactive console, Python uses
repr to display the result, which is why you see the double '\'. If you print your string or check
len(string) you will see that it is really only the 2 characters.

    >>> 'Here\'s a backslash: \\'
    "Here's a backslash: \\"
    >>> print 'Here\'s a backslash: \\'
    Here's a backslash: \
    >>> 'Here\'s a backslash: \\. Here\'s a double quote: ".'
    'Here\'s a backslash: \\. Here\'s a double quote: ".'
    >>> print 'Here\'s a backslash: \\. Here\'s a double quote: ".'
    Here's a backslash: \. Here's a double quote: ".
To clarify the point Peter makes in his comment, see this link:
Unlike Standard C, all unrecognized
escape sequences are left in the
string unchanged, i.e., the backslash
is left in the string. (This behavior
is useful when debugging: if an escape
sequence is mistyped, the resulting
output is more easily recognized as
broken.) It is also important to note
that the escape sequences marked as
“(Unicode only)” in the table above
fall into the category of unrecognized
escapes for non-Unicode string literals.
    >>> '\&' == '\\&'
    True
    >>> len('\&')
    2
    >>> print('\&')
    \&
Or in other words:
'\\&' only contains one backslash. It's just escaped in the Python shell's output for clarity.
There is no extra backslash; it's just displayed that way in the interactive environment. Try:

    print string

Then you can see that there really is no extra backslash.
Printing a list can also cause this problem (I'm new to Python, so it confused me a bit too):

    >>> myList = ['\\']
    >>> print myList
    ['\\']
    >>> print ''.join(myList)
    \
    >>> myList = ['\&']
    >>> print myList
    ['\\&']
    >>> print ''.join(myList)
    \&
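The reason, as a hedged sketch: printing a list shows the repr() of each element, whereas joining the elements gives a plain string. The names below are illustrative, and print() is written as a function so that the snippet also runs on Python 3.

```python
my_list = [r'\&']

# str() of a list is built from the repr() of its elements,
# so the backslash appears doubled in the list's printed form.
print(my_list)           # ['\\&']
print(''.join(my_list))  # \&

assert str(my_list) == "[" + repr(my_list[0]) + "]"
assert len(my_list[0]) == 2
```

The element itself still holds one backslash; only the list's display form doubles it.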