Convert a Unicode string to a string in Python (containing extra symbols)

Posted on

Solving problem is about exposing yourself to as many situations as possible like Convert a Unicode string to a string in Python (containing extra symbols) and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Convert a Unicode string to a string in Python (containing extra symbols), which can be followed any time. Take easy to follow this discuss.

Convert a Unicode string to a string in Python (containing extra symbols)

How do you convert a Unicode string (containing extra characters like £ $, etc.) into a Python string?

Answer #1:

See unicodedata.normalize

title = u"Klüft skräms inför på fédéral électoral große"
import unicodedata
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
'Kluft skrams infor pa federal electoral groe'
Answered By: Sorantis

Answer #2:

You can use encode to ASCII if you don’t need to translate the non-ASCII characters:

>>> a=u"aaaàçççñññ"
>>> type(a)
<type 'unicode'>
>>> a.encode('ascii','ignore')
'aaa'
>>> a.encode('ascii','replace')
'aaa???????'
>>>
Answered By: Ferran

Answer #3:

>>> text=u'abcd'
>>> str(text)
'abcd'

If the string only contains ascii characters.

Answered By: igco

Answer #4:

If you have a Unicode string, and you want to write this to a file, or other serialised form, you must first encode it into a particular representation that can be stored. There are several common Unicode encodings, such as UTF-16 (uses two bytes for most Unicode characters) or UTF-8 (1-4 bytes / codepoint depending on the character), etc. To convert that string into a particular encoding, you can use:

>>> s= u'£10'
>>> s.encode('utf8')
'xc2x9c10'
>>> s.encode('utf16')
'xffxfex9cx001x000x00'

This raw string of bytes can be written to a file. However, note that when reading it back, you must know what encoding it is in and decode it using that same encoding.

When writing to files, you can get rid of this manual encode/decode process by using the codecs module. So, to open a file that encodes all Unicode strings into UTF-8, use:

import codecs
f = codecs.open('path/to/file.txt','w','utf8')
f.write(my_unicode_string)  # Stored on disk as UTF-8

Do note that anything else that is using these files must understand what encoding the file is in if they want to read them. If you are the only one doing the reading/writing this isn’t a problem, otherwise make sure that you write in a form understandable by whatever else uses the files.

In Python 3, this form of file access is the default, and the built-in open function will take an encoding parameter and always translate to/from Unicode strings (the default string object in Python 3) for files opened in text mode.

Answered By: Brian

Answer #5:

Here is an example:

>>> u = u'€€€'
>>> s = u.encode('utf8')
>>> s
'xe2x82xacxe2x82xacxe2x82xac'
Answered By: Bastien Léonard

Leave a Reply

Your email address will not be published.