This post gives an overview of the question “Changing default encoding of Python?” and of the answers to it, so you can come back to it at any time.
I have many “can’t encode” and “can’t decode” problems with Python when I run my applications from the console. But in the Eclipse PyDev IDE, the default character encoding is set to UTF-8, and I’m fine.
I searched for ways to set the default encoding, and people say that Python deletes the sys.setdefaultencoding function on startup, so we cannot use it.
So what’s the best solution for it?
Here is a simpler method (hack) that gives you back the setdefaultencoding() function that was deleted from sys:
import sys
# sys.setdefaultencoding() does not exist here!
reload(sys)  # Reload does the trick!
sys.setdefaultencoding('UTF8')
(Note for Python 3.4+: reload() is in the importlib library.)
This is not a safe thing to do, though: this is obviously a hack, since
sys.setdefaultencoding() is purposely removed from
sys when Python starts. Reenabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).
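Before reaching for the hack, it can help to check what the running interpreter actually offers. The following is a minimal sketch, assuming Python 3 for the stated result; the helper name default_encoding_info is hypothetical and not part of the original answer. On Python 3, sys.setdefaultencoding does not exist at all and the default encoding is UTF-8, so the hack is neither possible nor needed there.

```python
import sys


def default_encoding_info():
    """Return (default encoding, whether sys.setdefaultencoding is available)."""
    # hasattr() is False after normal startup, because the function is removed
    # (Python 2) or was never there (Python 3).
    return sys.getdefaultencoding(), hasattr(sys, "setdefaultencoding")


if __name__ == "__main__":
    encoding, can_change = default_encoding_info()
    print("default encoding: %s" % encoding)
    print("setdefaultencoding available: %s" % can_change)
```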
If you get this error when you try to pipe or redirect the output of your script:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
just export PYTHONIOENCODING (for example, set to utf-8) in the console and then run your code.
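The effect of PYTHONIOENCODING on piped output can be observed from Python itself. This is a rough sketch, assuming Python 3; the helper name child_stdout_encoding is made up for illustration. It starts a child interpreter whose stdout is a pipe (the pipe/redirect situation above) and returns the encoding the child reports.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # check_output() connects the child's stdout to a pipe, not a terminal.
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # Normalize spellings such as 'utf8' or 'UTF-8' to the canonical codec name.
    return codecs.lookup(out.decode("ascii").strip()).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

The variable only affects stdin, stdout and stderr of the process that sees it; it does not change sys.getdefaultencoding().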
A) To control the output of sys.getdefaultencoding():
python -c 'import sys; print(sys.getdefaultencoding())'
echo "import sys; sys.setdefaultencoding('utf-16-be')" > sitecustomize.py
PYTHONPATH=".:$PYTHONPATH" python -c 'import sys; print(sys.getdefaultencoding())'
You could put your sitecustomize.py higher in your PYTHONPATH.
Also you might like to try reload(sys).setdefaultencoding by @EOL.
B) To control stdin.encoding and stdout.encoding you want to set PYTHONIOENCODING:
python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'
PYTHONIOENCODING="utf-16-be" python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'
Finally: you can use A) or B) or both!
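The mechanism behind A) is that a sitecustomize.py found on the module search path is imported automatically at interpreter startup. Here is a small sketch of that mechanism only, assuming Python 3; the function name and the sys.demo_marker attribute are hypothetical, and the payload deliberately tags the sys module instead of changing any encoding.

```python
import os
import subprocess
import sys
import tempfile


def sitecustomize_marker(marker="set-by-sitecustomize"):
    """Show that a sitecustomize.py found via PYTHONPATH runs at startup."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sitecustomize.py"), "w") as f:
            # Hypothetical payload: leave a marker on the sys module.
            f.write("import sys\nsys.demo_marker = %r\n" % marker)
        # PYTHONPATH entries are searched before the standard library.
        env = dict(os.environ, PYTHONPATH=tmp)
        out = subprocess.check_output(
            [sys.executable, "-c",
             "import sys; print(getattr(sys, 'demo_marker', 'not set'))"],
            env=env,
        )
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print(sitecustomize_marker())
```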
For earlier versions, a solution is to make sure PyDev does not run with UTF-8 as the default encoding. In Eclipse, open the run dialog settings (“run configurations”, if I remember correctly); you can choose the default encoding on the Common tab. Change it to US-ASCII if you want to get these errors ‘early’ (in other words, in your PyDev environment). Also see the original blog post for this workaround.
Regarding python2 (and python2 only), some of the former answers rely on using the following hack:
import sys
reload(sys)  # Reload is a hack
sys.setdefaultencoding('UTF8')
In my case, it comes with a side effect: I’m using IPython notebooks, and once I run the code the print function no longer works. I guess there is a solution to that, but I still think the hack is not the right option.
After trying many options, the one that worked for me was using the same code in the
sitecustomize.py, where that piece of code is meant to be. After evaluating that module, the setdefaultencoding function is removed from sys.
So the solution is to append to file
/usr/lib/python2.7/sitecustomize.py the code:
import sys
sys.setdefaultencoding('UTF8')
When I use virtualenvwrapper the file I edit is
And when I use with python notebooks and conda, it is
There is an insightful blog post about it.
I paraphrase its content below.
In Python 2, which was not as strongly typed regarding the encoding of strings, you could perform operations on differently encoded strings and succeed. E.g. the following would return True:
u'Toshio' == 'Toshio'
That would hold for every (normal, unprefixed) string that was encoded in
sys.getdefaultencoding(), which defaulted to
ascii, but not others.
The default encoding was meant to be changed system-wide in
site.py, but not somewhere else. The hacks (also presented here) to set it in user modules were just that: hacks, not the solution.
Python 3 changed the system encoding to default to utf-8 (when LC_CTYPE is unicode-aware), but the fundamental problem was solved by the requirement to explicitly convert “byte” strings whenever they are used with unicode strings.
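The difference is easy to see in Python 3. The snippet below is an illustrative sketch (the variable and function names are made up): bytes and str are never implicitly converted, so the comparison that returned True in Python 2 is False until the bytes are decoded explicitly.

```python
raw = b"Toshio"   # bytes, e.g. as read from a socket or a binary file
text = "Toshio"   # str (unicode)

# No implicit conversion: bytes and str do not compare equal in Python 3.
same_without_decoding = (raw == text)

# Decoding explicitly makes the comparison meaningful.
same_after_decoding = (raw.decode("ascii") == text)


def concat(text_part, bytes_part):
    """Mixing str and bytes raises TypeError instead of guessing an encoding."""
    try:
        return text_part + bytes_part
    except TypeError:
        return None


if __name__ == "__main__":
    print(same_without_decoding, same_after_decoding, concat("a", b"b"))
```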
Calling reload(sys) and setting some arbitrary default encoding just to satisfy the needs of an output terminal stream is bad practice.
reload often changes things in sys which have been put in place depending on the environment – e.g. sys.stdin/stdout streams, sys.excepthook, etc.
Solving the encode problem on stdout
The best solution I know for solving the encode problem of str’s (e.g. from literals) on sys.stdout is to take care of a sys.stdout (file-like object) which is capable and optionally tolerant regarding the needs:
When its .encoding is None for some reason, or non-existing, or erroneously false or “less” than what the stdout terminal or stream really is capable of, then try to provide a correct .encoding attribute, if necessary by replacing sys.stdout & sys.stderr with a translating file-like object.
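When the terminal / stream still cannot encode all occurring unicode chars, and you don’t want output to break just because of that, the translating file-like object can encode with a replacement error handler such as 'backslashreplace', as the example does.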
Here is an example:
#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            # Avoid double wrapping if sys.stdout is already a SmartStdout.
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'

    def write(self, s):
        # Escape characters the target encoding cannot represent.
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))

    def __getattr__(self, name):
        # Delegate everything else to the original stream.
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()

    us = u'aouäöü??ß²'
    print us
    sys.stdout.flush()
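For comparison, the same "encode with replacement" idea can be sketched in Python 3 with the standard io.TextIOWrapper, which accepts an errors argument. This is not the answer author's code, only an illustration under that assumption; the function name encode_tolerantly is hypothetical, and an in-memory io.BytesIO stands in for the byte stream of a terminal or pipe.

```python
import io


def encode_tolerantly(text, encoding="ascii"):
    """Write text through a wrapper that escapes unencodable characters."""
    raw = io.BytesIO()  # stand-in for the underlying byte stream
    stream = io.TextIOWrapper(
        raw, encoding=encoding, errors="backslashreplace", newline=""
    )
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # Characters outside ASCII come out as backslash escapes instead of
    # raising UnicodeEncodeError.
    print(encode_tolerantly(u"aou\u00e4\u00f6\u00fc"))
```

On Python 3.7 and later, when sys.stdout is a regular text stream, sys.stdout.reconfigure(errors="backslashreplace") should give a similar effect without replacing the object.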
Using beyond-ascii plain string literals in Python 2 / 2 + 3 code
I think the only good reason to change the global default encoding (and only to UTF-8) is an application source code decision, not I/O stream encoding issues: it lets you write beyond-ascii string literals in code without being forced to always use u'string' style unicode escaping. This can be done rather consistently (despite what anonbadger’s article says) by maintaining a Python 2 or Python 2 + 3 source code base which uses ascii or UTF-8 plain string literals consistently, as far as those strings potentially undergo silent unicode conversion and move between modules or potentially go to stdout. For that, prefer “# encoding: utf-8” or ascii (no declaration). Change or drop libraries which still fatally rely on ascii default encoding errors beyond chr #127 (which is rare today).
Then do the following at application start (and/or via sitecustomize.py), in addition to the SmartStdout scheme above, without using reload(sys):
...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8')
    s = 'aouäöü??ß²'
    print s
This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were only Python 3.
File I/O of course always needs special care regarding encodings, as it does in Python 3.
Note: plain strings are then implicitly converted from utf-8 to unicode in SmartStdout before being converted to the output stream encoding.
Here is the approach I used to produce code that was compatible with both python2 and python3 and always produced utf8 output. I found this answer elsewhere, but I can’t remember the source.
This approach works by replacing sys.stdout with something that isn’t quite file-like (but still only using things in the standard library). This may well cause problems for your underlying libraries, but in the simple case where you have good control over how sys.stdout is used through your framework this can be a reasonable approach.
import io
import sys

sys.stdout = io.open(sys.stdout.fileno(), 'w', encoding='utf8')
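To see what this re-opening does without touching the real stdout, here is a hedged sketch that applies the same io.open call to a pipe. The function name utf8_bytes_via_fd is hypothetical, and the write end of os.pipe() stands in for sys.stdout.fileno().

```python
import io
import os


def utf8_bytes_via_fd(text):
    """Open a raw file descriptor as a UTF-8 text stream; return the bytes written."""
    read_fd, write_fd = os.pipe()  # write_fd plays the role of sys.stdout.fileno()
    # newline="" disables newline translation so the bytes are predictable.
    with io.open(write_fd, "w", encoding="utf8", newline="") as out:
        out.write(text)
    with io.open(read_fd, "rb") as src:
        return src.read()


if __name__ == "__main__":
    print(repr(utf8_bytes_via_fd(u"\u00e4\u00f6\u00fc")))
```

When doing this with the real stdout, passing closefd=False to io.open may be worth considering, so that closing or garbage-collecting the new wrapper does not close file descriptor 1.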