Solving problem is about exposing yourself to as many situations as possible like UnicodeEncodeError: ‘charmap’ codec can’t encode characters and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about UnicodeEncodeError: ‘charmap’ codec can’t encode characters, which can be followed any time. Take easy to follow this discuss.
I’m trying to scrape a website, but it gives me an error.
I’m using the following code:
import urllib.request
from bs4 import BeautifulSoup
get = urllib.request.urlopen("https://www.website.com/")
html = get.read()
soup = BeautifulSoup(html)
print(soup)
And I’m getting the following error:
File "C:Python34libencodingscp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>
What can I do to fix this?
Answer #1:
I was getting the same UnicodeEncodeError
when saving scraped web content to a file. To fix it I replaced this code:
with open(fname, "w") as f:
f.write(html)
with this:
import io
with io.open(fname, "w", encoding="utf-8") as f:
f.write(html)
Using io
gives you backward compatibility with Python 2.
If you only need to support Python 3 you can use the builtin open
function instead:
with open(fname, "w", encoding="utf-8") as f:
f.write(html)
Answer #2:
I fixed it by adding .encode("utf-8")
to soup
.
That means that print(soup)
becomes print(soup.encode("utf-8"))
.
Answer #3:
In Python 3.7, and running Windows 10 this worked (I am not sure whether it will work on other platforms and/or other versions of Python)
Replacing this line:
with open('filename', 'w') as f:
With this:
with open('filename', 'w', encoding='utf-8') as f:
The reason why it is working is because the encoding is changed to UTF-8 when using the file, so characters in UTF-8 are able to be converted to text, instead of returning an error when it encounters a UTF-8 character that is not suppord by the current encoding.
Answer #4:
While saving the response of get request, same error was thrown on Python 3.7 on window 10. The response received from the URL, encoding was UTF-8 so it is always recommended to check the encoding so same can be passed to avoid such trivial issue as it really kills lots of time in production
import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open ('NiftyList.txt', 'w') as f:
f.write(resp.text)
When I added encoding=”utf-8″ with the open command it saved the file with the correct response
with open ('NiftyList.txt', 'w', encoding="utf-8") as f:
f.write(resp.text)
Answer #5:
Even I faced the same issue with the encoding that occurs when you try to print it, read/write it or open it. As others mentioned above adding .encoding=”utf-8″ will help if you are trying to print it.
soup.encode(“utf-8”)
If you are trying to open scraped data and maybe write it into a file, then open the file with (……,encoding=”utf-8″)
with open(filename_csv , ‘w’, newline=”,encoding=”utf-8″) as csv_file:
Answer #6:
set PYTHONIOENCODING=utf-8
set PYTHONLEGACYWINDOWSSTDIO=utf-8
You may or may not need to set that second environment variable PYTHONLEGACYWINDOWSSTDIO
.
Alternatively, this can be done in code (although it seems that doing it through env vars is recommended):
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
Additionally: Reproducing this error was a bit of a pain, so leaving this here too in case you need to reproduce it on your machine:
set PYTHONIOENCODING=windows-1252
set PYTHONLEGACYWINDOWSSTDIO=windows-1252
Answer #7:
For those still getting this error, adding encode("utf-8")
to soup
will also fix this.
soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
print(soup)
Answer #8:
if you are using windows try to pass encoding=’latin1′, encoding=’iso-8859-1′ or encoding=’cp1252′
example:
csv_data = pd.read_csv(csvpath,encoding='iso-8859-1')
print(print(soup.encode('iso-8859-1')))