I’m working with some CSV files, with the following code:
    import csv
    import sys

    reader = csv.reader(open(filepath, "rU"))
    try:
        for row in reader:
            print 'Row read successfully!', row
    except csv.Error, e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
And one file is throwing this error:
file my.csv, line 1: line contains NULL byte
What can I do? Google seems to suggest that it may be an Excel file that’s been saved as a .csv improperly. Is there any way I can get round this problem in Python?
== UPDATE ==
Following @JohnMachin’s comment below, I tried adding these lines to my script:
    print repr(open(filepath, 'rb').read(200)) # dump 1st 200 bytes of file
    data = open(filepath, 'rb').read()
    print data.find('\x00')
    print data.count('\x00')
And this is the output I got:
    '\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00 .... <snip>
    8
    13834
So the file does indeed contain NUL bytes.
As @S.Lott says, you should be opening your files in ‘rb’ mode, not ‘rU’ mode. However, that may NOT be causing your current problem. As far as I know, using ‘rU’ mode would mess you up if there are embedded \r in the data, but not cause any other dramas. I also note that you have several files (all opened with ‘rU’ ??) but only one is causing a problem.
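In Python 3 the same advice takes a different shape: `csv.reader` needs text rather than bytes, so ‘rb’ no longer works, and the csv documentation asks for `open(..., newline='')` instead, while the default `newline=None` behaves much like ‘rU’. A minimal Python 3 sketch of the embedded-\r problem described above; `read_rows` is a hypothetical helper and the one-record sample file is invented for the demonstration:

```python
import csv
import os
import tempfile


def read_rows(path, newline):
    """Parse the CSV file at `path`, opened in text mode with the given newline setting."""
    with open(path, newline=newline, encoding="ascii") as f:
        return list(csv.reader(f))


# One record whose second, quoted field contains a bare carriage return.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b'a,"x\ry"\r\n')

try:
    # newline='' passes line endings through untranslated, so the \r survives.
    kept = read_rows(path, newline="")
    # newline=None (the default, similar to Python 2's 'rU') rewrites \r as \n.
    translated = read_rows(path, newline=None)
finally:
    os.remove(path)

print(kept)        # [['a', 'x\ry']]
print(translated)  # [['a', 'x\ny']]
```

Nothing fails loudly in the second case; the field content is silently altered, which is why the mode matters even when no exception is raised.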
If the csv module says that you have a “NULL” (silly message, should be “NUL”) byte in your file, then you need to check out what is in your file. I would suggest that you do this even if using ‘rb’ makes the problem go away.
repr() is (or wants to be) your debugging friend. It will show unambiguously what you’ve got, in a platform-independent fashion (which is helpful to helpers who are unaware what od is or does). Do this:
    print repr(open('my.csv', 'rb').read(200)) # dump 1st 200 bytes of file
and carefully copy/paste (don’t retype) the result into an edit of your question (not into a comment).
Also note that if the file is really dodgy, e.g. no \r or \n within a reasonable distance from the start of the file, the line number reported by reader.line_num will be (unhelpfully) 1. Find where the first \x00 is (if any) by doing
    data = open('my.csv', 'rb').read()
    print data.find('\x00')
and make sure that you dump at least that many bytes with repr or od.
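Locating the first NUL and dumping at least that far can be folded into one step. A minimal Python 3 sketch, where `dump_through_first_nul` and the 200-byte minimum are illustrative choices rather than anything the csv module provides; it works on the bytes you would get from `open('my.csv', 'rb').read()`:

```python
def dump_through_first_nul(data, minimum=200):
    """Return (index of the first NUL or -1, repr of a prefix that includes it)."""
    first = data.find(b"\x00")     # -1 when there is no NUL byte at all
    end = max(minimum, first + 1)  # dump `minimum` bytes, or further to reach the NUL
    return first, repr(data[:end])


# In practice: data = open("my.csv", "rb").read()
sample = b"x" * 300 + b"\x00tail"
first, dump = dump_through_first_nul(sample)
print(first)                    # 300
print(dump.endswith("\\x00'"))  # True
```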
What does data.count('\x00') tell you? If there are many, you may want to do something like
    for i, c in enumerate(data):
        if c == '\x00':
            # max(0, ...) stops a NUL in the first 30 bytes from producing a negative slice start
            print i, repr(data[max(0, i-30):i]) + ' *NUL* ' + repr(data[i+1:i+31])
so that you can see the NUL bytes in context.
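For a file with thousands of NULs, jumping from one NUL to the next with `bytes.find` avoids visiting every byte. A minimal Python 3 sketch of the same context dump; `nul_contexts`, the `width` parameter and the tiny sample are hypothetical, chosen only to illustrate:

```python
def nul_contexts(data, width=30):
    """Yield (offset, bytes before, bytes after) for every NUL byte in `data`."""
    i = data.find(b"\x00")
    while i != -1:
        before = data[max(0, i - width):i]  # clamp so early NULs don't wrap around
        after = data[i + 1:i + 1 + width]
        yield i, before, after
        i = data.find(b"\x00", i + 1)


# In practice: data = open("my.csv", "rb").read()
sample = b"ab\x00cd\x00"
print(sample.count(b"\x00"))  # 2
for offset, before, after in nul_contexts(sample, width=1):
    print(offset, repr(before) + " *NUL* " + repr(after))
# 2 b'b' *NUL* b'c'
# 5 b'd' *NUL* b''
```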
If you can see \x00 in the output (or