Is there a Python library that allows manipulation of zip archives in memory, without having to use actual disk files?
The ZipFile library does not allow you to update the archive. The only way seems to be to extract it to a directory, make your changes, and create a new zip from that directory. I want to modify zip archives without disk access, because I’ll be downloading them, making changes, and uploading them again, so I have no reason to store them.
Something similar to Java’s ZipInputStream/ZipOutputStream would do the trick, although any interface at all that avoids disk access would be fine.
According to the Python docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object.
So, to open the file in memory, just create a file-like object (perhaps using BytesIO).
    import io
    import zipfile

    # my_zip_data holds the raw bytes of the archive
    file_like_object = io.BytesIO(my_zip_data)
    zipfile_ob = zipfile.ZipFile(file_like_object)
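For the download, modify, upload case in the question, here is a minimal sketch that assumes the whole archive fits in memory. The `zipfile` module has no call to replace a member in place, so the sketch copies the unchanged members into a second in-memory archive and writes the new contents alongside them. The names `rewrite_zip_in_memory` and `replacements` are hypothetical, chosen only for illustration.

```python
import io
import zipfile


def rewrite_zip_in_memory(zip_bytes, replacements):
    """Return new ZIP bytes with some members replaced or added.

    zip_bytes: the original archive as bytes (e.g. a downloaded payload).
    replacements: dict mapping member name -> new contents (bytes).
    Both names are hypothetical helpers, not part of the zipfile module.
    """
    source = io.BytesIO(zip_bytes)
    target = io.BytesIO()
    with zipfile.ZipFile(source, "r") as zin, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename in replacements:
                continue  # skipped here; the new version is written below
            # Copy unchanged members, reusing their metadata.
            zout.writestr(info, zin.read(info.filename))
        for name, data in replacements.items():
            zout.writestr(name, data)
    # Both archives are closed here, so the target buffer is complete.
    return target.getvalue()


# Build a small archive in memory to stand in for downloaded data.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("a.txt", b"old a")
    zf.writestr("b.txt", b"keep b")

updated = rewrite_zip_in_memory(original.getvalue(), {"a.txt": b"new a"})

with zipfile.ZipFile(io.BytesIO(updated)) as zf:
    names = sorted(zf.namelist())
    a_contents = zf.read("a.txt")
    b_contents = zf.read("b.txt")
```

The returned bytes can be handed straight to whatever upload call you use; nothing touches the disk.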
From the article In-Memory Zip in Python:
Below is a post of mine from May of 2008 on zipping in memory with Python, re-posted since Posterous is shutting down.
I recently noticed that there is a for-pay component available to zip files in-memory with Python. Considering this is something that should be free, I threw together the following code. It has only gone through very basic testing, so if anyone finds any errors, let me know and I’ll update this.
    import zipfile
    import StringIO

    class InMemoryZip(object):
        def __init__(self):
            # Create the in-memory file-like object
            self.in_memory_zip = StringIO.StringIO()

        def append(self, filename_in_zip, file_contents):
            '''Appends a file with name filename_in_zip and contents of
            file_contents to the in-memory zip.'''
            # Get a handle to the in-memory zip in append mode
            zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)

            # Write the file to the in-memory zip
            zf.writestr(filename_in_zip, file_contents)

            # Mark the files as having been created on Windows so that
            # Unix permissions are not inferred as 0000
            for zfile in zf.filelist:
                zfile.create_system = 0

            return self

        def read(self):
            '''Returns a string with the contents of the in-memory zip.'''
            self.in_memory_zip.seek(0)
            return self.in_memory_zip.read()

        def writetofile(self, filename):
            '''Writes the in-memory zip to a file.'''
            f = file(filename, "w")
            f.write(self.read())
            f.close()

    if __name__ == "__main__":
        # Run a test
        imz = InMemoryZip()
        imz.append("test.txt", "Another test").append("test2.txt", "Still another")
        imz.writetofile("test.zip")
    import io
    import zipfile

    zip_buffer = io.BytesIO()

    with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
        for file_name, data in [('1.txt', io.BytesIO(b'111')),
                                ('2.txt', io.BytesIO(b'222'))]:
            zip_file.writestr(file_name, data.getvalue())

    with open('C:/1.zip', 'wb') as f:
        f.write(zip_buffer.getvalue())
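One detail behind the `with` block above: the buffer only holds a complete archive once the `ZipFile` has been closed, because the central directory is written on close. The following sketch checks this with `zipfile.is_zipfile`; the variable names are only illustrative.

```python
import io
import zipfile

zip_buffer = io.BytesIO()
zip_file = zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED)
zip_file.writestr("1.txt", b"111")

# Before close(): the member data is in the buffer, but the central
# directory has not been written yet, so this is not a valid archive.
valid_before_close = zipfile.is_zipfile(io.BytesIO(zip_buffer.getvalue()))

zip_file.close()

# After close(): the buffer contains the finished archive.
valid_after_close = zipfile.is_zipfile(io.BytesIO(zip_buffer.getvalue()))

# Read the member back from the bytes, still without touching the disk.
with zipfile.ZipFile(io.BytesIO(zip_buffer.getvalue())) as check:
    round_trip = check.read("1.txt")
```

So call `getvalue()` only after leaving the `with` block (or after an explicit `close()`), as the snippet above does.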
The example Ethier provided has several problems, some of them major:
- doesn’t work for real data on Windows. A ZIP file is binary and its data should always be written with a file opened ‘wb’
- the ZIP file is reopened in append mode for each file, which is inefficient. It can just be opened once and kept as an attribute of the InMemoryZip instance
- the documentation states that ZIP files should be closed explicitly, this is not done in the append function (it probably works (for the example) because zf goes out of scope and that closes the ZIP file)
- the create_system flag is set for all the files in the zipfile every time a file is appended instead of just once per file.
- on Python < 3 cStringIO is much more efficient than StringIO
- doesn’t work on Python 3 (the original article was from before the 3.0 release, but by the time the code was posted 3.1 had been out for a long time).
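As an illustration of the points in the list above, here is a minimal Python 3 sketch of how the class could be reworked. It is my own hypothetical rewrite under those assumptions, not the code from ruamel.std.zipfile and not the original author's; the `close` method and `data` property are names I chose for the example.

```python
import io
import zipfile


class InMemoryZip:
    """Hypothetical Python 3 rework of the class above (illustration only)."""

    def __init__(self, compression=zipfile.ZIP_DEFLATED):
        self._buffer = io.BytesIO()
        # Open the archive once and keep it as an attribute.
        self._zip = zipfile.ZipFile(self._buffer, "w", compression)

    def append(self, filename_in_zip, file_contents):
        """Add one member and return self so calls can be chained."""
        self._zip.writestr(filename_in_zip, file_contents)
        # Set create_system only on the entry that was just added.
        self._zip.filelist[-1].create_system = 0
        return self

    def close(self):
        """Write the central directory; calling it again is a no-op."""
        self._zip.close()

    @property
    def data(self):
        """Close the archive and return it as bytes."""
        self.close()
        return self._buffer.getvalue()

    def writetofile(self, filename):
        """Write the archive to disk in binary mode."""
        with open(filename, "wb") as f:
            f.write(self.data)


imz = InMemoryZip()
imz.append("test.txt", "Another test").append("test2.txt", "Still another")
payload = imz.data  # bytes, ready to upload or store

with zipfile.ZipFile(io.BytesIO(payload)) as zf:
    names = zf.namelist()
    first = zf.read("test.txt")
    system_flag = zf.getinfo("test2.txt").create_system
```

Once `data` has been read the archive is closed, so further `append` calls would fail; a fuller version would need to decide how to handle that.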
An updated version is available if you install ruamel.std.zipfile (of which I am the author). After
pip install ruamel.std.zipfile
or including the code for the class from here, you can do:
    import ruamel.std.zipfile as zipfile

    # Run a test
    imz = zipfile.InMemoryZipFile()
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")
    imz.writetofile("test.zip")
You can alternatively write the contents using imz.data to any place you need.
You can also use the with statement, and if you provide a filename, the contents of the ZIP will be written on leaving that context:
    with zipfile.InMemoryZipFile('test.zip') as imz:
        imz.append("test.txt", "Another test").append("test2.txt", "Still another")
Because the write to disk is delayed until the context exits, you can still read from an old test.zip within that context.