Extract Zip To Memory, Parse Contents

I want to read the contents of a zip file into memory rather than extracting them to disc, find a particular file in the archive, open the file and extract a line from it. Can a St

Solution 1:

IMO, just using read() is enough:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')
files = []
for name in zfile.namelist():
  if fnmatch.fnmatch(name, '*_readme.xml'):
    files.append(zfile.read(name))  # read() returns the member's contents as bytes

This builds a list with the contents of every file that matches the pattern.

You can then parse the contents afterwards by iterating through the list:

for file in files:
  print(file[0:min(35,len(file))].decode()) # "parsing"

Or, better, use a function that parses each file as soon as it is read:

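import zipfile  # not aliased to `zip`, which would shadow the built-in
import sys
import fnmatch

zip_name = sys.argv[1]  # path to the archive, taken from the command line
zfile = zipfile.ZipFile(zip_name, 'r')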

def parse(contents, member_name = ""):
  if len(member_name) > 0:
    print( "Parsed `{}`:".format(member_name) )  
  print(contents[0:min(35, len(contents))].decode()) # "parsing"

for name in zfile.namelist():
  if fnmatch.fnmatch(name, '*.cpp'):
    parse(zfile.read(name), name)

This way no file contents are kept in memory longer than needed, so the memory footprint is smaller. That can matter if the files are big.
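The same idea can be packaged as a generator, so that only one member's contents are handed out at a time. This is a minimal sketch, assuming Python 3; `iter_matching` and `build_demo_archive` are hypothetical helper names, not part of `zipfile`, and the demo archive is built in memory purely for illustration.

```python
import fnmatch
import io
import zipfile


def iter_matching(zip_source, pattern):
    """Yield (member_name, contents_as_bytes) for members matching pattern.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source, 'r') as zfile:
        for name in zfile.namelist():
            if fnmatch.fnmatch(name, pattern):
                # Only this member's bytes are produced before the caller moves on.
                yield name, zfile.read(name)


def build_demo_archive():
    """Build a small zip archive in memory (made-up names and contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('a_readme.xml', '<foo>alpha</foo>\n')
        zf.writestr('notes.txt', 'not matched\n')
        zf.writestr('b_readme.xml', '<foo>beta</foo>\n')
    buf.seek(0)
    return buf


if __name__ == '__main__':
    for name, contents in iter_matching(build_demo_archive(), '*_readme.xml'):
        print(name, contents[:35].decode())
```

Because the function is a generator, the caller decides whether to keep each result or discard it after parsing.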


Solution 2:

Don't overthink it. It Just Works:

import zipfile

# 1) I want to read the contents of a zip file ...
with zipfile.ZipFile('A-Zip-File.zip') as zipper:
  # 2) ... find a particular file in the archive, open the file ...
  with zipper.open('A-Particular-File.txt') as fp:
    # 3) ... and extract a line from it.
    first_line = fp.readline()

print(first_line)  # a bytes object in Python 3; call .decode() to get text
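In Python 3 the object returned by `open()` on a zip member is a binary stream, so `readline()` gives back bytes. If text is wanted, one option is to wrap the member in `io.TextIOWrapper`. A minimal sketch, assuming the member is UTF-8 encoded; `first_text_line` is a hypothetical helper, and the archive here is created in memory only for the demo.

```python
import io
import zipfile


def first_text_line(zip_source, member, encoding='utf-8'):
    """Return the first line of member as str, without its line ending."""
    with zipfile.ZipFile(zip_source) as zipper:
        with zipper.open(member) as fp:
            # TextIOWrapper decodes the binary stream as it is read.
            text = io.TextIOWrapper(fp, encoding=encoding)
            return text.readline().rstrip('\r\n')


# Demo archive built in memory; the contents are made up.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('A-Particular-File.txt', 'first line\nsecond line\n')

print(first_text_line(archive, 'A-Particular-File.txt'))
```

Passing a file-like object instead of a path also means the archive itself never has to touch disk.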

Solution 3:

The question you link shows you that you need to read the file. Depending on your use case that may already be enough. In your code you replace the loop variable holding a filename with an empty string buffer. Try something like this:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        ex_file = zfile.open(name) # this is a file-like object
        content = ex_file.read() # the file contents as one string (bytes in Python 3)

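If you really want a buffer that you can manipulate, then simply instantiate it with the contents:

buf = BytesIO(zfile.open(name).read())  # from io import BytesIO; in Python 2, StringIO.StringIO also works

Note that there are differences between Python 2 and 3 here: in Python 3, read() returns bytes, so io.BytesIO is the matching buffer type, and io.StringIO only accepts the contents after they have been decoded to text.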
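To make the Python 3 distinction concrete, the sketch below puts the same member into both buffer types. It assumes UTF-8 content; `member_buffers` is a hypothetical helper name, and the archive is built in memory with made-up contents.

```python
import io
import zipfile


def member_buffers(zip_source, member, encoding='utf-8'):
    """Return (BytesIO, StringIO) buffers holding the same member contents."""
    with zipfile.ZipFile(zip_source) as zfile:
        raw = zfile.read(member)  # bytes in Python 3
    byte_buf = io.BytesIO(raw)                     # binary buffer takes bytes
    text_buf = io.StringIO(raw.decode(encoding))   # text buffer needs str
    return byte_buf, text_buf


# Demo archive built in memory; the file name and contents are made up.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('x_readme.xml', '<foo>bar</foo>\nsecond\n')

byte_buf, text_buf = member_buffers(archive, 'x_readme.xml')
print(byte_buf.readline())
print(text_buf.readline())
```

Both buffers support the usual file methods such as `readline()` and `seek()`, so either can be handed to code that expects an open file.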


Solution 4:

Thank you to everyone that contributed solutions. This is what ended up working for me:

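import fnmatch
import re
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        zopen = zfile.open(name)
        for line in zopen:
            # lines are bytes in Python 3, so match with a bytes pattern
            if re.match(b'(.*)<foo>(.*)</foo>(.*)', line):
                print(line.decode())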
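If only the text between the tags is needed rather than the whole line, a capture group can pull it out. This is a sketch under a few assumptions: Python 3, UTF-8 content, and at most one `<foo>` element per line. `find_foo_values` is a hypothetical helper, and the demo archive is built in memory with made-up contents. For anything beyond simple line matching, an XML parser such as `xml.etree.ElementTree` would likely be more robust than a regex.

```python
import fnmatch
import io
import re
import zipfile

# Bytes pattern, because zip members are read as bytes in Python 3.
FOO_RE = re.compile(rb'<foo>(.*?)</foo>')


def find_foo_values(zip_source, pattern='*_readme.xml', encoding='utf-8'):
    """Return the text inside <foo>...</foo> for each matching line."""
    values = []
    with zipfile.ZipFile(zip_source, 'r') as zfile:
        for name in zfile.namelist():
            if not fnmatch.fnmatch(name, pattern):
                continue
            with zfile.open(name) as zopen:
                for line in zopen:
                    match = FOO_RE.search(line)
                    if match:
                        values.append(match.group(1).decode(encoding))
    return values


# Demo archive built in memory; names and contents are illustrative only.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('a_readme.xml', '<root>\n  <foo>hello</foo>\n</root>\n')
    zf.writestr('other.txt', '<foo>ignored</foo>\n')

print(find_foo_values(archive))
```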
