
Read Multiple Lines From A File Batch By Batch

I would like to know whether there is a way to read multiple lines from a file batch by batch. For example: with open(filename, 'rb') as f: for n_lines in f: process(n

Solution 1:

itertools.islice and the two-argument form of iter can be used to accomplish this, though it looks a little odd:

from itertools import islice

n = 5  # Or whatever chunk size you want
with open(filename, 'rb') as f:
    for n_lines in iter(lambda: tuple(islice(f, n)), ()):
        process(n_lines)

This keeps slicing off n lines at a time with islice (tuple forces the whole chunk to be read in) until f is exhausted, at which point the empty tuple matches the sentinel and the loop stops. The final chunk will have fewer than n lines if the number of lines in the file isn't an exact multiple of n. If you want each chunk as a single string, change the for loop to:
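As a minimal sketch for Python 3, the tuple-based version can be wrapped in a small generator-style helper. The name `iter_batches` is hypothetical (it is not part of itertools), and `io.BytesIO` stands in for a real file opened in binary mode:

```python
import io
from itertools import islice


def iter_batches(lines, n):
    """Return an iterator of tuples holding up to n items each (hypothetical helper)."""
    it = iter(lines)  # make sure every islice call consumes the same iterator
    # Two-argument iter() keeps calling the lambda until it returns the sentinel ().
    return iter(lambda: tuple(islice(it, n)), ())


# io.BytesIO stands in for a file opened with open(filename, 'rb').
f = io.BytesIO(b"a\nb\nc\nd\ne\nf\ng\n")
batches = list(iter_batches(f, 3))
print(batches)
```

With seven lines and n = 3, the last tuple holds only the single leftover line.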

# The b prefixes are ignored on 2.7, and necessary on 3.x since you opened
# the file in binary mode
for n_lines in iter(lambda: b''.join(islice(f, n)), b''):

Another approach is to use izip_longest for the purpose, which avoids lambda functions:

from future_builtins import map  # Only on Py2
from itertools import izip_longest  # zip_longest on Py3

# gets tuples possibly padded with empty strings at end of file
for n_lines in izip_longest(*[f]*n, fillvalue=b''):

# Or to combine into a single string:
for n_lines in map(b''.join, izip_longest(*[f]*n, fillvalue=b'')):
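To make the padding behaviour concrete, here is a small Python 3 sketch using `zip_longest` (the Py3 name of `izip_longest`). The sample data and the `io.BytesIO` stand-in for a binary file are assumptions made for illustration:

```python
import io
from itertools import zip_longest

n = 3

# io.BytesIO stands in for a file opened with open(filename, 'rb').
f = io.BytesIO(b"a\nb\nc\nd\n")
# [f] * n repeats the *same* iterator n times, so each tuple pulls
# n consecutive lines; the last tuple is padded with fillvalue.
padded = list(zip_longest(*[f] * n, fillvalue=b''))

# Joining each tuple makes the padding disappear, since b'' adds nothing.
f = io.BytesIO(b"a\nb\nc\nd\n")
joined = [b''.join(chunk) for chunk in zip_longest(*[f] * n, fillvalue=b'')]

print(padded)
print(joined)
```

Note that the tuple form always yields exactly n items per batch, so code that counts lines needs to ignore the empty padding values.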

Solution 2:

You can iterate directly over the lines of a file (see the file.next docs; this also works on Python 3), like this:

with open(filename) as f:
    for line in f:
        something(line)

so your code can be rewritten to

n = 5  # your batch size
with open(filename) as f:
    batch = []
    for line in f:
        batch.append(line)
        if len(batch) == n:
            process(batch)
            batch = []
process(batch)  # this batch might be smaller or even empty

Normally, though, just processing line by line is more convenient (the first example).
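If the batching is needed in several places, the loop above could be sketched as a generator. The name `batch_lines` is hypothetical, and unlike the loop above this version skips an empty final batch instead of passing it on; `io.StringIO` stands in for a file opened in text mode:

```python
import io


def batch_lines(lines, n):
    """Yield lists of up to n lines from an iterable of lines (hypothetical helper)."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:  # emit the leftover lines, but never an empty batch
        yield batch


# io.StringIO stands in for a file opened with open(filename).
f = io.StringIO("1\n2\n3\n4\n5\n")
result = list(batch_lines(f, 2))
print(result)
```

The caller then writes `for batch in batch_lines(f, n): process(batch)` and does not need a separate call after the loop.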

If you don't care exactly how many lines are read per batch, only that each batch doesn't use too much memory, use file.readlines with a size hint, like this:

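size_hint = 1 << 24  # 16 MiB (2 << 24 would be 32 MiB)
with open(filename) as f:
    while True:  # "while f" never ends: a file object is always truthy
        lines = f.readlines(size_hint)
        if not lines:  # readlines returns an empty list at end of file
            break
        process(lines)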
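A small self-contained sketch of the size-hint approach follows. The helper name `iter_readlines`, the tiny hint of 20, and the `io.StringIO` stand-in for a real file are all assumptions chosen so the effect is visible on ten short lines:

```python
import io


def iter_readlines(f, size_hint):
    """Yield lists of lines whose total size is roughly size_hint (hypothetical helper)."""
    while True:
        lines = f.readlines(size_hint)
        if not lines:  # an empty list means end of file
            return
        yield lines


# io.StringIO stands in for a file opened with open(filename).
f = io.StringIO("".join("line %d\n" % i for i in range(10)))
batches = list(iter_readlines(f, 20))

for batch in batches:
    print(len(batch), batch)
```

The hint is only approximate: readlines stops once the lines read so far reach roughly that size, so batch lengths vary with line length, but every line is returned exactly once.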
