Read Multiple Lines From A File Batch By Batch
Solution 1:
itertools.islice and two-argument iter can be used to accomplish this, but it's a little funny:
    from itertools import islice

    n = 5  # Or whatever chunk size you want
    with open(filename, 'rb') as f:
        for n_lines in iter(lambda: tuple(islice(f, n)), ()):
            process(n_lines)
This will keep islice-ing off n lines at a time (using tuple to actually force the whole chunk to be read in) until f is exhausted, at which point it will stop. The final chunk will have fewer than n lines if the number of lines in the file isn't an even multiple of n. If you want all the lines to be a single string, change the for loop to:
    # The b prefixes are ignored on 2.7, and necessary on 3.x since you opened
    # the file in binary mode
    for n_lines in iter(lambda: b''.join(islice(f, n)), b''):
        process(n_lines)
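To make the sentinel mechanics concrete, here is a minimal self-contained sketch; the io.BytesIO object standing in for a five-line binary file is an assumption of this demo, not part of the original recipe:

    import io
    from itertools import islice

    f = io.BytesIO(b'a\nb\nc\nd\ne\n')  # simulated binary file
    for chunk in iter(lambda: tuple(islice(f, 2)), ()):
        print(chunk)
    # (b'a\n', b'b\n')
    # (b'c\n', b'd\n')
    # (b'e\n',)  <- short final chunk; the next tuple is (), which matches
    #             the sentinel and ends the loop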
Another approach is to use izip_longest for the purpose, which avoids lambda functions:
    from future_builtins import map  # Only on Py2
    from itertools import izip_longest  # zip_longest on Py3

    # gets tuples possibly padded with empty strings at end of file
    for n_lines in izip_longest(*[f]*n, fillvalue=b''):
        process(n_lines)

    # Or to combine into a single string:
    for n_lines in map(b''.join, izip_longest(*[f]*n, fillvalue=b'')):
        process(n_lines)
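The *[f]*n idiom works because [f]*n builds a list of n references to the same file iterator, so izip_longest pulls n consecutive lines for each tuple it yields. For completeness, a Python 3 sketch of the same pattern, assuming the same filename, n, and process as above:

    from itertools import zip_longest  # Python 3 name for izip_longest

    with open(filename, 'rb') as f:
        # n aliases of one iterator: each tuple consumes n consecutive lines,
        # and the final tuple is padded with b'' at end of file
        for n_lines in zip_longest(*[f] * n, fillvalue=b''):
            process(n_lines)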
Solution 2:
You can actually just iterate over the lines of a file (see the file.next docs; this also works on Python 3), like:
    with open(filename) as f:
        for line in f:
            something(line)
so your code can be rewritten as:
    n = 5  # your batch size

    with open(filename) as f:
        batch = []
        for line in f:
            batch.append(line)
            if len(batch) == n:
                process(batch)
                batch = []
        process(batch)  # this batch might be smaller or even empty
but normally just processing line by line is more convenient (as in the first example).
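If you want to reuse the fixed-size batching above, one natural refactor (a sketch, not part of the original answer) is to pull it into a generator; the if batch guard also avoids handing an empty final batch to process:

    def batches(f, n):
        """Yield lists of up to n lines from the open file f."""
        batch = []
        for line in f:
            batch.append(line)
            if len(batch) == n:
                yield batch
                batch = []
        if batch:  # leftover lines, fewer than n
            yield batch

    with open(filename) as f:
        for batch in batches(f, 5):
            process(batch)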
If you don't care exactly how many lines are read for each batch, only that each batch doesn't use too much memory, then use file.readlines with a sizehint, like:
    size_hint = 1 << 24  # 16 MiB

    with open(filename) as f:
        while True:
            batch = f.readlines(size_hint)
            if not batch:  # readlines returns an empty list at EOF
                break
            # each batch holds whole lines totalling roughly size_hint bytes
            process(batch)
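Finally, on Python 3.12 and later, itertools.batched provides the fixed-size chunking from Solution 1 as a single call; a minimal sketch reusing the same hypothetical filename and process:

    from itertools import batched  # added in Python 3.12

    with open(filename, 'rb') as f:
        for n_lines in batched(f, 5):  # tuples of up to 5 lines each
            process(n_lines)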