How To Bypass Memory Error When Replacing A String In A Large Txt File?
Solution 1:
Try the following code:
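# fileinpath and fileoutpath are the input and output file paths
chunk_size = 5000  # maximum number of lines held in memory before writing
buffer = ""
i = 0
with open(fileoutpath, 'a') as fout:
    with open(fileinpath, 'r') as fin:
        for line in fin:
            buffer += line.replace('| |', '|')
            i += 1
            if i == chunk_size:
                # Buffer is full: write it out and start over
                fout.write(buffer)
                i = 0
                buffer = ""
    # Write whatever is left after the last full chunk
    if buffer:
        fout.write(buffer)
        i = 0
        buffer = ""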
This code reads one line at a time into memory. It stores the results in a buffer, which holds at most chunk_size lines at a time; once that limit is reached, it writes the buffer to the file and clears it. This continues until the end of the file. After the reading loop, if the buffer still contains lines, they are written to disk.
In this way, you control not only the number of lines held in memory but also the number of writes. Writing to the file every time you read a line may not be a good idea, and neither is a chunk_size that is too large, since it keeps more data in memory. It's up to you to find a chunk_size value that fits your problem.
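To try out different chunk_size values, the same loop can be wrapped in a function. This is a minimal sketch: the name replace_in_chunks and its parameters are hypothetical, it opens the output in 'w' mode rather than 'a', and it collects lines in a list instead of concatenating strings. It returns the number of write calls so the effect of chunk_size is visible.

```python
def replace_in_chunks(src_path, dst_path, old="| |", new="|", chunk_size=5000):
    """Copy src_path to dst_path, replacing old with new.

    Lines are written in groups of at most chunk_size.
    Returns the number of write calls made (hypothetical helper).
    """
    writes = 0
    buffer = []
    with open(src_path, "r") as fin, open(dst_path, "w") as fout:
        for line in fin:
            buffer.append(line.replace(old, new))
            if len(buffer) == chunk_size:
                # Buffer is full: flush it to the output file
                fout.writelines(buffer)
                buffer.clear()
                writes += 1
        if buffer:
            # Write the remaining lines after the last full chunk
            fout.writelines(buffer)
            writes += 1
    return writes
```

Calling it with a smaller chunk_size lowers peak memory use at the cost of more write calls, and a larger value does the opposite.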
Note: You can also use the buffering parameter of open() to get a similar result. See the documentation for open() for details. The logic is very similar.
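As a rough illustration of that note, the sketch below lets open() do the buffering instead of a manual counter. The function name replace_with_buffering and the 1 MiB default are assumptions, not values from the answer. In text mode, a buffering value greater than 1 sets the size in bytes of the underlying buffer, so the limit is expressed in bytes rather than lines.

```python
def replace_with_buffering(src_path, dst_path, old="| |", new="|",
                           buffer_bytes=1024 * 1024):
    """Stream src_path to dst_path line by line, replacing old with new.

    buffer_bytes (an assumed default of 1 MiB) is passed to open() as the
    buffering argument, so Python batches the writes internally.
    """
    with open(src_path, "r") as fin, \
         open(dst_path, "w", buffering=buffer_bytes) as fout:
        for line in fin:
            fout.write(line.replace(old, new))
```

Only one line is held by the loop at a time; the file object decides when the buffered data is actually flushed.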
Solution 2:
Try reading the file line by line instead of loading it as one giant chunk, for example:
with open(writefilepath, "w", errors='ignore') as filew:
    with open(readfilepath, "r", errors='ignore') as filer:
        # enumerate supplies the line counter used in the progress message
        for cnt, line in enumerate(filer, start=1):
            print("Line {}: {}".format(cnt, line.strip()))
            line = line.replace('| |', '|')
            filew.write(line)