
How To Bypass Memory Error When Replacing A String In A Large Txt File?

I have several files to iterate through, some of them several million lines long. A single file can exceed 500 MB. I need to prep them by searching and replacing '| |' string wi

Solution 1:

Try the following code:

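chunk_size = 5000  # number of lines to collect before each write
buffer = ""
i = 0
with open(fileoutpath, 'a') as fout:
    with open(fileinpath, 'r') as fin:
        for line in fin:
            buffer += line.replace('| |', '|')
            i += 1
            if i == chunk_size:
                # buffer is full: flush it to disk and start over
                fout.write(buffer)
                i = 0
                buffer = ""
    # write whatever is left after the last full chunk
    if buffer:
        fout.write(buffer)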

This code reads the file into memory one line at a time.

It stores the results in a buffer that holds at most chunk_size lines. Once the buffer is full, it is written to the output file and cleared, and this repeats until the end of the input file. After the reading loop, any lines still in the buffer are written to disk.

In this way you control not only the number of lines held in memory but also the number of disk writes. Writing to the file every time you read a line may not be a good idea, and neither is a chunk_size that is too large. It's up to you to find a chunk_size value that fits your problem.

Note: You can get a similar result with the buffering parameter of open(), which lets Python batch the writes for you. See the documentation of open() for details. The logic is very similar.
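
As a minimal sketch of that note, the version below drops the manual line counter and passes a buffer size to open() instead. The function name replace_in_file, its parameters, and the 1 MiB default are assumptions made for illustration, not part of the original answer.

```python
def replace_in_file(src_path, dst_path, old="| |", new="|",
                    buffer_bytes=1024 * 1024):
    """Copy src_path to dst_path, replacing `old` with `new` on every line.

    buffer_bytes is passed to open() as `buffering`, so Python collects the
    small per-line writes and sends them to disk in larger chunks.
    """
    with open(src_path, "r") as fin, \
            open(dst_path, "w", buffering=buffer_bytes) as fout:
        for line in fin:  # only one line is held in memory at a time
            fout.write(line.replace(old, new))
```

Calling replace_in_file(fileinpath, fileoutpath) would then play the same role as the loop above. Whether a larger or smaller buffer_bytes helps depends on the disk and the file, so it is worth measuring rather than guessing.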

Solution 2:

Try reading the file line by line instead of as one giant chunk, for example:

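with open(writefilepath, "w", errors='ignore') as filew:
    with open(readfilepath, "r", errors='ignore') as filer:
        # enumerate supplies the line counter used in the progress message
        for cnt, line in enumerate(filer, start=1):
            # printing every line is slow on large files; remove once it works
            print("Line {}: {}".format(cnt, line.strip()))
            line = line.replace('| |', '|')
            filew.write(line)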
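
Since the question mentions several files, here is one possible way to apply the same line-by-line replacement to a list of paths. It is only a sketch: the helper names replace_in_place and replace_in_files are hypothetical, and overwriting each file via a temporary file is an assumption, since both answers above write to a separate output path.

```python
import os
import tempfile


def replace_in_place(path, old="| |", new="|"):
    """Rewrite `path` line by line via a temporary file in the same folder."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fout, open(path, "r") as fin:
            for line in fin:  # one line in memory at a time
                fout.write(line.replace(old, new))
        os.replace(tmp_path, path)  # swap the cleaned copy into place
    except BaseException:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise


def replace_in_files(paths, old="| |", new="|"):
    """Apply replace_in_place to every path in `paths`."""
    for path in paths:
        replace_in_place(path, old, new)
```

The original file is only replaced after the whole copy has been written, so an error in the middle of a 500 MB file leaves the input untouched. Note that the new file gets the permissions of the temporary file, which may differ from those of the original.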
