
How To Efficiently Remove The First Line Of A Large File?

This question has already been asked here and here, but none of the solutions worked for me. How do I remove the first line from a large file efficiently in Python 3? I am writing …

Solution 1:

So, this approach is very hacky. It will work well if your lines are all about the same length, with a small standard deviation. The idea is to read a portion of the file into a buffer that is small enough to be memory efficient but large enough that writing from both ends will not mess things up (since the lines are roughly the same length with little variance, we can cross our fingers and pray that it works). We basically keep track of where we are in the file and jump back and forth between a read position and a write position. I use a collections.deque as a buffer because it has favorable append/pop performance at both ends, and we can take advantage of its FIFO nature:

from collections import deque

def efficient_dropfirst(f, dropfirst=1, buffersize=3):
    # NOTE: f must be opened for both reading and writing (e.g. mode 'r+')
    f.seek(0)
    buffer = deque()
    tail_pos = 0  # write position: where the next surviving line goes
    # these next two loops assume the file has many thousands of
    # lines so we can safely drop and buffer the first few...
    for _ in range(dropfirst):
        f.readline()
    for _ in range(buffersize):
        buffer.append(f.readline())
    line = f.readline()
    while line:
        buffer.append(line)
        head_pos = f.tell()  # remember how far we have read
        f.seek(tail_pos)
        tail_pos += f.write(buffer.popleft())  # write the oldest buffered line
        f.seek(head_pos)  # jump back and keep reading
        line = f.readline()
    f.seek(tail_pos)
    # finally, clear out the buffer:
    while buffer:
        f.write(buffer.popleft())
    f.truncate()

Now, let's try this out with a pretend file that behaves nicely:

>>> s = """1. the quick
... 2. brown fox
... 3. jumped over
... 4. the lazy
... 5. black dog.
... 6. Old McDonald's
... 7. Had a farm
... 8. Eeyi Eeeyi Oh
... 9. And on this farm they had a
... 10. duck
... 11. eeeieeeiOH
... """

And finally:

>>> import io
>>> with io.StringIO(s) as f: # we mock a file
...     efficient_dropfirst(f)
...     final = f.getvalue()
...
>>> print(final)
2. brown fox
3. jumped over
4. the lazy
5. black dog.
6. Old McDonald's
7. Had a farm
8. Eeyi Eeeyi Oh
9. And on this farm they had a
10. duck
11. eeeieeeiOH

This should work out OK as long as dropfirst < buffersize by a good bit of "slack", i.e. as long as the write position never catches up with the read position. Since you only want to drop the first line, just keep dropfirst=1, and you can make buffersize=100 or so just to be safe. It will be much more memory efficient than reading "many thousands of lines", and as long as no single line is much bigger than the lines before it, you should be safe. But be warned, this is very rough around the edges.
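For a real file on disk, a minimal usage sketch might look like the following (the path "data.txt" is just a placeholder); the file has to be opened for both reading and writing:

# "data.txt" is a placeholder path; 'r+' gives read/write access without truncating
with open("data.txt", "r+") as f:
    efficient_dropfirst(f, dropfirst=1, buffersize=100)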


Solution 2:

Try this. It uses the third approach you mentioned, but it won't create a new file. (Note that it does read the rest of the file into memory.)

filePath = r"E:\try.txt"

with open(filePath, 'r') as f:
    next(f)              # skip the first (header) line; f.next() is Python 2 only
    file_str = f.read()  # read the rest of the file into memory

with open(filePath, "w") as f:
    f.write(file_str)
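If the remainder is too big to hold in memory as a single string, the same in-place idea can be done in binary mode by shifting everything after the first newline down to the start of the file, one chunk at a time. This is only a sketch, not part of the original answers; the function name and chunk size below are made up:

def drop_first_line_inplace(path, chunk_size=1024 * 1024):
    # Shift everything after the first newline to the start of the file,
    # one chunk at a time, then truncate the leftover tail.
    with open(path, "r+b") as f:
        f.readline()         # consume the first line (including its newline)
        read_pos = f.tell()  # start of the data we want to keep
        write_pos = 0
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            read_pos = f.tell()
            f.seek(write_pos)
            f.write(chunk)
            write_pos = f.tell()
        f.truncate(write_pos)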
