Fastest Way to Delete a Line from Large File in Python

Question:

I am working with a very large (~11GB) text file on a Linux system. I am running it through a program that checks the file for errors. Once an error is found, I need to either fix the line or remove it entirely, and then repeat…

Eventually once I’m comfortable with the process, I’ll automate it entirely. For now however, let’s assume I’m running this by hand.

What would be the fastest (in terms of execution time) way to remove a specific line from this large file? I thought of doing it in Python…but would be open to other examples. The line might be anywhere in the file.

If Python, assume the following interface:

def removeLine(filename, lineno):



Asked By: AJ.


Answer #1:

You can have two file objects for the same file at the same time (one for reading, one for writing):

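def removeLine(filename, lineno):
    fro = open(filename, "rb")

    # skip the lines we want to keep (lineno is 0-based here)
    current_line = 0
    while current_line < lineno:
        fro.readline()
        current_line += 1

    # open a second handle for writing, positioned at the start of the doomed line
    seekpoint = fro.tell()
    frw = open(filename, "r+b")
    frw.seek(seekpoint, 0)

    # read the line we want to discard
    fro.readline()

    # now move the rest of the lines in the file
    # one line back
    chars = fro.readline()
    while chars:
        frw.write(chars)
        chars = fro.readline()

    # drop the leftover bytes at the end of the file
    fro.close()
    frw.truncate()
    frw.close()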

Answered By: AJ.

Answer #2:

Modify the file in place: the offending line is replaced with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also “fix” the line in place if the fix is no longer than the line you are replacing.

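import os
from mmap import mmap
def removeLine(filename, lineno):
    f = os.open(filename, os.O_RDWR)
    m, p = mmap(f, 0), 0
    for i in range(lineno-1):
        p = m.find(b'\n', p) + 1  # skip to the start of line `lineno` (1-based)
    q = m.find(b'\n', p)          # newline that ends the target line
    m[p:q] = b' '*(q-p)           # overwrite the line with spaces, keep its newline
    os.close(f)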

If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.
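The offset-based variant might look like the sketch below. The function name `blank_line_at_offset` is made up for illustration, and it assumes `offset` is the byte position of the first character of the bad line, as reported by the checking program.

```python
import mmap

def blank_line_at_offset(filename, offset):
    """Overwrite the line that starts at byte `offset` with spaces.

    Assumption: `offset` points at the first byte of the line.
    The line's trailing newline (if any) is left untouched.
    """
    with open(filename, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        end = m.find(b"\n", offset)
        if end == -1:
            # The last line has no trailing newline: blank up to end of file.
            end = len(m)
        m[offset:end] = b" " * (end - offset)
        m.flush()
```

Because nothing is moved, the cost is independent of where the line sits in the file; the blanked line stays in place as a run of spaces.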

Answered By: K. Brafford

Answer #3:

As far as I know, you can’t just open a text file with Python and remove a line. You have to make a new file and copy everything but that line to it. If you know the specific line number, you would do something like this:

f = open('in.txt')
fo = open('out.txt','w')

ind = 1
for line in f:
    # copy every line except the one to remove (1-based)
    if ind != linenumtoremove:
        fo.write(line)
    ind += 1

f.close()
fo.close()

You could of course check the contents of the line instead to decide whether to keep it. If you have a whole list of lines to remove or change, I also recommend making all of those changes in a single pass through the file.
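A single-pass version for several line numbers could be sketched as follows. The name `remove_lines` and its parameters are hypothetical; line numbers are assumed to be 1-based, matching the loop above.

```python
def remove_lines(src, dst, linenos):
    """Copy `src` to `dst`, skipping every 1-based line number in `linenos`."""
    skip = set(linenos)  # set lookup keeps the per-line check cheap
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for ind, line in enumerate(fin, 1):
            if ind not in skip:
                fout.write(line)
```

For example, `remove_lines('in.txt', 'out.txt', [3, 17, 42])` reads the input once regardless of how many lines are dropped.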

Answered By: John La Rooy

Answer #4:

If the lines are variable length then I don’t believe that there is a better algorithm than reading the file line by line and writing out all lines, except for the one(s) that you do not want.

You can identify these lines by checking some criteria, or by keeping a running tally of lines read and suppressing the writing of the line(s) that you do not want.

If the lines are fixed length and you want to delete specific line numbers, then you may be able to use seek to move the file pointer… I doubt you’re that lucky though.
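For the fixed-length case, a rough sketch: the start of the line can be computed directly, so no scanning is needed before the rest of the file is shifted back. The function name and the `record_len` parameter are assumptions for illustration; `record_len` is taken to be the length of every line in bytes, newline included.

```python
import os

def remove_fixed_length_line(filename, lineno, record_len, chunk_size=1 << 20):
    """Delete 1-based line `lineno` from a file of fixed-length lines.

    Assumption: every line is exactly `record_len` bytes, newline included.
    """
    write_pos = (lineno - 1) * record_len  # start of the line to delete
    read_pos = write_pos + record_len      # start of the line after it
    with open(filename, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if lineno < 1 or write_pos >= size:
            raise ValueError("line number out of range")
        # Shift everything after the deleted line back by one record.
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        # Cut off the now-duplicated tail.
        f.truncate(write_pos)
```

The bytes after the deleted line still have to be rewritten, so this saves the scan for the line, not the copy.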

Answered By: Justin Peel

Answer #5:

Update: solution using sed as requested by poster in comment.

To delete, for example, the second line of the file:

sed '2d' input.txt

Use the -i switch to edit in place. Warning: this is a destructive operation. Read the help for this command for information on how to make a backup automatically.

Answered By: Dancrumb

Answer #6:

import os

def removeLine(filename, lineno):
    fin = open(filename)  # "in" is a reserved word, so use another name
    out = open(filename + ".new", "w")
    for i, l in enumerate(fin, 1):
        if i != lineno:
            out.write(l)
    fin.close()
    out.close()
    os.rename(filename + ".new", filename)

Answered By: Mark Byers

Answer #7:

I think a similar, if not identical, question has been asked here before. Reading (and writing) line by line is slow, but you can read a bigger chunk into memory at once, go through it line by line skipping the lines you don’t want, then write the result as a single chunk to a new file. Repeat until done. Finally, replace the original file with the new file.

The thing to watch out for is that each chunk you read may end with a partial line; you need to hold that back and prepend it to the next chunk you read.
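One way the chunked approach might be sketched, including the partial-line handling. The function name, the ".new" suffix for the temporary file, and the 1 MiB default chunk size are all assumptions for illustration.

```python
import os

def remove_line_chunked(filename, lineno, chunk_size=1 << 20):
    """Copy `filename` in large chunks, dropping 1-based line `lineno`,
    then replace the original with the copy."""
    tmp = filename + ".new"  # hypothetical temp-file name
    current = 1              # line number of the first line in `buf`
    leftover = b""           # partial line carried over to the next chunk
    with open(filename, "rb") as fin, open(tmp, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            buf = leftover + chunk
            # Keep only complete lines in buf; hold back the partial tail.
            cut = buf.rfind(b"\n") + 1
            buf, leftover = buf[:cut], buf[cut:]
            n = buf.count(b"\n")  # number of complete lines in buf
            if current <= lineno < current + n:
                # Locate the byte range of the target line and cut it out.
                start = 0
                for _ in range(lineno - current):
                    start = buf.find(b"\n", start) + 1
                end = buf.find(b"\n", start) + 1
                buf = buf[:start] + buf[end:]
            current += n
            fout.write(buf)
        # A final line without a trailing newline ends up in `leftover`.
        if leftover and current != lineno:
            fout.write(leftover)
    os.replace(tmp, filename)
```

This needs free disk space for a second copy of the file while it runs.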

Answered By: Matt Joiner

Answer #8:

@OP, if you can use awk, e.g. assuming the line number is 10:

$ awk 'NR!=10' file > newfile

Answered By: Heikki Toivonen
