Search code examples
pythonalgorithmfileio

How to remove a line by its number from file without rewriting file python


I need to delete a line from a text file by its number, but without rewriting a file (for performance reasons). I have a program that needs to remove and add lines to file very often. For example, I have file test.txt with the following text:

Hello, world!
Example
Sample text
Abcdefghijklmnopqrstuvwxyz
Hello,
world!

and I need to delete line 'Example' with number num = 2.

I have this solution:

num = 2

with open('test.txt', 'r+') as f:
    for i in range(num - 1):
        f.readline()
    pos = f.tell()
    f.readline()
    s = f.read()
    f.seek(pos)
    f.truncate()
    f.write(s)

It works, but it rewrites the rest of the file (lines 3 to 6), which might take long.

I searched stackoverfow and google and didn't found solutions without rewriting.

Is there a way not to rewrite a file? Also, for more optimization, how can I find a position of the line by its number, without reading a prefix of the file?

UPD:

When I said not to rewrite a file, I meant rewriting the end of the file (delete it and then write it), because it could be very big, and just remove the line.

@Kolay.Ne, in a file I store just unique IDs of users, and user can add themselves or remove themselves. I think that txt file is a good solution for that.

@DarkKnight, I think that I could pad IDs with zeros to some constant length, and so I could just do f.seek(num * (length + 1)) (+ 1 because of the new line charcter) instead of for i in range(num - 1): f.readline().

Thanks all of you for help!


Solution

  • Is there a way not to rewrite a file?

    No, that's not possible. Typically, files are physically stored in file systems either as a contiguous block of data, or (given that the file is large enough) split into a couple of contiguous blocks of data. Thus, the task of removing something from the middle of a file is similar to the task of removing a middle element from array in memory: if you want to keep the tail of the array, you would have to move it back.

    how can I find a position of the line by its number, without reading a prefix of the file?

    Again, that's not possible for the same reason: in your filesystem the file is stored as a sequence of bytes, not a sequence of lines (methods such as .readline()/.readlines() are a nice Python interface, which internally still reads the file as a sequence of bytes). So, there's no API to seek lines (and even if there was one, internally it would still have to read those lines).

    (don't ask for what)

    That's generally a good idea to provide context of your question, one reason for that is to not end up with the XY problem.

    In your particular case, perhaps, the technical question that you have, is actually there due to a poor design. Among other things, I wonder:

    • Do you really need to keep this data in a file? Can you put it to memory instead? If not, could you temporarily cache it in memory and only write it back occasionally?
    • Is there a better way to organize your storage? Maybe you should have more than one file for your task?

    As you provide no context, I don't know the answer to any of the above questions and can not recommend a conceptual improvement.


    Now that the question is updated, we finally know that what you actually wanted is to track users, and their ids in particular. I assume you also want that storage to be persistent (that is, the users list must remain when your program is restarted).

    A good fit for this task is a database. It allows one to store and manipulate data (e.g. ids and any other info you have) about entities (e.g. users) in an application. I would recommend you to start off with SQLite: it is a rather simple relational DBMS, which python supports out of the box, and there's plenty tutorials on the internet (just google something like 'python sqlite quickstart' or 'python sqlite tutorial')