python-2.7, csv, pandas, pandasql

Reading the last index from a CSV file using pandas in Python 2.7


I have a .csv file on disk to which I periodically write rows. It is formatted so that I can easily read it into a pandas DataFrame. I need this database to have a row index, so every time I write a new row to it I need to know the index of the last row written.

There are plenty of ways to do this:

  • I could read the entire file into a DataFrame, append my row, and then write the entire DataFrame back to disk. This might get a bit slow as the database grows.
  • I could read the entire index column into memory, pick off the largest value, and then append my row to the .csv file. This might be a little better, depending on how column-reading is implemented.

I am curious if there is a way to just get that one cell directly, without having to read a whole bunch of extra information into memory. Any suggestions?


Solution

  • Reading only the index column still requires reading and parsing the whole file.

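    If no field in the file spans multiple lines, you could scan the file backwards from the end to find the newline that precedes the final row (taking care to skip any trailing newline after the data). The value following that newline, up to the first delimiter, will be your last index, assuming the index is the first column.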

    Storing the last index in another file would also be a possibility, but you would have to make sure both files stay consistent.

    Another way would be to reserve a fixed number of bytes at the beginning of the file and write the last index value there, in place, as a comment. Your parser would then have to support comments or be able to skip rows.
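
The backward scan suggested above could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names `read_last_line` and `last_index` are made up, the index is taken to be the first, unquoted, integer column, and no field contains an embedded newline. If the file holds only the header row, `last_index` will raise a `ValueError`, which a caller would need to handle.

```python
import os


def read_last_line(path, chunk_size=4096):
    """Return the last non-empty line of the file as bytes.

    Reads fixed-size chunks from the end, so only the tail of the
    file is touched, no matter how large the file is.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore any newline characters that trail the final row.
            stripped = tail.rstrip(b"\r\n")
            cut = stripped.rfind(b"\n")
            if cut != -1:
                return stripped[cut + 1:]
        # Reached the start of the file: it holds at most one line.
        return tail.rstrip(b"\r\n")


def last_index(path):
    # Assumption: the index is the first comma-separated field.
    first_field = read_last_line(path).split(b",", 1)[0]
    return int(first_field.decode("ascii"))
```

With this, appending a row only needs `last_index(path) + 1` and a file opened in append mode. The cost depends on the length of the last row, not on the size of the file.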
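
The reserved-bytes idea can be sketched in a similar way. Everything specific here is an assumption made for illustration: the `# last_index=` comment layout, the 20-character field width, and the helper names `create`, `read_last_index`, and `append_row`. Rows are joined naively with commas, so values that need quoting would have to go through the `csv` module instead.

```python
import os

# Hypothetical fixed-width comment line; the width never changes,
# so it can be overwritten in place without shifting the data.
HEADER_FMT = "# last_index={:>20d}\n"
HEADER_LEN = len(HEADER_FMT.format(0))


def create(path, columns):
    """Start a new file: counter line first, then the column names."""
    with open(path, "wb") as f:
        f.write(HEADER_FMT.format(-1).encode("ascii"))  # -1 means "no rows yet"
        f.write((",".join(columns) + "\n").encode("ascii"))


def read_last_index(path):
    """Read only the first HEADER_LEN bytes and parse the counter."""
    with open(path, "rb") as f:
        header = f.read(HEADER_LEN).decode("ascii")
    return int(header.split("=", 1)[1])


def append_row(path, values):
    """Append one row with the next index, then update the counter."""
    index = read_last_index(path) + 1
    row = ",".join([str(index)] + [str(v) for v in values]) + "\n"
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        f.write(row.encode("ascii"))
        # Overwrite the counter in place at the start of the file.
        f.seek(0)
        f.write(HEADER_FMT.format(index).encode("ascii"))
    return index
```

When loading such a file with pandas, the counter line should be skippable with `pd.read_csv(path, skiprows=1)` or `pd.read_csv(path, comment="#")`. Note that the row and the counter are written in two steps, so a crash between them leaves the counter stale; the backward scan could serve as a consistency check in that case.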