Finding which files are being read from during a session (python code)


I have a large system written in Python. When I run it, it reads all sorts of data from many different files on my filesystem. There are thousands of lines of code and hundreds of files, most of which are never actually used. I want to see which files are actually accessed by the system (Ubuntu) and, ideally, where in the code they are opened. Filenames are built dynamically from variables etc., so the actual filenames cannot be determined just by reading the code. I do have access to the code, of course, and can change it.

I'm trying to figure out how to do this efficiently, with minimal changes to the code:

  1. Is there a Linux way to determine which files are accessed, and at what times? This might be useful, although it won't tell me where in the code the access happens.
  2. Is there a simple way to make an "open file" command also log the file name, time, etc. of the opened file? Ideally without having to go into the code and change every open call: there are many of them, and some are never executed at runtime.

Thanks


Solution

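  • You can trace file accesses without modifying your code by using strace. Either start your program under strace, like this:

    strace -f -tt -e trace=file python3 your_program.py

    Here -e trace=file limits the output to system calls that take a file name as an argument (open, openat, stat, ...), -f also follows child processes and threads, and -tt prefixes each line with a wall-clock timestamp including microseconds. Note that strace sees only system calls, so it tells you which files are touched and when, but not which line of Python did it.

    Or attach strace to an already running program like this:

    strace -f -tt -e trace=file -p <PID>

    On Ubuntu, attaching with -p usually requires sudo because of the default ptrace restrictions.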
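For the second question (logging the file name, time and call site without touching every open call), one option on Python 3.8+ is an audit hook. This is a different technique from the strace answer above and is offered only as a sketch: sys.addaudithook and the "open" audit event (raised by builtins.open, io.open and os.open with the arguments path, mode, flags) are standard CPython features, while the names log_file_opens and OPENED are hypothetical. It relies on sys._getframe, which is CPython-specific, to find the caller.

```python
import sys
import time

# Hypothetical name: every open seen by the hook is appended here as
# (time, path, mode, flags, "caller_file:line").
OPENED = []


def log_file_opens(event, args):
    """Hypothetical audit hook that records every "open" audit event."""
    if event != "open":
        return  # ignore all other audit events (imports, exec, sockets, ...)
    # builtins.open, io.open and os.open raise "open" with (path, mode, flags);
    # mode may be None when the call came through os.open.
    path, mode, flags = args
    try:
        # Frame 0 is this hook; frame 1 is the innermost Python frame, i.e. the
        # code that called open() (possibly a library such as pathlib).
        # traceback.extract_stack is avoided on purpose: it reads source files,
        # which would call open() and re-enter this hook.
        caller = sys._getframe(1)
        where = f"{caller.f_code.co_filename}:{caller.f_lineno}"
    except ValueError:
        # No Python frame above the hook (an open issued purely from C code).
        where = "<unknown>"
    stamp = time.strftime("%H:%M:%S")
    OPENED.append((stamp, path, mode, flags, where))
    # Writing to sys.stderr does not open a file, so this cannot recurse.
    print(f"OPEN {stamp} {path!r} mode={mode!r} at {where}", file=sys.stderr)


# Audit hooks cannot be removed once installed, so add this only while debugging.
sys.addaudithook(log_file_opens)
```

Put this at the very top of the entry-point script (or in a small module imported before anything else) and run the program as usual. Every later open is then printed with its time and the file:line that triggered it, and OPENED can be filtered afterwards, for example on mode, to keep only reads. Unlike strace, the hook only sees opens that go through Python's own open functions, so files opened directly by C extensions (for example a C library calling fopen) will not show up; combining both approaches covers both cases.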