Tags: python, json, file, glob

Open and read latest json file one time only


SO members... how can I read the latest JSON file in a directory one time only (and print something if there is no new file)? So far I can only read the latest file. The sample script below (run every 45 minutes) opens and reads the latest JSON file in a directory; in this case the latest file is file3.json (a new JSON file is created every 30 minutes). The problem: if file4 is not created for some reason (for example, the server fails to create a new JSON file) and the script runs again, it will still read the same old file3.

Files in the directory:

file1.json
file2.json
file3.json

The script below is able to open and read the latest JSON file created in the directory.

import glob
import json
import os.path

# Find the newest .json file in the directory by creation time
listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)

with open(latestFile, 'r') as f:
    mydata = json.load(f)
    print(mydata)

To ensure the script only reads the newest file, and reads it one time only, I expect something like the below:

listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)
if latestFile newer than previous open/read file:  # not sure how to compare the latest file with the previously read one
    with open(latestFile, 'r') as f:
        mydata = json.load(f)
        print(mydata)
else:
    print("no new file created")

Thank you for your help. An example solution would be good to share.


I can't figure out the solution... it seems simple, but after a few days of trial and error I have had no luck.

(1) Make sure to read the latest file in the directory.
(2) Make sure to read any file(s) that may have been missed (e.g. because the script failed to run).
(3) Read each file only once, and give a warning if there is no new file (see the sketch after this list).
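
Extending the state-file idea from the sketch above, a minimal sketch that tries to cover all three points (again, last_read.txt is a placeholder name of my own):

import glob
import json
import os.path

STATE_FILE = 'last_read.txt'  # placeholder; holds the ctime of the newest file processed so far

def load_last_time():
    try:
        with open(STATE_FILE) as f:
            return float(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0.0

last_time = load_last_time()

# Every .json newer than the last one processed, oldest first --
# this also picks up files missed while the script was down
newFiles = sorted(
    (p for p in glob.glob('logFile/*.json') if os.path.getctime(p) > last_time),
    key=os.path.getctime,
)

if not newFiles:
    print("no new file created")
else:
    for path in newFiles:
        with open(path, 'r') as f:
            print(json.load(f))
        last_time = os.path.getctime(path)
    with open(STATE_FILE, 'w') as f:
        f.write(str(last_time))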

Thank you.


After the SO discussion and suggestions, I have a few methods that resolve, or at least accommodate, some of the requirements. I simply move files that have been processed. If no file is created, the script does nothing, and if the script fails, then once things are back to normal it will read all the related files that are available. I think it is good enough for now. Thank you guys...
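
For reference, a minimal sketch of that move-after-processing approach (the processed/ directory name is my own choice):

import glob
import json
import os
import shutil

SRC_DIR = 'logFile'
DONE_DIR = os.path.join(SRC_DIR, 'processed')  # placeholder directory name
os.makedirs(DONE_DIR, exist_ok=True)

# Whatever is still sitting in SRC_DIR has not been processed yet
pending = sorted(glob.glob(os.path.join(SRC_DIR, '*.json')), key=os.path.getctime)

if not pending:
    print("no new file created")
else:
    for path in pending:
        with open(path, 'r') as f:
            print(json.load(f))
        # Move the file out of the way so it is never read twice
        shutil.move(path, os.path.join(DONE_DIR, os.path.basename(path)))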


Solution

  • Below is not so much an answer as an approach I would like to propose:


    The idea is as follows:
    Every log file written to the directory can carry a key-value pair "creation_time": timestamp (in the fileX.json that gets stored on the server). Your script runs every 45 minutes to pick up the file dumped into the directory. In the normal case you read the file, and when the script exits you store the last-read filename and the creation_time taken from that fileX.json into a logger.json.
    An example logger.json:

    {
        "creation_time": "03520201330",
        "file_name": "file3.json"
    }

    Whenever the server fails or a delay occurs, the fileX.json may have been rewritten, or new fileX.json files may have been created in the directory. In these situations you first open logger.json and obtain both the timestamp and the last filename, as shown in the example above. Using that last filename, compare the old timestamp stored in the logger with the new timestamp inside the fileX.json. If they match, nothing has changed: you only read the files ahead of it and rewrite the logger.
    If they do not match, you re-read that last fileX.json again and then proceed to read the other files ahead of it.
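
    A minimal sketch of that logger.json bookkeeping, assuming each fileX.json really does carry a "creation_time" key as proposed (the helper names here are my own):

    import glob
    import json
    import os.path

    LOGGER = 'logger.json'

    def load_logger():
        # Return the last-read record, or None on the very first run
        try:
            with open(LOGGER) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def save_logger(file_name, creation_time):
        with open(LOGGER, 'w') as f:
            json.dump({"creation_time": creation_time, "file_name": file_name}, f)

    files = sorted(glob.glob('logFile/*.json'), key=os.path.getctime)
    names = [os.path.basename(p) for p in files]
    record = load_logger()

    to_read = files
    if record is not None and record["file_name"] in names:
        # Find the last-read file again and check whether it was rewritten
        idx = names.index(record["file_name"])
        with open(files[idx]) as f:
            current = json.load(f)
        if current.get("creation_time") == record["creation_time"]:
            to_read = files[idx + 1:]  # unchanged: only read the files ahead
        else:
            to_read = files[idx:]  # rewritten: re-read it, then the files ahead

    if not to_read:
        print("no new file created")
    else:
        for path in to_read:
            with open(path) as f:
                data = json.load(f)
                print(data)
            save_logger(os.path.basename(path), data.get("creation_time"))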