I have a Python (3.6) script which reads data from a CSV file into a pandas dataframe; pandas then performs actions on each new line read from the CSV file...
This works fine for a static CSV file, e.g. one where all the data to be processed is already contained within the CSV file...
I would like to be able to append to the CSV file from another Python process so that data can be continuously fed into the pandas dataframe; when the process that feeds data to pandas reaches the end of the file, it should wait for a new row to be appended to the CSV file and then continue reading rows into pandas...
Is this possible?
I am new to pandas and at the moment I am having difficulty understanding how pandas can be used with real-time/dynamic data, as all the examples I see seem to use static CSV files as a data source.
Ideally, I would like to be able to feed rows into pandas from a message queue directly, but I don't think this is possible - so I was thinking that if I have a second Python script that receives a message from a queue and then appends it as a new row to the CSV file, the original script could read it into pandas...
Am I misunderstanding how pandas works or can you give any pointers on if/how I can get this sort of thing to work?
You can pop comma separated values off a queue and wrap them in a dataframe.
You can then take that tiny in-memory dataframe and append it to whatever other dataframe you want that's also in memory. You can also write it out to a file with .to_csv('whatever', mode='a').
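For example, a minimal sketch of the queue side, using Python's standard queue.Queue as a stand-in for whatever message queue you actually use (the column names here are made up for illustration):

```python
import queue

import pandas as pd

# Stand-in for your real message queue.
q = queue.Queue()
q.put("1,foo,3.5")
q.put("2,bar,7.0")

# Pop comma-separated strings off the queue and split them into rows.
rows = []
while not q.empty():
    rows.append(q.get().split(","))

# Wrap the popped rows in a small in-memory dataframe.
mini_df = pd.DataFrame(rows, columns=["id", "name", "value"])
```

From here, mini_df can be appended to a larger in-memory dataframe or written out with to_csv.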
It would be preferable not to write to CSV in the first place and to leave the data as an array of strings, but since this more directly answers your question:
import pandas as pd

big_df = pd.read_csv('file.csv')

def handle_csv(csv_lines):
    global big_df
    mini_df = pd.DataFrame([sub.split(",") for sub in csv_lines])
    # append() returns a new dataframe rather than modifying big_df in place
    big_df = big_df.append(mini_df)
    # header=False/index=False so repeated appends don't re-write the header
    mini_df.to_csv("somefile", mode='a', header=False, index=False)
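For the reading side of your question - waiting at the end of the file for new rows - you can tail the CSV file yourself and hand each complete new line to pandas as it arrives. A minimal sketch, assuming the writer always appends whole lines; the poll interval is arbitrary:

```python
import time

import pandas as pd

def follow_csv(path, poll_interval=0.5):
    """Yield a one-row dataframe for each new line appended to the file."""
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                # Reached end of file: wait for the writer to append more.
                time.sleep(poll_interval)
                continue
            yield pd.DataFrame([line.rstrip("\n").split(",")])

# Usage:
# for row_df in follow_csv("file.csv"):
#     ...process row_df...
```

Each yielded dataframe can then be appended to your main in-memory dataframe, exactly as in the snippet above.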