I am trying to save CSV files after extracting a dataframe from MongoDB.
Each dataframe is generated based on its date columns.
Furthermore, the dataframe has a column id
which holds 31 different id numbers.
I want to save each CSV separately, with the id number included in the CSV name.
I was saving my dataframe this way before, using pandas:
data = pd.DataFrame(energy_data)  # data is a dataframe I get from MongoDB
for each_id in id_list:
    item = int(each_id)
    data.to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',')
My dataframe looks like below (it prints as 31 separate dataframes):

datetimeAt            id     total
2022-01-01 23:00:00   10121  1088030
.................
.................
datetimeAt            id     total
2022-01-09 21:00:00   10023  1077030
................
................
datetimeAt            id     total
2022-01-16 20:00:00   10024  1058030
..............
.................
datetimeAt            id     total
2022-01-15 15:00:00   10012  108030
...................
When I run my code, I do get 31 CSV files, but each CSV contains the same data, i.e. the first dataframe printed by my extraction function.
So I want to save each CSV, named by id number as below, with the information corresponding to its id number.
test_id-10121.csv
test_id-10023.csv
test_id-10024.csv
test_id-10012.csv
.................
Can I get some help?
Thank you
Pandas is great at selecting only the relevant information. You simply have to do:
for each_id in id_list:
    item = int(each_id)
    data[data['id'] == item].to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',')
It would be:

...
data[data['id'] == each_id].to_csv(os.path.join(dir, f'test_id_{each_id}.csv'), sep=',')

if the id column were not numeric.
But if you have a large dataframe and a rather high number of id values, this is not efficient, because you repeat the extraction process for each id value. The pandas way is to use groupby:
for item, df_item in df.groupby('id'):
    df_item.to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',')
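To make the groupby approach concrete, here is a minimal, self-contained sketch. The `energy_data` sample rows and the output directory are made up to stand in for your MongoDB extraction; only the loop itself is the point:

```python
import os
import pandas as pd

# Hypothetical sample data standing in for the MongoDB extraction result
energy_data = [
    {"datetimeAt": "2022-01-01 23:00:00", "id": 10121, "total": 1088030},
    {"datetimeAt": "2022-01-09 21:00:00", "id": 10023, "total": 1077030},
    {"datetimeAt": "2022-01-16 20:00:00", "id": 10024, "total": 1058030},
    {"datetimeAt": "2022-01-15 15:00:00", "id": 10012, "total": 108030},
]
df = pd.DataFrame(energy_data)

out_dir = "."  # assumed output directory; use your own path here

# One pass over the frame: each group holds only the rows for that id,
# so each file gets exactly the data matching its id.
for item, df_item in df.groupby("id"):
    df_item.to_csv(os.path.join(out_dir, f"test_id_{item}.csv"), sep=",", index=False)
```

With your real 31-id dataframe this produces 31 files (test_id_10121.csv, test_id_10023.csv, ...), each containing only the rows for that id.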