Search code examples
pythonpython-3.xpandascsvexport-to-csv

How can I save csv separately based on it's entry column


I am trying to save csv files after extracting dataframe from MONGODB.

When I create dataframe it is generating each dataframe based on its date columns.

Further more dataframe has a column id which has 31 different id number.

I want to save my csv separetly including id number in csv name.

I was saving my dataframe in this way before by using pandas:

data = pd.DataFrame(energy_data)


for each_id in id_list:
    item = int(each_id)
    data.to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',' )
                
# data is a dataframe I get it from MongoDB.

My dataframe looks like below(which shows 31 separate dataframe while I print):


datetimeAt            id.      total
2022-01-01 23:00:00   10121     1088030

.................
.................

datetimeAt            id.      total
2022-01-09 21:00:00   10023     1077030

................
................

datetimeAt            id.      total
2022-01-16 20:00:00   10024     1058030

..............
.................

datetimeAt            id.      total
2022-01-15 15:00:00   10012     108030
...................

When I run my code, I was able to get 31 csv files but each csv contains same data informatin i.e. first dataframe when I print from my extraction function.

So I want to save my csv based on id number as below with information corresponding to it's id number.

test_id-10121.csv
test_id-10023.csv
test_id-10024.csv
test_id-10012.csv
.................

Can I get some help?

Thank you


Solution

  • Pandas is great at selecting only relevant informations. You have simply to do:

    for each_id in id_list:
        item = int(each_id)
        data[data[id] == item].to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',' )
    

    It could be:

    ...
        data[data[id] == each_id].to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',' )
    

    if the id column was not numeric...


    But if you have a large dataframe and a rather high number of id values, it would not be efficient because you will repeat the extraction process for each id value. The Pandas way is to use grouby:

    for item, df_item in df.groupby('id'):
        df_item.to_csv(os.path.join(dir, f'test_id_{item}.csv'), sep=',' )