I'm trying to read and analyze a large csv file (11.5 GB) using python. And then use Power BI to create some visuals around it. But everytime I run any command line or even making changes to the dataframe in Power BI, it takes about 20-30 mins between each change.
One of the column heading is DeviceID
. I would like to split the large CSV into multiple csv files so that each file will have data that belongs to the one unique DeviceID
value.
Currently the dataframe looks like this in the single Full.csv
file looks like this:
DeviceID AreaName Longitude Latitude
12311 Dubai 55.55431 25.45631
12311 Dubai 55.55432 25.45634
12311 Dubai 55.55433 25.45637
12311 Dubai 55.55431 25.45621
12309 Dubai 55.55427 25.45627
12309 Dubai 55.55436 25.45655
12412 Dubai 55.55441 25.45657
12412 Dubai 55.55442 25.45656
After running the code, the single Full.csv
file should produce 3 csv files: 12311.csv
, 12309.csv
, 12412.csv
with each looking like this:
DeviceID AreaName Longitude Latitude
12311 Dubai 55.55431 25.45631
12311 Dubai 55.55432 25.45634
12311 Dubai 55.55433 25.45637
12311 Dubai 55.55431 25.45621
AND
DeviceID AreaName Longitude Latitude
12309 Dubai 55.55427 25.45627
12309 Dubai 55.55436 25.45655
AND
DeviceID AreaName Longitude Latitude
12412 Dubai 55.55441 25.45657
12412 Dubai 55.55442 25.45656
I read that the best way to deal with large files in python is to use the pandasql module. Would I be able to achieve what I have described above using pandsql?
Thanks
One of the column heading is DeviceID. I would like to split the large CSV into multiple csv files so that each file will have data that belongs to the one unique DeviceID value.
I do not think that this will speed up your process in PowerBI, do you do your calculation in PowerQuery or in PowerBI themself?
But anyway, you can create a list of your unique values for DeviceID:
df = pd.read_csv('Full.csv')
uniquelist = list(df['DeviceID'].unique())
and then split it based on this list and save it into csv files:
for i in uniquelist:
i = df.loc[df['DeviceID'] == i]
i.to_csv