I have a messy big CSV file with lot of Nan values and I read the dataframe using pd.read_csv(file, names = range(int))
. I wanted to split this data into multiple dataframe and store in dictionary using given key in data. I prepared a simple example to explain my problem.
Example rawdata: My data looks like similarly as given but with more number of columns and rows.
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=([1,2,3,4]))
df.loc[0,:] = ['Home -AA',np.nan,np.nan,np.nan]
df.loc[1,:] = ['place/time','value1','value2','value3']
df.loc[2,:] = ['Home time1',1, 2, 3]
df.loc[3,:] = ['Home time2',4, 5, 6]
df.loc[4,:] = ['Home time3',7, 8, 9]
df.loc[5,:] = ['sum',11,np.nan , np.nan]
df.loc[6,:] = ['agg',12,np.nan , np.nan]
df.loc[7,:] = ['max',6,np.nan , np.nan]
df.loc[8,:] = ['min',8,np.nan , np.nan]
df.loc[9,:] = ['med',1,np.nan , np.nan]
df.loc[10,:] = ['Home -BB',np.nan,np.nan,np.nan]
df.loc[11,:] = ['place/time','value1','value2','value3']
df.loc[12,:] = ['Home time1',11, 12, 13]
df.loc[13,:] = ['Home time2',14, 15, 16]
df.loc[14,:] = ['Home time3',17, 18, 19]
df.loc[15,:] = ['sum',101,np.nan , np.nan]
df.loc[16,:] = ['agg',122,np.nan , np.nan]
df.loc[17,:] = ['max',62,np.nan , np.nan]
df.loc[18,:] = ['min',83,np.nan , np.nan]
df.loc[19,:] = ['med',12,np.nan , np.nan]
df.loc[20,:] = ['Home -CC',np.nan,np.nan,np.nan]
df.loc[21,:] = ['place/time','value1','value2','value3']
df.loc[22,:] = ['Home -DD',np.nan,np.nan,np.nan]
df.loc[23,:] = ['place/time','value1','value2','value3']
df.loc[24,:] = ['Home -EE',np.nan,np.nan,np.nan]
df.loc[25,:] = ['place/time','value1','value2','value3']
df.loc[26,:] = ['Home -FF',np.nan,np.nan,np.nan]
df.loc[27,:] = ['place/time','value1','value2','value3']
df.loc[28,:] = ['Home time1',211, 212, 213]
df.loc[29,:] = ['Home time1',212, 213, 214]
df.loc[30,:] = ['sum',115,np.nan , np.nan]
df.loc[31,:] = ['agg',124,np.nan , np.nan]
df.loc[32,:] = ['max',65,np.nan , np.nan]
df.loc[33,:] = ['min',85,np.nan , np.nan]
df.loc[34,:] = ['med',16,np.nan , np.nan]
Wanted Results: I want to convert this data frame into multiple data frame with define house keys and store as in dictionary dict1. (Result example)
df1 = pd.DataFrame(columns=([1,2,3,4]))
df1.loc[1,:] = ['place/time','value1','value2','value3']
df1.loc[2,:] = ['Home time1',1, 2, 3]
df1.loc[3,:] = ['Home time2',4, 5, 6]
df1.loc[4,:] = ['Home time3',7, 8, 9]
df2 = pd.DataFrame(columns=([1,2,3,4]))
df2.loc[11,:] = ['place/time','value1','value2','value3']
df2.loc[12,:] = ['Home time1',11, 12, 13]
df2.loc[13,:] = ['Home time2',14, 15, 16]
df2.loc[14,:] = ['Home time3',17, 18, 19]
df3 = pd.DataFrame(columns=([1,2,3,4]))
df3.loc[21,:] = ['place/time','value1','value2','value3']
df4 = pd.DataFrame(columns=([1,2,3,4]))
df4.loc[23,:] = ['place/time','value1','value2','value3']
df5 = pd.DataFrame(columns=([1,2,3,4]))
df5.loc[25,:] = ['place/time','value1','value2','value3']
df6 = pd.DataFrame(columns=([1,2,3,4]))
df6.loc[27,:] = ['place/time','value1','value2','value3']
df6.loc[28,:] = ['Home time1',211, 212, 213]
df6.loc[29,:] = ['Home time1',212, 213, 214]
dict1 = {'House -AA':df1, 'House -BB': df2,'House -CC': df3 , 'House -DD':df4, 'House -EE':df5, 'House -FF':df6}
Prepared the code using for loop but I cannot able to split all the data frame in correct way. If I do not break the loop then I will receive an error of (List index out of range). Can you please help me to get similar results as I explained above?
Prepared code idea:
namesplit = lambda x: x.split('-')[0]
postion = 'Home '
rawname = []
for i in df[1]:
x = namesplit(i)
if postion == x:
rawname.append(i)
test = {}
for i in range(len(rawname)):
x = df[df[1]==rawname[i]].index.values
y = df[df[1]==rawname[i+1]].index.values
if y == len(df) -9:
break
df_1 = df.iloc[x[0]:y[0], :]
test[rawname[i]] = df_1
You can simply use groupby
and cumsum
:
result = {}
for _, i in df.groupby(df[1].str.startswith("Home -").cumsum()):
name, d = i[1].iat[0], i.iloc[1:]
result[name] = d[~d[1].isin(["sum","agg","max","min","med"])]
print (result)
{'Home -AA': 1 2 3 4
1 place/time value1 value2 value3
2 Home time1 1 2 3
3 Home time2 4 5 6
4 Home time3 7 8 9,
'Home -BB': 1 2 3 4
11 place/time value1 value2 value3
12 Home time1 11 12 13
13 Home time2 14 15 16
14 Home time3 17 18 19,
'Home -CC': 1 2 3 4
21 place/time value1 value2 value3,
'Home -DD': 1 2 3 4
23 place/time value1 value2 value3,
'Home -EE': 1 2 3 4
25 place/time value1 value2 value3,
'Home -FF': 1 2 3 4
27 place/time value1 value2 value3
28 Home time1 211 212 213
29 Home time1 212 213 214}