Every day I create a csv that looks something like this.
Sector | Code | Col1 | Col2 | Col3 | Col4 |
---|---|---|---|---|---|
Sect1 | |||||
CodeA | data | data | data | data | |
CodeB | data | data | data | data | |
CodeC | data | data | data | data | |
CodeD | data | data | data | data | |
Sect2 | |||||
CodeE | data | data | data | data | |
CodeF | data | data | data | data | |
CodeG | data | data | data | data | |
CodeH | data | data | data | data | |
Sect3 | |||||
CodeI | data | data | data | data | |
CodeJ | data | data | data | data | |
CodeK | data | data | data | data | |
CodeL | data | data | data | data | |
SectN | |||||
CodeX | data | data | data | data | |
CodeY | data | data | data | data | |
CodeZ | data | data | data | data |
Note the blank cells: the whole row after the sector name, and the 'sector' column before each code.
Id like to read the csv and EITHER:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Sector1 | col1total | col2total | col3total | col4total |
Sector2 | col1total | col2total | col3total | col4total |
Sector3 | col1total | col2total | col3total | col4total |
SectorN | col1total | col2total | col3total | col4total |
or
Im a noob and might have bitten off more than I can chew! Currently Im separating the original csv into sector csvs and summing that way using the following code, but as you can imagine its time consuming. Its not really relevant to my question, but heres the code I use:
import pandas
with open('D:\Libraries\Downloads\igcxtest.csv', 'r') as fileObject: reader_object = pandas.read_csv(fileObject, delimiter=';', decimal=',', header=0, names=('Noticia','Ativo','Variacao','Maximo', 'Ultimo','Minimo','Negocios','Semana', 'Mes','3 meses','6 meses','12 meses', 'Volume','Fourpercent','eightpercent', 'twelvepercent','plus25m','plus25q', 'plus50m','plus50q','minusfourpercent', 'minuseightpercent','minustwelvepercent', 'minus25m','minus25q','minus50m','minus50q', 'weekhigh','Twoweekhigh','Threeweekhigh', 'Threemthhigh','Yearhigh','weeklow', 'Twoweeklow','Threeweeklow','Threemthlow', 'Yearlow','Over200','Over50','Over21','Grafico')) # print(reader_object) reader_object.drop(['Noticia', 'Maximo', 'Ultimo','Minimo','Negocios','Semana','Mes','3 meses','6 meses','12 meses','Grafico'], axis=1, inplace=True) #print(reader_object) print(reader_object['Threemthhigh'])
reader_object['up_or_down'] = reader_object['Variacao'].apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
#Calculate ups and downs
ups = 0
downs = 0
up_down = []
for val in reader_object['Variacao']:
if val >0:
ups+=1
elif val < 0:
downs+=1
up_down = [ups, downs]
print(up_down)
#Calculate cumulative volume
reader_object['Volume'] = reader_object['Volume'] * reader_object['up_or_down']
#print(reader_object)
vol = 0
for v in reader_object['Volume']:
vol += v
reader_object.drop(['Variacao', 'up_or_down'], axis=1, inplace=True)
print(reader_object)
#print(ups, downs, vol)
#Sum columns
sums = reader_object.sum(numeric_only=True)
print(sums)
sums_list = []
x=0
for x in sums:
sums_list.append(x)
sums_list = up_down + sums_list
print(sums_list)
sum_df = pandas.DataFrame(sums_list).T
#print(sum_df)
sum_df.to_csv('D:\Libraries\Downloads\sums.csv')
I cant find info on ways to read the original csv, and split it by using (I imagine) keywords in rows ("sectorN") or by numbers of lines (how many "codes" in each "sector") first.
Any suggestions? Thanks...
You could try this:
df['State'].fillna(method = 'ffill', inplace = True)
df2 = df.query('Town is not null')
df3 = df2.groupby(['State', 'Town']).sum().reset_index()