I am trying to read only the values of every 4th row of my excel data sheet (every value corresponding to "media"). I can do this but I want to read every 4th row, starting from row 2. Or technically it would be row 2 and row 6 and row 10 etc. But no matter what I try I can't get what I want. excel datafile
I tried variations that I found from different questions:
cols=[' estacion_id','estacion_nombre','sensor_id','sensor_alias','sensor_nombre','fecha_ observacio n','metrica','valor' ]
f=pd.read_csv('env_picacho-ma_2013.csv', usecols=['fecha_observacion', 'valor'])
date=pd.read_csv('env_picacho-ma_2013.csv', usecols=['fecha_observacion'],skiprows=lambda x: logic(x) )
temp=pd.read_csv('env_picacho-ma_2013.csv', usecols=['valor'],skiprows= lambda x:(x % 4))
This gives me every 4th row: valor 0 7.7 1 8.4 2 4.7 3 6.8 4 8.2 ... ... 1820 791.0 1821 771.0 1822 841.0 1823 806.0 1824 795.0
I also tried:
def logic(index=0):
if index % 4 == 3:
return True
return False
temp=pd.read_csv('env_picacho-ma_2013.csv', usecols=['valor'],skiprows= lambda x: logic(x))
This skips every 3rd row out of 4. So I thought I could just skip every row1 and row3 and row4 in order to keep only row2. But this doesn't work either:
def logic(index=0):
if index % 4 == 3 and index % 4== 2:
return True
return False
gives me all the values again. I can sort of understand why this wold not work because I am contradicting myslef but I don't know how to do it right:
valor
0 NaN 1 3.78417 2 0.60000 3 7.70000 4 NaN ... ... 7295 806.00000 7296 20382.00000 7297 NaN 7298 NaN 7299 795.00000
I then found this which selects every top 2 values every 5 rows but I don't know how to modify this for my case (every 2nd value every 4 rows):
df.groupby(df.index//5).head(2)
I tried quite a few variations and I guess my other option would either be a loop but I am terrible at coding loops so if anyone can offer any advice it would be much appreciated. What I want is: valor 0 3.78417 1 4.48482 2 2.38661 ...
I am trying to read only ... (every value corresponding to "media").
You can use :
SR, GAP = 2, 3
df = pd.read_csv("env_picacho-ma_2013.csv", # add more params if needed
skiprows=lambda i: (i+SR)%(GAP+1) and i!=0)
Output :
print(df)
estacion_id estacion_no sensor_id ... fecha_observacion metrica valor
0 90 PICACHO_MA 225 ... 01/01/2013 media 3.78
1 90 PICACHO_MA 225 ... 02/01/2013 media 4.48
2 90 PICACHO_M 225 ... 03/01/2013 media 2.39
3 90 PICACHO_M. 225 ... 04/01/2013 media 3.18
4 90 PICACHO_M. 225 ... 05/01/2013 media 3.86
5 90 PICACHO_MA 225 ... 06/01/2013 media 2.27
6 90 PICACHO_MA 225 ... 07/01/2013 media 2.42
7 90 PICACHO_MA 225 ... 08/01/2013 media 0.89
8 90 PICACHO_M. 225 ... 09/01/2013 media 2.07
9 90 PICACHO_M. 225 ... 10/01/2013 media 2.90
10 90 PICACHO M 225 ... 11/01/2013 media 3.95
Intermediates (callable) :
index 0 # --> False (don't skip)
index 1 # --> True
index 2 # --> False (don't skip)
index 3 # --> True
index 4 # --> True
index 5 # --> True
index 6 # --> False (don't skip)
index 7 # --> True
index 8 # --> True
index 9 # --> True
index 10 # --> False (don't skip)
index 11 # --> True
index 12 # --> True
index 13 # --> True
index 14 # --> False (don't skip)
index 15 # --> True
index 16 # --> True
index 17 # --> True
index 18 # --> False (don't skip)
...