Search code examples
pandasrowread-csv

read every 2nd value every 4 rows in pandas


I am trying to read only the values of every 4th row of my excel data sheet (every value corresponding to "media"). I can do this but I want to read every 4th row, starting from row 2. Or technically it would be row 2 and row 6 and row 10 etc. But no matter what I try I can't get what I want. excel datafile

I tried variations that I found from different questions:

cols=[' estacion_id','estacion_nombre','sensor_id','sensor_alias','sensor_nombre','fecha_ observacio n','metrica','valor' ]  
f=pd.read_csv('env_picacho-ma_2013.csv', usecols=['fecha_observacion', 'valor'])

date=pd.read_csv('env_picacho-ma_2013.csv', usecols=['fecha_observacion'],skiprows=lambda x: logic(x)  )
temp=pd.read_csv('env_picacho-ma_2013.csv', usecols=['valor'],skiprows= lambda x:(x % 4))

This gives me every 4th row: valor 0 7.7 1 8.4 2 4.7 3 6.8 4 8.2 ... ... 1820 791.0 1821 771.0 1822 841.0 1823 806.0 1824 795.0

I also tried:

def logic(index=0):

if index % 4 == 3:
    return True

return False

temp=pd.read_csv('env_picacho-ma_2013.csv', usecols=['valor'],skiprows= lambda x: logic(x))

This skips every 3rd row out of 4. So I thought I could just skip every row1 and row3 and row4 in order to keep only row2. But this doesn't work either:

def logic(index=0):

if index % 4 == 3 and index % 4== 2:
    return True

return False

gives me all the values again. I can sort of understand why this wold not work because I am contradicting myslef but I don't know how to do it right:

        valor

0 NaN 1 3.78417 2 0.60000 3 7.70000 4 NaN ... ... 7295 806.00000 7296 20382.00000 7297 NaN 7298 NaN 7299 795.00000

I then found this which selects every top 2 values every 5 rows but I don't know how to modify this for my case (every 2nd value every 4 rows):

df.groupby(df.index//5).head(2)

I tried quite a few variations and I guess my other option would either be a loop but I am terrible at coding loops so if anyone can offer any advice it would be much appreciated. What I want is: valor 0 3.78417 1 4.48482 2 2.38661 ...


Solution

  • I am trying to read only ... (every value corresponding to "media").

    You can use :

    SR, GAP = 2, 3
    
    df = pd.read_csv("env_picacho-ma_2013.csv", # add more params if needed
                     skiprows=lambda i: (i+SR)%(GAP+1) and i!=0)
    

    Output :

    print(df)
    
        estacion_id estacion_no  sensor_id  ... fecha_observacion metrica valor
    0            90  PICACHO_MA        225  ...        01/01/2013   media  3.78
    1            90  PICACHO_MA        225  ...        02/01/2013   media  4.48
    2            90   PICACHO_M        225  ...        03/01/2013   media  2.39
    3            90  PICACHO_M.        225  ...        04/01/2013   media  3.18
    4            90  PICACHO_M.        225  ...        05/01/2013   media  3.86
    5            90  PICACHO_MA        225  ...        06/01/2013   media  2.27
    6            90  PICACHO_MA        225  ...        07/01/2013   media  2.42
    7            90  PICACHO_MA        225  ...        08/01/2013   media  0.89
    8            90  PICACHO_M.        225  ...        09/01/2013   media  2.07
    9            90  PICACHO_M.        225  ...        10/01/2013   media  2.90
    10           90   PICACHO M        225  ...        11/01/2013   media  3.95

    Intermediates (callable) :

    index  0 # --> False (don't skip)
    index  1 # -->  True
    index  2 # --> False (don't skip)
    index  3 # -->  True
    index  4 # -->  True
    index  5 # -->  True
    index  6 # --> False (don't skip)
    index  7 # -->  True
    index  8 # -->  True
    index  9 # -->  True
    index 10 # --> False (don't skip)
    index 11 # -->  True
    index 12 # -->  True
    index 13 # -->  True
    index 14 # --> False (don't skip)
    index 15 # -->  True
    index 16 # -->  True
    index 17 # -->  True
    index 18 # --> False (don't skip)
    ...