Search code examples
csvsortingcmdskip

how to copy consecutive and skip n- lines in csv?


I have csv files per year, f.i. hourdata-2019.csv, it looks like this:

date,hour,temp
20181231,24,75
20190101,1,76
20190101,2,76
20190101,3,75
20190101,4,76
20190101,5,74
......etc

what i would like to do is create a new file that has blocks of 2 consecutive lines and then skip some lines. F.I. I would like to know the temp difference for one hour per every 9 hours throughout the year, so if I skip 7 lines the new file should look something like:

20190101,1,76
20190101,2,76
20190101,10,57
20190101,11,60
20190101,19,61
20190101,20,56
.........etc

I was first hoping that i could use OpenOffice or LibreOffice calc module, but could only find solutions like Copy every nth line from one sheet to another and some macro for OO that i was unable to adopt to my needs. then I thought it might be more something for a cmd script, but i could not find a suitable example for that either. what I was able to do is make a list of the lines i want in calc,and with help of: =INDIRECT(ADDRESS($L$1,K3,1,,"Sheet1")) where l1=1 and k3=3 it should result in sheet1.C1, and that solved it for me


Solution

  • Nice challenge. Here is a pure batch solution:

    @echo off
    setlocal enabledelayedexpansion
    
    REM following code to produce some data for testing:
    (
    echo date,hour,temp
    echo 20181231,24,99
    for /l %%a in (1,1,9) do @for /l %%b in (1,1,24) do @echo 2019010%%a,%%b,!random:~-2!
    for /l %%a in (1,1,9) do @for /l %%b in (1,1,24) do @echo 2019011%%a,%%b,!random:~-2!
    for /l %%a in (1,1,9) do @for /l %%b in (1,1,24) do @echo 2019012%%a,%%b,!random:~-2!
    )>hourdata-test.csv
    
    
    REM code to extract desired values
    REM expected hour-pairs: 1,2 - 10,11 - 19,20 - 4,5 - 13,14 - 22,23 - 7,8 - 16,17 : repeat
    
    (for /f "tokens=1,* delims=:" %%a in ('findstr /n "^" hourdata-test.csv') do (
      set /a "x=%%a %% 9"
      if !x! == 3 echo %%b
      if !x! == 4 echo %%b
    ))>ninerdata.csv
    

    The trick is to use the line numbers, calculate Modulo 9 and then simply compare the resulting value. Skipping the first two lines is achieved by printing the modulo numbers 3 and 4.

    A full year of data should take less than 2 seconds.