r, readr

Skipping to a certain position in a large .txt file


I have over 100 .txt files on which I would like to do calculations. The files contain gaze data collected with an eye tracker.

The first part of the data is the calibration part. It contains only a limited number of variables. Every line looks like this (about 20,000 rows):

Event: Data - startTime 1563518990 endTime 1563619015 Gaze 885.638118989 316.57751978

The second part of the data contains the actual gaze data collected during a gaze test. It contains more variables, and these are the ones I'm interested in. It looks like this:

Gaze Data - IviewTimestamp 649261961 OpenSesameTimeStamp 55191.0 GazeLeft 0.0 0.0 GazeRight 0.0 0.0 DistanceRight 530.630058679 DiameterLeft 4.89342033646 DiamaterRight 4.44607910548

However, when I use the function read_table2, it only finds the variables gathered during the calibration process. This is because R only looks at the first 1000 rows of the .txt file to determine the variables. I would like it to skip to the first line that contains "IviewTimestamp", so that it imports only the relevant part of the .txt file and automatically finds the right variables. Since the calibration length differs between subjects, it's not possible to skip a fixed number of lines.

How would I do this?


Solution

  • I'd suggest that you import the data and tidy it afterwards, rather than reading it twice.

    First import all the files in your directory with:

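    library(readr)   # provides read_table2
    library(dplyr)
    library(purrr)
    # `path` is the directory holding the .txt files; `pattern` is a regular expression, not a glob
    files <- list.files(path = path, pattern = '\\.txt$', full.names = TRUE)
    df <- map_df(files, read_table2)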
    

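    It's worth noting here that you can pass optional arguments to read_table2 by adding them after it in the map_df call, for example col_names = FALSE so that the first line of each file is not used as a header.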

    Once all of your text files have been imported they can be filtered:

    # timeStampColumnName is a placeholder for the column that holds the "IviewTimestamp" label
    filter(df, timeStampColumnName == "IviewTimestamp")
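
    If you would rather skip the calibration part while reading, as the question asks, one possible approach is to find the first line that contains the marker and pass that position as the number of lines to skip. The following is only a minimal sketch in base R, assuming the fields are separated by whitespace and that every line after the first "IviewTimestamp" line has the same layout. The function name read_gaze_part and its marker argument are hypothetical, not part of any package.

```r
# Hypothetical helper: read only the gaze-test part of one eye-tracker file.
# Assumes whitespace-separated fields and that all lines from the first
# marker line onwards share the same layout.
read_gaze_part <- function(file, marker = "IviewTimestamp") {
  lines <- readLines(file)

  # Index of the first line that contains the marker text (NA if none)
  first <- grep(marker, lines, fixed = TRUE)[1]
  if (is.na(first)) {
    stop("marker '", marker, "' not found in ", file)
  }

  # Skip every line before the marker line; columns are named V1, V2, ...
  read.table(file, skip = first - 1, header = FALSE,
             stringsAsFactors = FALSE)
}
```

    Because the position is looked up per file, this works even though the calibration length differs between subjects. The same first - 1 value could instead be passed to the skip argument of read_table2, and the helper could be applied to every file with lapply or map_df.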