
Expand time series into a new data frame in R


I am new to R. I have collected eye-tracking data with the following structure:

Participant Trial Condition Fixation.Start  Fixation.End Fixated.Area
P01         T01   Early     4               206          Outside
P01         T01   Early     258             476          Competitor
P01         T01   Early     496             882          Target
P01         T02   Late      4               794          Outside
P01         T02   Late      838             1026         Target
P01         T02   Late      1046            1328         Target
P02         T01   Early     4               168          Outside
P02         T01   Early     232             452          Competitor
P02         T01   Early     494             738          Target
P02         T02   Late      4               176          Outside
P02         T02   Late      238             466          Target
P02         T02   Late      524             632          Competitor

In it, the fixation time on the different areas shown on screen was measured in milliseconds, from the beginning (Fixation.Start) to the end (Fixation.End) of each fixation. Each row is one fixation.

What I would like to do is reshape the data into 50 ms time bins in a new data frame, so that each time bin (row) shows which area was being fixated at that moment. In other words, I want the new data frame to look like this:

Participant Trial   Condition   Time.Bin    Fixated.Area
P01         T01     Early       50          Outside
P01         T01     Early       100         Outside
P01         T01     Early       150         Outside
P01         T01     Early       200         Outside
P01         T01     Early       250         Competitor
P01         T01     Early       300         Competitor
P01         T01     Early       350         Competitor
P01         T01     Early       400         Competitor
P01         T01     Early       450         Competitor
P01         T01     Early       500         Target
P01         T01     Early       550         Target
P01         T01     Early       600         Target
P01         T01     Early       650         Target  

I think this should be pretty easy to do in R. Any ideas?


Solution

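  • Here's a technique that expands each fixation into 50 ms time bins (by = 50): the first bin is the first multiple of 50 at or after Fixation.Start, and the last is the last multiple of 50 at or before Fixation.End.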
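    To see what the bin arithmetic does, here it is for the first two fixations of P01/T01, typed in by hand. ceiling(start / 50) * 50 rounds the fixation start up to the next multiple of 50, and seq(..., by = 50) steps through every boundary up to the fixation end. Because the second fixation starts at 258 ms, its first bin is 300, not 250.

```r
# Fixation 1 of P01/T01: 4 ms to 206 ms
first_bin_1 <- ceiling(4 / 50) * 50    # 4 / 50 = 0.08, rounded up to 1, so 50
bins_1 <- seq(first_bin_1, 206, by = 50)
print(bins_1)
# [1]  50 100 150 200

# Fixation 2 of P01/T01: 258 ms to 476 ms
first_bin_2 <- ceiling(258 / 50) * 50  # 258 / 50 = 5.16, rounded up to 6, so 300
bins_2 <- seq(first_bin_2, 476, by = 50)
print(bins_2)
# [1] 300 350 400 450
```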

    base R

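    Time.Bins <- Map(
      # seq() errors when the first bin lies past the end (a fixation that
      # crosses no 50 ms boundary); return no bins for such a fixation instead
      function(a, b) if (a <= b) seq(a, b, by = 50) else numeric(0),
      ceiling(dat$Fixation.Start / 50) * 50,  # first multiple of 50 at or after the start
      dat$Fixation.End)
    
    # repeat each fixation's row once per bin it covers, then attach the bins
    out <- cbind(
      dat[, c("Participant", "Trial", "Condition", "Fixated.Area")
          ][ rep(seq_len(nrow(dat)), lengths(Time.Bins)),],
      Time.Bin = unlist(Time.Bins)
    )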
    head(out, 20)
    #     Participant Trial Condition Fixated.Area Time.Bin
    # 1           P01   T01     Early      Outside       50
    # 1.1         P01   T01     Early      Outside      100
    # 1.2         P01   T01     Early      Outside      150
    # 1.3         P01   T01     Early      Outside      200
    # 2           P01   T01     Early   Competitor      300
    # 2.1         P01   T01     Early   Competitor      350
    # 2.2         P01   T01     Early   Competitor      400
    # 2.3         P01   T01     Early   Competitor      450
    # 3           P01   T01     Early       Target      500
    # 3.1         P01   T01     Early       Target      550
    # 3.2         P01   T01     Early       Target      600
    # 3.3         P01   T01     Early       Target      650
    # 3.4         P01   T01     Early       Target      700
    # 3.5         P01   T01     Early       Target      750
    # 3.6         P01   T01     Early       Target      800
    # 3.7         P01   T01     Early       Target      850
    # 4           P01   T02      Late      Outside       50
    # 4.1         P01   T02      Late      Outside      100
    # 4.2         P01   T02      Late      Outside      150
    # 4.3         P01   T02      Late      Outside      200
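The row expansion relies on lengths(Time.Bins): each fixation's row index is repeated once per bin it covers, and the bins are then unlisted in the same order. The sketch below rebuilds the example data with data.frame() (equivalent to the dput() output in the Data section) and prints the per-fixation bin counts; they add up to 64, the row count of the expanded data.

```r
# Example data, equivalent to the dput() output in the Data section
dat <- data.frame(
  Participant = rep(c("P01", "P02"), each = 6),
  Trial = rep(rep(c("T01", "T02"), each = 3), times = 2),
  Condition = rep(rep(c("Early", "Late"), each = 3), times = 2),
  Fixation.Start = c(4, 258, 496, 4, 838, 1046, 4, 232, 494, 4, 238, 524),
  Fixation.End = c(206, 476, 882, 794, 1026, 1328, 168, 452, 738, 176, 466, 632),
  Fixated.Area = c("Outside", "Competitor", "Target", "Outside", "Target", "Target",
                   "Outside", "Competitor", "Target", "Outside", "Target", "Competitor")
)

Time.Bins <- Map(function(a, b) seq(a, b, by = 50),
                 ceiling(dat$Fixation.Start / 50) * 50, dat$Fixation.End)

n_bins <- lengths(Time.Bins)                  # number of bins per fixation
print(n_bins)
# [1]  4  4  8 15  4  6  3  5  5  3  5  2

row_index <- rep(seq_len(nrow(dat)), n_bins)  # row 1 four times, row 2 four times, ...
print(sum(n_bins))
# [1] 64
```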
    

    dplyr

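    library(dplyr)
    out <- dat %>%
      rowwise() %>%
      # one output row per 50 ms bin; since dplyr 1.1.0, returning several rows
      # per group from summarize() is deprecated in favour of reframe()
      summarize(
        Participant, Trial, Condition, Fixated.Area,
        Time.Bin = seq(ceiling(Fixation.Start / 50) * 50, Fixation.End, by = 50),
        .groups = "drop"
      )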
    out
    # # A tibble: 64 x 5
    #    Participant Trial Condition Fixated.Area Time.Bin
    #    <chr>       <chr> <chr>     <chr>           <dbl>
    #  1 P01         T01   Early     Outside            50
    #  2 P01         T01   Early     Outside           100
    #  3 P01         T01   Early     Outside           150
    #  4 P01         T01   Early     Outside           200
    #  5 P01         T01   Early     Competitor        300
    #  6 P01         T01   Early     Competitor        350
    #  7 P01         T01   Early     Competitor        400
    #  8 P01         T01   Early     Competitor        450
    #  9 P01         T01   Early     Target            500
    # 10 P01         T01   Early     Target            550
    # # ... with 54 more rows
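One edge case to watch for: the dplyr version above calls seq(ceiling(Fixation.Start / 50) * 50, Fixation.End, by = 50) for every fixation. If a fixation is short and crosses no 50 ms boundary, the first bin lies past the end and seq() stops with an error. The sketch below uses a hypothetical fixation from 510 to 540 ms (not in the data) to show this, and a hypothetical helper, bins_for(), that returns no bins in that case. It can be passed to Map() as Map(bins_for, dat$Fixation.Start, dat$Fixation.End).

```r
# Hypothetical helper: 50 ms bin boundaries inside [start, end],
# or an empty vector when the fixation crosses no boundary
bins_for <- function(start, end, width = 50) {
  first <- ceiling(start / width) * width
  if (first > end) return(numeric(0))
  seq(first, end, by = width)
}

# Hypothetical short fixation, 510 to 540 ms: the first bin (550) is past the end
res <- tryCatch(seq(ceiling(510 / 50) * 50, 540, by = 50),
                error = function(e) conditionMessage(e))
print(res)                # the error message from seq()

print(bins_for(510, 540))
# numeric(0)
print(bins_for(4, 206))
# [1]  50 100 150 200
```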
    

    Fixing time=250

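    Your expected output shows "Competitor" at time=250, but the data do not support that: in P01/T01 the Outside fixation ends at 206 ms and the Competitor fixation starts at 258 ms, so nothing is being fixated at 250 ms. If you still need a row for 250 (with or without an area), you can fill in the missing bins this way.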

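    # For each Participant/Trial/Condition, build every 50 ms step between the
    # first and last bin; merging then gives NA for steps no fixation covers
    expbins <- do.call(rbind, by(out, out[,c("Participant", "Trial", "Condition")],
       FUN = function(z) {
         rng <- seq(min(z$Time.Bin), max(z$Time.Bin), by = 50)
         # one row per step: keep the grouping columns, drop the area
         transform(z[rep(1, length(rng)),], Fixated.Area = NULL, Time.Bin = rng)
       }))
    out2 <- merge(expbins, out, by = c("Participant", "Trial", "Condition", "Time.Bin"), all = TRUE)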
    head(out2, 10)
    #    Participant Trial Condition Time.Bin Fixated.Area
    # 1          P01   T01     Early       50      Outside
    # 2          P01   T01     Early      100      Outside
    # 3          P01   T01     Early      150      Outside
    # 4          P01   T01     Early      200      Outside
    # 5          P01   T01     Early      250         <NA>
    # 6          P01   T01     Early      300   Competitor
    # 7          P01   T01     Early      350   Competitor
    # 8          P01   T01     Early      400   Competitor
    # 9          P01   T01     Early      450   Competitor
    # 10         P01   T01     Early      500       Target
    

    This shows time=250 as NA, an unknown state, which in my opinion is better than assigning that bin to a neighboring fixation.

    The same with dplyr:

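    out %>%
      group_by(Participant, Trial, Condition) %>%
      # every 50 ms step from the first to the last bin of each trial
      # (several rows per group: use reframe() on dplyr >= 1.1.0)
      summarize(
        Time.Bin = seq(min(Time.Bin), max(Time.Bin), by = 50),
        .groups = "drop"
      ) %>%
      full_join(out, by = c("Participant", "Trial", "Condition", "Time.Bin"))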
    # # A tibble: 69 x 5
    #    Participant Trial Condition Time.Bin Fixated.Area
    #    <chr>       <chr> <chr>        <dbl> <chr>       
    #  1 P01         T01   Early           50 Outside     
    #  2 P01         T01   Early          100 Outside     
    #  3 P01         T01   Early          150 Outside     
    #  4 P01         T01   Early          200 Outside     
    #  5 P01         T01   Early          250 <NA>        
    #  6 P01         T01   Early          300 Competitor  
    #  7 P01         T01   Early          350 Competitor  
    #  8 P01         T01   Early          400 Competitor  
    #  9 P01         T01   Early          450 Competitor  
    # 10 P01         T01   Early          500 Target      
    # # ... with 59 more rows
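To check which bins end up as NA, you can list, per trial, the 50 ms steps between the first and last bin that no fixation covers. This is a base R sketch (the data are rebuilt inline so it runs on its own); the five empty bins it finds account for the difference between the 69 rows of the filled result and the 64 rows of the expanded data.

```r
dat <- data.frame(
  Participant = rep(c("P01", "P02"), each = 6),
  Trial = rep(rep(c("T01", "T02"), each = 3), times = 2),
  Condition = rep(rep(c("Early", "Late"), each = 3), times = 2),
  Fixation.Start = c(4, 258, 496, 4, 838, 1046, 4, 232, 494, 4, 238, 524),
  Fixation.End = c(206, 476, 882, 794, 1026, 1328, 168, 452, 738, 176, 466, 632),
  Fixated.Area = c("Outside", "Competitor", "Target", "Outside", "Target", "Target",
                   "Outside", "Competitor", "Target", "Outside", "Target", "Competitor")
)

# Expand each fixation into its 50 ms bins, as in the base R version
Time.Bins <- Map(function(a, b) seq(a, b, by = 50),
                 ceiling(dat$Fixation.Start / 50) * 50, dat$Fixation.End)
out <- cbind(dat[rep(seq_len(nrow(dat)), lengths(Time.Bins)),
                 c("Participant", "Trial", "Condition", "Fixated.Area")],
             Time.Bin = unlist(Time.Bins))

# Per trial: steps from the first to the last bin that no fixation covers
gaps <- lapply(split(out$Time.Bin, paste(out$Participant, out$Trial)),
               function(tb) setdiff(seq(min(tb), max(tb), by = 50), tb))
print(gaps)
# P01 T01: 250; P01 T02: 800; P02 T01: 200; P02 T02: 200 and 500
```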
    

    Data:

    dat <- structure(list(Participant = c("P01", "P01", "P01", "P01", "P01", "P01", "P02", "P02", "P02", "P02", "P02", "P02"), Trial = c("T01", "T01", "T01", "T02", "T02", "T02", "T01", "T01", "T01", "T02", "T02", "T02"), Condition = c("Early", "Early", "Early", "Late", "Late", "Late", "Early", "Early", "Early", "Late", "Late", "Late"), Fixation.Start = c(4L, 258L, 496L, 4L, 838L, 1046L, 4L, 232L, 494L, 4L, 238L, 524L), Fixation.End = c(206L, 476L, 882L, 794L, 1026L, 1328L, 168L, 452L, 738L, 176L, 466L, 632L), Fixated.Area = c("Outside", "Competitor", "Target", "Outside", "Target", "Target", "Outside", "Competitor", "Target", "Outside", "Target", "Competitor")), class = "data.frame", row.names = c(NA, -12L))