I have a txt file which I parse in R to get some statistical information out of it. It looks like this:
**New Session**
Event A
Event B
Event B
Event C
Event A
Event C
...
**New Session**
...
**New Session**
...
What I need to do is track, for certain events, the session in which they happen. I would like to get a table like this:
Event A | Session 1
Event A | Session 1
Event A | Session 2
Event A | Session 3
I have no trouble with the parsing itself, but I have no idea how to connect each event to the session it happened in. There are no timestamps I could use.
One approach might be to split the file into individual text files, each containing one session. But I bet there is a way to count up the sessions while parsing for a certain event?
If I had to split it up: how do I make R parse all the files in turn for a certain string?
It is not uncommon for data of different kinds to be mixed in one column of a data file. As long as the different kinds of data can be identified in some way, e.g., by a regular expression, the contents of the rows can be moved to different columns. Here, the packages data.table and zoo are used:
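library(data.table)
# dt is a one-column data.table (column V1), created by the fread() call shown further down
# number the header rows consecutively: "Session 1", "Session 2", ...
dt[V1 == "**New Session**", session := paste("Session", seq_len(.N))]
# carry each session label forward to the event rows below it
dt[, session := zoo::na.locf(session, na.rm = FALSE)]
# drop the header rows and sort by event and session
dt[V1 != "**New Session**", .(event = V1, session)][order(event, session)]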
#      event   session
# 1: Event A Session 1
# 2: Event A Session 1
# 3: Event A Session 2
# 4: Event A Session 2
# 5: Event A Session 3
# 6: Event B Session 1
# 7: Event B Session 1
# ...
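Rows equal to **New Session** mark the start of a session. For these rows, a new column session is filled with a string indicating the session number. Sessions are numbered consecutively as they appear in the source file. No date is needed.
The rows where the session column is empty (NA) are filled with the session number from above (locf means last observation carried forward).
The sample data are read into dt as follows:
dt <- fread("**New Session**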
Event A
Event B
Event B
Event C
Event A
Event C
**New Session**
Event A
Event B
Event B
Event C
Event A
Event B
**New Session**
Event A
Event B
Event D
Event D
Event B
Event C
", header = FALSE, sep = "\n")
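The idea of counting up the sessions while going through the lines can also be sketched in base R without any packages. This is not part of the approach above, only a minimal alternative: a running count of the header lines (cumsum) gives each line its session number. The vector lines below is a shortened, made-up sample; for a real file it could come from readLines() on a file name of your choice (an assumption, since the actual file is not shown).

```r
# Shortened sample input; with a real file this could be
# lines <- readLines("sessions.txt")   # hypothetical file name
lines <- c("**New Session**", "Event A", "Event B", "Event A",
           "**New Session**", "Event A", "Event C",
           "**New Session**", "Event B", "Event A")

# TRUE for every session header line
is_header <- lines == "**New Session**"

# Running count of headers seen so far = session number of each line
session_id <- cumsum(is_header)

# Drop the header lines and keep each event with its session label
events <- data.frame(
  event   = lines[!is_header],
  session = paste("Session", session_id[!is_header]),
  stringsAsFactors = FALSE
)

# Keep only the event of interest
event_a <- events[events$event == "Event A", ]
print(event_a, row.names = FALSE)
```

With this sample, event_a has four rows: Event A twice in Session 1 and once each in Session 2 and Session 3. Because the session number is derived from the position of the header lines alone, the file does not have to be split into one file per session.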