
Loading text file into R to analyze chat log


So, I have been trying to load a text file (each line is a chat log) into R, turn it into a data frame, and tidy the data further.

I am using readLines() so that each log is a single line. Because readLines() reads them as a single long char, I then convert them to strings (I need to parse the log), as below:

rawchat <- readLines("disc-W-App-avec-loy.txt")
rawchat <- c(lapply(rawchat, toString))
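For reference, readLines() already returns a character vector with one element per line of the file, so the toString() step adds nothing, and lapply() always hands back a list. A minimal sketch, assuming a temporary file with made-up chat lines in place of disc-W-App-avec-loy.txt (the sample lines and the names sample_lines, path and as_list are hypothetical):

```r
# Hypothetical sample lines standing in for the real chat log
sample_lines <- c("16/03/2019 10:01 - Alice: hello",
                  "16/03/2019 10:02 - Bob: hi",
                  "17/03/2019 09:15 - Alice: bye")
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

rawchat <- readLines(path)
class(rawchat)   # "character"
length(rawchat)  # 3, one element per line

# lapply() always returns a list, so this turns the vector into a list
as_list <- lapply(rawchat, toString)
class(as_list)   # "list"
```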

My problem comes when I want to turn this list into a data frame:

rawchat <- as.data.frame(rawchat)

It turns the list into a data frame of 1 observation of 42,000 variables. The intention was to turn it into 42,000 observations of one variable.
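This is what as.data.frame() does with a list: each list element becomes its own column. Given a plain character vector, it produces one row per element instead. A small sketch with three hypothetical lines (chat_lines, wide, long and long2 are illustrative names only):

```r
chat_lines <- c("16 first message", "17 second message", "16 third message")

# A list of three strings: each element becomes a separate column
wide <- as.data.frame(lapply(chat_lines, toString))
dim(wide)   # 1 3

# The character vector itself: one row per line
long <- as.data.frame(chat_lines, stringsAsFactors = FALSE)
dim(long)   # 3 1

# If you already have the list, unlist() flattens it back to a vector first
long2 <- data.frame(chat = unlist(lapply(chat_lines, toString)),
                    stringsAsFactors = FALSE)
dim(long2)  # 3 1
```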

Any help please?

By the way, I am pretty new to tidying raw data in R.


So, I ran into another roadblock:

I loaded the text file as a data frame, as below:

rawchat <- readLines("disc-W-App-avec-loy.txt")
rawchat <- as.data.frame(rawchat, stringsAsFactors=FALSE)
names(rawchat) <- "chat"
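As an aside, the column can be named in the same call that builds the data frame, which avoids the separate names() step. A minimal sketch, assuming a temporary file with hypothetical sample lines in place of the real log:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("16 hello", "17 bye"), path)  # hypothetical sample lines

# One row per line, column named "chat", kept as character (not factor)
rawchat <- data.frame(chat = readLines(path), stringsAsFactors = FALSE)
str(rawchat)
```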

I am now trying to identify every row (out of 42,000) that starts with the number 16. I can't seem to correctly apply startsWith(), dplyr's starts_with(), or even grepl() with a regular expression.
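For what it's worth, dplyr's starts_with() is a tidyselect helper that matches column names, so it cannot filter rows by their content. Base startsWith() and grepl() both work directly on the chat column. A sketch using a small hypothetical data frame in place of the real rawchat (is16, is16_re and rows16 are illustrative names):

```r
# Hypothetical stand-in for the 42,000-row rawchat data frame
rawchat <- data.frame(chat = c("16/03 Alice: hi", "17/03 Bob: hey", "16/03 Bob: bye"),
                      stringsAsFactors = FALSE)

# Option 1: fixed-prefix test (needs a character column)
is16 <- startsWith(rawchat$chat, "16")

# Option 2: regular expression anchored at the start of the string
is16_re <- grepl("^16", rawchat$chat)

# Keep only the matching rows; drop = FALSE keeps the result a data frame
rows16 <- rawchat[is16, , drop = FALSE]
nrow(rows16)  # 2
```

With dplyr loaded, the same filter can be written as filter(rawchat, startsWith(chat, "16")), and which(is16) gives the matching row numbers.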

Could it be the type of the data frame's column (chr)?
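A chr column is exactly what startsWith() expects, so the type alone is unlikely to be the culprit; a factor column (the stringsAsFactors default before R 4.0.0) would make startsWith() stop with an error, and class() or str() shows which one you have. Invisible leading characters are another common cause, for example spaces or a UTF-8 byte-order mark at the start of the file. A sketch of both checks, on hypothetical sample data:

```r
# A factor column: startsWith() refuses it, as.character() fixes that
x <- factor(c("16 hi", "17 bye"))
class(x)                           # "factor"
# startsWith(x, "16") would stop with "non-character object(s)"
startsWith(as.character(x), "16")  # TRUE FALSE

# Leading spaces or a byte-order mark hide the "16" from an anchored test
y <- c("  16 hi", "\ufeff16 first line")
startsWith(y, "16")                # FALSE FALSE

# Strip a leading byte-order mark, then surrounding whitespace
clean <- trimws(sub("^\ufeff", "", y))
startsWith(clean, "16")            # TRUE TRUE
```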


Solution

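  • The problem is your rawchat <- c(lapply(rawchat, toString)) line. readLines() already returns a character vector with one element per line, but lapply() wraps it in a list, and as.data.frame() turns every list element into a separate column. Just use: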

    rawchat <- readLines("disc-W-App-avec-loy.txt")
    rawchat <- as.data.frame(rawchat, stringsAsFactors=FALSE)