Search code examples
rdelimiterdataformat

data format and uploading issue


Here is a snippet of data I have to analyze:

2014 log[1]: id="1" sys="Sec" sub="fill" severity="inf" dort="30" 2014 log[2]: id="2" sys="Sec" sub="fill" severity="inf" name="dropped"

Excuse me if this is an obvious question, but what kind of data is this, and how can I get it into a csv format to analyze? So far I have tried using R by specifying whitespace as the delimiter, but that didn't get me what I need:

table1<-read.table('data.txt', sep="", fill=TRUE, header = FALSE)

Ideally, "id", "sys", "sub", "severity", "dort", and "name"would be the column headers. Each row contains about the same number of variables, but some rows do not contain variables that other rows do. For those instances, I would like there to be "NULL". Ex: In row two in the data above, under the dort column I would like it to say "NULL"


Solution

  • Hope this helps!

    id="1" sys="Sec" sub="fill" severity="inf" dort="30" 
    id="2" sys="Sec" sub="fill" severity="inf" name="dropped"
    

    Let's consider that test.txt has above mentioned data then

    library(rjson)
    library(plyr)
    
    #prepare json data
    txt_file <- readLines("test.txt")
    json_file <- lapply(txt_file, function(x) paste0('{"', gsub(',', ',"',gsub("=",'":',gsub('"\\s+', '",', gsub("^\\s+|\\s+$","",x)))), '}'))
    json_file <- paste0("[", paste(json_file, collapse=","), "]")
    
    #read json data
    json_data <- fromJSON(json_file)
    
    #convert it to dataframe
    df <- rbind.fill(lapply(json_data,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
    df[df=='NULL'] <- NA
    df
    

    Output is:

      id sys  sub severity dort    name
    1  1 Sec fill      inf   30      NA
    2  2 Sec fill      inf   NA dropped