r json stringr

Replacing commas and colons within ' '


I have a dataset with key-value pairs that I want to import into R. The keys and values are separated by colons, while the key-value pairs are separated by commas. However, some of the values contain commas or colons, which can cause confusion when importing the data into R. To avoid this issue, I need to replace the commas and colons in the values with a different character before importing the data. For example:

{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}

to

{'AI': 'C3.ai| Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group| Inc.', 'MA': 'Mastercard |Incorporated'}

I have tried this:

replacer<- function(x) {
  str_replace_all(x, "[,:]", "|")
}

clean_lines <- str_replace_all(lines, "(?<=')[^']*[:.][[:space:]]*[^']*[[:space:]]*[^']*(?=')", replacer)
cat(clean_lines)

This works fine for commas but mishandles the colons. Here is the result:

{'AI': 'C3.ai| Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group| Inc.','MA': 'Mastercard :Incor| porated'}

How can I edit this code so that it replaces only the commas and colons within ' '?


Solution

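  • The data is nearly JSON, so parse it as JSON instead of rewriting it with regular expressions. To make it valid JSON, replace the single quotes (') with double quotes ("), then read it with the jsonlite package. This assumes that no key or value contains an apostrophe or a double quote of its own: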

    library(jsonlite)
    
    # example file
    writeLines("{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}", 
               "tmp.txt")
    
    # read from file
    x <- readLines("tmp.txt")
    
    x <- gsub("'", "\"", x, fixed = TRUE)
    
    fromJSON(x)
    # $AI
    # [1] "C3.ai, Inc."
    # 
    # $BA
    # [1] "Boeing Company (The)"
    # 
    # $AAL
    # [1] "American Airlines Group, Inc."
    # 
    # $MA
    # [1] "Mastercard :Incorporated"
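
If the replaced text itself is still needed, for example because another tool expects it, the substitution can be limited to the quoted chunks. The following is a minimal base R sketch, not the method from the answer above. It assumes that `lines` holds the raw text as in the question and that no quoted chunk contains an apostrophe. It first locates every single-quoted chunk, then replaces commas and colons only inside those chunks, so the separators between keys and values are left alone.

```r
# Example input from the question (the name `lines` follows the question's code)
lines <- "{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}"

# Locate every single-quoted chunk, keys and values alike
quoted <- gregexpr("'[^']*'", lines)

# Work on a copy so the original text stays available
clean_lines <- lines

# Replace commas and colons only inside the quoted chunks
regmatches(clean_lines, quoted) <- lapply(
  regmatches(clean_lines, quoted),
  function(chunk) gsub("[,:]", "|", chunk)
)

cat(clean_lines, "\n")
# {'AI': 'C3.ai| Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group| Inc.', 'MA': 'Mastercard |Incorporated'}
```

Because each match consumes both its opening and closing quote, the text between two chunks (such as `: ` or `, `) is never part of a match. That is why the structural colons and commas survive.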