Search code examples
rdata-cleaning

How to clean data where the variable name and property are in the same cell?


I need to clean data where the variable property and answer associated with a location are together in a single cell. The only thing consistent in my dataset is that they are separated by a colon (:). I need to remap the data to the variable property becomes the column header and the data is mapped for each Location.

I've attached an example:

enter image description here

There can also be a bunch of other symbols that are irrelevant. I just need to extract the string before the colon and the string or integer after the colon and it is mapped correctly for each location.

How do I do this on R? What functions should I be using

Example data:

Example1    Sunny:"TRUE"    NearCoast:False Schools:{"13"} 2
Example2    NearCoast:False Schools:{"6"}   Sunny:"FALSE" 3
Example3    Schools:{"2"}   Sunny:"TRUE"    NearCoast:TRUE Transport:5

Also, would it be possible that I could add exceptions to this process. For example, if the cell is simply a number alone, it is ignored. Or, if the property name is a specific thing such as "transport", it ignores the cell too.


Solution

  • Try this example, as mentioned in comments, we can reshape wide-to-long, then string split on :, then again reshape long-to-wide.

    df1 <- read.table(text = '
    Example1    Sunny:"TRUE"    NearCoast:False Schools:{"13"} 2
    Example2    NearCoast:False Schools:{"6"}   Sunny:"FALSE" 3
    Example3    Schools:{"2"}   Sunny:"TRUE"    NearCoast:TRUE Transport:5',
                      header = FALSE, stringsAsFactors = FALSE)
    
    
    library(tidyverse)
    
    gather(df1, key = "k", value = "v", -V1) %>% 
      separate(v, into = c("type", "value"), sep = ":") %>% 
      filter(!is.na(value)) %>% 
      select(-k) %>% 
      spread(key = type, value = value)
    
    #         V1 NearCoast Schools   Sunny Transport
    # 1 Example1     False  {"13"}  "TRUE"      <NA>
    # 2 Example2     False   {"6"} "FALSE"      <NA>
    # 3 Example3      TRUE   {"2"}  "TRUE"         5