Search code examples
rreshapereshape2meltdata-wrangling

Why does melt return NA column in R?


I have the following list df in R:

structure(list(disease = structure(c(1L, 1L), .Label = "Barcelona", class = "factor"), 
    `<18` = structure(list(0.193103448275862, 
        0.0445344129554656), .Names = c(NA_character_, NA_character_
    )), `19-25` = structure(list(0.0413793103448276, 
        0.345748987854251), .Names = c(NA_character_, NA_character_
    )), `26-64` = structure(list(0.448275862068966, 0.167611336032389), .Names = c(NA_character_, 
    NA_character_)), `46-64` = structure(list(0.0344827586206897, 
        0.00647773279352227), .Names = c(NA_character_, NA_character_
    )), `>65` = structure(list(0.282758620689655, 
        0.435627530364373), .Names = c(NA_character_, NA_character_
    )), type = structure(1:2, .Label = c("Clinical Trial", "Real-World"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

I want to rearrange the dataframe, so that I can get each value by city, flat and agegroup using melt. However, I get an extra column as an output:

melt(df)
           city           type           variable      value          NA
1  Barcelona       flat                  <18           0.19310345 0.044534413
2  Barcelona       house                 <18           0.19310345 0.044534413
3  Barcelona       flat                  19 - 25       0.04137931 0.345748988
4  Barcelona       house                 19 - 25       0.04137931 0.345748988
5  Barcelona       flat                  26 - 45       0.44827586 0.167611336
6  Barcelona       house                 26 - 45       0.44827586 0.167611336
7  Barcelona       flat                  46 - 64       0.03448276 0.006477733
8  Barcelona       house                 46 - 64       0.03448276 0.006477733
9  Barcelona       flat                  > 65          0.28275862 0.435627530
10 Barcelona       house                 > 65          0.28275862 0.435627530

Is there any way of not having the NA column and getting unique values in the value column?


Solution

  • The issue is that your measure columns are list class, not numeric class. If we convert them to numeric, melt will work fine. (I show one way to do it, but it might be better to go earlier in your workstream and prevent the columns from being created as lists in the first place... this is definitely what you should do if my code that works on your sample data runs into issues on the larger data. tidyr::unnest may be able to help in that case.)

    sapply(df, class)
    #  disease      <18    19-25    26-64    46-64      >65     type 
    # "factor"   "list"   "list"   "list"   "list"   "list" "factor" 
    
    list_cols = sapply(df, is.list)
    
    df[list_cols] = lapply(df[list_cols], unlist)
    
    reshape2::melt(df, id.vars = c("disease", "type"))
    #      disease           type variable       value
    # 1  Barcelona Clinical Trial      <18 0.193103448
    # 2  Barcelona     Real-World      <18 0.044534413
    # 3  Barcelona Clinical Trial    19-25 0.041379310
    # 4  Barcelona     Real-World    19-25 0.345748988
    # 5  Barcelona Clinical Trial    26-64 0.448275862
    # 6  Barcelona     Real-World    26-64 0.167611336
    # 7  Barcelona Clinical Trial    46-64 0.034482759
    # 8  Barcelona     Real-World    46-64 0.006477733
    # 9  Barcelona Clinical Trial      >65 0.282758621
    # 10 Barcelona     Real-World      >65 0.435627530