dcast not retaining variable type as character, error in vapply when a variable is NA


In attempting to reshape data using reshape2::dcast, I am encountering an error involving NA entries. Sample data are included below.

The data are reshaped from long to wide, but on occasion some parameters have all NA entries, which appears to be causing the issue. Or at least, I think it is. If I remove any parameter like that, Ammonia in this example, the error goes away.

Debugging dcast, I traced the error to this call:

ordered <- vaggregate(.value = value, .group = overall, 
  .fun = fun.aggregate, ..., .default = fill, .n = n)

which results in the error:

Error in vapply(indices, fun, .default) : 
values must be type 'character',
but FUN(X[[1]]) result is type 'integer'

Since the all-NA parameter comes first, I thought the aggregate function might be defaulting to integer even though the entire column is character, but moving those rows did not solve it. The only fix I have found is na.omit, which removes that parameter completely. My expected output would retain any parameter whose entries are all NA, if possible. A second reason to keep them: if a day/depth was not sampled, it should still appear, with its entries set to ns (not sampled). Is there a way to solve this error without removing every all-NA parameter before reshaping?

Reproducible example (data are below dcast code):

library(reshape2)
dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif", fill="ns")

dcast(na.omit(df), station + date + depth ~ parmcode,
 value.var = "value_qualif", fill="ns") # solves error, but removes parameter completely

Example data:

df <- structure(list(station = c("A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), date = c("7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", 
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/1/2018", 
"7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", 
"7/1/2018", "7/1/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", 
"7/9/2018", "7/9/2018"), depth = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 1L, 1L, 1L, 
12L, 12L, 12L, 18L, 18L, 18L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 18L, 
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L), parmcode = c("CDOM", 
"DENSITY", "DO", "ENTERO", "PH", "TOTAL", "XMS", "TEMP", "SAL", 
"FECAL", "TOTAL", "FECAL", "ENTERO", "CDOM", "XMS", "TEMP", "SAL", 
"PH", "DO", "DENSITY", "DO", "DENSITY", "TOTAL", "FECAL", "PH", 
"CDOM", "XMS", "TEMP", "SAL", "ENTERO", "AMMONIA AS N", "AMMONIA AS N", 
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", 
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "TOTAL", "XMS", 
"TEMP", "SAL", "PH", "DO", "DENSITY", "CDOM", "FECAL", "ENTERO", 
"CDOM", "FECAL", "ENTERO", "PH", "DO", "TEMP", "XMS", "TOTAL", 
"DENSITY", "SAL", "TOTAL", "FECAL", "ENTERO", "XMS", "TEMP", 
"SAL", "PH", "DO", "DENSITY", "CDOM"), value_qualif = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, "<2", "<2", "<2", "1.3", "69.67", 
"16.6", "33.7", "8.1", "7.6", "24.622", "5.5", "25.279", "<2", 
"<2", "7.8", "1.38", "72.96", "13.2", "33.61", "<2", NA, NA, 
NA, NA, NA, NA, NA, NA, NA, "<2", "77.82", "20.8", "33.72", "8.2", 
"8.8", "23.58", "1.01", "<2", "<2", "1.78", "<2", "<2", "8", 
"6.5", "13.5", "67.19", "2e", "25.197", "33.58", "2e", "2e", 
"<2", "75.53", "12.9", "33.61", "7.9", "5.5", "25.34", "1.77"
)), class = "data.frame", row.names = c(NA, -69L))

Some tangentially related questions that don't answer mine are "POSIXct values become numeric in reshape2 dcast" and "Error with custom aggregate function for a cast() call in R reshape2".

With na.omit, my output is:

  station     date depth CDOM DENSITY  DO ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
1       A 7/2/2018    12  1.3  24.622 7.6     <2    <2 8.1  33.7 16.6    <2 69.67
2       A 7/2/2018    18 1.38  25.279 5.5     <2    <2 7.8 33.61 13.2    <2 72.96
3       A 7/9/2018     1 1.01   23.58 8.8     <2    <2 8.2 33.72 20.8    <2 77.82
4       A 7/9/2018    12 1.78  25.197 6.5     <2    <2   8 33.58 13.5    2e 67.19
5       A 7/9/2018    18 1.77   25.34 5.5     <2    2e 7.9 33.61 12.9    2e 75.53

Expected output without using na.omit is:

  station     date depth AMMONIA CDOM DENSITY  DO ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
1       A 7/2/2018     1      ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
2       A 7/2/2018    12      ns  1.3  24.622 7.6     <2    <2 8.1  33.7 16.6    <2 69.67
3       A 7/2/2018    18      ns 1.38  25.279 5.5     <2    <2 7.8 33.61 13.2    <2 72.96
4       A 7/9/2018     1      ns 1.01   23.58 8.8     <2    <2 8.2 33.72 20.8    <2 77.82
5       A 7/9/2018    12      ns 1.78  25.197 6.5     <2    <2   8 33.58 13.5    2e 67.19
6       A 7/9/2018    18      ns 1.77   25.34 5.5     <2    2e 7.9 33.61 12.9    2e 75.53

Solution

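  • The actual issue is that every parameter has exactly one value for each (station, date, depth) triple except AMMONIA AS N, which has three NA entries per triple. Because of those duplicates, dcast has to aggregate; with no fun.aggregate supplied it falls back to length, which returns an integer, while fill = "ns" makes the expected result type character. That type mismatch is what vapply reports.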

    For instance,

    dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif")
    # Aggregation function missing: defaulting to length
    #   station     date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
    # 1       A 7/1/2018     1            3    0       0  0      0     0  0   0    0     0   0
    # 2       A 7/1/2018    12            3    0       0  0      0     0  0   0    0     0   0
    # 3       A 7/1/2018    18            3    0       0  0      0     0  0   0    0     0   0
    # 4       A 7/2/2018     1            0    1       1  1      1     1  1   1    1     1   1
    # 5       A 7/2/2018    12            0    1       1  1      1     1  1   1    1     1   1
    # 6       A 7/2/2018    18            0    1       1  1      1     1  1   1    1     1   1
    # 7       A 7/9/2018     1            0    1       1  1      1     1  1   1    1     1   1
    # 8       A 7/9/2018    12            0    1       1  1      1     1  1   1    1     1   1
    # 9       A 7/9/2018    18            0    1       1  1      1     1  1   1    1     1   1
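
The counts above are integers, which is where the type clash with fill = "ns" comes from. A minimal base-R sketch of the same mechanism follows, using a small made-up data frame (toy is hypothetical, not the question's df) and calling vapply directly rather than going through dcast:

```r
# Hypothetical toy data: the key (A, 1, NH3) appears three times,
# mimicking the repeated AMMONIA AS N rows in the question
toy <- data.frame(
  station  = c("A", "A", "A", "A"),
  depth    = c(1L, 1L, 1L, 12L),
  parmcode = c("NH3", "NH3", "NH3", "PH"),
  value    = c(NA, NA, NA, "8.1"),
  stringsAsFactors = FALSE
)

# Count rows whose key combination has already been seen;
# anything above zero means the reshape needs an aggregation function
keys  <- toy[c("station", "depth", "parmcode")]
n_dup <- sum(duplicated(keys))

# vapply() requires each result to match the type of its template.
# A character template ("ns") with an integer-returning function fails.
groups <- split(toy$value, toy$parmcode)
err <- tryCatch(vapply(groups, length, "ns"),
                error = function(e) conditionMessage(e))

# With an integer template the same call succeeds
counts <- vapply(groups, length, 0L)
```

Here err holds the same kind of "values must be type 'character'" message as in the question, and counts holds the per-group lengths that dcast would otherwise have produced.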
    

    Once we remove the duplicate rows, everything works smoothly:

    dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif")
    #   station     date depth AMMONIA AS N CDOM DENSITY   DO ENTERO FECAL   PH   SAL TEMP TOTAL   XMS
    # 1       A 7/1/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 2       A 7/1/2018    12         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 3       A 7/1/2018    18         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 4       A 7/2/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 5       A 7/2/2018    12         <NA>  1.3  24.622  7.6     <2    <2  8.1  33.7 16.6    <2 69.67
    # 6       A 7/2/2018    18         <NA> 1.38  25.279  5.5     <2    <2  7.8 33.61 13.2    <2 72.96
    # 7       A 7/9/2018     1         <NA> 1.01   23.58  8.8     <2    <2  8.2 33.72 20.8    <2 77.82
    # 8       A 7/9/2018    12         <NA> 1.78  25.197  6.5     <2    <2    8 33.58 13.5    2e 67.19
    # 9       A 7/9/2018    18         <NA> 1.77   25.34  5.5     <2    2e  7.9 33.61 12.9    2e 75.53
    
    dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif", fill = "ns")
    #   station     date depth AMMONIA AS N CDOM DENSITY  DO ENTERO FECAL  PH   SAL TEMP TOTAL   XMS
    # 1       A 7/1/2018     1           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
    # 2       A 7/1/2018    12           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
    # 3       A 7/1/2018    18           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
    # 4       A 7/2/2018     1           ns   ns      ns  ns     ns    ns  ns    ns   ns    ns    ns
    # 5       A 7/2/2018    12           ns  1.3  24.622 7.6     <2    <2 8.1  33.7 16.6    <2 69.67
    # 6       A 7/2/2018    18           ns 1.38  25.279 5.5     <2    <2 7.8 33.61 13.2    <2 72.96
    # 7       A 7/9/2018     1           ns 1.01   23.58 8.8     <2    <2 8.2 33.72 20.8    <2 77.82
    # 8       A 7/9/2018    12           ns 1.78  25.197 6.5     <2    <2   8 33.58 13.5    2e 67.19
    # 9       A 7/9/2018    18           ns 1.77   25.34 5.5     <2    2e 7.9 33.61 12.9    2e 75.53
    

    Alternatively, you could keep the duplicates and aggregate by taking the first value in each group:

    dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif", 
          fill = NA_character_, fun.aggregate = head, n = 1)
    #   station     date depth AMMONIA AS N CDOM DENSITY   DO ENTERO FECAL   PH   SAL TEMP TOTAL   XMS
    # 1       A 7/1/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 2       A 7/1/2018    12         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 3       A 7/1/2018    18         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 4       A 7/2/2018     1         <NA> <NA>    <NA> <NA>   <NA>  <NA> <NA>  <NA> <NA>  <NA>  <NA>
    # 5       A 7/2/2018    12         <NA>  1.3  24.622  7.6     <2    <2  8.1  33.7 16.6    <2 69.67
    # 6       A 7/2/2018    18         <NA> 1.38  25.279  5.5     <2    <2  7.8 33.61 13.2    <2 72.96
    # 7       A 7/9/2018     1         <NA> 1.01   23.58  8.8     <2    <2  8.2 33.72 20.8    <2 77.82
    # 8       A 7/9/2018    12         <NA> 1.78  25.197  6.5     <2    <2    8 33.58 13.5    2e 67.19
    # 9       A 7/9/2018    18         <NA> 1.77   25.34  5.5     <2    2e  7.9 33.61 12.9    2e 75.53
    

    See this answer regarding NA_character_.
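
As a rough illustration of why the fill is written as NA_character_ rather than plain NA: plain NA is logical, so it would not match the character values that head returns. The sketch below uses base R only and a hypothetical list of groups; it imitates the vapply call from the error message and is not the reshape2 internals themselves.

```r
# Plain NA is logical; NA_character_ is the missing value of type character
na_types <- c(typeof(NA), typeof(NA_character_))

# Hypothetical groups of character values, one with repeated NA entries
groups <- list(a = c(NA_character_, NA_character_, NA_character_), b = "8.1")

# Taking the first element of each group returns a character value,
# so the template passed to vapply() must be character as well
first_ok <- vapply(groups, head, NA_character_, n = 1)

# A logical template (plain NA) is rejected because the results are character
first_err <- tryCatch(vapply(groups, head, NA, n = 1),
                      error = function(e) conditionMessage(e))
```

first_ok is a named character vector with a missing value for group a, while first_err contains the type-mismatch message.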