In attempting to reshape data using reshape2::dcast, I am encountering an error involving NA entries. Sample data are at the end.
The data are reshaped from long to wide, but on occasion some parameters have all NA entries, which appears to be causing the issue. Or at least, I think it is. If I remove any parameter like that, Ammonia in this example, the error goes away.
In debugging dcast, the failure seems to come down to this line:
ordered <- vaggregate(.value = value, .group = overall,
.fun = fun.aggregate, ..., .default = fill, .n = n)
which results in the error:
Error in vapply(indices, fun, .default) :
values must be type 'character',
but FUN(X[[1]]) result is type 'integer'
Seeing that the NA variable is first in line, I thought the aggregate function might default to integer, even though the entire column is character, but moving those rows did not solve it. The only fix I have found is na.omit, which removes that parameter completely. My expected output would retain any parameters with all NA, if possible. A second reason for this is that if a day/depth is not sampled, it should be retained and those entries should be ns (not sampled). Is there a way to solve this error without removing every all-NA parameter that will be reshaped?
Reproducible example (data are below the dcast code):
library(reshape2)
dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif", fill="ns")
dcast(na.omit(df), station + date + depth ~ parmcode,
value.var = "value_qualif", fill="ns") # solves error, but removes parameter completely
Example data:
df <- structure(list(station = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), date = c("7/2/2018",
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018",
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018",
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018",
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018",
"7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/2/2018", "7/1/2018",
"7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018", "7/1/2018",
"7/1/2018", "7/1/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018",
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018",
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018",
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018",
"7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018", "7/9/2018",
"7/9/2018", "7/9/2018"), depth = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 1L, 1L, 1L,
12L, 12L, 12L, 18L, 18L, 18L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 18L,
18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L), parmcode = c("CDOM",
"DENSITY", "DO", "ENTERO", "PH", "TOTAL", "XMS", "TEMP", "SAL",
"FECAL", "TOTAL", "FECAL", "ENTERO", "CDOM", "XMS", "TEMP", "SAL",
"PH", "DO", "DENSITY", "DO", "DENSITY", "TOTAL", "FECAL", "PH",
"CDOM", "XMS", "TEMP", "SAL", "ENTERO", "AMMONIA AS N", "AMMONIA AS N",
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N",
"AMMONIA AS N", "AMMONIA AS N", "AMMONIA AS N", "TOTAL", "XMS",
"TEMP", "SAL", "PH", "DO", "DENSITY", "CDOM", "FECAL", "ENTERO",
"CDOM", "FECAL", "ENTERO", "PH", "DO", "TEMP", "XMS", "TOTAL",
"DENSITY", "SAL", "TOTAL", "FECAL", "ENTERO", "XMS", "TEMP",
"SAL", "PH", "DO", "DENSITY", "CDOM"), value_qualif = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, "<2", "<2", "<2", "1.3", "69.67",
"16.6", "33.7", "8.1", "7.6", "24.622", "5.5", "25.279", "<2",
"<2", "7.8", "1.38", "72.96", "13.2", "33.61", "<2", NA, NA,
NA, NA, NA, NA, NA, NA, NA, "<2", "77.82", "20.8", "33.72", "8.2",
"8.8", "23.58", "1.01", "<2", "<2", "1.78", "<2", "<2", "8",
"6.5", "13.5", "67.19", "2e", "25.197", "33.58", "2e", "2e",
"<2", "75.53", "12.9", "33.61", "7.9", "5.5", "25.34", "1.77"
)), class = "data.frame", row.names = c(NA, -69L))
Some tangentially related questions that don't answer my question are "POSIXct values become numeric in reshape2 dcast" and "Error with custom aggregate function for a cast() call in R reshape2".
Using na.omit, my output is:
station date depth CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
1 A 7/2/2018 12 1.3 24.622 7.6 <2 <2 8.1 33.7 16.6 <2 69.67
2 A 7/2/2018 18 1.38 25.279 5.5 <2 <2 7.8 33.61 13.2 <2 72.96
3 A 7/9/2018 1 1.01 23.58 8.8 <2 <2 8.2 33.72 20.8 <2 77.82
4 A 7/9/2018 12 1.78 25.197 6.5 <2 <2 8 33.58 13.5 2e 67.19
5 A 7/9/2018 18 1.77 25.34 5.5 <2 2e 7.9 33.61 12.9 2e 75.53
Expected output without using na.omit is:
station date depth AMMONIA CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
1 A 7/2/2018 1 ns ns ns ns ns ns ns ns ns ns ns
2 A 7/2/2018 12 ns 1.3 24.622 7.6 <2 <2 8.1 33.7 16.6 <2 69.67
3 A 7/2/2018 18 ns 1.38 25.279 5.5 <2 <2 7.8 33.61 13.2 <2 72.96
4 A 7/9/2018 1 ns 1.01 23.58 8.8 <2 <2 8.2 33.72 20.8 <2 77.82
5 A 7/9/2018 12 ns 1.78 25.197 6.5 <2 <2 8 33.58 13.5 2e 67.19
6 A 7/9/2018 18 ns 1.77 25.34 5.5 <2 2e 7.9 33.61 12.9 2e 75.53
The actual issue is that every parameter has only one value for each triple of (station, date, depth) except for AMMONIA AS N, which has three NA entries per triple, so dcast has to aggregate them.
For instance,
dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif")
# Aggregation function missing: defaulting to length
# station date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
# 1 A 7/1/2018 1 3 0 0 0 0 0 0 0 0 0 0
# 2 A 7/1/2018 12 3 0 0 0 0 0 0 0 0 0 0
# 3 A 7/1/2018 18 3 0 0 0 0 0 0 0 0 0 0
# 4 A 7/2/2018 1 0 1 1 1 1 1 1 1 1 1 1
# 5 A 7/2/2018 12 0 1 1 1 1 1 1 1 1 1 1
# 6 A 7/2/2018 18 0 1 1 1 1 1 1 1 1 1 1
# 7 A 7/9/2018 1 0 1 1 1 1 1 1 1 1 1 1
# 8 A 7/9/2018 12 0 1 1 1 1 1 1 1 1 1 1
# 9 A 7/9/2018 18 0 1 1 1 1 1 1 1 1 1 1
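The error in the question seems to follow from this default. A minimal base-R sketch, assuming (as the vaggregate line and the error message in the question suggest) that the aggregation ends in a vapply() call that uses the fill value as its template: length returns an integer, while fill = "ns" is character, so the type check fails. The list `groups` is made up for illustration and is not taken from reshape2 itself.

```r
# Hypothetical groups of values, standing in for the cells dcast aggregates
groups <- list(c(NA_character_, NA_character_, NA_character_), "1.3", character(0))

# Integer result checked against a character template: this fails
msg <- tryCatch(
  vapply(groups, length, "ns"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Integer result checked against an integer template: this works
counts <- vapply(groups, length, 0L)
print(counts)
```

This would also explain why moving the NA rows made no difference: the mismatch is between the aggregation function and the fill value, not between rows of the data.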
Once we remove the duplicate rows, everything works smoothly:
dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif")
# station date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
# 1 A 7/1/2018 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 2 A 7/1/2018 12 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 3 A 7/1/2018 18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 4 A 7/2/2018 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 5 A 7/2/2018 12 <NA> 1.3 24.622 7.6 <2 <2 8.1 33.7 16.6 <2 69.67
# 6 A 7/2/2018 18 <NA> 1.38 25.279 5.5 <2 <2 7.8 33.61 13.2 <2 72.96
# 7 A 7/9/2018 1 <NA> 1.01 23.58 8.8 <2 <2 8.2 33.72 20.8 <2 77.82
# 8 A 7/9/2018 12 <NA> 1.78 25.197 6.5 <2 <2 8 33.58 13.5 2e 67.19
# 9 A 7/9/2018 18 <NA> 1.77 25.34 5.5 <2 2e 7.9 33.61 12.9 2e 75.53
dcast(df[!duplicated(df), ], station + date + depth ~ parmcode, value.var = "value_qualif", fill = "ns")
# station date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
# 1 A 7/1/2018 1 ns ns ns ns ns ns ns ns ns ns ns
# 2 A 7/1/2018 12 ns ns ns ns ns ns ns ns ns ns ns
# 3 A 7/1/2018 18 ns ns ns ns ns ns ns ns ns ns ns
# 4 A 7/2/2018 1 ns ns ns ns ns ns ns ns ns ns ns
# 5 A 7/2/2018 12 ns 1.3 24.622 7.6 <2 <2 8.1 33.7 16.6 <2 69.67
# 6 A 7/2/2018 18 ns 1.38 25.279 5.5 <2 <2 7.8 33.61 13.2 <2 72.96
# 7 A 7/9/2018 1 ns 1.01 23.58 8.8 <2 <2 8.2 33.72 20.8 <2 77.82
# 8 A 7/9/2018 12 ns 1.78 25.197 6.5 <2 <2 8 33.58 13.5 2e 67.19
# 9 A 7/9/2018 18 ns 1.77 25.34 5.5 <2 2e 7.9 33.61 12.9 2e 75.53
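Dropping duplicates this way only helps when the repeated rows are identical in every column, as the three NA rows are here. A small base-R sketch of a check one might run before casting follows; the data frame `toy` and its values are invented for illustration, with one parameter deliberately given two conflicting values.

```r
# Toy long-format data (made up): AMMONIA repeats exactly, PH repeats with
# two different values for the same station/date/depth
toy <- data.frame(
  station      = rep("A", 6),
  date         = rep("7/1/2018", 6),
  depth        = rep(1L, 6),
  parmcode     = c("AMMONIA", "AMMONIA", "AMMONIA", "PH", "PH", "DO"),
  value_qualif = c(NA, NA, NA, "8.1", "8.2", "7.6"),
  stringsAsFactors = FALSE
)

# Remove rows that are identical in every column
deduped <- toy[!duplicated(toy), ]

# Any key combination that still occurs more than once would force
# dcast to aggregate again
keys <- deduped[, c("station", "date", "depth", "parmcode")]
still_repeated <- deduped[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]
print(still_repeated)
```

If `still_repeated` has no rows, each cell of the wide table receives at most one value and no aggregation function is needed.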
Alternatively, you could run:
dcast(df, station + date + depth ~ parmcode, value.var = "value_qualif",
fill = NA_character_, fun.aggregate = head, n = 1)
# station date depth AMMONIA AS N CDOM DENSITY DO ENTERO FECAL PH SAL TEMP TOTAL XMS
# 1 A 7/1/2018 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 2 A 7/1/2018 12 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 3 A 7/1/2018 18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 4 A 7/2/2018 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
# 5 A 7/2/2018 12 <NA> 1.3 24.622 7.6 <2 <2 8.1 33.7 16.6 <2 69.67
# 6 A 7/2/2018 18 <NA> 1.38 25.279 5.5 <2 <2 7.8 33.61 13.2 <2 72.96
# 7 A 7/9/2018 1 <NA> 1.01 23.58 8.8 <2 <2 8.2 33.72 20.8 <2 77.82
# 8 A 7/9/2018 12 <NA> 1.78 25.197 6.5 <2 <2 8 33.58 13.5 2e 67.19
# 9 A 7/9/2018 18 <NA> 1.77 25.34 5.5 <2 2e 7.9 33.61 12.9 2e 75.53
See this answer regarding NA_character_.
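The result of this alternative contains NA rather than ns. One way to get the ns labels afterwards, sketched in base R on an invented data frame `wide` shaped like the output above, is to replace NA in the parameter columns only. Note that this labels every missing cell as ns, whether the sample was never taken or the value was recorded as NA.

```r
# Toy wide result (made up), shaped like the dcast output above
wide <- data.frame(
  station = c("A", "A"),
  date    = c("7/1/2018", "7/2/2018"),
  depth   = c(1L, 12L),
  AMMONIA = c(NA_character_, NA_character_),
  PH      = c(NA, "8.1"),
  stringsAsFactors = FALSE
)

# Identifier columns are left alone; everything else is a parameter column
id_cols    <- c("station", "date", "depth")
value_cols <- setdiff(names(wide), id_cols)

# Replace NA with "ns" in the parameter columns only
wide[value_cols][is.na(wide[value_cols])] <- "ns"
print(wide)
```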