Dropping an element from a list via conventional means (for example ll["name"] <- NULL) causes the entire list to be copied. Normally this is not noticeable, until of course the data sets become large.
I have a list with a dozen elements, each between 0.25 and 2 GB in size. Dropping three elements from this list takes about ten minutes to execute (on a relatively fast machine).
Is there a way to drop elements from a list in-place?
I have tried the following:
TEST <- list(A=1:20, B=1:5)
TEST[["B"]] <- NULL
TEST["B"] <- NULL
TEST <- TEST[c(TRUE, FALSE)]
data.table::set(TEST, "B", value=NULL) # ERROR
Output with memory info:
cat("\n\n\nATTEMPT 1\n")
TEST <- list(A=1:20, B=1:5)
.Internal(inspect(TEST))
TEST[["B"]] <- NULL
.Internal(inspect(TEST))
cat("\n\n\nATTEMPT 2\n")
TEST <- list(A=1:20, B=1:5)
.Internal(inspect(TEST))
TEST["B"] <- NULL
.Internal(inspect(TEST))
cat("\n\n\nATTEMPT 3\n")
TEST <- list(A=1:20, B=1:5)
.Internal(inspect(TEST))
TEST <- TEST[c(TRUE, FALSE)]
I don't know how you could make a vector shorter without copying it. The next best thing would be to set the element to a missing value (NA) or to NULL.
According to ?Extract, you have to specify TEST[i] <- list(NULL) to set an element to NULL. And my tests indicate that i must be an integer or logical vector.
> TEST <- list(A=1:20, B=1:5); .Internal(inspect(TEST))
@27d2c60 19 VECSXP g0c2 [NAM(1),ATT] (len=2, tl=0)
@27dd9e0 13 INTSXP g0c6 [] (len=20, tl=0) 1,2,3,4,5,...
@2805c98 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
@1f38be8 02 LISTSXP g0c0 []
TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
@2807430 16 STRSXP g0c2 [] (len=2, tl=0)
@dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
@dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
> TEST[2] <- list(NULL); .Internal(inspect(TEST)); TEST
@27d2c60 19 VECSXP g0c2 [MARK,NAM(1),ATT] (len=2, tl=0)
@27dd9e0 13 INTSXP g0c6 [MARK] (len=20, tl=0) 1,2,3,4,5,...
@d3fb78 00 NILSXP g1c0 [MARK,NAM(2)]
ATTRIB:
@1f38be8 02 LISTSXP g0c0 [MARK]
TAG: @d3f478 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
@2807430 16 STRSXP g0c2 [MARK] (len=2, tl=0)
@dc2628 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "A"
@dc25f8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "B"
$A
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$B
NULL
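To make the distinction above concrete, here is a minimal sketch contrasting the two assignment forms on the same toy list. The use of match() to turn a name into an integer position, and the variable names i, len_after_blank and len_after_drop, are my own additions for illustration and are not part of the original tests.

```r
# Toy list standing in for the real multi-GB one
TEST <- list(A = 1:20, B = 1:5)

# Look up the integer position of the element by name
# (assumption: the name exists and is unique)
i <- match("B", names(TEST))

# Assigning list(NULL) keeps the slot and only replaces its contents,
# so the list keeps its length and its names
TEST[i] <- list(NULL)
len_after_blank <- length(TEST)   # 2
is_blank <- is.null(TEST[[i]])    # TRUE

# Assigning a bare NULL removes the slot, which shortens the list
TEST[i] <- NULL
len_after_drop <- length(TEST)    # 1
```

With the first form the old element is no longer referenced by the list, so its memory should become eligible for garbage collection while the list itself stays the same length.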