I had an .rda file containing a large list that looked like this:
[[1]] NULL
[[2]] NULL
...
[[1000]] (Some data)
...
The first K empty elements (999 in the example) were created by a bug in the code, so I decided to delete elements 1:K. After saving, the file had grown enormously: before it was <1 GB, and afterwards it was >16 GB. How could that be, and how can I fix it?
I can imagine the problem being that before editing the list had elements 1 to N, and after editing it contains only elements K+1 to N, but can that really make such a difference? If this is the problem, how do I reset the indexing?
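For concreteness, the edit amounted to something like the following (the object and file names are illustrative, and the toy data here will not reproduce my real file sizes):

## Toy reconstruction of the situation (made-up data, so the absolute
## file sizes will not match my real ones).
K <- 999
N <- 2000
biglist <- vector("list", N)                   # every element starts as NULL
biglist[(K + 1):N] <- lapply((K + 1):N, function(i) rnorm(100))

save(biglist, file = "biglist.rda")            # original file (< 1 GB in my real case)

biglist <- biglist[-(1:K)]                     # drop the leading NULL elements
save(biglist, file = "biglist.rda")            # re-saved file (> 16 GB in my real case)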
The file might need to be re-saved with a different compression type after removing the NULLs. It was probably uncompressed and then re-compressed under the same compression scheme, even though a different one would be appropriate now that the list is many times smaller.
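One quick way to see which scheme an .rda file is actually using is to inspect its first few bytes (the helper below is just my own sketch; the magic numbers are those of the gzip, bzip2 and xz formats):

## Report the compression format of an .rda file from its magic bytes:
## gzip starts with 0x1f 0x8b, bzip2 with "BZh", xz with 0xfd "7zXZ" 0x00.
rda_compression <- function(path) {
  magic <- readBin(path, what = "raw", n = 6)
  if (identical(magic[1:2], as.raw(c(0x1f, 0x8b)))) return("gzip")
  if (identical(magic[1:3], charToRaw("BZh"))) return("bzip2")
  if (identical(magic, as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00)))) return("xz")
  "none/unknown"
}
rda_compression("tmp2.rda")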
From ?save:
... a saved file can be uncompressed and re-compressed under a different compression scheme (and see resaveRdaFiles for a way to do so from within R).
So when I run resaveRdaFiles on the file containing the z2 object from Ben Bolker's answer, the file gets a good chunk smaller:
file.info("tmp2.rda")[,1]              # file size in bytes before re-saving
# [1] 2666373
tools::resaveRdaFiles("tmp2.rda")      # re-compress under the scheme resaveRdaFiles picks
file.info("tmp2.rda")[,1]              # file size in bytes after re-saving
# [1] 2210736
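If you want to pick the scheme explicitly rather than rely on the automatic choice, both resaveRdaFiles and save accept a compress argument, e.g. (reusing the tmp2.rda / z2 names from above):

tools::resaveRdaFiles("tmp2.rda", compress = "xz")   # re-compress the existing file with xz
save(z2, file = "tmp2.rda", compress = "xz")         # or choose the scheme when saving z2 yourself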