I have a data set whose columns are almost all factors:
> str(gdp)
'data.frame': 64 obs. of 31 variables:
$ 1 : Factor w/ 62 levels "","1,145.31",..: 1 1 1 53 16 20 22 24 30 32 ...
$ 2 : Factor w/ 64 levels "1,121.93","1,264.63",..: 42 59 10 13 18 16 17 23 25 35 ...
$ 3 : Factor w/ 62 levels "","1,072.07",..: 1 1 1 35 36 39 41 42 45 51 ...
$ 4 : Factor w/ 62 levels "","1,076.03",..: 1 1 1 15 16 21 23 26 27 36 ...
$ 5 : Factor w/ 62 levels "","1,023.09",..: 1 1 1 11 15 19 17 23 21 27 ...
$ 6 : Factor w/ 62 levels "","1,003.81",..: 1 1 1 40 45 46 47 52 56 7 ...
$ 7 : Factor w/ 62 levels "","1,137.23",..: 1 1 1 13 15 19 21 23 24 28 ...
$ 8 : Factor w/ 62 levels "","1,198.30",..: 1 1 1 26 31 34 35 39 40 47 ...
$ 9 : Factor w/ 64 levels "1,114.32","1,519.23",..: 27 30 36 41 49 51 50 54 56 64 ...
$ 10: Factor w/ 62 levels "","1,208.85",..: 1 1 1 35 39 40 42 45 46 53 ...
$ 11: Factor w/ 64 levels "","1,089.33",..: 1 11 17 20 23 24 26 29 31 37 ...
$ 12: Factor w/ 62 levels "","1,037.14",..: 1 1 1 22 23 25 31 30 36 41 ...
$ 13: Factor w/ 63 levels "","1,114.20",..: 1 63 1 8 11 12 14 20 22 27 ...
$ 14: Factor w/ 64 levels "1,169.73","1,409.74",..: 63 12 14 16 17 22 24 25 28 30 ...
$ 15: Factor w/ 62 levels "","1,117.66",..: 1 1 1 33 35 39 40 44 43 53 ...
$ 16: Factor w/ 63 levels "","1,045.73",..: 21 1 1 30 35 38 41 42 47 50 ...
$ 17: Factor w/ 62 levels "","1,088.39",..: 1 1 1 24 32 26 34 38 40 48 ...
$ 18: Factor w/ 62 levels "","1,244.71",..: 1 1 1 24 30 31 33 34 38 44 ...
$ 19: Factor w/ 62 levels "","1,155.37",..: 1 1 1 25 34 37 38 41 44 48 ...
$ 20: Factor w/ 64 levels "","1,198.29",..: 1 63 8 11 15 17 18 20 26 30 ...
$ 21: Factor w/ 36 levels "","1,065.67",..: 1 1 1 1 1 1 1 1 1 1 ...
$ 22: Factor w/ 64 levels "1,123.06","1,315.12",..: 12 14 15 17 22 23 24 26 27 40 ...
$ 23: Factor w/ 62 levels "","1,016.31",..: 1 1 1 22 25 31 33 38 43 49 ...
$ 24: Factor w/ 64 levels "1,029.92","1,133.27",..: 52 53 57 60 6 8 9 12 13 22 ...
$ 25: Factor w/ 64 levels "1,222.15","1,517.69",..: 60 62 7 8 12 14 15 21 22 25 ...
$ 26: num NA NA 1.29 1.32 1.36 1.39 1.43 1.62 1.56 1.72 ...
$ 27: Factor w/ 62 levels "","1,036.85",..: 1 1 1 12 16 21 22 27 25 33 ...
$ 28: Factor w/ 61 levels "","1,052.88",..: 1 1 1 12 13 17 18 24 23 26 ...
$ 29: Factor w/ 64 levels "1,018.62","1,081.27",..: 6 7 8 9 10 26 27 34 35 43 ...
$ 30: Factor w/ 62 levels "","1,203.92",..: 1 1 1 6 5 21 22 23 24 32 ...
$ 31: Factor w/ 62 levels "","1,039.85",..: 1 1 1 57 59 9 11 13 14 16 ...
I'm trying to preserve all the information (including the decimal places) and turn all of the vectors into numeric. So far I've tried converting these vectors to character and then to numeric, which was suggested on SO, but I get
> gdp<-data.frame(lapply(gdp,as.character))
> gdp<-data.frame(lapply(gdp,as.numeric))
> str(gdp)
'data.frame': 64 obs. of 31 variables:
$ X1 : num 1 1 1 53 16 20 22 24 30 32 ...
$ X2 : num 42 59 10 13 18 16 17 23 25 35 ...
$ X3 : num 1 1 1 35 36 39 41 42 45 51 ...
$ X4 : num 1 1 1 15 16 21 23 26 27 36 ...
$ X5 : num 1 1 1 11 15 19 17 23 21 27 ...
$ X6 : num 1 1 1 40 45 46 47 52 56 7 ...
$ X7 : num 1 1 1 13 15 19 21 23 24 28 ...
$ X8 : num 1 1 1 26 31 34 35 39 40 47 ...
$ X9 : num 27 30 36 41 49 51 50 54 56 64 ...
$ X10: num 1 1 1 35 39 40 42 45 46 53 ...
$ X11: num 1 11 17 20 23 24 26 29 31 37 ...
$ X12: num 1 1 1 22 23 25 31 30 36 41 ...
$ X13: num 1 63 1 8 11 12 14 20 22 27 ...
$ X14: num 63 12 14 16 17 22 24 25 28 30 ...
$ X15: num 1 1 1 33 35 39 40 44 43 53 ...
$ X16: num 21 1 1 30 35 38 41 42 47 50 ...
$ X17: num 1 1 1 24 32 26 34 38 40 48 ...
$ X18: num 1 1 1 24 30 31 33 34 38 44 ...
$ X19: num 1 1 1 25 34 37 38 41 44 48 ...
$ X20: num 1 63 8 11 15 17 18 20 26 30 ...
$ X21: num 1 1 1 1 1 1 1 1 1 1 ...
$ X22: num 12 14 15 17 22 23 24 26 27 40 ...
$ X23: num 1 1 1 22 25 31 33 38 43 49 ...
$ X24: num 52 53 57 60 6 8 9 12 13 22 ...
$ X25: num 60 62 7 8 12 14 15 21 22 25 ...
$ X26: num NA NA 1 2 3 4 5 7 6 8 ...
$ X27: num 1 1 1 12 16 21 22 27 25 33 ...
$ X28: num 1 1 1 12 13 17 18 24 23 26 ...
$ X29: num 6 7 8 9 10 26 27 34 35 43 ...
$ X30: num 1 1 1 6 5 21 22 23 24 32 ...
$ X31: num 1 1 1 57 59 9 11 13 14 16 ...
which does not preserve the decimal values and does not turn the blanks into NAs. I've also tried
> gdp<-as.numeric(levels(gdp))[gdp]
Error in as.numeric(levels(gdp))[gdp] : invalid subscript type 'list'
Is there a way to turn these vectors into numeric?
Let's break this down.
First, because gdp is a data frame, levels will return NULL. You may be looking for the output of levels on each column of gdp, in which case you'd want to use something like lapply.
levels(gdp)
# NULL
lapply(gdp, levels)
# this output will make sense
as.numeric(levels(gdp))[gdp]
# this will make no sense
The error is stating that you cannot use a list (gdp, since a data frame is a list of columns) to subscript a vector.
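On a single factor column the levels idiom does work. The sketch below uses a made-up factor (the name f and its values are illustrative, not taken from gdp) and assumes the commas are thousands separators, so they are removed before conversion.

```r
# Toy factor mimicking one column of gdp (values are made up for illustration)
f <- factor(c("", "1,145.31", "2.50", "1,145.31"))

# as.numeric() on a factor returns the integer level codes, not the values
codes <- as.numeric(f)

# Convert the levels once (commas removed; the blank level "" becomes NA) ...
cleaned <- as.numeric(gsub(",", "", levels(f), fixed = TRUE))

# ... then index by the factor to get one value per observation
values <- cleaned[f]
print(codes)
print(values)
```

As a side note, data.frame() converts character columns back to factors when stringsAsFactors is TRUE (the default before R 4.0.0), which is probably why the as.character attempt in the question still ended up with level codes.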
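To iterate through the columns of gdp, you will need something like lapply to work on each component. The commas in the levels also have to be removed before as.numeric can parse them; the blank level "" then becomes NA.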
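gdp <- data.frame(lapply(gdp, function(x) {
  # Non-factor columns (such as the numeric column 26) pass through unchanged.
  # For factors: strip the commas from the levels, convert, then index by x.
  if (!is.factor(x)) x
  else as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))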
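To check the behaviour without the real data, the same call can be run on a small stand-in. This is only a sketch: the data frame toy, its column names, and its values are hypothetical, and stringsAsFactors = TRUE is set explicitly so the columns are factors regardless of R version.

```r
# Hypothetical stand-in for gdp: two factor columns and one numeric column
toy <- data.frame(
  a = c("", "1,145.31", "2.50"),
  b = c("1,121.93", "3.10", "1,264.63"),
  c = c(NA, 1.29, 1.32),
  stringsAsFactors = TRUE
)

# Same conversion as above, applied column by column
toy_num <- data.frame(lapply(toy, function(x) {
  if (!is.factor(x)) x
  else as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}))

str(toy_num)
```

Every column of toy_num should come back as num, with the blank in column a turned into NA and the decimals intact.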
Once converted, your data set might be better served as a matrix, since every column is then numeric. In which case:
gdp <- as.matrix(gdp)
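The order matters here: as.matrix gives a numeric matrix only if every column is already numeric. A small sketch with hypothetical data frames (toy_num and toy_fac are made-up names and values):

```r
# Hypothetical all-numeric data frame standing in for the converted gdp
toy_num <- data.frame(a = c(NA, 1145.31, 2.5), b = c(1121.93, 3.1, 1264.63))
m <- as.matrix(toy_num)
num_ok <- is.numeric(m)        # numeric matrix, one column per variable

# With factor columns still present, as.matrix() falls back to character
toy_fac <- data.frame(a = c("", "1,145.31"), stringsAsFactors = TRUE)
chr_mat <- as.matrix(toy_fac)
print(num_ok)
print(is.character(chr_mat))
```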