I have a large dataset and have been asked to run summary statistics on it. Because R reads columns of 0s and 1s as integers by default, I would like to convert them to logical before running the summary function. My dataset has a large number of columns, and many of them are of the 0/1 variety. I couldn't find an easy way to convert them all at once, so I used a for loop:
for (n in 1:ncol(houses)) {
if (all(unique(houses[, n]) == c(0, 1)) | all(unique(houses[, n]) == c(0, 1, NA)))
houses[, n] %<>% as.logical
}
However, not only do the columns whose unique values are (0, 1, NA) fail to convert, I also get the following warning message repeatedly:
longer object length is not a multiple of shorter object length
This leads me to believe that there must be a better way to do this - I'm just not sure what it is.
Here are the first 10 rows of the df:
sp transbkc sameagt sqft newhouse age bedrooms baths acre firepl garagesz septic whrlpool occ poorcond
1 366000 NA 0 3400 1 1 4 4.00 1.28 0 3 0 0 0 0
2 99000 NA 0 1900 0 9 3 2.50 1.29 0 2 0 0 0 0
3 240000 NA 0 4000 0 4 4 3.25 1.40 0 2 0 0 1 0
4 112900 NA 0 1830 1 1 4 2.50 1.22 0 2 0 0 0 0
5 45000 NA 0 1250 NA NA 3 1.00 1.43 0 1 0 0 0 0
6 43000 NA 0 1195 NA NA 2 1.00 1.21 0 0 0 0 0 0
7 127000 NA 0 2350 1 1 4 2.00 1.50 0 2 0 0 0 0
8 191150 NA 0 2602 1 1 3 2.00 1.63 0 3 0 0 0 0
9 104000 NA 0 2000 NA NA 4 2.00 1.30 0 2 0 0 0 0
10 55000 NA 0 1600 NA NA 4 1.75 1.24 0 1 0 0 1 0
someren suprcond mednage elevationmeter medincrl pctcollg timetrend distress bedpcttotal lncdom fld1 fld2
1 0 0 38.3 238.60 1.3916808 54.2 168 0 0.5000000 0.000000 0 0
2 0 1 40.8 178.30 1.1682903 35.7 173 0 0.4285714 5.945421 0 0
3 0 1 47.8 223.69 1.4259059 63.8 48 0 0.4000000 4.795791 0 0
4 0 0 35.3 225.56 1.4160610 54.1 237 0 0.5714286 5.793014 0 0
5 1 0 34.4 212.84 0.4369706 20.2 161 0 0.5000000 5.525453 0 0
6 0 1 30.2 184.95 0.3092624 3.9 251 0 0.5000000 5.730100 1 0
7 0 0 39.9 239.09 1.0349982 29.7 83 0 0.5714286 5.303305 0 0
8 0 0 35.8 264.44 1.8390805 57.3 67 0 0.3750000 6.251904 0 0
9 1 0 40.8 178.52 1.1682903 35.7 14 0 0.4444444 5.493062 0 0
10 0 0 40.0 235.25 0.7435699 28.4 74 0 0.5000000 4.941642 0 0
ovrpr
1 0.34544754
2 0.01377296
3 -0.13879776
4 0.09606838
5 NA
6 NA
7 0.11856365
8 0.21277618
9 NA
10 NA
You can try using lapply:
houses[] <- lapply(houses, function(x)
if(all(x %in% c(0, 1, NA))) as.logical(x) else x)
This converts to logical every column whose values are all 0, 1, or NA.
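As for why the original loop misbehaves: `==` compares element by element and recycles the shorter vector, so `unique(x) == c(0, 1)` depends on the order in which the values first appear, and it warns whenever the two lengths are not multiples of each other. Any comparison against NA also yields NA rather than TRUE. Here is a minimal sketch on a made-up data frame (`toy` and `is_binary` are hypothetical names, not part of your data) showing the difference with `%in%`:

```r
# Hypothetical toy data, standing in for the real `houses` data frame
toy <- data.frame(
  flag    = c(1, 0, 0, 1),             # unique() gives 1, 0 (not 0, 1)
  flag_na = c(0, 1, NA, 1),            # unique() gives 0, 1, NA
  sqft    = c(3400, 1900, 4000, 1830)  # ordinary numeric column
)

# Element-wise and order-sensitive: FALSE even though the column is 0/1
all(unique(toy$flag) == c(0, 1))

# NA == NA is NA, so this evaluates to NA rather than TRUE
all(unique(toy$flag_na) == c(0, 1, NA))

# Set membership ignores order and length, and %in% matches NA
is_binary <- function(x) all(x %in% c(0, 1, NA))

# toy[] keeps the data.frame structure while replacing its columns
toy[] <- lapply(toy, function(x) if (is_binary(x)) as.logical(x) else x)
str(toy)
```

Note that a column containing only 0s (or only NAs) also passes the `%in%` test and would be converted, which may or may not be what you want.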
In dplyr, we can use mutate_if. Assign the result back to houses to keep the converted columns.
library(dplyr)
houses <- houses %>% mutate_if(~all(. %in% c(0, 1, NA)), as.logical)
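In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by `across()`. The same idea could be written roughly as below; this is a sketch on a made-up data frame (`toy` and `converted` are hypothetical names), assuming a dplyr version that provides `across()` and `where()`:

```r
library(dplyr)

# Hypothetical toy data, standing in for the real `houses` data frame
toy <- data.frame(
  flag    = c(1, 0, 0, 1),
  flag_na = c(0, 1, NA, 1),
  sqft    = c(3400, 1900, 4000, 1830)
)

# where() selects the columns for which the predicate returns TRUE;
# across() then applies as.logical to just those columns
converted <- toy %>%
  mutate(across(where(~ all(.x %in% c(0, 1, NA))), as.logical))

str(converted)
```

With the real data, replace `toy` with `houses`.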