Search code examples
rbinning

Combining multiple columns of values together in R


So I scripted a code that allows me to bin values in a single column. However, the problem I have is that my data is contained in multiple columns, all of varying length and with NAs in it. The script I have for binning goes like this:-

bin <- seq(min(data[, 1]), max(data[, 1]), by = 0.0005)
binnedData <- tapply(data[, 1], cut(data[, 1], breaks = bin), median)

I was wondering whether there was a way I could combine all the values from all columns into one massive long column so I can basically run this, or whether there's a way in which I can tweak this so that I can run it on my 1000 x 1000 matrix

This is taken from part of my data:-

102.23144 123.23242 102.23145
103.23144 123.23242 102.36563
103.83637        NA 102.36356
104.23225        NA 102.23423
105.87890        NA        NA

And the expected outcome with just combining column values:-

102.23144 
103.23144 
103.83637      
104.23225   
105.87890
123.23242
123.23242
102.23145
102.36563
102.36356
102.23423

And outcome with binning:-

(102.0000 - 102.0005) - Median of all values that fall into bin
(102.0005 - 102.0010) - Median of all values that fall into bin
(102.0015 - 102.0020) - Median of all values that fall into bin

Thanks


Solution

  • have you tried using melt function from reshape2 package.

    Here is some of your test data:

    test <- data.frame(V1=c(102.2314,103.2314,103.8364,104.2322,105.8789),
                       V2=c(123.2324,123.2324,NA,NA,NA),
                       V3=c(102.2314,102.3656,102.3636,102.2342,NA)
                      )
    
    > test
            V1       V2       V3
    1 102.2314 123.2324 102.2314
    2 103.2314 123.2324 102.3656
    3 103.8364       NA 102.3636
    4 104.2322       NA 102.2342
    5 105.8789       NA       NA
    

    and then use melt function -

    test_m <- melt(test)
    

    But there are NAs.

    > test_m
       variable    value
    1        V1 102.2314
    2        V1 103.2314
    3        V1 103.8364
    4        V1 104.2322
    5        V1 105.8789
    6        V2 123.2324
    7        V2 123.2324
    8        V2       NA
    9        V2       NA
    10       V2       NA
    11       V3 102.2314
    12       V3 102.3656
    13       V3 102.3636
    14       V3 102.2342
    15       V3       NA
    

    So, now same step with this filter -

    test_m<- melt(test)[which(!(is.na(melt(test)[,2]))),]
    
    > test_m
       variable    value
    1        V1 102.2314
    2        V1 103.2314
    3        V1 103.8364
    4        V1 104.2322
    5        V1 105.8789
    6        V2 123.2324
    7        V2 123.2324
    11       V3 102.2314
    12       V3 102.3656
    13       V3 102.3636
    14       V3 102.2342
    

    So, NAs removed. You can select only the 2nd col of the data remove the variable name col.