Getting Median of a Column where value of another Column is 1 in R


OK, so I have a CSV file with a structure similar to this:

hashID,value,flag
98fafd,   35,   1
fh56w2,   25,   0
ggjeas,   55,   1
adfh5d,   45,   0

Basically, what I want to do is get the median of the value column, but only include rows where flag == 1 in the calculation.

Is this even possible in R? I've searched around and haven't found anything like this.


Solution

  • Here is one possibility:

    Read your data set using the following command:

    # Assumes the data is stored in a CSV file
    newdata <- read.csv("stackoverflow questions/mediancol.csv")
    
    # The data I used for the computation, so the example can be reproduced
    newdata <- structure(list(hashID = structure(c(1L, 3L, 4L, 2L), .Label = c("98fafd", 
    "adfh5d", "fh56w2", "ggjeas"), class = "factor"), value = c(35L, 
    25L, 55L, 45L), flag = c(1L, 0L, 1L, 0L)), .Names = c("hashID", 
    "value", "flag"), class = "data.frame", row.names = c(NA, -4L
    ))
    > newdata
      hashID value flag
    1 98fafd    35    1
    2 fh56w2    25    0
    3 ggjeas    55    1
    4 adfh5d    45    0
    
    # Keep only the rows where flag == 1
    newdata1 <- subset(newdata, flag == 1)
    
    # Look at the summary of the data
    
    > summary(newdata1)
        hashID      value         flag  
     98fafd:1   Min.   :35   Min.   :1  
     adfh5d:0   1st Qu.:40   1st Qu.:1  
     fh56w2:0   Median :45   Median :1  
     ggjeas:1   Mean   :45   Mean   :1  
                3rd Qu.:50   3rd Qu.:1  
                Max.   :55   Max.   :1
    
    # Only look at the median 
    median(newdata1$value)
    [1] 45
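
The subset-then-median steps above can also be collapsed into a single expression with logical indexing. The following is a minimal sketch rather than part of the original answer: it rebuilds the sample data with `data.frame()` instead of reading the CSV, and the names `med_flag1`, `med_flag1_with`, and `med_by_flag` are made up for illustration.

```r
# Rebuild the sample data from the question (same values as above)
newdata <- data.frame(
  hashID = c("98fafd", "fh56w2", "ggjeas", "adfh5d"),
  value  = c(35L, 25L, 55L, 45L),
  flag   = c(1L, 0L, 1L, 0L)
)

# Logical indexing: keep only the values whose flag is 1, then take the median
med_flag1 <- median(newdata$value[newdata$flag == 1])

# Same computation, using with() to avoid repeating the data frame name
med_flag1_with <- with(newdata, median(value[flag == 1]))

# Median of value for every flag level at once (result is named "0" and "1")
med_by_flag <- tapply(newdata$value, newdata$flag, median)

print(med_flag1)
print(med_by_flag)
```

If the value column may contain missing entries, `median()` returns `NA` unless `na.rm = TRUE` is passed, so that argument is probably worth adding for real data.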