Calculating the Medians and Means of Rows (in R)


I am using the R programming language. Suppose I have the following data ("my_data"):

     student first_run second_run third_run fourth_run fifth_run sixth_run seventh_run eight_run ninth_run tenth_run
1   student1  19.70847   21.79771  16.49083   19.51691  13.97987  14.60733    13.89703  15.24651  20.75679  18.44020
2   student2  11.22369   15.36253  16.90215   20.20724  15.90227  15.14539    13.74945  18.30090  19.55124  17.24132
3   student3  15.93649   17.03599  14.20214   13.17548  14.70327  15.49697    13.08945  19.94142  22.41674  17.37958
4   student4  16.18733   15.13197  14.79481   16.75177  14.51287  17.71816    13.45054  14.25553  19.89091  18.88981
5   student5  18.71084   18.85453  17.15864   19.38880  15.68862  18.39169    15.26428  16.04526  18.92532  16.62409
6   student6  19.75246   12.74605  18.52214   17.92626  14.48501  17.20780    13.10512  12.46502  20.68583  15.87711
7   student7  14.75144   23.82376  18.51366   20.77424  14.22155  16.08186    12.95981  12.67820  20.12166  15.66006
8   student8  17.06516   15.63075  13.72026   15.02068  14.21098  15.99414    14.64818  16.15603  21.74607  17.07382
9   student9  20.27611   12.44592  12.26502   15.13456  14.61552  18.72192    15.11129  17.60746  18.83831  17.55257
10 student10  17.70736   16.21620  14.10861   17.20014  16.59376  19.50027    13.05073  15.80002  18.09781  18.34313

I want to add two columns to this data:

  • my_mean: the mean of each row
  • my_median: the median of each row

I tried the following code in R:

my_data$median = apply(my_data, 1, median, na.rm=T)

my_data$mean = apply(my_data, 1, mean, na.rm=T)

But I don't think this code is correct. For instance, it returns the median of the second row of data as "16.90215".

But when I manually take the median of this row:

median(11.22369  , 15.36253 , 16.90215 ,  20.20724,  15.90227 , 15.14539   , 13.74945 , 18.30090 , 19.55124 , 17.24132)

I get an answer of

11.22

Can someone please show me what I am doing wrong?

Thanks


Solution

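  • library(dplyr)
    
    my_data %>% 
      rowwise() %>%      # treat each row as its own group
      mutate(median = median(c_across(first_run:tenth_run)),
             mean = mean(c_across(first_run:tenth_run))) %>% 
      ungroup()          # return an ordinary, non-rowwise data frame
    

    c_across and rowwise were created for this type of situation. Most dplyr verbs work column-wise; to change this behavior, pipe to rowwise first.

    c_across then combines the selected values in each row (here the ten run columns, first_run through tenth_run) into a single numeric vector, to which mean or median can be applied. The run columns are named explicitly rather than selected with where(is.numeric) so that the second call cannot pick up the newly created median column, which is itself numeric.

    Note: rowwise creates a rowwise grouped data frame, so the pipeline ends with ungroup to drop that grouping.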
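
For comparison with the dplyr answer, here is a minimal base R sketch of the same idea, along with what probably went wrong in the two attempts from the question. apply() converts a data frame to a matrix first, and because of the student column that matrix is character, which is consistent with the quoted "16.90215" result. The stand-in data frame below is trimmed to two students and four runs (values copied from the question), and the names runs and row2 are illustrative only.

```r
# Stand-in for my_data: two students, four runs, values from the question
my_data <- data.frame(
  student    = c("student1", "student2"),
  first_run  = c(19.70847, 11.22369),
  second_run = c(21.79771, 15.36253),
  third_run  = c(16.49083, 16.90215),
  fourth_run = c(19.51691, 20.20724)
)

# Keep only the numeric columns so apply() works on a numeric matrix
runs <- my_data[sapply(my_data, is.numeric)]

# Compute both statistics from runs, so neither new column feeds into the other
my_data$my_mean   <- rowMeans(runs, na.rm = TRUE)
my_data$my_median <- apply(runs, 1, median, na.rm = TRUE)

# median() expects ONE vector; in median(a, b, c) only a is treated as data
median(11.22369, 15.36253, 16.90215)   # returns the first value, 11.22369

# All ten values of row 2 from the question, wrapped in c()
row2 <- c(11.22369, 15.36253, 16.90215, 20.20724, 15.90227,
          15.14539, 13.74945, 18.30090, 19.55124, 17.24132)
median(row2)                           # 16.40221
```

Subsetting to the numeric columns before calling apply() is the key step; on the full data the same two assignment lines should work unchanged, assuming student is the only non-numeric column.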