Generate Values According to Vector in Data Frame


I have the following data frame (its dput() output is shown below).

I'd like to generate an additional vector of significance codes based on the values in the vector p (the p-value of each estimate). Is there a way for R to generate a vector filled with stars (as significance markers) according to the information in another vector? Furthermore, is there a way to tell R to reorder the rows of the data frame according to a custom order of observations? I would like the following order: vol_s, vol_s_avg, vol_s_med, vol_s_end, vol_l, and so on.

structure of df

    df <- structure(list(
      id = c("vol_avg_cer", "vol_avg_cer", "vol_avg_cer", "vol_avg_cer",
             "vol_cer", "vol_cer"),
      type = c("partial", "partial", "full", "full", "partial", "partial"),
      parm = c("vol_s_avg", "vol_l_avg", "vol_s_avg", "vol_l_avg",
               "vol_s", "vol_l"),
      estimate = c(-0.00419972506246416, -0.0199988264598171,
                   -0.0429143892387528, 0.0367191277063419,
                   -0.0180348542378266, -0.0825424096818213),
      stderr = c(0.00729095969265321, 0.00950796168366169,
                 0.0296902477909246, 0.052772355386909,
                 0.0280972492739437, 0.0458807583546288),
      p = c(0.564602918461653, 0.0354328407781613, 0.148344569863659,
            0.486552631437604, 0.520955910904793, 0.0720085952786877)
    ), .Names = c("id", "type", "parm", "estimate", "stderr", "p"),
    row.names = c(1L, 2L, 20L, 21L, 1825L, 1826L), class = "data.frame")

Solution

  • Building on @user2802241's answer, which uses dplyr and symnum: to order the rows by the parm column, define the desired order as a separate vector, convert parm to a factor with that vector as its levels, and then arrange on it.

    For example:

    library(dplyr)
    
    ## define a vector with the variables in the order you require
    factor_levels <- c("vol_s", "vol_s_avg", "vol_s_med", "vol_s_end",
                       "vol_l", "vol_l_avg", "vol_l_med", "vol_l_end")
    
    ## stay within dplyr - convert 'parm' to a factor and arrange on it
    df <- df %>%
      mutate(signif = symnum(p, 
                             cutpoints = c(0, 0.01, 0.05, 0.10, 0.5, 1), 
                             symbols = c("***", "**", "*", ".", " ")),
             parm = factor(parm, levels = factor_levels)) %>%
      arrange(parm)
    
    > df
               id    type      parm     estimate      stderr          p signif
    1     vol_cer partial     vol_s -0.018034854 0.028097249 0.52095591       
    2 vol_avg_cer partial vol_s_avg -0.004199725 0.007290960 0.56460292       
    3 vol_avg_cer    full vol_s_avg -0.042914389 0.029690248 0.14834457      .
    4     vol_cer partial     vol_l -0.082542410 0.045880758 0.07200860      *
    5 vol_avg_cer partial vol_l_avg -0.019998826 0.009507962 0.03543284     **
    6 vol_avg_cer    full vol_l_avg  0.036719128 0.052772355 0.48655263      .
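
    Note that the cutpoints above are this answer's own choice; the significance legend that R prints under regression summaries (via printCoefmat) uses the cutpoints 0, 0.001, 0.01, 0.05, 0.1, 1 instead. As a minimal sketch of how the two schemes differ, the following compares them on the question's p-values (rounded to four decimals here for brevity). The names stars_answer and stars_conventional are made up for illustration, and as.character() is used only to drop the legend attribute that symnum attaches.

    ```r
    # p-values from the question's data frame, rounded for brevity (assumption)
    p <- c(0.5646, 0.0354, 0.1483, 0.4866, 0.5210, 0.0720)

    # Cutpoints as used in the answer above
    stars_answer <- as.character(symnum(p, corr = FALSE,
      cutpoints = c(0, 0.01, 0.05, 0.10, 0.5, 1),
      symbols = c("***", "**", "*", ".", " ")))

    # Cutpoints as used in R's regression summary legend (printCoefmat)
    stars_conventional <- as.character(symnum(p, corr = FALSE,
      cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
      symbols = c("***", "**", "*", ".", " ")))

    # Side-by-side comparison of the two labelling schemes
    print(data.frame(p, stars_answer, stars_conventional))
    ```

    With the conventional cutpoints only 0.0354 gets a star and 0.0720 gets a dot; which scheme is appropriate depends on what the table is meant to communicate.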
    

    If you want to keep the parm column as a character vector, you can convert it back after sorting:

    df <- df %>%
      mutate(signif = symnum(p,
                             cutpoints = c(0, 0.01, 0.05, 0.10, 0.5, 1),
                             symbols = c("***", "**", "*", ".", " ")),
             parm = factor(parm, levels = factor_levels)) %>%
      arrange(parm) %>%
      mutate(parm = as.character(parm))
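
    If dplyr is not available, the same reordering can be sketched in base R. This is an alternative technique, not part of the answer above: match() looks up each parm value's position in factor_levels and order() sorts the rows by that position, so parm stays a character column throughout. The small data frame below is a cut-down stand-in for the question's df (only parm and rounded p), and df_sorted is a hypothetical name.

    ```r
    # Cut-down stand-in for the question's df (assumption: only two columns,
    # p-values rounded to four decimals)
    df <- data.frame(
      parm = c("vol_s_avg", "vol_l_avg", "vol_s_avg", "vol_l_avg", "vol_s", "vol_l"),
      p    = c(0.5646, 0.0354, 0.1483, 0.4866, 0.5210, 0.0720),
      stringsAsFactors = FALSE
    )

    # Desired order of the parm values, as in the answer above
    factor_levels <- c("vol_s", "vol_s_avg", "vol_s_med", "vol_s_end",
                       "vol_l", "vol_l_avg", "vol_l_med", "vol_l_end")

    # match() returns each row's position in factor_levels;
    # order() then sorts the rows by that position, keeping ties in input order
    df_sorted <- df[order(match(df$parm, factor_levels)), ]
    print(df_sorted)
    ```

    Values of parm that do not appear in factor_levels get NA from match() and are placed last by order().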