Tags: r, dplyr, pipeline, cbind

How do I move the bottom half of a column's values into a newly created column?


I have a column that contains the means of three different measurements in the first half of the rows and the associated standard errors in the second half. The preceding column holds the name of each one (meanNativeSR, meanExoticSR, meanTotalSR, seN, seE, seT). I want to create two new columns, the first holding the se names and the second holding their values, and then drop the bottom half of the rows. Here is my dataset:

df <- structure(list(Invasion = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), .Label = c("Low", "Medium", "High"), class = "factor"), Growth = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L), .Label = c("cover", "herb", "woody"), class = "factor"), 
    mean_se = c("meanNativeSR", "meanNativeSR", "meanNativeSR", 
    "meanNativeSR", "meanNativeSR", "meanNativeSR", "meanNativeSR", 
    "meanNativeSR", "meanNativeSR", "meanExoticSR", "meanExoticSR", 
    "meanExoticSR", "meanExoticSR", "meanExoticSR", "meanExoticSR", 
    "meanExoticSR", "meanExoticSR", "meanExoticSR", "meanTotalSR", 
    "meanTotalSR", "meanTotalSR", "meanTotalSR", "meanTotalSR", 
    "meanTotalSR", "meanTotalSR", "meanTotalSR", "meanTotalSR", 
    "seN", "seN", "seN", "seN", "seN", "seN", "seN", "seN", "seN", 
    "seE", "seE", "seE", "seE", "seE", "seE", "seE", "seE", "seE", 
    "seT", "seT", "seT", "seT", "seT", "seT", "seT", "seT", "seT"
    ), value = c(0.769230769230769, 0.230769230769231, 0.923076923076923, 
    2.46153846153846, 6.84615384615385, 0.538461538461538, 1.69230769230769, 
    1.76923076923077, 1.15384615384615, 0.384615384615385, 0, 
    1.38461538461538, 1.76923076923077, 0, 2.23076923076923, 
    2.07692307692308, 0.769230769230769, 2.46153846153846, 1.15384615384615, 
    0.230769230769231, 2.53846153846154, 4.23076923076923, 6.84615384615385, 
    3.23076923076923, 3.76923076923077, 2.76923076923077, 3.84615384615385, 
    0.280883362823162, 0.12162606385263, 0.329364937914491, 0.312463015562922, 
    0.705710715103738, 0.24325212770526, 0.36487819155789, 0.280883362823162, 
    0.191021338791684, 0.140441681411581, 0, 0.180400606147055, 
    0.201081886427668, 0, 0.230769230769231, 0.329364937914491, 
    0.12162606385263, 0.24325212770526, 0.273771237231572, 0.12162606385263, 
    0.24325212770526, 0.394738572265145, 0.705710715103738, 0.440772139427464, 
    0.532938710021193, 0.257050482766198, 0.336767321450351)), row.names = c(NA, 
-54L), class = c("tbl_df", "tbl", "data.frame"))

I was able to get what I wanted with the code below, but surely there is a more elegant way, since this approach forced me to create unnecessary intermediate objects.

#create an intermediate data.frame that contains just the means and their values from the first half of original df
df.mean <- head(df, -27)
#rename columns 3 and 4
colnames(df.mean)[3] <- "mean"
colnames(df.mean)[4] <- "mean_value"


#create another intermediate data.frame with standard error values from the bottom half of original df
df.se <- df[28:54,]
#rename columns 3 and 4
colnames(df.se)[3] <- "se"
colnames(df.se)[4] <- "se_value"


#cbind those together to get desired result
df.final <- cbind(df.mean, df.se[,3:4])

#remove intermediates
rm(df.mean); rm(df.se)

Is there a simpler way to accomplish this, perhaps using pipes or some functions in the tidyverse, or even with base R?


Solution

  • I don't think there is a much shorter or easier way to accomplish this than what you already have, apart from combining the steps. The longest part of your code is assigning the new column names, which can't really be shortened. The rest can be put into a single line. But you always have to balance terseness against readability.

    The dplyr methods in the other answers are really neat, but I believe they are meant for more complex or general cases than yours.

    df_final_2 <- cbind(head(df, -27), df[28:54,3:4])
    colnames(df_final_2)[3:6] <- c("mean", "mean_value","se", "se_value")
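
Since the question also asks about pipes, here is a rough dplyr equivalent of the same split-and-bind idea, as a sketch rather than a definitive answer. The helper name `halves_to_wide` and the small `toy` data frame are made up for illustration; only the column names `mean_se` and `value` come from the question. It assumes dplyr 1.0.0 or later (for `slice_head()` and `slice_tail()`) and an even number of rows.

```r
library(dplyr)

# Hypothetical helper: puts the bottom half of `mean_se`/`value`
# next to the top half, renaming the columns along the way.
halves_to_wide <- function(dat) {
  n_half <- nrow(dat) / 2  # assumes an even number of rows
  bind_cols(
    # top half: the means, keeping every column
    dat %>%
      slice_head(n = n_half) %>%
      rename(mean = mean_se, mean_value = value),
    # bottom half: the standard errors, keeping only name and value
    dat %>%
      slice_tail(n = n_half) %>%
      select(se = mean_se, se_value = value)
  )
}

# Toy stand-in with the same layout as `df`: means on top, SEs below
toy <- data.frame(
  Invasion = rep(c("Low", "High"), times = 2),
  Growth   = "cover",
  mean_se  = rep(c("meanNativeSR", "seN"), each = 2),
  value    = c(0.77, 1.69, 0.28, 0.36)
)

toy_wide <- halves_to_wide(toy)
```

With the data from the question this would be `df_final_3 <- halves_to_wide(df)`. Like the `cbind()` version, it pairs rows purely by position, so it only gives the right result if the standard errors appear in the same order as the means.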