Using RStudio, I have a table of raw data from DNA size distribution plots for hundreds of samples. The RFU readings (y values) are arranged in one column per sample, with the shared sizes (x values) in a separate column - see below.
Size distribution graph example for visualisation
Example data: (made up values just to show the format of the table)
sample001_rfu | sample002_rfu | sample003_rfu | size_bp |
---|---|---|---|
5678 | 4567 | 3456 | 1000 |
8901 | 7890 | 6789 | 5000 |
10234 | 10123 | 10010 | 10000 |
12356 | 12345 | 11234 | 15000 |
15678 | 14567 | 13445 | 20000 |
13890 | 16589 | 15624 | 25000 |
10987 | 13425 | 17245 | 30000 |
8902 | 11323 | 15428 | 35000 |
6513 | 8919 | 12879 | 40000 |
4178 | 6528 | 10256 | 45000 |
3213 | 4380 | 8621 | 50000 |
I am trying to find the maximum y value (RFU) for all samples (i.e. max value in each column) and report the corresponding x value (size) which will be used for downstream automated sample processing planning.
So, in the table above, the expected result would be 20000 for sample001, 25000 for sample002 and 30000 for sample003.
I have used the following to do this for one sample:
    df$size_bp[which.max(df$sample001_rfu)]
However, I cannot seem to find a solution to repeat this for each sample_rfu (column) in the table without manually replacing the sample id in the code above. I would then like to store these values and their sample IDs (column header) as a list which will later be compared against different processing thresholds.
Any suggestions would be greatly appreciated!
Here is another tidyverse approach:
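    library(dplyr)
    library(tidyr)

    df %>%
      pivot_longer(-size_bp) %>%   # long format: one row per (size, sample) pair
      group_by(name) %>%           # name holds the original column headers
      slice_max(value, n = 1)      # keep the row(s) with the highest RFU per sample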

      size_bp name          value
        <int> <chr>         <int>
    1   20000 sample001_rfu 15678
    2   25000 sample002_rfu 16589
    3   30000 sample003_rfu 17245
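
If you would rather stay in base R and end up with a named list, as the question asks for, something along these lines might also work. This is only a sketch: it rebuilds the example table from the question, assumes the shared x column is called `size_bp` and that every other column is a sample, and the names `rfu_cols`, `peak_size` and `peak_list` are made up for illustration.

```r
# Rebuild the example table from the question (values copied from it).
df <- data.frame(
  sample001_rfu = c(5678, 8901, 10234, 12356, 15678, 13890, 10987, 8902, 6513, 4178, 3213),
  sample002_rfu = c(4567, 7890, 10123, 12345, 14567, 16589, 13425, 11323, 8919, 6528, 4380),
  sample003_rfu = c(3456, 6789, 10010, 11234, 13445, 15624, 17245, 15428, 12879, 10256, 8621),
  size_bp       = c(1000, seq(5000, 50000, by = 5000))
)

# Assumption: every column other than size_bp holds RFU values for one sample.
rfu_cols <- setdiff(names(df), "size_bp")

# For each RFU column, look up the size at the row holding the maximum.
# which.max() returns the first position if the maximum is tied.
peak_size <- sapply(df[rfu_cols], function(rfu) df$size_bp[which.max(rfu)])

# Named list: sample ID (column header) -> peak size in bp.
peak_list <- as.list(peak_size)
```

`peak_size` is a named numeric vector, so a threshold check such as `peak_size[peak_size > 20000]` keeps the sample IDs attached, while `peak_list$sample001_rfu` gives the value for a single sample.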