In R, how to add a column to a data frame based on the contents of the first column?


I have a data frame of just one column that looks like this:

>df

     Sample_Name
1    GW16F1_A-1
2    GW16F1_A-10
3    GW16F1_A-12
4    GW16F2_A-2
5    GW16F2_A-3
6    GW16F2_A-5
7    GW16V1_A-6
8    GW16V1_A-7
9    GW16V2_A-8
10   GW16V2_A-9

I want to append a second column to this data frame based on the contents of the Sample_Name column, so the output would look like this:

>df
     Sample_Name  SampleGroup
1    GW16F1_A-1   F1
2    GW16F1_A-10  F1
3    GW16F1_A-12  F1
4    GW16F2_A-2   F2
5    GW16F2_A-3   F2
6    GW16F2_A-5   F2
7    GW16V1_A-6   V1
8    GW16V1_A-7   V1
9    GW16V2_A-8   V2
10   GW16V2_A-9   V2

Is there a function that will read through the contents of a column and output a new vector based on it?


Solution

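  • substr should be sufficient for this, because in your sample input the group code always occupies characters 5 and 6 of Sample_Name. Note that transform returns a new data frame rather than modifying df in place, so assign the result back to df if you want to keep the new column.

    Try: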

    > transform(df, sampleGroup = substr(df$Sample_Name, 5, 6))
       Sample_Name sampleGroup
    1   GW16F1_A-1          F1
    2  GW16F1_A-10          F1
    3  GW16F1_A-12          F1
    4   GW16F2_A-2          F2
    5   GW16F2_A-3          F2
    6   GW16F2_A-5          F2
    7   GW16V1_A-6          V1
    8   GW16V1_A-7          V1
    9   GW16V2_A-8          V2
    10  GW16V2_A-9          V2
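
If the group code is not always exactly two characters wide, fixed positions in substr will cut it short. Here is a minimal sketch of a pattern-based alternative using base R's sub. It assumes every name starts with a four-character prefix (such as GW16) and that the group code runs up to the first underscore; both are assumptions drawn from the sample data, not something the question guarantees.

```r
# Rebuild the sample data from the question.
df <- data.frame(
  Sample_Name = c("GW16F1_A-1", "GW16F1_A-10", "GW16F1_A-12",
                  "GW16F2_A-2", "GW16F2_A-3", "GW16F2_A-5",
                  "GW16V1_A-6", "GW16V1_A-7", "GW16V2_A-8", "GW16V2_A-9"),
  stringsAsFactors = FALSE
)

# Assumed layout: 4-character prefix, then the group code, then "_".
# The capture group keeps everything between the prefix and the first "_".
df$SampleGroup <- sub("^.{4}([^_]+)_.*$", "\\1", df$Sample_Name)

print(df)
```

On the sample data this yields the same groups as the substr version (F1, F2, V1, V2). Under the same assumptions, a hypothetical name such as GW16F10_A-1 would give F10, where substr(x, 5, 6) would give F1.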