In R, how to add a column to a data frame based on the contents of the first column?

I have a data frame of just one column that looks like this:

>df

     Sample_Name
1    GW16F1_A-1
2    GW16F1_A-10
3    GW16F1_A-12
4    GW16F2_A-2
5    GW16F2_A-3
6    GW16F2_A-5
7    GW16V1_A-6
8    GW16V1_A-7
9    GW16V2_A-8
10   GW16V2_A-9

I want to append a second column to this data frame based on the contents of the Sample_Name column, so the output would look like this:

>df
     Sample_Name  SampleGroup
1    GW16F1_A-1   F1
2    GW16F1_A-10  F1
3    GW16F1_A-12  F1
4    GW16F2_A-2   F2
5    GW16F2_A-3   F2
6    GW16F2_A-5   F2
7    GW16V1_A-6   V1
8    GW16V1_A-7   V1
9    GW16V2_A-8   V2
10   GW16V2_A-9   V2

Is there a function that will read through the contents of a column and output a new vector based on it?

Solution

substr() should be sufficient here, because in your sample input the group code always occupies characters 5 and 6 of Sample_Name.

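Try the following. Note that transform() returns a new data frame rather than modifying df in place, so assign the result back to df if you want to keep the new column:

> transform(df, sampleGroup = substr(Sample_Name, 5, 6))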
   Sample_Name sampleGroup
1   GW16F1_A-1          F1
2  GW16F1_A-10          F1
3  GW16F1_A-12          F1
4   GW16F2_A-2          F2
5   GW16F2_A-3          F2
6   GW16F2_A-5          F2
7   GW16V1_A-6          V1
8   GW16V1_A-7          V1
9   GW16V2_A-8          V2
10  GW16V2_A-9          V2
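
If the prefix before the group code is not always four characters long, fixed positions will no longer work. One possible alternative is a regular expression with sub(). The sketch below is only an illustration: it assumes the group code is a single uppercase letter followed by one or more digits, sitting directly before the first underscore. That rule is inferred from the sample data, and the name group_pattern is made up for this example.

```r
# Sample data copied from the question.
df <- data.frame(
  Sample_Name = c("GW16F1_A-1", "GW16F1_A-10", "GW16F1_A-12",
                  "GW16F2_A-2", "GW16F2_A-3", "GW16F2_A-5",
                  "GW16V1_A-6", "GW16V1_A-7",
                  "GW16V2_A-8", "GW16V2_A-9"),
  stringsAsFactors = FALSE
)

# Assumed rule (not stated in the question): the group is one uppercase
# letter plus one or more digits, immediately before the first underscore.
group_pattern <- "^[^_]*([A-Z][0-9]+)_.*$"

# sub() replaces the whole match with the captured group (\\1).
# A name that does not match the pattern is returned unchanged,
# so check the result if your real data is less regular.
df$SampleGroup <- sub(group_pattern, "\\1", df$Sample_Name)

print(df)
```

Unlike the transform() call above, this assigns the new column directly into df. For the sample names shown in the question both approaches give the same groups.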