r bioinformatics seurat

Convert character string and data frame into another data frame


I have a one row data frame that looks like this:

            Donor  Treatment Timepoint
  MK434_016   WT5 ST002_50uM       6hr

And a character vector that looks like this:

[1] "AAACAAGCAAACAAGAATTCGGTT-1" "AAACAAGCAAACAATCATTCGGTT-1" "AAACAAGCAAACCTGAATTCGGTT-1" "AAACAAGCAAACTTGGATTCGGTT-1"
[5] "AAACAAGCAAAGACCCATTCGGTT-1" "AAACAAGCAAAGGTAAATTCGGTT-1"

I'd like to merge the two to create a data frame that looks like this:

                           Donor  Treatment Timepoint
AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
etc...

I've tried merging them in several different ways using rbind() or paste() but can't figure out how to get the full data frame I'm looking for.


Solution

  • I'll first join them with the barcodes in an ordinary column rather than as row names, since some tools honor row names, some ignore them, and some actively remove them.

    df2 <- cbind(df1[rep(1, length(strings)),], data.frame(barcode = strings))
    df2
    #             Donor  Treatment Timepoint                    barcode
    # MK434_016     WT5 ST002_50uM       6hr AAACAAGCAAACAAGAATTCGGTT-1
    # MK434_016.1   WT5 ST002_50uM       6hr AAACAAGCAAACAATCATTCGGTT-1
    # MK434_016.2   WT5 ST002_50uM       6hr AAACAAGCAAACCTGAATTCGGTT-1
    # MK434_016.3   WT5 ST002_50uM       6hr AAACAAGCAAACTTGGATTCGGTT-1
    # MK434_016.4   WT5 ST002_50uM       6hr AAACAAGCAAAGACCCATTCGGTT-1
    # MK434_016.5   WT5 ST002_50uM       6hr AAACAAGCAAAGGTAAATTCGGTT-1
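
    To illustrate why keeping the barcodes in a column can be the safer default, here is a minimal sketch using base R's merge(), which assigns fresh row names to its result. The lookup table and its Batch column are hypothetical, invented only for this illustration, and the data is a shortened copy of the example above.

    ```r
    # Shortened copy of the example data (two barcodes only)
    df1 <- data.frame(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr",
                      row.names = "MK434_016", stringsAsFactors = FALSE)
    strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1")

    # Hypothetical per-donor lookup table, not part of the original question
    lookup <- data.frame(Donor = "WT5", Batch = "batch_A", stringsAsFactors = FALSE)

    # Variant A: barcodes stored as row names
    as_rownames <- df1[rep(1, length(strings)), ]
    rownames(as_rownames) <- strings
    merged_a <- merge(as_rownames, lookup, by = "Donor")
    rownames(merged_a)  # the barcodes are gone; merge() numbers the rows afresh

    # Variant B: barcodes stored in an ordinary column
    as_column <- cbind(df1[rep(1, length(strings)), ],
                       data.frame(barcode = strings, stringsAsFactors = FALSE))
    merged_b <- merge(as_column, lookup, by = "Donor")
    merged_b$barcode    # the barcodes are still available after the merge
    ```

    Other functions treat row names differently, so this only shows one case where they are dropped.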
    

    From here, if you really want to move the barcodes out of the column and into the row names, it is simple enough:

    rownames(df2) <- df2$barcode
    df2$barcode <- NULL
    df2
    #                            Donor  Treatment Timepoint
    # AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr
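
    If the intermediate barcode column is not needed at all, the same result can probably be reached more directly by repeating the row and assigning the row names in place. This is a sketch rather than a replacement for the approach above; the name df3 and the uniqueness check are additions of mine, included because data frame row names must be unique.

    ```r
    # Same example data as in the Data section
    df1 <- data.frame(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr",
                      row.names = "MK434_016", stringsAsFactors = FALSE)
    strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1",
                 "AAACAAGCAAACCTGAATTCGGTT-1", "AAACAAGCAAACTTGGATTCGGTT-1",
                 "AAACAAGCAAAGACCCATTCGGTT-1", "AAACAAGCAAAGGTAAATTCGGTT-1")

    # Row names must be unique, so stop early if any barcode is duplicated
    stopifnot(anyDuplicated(strings) == 0)

    # Repeat the single row once per barcode, then overwrite the row names
    df3 <- df1[rep(1, length(strings)), ]
    rownames(df3) <- strings
    df3
    ```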
    

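    A quick dplyr version (it also needs the tibble package; the `rownames<-`(NULL) step clears the existing row names, because column_to_rownames() refuses a data frame that already has them):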

    library(dplyr)
    df1[rep(1, length(strings)),] %>%
      `rownames<-`(NULL) %>%
      mutate(barcode = strings) %>%
      tibble::column_to_rownames("barcode")
    #                            Donor  Treatment Timepoint
    # AAACAAGCAAACAAGAATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACAATCATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACCTGAATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAACTTGGATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAAGACCCATTCGGTT-1   WT5 ST002_50uM       6hr
    # AAACAAGCAAAGGTAAATTCGGTT-1   WT5 ST002_50uM       6hr
    

    Data

    df1 <- structure(list(Donor = "WT5", Treatment = "ST002_50uM", Timepoint = "6hr"), class = "data.frame", row.names = "MK434_016")
    strings <- c("AAACAAGCAAACAAGAATTCGGTT-1", "AAACAAGCAAACAATCATTCGGTT-1", "AAACAAGCAAACCTGAATTCGGTT-1", "AAACAAGCAAACTTGGATTCGGTT-1", "AAACAAGCAAAGACCCATTCGGTT-1", "AAACAAGCAAAGGTAAATTCGGTT-1")