
Why does my new data frame contain two extra variables when I combine two data frames with cbind() in R?


I'm working on a data analysis project, but the new data frame I created looks odd.

mycob1 <- read.csv("MYCOB_1.csv")
mycob1
         Date Direction       RFU        Ct
1  Lot_210927         0    6.3588  9.164329
2  Lot_210927         0    5.0394 11.350701
3  Lot_210927         0    4.9946 37.334669
4  Lot_210927         0    4.8604  8.168337
5  Lot_210927         0    4.9032 37.306613
6  Lot_210927         0    4.9502 22.176353
7  Lot_210927         0    4.7858 23.713427
8  Lot_210927         0    5.2778 10.496994
9  Lot_210927         1 1021.8458 32.119668
10 Lot_210927         1 1020.1998 31.500716
11 Lot_210927         1 1065.8000 31.979674
12 Lot_210927         1  988.0452 31.019754
13 Lot_210927         1 1085.2206 31.557973
14 Lot_210927         1 1072.8540 31.745491
15 Lot_210927         1 1020.6496 31.218151
16 Lot_210927         1  983.4106 31.981162
mycob2 <- read.csv("MYCOB_2.csv")
mycob2
         Date Direction       RFU       Ct
1  Lot_211020         0    0.6876 47.72087
2  Lot_211020         0   40.1056 38.37418
3  Lot_211020         0   97.0882 37.72917
4  Lot_211020         0   10.3170 36.18236
5  Lot_211020         0   67.3742 37.39050
6  Lot_211020         0   10.2540 40.16776
7  Lot_211020         0    6.9624 28.07575
8  Lot_211020         0    9.5718 28.84626
9  Lot_211020         0   13.0306 38.87375
10 Lot_211020         1  860.3956 29.15746
11 Lot_211020         1  884.9338 30.03665
12 Lot_211020         1 1552.2462 27.90839
13 Lot_211020         1  738.2328 29.22760
14 Lot_211020         1 1419.6448 29.13627
15 Lot_211020         1 1441.6212 29.35351
16 Lot_211020         1  424.9774 31.56446

mycob12 <- cbind(mycob1, mycob2, by.x = "Lot_210927", by.y = "Lot_211020")
mycob12
         Date Direction       RFU        Ct       Date Direction       RFU       Ct       by.x       by.y
1  Lot_210927         0    6.3588  9.164329 Lot_211020         0    0.6876 47.72087 Lot_210927 Lot_211020
2  Lot_210927         0    5.0394 11.350701 Lot_211020         0   40.1056 38.37418 Lot_210927 Lot_211020
3  Lot_210927         0    4.9946 37.334669 Lot_211020         0   97.0882 37.72917 Lot_210927 Lot_211020
4  Lot_210927         0    4.8604  8.168337 Lot_211020         0   10.3170 36.18236 Lot_210927 Lot_211020
5  Lot_210927         0    4.9032 37.306613 Lot_211020         0   67.3742 37.39050 Lot_210927 Lot_211020
6  Lot_210927         0    4.9502 22.176353 Lot_211020         0   10.2540 40.16776 Lot_210927 Lot_211020
7  Lot_210927         0    4.7858 23.713427 Lot_211020         0    6.9624 28.07575 Lot_210927 Lot_211020
8  Lot_210927         0    5.2778 10.496994 Lot_211020         0    9.5718 28.84626 Lot_210927 Lot_211020
9  Lot_210927         1 1021.8458 32.119668 Lot_211020         0   13.0306 38.87375 Lot_210927 Lot_211020
10 Lot_210927         1 1020.1998 31.500716 Lot_211020         1  860.3956 29.15746 Lot_210927 Lot_211020
11 Lot_210927         1 1065.8000 31.979674 Lot_211020         1  884.9338 30.03665 Lot_210927 Lot_211020
12 Lot_210927         1  988.0452 31.019754 Lot_211020         1 1552.2462 27.90839 Lot_210927 Lot_211020
13 Lot_210927         1 1085.2206 31.557973 Lot_211020         1  738.2328 29.22760 Lot_210927 Lot_211020
14 Lot_210927         1 1072.8540 31.745491 Lot_211020         1 1419.6448 29.13627 Lot_210927 Lot_211020
15 Lot_210927         1 1020.6496 31.218151 Lot_211020         1 1441.6212 29.35351 Lot_210927 Lot_211020
16 Lot_210927         1  983.4106 31.981162 Lot_211020         1  424.9774 31.56446 Lot_210927 Lot_211020

For clarification, "Direction" indicates whether the sample is positive (1) or negative (0). I want to find out whether RFU, Ct, and Direction are correlated, but I can't figure out how. The odd part of the new data frame, "mycob12", is that it has two extra variables at the end called "by.x" and "by.y", and I'm not sure how to get rid of them. Is there a way to remove these variables?

edit: I want to use these data frames to create graphs and explore any patterns among Direction, RFU, and Ct. I've thought about removing the date and just stacking the data frames on top of each other.

Thank you!


Solution

  • I'm not sure exactly what you're trying to do, but looking at your data it seems to make more sense to stack both data frames and then use the Date variable to sort them or tell the lots apart.

    Using shortened versions of your data frames above:

    df1 <- data.frame(Date = c("Lot_210927","Lot_210927","Lot_210927"),
                      Direction = c(0,0,0),
                      RFU = c(6.3588,5.0394,4.9946),
                      Ct = c(9.164329,11.350701,37.334669))
    
    df2 <- data.frame(Date = c("Lot_211020","Lot_211020","Lot_211020"),
                      Direction = c(0,0,0),
                      RFU = c(0.6876,40.1056,97.0882),
                      Ct = c(47.72087,38.37418,37.72917))
    

    You can stack them with bind_rows() from dplyr, which is loaded as part of the tidyverse. Note that it simply places the rows of one data frame below the rows of the other. I'd recommend this only when both data frames have exactly the same column names and data types (numeric, character, etc.); otherwise, use a join such as left_join(), also from the tidyverse.

    library(tidyverse)
    
    df_merged <- bind_rows(df1,df2)
    
    df_merged
            Date Direction     RFU        Ct
    1 Lot_210927         0  6.3588  9.164329
    2 Lot_210927         0  5.0394 11.350701
    3 Lot_210927         0  4.9946 37.334669
    4 Lot_211020         0  0.6876 47.720870
    5 Lot_211020         0 40.1056 38.374180
    6 Lot_211020         0 97.0882 37.729170
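
    If you would rather not load the tidyverse, base R's rbind() can stack the two data frames as well, as long as their column names match. A minimal sketch, assuming the shortened df1 and df2 from above (rebuilt here so the block runs on its own); df_stacked is just a name chosen for illustration:

    ```r
    df1 <- data.frame(Date = rep("Lot_210927", 3),
                      Direction = c(0, 0, 0),
                      RFU = c(6.3588, 5.0394, 4.9946),
                      Ct = c(9.164329, 11.350701, 37.334669))
    df2 <- data.frame(Date = rep("Lot_211020", 3),
                      Direction = c(0, 0, 0),
                      RFU = c(0.6876, 40.1056, 97.0882),
                      Ct = c(47.72087, 38.37418, 37.72917))

    # rbind() on data frames matches columns by name and errors if the
    # names differ, so it is worth checking them first
    stopifnot(identical(names(df1), names(df2)))

    # Rows of df2 are placed below the rows of df1
    df_stacked <- rbind(df1, df2)
    nrow(df_stacked)
    ```

    The Date column stays in the result, so each row still records which lot it came from.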
    

    You could then produce a correlation matrix as follows:

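    # Keep only the numeric columns
    df_num <- df_merged[, c("Direction", "RFU", "Ct")]
    
    # Pairwise Pearson correlations, rounded to two decimals
    df_cor <- round(cor(df_num), 2)
    
    df_cor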
    
              Direction  RFU   Ct
    Direction         1   NA   NA
    RFU              NA 1.00 0.29
    Ct               NA 0.29 1.00
    

    This isolates the numeric variables and computes a correlation matrix for them. Direction shows NA here because it is 0 in all six sample rows, and a correlation with a constant is undefined. It's not very interesting with 6 data points, but with your full dataset, where Direction takes both values, it should be a good starting point.
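
    As for the two extra variables: by.x and by.y are arguments of merge(), not of cbind(). cbind() treats any named argument as one more column to bind, so by.x = "Lot_210927" becomes a column named by.x that holds that string in every row. A small sketch of this behaviour and of two ways around it, using made-up two-row data frames a and b (all names here are illustrative, not from your files):

    ```r
    a <- data.frame(Date = c("Lot_210927", "Lot_210927"),
                    RFU = c(6.3588, 5.0394))
    b <- data.frame(Date = c("Lot_211020", "Lot_211020"),
                    RFU = c(0.6876, 40.1056))

    # The named arguments are bound as extra columns, recycled down the rows
    ab <- cbind(a, b, by.x = "Lot_210927", by.y = "Lot_211020")
    names(ab)

    # Option 1: drop the two columns after the fact
    ab_clean <- ab[, !(names(ab) %in% c("by.x", "by.y"))]

    # Option 2: do not pass them to cbind() in the first place
    ab_plain <- cbind(a, b)
    ```

    Either way, binding side by side leaves you with two Date and two RFU columns, which is why stacking the rows is usually easier to work with for this kind of data.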