Search code examples
rdataframemergedatasetmerging-data

After merging two data sets in R, the resulting data frame is missing all the values in one column. What is causing this?


I am working from two Excel sheets. This is my code:

# Read the first dataset
df1 <- read.csv("ATLweather.csv")

# Read the second dataset
df2 <- read.csv("electricityprices.csv")

 # Merge the datasets
 library("dplyr")
 merged_df <- left_join(df1, df2, by = "Timestamp")
  head(merged_df)

 library(writexl)

 # Save the merged data frame as an Excel file
  write_xlsx(merged_df, path = "C:/Users/Nathaniel/Documents/newfile.xlsx")

This successfully merges the data sets, but replaces all the data in my "cost" column with "NA" (pictured below in my environment).

enter image description here

This is my first time doing any coding, so I'm at a loss as to how to fix this. What am I doing wrong?


Solution

  • As @Claudio pointed out, the Timestamp vectors are character vectors that are formatted differently (you can tell they are characters by the "chr" in your screenshot). R doesn't know that characters should be interpreted as dates. You can change them both to date objects and merge them that way, but a simple way would be to convert one format to the other and leave them as characters (sometimes playing with dates in R is more work than it is worth, but see the package lubridate if you want to play with them as dates).

    Here is a quick fix:

    Timestamp<-c("1/1/2010 0:00","1/1/2010 1:00")  ## create example data
    
    Timestamp<-as.POSIXct(Timestamp,format="%m/%d/%Y %H:%M") ## can enter timezone here if you want
    Timestamp<-as.character(Timestamp)
    
    Timestamp  # look at output
    "2010-01-01 00:00:00" "2010-01-01 01:00:00"
    

    Then just change Timestamp here to df2$Timestamp for your purposes.