
Binding two data frames with a "Date" column produces different results in RStudio on different machines


I am creating a new data frame by binding together two data frames obtained from two different GitHub repositories. Both datasets have a Date column. On my machine everything works, and I can bind the data frames with either rbind() or bind_rows().
Another user ran the same code and got a different result: the Date column is split in two. The dates from the first data frame are in the first column (Date), while the dates from the second data frame are placed in a new column at the end of the data frame, called X.U.FEFF.Date, which I never created.
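The odd name X.U.FEFF.Date is consistent with what make.names() produces (read.csv() applies it to headers through its default check.names = TRUE) when a header starts with the Unicode byte-order mark U+FEFF. That only holds if the character reaches make.names() as the text "<U+FEFF>", which is an assumption about how it is rendered in the other user's locale (for example a non-UTF-8 Windows locale). A minimal sketch of that last step:

```r
# Assumption: on the other user's system the BOM at the start of the second
# file's header reached read.csv() as the text "<U+FEFF>Date".
raw_header <- "<U+FEFF>Date"

# make.names() prepends "X" because "<" cannot start a syntactic name,
# then replaces each of "<", "+" and ">" with "."
clean_header <- make.names(raw_header)
print(clean_header)
#> [1] "X.U.FEFF.Date"
```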

Here is the code I used:

library(dplyr)
library(RCurl)

setwd(dir = "YOUR_WORKING_DIRECTORY")

#####===== FIRST DATAFRAME =====#####
cases <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/openZH/covid_19/master/COVID19_Cases_Cantons_CH_total.csv"),
                  header = TRUE,
                  stringsAsFactors = FALSE,
                  na.strings = c("", "NA"),
                  encoding = "UTF-8")

# Remove the rows for Switzerland as a whole (CH) and Liechtenstein (FL), keeping only date, canton and tested_pos
cases <- subset(x = cases,
                !is.element(el = canton,
                            set = c("CH", "FL")),
                select = c("date",
                           "canton",
                           "tested_pos"))

# Rename "date" to "Date" to match the second dataset
names(cases)[1] <- "Date"

# Reshape to wide format (one column per canton) to match the layout of the second dataset
cases <- reshape(data = cases,
                 idvar = "Date",
                 timevar = "canton",
                 v.names = "tested_pos",
                 direction = "wide")

# Drop the "tested_pos." prefix so each column is named after its canton
names(cases) <- gsub(pattern = "tested_pos.",
                     replacement = "",
                     x = names(cases),
                     fixed = TRUE)

cases[is.na(cases)] <- 0

cases <- cases[order(cases$Date,
                     decreasing = FALSE), ]

#####===== SECOND DATAFRAME =====#####
cases2 <- read.csv(text = getURL(url = "https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland.csv"),
                   header = TRUE,
                   stringsAsFactors = FALSE,
                   na.strings = c("", "NA"),
                   encoding = "UTF-8")

# Remove total daily cases for Switzerland
cases2 <- subset(x = cases2,
                 select = -c(CH))

# Stack the first 7 rows of cases on top of cases2 (bind_rows() matches columns by name)
cases_tot <- bind_rows(cases[1:7, ],
                       cases2)

write.csv(x = cases_tot,
          file = paste0(getwd(),
                        "/cases_tot.csv"),
          row.names = FALSE,
          quote = FALSE)

For the other user, rbind() fails with an error, while bind_rows() produces the split Date column described above. I don't know how to fix this because I can't reproduce it on my machine.
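Both symptoms fit how the two functions match columns by name. A minimal sketch with toy data frames: a and b are hypothetical stand-ins for cases[1:7, ] and cases2, with b's date column already carrying the mangled name.

```r
library(dplyr)

# Hypothetical stand-ins: b's first column has the mangled name instead of "Date"
a <- data.frame(Date = c("2020-03-01", "2020-03-02"), ZH = c(1, 2))
b <- data.frame(X.U.FEFF.Date = "2020-03-03", ZH = 3)

# rbind() requires the same column names and stops when they differ
rbind_result <- tryCatch(rbind(a, b), error = function(e) e)
print(conditionMessage(rbind_result))  # "names do not match previous names"

# bind_rows() keeps every column it sees and fills the gaps with NA,
# so the dates end up split across "Date" and "X.U.FEFF.Date"
combined <- bind_rows(a, b)
print(names(combined))  # "Date" "ZH" "X.U.FEFF.Date"
```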

Any idea what is causing this issue? Thanks a lot.
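One way to test the BOM explanation is to inspect the first bytes of the downloaded text before parsing it. The helpers has_utf8_bom() and strip_utf8_bom() below are hypothetical (not part of RCurl or base R); they are a minimal base-R sketch that looks for the UTF-8 BOM bytes EF BB BF and removes them, so read.csv() sees a plain Date header.

```r
# Hypothetical helper: TRUE if a string starts with the UTF-8 BOM bytes EF BB BF
has_utf8_bom <- function(txt) {
  bytes <- charToRaw(txt)
  length(bytes) >= 3 && identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Hypothetical helper: drop the BOM bytes and leave the rest of the text untouched
strip_utf8_bom <- function(txt) {
  if (has_utf8_bom(txt)) rawToChar(charToRaw(txt)[-(1:3)]) else txt
}

# Toy CSV text that starts with a BOM, standing in for the downloaded file
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_with_bom <- rawToChar(c(bom, charToRaw("Date,ZH\n2020-03-01,1\n")))

print(has_utf8_bom(csv_with_bom))  # TRUE
df <- read.csv(text = strip_utf8_bom(csv_with_bom))
print(names(df))                   # "Date" "ZH"
```

With the question's code, the check would be applied to the string returned by getURL() for the second URL. Whether it returns TRUE depends on the file as it is currently served.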


Solution

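  • As suggested in a comment:

    Replace read.csv() with read_csv() from the readr package for more robust CSV parsing. The X.U.FEFF.Date column name points to a Unicode byte-order mark (BOM, U+FEFF) at the start of the second CSV file. read.csv() does not remove it (its encoding = "UTF-8" argument only marks strings as UTF-8), so on the other user's system the BOM stayed in the first header, the column was no longer called Date, and the two data frames no longer had matching column names. read_csv() skips the BOM when reading, so the first column is named Date.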
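A minimal sketch of the fix, assuming the readr package is installed. It writes a small CSV that starts with the BOM bytes to a temporary file (toy data, not the real repository file) and reads it back with read_csv(). It also reads the same file with base R's "UTF-8-BOM" file encoding, which likewise removes the mark when reading from a file.

```r
library(readr)

# Toy CSV in a temporary file, starting with the UTF-8 BOM bytes EF BB BF
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Date,ZH\n2020-03-01,1\n2020-03-02,2\n"), con)
close(con)

# readr skips the BOM, so the first column is plain "Date"
via_readr <- read_csv(tmp)
print(names(via_readr))  # "Date" "ZH"

# Base-R alternative for files: the "UTF-8-BOM" encoding strips the mark
via_base <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(via_base))   # "Date" "ZH"
```

read_csv() parses ISO dates such as 2020-03-01 into class Date. Both data frames should therefore be read the same way, or the column converted with as.Date(), before calling bind_rows(). Recent dplyr versions refuse to combine a character column with a Date column. read_csv() also accepts an https URL directly, so the getURL() step is not strictly needed.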