Let's say my data has two identical columns called SYC SJ Equity. When I import this into R using the base read.csv()
and setting check.names
to FALSE
, the data gets imported with two columns having identical names. I can then delete the duplicated columns like so:
df <- df %>%
do(.[!duplicated(names(.))])
If I use the read_csv()
function from the readr to import the data, the duplicated column gets the column name in the form of "SYC SJ Equity_1". I can then do the following to delete the duplicated column
df <- df %>%
select(-contains("_"))
However, if the data is in a sheet in a .xlsx
format, and I use the read_excel()
function from the readxl
package, the duplicated column gets imported as:
New names:
* `SYC SJ Equity` -> `SYC SJ Equity...406`
* `SYC SJ Equity` -> `SYC SJ Equity...407`
In this case, both columns get renamed, whereas with read_csv()
, only the extra instance(s) gets renamed. In this case, I have to match the entire string to delete the duplicated column. I do not know what numbers will get assigned to the duplicated columns. I assume it's based on the column index? Furthermore, even with the read_csv()
import, my solution won't work if my columns all naturally have underscores in their names in the raw data. So my question is, how do I delete a duplicated column in the tidyverse with data stored in tibbles? This is easy with base R, with duplicated()
. But this doesn't work with the tibbles of the tidyverse. I understand that dplyr
verbs select by column name and not by column index,so the column names always have to be different. But is there a way to always deleted the duplicated column with tidyverse, without looking at the console to see the new duplicated names?
Building off the answer provided by Ronak, if you want to do this in dplyr
, then you can just use his provided solution with select_if
.
library(dplyr)
df <- data.frame("x" = runif(3),
"SYC SJ Equity...406" = c("a", "a", "b"),
"SYC SJ Equity...407" = c("a", "a", "b"),
"y" = runif(3))
df %>%
select_if(!duplicated(sub("\\.\\.\\..*", "", names(.))))