Search code examples
rexcelr-xlsx

Import excel with duplicate columns that end with a number (r xlsx)


I have an xlsx file that contains let's say 4 columns:

Thingie1
Thingie1
Thingie2
Thingie2

(I know it's stupid... it's because I am combining data from 2 different databases, and the 1 and 2 denote a difference between survey 1 and survey 2 and chose to dump out into a xlsx file.

When I import this into R using the xlsx package, it changes the column titles to:

Thingie1
Thingie3
Thingie2
Thingie4

Obviously my real-world example is much more complex and it then becomes very difficult to identify what is what. I would like to have something like:

Thingie1
Thingie11
Thingie2
Thingie21

Now, I can resolve this by opening the file in excel and changing the columns there... so I can solve my immediate problem, but I hate breaking my workflow... would there be some way to import this data in a better way into R itself?


Solution

  • I think you should read the first line (containing column names) alone of your file with header=FALSE for proper naming of columns. Then use make.unique function upon it. Later read the whole file with header=TRUE and set the column names using colnames function.

    x <- c("Thingie1", "Thingie1", "Thingie2", "Thingie2")
    x
    #[1] "Thingie1" "Thingie1" "Thingie2" "Thingie2"
    make.unique(x)
    #[1] "Thingie1"   "Thingie1.1" "Thingie2"   "Thingie2.1"