I'm using R to create a data frame with several duplicate columns. I'd like to consolidate all of the duplicated columns into a single column. How to do this in R?
Note 1 : When I build dataframes with multiple columns, R adds numbers to the names of the repeated columns by default.
Note 2: I am looking for code that will work with the columns regardless of their order.
code:
# Create the data frame.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
salary = c(700.3,600.2,721.0,730.5,845.4),
emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
current result
emp_name salary salary.1 emp_name.1
Rick 623.3 700.3 Kevin
Dan 515.20 600.2 Tracey
Michelle 611.00 721.0 Thompson
Ryan 729.00 730.5 Peter
Gary 843.25 845.4 Bevan
Expected output
emp_name salary
Rick 623.3
Dan 515.20
Michelle 611.00
Ryan 729.00
Gary 843.25
Kevin 700.3
Tracey 600.2
Thompson 721.0
Peter 730.5
Bevan 845.4
Use melt
from data.table
library(data.table)
melt(setDT(emp.data), measure = patterns("^emp_name", "salary"),
value.name = c("emp_name", "salary"))[, variable := NULL][]
emp_name salary
1: Rick 623.30
2: Dan 515.20
3: Michelle 611.00
4: Ryan 729.00
5: Gary 843.25
6: Kevin 700.30
7: Tracy 600.20
8: Thompson 721.00
9: Peter 730.50
10: Bevan 845.40
emp.data <- structure(list(emp_name = c("Rick", "Dan", "Michelle", "Ryan",
"Gary"), salary = c(623.3, 515.2, 611, 729, 843.25), salary = c(700.3,
600.2, 721, 730.5, 845.4), emp_name = c("Kevin", "Tracy", "Thompson",
"Peter", "Bevan")), class = "data.frame", row.names = c(NA, -5L
))