I am cleaning a large dataset and column 1 has several spellings for company name. A company ID in Column 2 is the same across all company spellings, so I would like to search on company ID and replace all company spellings with the first spelling occurrence (it is not that important which one).
I am new to R and data.table syntax, but have tried to iterate through company IDs and use setDT to replace the company name values. I cannot, however, figure out how to replace with the first instance of the company name spelling for only those companies sharing the same company ID.
This is where I am so far:
library(data.table)
#Sample Datatable
CompanyDT <- data.table(Company_Name=c("Froklo","Forklo","Forlko","Corp3","Grap","Garp"), Company_ID=c(1,1,1,2,3,3))
# Company_Name Company_ID
#1: Froklo 1
#2: Forklo 1
#3: Forlko 1
#4: Corp3 2
#5: Grap 3
#6: Garp 3
Loop I am working on
for(j in CompanyDT[,.(Company_ID)])
FirstFacName <- CompanyDT[Company_ID[j], Company_Name]
setDT(CompanyDT)[, Company_Name:=FirstFacName]
I want this result, with the first spelling instance replacing the names of all companies with the same company ID:
# Company_Name Company_ID
#1: Froklo 1
#2: Froklo 1
#3: Froklo 1
#4: Corp3 2
#5: Grap 3
#6: Grap 3
but I get this result, with the first company name in the table replacing all company names, regardless of company ID:
# Company_Name Company_ID
#1: Froklo 1
#2: Froklo 1
#3: Froklo 1
#4: Froklo 2
#5: Froklo 3
#6: Froklo 3
The dplyr
way would be:
library(dplyr)
CompanyDT %>%
group_by(Company_ID) %>%
mutate(Company_Name_new = first(Company_Name))
# A tibble: 6 x 3
# Groups: Company_ID [3]
Company_Name Company_ID Company_Name_new
<chr> <dbl> <chr>
1 Froklo 1 Froklo
2 Forkslo 1 Froklo
3 Forlko 1 Froklo
4 Corp3 2 Corp3
5 Grap 3 Grap
6 Garp 3 Grap