Let's say I have this dataset.
id headquarter_name subsidiary_name
A Apple Apple
A Apple Inc. Apple
A Apple Computer Apple corp.
B Microsoft MS
B MS MS
For your convenience, I created this dataset here:
I want to make, for each id, just a list of "unique" names distributed across widely like this.
id Name1 Name2 Name3 Name4
A Apple Apple Inc. Apple Computer Apple corp.
B Microsoft MS
It needs to be unique. "Apple" appears many times, but it was included just once, for example.
To make this, the only thing that I could think about was this so far, which, of course, doesn't do what I want to do.
collapse (first) headquarter_name subsidiary_name,by(id)
clear
input str1 id str42 ( headquarter_name subsidiary_name)
A "Apple" "Apple"
A "Apple Inc." "Apple"
A "Apple Computer" "Apple corp."
B "Microsoft" "MS"
B "MS" "MS"
end
stack id headquarter_name id subsidiary_name, into(id name) clear
drop _stack
duplicates drop
bysort id (name) : gen which = _n
reshape wide name, i(id) j(which)
list
+--------------------------------------------------------+
| id name1 name2 name3 name4 |
|--------------------------------------------------------|
1. | A Apple Apple Computer Apple Inc. Apple corp. |
2. | B MS Microsoft |
+--------------------------------------------------------+
A variant on this would add sorting by frequency of mentions before dropping of duplicates.