Search code examples
stata

By ID, uniquely expanding multiple columns


Let's say I have this dataset.

id  headquarter_name    subsidiary_name
A     Apple                 Apple
A     Apple Inc.            Apple
A     Apple Computer        Apple corp.
B     Microsoft             MS
B     MS                    MS

For your convenience, I created this dataset here:

I want to make, for each id, just a list of "unique" names distributed across widely like this.

id   Name1        Name2         Name3          Name4
A    Apple      Apple Inc.  Apple Computer   Apple corp.
B    Microsoft     MS

It needs to be unique. "Apple" appears many times, but it was included just once, for example.

To make this, the only thing that I could think about was this so far, which, of course, doesn't do what I want to do.

collapse (first) headquarter_name subsidiary_name,by(id)

Solution

  • clear
    input str1 id  str42 ( headquarter_name    subsidiary_name) 
    A     "Apple"                 "Apple"
    A     "Apple Inc."            "Apple"
    A     "Apple Computer"        "Apple corp."
    B     "Microsoft"             "MS"
    B     "MS"                    "MS"
    end 
    stack id headquarter_name id subsidiary_name, into(id name) clear 
    drop _stack
    duplicates drop
    bysort id (name) : gen which = _n
    reshape wide name, i(id) j(which)
    list 
    
         +--------------------------------------------------------+
         | id   name1            name2        name3         name4 |
         |--------------------------------------------------------|
      1. |  A   Apple   Apple Computer   Apple Inc.   Apple corp. |
      2. |  B      MS        Microsoft                            |
         +--------------------------------------------------------+
    

    A variant on this would add sorting by frequency of mentions before dropping of duplicates.