Tags: stata, data-analysis, data-management

Restructuring data so that each observation represents one customer review


I am working in Stata with a dataset on electric vehicle charging stations. The variables include:

station_name: name of the charging station

review_text: all of the customer reviews for that station, delimited by }{

num_reviews: number of customer reviews

I'm trying to make a new file in which each observation holds one customer review in a new variable customer_review, and a variable station_id holds the name of the corresponding station. So if the original dataset had 100 observations (one per station) with 5 reviews each, the new file should have 500 observations.
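Because stations can have different numbers of reviews, the target size of the new file is in general the total of num_reviews across stations (100 × 5 = 500 in the example above). A minimal sketch of computing that target in Stata, assuming num_reviews is accurate; the two stations and their reviews are invented toy data using the question's variable names:

```stata
* Hypothetical toy data in the question's layout (names and reviews are invented)
clear
input str20 station_name str40 review_text num_reviews
"Station A" "{good}{bad}{great}" 3
"Station B" "{poor}{excellent}"  2
end

* One output observation per review, so the target is the sum of num_reviews
quietly summarize num_reviews
display "expected observations: " r(sum)
```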

How can I do this? I would include some code I have tried, but I have no idea how to start.


Solution

  • If your data look like this:

           station              reviews   n  
      1.         1   {good}{bad}{great}   3  
      2.         2    {poor}{excellent}   2  
    
    

    Then the following:

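    * split on the "}{" delimiter into reviews1, reviews2, ...
    split reviews, parse("}{")
    drop reviews n
    * one observation per station-review pair
    reshape long reviews, i(station) j(review_num)
    drop if reviews==""
    * strip the leftover leading "{" and trailing "}"
    replace reviews = subinstr(reviews, "}","",.)
    replace reviews = subinstr(reviews, "{","",.)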
    

    will produce:

           station   review_num     reviews  
      1.         1            1        good  
      2.         1            2         bad  
      3.         1            3       great  
      4.         2            1        poor  
      5.         2            2   excellent
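Applied to the variable names in the question, the same split/reshape steps might look like the sketch below. It assumes station_name uniquely identifies each observation and that every review is wrapped in braces as in the example; the toy data are invented. It also renames the results to station_id and customer_review, as the question asks, and checks that the row counts match num_reviews.

```stata
* Hypothetical toy data using the question's variable names
clear
input str20 station_name str40 review_text num_reviews
"Station A" "{good}{bad}{great}" 3
"Station B" "{poor}{excellent}"  2
end

* Target number of rows after reshaping: one per review
quietly summarize num_reviews
local expected = r(sum)

* Split on "}{" into review_text1, review_text2, ...; drop the original string
split review_text, parse("}{")
drop review_text

* One observation per station-review pair (num_reviews is constant within station)
reshape long review_text, i(station_name) j(review_num)
drop if review_text == ""

* Strip the leading "{" and trailing "}" left over from splitting
replace review_text = subinstr(review_text, "{", "", .)
replace review_text = subinstr(review_text, "}", "", .)

rename station_name station_id
rename review_text customer_review

* Sanity checks: total rows, and rows per station versus num_reviews
assert _N == `expected'
bysort station_id: assert _N == num_reviews
```

Here num_reviews is kept rather than dropped so it can serve as a check; if a station's stored count disagrees with the number of }{-delimited reviews, the last assert will stop with an error.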