How to split a string into only two parts (and not discard other parts)

Say that I have these data:

clear all
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2
tempfile data
save `data'

I can split these into three parts:

use `data', clear
split title, p(" - ")

And I can split them into two parts, discarding the third part:

use `data', clear
split title, p(" - ") limit(2)

Is there an off-the-shelf solution to split into only two parts, but to group everything after the first splitting character (dash in this case) into the second variable? In R, I would use separate with the extra="merge" option (see tidyr separate only first n instances).

In other words, for the first observation, I would like title1 to be dog and title2 to be cat - horse.

I realize that this is possible using custom code (see Stata split string into parts), but I am hoping for a simple command along the lines of Stata's split/R's separate to accomplish my goal.

Solution

This isn't at present an option in the official split command. (Full disclosure: I was the previous author.)

You could just write your own command. This one needs more generality and more error checks, but it does what I think you want with your data example. Detail: is trimming spaces desired?

clear all
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2

* direct approach: cut at the first "-" and trim the surrounding spaces
gen title1 = trim(substr(title, 1, strpos(title, "-") - 1))
gen title2 = trim(substr(title, strpos(title, "-") + 1, .))

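* split2: split a string variable at the first occurrence of parse()
program split2
    syntax varname(string), parse(str) [suffixes(numlist int min=2 max=2)]

    * default suffixes name the new variables varname1 and varname2
    if "`suffixes'" == "" local suffixes "1 2"
    tokenize "`suffixes'"

    * text before the first separator
    gen `varlist'`1' = trim(substr(`varlist', 1, strpos(`varlist', "`parse'") - 1))
    * everything after the first separator
    gen `varlist'`2' = trim(substr(`varlist', strpos(`varlist', "`parse'") + strlen("`parse'"), .))
end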

split2 title, parse("-") suffixes(3 4)

list 
    
     +--------------------------------------------------------------------------------+
     |                    title    title1           title2    title3           title4 |
     |--------------------------------------------------------------------------------|
  1. |        dog - cat - horse       dog      cat - horse       dog      cat - horse |
  2. | chicken - frog - ladybug   chicken   frog - ladybug   chicken   frog - ladybug |
     +--------------------------------------------------------------------------------+
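
One of the error checks mentioned above concerns values that do not contain the separator at all. In that case strpos() returns 0, so the second new variable would hold the whole string rather than being empty. The following is only a sketch of one way to guard against that, not part of the command above: the variable names pos, first and rest, and the extra observation "goldfish", are made up for illustration.

```stata
* Sketch only: pos, first, rest and the third observation are illustrative
clear
set obs 3
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2
replace title = "goldfish" in 3

* position of the first separator; 0 means it does not occur
gen pos = strpos(title, "-")

* no separator: keep the whole string in the first part, leave the rest empty
gen first = cond(pos > 0, trim(substr(title, 1, pos - 1)), trim(title))
gen rest  = cond(pos > 0, trim(substr(title, pos + 1, .)), "")
drop pos

list
```

With this guard the third observation keeps goldfish in first and has an empty rest. Whether that is the right behaviour, or whether such values should be flagged as errors instead, depends on the data.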

Note also the egen function ends() with its head and tail options. It generates just one variable at a time, so using it here would need two calls.
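
For the data example, the two egen calls might look like the sketch below. It assumes the punct() and trim options of ends() behave as described in the egen help; the new variable names title_head and title_tail are arbitrary choices for illustration.

```stata
* Sketch only: title_head and title_tail are illustrative names
clear
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2

* head: the part before the first "-"; trim removes surrounding spaces
egen title_head = ends(title), punct("-") trim head

* tail: everything after the first "-"
egen title_tail = ends(title), punct("-") trim tail

list
```

Using "-" with trim, rather than " - " as the punctuation, mirrors the trim() calls in the code above and avoids relying on how spaces inside punct() are handled.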