I have a tricky problem and I'm wondering if there's a clever regex solution. I have input data that consists of two columns, but the first column needs to be split into multiple lines with the second column intact. For example, a file called test:
cat_;_dog_;_rat animal
chair_;_desk object
The output needs to look like this:
cat animal
dog animal
rat animal
chair object
desk object
Each line can contain an arbitrary number of _;_ separators. There is probably a way to do this in a one-liner, which I'd prefer since I'm piping the data in and out. I tried this:
perl -pe 's/(\w+)_;_(\w+)\t(.+)/$1\t$3\n$2\t$3/g' test
The first column has words (\w+) delimited by _;_, then a tab, and then the second column. But this only performs one split per line:
cat animal
dog_;_rat animal
chair object
desk object
I also tried the following, in case the /g (global) modifier wasn't being applied correctly:
perl -pe 's/(\w+)(_;_(\w+))+\t(.+)/$1\t$4\n$3\t$4/g' test
It still only goes one round. Who's got some ideas?
perl -lane 'print "$_ $F[1]" for split /_;_/, $F[0];'
-n reads the input line by line and runs the code for each line;
-l removes the newline from each input line and adds one to each print;
-a splits each input line on whitespace into the @F array;
split /_;_/, $F[0] splits the first column on _;_, and for each resulting value, the code prints it ($_) followed by a space and the second column ($F[1]).