SAS: How to spot Regex patterns in space separated values and split them into multiple columns

I have a data set that goes like this:

DATA hogwarts;
INFILE DATALINES delimiter="," dsd;
LENGTH Index Wizards $ 255;
INPUT Index Wizards $ Points;
DATALINES;
1,'Harry Potter Ron Weasley',100
2,'Hermione Granger Harry Potter',200
3,'Ron Weasley',300
;
RUN;

In other words it looks like this:

Index	Wizards	Points
1	Harry Potter Ron Weasley	100
2	Hermione Granger Harry Potter	200
3	Ron Weasley	300

I need to spot regex patterns stored in a wizards data set that looks like this:

Name
Harry Potter
Ron Weasley
Hermione Granger

You can generate it with:

DATA wizards;
INPUT Name $64.;
DATALINES;
Harry Potter
Ron Weasley
Hermione Granger
;
RUN

and output a want data set that looks like this:

Index	Wizards	Points	Wizard1	Wizard2
1	Harry Potter Ron Weasley	100	Harry Potter	Ron Weasley
2	Hermione Granger Harry Potter	200	Hermione Granger	Harry Potter
3	Ron Weasley	300	Ron Weasley	.

The result in Wizard1,Wizard2...WizardN needs to be positional: e.g. at line 2 of hogwarts Hermione Granger pattern appears first in the Wizards column then appears Harry Potter so the value of Wizard1 in want needs to be Hermione Granger.

Also patterns in wizards can be composed of one, two or more than two words, while Wizards is a multival columns in the hogwarts where values are separated by spaces so we cannot count of the fact the number of space between each pattern to be fix.

Plus the number of Wizard1-WizardN columns is not predetermined.

Does someone have a clue?

Many thanks in advance,

Solution

You can use regular expressions to detect names on a word boundary in the wizards list. The PRXMATCH function will return the position in the list where a match is made, and that position is used to sort a table of all matches. Finally, TRANSPOSE can pivot your table of matches into the 'across' structure you want.

Example:

The data crossing is an outer join, which means every row of hogwarts has some comparison to every row of wizards. In the example the wizards are stored in a temporary array, as are the corresponding regular expression used to test for asserting a positive match.

DATA hogwarts;
INFILE DATALINES delimiter="," dsd;
LENGTH Index 8 Wizards $ 255;
INPUT Index Wizards $ Points;
DATALINES;
1,'Harry Potter Ron Weasley',100
2,'Hermione Granger Harry Potter',200
3,'Ron Weasley',300
;
DATA wizards;
INPUT Name $64.;
DATALINES;
Harry Potter
Ron Weasley
Hermione Granger
;

proc sql;
  select count(name) into :name_count from wizards ;
data found ;
  array wzname [&name_count] $64 _temporary_ ;
  array rx [&name_count] _temporary_ ;

  if 0 then set hogwarts; /* prep PDV */

  if _n_ = 1 then do _n_ = 1 by 1 until (end_wizards);
    set wizards end=end_wizards ;
    wzname [_n_] = name ;
    rx[_n_] = prxparse('m/\b'||trim(name)||'\b/i') ;
  end ;
  
  set hogwarts ;

  do _n_ = 1 to dim(rx) ;
    pos = prxmatch (rx[_n_],wizards) ;
    if pos then do ;
      name = wzname[_n_] ;
      output ;
    end ;
  end ;
run ;
proc sort data=found ;
  by index pos ;
run ;
proc transpose data=found out=want(drop=_name_) prefix=wizard;
  by index wizards points ;
  var name ;
run ;