I'm trying to figure out a good algorithm for embedding names as follows.
space = 0, word = 1, comma = 2, double quotation mark = 3
So "Bob Dylan" should embed as "101"
While "Brown, Millie Bobby" should embed as "120101"
and "Dwayne "The Rock" Johnson" should embed as "103101301"
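I would suggest a very simple solution using regex replacements. First match
\w+
and replace each match with 1. Then match
\s
and replace it with 0, match
,
and replace it with 2, and match
"
and replace it with 3.
Do the \w+ replacement first, because the digits 0, 2 and 3 produced by the later steps would themselves match \w.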
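As a minimal sketch of the replacements described above, here is one way to chain them in Python. The question does not name a language, so Python is an assumption, and the function name `embed` is made up for illustration. Characters outside the four categories (hyphens or apostrophes, for example) are left unchanged by this sketch.

```python
import re


def embed(name: str) -> str:
    """Encode a name: word = 1, space = 0, comma = 2, double quote = 3."""
    # Words go first: the digits written by later steps would match \w.
    out = re.sub(r"\w+", "1", name)
    # Each single whitespace character becomes 0.
    out = re.sub(r"\s", "0", out)
    # Commas and double quotes are literal characters, so str.replace is enough.
    out = out.replace(",", "2")
    out = out.replace('"', "3")
    return out


print(embed("Bob Dylan"))
print(embed("Brown, Millie Bobby"))
print(embed('Dwayne "The Rock" Johnson'))
```

Because `\s` is matched one character at a time, a run of several spaces produces several 0s; use `\s+` instead if consecutive whitespace should collapse into a single 0.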