
Regex to split or find dictionary-like elements in text


I have a string like this:

"Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"

and I need to output something like this:

[ 'Name: Abcde fghijk,' , 
'College: so and so college, somewhere, on earth' , 
'Department: I Dont Know,' , 
'Designation: still to be decided' ]

I've been trying to write a regex that finds or splits the elements this way, for example:

r"[^\s]*:.*?,"

which gets me this far:

['Name: Abcde fghijk,','College: so and so college,','Department: I Dont Know,']

but it misses some parts: "somewhere, on earth" and "Designation: still to be decided".
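For reference, here is a minimal sketch using Python's standard re module and the sample string from above that reproduces the attempt, to show why those parts go missing.

```python
import re

inp = ("Name: Abcde fghijk, College: so and so college, somewhere, on earth "
       "Department: I Dont Know, Designation: still to be decided")

# The attempted pattern: a run of non-space characters ending in a colon,
# then as little as possible up to the FIRST comma.
attempt = re.findall(r"[^\s]*:.*?,", inp)
print(attempt)
# ['Name: Abcde fghijk,', 'College: so and so college,', 'Department: I Dont Know,']
```

The lazy .*?, stops at the first comma after each label, so "somewhere, on earth" is skipped: no match can start there, because none of those words ends in a colon. The last field contains no comma at all, so .*?, can never complete and that field is dropped entirely.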

Can someone help with this? What I need is something like: capture up to the word before the next colon, or until the end of the string.


Solution

  • Here is an re.findall approach that seems to work:

    import re

    inp = "Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"
    # Each match is a label, its value, and trailing whitespace up to the next label
    matches = re.findall(r'\w+: .*?\s*(?=\w+:|$)', inp)
    print(matches)
    

    This prints (wrapped here for readability):

    ['Name: Abcde fghijk, ',
     'College: so and so college, somewhere, on earth ',
     'Department: I Dont Know, ',
     'Designation: still to be decided']
    

    Explanation of the regex:

    • \w+: match the leading label followed by a colon
    • .*? after a literal space, match any content lazily (as few characters as possible), up to but not including
    • \s* any trailing whitespace
    • (?=\w+:|$) a lookahead asserting that what follows is another label and colon, or the end of the input
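If the goal is the exact list from the question (without the trailing spaces that findall keeps), one alternative, which is not part of the answer above, is to split instead of match: break the string at whitespace that is immediately followed by the next label. This is a minimal sketch assuming, as in the sample, that every label is a single word of \w characters. The dict-building step at the end is an optional extra suggested by the "dictionary-like" framing of the question, and the name fields is only illustrative.

```python
import re

inp = ("Name: Abcde fghijk, College: so and so college, somewhere, on earth "
       "Department: I Dont Know, Designation: still to be decided")

# Split on whitespace only when it is followed by "label:" (word characters
# immediately followed by a colon). The lookahead consumes nothing, so each
# label stays at the start of its own piece.
parts = re.split(r"\s+(?=\w+:)", inp)
print(parts)
# ['Name: Abcde fghijk,', 'College: so and so college, somewhere, on earth',
#  'Department: I Dont Know,', 'Designation: still to be decided']

# Optional: build a dict by splitting each piece on its first ": " and
# trimming the trailing comma (and any spaces) left on most values.
fields = {}
for part in parts:
    key, value = part.split(": ", 1)
    fields[key] = value.rstrip(", ")
print(fields)
```

Like the findall version, this treats any word directly followed by a colon as a new label, so a multi-word label such as "Date of birth:" or a value containing something like "10:30" would be split in the wrong place.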