I have a string like this:
"Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"
and I need to output something like this:
[ 'Name: Abcde fghijk,' ,
'College: so and so college, somewhere, on earth' ,
'Department: I Dont Know,' ,
'Designation: still to be decided' ]
I've been trying to formulate some kind of regex to find or split the elements, like this:
r"[^\s]*:.*?,"
which gets me as far as this:
['Name: Abcde fghijk,','College: so and so college,','Department: I Dont Know,']
but it misses some parts of the string:
"somewhere, on earth" and "Designation: still to be decided"
Can someone help out with this? I need something like: capture until one word before the next ":", or until the end of the string.
Here is an re.findall approach which seems to be working:
import re

inp = "Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"
# Each match runs from a "label:" up to the next "label:" or the end of the input
matches = re.findall(r'\w+: .*?\s*(?=\w+:|$)', inp)
print(matches)
This prints:
['Name: Abcde fghijk, ',
'College: so and so college, somewhere, on earth ',
'Department: I Dont Know, ',
'Designation: still to be decided']
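Note that the matches keep their trailing space. If you want exactly the list from the question, one option is to strip each match afterwards. A minimal sketch, reusing the same pattern (the name `fields` is only for illustration):

```python
import re

inp = "Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"

# Same pattern as above; strip() removes the trailing space left on each match
fields = [m.strip() for m in re.findall(r'\w+: .*?\s*(?=\w+:|$)', inp)]
print(fields)
```

The trailing commas are left in place, since the expected output in the question keeps them.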
Explanation of regex:
\w+:          match leading label followed by colon
 .*?          space followed by any content, up to, but not including
\s*           optional whitespace
(?=\w+:|$)    assert that what follows is another label: or end of input
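If you eventually want the labels and values separately, the pieces in the breakdown above can be wrapped in capture groups. This is only a sketch under my own assumptions: the two capture groups, the `rstrip(',')` cleanup, and the names `pairs` and `record` are additions for illustration, not part of the original pattern.

```python
import re

inp = "Name: Abcde fghijk, College: so and so college, somewhere, on earth Department: I Dont Know, Designation: still to be decided"

# Group 1 captures the label, group 2 the content; the trailing \s* and the
# lookahead stay outside the groups, so they are not part of the captured text.
pairs = re.findall(r'(\w+): (.*?)\s*(?=\w+:|$)', inp)

# Assumption: a comma at the end of a value is only a separator, so drop it.
record = {label: value.rstrip(',') for label, value in pairs}
print(record)
```

With two groups in the pattern, `re.findall` returns a list of `(label, value)` tuples instead of whole-match strings, which is what makes the dict comprehension work.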