I have a list json_data:
> print(json_data)
> ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
I need to split every element that contains '/', '&' or 'and' into separate elements. The result I am looking for should look like this:
> ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
The code is:
separators = ['/', 'and', '&']
titles = []
for i in json_data:
    titles.extend([t.strip() for t in i.split(separators)
                   if i.strip() != ''])
When running it, I am getting an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-d0db85078f05> in <module>()
5 titles = []
6 for i in json_data:
----> 7 titles.extend([t.strip() for t in i.split(separators)
8 if i.strip() != ''])
TypeError: Can't convert 'list' object to str implicitly
How can this be fixed?
Regex is your friend:
>>> import re
>>> pat = re.compile("[/&]|and")
>>> json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
>>> titles = []
>>> for i in json_data:
...     titles.extend([x.strip() for x in pat.split(i)])
...
>>> titles
['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
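This line noise: re.compile("[/&]|and") means "create a regular expression matching either [/&] or the letters 'and'". [/&] of course matches either / or &. Note that 'and' matches anywhere in the string, even inside a longer word. Having that in hand, pat.split(i) just splits the string i on anything matching pat. Any spaces next to a separator stay attached to the pieces, which is why each piece still goes through strip().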
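The question's code kept its delimiters in a list, but str.split() accepts only a single separator string, which is where the TypeError comes from. One way to keep the list is to join it into the kind of alternation pattern described above, passing each entry through re.escape so characters such as / or & are treated literally. A minimal sketch; the function name build_pattern is hypothetical:

```python
import re

def build_pattern(separators):
    # Escape each separator so any regex metacharacters match literally,
    # then join them into a single alternation: sep1|sep2|...
    return re.compile("|".join(re.escape(sep) for sep in separators))

separators = ['/', 'and', '&']
pat = build_pattern(separators)

json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
titles = []
for item in json_data:
    # Split on any separator, then trim the spaces left around the pieces
    titles.extend(part.strip() for part in pat.split(item))

print(titles)
# ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
```

Like the hand-written pattern, this still splits 'and' when it appears inside a longer word.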
LATE EDIT: Realized that of course we can skip the strip() step by complicating the regex a little. With the regex r"\s*[/&]\s*|\s+and\s+" (a raw string, so the backslashes reach the regex engine intact), any whitespace before or after a / or & becomes part of the match, and 'and' only matches when it has whitespace on both sides. Splitting on this pattern therefore removes the excess whitespace, and it also prevents us from splitting in the middle of a word like "sandwich", should that happen to appear in our data. Using \s* rather than a single \s around / and & matters, because elements like 'bcd/chg' have no spaces around the slash at all:
>>> pat = re.compile(r"\s*[/&]\s*|\s+and\s+")
>>> pat.split("beans and rice and sandwiches")
['beans', 'rice', 'sandwiches']
>>>
This simplifies the construction of the list, since we no longer need to strip the whitespace from the results of the split, which incidentally saves us some looping. Given the new pattern, we can write it this way:
>>> titles = []
>>> for i in json_data:
...     titles.extend(pat.split(i))
...
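The same idea can be wrapped in a reusable function. This is a sketch assuming the pattern r"\s*[/&]\s*|\s+and\s+" (optional spaces around / and &, at least one space on each side of 'and'); it also drops empty pieces, which is what the question's != '' check seems to have been aiming for. The name split_titles is hypothetical:

```python
import re

# \s* lets "/" and "&" match with or without surrounding spaces;
# \s+ on both sides of "and" keeps words like "sandwich" intact.
SEPARATOR = re.compile(r"\s*[/&]\s*|\s+and\s+")

def split_titles(items):
    """Split each item on the separators and flatten into one list."""
    return [
        part
        for item in items
        for part in SEPARATOR.split(item.strip())
        if part  # skip empty pieces, e.g. from a trailing separator
    ]

json_data = ['abc', 'bcd/chg', 'sdf', 'bvd', 'wer/ewe', 'sbc & osc']
print(split_titles(json_data))
# ['abc', 'bcd', 'chg', 'sdf', 'bvd', 'wer', 'ewe', 'sbc', 'osc']
print(split_titles(["beans and rice", "sandwich / soup"]))
# ['beans', 'rice', 'sandwich', 'soup']
```

The nested comprehension replaces the explicit loop with extend(); compiling the pattern once at module level avoids rebuilding it for every call.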