python nltk tokenize

Using starred expressions with variables


I want to split words joined by "-" in a list of words. I would like to use a starred expression because I don't know in advance whether split() will give me a list nested inside the list.

This works well with a constant expression:

[i for i in [*['1','2'],'1']]

yields:

['1', '2', '1']

I would like to do the same with variables:

import pandas as pd
test=pd.DataFrame( {'columns0' :[['hanging', 'heart', 't-light', 'holder']]})
test.apply(lambda x : [e if len(e.split('-'))==1 else (*e.split('-')) for e in x ])

but, as you might expect, it doesn't work:

  File "<ipython-input-1109-dda6b3df14bb>", line 3
    test.apply(lambda x : [e if len(e.split('-'))==1 else ( *e.split('-')) for e in x ])
                                                           ^
SyntaxError: can't use starred expression here

Solution

  • Why even bother differentiating those cases? split returns a list either way, whether it finds a "-" or not. Just nest the comprehension:

    lambda x: [token for e in x for token in e.split('-')]
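As a minimal sketch of the nested comprehension above, here it is applied to the word list from the question in plain Python. The function name `split_hyphens` is only an illustrative choice, not something from the question.

```python
def split_hyphens(words):
    """Flatten a list of words, splitting each one on '-'."""
    # str.split always returns a list: a single element when there is
    # no '-', several elements otherwise, so no special case is needed.
    return [token for word in words for token in word.split('-')]


words = ['hanging', 'heart', 't-light', 'holder']
print(split_hyphens(words))  # ['hanging', 'heart', 't', 'light', 'holder']
```

With the DataFrame from the question, each cell of `columns0` is itself a list, so the function would probably need to be applied per cell, for example `test['columns0'].apply(split_hyphens)`, rather than through `test.apply`, which passes whole columns to the function.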