I'm trying to add a new column to a dataframe (dfB) based on values from another dataframe (dfA):
s = dfA['value'].tolist()
dfB['value'] = dfB['text_bod'].str.contains('|'.join(s))
Can progress_map be used with this setup?
dfB['value'] = dfB['text_bod'].progress_map(func)
Or is there some other way tqdm can be implemented?
Alternative method using FlashText:
from flashtext import KeywordProcessor
s = dfA['value'].tolist()
processor = KeywordProcessor()
processor.add_keywords_from_list(s)
dfB['value'] = dfB['text_bod'].progress_map(lambda x: processor.extract_keywords(x))
I'm not aware of a str.contains way, but you can use progress_map with a callback that does the exact same thing, but with re.search:
import re
dfB['value'] = dfB['text_bod'].progress_map(
    lambda x: bool(re.search('|'.join(s), x))
)
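For completeness, here is a minimal end-to-end sketch of the approach above. It assumes pandas and tqdm are installed, and the contents of dfA and dfB are made-up sample data, not the asker's. Note that progress_map only becomes available after calling tqdm.pandas().

```python
import re

import pandas as pd
from tqdm import tqdm

# Registers progress_map / progress_apply on pandas objects.
tqdm.pandas()

# Hypothetical sample data standing in for the real dfA and dfB.
dfA = pd.DataFrame({'value': ['apple', 'pear']})
dfB = pd.DataFrame({'text_bod': ['an apple a day', 'nothing here', 'pear tree']})

s = dfA['value'].tolist()
# Compile the alternation once instead of rebuilding it for every row.
pattern = re.compile('|'.join(s))

# True where any keyword occurs in the text, with a progress bar while mapping.
dfB['value'] = dfB['text_bod'].progress_map(lambda x: bool(pattern.search(x)))
```

Compiling the pattern outside the callback is a small design choice here: the string join and regex lookup then happen once rather than per row.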
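As a function, you can use:
import numpy as np

def extract(x, p):
    # Return the matched keyword, or NaN if nothing matches.
    m = p.search(x)
    if m:
        return m.group(0)
    return np.nan

# Compile the pattern once and reuse it for every row.
p = re.compile('|'.join(s))
dfB['value'] = dfB['text_bod'].progress_map(lambda x: extract(x, p))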
This should allow you greater flexibility than a lambda.
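One caveat, in case the keywords are meant as literal strings rather than regular expressions: '|'.join(s) treats characters such as . or + as regex syntax. The sketch below escapes each keyword first. The helper names build_keyword_pattern and first_keyword, and the sample keywords, are hypothetical and only for illustration.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one alternation that matches any keyword literally."""
    # Try longer keywords first so 'c++' is preferred over 'c'.
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile('|'.join(re.escape(k) for k in ordered))

# Hypothetical keywords containing regex metacharacters.
p = build_keyword_pattern(['c', 'c++', 'a.b'])

def first_keyword(text, pattern=p):
    # Return the first keyword found in the text, or None if there is none.
    m = pattern.search(text)
    return m.group(0) if m else None
```

A callback like first_keyword could then be passed to progress_map in place of the lambda. If the keywords really are regex patterns, skip the escaping and keep the plain join.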