
pandas Series.str.find mixed in with x in a list


Good afternoon!

Long story short, I'm trying to do sentiment analysis on specific phone features using a review dataset. I'm filtering rows with .loc, which has worked before when I searched for a single string, but this time I want to search for a list of keywords: a row should match if it contains any x in that list.

Here's what I have:

Battery = ['battery', 'charge', 'juice', 'talk time', 'hours', 'minutes']
batt = apple['Reviews'].str.lower().str.find(x in Battery)!=-1

The error returned is:

AttributeError: Can only use .str accessor with string values.

I wrote it this way because it didn't work when I passed Battery directly instead of x in Battery.

Any suggestions? Thanks again!

The expected output, when I use the resulting variable, is every row that contains any of the keywords (any x in Battery), so rows mentioning charge, juice, etc. would show up.
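As written, the attempt cannot work: Python evaluates `x in Battery` to a single True or False (or raises NameError if `x` is undefined) before `str.find` is called, so `find` never receives the individual keywords. If you want to keep `str.find`, one minimal sketch is to build one mask per keyword and OR them together. The sample DataFrame and its 'Review' column are assumptions for illustration; the question's own column is 'Reviews'.

```python
import operator
from functools import reduce

import pandas as pd

Battery = ['battery', 'charge', 'juice', 'talk time', 'hours', 'minutes']
# Assumed sample data; the real dataset's column is 'Reviews'
apple = pd.DataFrame({'Review': ['abc battery xyz', 'foo bar', 'orange juice bar', 'talk time']})

lowered = apple['Review'].str.lower()
# str.find returns the match position, or -1 when the keyword is absent
masks = [lowered.str.find(keyword) != -1 for keyword in Battery]
# OR the per-keyword masks together: a row matches if any keyword was found
any_keyword = reduce(operator.or_, masks)
batt = apple.loc[any_keyword]
print(batt.index.tolist())  # [0, 2, 3]
```

One caveat with this approach: for a missing review, `str.find` returns NaN, and `NaN != -1` is True, so such rows would count as matches unless you fill them first (for example with `fillna('')`).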


Solution

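  • If apple['Review'] is just a column of strings, you can use str.contains() with a regex pattern built by joining the keywords with '|', so a row matches when it contains any of them.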

    Given these Battery and apple:

    import pandas as pd

    Battery = ['battery', 'charge', 'juice', 'talk time', 'hours', 'minutes']
    apple = pd.DataFrame({'Review': ['abc battery xyz', 'foo bar', 'orange juice bar', 'talk time']})
    
    #              Review
    # 0   abc battery xyz
    # 1           foo bar
    # 2  orange juice bar
    # 3         talk time
    

    This would be the batt output:

    batt = apple[apple['Review'].str.lower().str.contains('|'.join(Battery))]
    
    #              Review
    # 0   abc battery xyz
    # 2  orange juice bar
    # 3         talk time
    

    If apple['Review'] is a column of lists, you can first join each list into a single string with str.join(' ') before checking str.contains():

    batt = apple[apple['Review'].str.join(' ').str.lower().str.contains('|'.join(Battery))]
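The pattern above is used unescaped, which is fine for these keywords, but a keyword containing regex characters such as '+' or '.' would be interpreted as a regex. A minimal sketch that handles both cases in one place, assuming a hypothetical helper named `filter_by_keywords`: it escapes each keyword with `re.escape`, uses `case=False` instead of `.str.lower()`, and joins list cells row by row so a column of strings, a column of lists of strings, or a mix of the two all work.

```python
import re

import pandas as pd

Battery = ['battery', 'charge', 'juice', 'talk time', 'hours', 'minutes']


def filter_by_keywords(df, column, keywords):
    """Hypothetical helper: keep rows whose text contains any of the keywords."""
    # Join list cells (lists of strings) into one string; leave strings and NaN untouched
    text = df[column].map(lambda v: ' '.join(v) if isinstance(v, list) else v)
    # Escape each keyword so characters like '+' or '.' are matched literally
    pattern = '|'.join(re.escape(k) for k in keywords)
    # case=False replaces .str.lower(); na=False treats missing reviews as non-matches
    mask = text.str.contains(pattern, case=False, na=False)
    return df[mask]


strings = pd.DataFrame({'Review': ['abc battery xyz', 'foo bar', 'orange juice bar', 'talk time']})
lists = pd.DataFrame({'Review': [['abc', 'Battery'], ['foo', 'bar'], ['talk', 'time'], ['Juice']]})

print(filter_by_keywords(strings, 'Review', Battery).index.tolist())  # [0, 2, 3]
print(filter_by_keywords(lists, 'Review', Battery).index.tolist())    # [0, 2, 3]
```

Joining per row with `map` rather than `.str.join(' ')` is a deliberate choice here: applied to a plain string, `.str.join(' ')` would insert spaces between its characters, so it only suits columns where every cell is a list.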