Search code examples
pythonpandasstringlambda

How to limit strings in Pandas row by row using apply and lambda?


I have the following dataframe:

# initialize list of lists
data = [['1', "Tag1, Tag323, Tag36"], ['2', "Tag11, Tag212"], ['4', "Tag1, Tag12, Tag3, Tag324"]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Tag'])

print(df)

    ID  Tag
    1   Tag1, Tag323, Tag36
    2   Tag11, Tag212
    4   Tag1, Tag12, Tag3, Tag324

I would like to manipulate the string values (e.g. "Tag1, Tag2, Tag3") in column tag with the follwing condition. In each row, if there are more than 2 tags, the output should look like "Tag1, Tag2 ..". The tag length can be different.

print(df)

    ID  Tag
    1   Tag1, Tag323 ..
    2   Tag11, Tag212
    4   Tag1, Tag12 .. 

Does anyone know a Pandas apply and lambda method to solve this?


Solution

  • Idea is split tags by Series.str.split and if length is greater like N select first N values and join:

    N = 2
    s = df['Tag'].str.split(",")
    df['Tag'] = df['Tag'].mask(s.str.len().gt(N), s.str[:N].str.join(",") + "...")
    print (df)
      ID              Tag
    0  1  Tag1, Tag323...
    1  2    Tag11, Tag212
    2  4   Tag1, Tag12...
    

    Solution with apply:

    N = 2
    df['Tag'] = df['Tag'].apply(lambda x: ', '.join(x.split(',')[:N]) + '...'
                                if x.count(',') + 1 > N else x)
    

    Or:

    N = 2
    df['Tag'] = df['Tag'].mask(df['Tag'].str.count(',').gt(N), 
                               df['Tag'].apply(lambda x: ', '.join(x.split(',')[:N]) + '...'))