I have the following dataframe:
# initialize list of lists
data = [['1', "Tag1, Tag323, Tag36"], ['2', "Tag11, Tag212"], ['4', "Tag1, Tag12, Tag3, Tag324"]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Tag'])
print(df)
ID Tag
1 Tag1, Tag323, Tag36
2 Tag11, Tag212
4 Tag1, Tag12, Tag3, Tag324
I would like to manipulate the string values (e.g. "Tag1, Tag2, Tag3"
) in column tag
with the follwing condition. In each row, if there are more than 2 tags, the output should look like "Tag1, Tag2 .."
. The tag length can be different.
print(df)
ID Tag
1 Tag1, Tag323 ..
2 Tag11, Tag212
4 Tag1, Tag12 ..
Does anyone know a Pandas apply and lambda method to solve this?
Idea is split tags by Series.str.split
and if length is greater like N
select first N
values and join:
N = 2
s = df['Tag'].str.split(",")
df['Tag'] = df['Tag'].mask(s.str.len().gt(N), s.str[:N].str.join(",") + "...")
print (df)
ID Tag
0 1 Tag1, Tag323...
1 2 Tag11, Tag212
2 4 Tag1, Tag12...
Solution with apply
:
N = 2
df['Tag'] = df['Tag'].apply(lambda x: ', '.join(x.split(',')[:N]) + '...'
if x.count(',') + 1 > N else x)
Or:
N = 2
df['Tag'] = df['Tag'].mask(df['Tag'].str.count(',').gt(N),
df['Tag'].apply(lambda x: ', '.join(x.split(',')[:N]) + '...'))