I have been working with a data frame in which each record has useful information inside square brackets and non-useful information outside them.
Sample data frame:
Record Data
1 Rohan is [age:10] with [height:130 cm].
2 Girish is [age:12] with [height:140 cm].
3 Both kids live in [location:Punjab] and [location:Delhi].
4 They love to play [Sport:Cricket] and [Sport:Football].
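To make the code below reproducible, here is a minimal sketch that rebuilds the sample data frame with pandas. The column names Record and Data come from the table above; storing Record as integers is an assumption.

```python
import pandas as pd

# Rebuild the sample data frame from the question so the answers can be run as-is.
# Assumption: Record holds integers and Data holds the raw text.
df = pd.DataFrame({
    'Record': [1, 2, 3, 4],
    'Data': [
        'Rohan is [age:10] with [height:130 cm].',
        'Girish is [age:12] with [height:140 cm].',
        'Both kids live in [location:Punjab] and [location:Delhi].',
        'They love to play [Sport:Cricket] and [Sport:Football].',
    ],
})
print(df)
```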
Expected Output:
Record Data
1 [age:10],[height:130 cm]
2 [age:12],[height:140 cm]
3 [location:Punjab],[location:Delhi]
4 [Sport:Cricket],[Sport:Football]
I tried the following, but it does not give the desired output:
df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()
print(df['b'])
That doesn't seem to work.
I am new to Python.
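For reference, the attempt above has three problems: it reads df['Record'] (the integer column) instead of df['Data'], Series.str.findall does not accept an expand argument, and the character class [a-z \s] only allows lowercase letters and whitespace, so the digits and ':' inside every bracket prevent any match. A minimal sketch of the last point using only the standard re module; the question's pattern is rewritten here with escaped brackets, which matches the same text:

```python
import re

s = 'Rohan is [age:10] with [height:130 cm].'

# The question's character class [a-z \s] allows only lowercase letters and whitespace,
# so the digits and ':' inside the brackets stop every match.
original = re.findall(r'\[[a-z \s]+\]', s)

# Excluding only the closing bracket lets digits, ':' and capitals through.
widened = re.findall(r'\[[^\]]+\]', s)

print(original)  # []
print(widened)   # ['[age:10]', '[height:130 cm]']
```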
I believe you need Series.str.findall with Series.str.join:
df['b'] = df['Data'].str.findall(r'(\[.*?\])').str.join(', ')
print(df)
Record Data \
0 1 Rohan is [age:10] with [height:130 cm].
1 2 Girish is [age:12] with [height:140 cm].
2 3 Both kids live in [location:Punjab] and [location:Delhi].
3 4 They love to play [Sport:Cricket] and [Sport:F...
b
0 [age:10], [height:130 cm]
1 [age:12], [height:140 cm]
2 [location:Punjab], [location:Delhi]
3 [Sport:Cricket], [Sport:Football]
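The expected output in the question separates the items with ',' and no space, and keeps the result in the Data column. A minimal sketch that uses the same findall pattern, joins with ',' and overwrites Data in place (overwriting rather than adding a new column b is an assumption about what is wanted):

```python
import pandas as pd

df = pd.DataFrame({
    'Record': [1, 2, 3, 4],
    'Data': [
        'Rohan is [age:10] with [height:130 cm].',
        'Girish is [age:12] with [height:140 cm].',
        'Both kids live in [location:Punjab] and [location:Delhi].',
        'They love to play [Sport:Cricket] and [Sport:Football].',
    ],
})

# Collect every bracketed item (brackets included), then join with ',' and no space.
df['Data'] = df['Data'].str.findall(r'\[.*?\]').str.join(',')
print(df.to_string(index=False))
```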
If you need the values in lists, without the brackets:
df['b'] = df['Data'].str.findall(r'\[(.*?)\]')
print(df)
Record Data \
0 1 Rohan is [age:10] with [height:130 cm].
1 2 Girish is [age:12] with [height:140 cm].
2 3 Both kids live in [location:Punjab] and [location:Delhi].
3 4 They love to play [Sport:Cricket] and [Sport:F...
b
0 [age:10, height:130 cm]
1 [age:12, height:140 cm]
2 [location:Punjab, location:Delhi]
3 [Sport:Cricket, Sport:Football]
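Both patterns rely on the non-greedy .*? so that each match stops at the first ']' after a '['. A small sketch with the standard re module showing what happens with the greedy .* instead:

```python
import re

s = 'Rohan is [age:10] with [height:130 cm].'

# Greedy: .* runs to the last ']' in the string, swallowing the text in between.
greedy = re.findall(r'\[(.*)\]', s)

# Non-greedy: .*? stops at the first ']' after each '[', giving one match per bracket pair.
lazy = re.findall(r'\[(.*?)\]', s)

print(greedy)  # ['age:10] with [height:130 cm']
print(lazy)    # ['age:10', 'height:130 cm']
```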