python, regex, dataframe, data-science, data-processing

Keep only the data in square brackets in a data frame column using Python (regex)


I have been working with a data frame in which each record has useful information inside square brackets and non-useful information outside them.

Sample Data frame:

 Record        Data
      1          Rohan is [age:10] with [height:130 cm].
      2          Girish is [age:12] with [height:140 cm].
      3          Both kids live in [location:Punjab] and [location:Delhi].
      4          They love to play [Sport:Cricket] and [Sport:Football].

Expected Output:

 Record        Data
      1          [age:10],[height:130 cm]
      2          [age:12],[height:140 cm]
      3          [location:Punjab],[location:Delhi]
      4          [Sport:Cricket],[Sport:Football]

I have been trying this but cannot get the desired output.

df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()
print(df['b'])

That doesn't seem to work.

I am new to Python.


Solution

  • I believe you need str.findall on the Data column, followed by str.join:

    # raw string, so the backslashes are passed to the regex unchanged
    df['b'] = df['Data'].str.findall(r'(\[.*?\])').str.join(', ')
    print(df)
    
       Record                                               Data  \
    0       1            Rohan is [age:10] with [height:130 cm].   
    1       2           Girish is [age:12] with [height:140 cm].   
    2       3   Both kids live in [location:Punjab] and [Delhi].   
    3       4  They love to play [Sport:Cricket] and [Sport:F...   
    
                                       b  
    0          [age:10], [height:130 cm]  
    1          [age:12], [height:140 cm]  
    2         [location:Punjab], [Delhi]  
    3  [Sport:Cricket], [Sport:Football] 
    

    If you need the values as lists (without the brackets):

    # the group captures only the text between the brackets
    df['b'] = df['Data'].str.findall(r'\[(.*?)\]')
    print(df)
    
       Record                                               Data  \
    0       1            Rohan is [age:10] with [height:130 cm].   
    1       2           Girish is [age:12] with [height:140 cm].   
    2       3   Both kids live in [location:Punjab] and [Delhi].   
    3       4  They love to play [Sport:Cricket] and [Sport:F...   
    
                                     b  
    0          [age:10, height:130 cm]  
    1          [age:12, height:140 cm]  
    2         [location:Punjab, Delhi]  
    3  [Sport:Cricket, Sport:Football]
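
For reference, here is a minimal self-contained sketch of the same findall-plus-join approach, using the sample data exactly as posted in the question and joining with "," (no space) to match the expected output. The pattern variable and the negated character class are illustrative choices, not part of the original answer; the non-greedy `.*?` used above should behave the same way on this data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Record": [1, 2, 3, 4],
    "Data": [
        "Rohan is [age:10] with [height:130 cm].",
        "Girish is [age:12] with [height:140 cm].",
        "Both kids live in [location:Punjab] and [location:Delhi].",
        "They love to play [Sport:Cricket] and [Sport:Football].",
    ],
})

# Raw string so the backslashes reach the regex engine unchanged.
# \[ and \] match literal brackets; [^\]]* matches everything up to
# the next closing bracket (an alternative to the non-greedy .*?).
pattern = r"\[[^\]]*\]"

# findall gives one list of matches per row; str.join turns each
# list into a single string, here separated by "," with no space.
df["b"] = df["Data"].str.findall(pattern).str.join(",")

print(df["b"].tolist())
```

As far as I can tell, the attempt in the question fails for several reasons: it searches the Record column instead of Data, the character class `[a-z \s]` excludes the digits, colons, and capital letters that appear inside the brackets, `Series.str.findall` does not accept an `expand` argument, and `.str.strip()` is meant for strings rather than the lists that findall returns.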