python pandas list split calculated-columns

How do I extract only the numbers from a list of strings and create columns in a DataFrame based on the output?


I have the list of strings below. I want to extract only the numbers from it and then create columns based on the output.

['CGST- INPUT 9%  MAHARASHTRA',
 'SGST-INPUT 9%  MAHARASHTRA',
 'CGST INPUT @6% MAHARASHTRA',
 'SGST INPUT @6% MAHARASHTRA',
 'CGST- INPUT 2.50%  MAHARASHTRA',
 'SGST-INPUT 2.50%  MAHARASHTRA',
 'TDS ON OFFICE RENT',
 'TDS ON CONTRACTOR',
 'TDS ON CONSULTANTS',
 'TDS ON OFFICE RENT (COMPANY)',
 'TDS ON CONSULTANY FEE']

The output should look like this:

Rate    CGST    SGST    TDS
9       XX      XX      XX
6       XX      XX      XX
2.50    XX      XX      XX

The list above comes from the column names of a DataFrame. Each column holds values that I want to sum and show separately, grouped by the rate mentioned in each list item.


Solution

  • A regular expression that will identify numbers in a string (including those with decimal fractions) is:

    r'[-+]?[0-9]*\.?[0-9]+'
    

    For example:

    import re
    mystring = 'abc50def6.75ghi'
    pattern = r'[-+]?[0-9]*\.?[0-9]+'
    print(list(map(float, re.findall(pattern, mystring))))
    

    Output:

    [50.0, 6.75]
    

    Once you have extracted the numbers, you can use them to build your DataFrame.
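
As a rough sketch of that last step, the same pattern can be applied to each column name to pull out the rate, and the column totals can then be pivoted into one row per rate. The sample amounts, the `parse_label` helper and the `summary` layout are assumptions made for illustration, not part of the question. Labels without a number (the TDS ones) have no rate, so this sketch simply leaves them out of the pivot; adjust that if TDS should be assigned to a rate some other way.

```python
import re
import pandas as pd

# Same pattern as above: optional sign, optional integer part, optional dot, digits.
PATTERN = r'[-+]?[0-9]*\.?[0-9]+'

labels = [
    'CGST- INPUT 9%  MAHARASHTRA',
    'SGST-INPUT 9%  MAHARASHTRA',
    'CGST INPUT @6% MAHARASHTRA',
    'SGST INPUT @6% MAHARASHTRA',
    'CGST- INPUT 2.50%  MAHARASHTRA',
    'SGST-INPUT 2.50%  MAHARASHTRA',
    'TDS ON OFFICE RENT',
    'TDS ON CONTRACTOR',
]

def parse_label(label):
    """Hypothetical helper: return (tax_type, rate); rate is None if no number is found."""
    tax_type = re.match(r'[A-Za-z]+', label).group()  # leading word, e.g. 'CGST'
    numbers = re.findall(PATTERN, label)
    rate = float(numbers[0]) if numbers else None
    return tax_type, rate

# Made-up data: one column per label, two rows of amounts each.
df = pd.DataFrame({label: [10.0, 20.0] for label in labels})

# Sum every column, then attach the parsed tax type and rate to each total.
totals = df.sum()
records = []
for label in labels:
    tax_type, rate = parse_label(label)
    records.append({'Rate': rate, 'Type': tax_type, 'Amount': totals[label]})
long_df = pd.DataFrame(records)

# One row per rate, one column per tax type; labels with no rate are dropped.
summary = (
    long_df.dropna(subset=['Rate'])
    .pivot_table(index='Rate', columns='Type', values='Amount', aggfunc='sum')
    .sort_index(ascending=False)
    .reset_index()
)
print(summary)
```

With real data, replace the made-up `df` with your own DataFrame and build `labels` from `df.columns`.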