
Extract an alphanumeric code from text in a DataFrame in Python


I have a DataFrame called df that looks like this:

Text                         No
c0404079=0.00                34
c1444716<=0.00               45
1.0<c00226311 <= 0.00        36
c0001208 <= 0.00             32
0.00<c0243026<=2.00          85
c0036983 <= 0.00             55
c0036974=0.00                39

I want to create a new column in that df called "Code".

The code is the part of the first column that starts with the letter c and runs until the first non-alphanumeric character or the end of the line.

So the DataFrame will be:

Text                         No            Code
c0404079=0.00                34            c0404079
c1444716<=0.00               45            c1444716
1.0<c00226311 <= 0.00        36            c00226311
c0001208 <= 0.00             32            c0001208
0.00<c0243026<=2.00          85            c0243026
c0036983 <= 0.00             55            c0036983
c0036974=0.00                39            c0036974

Any idea how to do that?

I tried this, but it did not give the right results:

df['Code'] = df['Text'].str.extract(r'c^(\d[^\W_]{5,})')

Solution

  • Given your df, here is how to get everything from the letter c up to the first non-alphanumeric character:

    df['extracted'] = df['text'].str.extract(r'(c[^\W]+)')

                       text  extracted
    0        c1444716<=0.00   c1444716
    1  1.0c00226311 <= 0.00  c00226311
    2   0.00<c0243026<=2.00   c0243026
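
As a side note on the attempt in the question: in `c^(...)` the `^` asserts the start of the string right after a `c` has been matched, so that pattern cannot match anything. Also, `[^\W]` is equivalent to `\w`, which accepts underscores as well as letters and digits. If underscores should end the code, `[^\W_]` is a stricter choice. The sketch below checks that stricter pattern against the sample rows from the question using only the standard `re` module. The names `CODE_PATTERN` and `extract_code` are made up for illustration.

```python
import re

# Assumed pattern: a literal "c" followed by letters/digits only.
# [^\W_] means "word character except underscore", i.e. strictly alphanumeric.
CODE_PATTERN = re.compile(r"c[^\W_]*")


def extract_code(text):
    """Return the first run that starts with 'c' and stops at the first
    non-alphanumeric character (or end of string); None if there is no 'c'."""
    match = CODE_PATTERN.search(text)
    return match.group(0) if match else None


# Sample values taken from the question's Text column.
samples = [
    "c0404079=0.00",
    "c1444716<=0.00",
    "1.0<c00226311 <= 0.00",
    "c0001208 <= 0.00",
    "0.00<c0243026<=2.00",
    "c0036983 <= 0.00",
    "c0036974=0.00",
]

codes = [extract_code(s) for s in samples]
for text, code in zip(samples, codes):
    print(f"{text:<25} {code}")
```

With pandas, the same pattern should work as `df['Code'] = df['Text'].str.extract(r'(c[^\W_]*)', expand=False)`. Passing `expand=False` returns a Series rather than a one-column DataFrame, and rows without a match come back as NaN.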