python, pandas, substring, slice

Regular expression in Pandas: Get substring between a space and a colon


I have a pandas DataFrame with a column named store. Its values look like this:

H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX
H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX

I want the store numbers, which are 721 and 39 in the examples above.

Here is my process for getting it:

  1. Find the position of the colon.
  2. Slice backwards until reaching a space.

How do I do this in Python/Pandas? I'm guessing that I need to use regex, but I have no idea how to start.


Solution

  • You can use str.extract with the (\d+): regex:

    df['number'] = df['store'].str.extract(r'(\d+):', expand=False).astype(int)
    

    Output:

                                                 store  number
    0  H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX     721
    1           H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX      39
    

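For completeness, here is a self-contained sketch that rebuilds the sample data from the question and tries two variants. It is an illustration under assumptions, not the only way to do this: the third row ("UNKNOWN STORE") and the column name number_split are hypothetical additions used to show what happens when a value has no colon. The first variant uses the same regex but converts to the nullable Int64 dtype, since astype(int) raises an error when a row does not match. The second variant follows the two steps described in the question without a regex.

```python
import pandas as pd

# Sample data copied from the question, plus one hypothetical row with no colon.
df = pd.DataFrame({
    "store": [
        "H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX",
        "H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX",
        "UNKNOWN STORE",  # hypothetical row, not in the original data
    ]
})

# Regex approach: capture the digits immediately before a colon.
# The raw string keeps \d from being read as a Python string escape.
extracted = df["store"].str.extract(r"(\d+):", expand=False)

# Rows without a match come back as missing values, so convert through
# pd.to_numeric and the nullable Int64 dtype instead of plain int.
df["number"] = pd.to_numeric(extracted).astype("Int64")

# Non-regex approach mirroring the two steps in the question:
# keep the text before the first colon, then take the last space-separated token.
before_colon = df["store"].str.split(":", n=1).str[0]
df["number_split"] = before_colon.str.rsplit(" ", n=1).str[-1]

print(df[["number", "number_split"]])
```

Note that the split-based variant returns strings and does not check for a colon: for the hypothetical third row it yields "STORE", whereas the regex variant leaves that row as a missing value. If some rows may lack a store number, the regex variant is probably the safer choice.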