Pandas: how to split or extract multiple values from column

This is my first Stack post and I'm new in the worlds of Python and Pandas. I'm trying to get 3 separate values from a column. I tried both .str.split and .str.extract but can't get the result. The df I have:

     wow
0    1000100011-DT000111-1111 Hellostreet 45  Town
1    1000100012  DT000122-1222-Hellostrasse 56 Place
2    1000100013-DT000111-1133
3    1000106789 DT000111-1144 Street 45
4    DT000111-1441 Hellostreet 100

The result I want:

     number1      number2          street3
0    1000100011   DT000111-1111    Hellostreet 45  Town
1    1000100012   DT000122-1222    Hellostrasse 56 Place
2    1000100013   DT000111-1133    None
3    1000106789   DT000111-1144    Street 45
4    None         DT000111-1441    Hellostreet 100

[number1] - numbers are random but always 10 digits [number2] - numbers are random but always 6 first '-' and then 4 digits. Characters are always 'DT'[street3] can consist of a street name, number and/or town. Those do not have to be separated.

Thank you in advance!

Solution

To solve it, you can use the .str.extract method, splitting the string into three groups using regular expressions.

import pandas as pd


df = pd.DataFrame({'wow': ['1000100011-DT000111-1111 Hellostreet 45  Town',
                           '1000100012  DT000122-1222-Hellostrasse 56 Place',
                           '1000100013-DT000111-1133',
                           '1000106789 DT000111-1144 Street 45',
                           'DT000111-1441 Hellostreet 100']})

df = df['wow'].str.extract(
    r"(?P<number1>\d{0,10}).*?(?P<number2>DT\d{6}-\d{4}).*?(?P<street3>\w.*|$)")
df.replace('', None, inplace=True)
print(df)

      number1        number2                street3
0  1000100011  DT000111-1111   Hellostreet 45  Town
1  1000100012  DT000122-1222  Hellostrasse 56 Place
2  1000100013  DT000111-1133                   None
3  1000106789  DT000111-1144              Street 45
4        None  DT000111-1441        Hellostreet 100

"
(?P<number1>\d{0,10}).*?(?P<number2>DT\d{6}-\d{4}).*?(?P<street3>\w.*|$)
"
g
Named Capture Group number1 (?P<number1>\d{0,10})
\d matches a digit (equivalent to [0-9])
{0,10} matches the previous token between 0 and 10 times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Named Capture Group number2 (?P<number2>DT\d{6}-\d{4})
DT matches the characters DT literally (case sensitive)
\d matches a digit (equivalent to [0-9])
{6} matches the previous token exactly 6 times
- matches the character - with index 4510 (2D16 or 558) literally (case sensitive)
\d matches a digit (equivalent to [0-9])
{4} matches the previous token exactly 4 times
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Named Capture Group street3 (?P<street3>\w.*|$)
1st Alternative \w.*
\w matches any word character (equivalent to [a-zA-Z0-9_])
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Alternative $
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Global pattern flags 
g modifier: global. All matches (don't return after first match)