
How to extract part of a string in Python?


I have the following list:

lst = ['SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 18847, NULL), NULL, NULL)', 
'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 19188, NULL), NULL, NULL)',
'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9972, 18282, NULL), NULL, NULL)',
'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9977, 19201, NULL), NULL, NULL)',
'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9989, 18635, NULL), NULL, NULL)']

I would like to extract only the numbers inside the parentheses that follow MDSYS.SDO_POINT_TYPE. How do I do that?

What I have tried so far:

op = []
for i in lst:
    x = i[46:56]  # hardcoded slice: assumes the numbers always start at index 46
    y = '(' + x + ')'
    op.append(y)

But the numbers are not always at positions 46-56. How can I extract them regardless of where they appear in the string?

Desired output:

['(9971, 1884)',
 '(9971, 1918)',
 '(9972, 1828)',
 '(9977, 1920)',
 '(9989, 1863)']

Solution

  • If the numbers between the parentheses (and the NULL) can be at different positions, you can use a pattern that first captures everything between the parentheses in a capture group.

    Then you can extract the runs of digits from the group 1 value.

    \bMDSYS\.SDO_POINT_TYPE\(([^()]+)\)
    
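    • \bMDSYS\.SDO_POINT_TYPE\( matches the literal text MDSYS.SDO_POINT_TYPE( (the \b word boundary makes sure MDSYS starts a word)
    • ([^()]+) captures everything between the parentheses (one or more characters other than ( and )) in group 1
    • \) matches the closing )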
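To make the two steps concrete, here is a minimal check on the first string of the list, using only the standard re module. It shows what group 1 contains and which digit runs re.findall then picks out of it.

```python
import re

# Pattern from the answer: capture everything inside the parentheses
# that follow MDSYS.SDO_POINT_TYPE
pattern = r"\bMDSYS\.SDO_POINT_TYPE\(([^()]+)\)"
s = 'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 18847, NULL), NULL, NULL)'

m = re.search(pattern, s)
group1 = m.group(1)
print(group1)  # 9971, 18847, NULL

# Second step: keep only the runs of digits, so the NULL is dropped
digits = re.findall(r"\d+", group1)
print(digits)  # ['9971', '18847']
```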


    Note that in the desired output the last digit of the second number is missing from every entry: the fixed slice i[46:56] stops one character short, so 18847 becomes 1884.
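A quick way to confirm this is to compute where the numbers actually start instead of hardcoding 46. The sketch below (standard library only, first string of the list) finds the start index with str.index and compares the question's slice with one that is one character longer.

```python
s = 'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 18847, NULL), NULL, NULL)'

# Index of the first character after "MDSYS.SDO_POINT_TYPE("
marker = "MDSYS.SDO_POINT_TYPE("
start = s.index(marker) + len(marker)
print(start)  # 46

print(s[46:56])  # 9971, 1884   (the slice from the question drops the last digit)
print(s[46:57])  # 9971, 18847
```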

    import re
    
    lst = ['SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 18847, NULL), NULL, NULL)',
           'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 19188, NULL), NULL, NULL)',
           'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9972, 18282, NULL), NULL, NULL)',
           'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9977, 19201, NULL), NULL, NULL)',
           'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9989, 18635, NULL), NULL, NULL)']
    
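    op = []
    for s in lst:
        # Capture everything inside the parentheses after MDSYS.SDO_POINT_TYPE
        m = re.search(r"\bMDSYS\.SDO_POINT_TYPE\(([^()]+)\)", s)
        if m:
            # Keep only the digit runs from group 1 and wrap them in parentheses
            op.append("({})".format(", ".join(re.findall(r"\d+", m.group(1)))))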
    
    print(op)
    

    Output

    ['(9971, 18847)', '(9971, 19188)', '(9972, 18282)', '(9977, 19201)', '(9989, 18635)']
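If you would rather stay closer to the slicing approach from the question, a regex-free alternative is to locate the marker with str.index and slice from there up to the next closing parenthesis. This is a sketch under two assumptions: every string contains MDSYS.SDO_POINT_TYPE( (otherwise str.index raises ValueError), and the point values contain no nested parentheses. The helper name extract_point is hypothetical.

```python
lst = ['SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 18847, NULL), NULL, NULL)',
       'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9971, 19188, NULL), NULL, NULL)',
       'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9972, 18282, NULL), NULL, NULL)',
       'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9977, 19201, NULL), NULL, NULL)',
       'SDO_GEOMETRY(2001, NULL, MDSYS.SDO_POINT_TYPE(9989, 18635, NULL), NULL, NULL)']

MARKER = "MDSYS.SDO_POINT_TYPE("


def extract_point(s):
    """Hypothetical helper: return '(x, y)' from the SDO_POINT_TYPE(...) part of s."""
    # Start right after the marker instead of at a hardcoded index
    start = s.index(MARKER) + len(MARKER)
    # The point values end at the next closing parenthesis
    end = s.index(")", start)
    # "9971, 18847, NULL" -> ["9971", "18847", "NULL"]
    parts = [p.strip() for p in s[start:end].split(",")]
    # Keep only the purely numeric parts, dropping NULL
    numbers = [p for p in parts if p.isdigit()]
    return "({})".format(", ".join(numbers))


op = [extract_point(s) for s in lst]
print(op)
```

For this input it produces the same list as the regex version; the regex approach is more tolerant of variations in spacing and surrounding text, while this one avoids regular expressions entirely.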