Text extraction of multiple variables using rsplit


I am trying to split the contents of a string into separate strings in a list. For that, I used the following code:

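import pandas as pd

A = '27_D | 27_B & 52_E'
B = pd.DataFrame(columns=['PL_HFG'])
# contains_all is not defined in this snippet, so the call is commented out
# print(contains_all(A, '|&'))
C = A.replace('&', '').replace('|', '')  # remove the operators
print(C)
D = C.rsplit(" ", 1)  # split once, starting from the right
print(D)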

The final printed output is ['27_D 27_B ', '52_E']. What I want is ['27_D, 27_B ', '52_E']. If you look closely, you will see that 27_D and 27_B are separated in the desired output, whereas in my output they are joined into a single string.


Solution

  • This is a simple approach (no rsplit needed):

    a = '27_D | 27_B & 52_E'
    b = [i for i in a.split() if '_' in i]
    print(b)
    

    It splits the string on whitespace, then builds a list containing only the elements that include an underscore.

    Output:

    ['27_D', '27_B', '52_E']
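
If the tokens might not always contain an underscore, another option is to split on the operators themselves. The following is only a sketch, assuming the separators are always `|` or `&` (optionally surrounded by whitespace); the variable name `tokens` is chosen here for illustration:

```python
import re

a = '27_D | 27_B & 52_E'

# Split on '|' or '&' together with any whitespace around them.
tokens = re.split(r'\s*[|&]\s*', a.strip())
print(tokens)  # ['27_D', '27_B', '52_E']
```

Unlike the underscore filter, this keeps every operand regardless of its content, but it relies on the separator set being known in advance.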