
extracting a string between 2 strings


I'm fairly new to Python, and I'm trying to extract a string between two strings in a Zapier Code step using Python. Example: "dfsgsdfgsdfgsdfgsdfgsdfg Service: what i 'm trying to extract Customer Details: gfdgsdfgsdfgsdfgsdfg". The input string is called 'description', and I'm trying to extract what's between the strings 'Service:' and 'Customer Details:'.

I've used the following code

import re
match = re.search(r'Service:(.*?)Customer Details:',input_data['description'])
return {'description': match}

The step runs successfully when I test it, but it returns 'description: null'.

I've also tried with this code:

myString=input_data['description']
mySubstring=myString[myString.find("Service:")+8:myString.find("Customer Details:")-17]
return {mySubstring}

I get the error 'SyntaxError: invalid syntax (usercode.py, line 8)'

If someone could help me it would be deeply appreciated. Thanks!

UPDATE 1: Thanks, Abion47, for your help. I have tried the following code:

import re
input = input_data['description']
match = re.search(r'Service:(.*?)Customer Details:', input).group(1)
print(match)

I got the error below:

Traceback (most recent call last):
  File "/tmp/tmpmvAChp/usercode.py", line 10, in the_function
    match = re.search(r'Service:(.*?)Customer Details:', input).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

UPDATE 2: The error above occurred because the pattern didn't find a match, so re.search returned None instead of a match object.

Here is my input text; it comes from a Google Calendar event:

Appointment Details
Provider: John Smith 
Service: Adult Consultation
Customer Details:
Name: John Doe
Notes: Hi ghdfhdfg, dfghdfgg appointment I had for the 6th of January at 9.30 with this one. Is it possibile?
Status: Confirmed

With the code below, the step runs, but I get null:

import re
name = input_data['description']
print(name)
try:
    try:
        name = re.search(r'(?s)(?<=Name:)(.*?)(?=Customer Details:)', input_data['description']).group(1).strip("\n\r ")
    except AttributeError:
        name = re.search(r'(?s)(?<=Name:)(.*?)(?=Customer Details:)', input_data['description']).group(1)
except AttributeError:
    name = re.search(r'(?s)(?<=Name:)(.*?)(?=Customer Details:)', input_data['description'])
return { 'name': name }

But I got the following result. It doesn't find my string even though it's there!

name: null
runtime_meta
duration_ms: 0
memory_used_mb: 23
logs
    1. Appointment Details
    2. Provider: John Smith 
    3. Service: Adult Consultation
    4. Customer Details:
    5. Name: John Doe
    6. Notes: Hi ghdfhdfg, dfghdfgg appointment I had for the 6th of January at 9.30 with this one. Is it possibile?
    7. Status: Confirmed
id: vbgOSvUOsBO8tAuLjk4wP0JMsMWsL0WV

If someone knows what's wrong in the code, it would be really appreciated!

WORKING CODE

Thanks, @abion47, for your help. The full working code is:

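import re
name = input_data['description']
print(name)
# The description spans several lines, so [\r\n]+ consumes the line break
# between the "Service:" line and "Customer Details:".
myMatch = re.search(r'Service: (.*?)[\r\n]+Customer Details:', name).group(1)
print(myMatch)
return { 'myMatch': myMatch }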
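One caveat with the working code: if an event's description lacks either label, re.search returns None and .group(1) raises the same AttributeError seen in Update 1. Below is a minimal sketch of a guarded version, assuming the description may use either \n or \r\n line breaks. extract_service and SERVICE_RE are hypothetical names chosen for illustration, not anything Zapier provides.

```python
import re

# Hypothetical pattern: optional spaces after "Service:", the value, then one
# or more line-break characters (\n or \r\n) before "Customer Details:".
SERVICE_RE = re.compile(r'Service:[ \t]*(.*?)[ \t]*[\r\n]+Customer Details:')

def extract_service(description):
    """Hypothetical helper: return {'service': value}, or {'service': None} if absent."""
    match = SERVICE_RE.search(description)
    if match is None:
        # re.search returns None when nothing matches; calling .group() on
        # None is what raised "'NoneType' object has no attribute 'group'".
        return {'service': None}
    return {'service': match.group(1)}

sample = ("Appointment Details\r\n"
          "Provider: John Smith\r\n"
          "Service: Adult Consultation\r\n"
          "Customer Details:\r\n"
          "Name: John Doe\r\n")
print(extract_service(sample))            # {'service': 'Adult Consultation'}
print(extract_service("no labels here"))  # {'service': None}
```

Inside the Zapier Code step, the equivalent last line would be return extract_service(input_data['description']), so the step outputs null for service instead of failing when the labels are missing.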

Solution

  • You can do this with a regular expression, for example in the Python shell:

    import re

    text = "dfsgsdfgsdfgsdfgsdfgsdfg Service: what i 'm trying to extract Customer Details: gfdgsdfgsdfgsdfgsdfg"
    match = re.search(r'Service:(.*?)Customer Details:', text).group(1)
    print(match)
    
    # Will print " what i 'm trying to extract "
    

    EDIT:

    This is why it's important to post a Minimal, Complete, and Verifiable Example in your question the first time. If we don't know the exact data you are operating on, then we have to make assumptions, which can easily be wrong and lead us to give you answers that you can't use. Now that you've provided us with the actual input data, I can tell you immediately why your approaches aren't working.

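    Your substring approach (which I can only speculate about because you still haven't posted the full script, so we can't know which line is "line 8") has two separate problems. First, a SyntaxError is raised while Python parses the script, before any index is computed, so it comes from how some line is written rather than from the slice values. Second, the slice itself is off: find() returns the index where "Customer Details:" starts, so subtracting 17 cuts 17 characters off the end of the text you want. If that pushes the end index below the start index, Python doesn't raise an error; it silently returns an empty string.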
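To illustrate the index arithmetic, here is a small sketch on a cut-down version of your input (the trimmed text value is my assumption, for brevity). It shows what str.find returns and how slicing behaves with the offsets from your script.

```python
# Trimmed from the sample input in the question (an assumption for brevity).
text = "Service: Adult Consultation\nCustomer Details:"

# str.find returns the index where a label *starts*, or -1 if it is absent.
start = text.find("Service:") + len("Service:")   # 8
end = text.find("Customer Details:")              # 28
print(repr(text[start:end]))        # ' Adult Consultation\n'

# Subtracting len("Customer Details:") == 17 from `end` cuts into the value.
print(repr(text[start:end - 17]))   # ' Ad'

# An end index below the start index is not an error: the slice is just empty.
print(repr(text[start:start - 1]))  # ''

# A missing label makes find return -1, which a slice treats as an index
# counted from the end of the string rather than as an error.
print(text.find("Notes:"))          # -1
```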

    Vicrobot's substring approach is inadequate because your real input contains more things that start with "C" than just "Customer Details", and plenty of colons other than the one it's trying to match (neither problem shows up in the first sample string you gave us).

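    Your regex and mine both fail because your input string contains newlines. By default, "." in a regex matches any character except a newline, so 'Service:(.*?)Customer Details:' can't span the line break between "Service: Adult Consultation" and "Customer Details:", and re.search returns None.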
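A quick way to see the newline problem, and two ways around it, is the following sketch on a trimmed copy of your input (the trimming is mine). The first fix consumes the line break explicitly, as Option 2 does; the second uses the re.DOTALL flag so that "." also matches newlines.

```python
import re

text = "Service: Adult Consultation\nCustomer Details:\nName: John Doe"

# By default '.' matches any character except '\n', so the lazy group
# cannot cross the line break between the two labels.
no_match = re.search(r'Service:(.*?)Customer Details:', text)
print(no_match)  # None

# Fix 1: consume the line break explicitly with [\r\n]+.
explicit = re.search(r'Service: (.*?)[\r\n]+Customer Details:', text)
print(explicit.group(1))  # Adult Consultation

# Fix 2: let '.' match newlines too with re.DOTALL, then trim the whitespace
# that the group now captures (" Adult Consultation\n").
dotall = re.search(r'Service:(.*?)Customer Details:', text, re.DOTALL)
print(dotall.group(1).strip())  # Adult Consultation
```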

    This is how you can handle it in both cases:

    import re

    text = '''Appointment Details
    Provider: John Smith 
    Service: Adult Consultation
    Customer Details:
    Name: John Doe
    Notes: Hi ghdfhdfg, dfghdfgg appointment I had for the 6th of January at 9.30 with this one. Is it possibile?
    Status: Confirmed'''
    
    # Option 1: Substring
    # 'Service: ' is 9 characters long, so skip past it, then stop at the
    # newline that precedes 'Customer Details:'.
    mySubstring = text[text.find('Service: ') + 9 : text.find('\nCustomer Details:')]
    print(mySubstring)
    
    # Option 2: Regex
    # [\r\n]+ consumes the line break, because '.' does not match newlines.
    myMatch = re.search(r'Service: (.*?)[\r\n]+Customer Details:', text).group(1)
    print(myMatch)
    

    Working example on Repl.it

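    Given the two options, I would go with the regex approach. The pattern describes the whole structure you're matching (label, value, line break, next label) in one place, so it's generally less error-prone than juggling offsets by hand. (Speed isn't a deciding factor at this size; for a plain fixed-string search, str.find is usually at least as fast as a regex.)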
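If each value always sits on its own line, as in your Google Calendar description, another regex option is to anchor the pattern to a single line with re.MULTILINE, so it no longer depends on which label comes next. This is a sketch under that assumption; get_field is a hypothetical helper name, and the same call also pulls out the "Name:" value you were after in Update 2.

```python
import re

# Trimmed from the Google Calendar description in the question.
text = ("Appointment Details\n"
        "Provider: John Smith\n"
        "Service: Adult Consultation\n"
        "Customer Details:\n"
        "Name: John Doe\n"
        "Status: Confirmed")

def get_field(text, label):
    """Hypothetical helper: return the text after 'label:' on its own line, or None."""
    # re.MULTILINE makes ^ and $ match at every line boundary; [ \t\r]* trims
    # trailing spaces and a stray '\r' from Windows-style line endings.
    pattern = r'^' + re.escape(label) + r':[ \t]*(.*?)[ \t\r]*$'
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None

print(get_field(text, "Service"))  # Adult Consultation
print(get_field(text, "Name"))     # John Doe
print(get_field(text, "Email"))    # None
```

The trade-off is that this assumes one value per line; if a value could wrap onto the next line, the original two-label pattern is the safer choice.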