python regex string startswith ends-with

How to extract part of a word from a string?


I have the following string in python:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalko5854hg[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
}
"""

print(type(datastring))  # -> str

My string is data that I previously read from a text file; now I have that data in datastring. In datastring, the fourth line always has the following form: animalidAnimal[name="nameAnimal"]

So I would like to write a function that takes a string like the one above as a parameter and returns the idAnimal part of the first line that has the form animalidAnimal[name="nameAnimal"]. For example, for the first string my expected output would be:

ko5854hg

Another example:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animal456jlk165ut[name="Dalty"]
    animal6ljkj[name="Moon"]

}
"""

Expected output:

456jlk165ut

Last example:

datastring = """
Animals {
    idAnimal
    nameAnimal
    animalk45lil69lhfr5942lk[name="Jazz"]
    animal6ljkjh[name="Pinky"]
    animal595s422d1252g55[name="Steven"]
    animalko5854hg[name="David"]
    animalko5854hg[name="Oty"]
    animalko5854hg[name="Dan"]
}
"""

Expected output:

k45lil69lhfr5942lk

I don't want to come across as lazy, but I really don't know how to start coding this. I read about the startswith and endswith functions, but those only return True/False values.

Thanks.


Solution

  • You can start the match with { and use a capture group for the animalId:

    {[^{}]*?\banimal(\w+)\[name="[^\s"*]*"]
    

    The pattern matches:

    • { Match a { char
    • [^{}]*? Match any character except { and } as few as possible
    • \banimal Match animal with a leading word boundary
    • (\w+) Capture group 1, match 1+ word characters
    • \[name="[^\s"*]*"] Match the [name="..."] part, where the value contains no whitespace, double quote, or * characters
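The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch: the name ANIMAL_ID_RE and the sample string are made up for illustration, and the opening brace is escaped as \{, which matches the same literal character as the bare { used above.

```python
import re

# Hypothetical name; the pattern mirrors the bullet list piece by piece.
ANIMAL_ID_RE = re.compile(r"""
    \{              # a literal opening brace
    [^{}]*?         # any characters except braces, as few as possible
    \banimal        # "animal" preceded by a word boundary
    (\w+)           # group 1: the id, one or more word characters
    \[name="        # the literal text [name="
    [^\s"*]*        # the name value: no whitespace, quotes or asterisks
    "\]             # the closing quote and bracket
""", re.VERBOSE)

sample = 'Animals {\n    idAnimal\n    nameAnimal\n    animalko5854hg[name="Jazz"]\n}'
match = ANIMAL_ID_RE.search(sample)
if match:
    print(match.group(1))  # ko5854hg
```

Note that idAnimal and nameAnimal are skipped because the match is case-sensitive and both contain "Animal" with a capital A.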

    Example code

    import re
    
    pattern = r"{[^{}]*?\banimal(\w+)\[name=\"[^\s\"*]*\"]"
    
    s = ("Animals {\n"
                "    idAnimal\n"
                "    nameAnimal\n"
                "    animal456jlk165ut[name=\"Dalty\"]\n"
                "    animal6ljkj[name=\"Moon\"]\n\n"
                "}")
    
    m = re.search(pattern, s)
    if m:
        print(m.group(1))
    

    Output

    456jlk165ut
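Since the question asks for a function, the same pattern could be wrapped as follows. This is a minimal sketch: the function name first_animal_id is hypothetical, and returning None when nothing matches is an assumption, since the question does not say what should happen in that case.

```python
import re
from typing import Optional

# Same pattern as above, compiled once; single quotes avoid escaping the
# double quotes, and the brackets are escaped for readability.
_ANIMAL_ID = re.compile(r'\{[^{}]*?\banimal(\w+)\[name="[^\s"*]*"\]')


def first_animal_id(datastring: str) -> Optional[str]:
    """Return the id of the first animal<id>[name="..."] entry, or None."""
    match = _ANIMAL_ID.search(datastring)
    return match.group(1) if match else None


datastring = """
Animals {
    idAnimal
    nameAnimal
    animalk45lil69lhfr5942lk[name="Jazz"]
    animal6ljkjh[name="Pinky"]
}
"""
print(first_animal_id(datastring))  # k45lil69lhfr5942lk
```

Because re.search stops at the first match and [^{}]*? is lazy, later animal entries in the block are never considered.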