python wikipedia findall

Find text in a string between specific characters


Hi, how can I select specific text from a string and put it into a list? For example, I have this string:

"== example ==
Random text here
=== example2 ==="

I need to select the text of the "example" header and the text below it, stopping at the next run of "=" characters. The header text (without the "==") and the text below it should go into a list together, but "example2" should not be included. I tried this:

import wikipedia
page = wikipedia.page("Albert Einstein")
text = page.content

lst = []
l = []
n = 1
for pos,char in enumerate(text):
    try:
        if(char == "="):
            lst.append(pos)
            if lst[n+1] == lst[n+2] +1:
                    print(text[lst[n+1]:lst[n+2] +1])
                    l.append(text[lst[n]:lst[n+1] +1])
                    n =+ 1
            else:
                continue
    except IndexError:
        continue

Expected output: ["Life and career", "Albert Einstein was born in Ulm..."] (the header, followed by the text below it)


Solution

  • As I understand it, you want to extract the strings that appear between the markers in == someString ==, which are the section headers of the Wikipedia page you are fetching.

    For this kind of requirement, a regular expression is a better fit than searching string indices by hand. I would suggest reading up on regex.

    Here is the code for your use case:

    import wikipedia
    import re
    page = wikipedia.page("Albert Einstein")
    text = page.content
    # Raw string so that \s reaches the regex engine unchanged
    regex_result = re.findall(r"==\s(.+?)\s==", text)
    print(regex_result)
    

    regex_result is a list of strings, one per header found in the page.
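The question also asks for the text below each header, and the findall call above returns titles only (including the titles of deeper headers such as === example2 ===). Here is a minimal sketch of one way to pair each "==" header with the text below it, stopping at the next header of any level. The sample string and the names HEADER and top_level_sections are made up for illustration; with the real page you would pass page.content instead.

```python
import re

# Hypothetical sample text standing in for page.content.
sample = (
    "Intro paragraph.\n"
    "\n\n== Life and career ==\n"
    "Albert Einstein was born in Ulm.\n"
    "\n\n=== Childhood ===\n"
    "He grew up in Munich.\n"
    "\n\n== Legacy ==\n"
    "Einstein is widely remembered.\n"
)

# A header line: a run of two or more "=", a title, then the same run again.
HEADER = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def top_level_sections(text):
    """Return [title, body] pairs for "==" headers only.

    Each body ends at the next header of any level, so the text under
    "===" subsections is not included.
    """
    headers = list(HEADER.finditer(text))
    result = []
    for i, match in enumerate(headers):
        if len(match.group(1)) != 2:
            continue  # skip "===" subsections and deeper levels
        # The body runs up to the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        result.append([match.group(2), text[match.end():end].strip()])
    return result


print(top_level_sections(sample))
```

On the sample this gives one [title, body] pair for "Life and career" and one for "Legacy". If you want the subsection text included in the body as well, one option would be to end each body at the next "==" header instead of the next header of any level.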