Extracting specific text from a txt file in Python


I've recently picked up Python to do some text extraction. I have a data set that looks like this:

@article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}

@article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}

@article{noauthor_initial_nodate,
    title = {Initial teacher education programs},
    abstract = {Overview Influence: Initial teacher education programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have small positive impact Influence Definition: Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs. Evidence Number of meta-analyses: 5 Number of studies: 117 Number of students: 106,016 Number of effects: 509 Effect size: 0.10},
}

@article{noauthor_professional_nodate,
    title = {Professional development programs},
    abstract = {Overview Influence: Professional development programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have positive impact Influence Definition: Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders. Evidence Number of meta-analyses: 21 Number of studies: 1,151 Number of students: 2,321,242 Number of effects: 2,938 Effect size: 0.37},
    keywords = {Program Development},
}

I want to extract the title and part of the abstract (the text after "Influence Definition:") from each entry. I managed to get the desired output for one entry with this code:

s = "@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}"


start = s.find("title = {") + len("title = {")
end = s.find("}, abstract")

start2 = s.find("Influence Definition: ") + len("Influence Definition: ")
end2 = s.find("Evidence Number of meta-analyses:")

substring = s[start:end]
substring2 = s[start2:end2]
print(substring+' - '+substring2+";")

Output:

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. ;

The problems are:

  • This only extracts the first match, because find() returns only the first occurrence.
  • I want to run it on the original text file instead of pasting the text in as "s".

Can someone please lend a helping hand?


Solution

  • This should do it:

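    with open("myfile.txt", "r") as f:
        title = ""  # last title seen; stays empty if an abstract comes first
        # Iterate over the file line by line; the with block closes it for us.
        for line in f:
            if "title = {" in line:
                # Title line: take the text between "title = {" and the next "}".
                start = line.find("title = {") + len("title = {")
                end = line.find("}", start)
                title = line[start:end]
            if "Influence Definition: " in line:
                # Abstract line: take the text between the two marker phrases.
                start = line.find("Influence Definition: ") + len("Influence Definition: ")
                end = line.find("Evidence Number of meta-analyses:")
                print(title + " - " + line[start:end].strip())
                print()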
    

    For example, with the data from the question saved as myfile.txt, this prints the following:

    Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

    Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes.

    Initial teacher education programs - Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs.

    Professional development programs - Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders.
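
As an alternative to scanning line by line, the same extraction can be sketched with a regular expression applied to the whole file at once. This is only a sketch under assumptions: the names ENTRY_RE, extract_entries and extract_from_file are made up for illustration, the file is assumed to be UTF-8, and every entry is assumed to have the abstract directly after the title and to contain both "Influence Definition:" and "Evidence Number of meta-analyses:". If an entry lacks one of these, the pattern may skip it or run on into the next entry.

```python
import re

# Hypothetical pattern: captures the title and the text between
# "Influence Definition: " and "Evidence Number of meta-analyses:".
# re.DOTALL lets "." match newlines, so an entry may span several lines.
ENTRY_RE = re.compile(
    r"title = \{(?P<title>.*?)\},\s*"
    r"abstract = \{.*?Influence Definition: (?P<definition>.*?)"
    r"\s*Evidence Number of meta-analyses:",
    re.DOTALL,
)


def extract_entries(text):
    """Return a list of (title, definition) tuples for every entry in text."""
    return [
        (m.group("title"), m.group("definition"))
        for m in ENTRY_RE.finditer(text)
    ]


def extract_from_file(path):
    """Read the whole file (assumed UTF-8) and extract all entries from it."""
    with open(path, encoding="utf-8") as f:
        return extract_entries(f.read())
```

Calling extract_from_file("myfile.txt") and printing title + " - " + definition for each tuple should give the same lines as the loop above. Because re.finditer yields every match rather than only the first, this also covers the "only the first search result" problem from the question.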