Search code examples
pythonstringreplacesplitparagraph

Splitting paragraphs using python


How can I separate a full paragraph in orderly manner? For example: Right below is a string

"'Objective: To estimate the Prevalence of post-traumatic stress disorders (PTSD) among adults in field practise areas of Government Medical College, Srinagar, India. Methodology: The present study was cross-sectional in nature and was conducted in field practice areas of Government Medical College Srinagar. Three blocks of field practise areas of Government Medical College, Srinagar comprising of various villages were selected. Further 10 per cent of these villages were selected by the method of randomization sampling and then 10 per cent of household were taken again by systemic random sampling. In the selected household all adult population (18 years and above) were selected and screened by using General health questionnaires(GHQ). The patients who screened positive for PTSD (post-traumatic stress Disorders) were assessed and diagnosed. From the line listing the positive cases, the prevalence rates were calculated. Results: Of the total 3400 subjects (age>/=18 years), the prevalence of posttraumatic stress disorders among general population was found to be 3.76%. Prevalence was found to be more in females (Chi-square test=2.086, p>0.05 (Insignificant). Most of cases were found to be in the age group 0-40 years. Most of the cases were unmarried, illiterate and belong to lower socioeconomic class. Death of near one comprised the major traumatic event. Acute onset Posttraumatic stress disorder was the commonest type, previous history of psychiatric illness was found in 12 % of patients and drug abuse was present in 22.6%. Conclusion: Our findings clearly indicates that posttraumatic stress disorders (PTSD) is a prevalent disorder in the developing world, especially in disaster prone regions and in areas of political unrest. Resilience to various traumatic events in Kashmir has developed over the years and this might explains the lower prevalence of Post-traumatic disorder (PTSD) in our study.'"

Using python, I want to achieve the result above into few paragraphs like these...

"'Objective: To estimate the Prevalence of post-traumatic stress disorders (PTSD) among adults in field practise areas of Government Medical College, Srinagar, India.

Methodology: The present study was cross-sectional in nature and was conducted in field practice areas of Government Medical College Srinagar. Three blocks of field practise areas of Government Medical College, Srinagar comprising of various villages were selected. Further 10 per cent of these villages were selected by the method of randomization sampling and then 10 per cent of household were taken again by systemic random sampling. In the selected household all adult population (18 years and above) were selected and screened by using General health questionnaires(GHQ). The patients who screened positive for PTSD (post-traumatic stress Disorders) were assessed and diagnosed. From the line listing the positive cases, the prevalence rates were calculated.

Results: Of the total 3400 subjects (age>/=18 years), the prevalence of posttraumatic stress disorders among general population was found to be 3.76%. Prevalence was found to be more in females (Chi-square test=2.086, p>0.05 (Insignificant). Most of cases were found to be in the age group 0-40 years. Most of the cases were unmarried, illiterate and belong to lower socioeconomic class. Death of near one comprised the major traumatic event. Acute onset Posttraumatic stress disorder was the commonest type, previous history of psychiatric illness was found in 12 % of patients and drug abuse was present in 22.6%.

Conclusion: Our findings clearly indicates that posttraumatic stress disorders (PTSD) is a prevalent disorder in the developing world, especially in disaster prone regions and in areas of political unrest. Resilience to various traumatic events in Kashmir has developed over the years and this might explains the lower prevalence of Post-traumatic disorder (PTSD) in our study.'"

Lastly, I want to store each paragraphs into a string with obj, method, result and conclusion. How can I do that?

This is the code I used:

   content = repr(content).replace(".", ".\n")

But with these, the percentage in the text such as 22.6% will be split into another line.

Edited: What if the string belongs to an object in a list?

content = record.get("AB")

content = re.split(r"\B\s(?=[^\s:]+:)", content)

Is it working?


Solution

  • You could split on whitespace that follows a non-word character (e. g. punctuation) and is followed by a single word, followed by a colon:

    obj, method, result, conclusion = re.split(r"\B\s(?=[^\s:]+:)", subject)
    

    That will work if there are exactly four substrings that obey these rules.

    However, it appears that a more specific approach might be better:

    >>> regex = re.compile(r"""Objective:\s(.*?)Methodology:\s(.*?)
    ...                        Results:\s(.*?)Conclusion:\s(.*)""", re.S|re.X)
    >>> obj, method, result, conclusion = regex.match(subject).groups()
    

    (where subject contains your input string).