python, text-files, data-analysis

How to extract a count from a text file in Python


I have a text file with 6 columns: 1. sex (M/F), 2. age, 3. height, 4. weight, 5. sign (-/+), 6. zip code.

I need to find out from this file how many males have a - sign (for example: in the file, 30 M (males) are -).

So I need only the number at the end.

Logically I need to work with column 1 and column 5, but I am struggling to get a single number (the count) at the end.

This is the content of the file:

M 87  66 133 - 33634
M 17  77 119 - 33625
M 63  57 230 - 33603
F 55  50 249 - 33646
M 45  51 204 - 33675
M 58  49 145 - 33629
F 84  70 215 - 33606
M 50  69 184 - 33647
M 83  60 178 - 33611
M 42  66 262 - 33682
M 33  75 176 + 33634
M 27  48 132 - 33607

I am getting a result now, but I want to count only the rows that are both M and positive. How can I add that condition to occurrences?

f=open('corona.txt','r')
data=f.read()
occurrences=data.count('M')
print('Number of Males that have been tested positive:',occurrences)

Solution

  • If you do any significant amount of work with text and columnar data, I would suggest learning pandas.

    For this task, if your file has one record per line and is whitespace-delimited:

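    import pandas as pd

    # The sample rows contain runs of more than one space, so split on any
    # whitespace (sep=r'\s+') rather than on a single space.
    d = pd.read_csv('data.txt',
            names=['Sex', 'Age', 'Height', 'Weight', 'Sign', 'ZIP'],
            sep=r'\s+', index_col=False)

    # Keep the rows where both conditions hold, then count them.
    d[(d.Sex=='M') & (d.Sign=='-')].shape[0] # or
    len(d[(d.Sex=='M') & (d.Sign=='-')]) # same result, in this case = 9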
    

    Pandas is a very extensive package. This code builds a DataFrame from your file, giving each column a name. It then selects each row where both of your conditions hold, Sex == 'M' and Sign == '-', and reports the number of records found.

    I recommend starting here
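
For a one-off count, the same filter can also be sketched with the standard library alone, staying close to the approach in the question. This is a minimal sketch, not the answer's method: the helper name `count_matching` is hypothetical, and it assumes every record has whitespace-separated fields with sex in the first column and the sign in the fifth.

```python
def count_matching(lines, sex, sign):
    """Count records whose first column equals `sex` and fifth equals `sign`."""
    count = 0
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if fields[0] == sex and fields[4] == sign:
            count += 1
    return count


# Sample rows copied from the question. With a real file, pass the open
# file object instead of this list (a file iterates line by line).
sample = """\
M 87  66 133 - 33634
M 17  77 119 - 33625
M 63  57 230 - 33603
F 55  50 249 - 33646
M 45  51 204 - 33675
M 58  49 145 - 33629
F 84  70 215 - 33606
M 50  69 184 - 33647
M 83  60 178 - 33611
M 42  66 262 - 33682
M 33  75 176 + 33634
M 27  48 132 - 33607
""".splitlines()

males_negative = count_matching(sample, 'M', '-')
print('Number of males with a - sign:', males_negative)
```

On the twelve sample rows this gives 9, the same number as the pandas version. Unlike `data.count('M')`, which counts every occurrence of the letter M anywhere in the file, this checks both columns of each record before counting it.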