python regex string file-io text-files

Is there a way to count the different items in a text file under each matched string and store the counts in a DataFrame?


The text file looks like this:

data/File_10265.data:

Apple:2kg

Apple:3kg

Banana:1kg

Banana:4kg

Some string1

data/File_10276.data:

Apple:6kg

Apple:5kg

Apple:3kg

Banana:2kg

Banana:4kg

Banana:2kg

Banana:4kg

Extra line

data/File_10278.data:

Apple:3kg

Banana:2kg

Banana:4kg

Banana:2kg

Banana:7kg

Some words

The code is as follows:

import re
import pandas as pd
f = open("Samplefruit.txt", "r")
lines = f.readlines()
Apple_count=0
Banana_count=0
File_count=0
Filename_list=[]
Apple_list=[]
Banana_list=[]
for line in lines:
    match1=re.findall('data/(?P<File>[^\/]+(?=\..*data))',line)    
    if match1:
        Filename_list.append(match1[0])
        print('Match found:',match1)           
    if line.startswith("Apple"):
        Apple_count+=1
    elif line.startswith("Banana"):
        Banana_count+=1
    Apple_list.append(Apple_count)
    Banana_list.append(Banana_count)
    df=pd.DataFrame({'Filename': Filename_list,'Apple': 
    Apple_list,'Banana': 
    Banana_list})

The desired output:

Filename   | Apple | Banana
File_10265 | 2     | 2
File_10276 | 3     | 4
File_10278 | 1     | 4


Solution

  • Here is the answer I ended up with. Thanks to @Mani, @CarySwoveland, @Zero, and @M B for your support. The code is as follows:

    import pandas as pd

    text = {}          # maps each filename to its Apple/Banana counts
    Filename = None    # stays None until the first "data/..." header line
    Apple_count = 0
    Banana_count = 0
    with open(r"Samplefruit.txt", "r") as file:
        for line in file:
            if line.startswith("data/"):
                # A header such as "data/File_10265.data:" starts a new block
                Filename = line.split('/')[-1].split('.')[0]
                Apple_count = 0
                Banana_count = 0
            elif line.startswith("Apple"):
                Apple_count += 1
            elif line.startswith("Banana"):
                Banana_count += 1
            if Filename is not None:
                # Store the running counts for the current block
                text[Filename] = {'Apple': Apple_count, 'Banana': Banana_count}

    # Build the DataFrame once, after the whole file has been read
    df = pd.DataFrame(
        {
            "Filename": list(text.keys()),
            "Apple": [x['Apple'] for x in text.values()],
            "Banana": [x['Banana'] for x in text.values()],
        }
    )
    print(df)
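
As a possible generalisation of the answer above, here is a minimal sketch that counts any item name instead of hard-coding Apple and Banana. The names `count_items`, `HEADER` and `ITEM` are hypothetical, and both regular expressions assume the line formats shown in the sample file (`data/<name>.data:` headers and `<Item>:<number>kg` entries); adjust them if the real file differs.

```python
import re
from collections import Counter

import pandas as pd

# Assumed header format, taken from the sample: "data/<name>.data:"
HEADER = re.compile(r"^data/(?P<name>[^/.]+)\.data:")
# Assumed item format, taken from the sample: "<Item>:<number>kg"
ITEM = re.compile(r"^(?P<item>[A-Za-z]+):\d+kg")


def count_items(lines):
    """Return a DataFrame with one row per file and one column per item."""
    counts = {}     # filename -> Counter of item names
    current = None  # Counter of the block being read, None before any header
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A header line starts (or resumes) the block for that file
            current = counts.setdefault(header.group("name"), Counter())
            continue
        item = ITEM.match(line)
        if item and current is not None:
            current[item.group("item")] += 1

    # Every item name seen anywhere becomes a column; a Counter
    # returns 0 for items that a given file does not contain.
    items = sorted({name for c in counts.values() for name in c})
    rows = [
        {"Filename": filename, **{name: c[name] for name in items}}
        for filename, c in counts.items()
    ]
    return pd.DataFrame(rows, columns=["Filename", *items])
```

Because the function accepts any iterable of lines, an open file can be passed in directly, for example `with open("Samplefruit.txt") as f: df = count_items(f)`. Lines that match neither pattern, such as "Some string1", are simply skipped.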