python database python-requests extract readlines

How can I extract certain portions of all lines in a text file?


I have a text file that contains several lines of data, but I only need a small portion of each line. I can already narrow the file down to the lines that contain the information I need, but I am unsure how to extract only the necessary data: the values of mfgcode, modelno and qtyavail.

import csv

with open('file.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    for line in csv_reader:
        print(line)

This prints each line of the file as a one-element list:

['<part branch="1" core="0.00" cost="10.39" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.08" mfgcode="nike" modelno="1110" qtyavail="40" qtyreq="1" uom="" />']
['<part branch="1" core="0.00" cost="10.66" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.48" mfgcode="adidas" modelno="1109" qtyavail="209" qtyreq="1" uom="" />']
['<part branch="1" core="0.00" cost="20.17" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.24" mfgcode="puma" modelno="1108" qtyavail="2" qtyreq="1" uom="" />']

How can I only extract the values of mfgcode, modelno and qtyavail?


Solution

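  • Try this. The loop has to stay inside the with block so the file is still open while it is being read, and pandas has to be imported:

    import csv
    import re

    import pandas as pd

    ff = []
    with open('file.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)

        for line in csv_reader:
            # line[0] holds the whole <part ... /> tag as a single string
            ff.append([
                re.search(r'mfgcode="(.+?)"', line[0]).group(1),
                re.search(r'modelno="(.+?)"', line[0]).group(1),
                re.search(r'qtyavail="(.+?)"', line[0]).group(1),
            ])

    # Build a table from the extracted values and save it
    df = pd.DataFrame(ff, columns=['mfgcode', 'modelno', 'qtyavail'])
    df.to_csv("test.csv", index=False)
    print(df)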
    

    Output:

      mfgcode modelno qtyavail
    0    nike    1110       40
    1  adidas    1109      209
    2    puma    1108        2
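
Since every line shown in the question is a self-closing XML element, an alternative to regular expressions is to let an XML parser read the attributes. The sketch below is not part of the original answer: it assumes each line is well-formed XML on its own, uses the standard library's xml.etree.ElementTree, and the names WANTED and extract_fields are made up for illustration. The sample lines are copied from the question rather than read from file.csv.

```python
import xml.etree.ElementTree as ET

# Sample lines copied from the question; in practice they would come from the file.
lines = [
    '<part branch="1" core="0.00" cost="10.39" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.08" mfgcode="nike" modelno="1110" qtyavail="40" qtyreq="1" uom="" />',
    '<part branch="1" core="0.00" cost="10.66" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.48" mfgcode="adidas" modelno="1109" qtyavail="209" qtyreq="1" uom="" />',
    '<part branch="1" core="0.00" cost="20.17" deliverytime="1" desc="" errcode="success" kit="" linecode="brand" linenum="1" list="30.24" mfgcode="puma" modelno="1108" qtyavail="2" qtyreq="1" uom="" />',
]

# Hypothetical helper names, chosen for this example only.
WANTED = ("mfgcode", "modelno", "qtyavail")


def extract_fields(line, names=WANTED):
    """Parse one <part ... /> line and return the requested attribute values."""
    element = ET.fromstring(line)  # raises ParseError if the line is not valid XML
    return {name: element.get(name) for name in names}


records = [extract_fields(line) for line in lines]
for record in records:
    print(record)
```

The values come back as strings, just as with the regular-expression version, and a list of such dictionaries can be passed to pd.DataFrame directly if a table is wanted. Unlike the (.+?) patterns, this approach also copes with empty attribute values such as desc="".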