
Need to search a file for specific terms and output the second term into a csv with repetition - Python


I am trying to read a text file that uses ":" as the delimiter, find specific search terms in the first column, and output the second column to a .csv file.

The file I'm reading from has many sections that look like this (two of them shown):

 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject

This pattern repeats with the same keys in the first column (Object, Type, etc.) but different Info# values.

I want to search the first column for those keys (Object, Type, LastChange, DeviceId) and write the matching 'Info#' values to a CSV file, one row per section, so it reads as: Info1,Info2,Info3,Info4

So far I can output Object and Type, but my for loop only seems to do one iteration. My code so far:

import csv
import string
import pandas as pd

filename1 = 'test.txt'  #EDIT THIS TO MATCH EXACTLY THE .DMP FILE YOU WISH TO READ!!

infile = open(filename1, 'r', errors='ignore')  #this names the read file variable, !!DO NOT TOUCH!!
lines = infile.readlines()

filename2 = 'test.csv'
outfile = open(filename2, 'w')
headerList = "Type:Device:Name:Change\n".split(':')
headerString = ','.join(headerList)
outfile.write(headerString)
for line in lines[1:]:
    sline = line.split(":")

    if 'Type' in sline[0]:
        dataList = sline[1:]
        dataString = ','.join(dataList)
        typestring1 = ','.join([x.strip() for x in dataString.split(",")])

    if ' Object' in sline[0]:
        objectList = sline[1:]
        objectstring = ','.join(objectList)
        namestring1 = ','.join([x.strip() for x in objectstring.split(",")])

writeString = (typestring1 + "," + namestring1 + "," + "\n")
outfile.write(writeString)

outfile.close()
infile.close()

I'm new to python and any help would be greatly appreciated.
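The loop above writes only one row because `outfile.write(writeString)` is not indented under the `for` loop, so it runs once, after the loop ends, with whatever `typestring1` and `namestring1` held last. (Also, `lines[1:]` skips the first line of the file, and `' Object' in sline[0]` only matches when there is whitespace before `Object`.) A minimal sketch of the same line-by-line approach that writes one row per section, assuming every section ends with an `EndObject` line and the keys are exactly `Object`, `Type`, `LastChange` and `DeviceId` as in the sample; the function name `convert` and the `FIELDS` list are hypothetical:

```python
import csv

# Assumed key names, taken from the sample; also used as the CSV header.
FIELDS = ['Object', 'Type', 'LastChange', 'DeviceId']


def convert(infile_name, outfile_name):
    """Write one CSV row per Object ... EndObject section (hypothetical helper)."""
    with open(infile_name, errors='ignore') as infile, \
            open(outfile_name, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(FIELDS)  # header row
        current = {}  # values collected for the section being read
        for line in infile:  # every line, including the first one
            line = line.strip()  # leading spaces no longer matter
            if line == 'EndObject':
                # Section finished: write it *inside* the loop, then reset.
                # Assumes each section ends with EndObject; an unterminated
                # final section is not written.
                writer.writerow([current.get(key, '') for key in FIELDS])
                current = {}
            elif ':' in line:
                key, value = line.split(':', 1)  # split on the first colon only
                if key.strip() in FIELDS:
                    current[key.strip()] = value.strip()
```

Calling `convert('test.txt', 'test.csv')` would then produce a header line followed by rows such as `Info1,Info2,INFO3,INFO4`.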


Solution

  • I looked for an existing parser but couldn't identify the format of the input you're showing. I'd take some time to make sure there isn't already a "Python parser for microcontroller memory DMP file". You know the context better than I do, so your search may be more fruitful.

    In the meantime, given your sample, input.txt:

    
     Object : Info1
       Type : Info2
       LastChange : INFO3
       DeviceId : INFO4
    EndObject
    
    
    
    
    Object : Info5
       Type : Info6
       LastChange : INFO7
       DeviceId : INFO8
    EndObject
    

    Here's an end-to-end solution that reads that sample and converts each "block" of object data into a CSV row.

    The main point to stress is breaking these kinds of problems down into as many discrete steps as possible, like this:

    1. Filter the DMP file down to the EndObject markers and the lines with a colon (:) that you can parse into a value (or, more specifically, just the Object : and Type : lines)
    2. Parse the filtered lines and prove you have found all your blocks
    3. Convert the lines in each block into a row (that you can pass to the csv module's writer class)
    4. Write your rows as CSV

    import csv
    import pprint
    
    filtered_lines = []
    with open('input.txt') as f:
        for line in f:
            line = line.strip()
            if line.startswith('Object') or line == 'EndObject':
                filtered_lines.append(line)
                continue
        
            # Keep only Type
            if line.startswith('Type :'):
                filtered_lines.append(line)
                continue
        
            # or, keep any line with a colon
            # if ':' in line:
            #     filtered_lines.append(line)
            #     continue
    
            # at this point, no predicate has been satisfied, drop line
            pass  # redundant, but poignant and satisfying :)
    
    
    all_blocks = []
    this_block = None
    in_block = False
    for line in filtered_lines:
        # Find the start of a "block" of data
        if line.startswith('Object'):
            in_block = True
            this_block = []
    
        # Find the end of block... 
        if line == 'EndObject':
            # save it
            all_blocks.append(this_block)
    
            # reset for next block
            this_block = None
            in_block = False
    
        if in_block:
            this_block.append(line)
    
    print('Blocks:')
    pprint.pprint(all_blocks)
    
    # Convert a list of blocks to a list of rows
    all_rows = []
    for block in all_blocks:
        row = []
    
        # Convert a list of lines (key : value) to a "row", a list of single-value strings
        for line in block:
            _, value = line.split(':', 1)  # split on the first colon only, in case a value contains one
            row.append(value.strip())
        
        all_rows.append(row)
    
    print('Rows:')
    pprint.pprint(all_rows)
    
    # Finally, save as CSV
    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(all_rows)
    

    When I run that against that input, I get:

    Blocks:
    [['Object : Info1', 'Type : Info2'], ['Object : Info5', 'Type : Info6']]
    Rows:
    [['Info1', 'Info2'], ['Info5', 'Info6']]
    

    and the final output.csv:

    Info1,Info2
    Info5,Info6
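The code above keeps only Object and Type. To get all four values per row (Info1,Info2,INFO3,INFO4, as the question asks), one option is to use the commented-out "keep any line with a colon" filter and turn each block into a dict, so that `csv.DictWriter` can write a header and keep the columns in a fixed order even when a block is missing a key. A minimal sketch under those assumptions, merging the filter and block steps into one pass; the function names `blocks_to_rows` and `write_rows` and the default column list are hypothetical:

```python
import csv
import io


def blocks_to_rows(lines):
    """Group 'key : value' lines between Object and EndObject into dicts."""
    rows = []
    row = None  # None means "not inside a block"
    for line in lines:
        line = line.strip()
        if line.startswith('Object'):
            row = {}  # start of a new block
        if line == 'EndObject':
            if row is not None:
                rows.append(row)  # end of block: save it
            row = None
            continue
        if row is not None and ':' in line:
            key, value = line.split(':', 1)  # split on the first colon only
            row[key.strip()] = value.strip()
    return rows


def write_rows(rows, f, fieldnames=('Object', 'Type', 'LastChange', 'DeviceId')):
    # Column names are assumed from the sample; missing keys become empty cells,
    # and keys not listed in fieldnames are ignored.
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval='', extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
```

To use it with the files from the answer, pass the lines of `input.txt` to `blocks_to_rows` and an `output.csv` opened with `newline=''` to `write_rows`. Splitting on the first colon only matters if a value such as LastChange is a timestamp like `12:30:00`.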