Need to search a file for specific terms and output the second term into a csv with repetition - Python

I am trying to read a text file that uses ":" as the delimiter, find specific search terms in the first column, and output the second column to a .csv file.

The file I'm pulling from has multiple sections that look like this (showing 2 rows of many):

 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject

This pattern repeats throughout the file: the first column (Object, Type, etc.) stays the same, but the Info# values differ.

I want to pull each 'Info#' value into a CSV file, one row per block (e.g. Info1,Info2,Info3,Info4), by matching on the first column (Object, Type, LastChange, DeviceId).

So far I have gotten it to output Object and Type, but my for loop only seems to do one iteration. My code so far:

import csv
import string
import pandas as pd


        

filename1 = 'test.txt'                      #EDIT THIS TO MATCH EXACTLY THE .DMP FILE YOU WISH TO READ!!

infile = open(filename1, 'r', errors = 'ignore')                    #this names the read file variable, !!DO NOT TOUCH!!             
lines = infile.readlines()

        
filename2 = 'test.csv'                    
outfile = open(filename2,'w')
headerList ="Type:Device:Name:Change\n".split(':')     
headerString = ','.join(headerList)
outfile.write(headerString)
for line in lines[1:]:
       sline = line.split(":")                    

       if  'Type' in sline[0]:
        dataList = sline[1:]                                  
        dataString = ','.join(dataList) 
        typestring1 = ','.join([x.strip() for x in dataString.split(",")])   

       if ' Object' in sline[0]:
        objectList = sline[1:]
        objectstring = ','.join(objectList)
        namestring1 = ','.join([x.strip()for x in objectstring.split(",")])
                   
writeString = (typestring1 + "," + namestring1+ ","+ "\n")
outfile.write(writeString)



outfile.close()
infile.close()

I'm new to Python and any help would be greatly appreciated.

Solution

I looked for a parser for the format but couldn't find the format for the input you're showing. I'd take some time to make sure there isn't already a "Python parser for microcontroller memory DMP file". You know the context better than me, so maybe your search will be more fruitful.
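
On the immediate symptom: in the question's code, the two lines that build and write writeString are dedented out of the for loop. The loop does visit every line, but only the last Type and Object values it saw are ever written. Below is a minimal sketch of the same idea with the write moved inside the loop and triggered by EndObject. The function name collect_rows and the inline SAMPLE are mine, for illustration only, and an in-memory buffer stands in for test.csv:

```python
import csv
import io

SAMPLE = """\
 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject
"""


def collect_rows(lines):
    """Return one [type, name] row per Object ... EndObject block."""
    rows = []
    name = type_ = ''
    for line in lines:
        # Split on the first colon only, so values may contain colons
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()

        if key == 'Object':
            name = value
        elif key == 'Type':
            type_ = value
        elif key == 'EndObject':
            # The row is emitted here, inside the loop, once per block
            rows.append([type_, name])
            name = type_ = ''
    return rows


rows = collect_rows(SAMPLE.splitlines())

out = io.StringIO()  # stands in for open('test.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(['Type', 'Name'])
writer.writerows(rows)
print(out.getvalue())
```

Comparing the stripped key with == also avoids the substring checks in the original ('Type' in sline[0], ' Object' in sline[0]), which depend on the exact leading whitespace of each line.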

In the meantime, given your sample, input.txt:


 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject




Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject

Here's an end-to-end solution that can read that sample and convert each "block" of object data into a CSV row.

The big point to stress is breaking down these kinds of problems into as many discrete steps as possible, like below:

1. Filter the DMP file, keeping the Object/EndObject markers and only those lines that have a colon (:) to parse into a value (or, more specifically, just the Type : lines)
2. Parse the filtered lines and prove you have found all your blocks
3. Convert the lines in each block into a row (that you can pass to the csv module's writer)
4. Write your rows as CSV

import csv
import pprint

filtered_lines = []
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('Object') or line == 'EndObject':
            filtered_lines.append(line)
            continue
    
        # Keep only Type
        if line.startswith('Type :'):
            filtered_lines.append(line)
            continue
    
        # or, keep any line with a colon
        # if ':' in line:
        #     filtered_lines.append(line)
        #     continue

        # at this point, no predicate has been satisfied, drop line
        pass  # redundant, but poignant and satisfying :)


all_blocks = []
this_block = None
in_block = False
for line in filtered_lines:
    # Find the start of a "block" of data
    if line.startswith('Object'):
        in_block = True
        this_block = []

    # Find the end of block... 
    if line == 'EndObject':
        # save it
        all_blocks.append(this_block)

        # reset for next block
        this_block = None
        in_block = False

    if in_block:
        this_block.append(line)

print('Blocks:')
pprint.pprint(all_blocks)

# Convert a list of blocks to a list of rows
all_rows = []
for block in all_blocks:
    row = []

    # Convert a list of lines (key : value) to a "row", a list of single-value strings
    for line in block:
        # Split on the first colon only, in case the value itself contains one
        _, value = line.split(':', 1)
        row.append(value.strip())
    
    all_rows.append(row)

print('Rows:')
pprint.pprint(all_rows)

# Finally, save as CSV
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)

When I run that against that input, I get:

Blocks:
[['Object : Info1', 'Type : Info2'], ['Object : Info5', 'Type : Info6']]
Rows:
[['Info1', 'Info2'], ['Info5', 'Info6']]

and the final output.csv:

Info1,Info2
Info5,Info6
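
The question asked for all four values per block (Info1,Info2,Info3,Info4), while the script above keeps only Object and Type. One way to extend it is to collect each block into a dict and let csv.DictWriter order the columns. This is a sketch under assumptions: the field names are taken from the sample, parse_blocks and FIELDS are illustrative names of my own, and the real DMP file may contain keys or layouts the sample doesn't show.

```python
import csv
import io

# Field names as they appear in the sample; adjust to the real file
FIELDS = ['Object', 'Type', 'LastChange', 'DeviceId']

SAMPLE = """\
 Object : Info1
   Type : Info2
   LastChange : INFO3
   DeviceId : INFO4
EndObject

Object : Info5
   Type : Info6
   LastChange : INFO7
   DeviceId : INFO8
EndObject
"""


def parse_blocks(lines):
    """Yield one dict per Object ... EndObject block."""
    block = None
    for line in lines:
        line = line.strip()
        if line == 'EndObject':
            if block is not None:
                yield block
            block = None
            continue
        if ':' not in line:
            continue  # blank or unrecognized line
        # Split on the first colon only: a value may contain more colons
        key, value = (part.strip() for part in line.split(':', 1))
        if key == 'Object':
            block = {}  # a new block starts here
        if block is not None and key in FIELDS:
            block[key] = value


out = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(parse_blocks(SAMPLE.splitlines()))
print(out.getvalue())
```

To run this against a real file, pass the open file object to parse_blocks instead of SAMPLE.splitlines(). A block that lacks one of the fields gets an empty cell in that column, which is DictWriter's default behavior for missing keys.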