python sql-server excel analytics transpose

Transpose multiple txt data files and import into excel or database

I have about 1000 txt files in the format below. The identifier is separated from the data by ':' . However the data it self can contains ':' I am working on a script that can allow me to transpose and import the data into an excel, csv, microsoft acess or mssql format. in a structured way. Given that the identifiers will be the same for all the txt files.

Author: sharkespare
Content-Length: 15200098
Content-Type: application/pdf
Creation-Date: 2015-02-05T07:27:34Z
Last-Modified: 2015-02-05T07:28:38Z
Last-Save-Date: 2015-02-05T07:28:38Z
created: Thu Feb 05 08:27:34 CET 2015
creator: Williams spear
date: 2015-02-05T07:27:34Z
dc:creator: Library of congress

Solution

What about:

Open txt file and split by ": " (with space after the colon), put it into a list.

text=[]
with open('read_file.txt','r') as f:
    for line in f:
        for word in line.split(": "):
            text.append(word)

Now every even element is an identifier and the next odd element ist the corresponding data. Rebuild the list putting every identifier and its data together:

sorted = [text[i:i+2] for i in range(0, len(text), 2)]

Finally write everything into a csv file in colums using zip:

import csv
with open('save_file.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(zip(*sorted))

If you don't want to transpose it, just replace zip(*sorted) with sorted. After saving the text from the first file, loop over the other ones and add only the data to the csv file. You can achieve that by opening with the 'a' parameter, allowing you to append stuff at the end of the file, and using [zip(*sorted)[1]]:

files = ["file1.txt", "file2.txt", "file3.txt", etc]
for each in  files:
    text=[]
    with open(str(each),'r') as f:
        for line in f:
            for word in line.split(": "):
                text.append(word)

    sorted = [lst[i:i+2] for i in range(0, len(lst), 2)]

    with open('save_file.csv', 'a') as f:
        writer = csv.writer(f)
        writer.writerows([zip(*sorted)[1]])