Search code examples
pythonexcelsplitxlrdxlwt

Splitting a text document into an excel sheet xls


I am currently trying to export/convert a text document I have into an .xls file. So after from what i've found i was able to create an xls but now i just need to get the formatting correct in xls from the text document.

Heres an example of what I am trying to do.

Lets say I have the following text document: numbers.txt

|<DOg>|
    |Data1 = 300    |
    |Data2 = 200    |
    |Data3 = 15 |
    |Data4 = 14 |
    |Data5 = 4  |
|<DOg>|
    |Data1 = 800    |
    |Data2 = 500    |
    |Data3 = 25 |
    |Data4 = 10 |
    |Data5 = 5  |

if I run my code using | as the delimiter I receive this as the .xls file

excel

As you can see the formatting is off.

The goal am i trying to get is the following formatting instead.

excelFormatted

The current code I am using is the following:

mypath = raw_input("Please enter the directory path for the input files: ")

from os import listdir
from os.path import isfile, join
textfiles = [ join(mypath,f) for f in listdir(mypath) if isfile(join(mypath,f)) and '.txt' in  f]

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

import xlwt
import xlrd

style = xlwt.XFStyle()
style.num_format_str = '#,###0.00'

for textfile in textfiles:
    f = open(textfile, 'r+')
    row_list = []
    for row in f:
        row_list.append(row.split('|'))
    column_list = zip(*row_list)
    # for column_list in f:
    #     column_list.append(column.split('|'))
    workbook = xlwt.Workbook()
    worksheet = workbook.add_sheet('Sheet1')
    i = 0
    for column in column_list:
        for item in range(len(column)):
            value = column[item].strip()
            if is_number(value):
                worksheet.write(item, i, float(value), style=style)
            else:
                worksheet.write(item, i, value)
        i+=1
    workbook.save(textfile.replace('.txt', '.xls'))

My idea was using the .split() method for columns however I'm unsure how to implemented correctly since when I use split for columns every single line ends up being it's own column.


Solution

  • It looks like you have unlimited columns. You need to capture all the results in an array and transpose them as follows:

    import re
    
    # Strip all spaces and dump all data into an array
    lines = [mo for mo in re.findall('(?s)(?<=\|)([<\w].+?)\s+?\|', open('py.txt').read())]
    # Create an array to hold the transformation
    combined = ['' for x in range(len(lines) / lines.count("<DOg>|"))]
    # Append by rows
    for idx in range(len(lines)):
      combined[idx % len(combined)] += lines[idx] + ','
    
    # Write array to file
    output = open('numbersConverted.csv','w')
    for comb in combined:
      output.write(comb + "\n")
    output.close
    

    This will dump your results in a numbersConverted.csv ready for import.