Python: open a text file with a specific encoding, replace multiple sections, and write the output without empty lines as CSV-style text


What I have is a file "test.xls", which is basically an old-style xls (XML formatting) that looks like this in Notepad:

<table cellspacing="1" rules="all" border="1">
    <tr>
        <td>Row A</td><td>Row B</td><td>Row C</td>
    </tr>
    <tr>
        <td>New York</td><td>23</td><td>warm</td>
    </tr>
    <tr>
        <td>San Francisco</td><td>40</td><td>hot</td>
    </tr>
</table>

Now I'm using Python to convert it to a .txt (flat file), which I can later import into my MSSQL database.

What I have so far:

import codecs

# read the file with a specific encoding
# note: 'ansi' is an alias for the 'mbcs' codec, which is only available on Windows
with codecs.open('test.xls', 'r', encoding='ansi') as file_in, codecs.open('test_out.txt', 'w') as file_out:
    lines = file_in.read()
    lines = lines.replace('<tr>', '')

    # save the manipulated data into a new file with new encoding
    file_out.write(lines)

This approach results in a .txt like this:

Row A;Row B;Row C

New York;23;warm

San Francisco;40;hot

I tried several approaches to get rid of the empty lines; the last one was:

for lines in file_in:
        if line != '\n':
            file_out.write(lines)

But the file either looks the same or is completely empty.


Solution

  • To get rid of the empty lines:

    list.txt:

    Row A;Row B;Row C
    
    New York;23;warm
    
    San Francisco;40;hot
    

    Hence:

    logFile = "list.txt"
    with open(logFile) as f:
        content = f.readlines()
    
    # to remove empty lines
    content = [line.strip() for line in content if line.strip()]
    for line in content:
        print(line)
    

    OUTPUT:

    Row A;Row B;Row C
    New York;23;warm
    San Francisco;40;hot
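
The same filter can also be applied while copying from an input file to an output file, which is closer to the loop attempted in the question. Below is a minimal sketch, assuming a helper named `copy_without_blank_lines` and UTF-8 as the default encoding (both are illustrative choices, not part of the original code). Two things probably matter here: the loop variable must have the same name in the `for` statement and in the `if` test, and the loop must run on a file that has not already been consumed by `read()`, because a file that was read to the end has no lines left to iterate over.

```python
def copy_without_blank_lines(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, skipping empty and whitespace-only lines.

    Returns the number of lines that were written.
    The function name and the default encoding are assumptions for this sketch.
    """
    kept = 0
    with open(src_path, "r", encoding=encoding) as file_in, \
            open(dst_path, "w", encoding=encoding) as file_out:
        for line in file_in:                  # iterate line by line
            if line.strip():                  # true only if the line has visible content
                # normalize the line ending so the last line also ends with a newline
                file_out.write(line.rstrip("\n") + "\n")
                kept += 1
    return kept
```

Calling `copy_without_blank_lines("test_out.txt", "test_clean.txt")` (hypothetical file names) would write only the non-empty lines to the second file and leave the first one untouched.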
    

    EDIT:

    To read from the file and then overwrite it, collect the cleaned lines in a list and write that list back to the same file:

    logFile = "list.txt"                # your file name
    with open(logFile) as f:            # open the file for reading
        content = f.readlines()         # read all lines
    
    # keep only the non-empty lines, stripped of surrounding whitespace
    results = [line.strip() for line in content if line.strip()]
    
    print(results)                      # printing the list
    
    with open(logFile, "w") as f:       # reopen the same file in write mode
        for elem in results:            # for each line stored in the results list
            f.write(elem + '\n')        # write the line back to the file
    print("Thank you, your data was overwritten")  # Tadaa-h!
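
As an alternative to chaining `replace` calls and cleaning up blank lines afterwards, the table markup could be parsed directly, so that no empty lines are produced in the first place. This is only a sketch of a different technique, not the approach used above: it relies on the standard-library `html.parser` and `csv` modules, and the names `TableRowParser` and `table_to_csv_text` are made up for this example. It assumes the file contains a simple table like the one in the question, without nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:       # ignore text outside of cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:                # skip rows that contain no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv_text(markup, delimiter=";"):
    """Return the table rows in markup as delimiter-separated lines."""
    parser = TableRowParser()
    parser.feed(markup)
    parser.close()
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(parser.rows)
    return buffer.getvalue()
```

The returned string can be written to the output file in one call, for example `file_out.write(table_to_csv_text(file_in.read()))`, with both files opened using whatever encodings the source file and the database import require.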