
How to read complex txt file with blocks of data and save it as csv file in python?


I have a file organized like this:

++++++++++++++
Country 1

**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office

        Address A, 100 City. Country X
**work time 09h00-16h00<br>9h00-14h00
**www.example.com
**[email protected];
**012/345 67 89
**téléfax 123/456 67 89
*******
Address
**Home Office

        Address A, 200 City. Country X
**[email protected];
**001/000 00 00
**téléfax 111/111 11 11
*******
Address
**Living address

        Address 0, 123 City
**[email protected]
**000/000 00 00
**téléfax 222/222 22 22
++++++++++++++
Country 2

**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office

        AAA 11, 30 City 

        BBB 22, 30 City
**work time 08h00-12h30  
**www.example.com
**[email protected]
**000/000 00 00
**téléfax 111/11 11 11
*******

ETC

I want to put the data in a CSV file with these columns:

Country (Line right after ++++++++++++++), Address (Line right after *******), Office (after **), WorkTime (after **), Website (after **), Email (after **), Phone (after **), Fax (after **)

How do I do this in Python? One problem is that some entries are missing data, so I know some rows in the CSV file will end up misaligned, but I don't mind tweaking the data manually afterwards. Another problem is that the country names vary, so I need to use ++++++++++++++ as the separator.

I tried something like this:

import csv
with open('listofdata.txt', 'r') as FILE:
   DATA = FILE.read()

LIST = DATA.split('++++++++++++++')

LIST2 = []
LIST3 = []
LIST4 = []

for ITEMS in LIST:
    LIST2 = ITEMS.split('*******')    
    for items2 in LIST2:
        LIST3 = items2.split('**')
        LIST4.append(LIST3)


with open('file.csv', 'w') as CSV:
    for ITEMS in LIST4:
        csv.write(ITEMS)

But it doesn't work.

ERROR:

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    csv.write(ITEMS)
AttributeError: 'module' object has no attribute 'write'


Solution

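  • In the very last line you wrote "csv" (the module) instead of "CSV" (your file object). The csv module has no write function, which is why you got the AttributeError. Note that CSV.write(ITEMS) would also fail, because a file's write method expects a string and ITEMS is a list.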

    I updated your code to show how to write the rows with Python's csv module.

    All that is left is to refine your parsing so that each field lands in its own column.

    Code:

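    import csv

    # Read the whole input file into one string.
    with open('listofdata.txt', 'r') as FILE:
        DATA = FILE.read()

    # One chunk per country block.
    LIST = DATA.split('++++++++++++++')

    LIST4 = []  # one list of fields per section

    for ITEMS in LIST:
        # Split each country block into sections, then each section into fields.
        LIST2 = ITEMS.split('*******')
        for items2 in LIST2:
            LIST3 = items2.split('**')
            LIST4.append(LIST3)

    # Write each list of fields as one CSV row.
    with open('file.csv', 'w') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        for ITEMS in LIST4:
            spamwriter.writerow(ITEMS)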
    

    Output:

    ""
    
    "
    Country 1
    
    ","this sentence is not important.
    ","date 25.09.2017, also not important
    "
    
    "
    Address
    ","Office
    
            Address A, 100 City. Country X
    ","work time 09h00-16h00<br>9h00-14h00
    ","www.example.com
    ","[email protected];
    ","012/345 67 89
    ","téléfax 123/456 67 89
    "
    
    "
    Address
    ","Home Office
    
            Address A, 200 City. Country X
    ","[email protected];
    ","001/000 00 00
    ","téléfax 111/111 11 11
    "
    
    "
    Address
    ","Living address
    
            Address 0, 123 City
    ","[email protected]
    ","000/000 00 00
    ","téléfax 222/222 22 22
    "
    
    "
    Country 2
    
    ","this sentence is not important.
    ","date 25.09.2017, also not important
    "
    
    "
    Address
    ","Office
    
            AAA 11, 30 City
    
            BBB 22, 30 City
    ","work time 08h00-12h30
    ","www.example.com
    ","[email protected]
    ","000/000 00 00
    ","téléfax 111/11 11 11
    "
    
    "
    "