python-3.x csv threadpoolexecutor concurrent.futures

Problem with writing the first line to a CSV file


I'm using a thread pool to run one function of my code. Here's a snippet:

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = []
    for addresses in ipa_str_s:
        futures.append(executor.submit(checking_connection, ip_address=addresses))
    for future in concurrent.futures.as_completed(futures):
        save_result = (future.result())
        saving_statistics(save_result, saving)

The variable ipa_str_s holds a list of IP addresses.

The saving_statistics function is called with the result of each checking_connection call as it completes, so that I can save that result.

The saving_statistics function:

def saving_statistics(save_result, saving):
    with open(saving, 'a', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file, delimiter=';')
        csv_writer.writerow(['IP-address', 'Packets_transmitted', 'Packets_received'])
        csv_writer.writerow(save_result)

If I open the file in mode a, I get this result:

IP-address;Packets_transmitted;Packets_received
192.168.1.1;3;3
IP-address;Packets_transmitted;Packets_received;
192.168.1.2;3;0
IP-address;Packets_transmitted;Packets_received;
192.168.1.3;3;0

If I open the file in mode w, I get this result:

IP-address;Packets_transmitted;Packets_received
192.168.1.3;3;0

Could you please tell me how I can get a file with normal content, like this:

IP-address;Packets_transmitted;Packets_received
192.168.1.1;3;3
192.168.1.2;3;0
192.168.1.3;3;0

Thank you very much!


Solution

  • It looks like your overall process is simple: make a number of concurrent network connections, then collect the results into a CSV file. It probably runs in a reasonable amount of time, with no restarts. If so, don't append.

    As Loydms stated, just open the file once, write the header and all the rows, then close it:

    import csv
    import random
    import time
    
    import concurrent.futures
    
    
    def checking_connection(addr):
        sleep_ms = random.randrange(50, 100) / 1000
        time.sleep(sleep_ms)
    
        # Fake stats, returned in the same order as the CSV header columns
        pckts_out = random.randrange(500, 1000)  # transmitted
        pckts_in = random.randrange(500, 1000)  # received
    
        return addr, pckts_out, pckts_in, sleep_ms
    
    
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for address in [f"192.168.1.{x}" for x in range(256)]:
            futures.append(executor.submit(checking_connection, address))
    
    
    with open("data.csv", "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["IP_address", "Packets_transmitted", "Packets_received", "Sleep"])
        for future in concurrent.futures.as_completed(futures):
            writer.writerow(future.result())
    

    When I run that code, I get a new CSV every time, with a header and 256 rows of fake IP stats in about 1.8s:

    | IP_address    | Packets_transmitted | Packets_received | Sleep |
    |---------------|---------------------|------------------|-------|
    | 192.168.1.94  | 879                 | 933              | 0.063 |
    | 192.168.1.245 | 846                 | 577              | 0.079 |
    | 192.168.1.144 | 555                 | 656              | 0.099 |
    | 192.168.1.127 | 659                 | 936              | 0.06  |
    | 192.168.1.43  | 706                 | 740              | 0.091 |
    ...
    
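If you do want to keep the append mode from the question (for example, so that rows written before a restart are not lost), one option is to write the header only when the file is new or empty. This is not part of the answer above, just a minimal sketch that assumes the question's function name and columns; the HEADER constant is introduced here for illustration:

```python
import csv
import os

# Hypothetical constant holding the header row from the question
HEADER = ["IP-address", "Packets_transmitted", "Packets_received"]


def saving_statistics(save_result, saving):
    """Append one result row; write the header only if the file is new or empty."""
    needs_header = not os.path.exists(saving) or os.path.getsize(saving) == 0
    with open(saving, "a", encoding="utf-8", newline="") as csv_file:
        csv_writer = csv.writer(csv_file, delimiter=";")
        if needs_header:
            csv_writer.writerow(HEADER)
        csv_writer.writerow(save_result)
```

This relies on saving_statistics being called from a single thread, as in the question's as_completed loop. Calling it from several worker threads at once would need a lock around the check and the write.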

    If your process takes a long time, or there are restarts, then either pick a logging approach or use Python's sqlite3 module to create a small DB and do inserts.

    From either a log file or the DB you can create the CSV when it's ultimately needed.
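
The SQLite route mentioned above could look roughly like this. It is only a sketch under assumptions: the table name `results`, the column names, and the helper functions `open_results_db`, `save_result`, and `export_csv` are all made up for illustration, and each result is assumed to be an (ip, transmitted, received) tuple as in the question.

```python
import csv
import sqlite3


def open_results_db(db_path):
    """Open (or create) the DB and make sure the results table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "ip_address TEXT PRIMARY KEY, "
        "packets_transmitted INTEGER, "
        "packets_received INTEGER)"
    )
    return conn


def save_result(conn, result):
    """Store one (ip, transmitted, received) tuple and commit immediately."""
    # INSERT OR REPLACE: re-checking an address after a restart overwrites its old row
    conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?)", result)
    conn.commit()


def export_csv(conn, csv_path):
    """Write the whole table to a fresh CSV with a single header row."""
    rows = conn.execute(
        "SELECT ip_address, packets_transmitted, packets_received "
        "FROM results ORDER BY ip_address"  # plain text ordering, not numeric
    )
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["IP-address", "Packets_transmitted", "Packets_received"])
        writer.writerows(rows)
```

In the question's code, `save_result(conn, future.result())` would take the place of the `saving_statistics` call inside the `as_completed` loop, and `export_csv` would run once at the end. By default a sqlite3 connection may only be used from the thread that created it, so this sketch keeps all DB calls in the main thread.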