Search code examples
python-3.xpandassqlalchemy

My Python script is not formatting the CSV file properly if there are NULL values


I have the following Python 3 code that runs a database stored procedure and puts the data into a dataframe and then saves it to a .csv file.

It works, but if there is a null value at the end, it just leaves a trailing comma.

Here is my script:

def GenerateCSV(fileName):

    connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dev;DATABASE=Cities;UID=z;PWD=z"
    connection_url = URL.create(
        "mssql+pyodbc", query={"odbc_connect": connection_string})

    engine = sa.create_engine(connection_url)

    qry = text("EXECUTE dbo.sp_city_parser")
    with engine.connect() as con:
        rs = con.execute(qry)

    df = pd.read_sql_query(qry, engine)

    df.to_csv(fileName, encoding='utf-8', mode='a', index= False, header=False)

The program generates a csv file that looks like this:

*CityName,*CityID,*CityCountryCode,*CitySize,CityPopulation
Tokyo,656,5,X-Large,
Chicago,887,9,Large,
Berlin,432,3,Large,

As you can see, since CityPopulation is NULL, it just leaves a trailing comma with nothing there.

Shouldn't it leave something like this? Where the '' represents the null value?

Tokyo,656,5,X-Large,''

Thanks!


Solution

  • if you want to represent null values in csv file with something else you can pass na_rep param to your method :

    df.to_csv(fileName, na_rep ='NULL', ... ) 
    

    output :

    Tokyo,656,5,X-Large,NULL
    Chicago,887,9,Large,NULL
    Berlin,432,3,Large,NULL
    

    or you can quote your values :

    df.to_csv(fileName, quoting=csv.QUOTE_NONNUMERIC, ...)
    

    output:

    "Tokyo",656,5,"X-Large",""
    "Chicago",887,9,"Large",""
    "Berlin",432,3,"Large",""
    

    and to be clear here , none of these is necessary to declare null values in csv file. Any csv reader would understand that column is null.