Search code examples
pythonpandassasexport-to-csv

Export SAS lib to csv with correct date format (in CSV file)


I use: Python 3.7 SAS v7.1 Eterprise

I want to export some data (from library) from SAS to CSV. After that I want to import this CSV to Pandas Dataframe and use it. I have problem, because when I export data from SAS with this code:

proc export data=LIB.NAME

outfile='path\to\export\file.csv'
dbms=csv
replace;
run;

Every column were exported correctly instead of Column with Date. In SAS I see something like:

06NOV2018
16APR2018

and so on... In CSV it looks the same. But if i import this CSV to DataFrame, unfortunatelly, Python see the column with date as Object/string instead of date type.

So here is my question. How Can I export whole library to CSV from SAS with correct type of column (ecpessially column with Date). Maybe I should convert something before Export? Plz help me with this, In SAS I'm new, i want to just import Data from it and use it in Python.

Before you write something, keep in mind, that I had tried with pandas read_sas function, but during this command I've got such Exception with error:

df1 = pd.read_sas(path)

ValueError: Unexpected non-zero end_of_first_byte Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data' Traceback (most recent call last): File "pandas\io\sas\sas.pyx", line 31, in pandas.io.sas._sas.rle_decompress

I put fillna function and show the same error :/

df = pd.DataFrame.fillna((pd.read_sas(path)), value="")

I tried with sas7bdat module in Python, but I've got the same error. Then I tried with sas7bdat_converter module. But CSV has the same values in Date column, so problem with dtype will arrive after convert csv to DataFrame.

Have you got any sugestions? I've spent 2 days tried to figure it out, but without any positive results :/


Solution

  • Regarding the read_sas error, a Git issue has been reported but closed for lack of reproducible example. However, I can easily import SAS data files with Pandas using .sas7bdat files generated from SAS 9.4 base (possibly the v7.1 Enterprise is the issue).

    However, consider using parse_dates argument of read_csv as it can convert your date DDMMMYY format to datetime during import. No change needed with your SAS exported dataset.

    sas_df = pd.read_csv(r"path\to\export\file.csv", parse_dates = ['DATE_COLUMN'])