python, csv, pandas, io, stata

Most efficient I/O setup between Stata and Python (Pandas)


I am using Stata to process some data, export it to a CSV file, and load it in Python with the pandas read_csv function.

The problem is that everything is very slow. Exporting from Stata to a CSV file takes ages (exporting in Stata's own .dta format is much faster), and loading the data with read_csv is also very slow. Using the pandas read_stata function is even worse.
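If you stay with CSV for now, one common way to make read_csv cheaper is to read only the columns you need and declare their types up front, so pandas does not have to guess each column's type and can store values in narrower types (this mainly reduces memory use; any speed gain should be measured on the real file). The sketch below assumes a hypothetical schema: the column names and dtypes in DTYPES are made up and must be replaced with those of your actual export.

```python
import io

import pandas as pd

# Hypothetical schema: replace with the real columns and types of the Stata export.
DTYPES = {"id": "int32", "year": "int16", "income": "float32", "region": "category"}


def load_csv(source, usecols=None):
    """Read a Stata-exported CSV with an explicit column subset and dtypes."""
    cols = usecols or list(DTYPES)
    return pd.read_csv(
        source,
        usecols=cols,                          # skip columns that are never used
        dtype={c: DTYPES[c] for c in cols},    # no per-column type guessing
    )


# Tiny in-memory stand-in for the real 6-7 GB file; "notes" is dropped by usecols.
sample = io.StringIO(
    "id,year,income,region,notes\n"
    "1,2001,35000.5,north,a\n"
    "2,2002,41000.0,south,b\n"
)
df = load_csv(sample)
print(df.dtypes)
```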

I wonder whether there are any other options, such as exporting to a format other than CSV. My CSV dataset is approximately 6-7 GB.

Any help appreciated

Thanks


Solution

  • pd.read_stata() and DataFrame.to_stata() are fairly efficient; see here
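A minimal sketch of reading a .dta file with pd.read_stata, assuming the file is too large to load comfortably in one go: passing chunksize makes read_stata return a reader that yields DataFrames of that many rows, and columns keeps only the listed variables. The helper name read_dta_in_chunks, the 100_000 default, and the small example frame are illustrative assumptions; in practice the .dta file would come from Stata's own save command rather than from to_stata.

```python
import os
import tempfile

import pandas as pd


def read_dta_in_chunks(path, columns=None, chunksize=100_000):
    """Yield a .dta file as DataFrame chunks (hypothetical helper).

    `columns` restricts which variables are kept; `chunksize` is the number
    of rows per chunk (100_000 is an arbitrary default).
    """
    # With chunksize set, read_stata returns a reader usable as a context manager.
    with pd.read_stata(path, columns=columns, chunksize=chunksize) as reader:
        for chunk in reader:
            yield chunk


# Round trip on a small made-up frame: write a .dta file, then read it back
# in chunks of 100 rows, keeping only two of the three variables.
df = pd.DataFrame({
    "id": range(250),
    "income": [i * 1.5 for i in range(250)],
    "region": ["north", "south"] * 125,
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    df.to_stata(path, write_index=False)  # leave the pandas index out of the file
    chunks = list(read_dta_in_chunks(path, columns=["id", "income"], chunksize=100))

result = pd.concat(chunks, ignore_index=True)
print([len(c) for c in chunks])  # rows per chunk
```

Chunking only bounds memory if each chunk is processed (filtered, aggregated) before the next one is read; concatenating every chunk, as done here to check the round trip, rebuilds the full frame. Whether this beats read_csv on a given dataset is worth timing directly.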