pyspark, apache-spark-sql, data-science-experience

Importing a SparkSession DataFrame on DSX


I'm currently working on Data Science Experience (DSX) and would like to import a CSV file as a SparkSession DataFrame. The import succeeds, but every column is read as string type. How can I make this DSX feature recognize the column types present in the CSV file?


Solution

  • Currently, the generated code for the actual creation of the pyspark.sql.DataFrame looks like this:

    df_data_1 = spark.read\
      .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
      .option('header', 'true')\
      .load('swift://container_name.' + name + '/test.csv')
    df_data_1.take(5)
    

    You have to add the following option to the chain before the .load(...) call; the schema will then be inferred:

    .option('inferSchema', 'true')\