I'm currently working in Data Science Experience (DSX) and would like to import a CSV file as a SparkSession DataFrame. The import itself succeeds, but every column comes back as string type. How can I make this DSX feature recognize the types present in the CSV file?
Currently, the generated code that creates the pyspark.sql.DataFrame looks like this:
df_data_1 = spark.read\
.format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
.option('header', 'true')\
.load('swift://container_name.' + name + '/test.csv')
df_data_1.take(5)
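You have to add the following option to the reader chain, before the .load(...) call; the schema will then be inferred. Note that option() takes the key and the value as two separate arguments:
.option('inferSchema', 'true')\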