pyspark apache-spark-sql databricks azure-databricks aws-databricks

Use of '\' in reading dataframe

# File location and type
file_location = "/FileStore/tables/FileName.csv"
file_type = "csv"

# CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

This is generic code to read data from a CSV file. In this code, what is the use of .option("inferSchema", infer_schema), and what does "\" do?

Solution

A backslash '\' at the end of a line tells Python that the statement continues on the next line, so the code after the backslash is treated as part of the same line. This is mostly done for long statements that would not fit comfortably on a single line.
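To make the line-continuation rule concrete, here is a minimal sketch in plain Python (no Spark needed). The string `raw` and the variable names are made up for illustration. It chains method calls the same way the reader code does, once with backslashes and once with parentheses, which is a common alternative.

```python
# Hypothetical input string, used only to have something to chain calls on.
raw = "  id,name,price  "

# Backslash continuation: each "\" must be the very last character on its
# line, and the three physical lines are parsed as a single statement.
cols_backslash = raw.strip() \
    .upper() \
    .split(",")

# Same statement wrapped in parentheses: line breaks inside brackets are
# ignored by Python, so no backslashes are needed.
cols_parens = (
    raw.strip()
    .upper()
    .split(",")
)

print(cols_backslash)                 # ['ID', 'NAME', 'PRICE']
print(cols_backslash == cols_parens)  # True
```

Both forms produce the same result. The parenthesized form is often preferred because a stray space or comment after a backslash is a syntax error.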

inferSchema is used to infer the data types of the columns in the DataFrame. If inferSchema is set to true, Spark makes an extra pass over the data while loading it in order to work out each column's type. If it is left at its default of false, every column is read as a string.

.option is the function used to set a parameter for reading a file. Many parameters can be set this way, such as header, inferSchema and sep. An explicit schema is supplied separately, through .schema().

pyspark.sql.DataFrameReader.csv

You can refer to the above link for further help.