python, pyspark, azure-databricks

Fetch the first row of one CSV as headers and the values from another CSV


I have two CSVs. The first CSV has only one row, which is the header. The second CSV has the values. I want to create a DataFrame that takes its headers from row 1 of CSV 1 and its values from all rows of CSV 2. Both CSVs have the same number of fields, from _c0 to _c1000 (about 1000 columns). Column types can differ from one column to another, but the column names and the number of columns are the same in both files. Below is an example snip. I am using Databricks (PySpark). Any help is appreciated.

[Image: example snip of the two CSV files]


Solution

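  • You can impose the schema that results from reading the first file when reading the second file. Spark applies a user-supplied schema to CSV columns by position, so the Nth column of the second file gets the Nth name from the header file (because the first file is read without inferSchema, every column will be a string):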

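    # Header-only file: column names come from its first row (all typed as string)
    df1 = spark.read.option('header', True).csv('<path to the file with header>')
    # Apply those names, by position, to the header-less data file
    df2 = spark.read.schema(df1.schema).csv('<path to the file without header>')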