What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?
I have tried "spark.read.options" and "spark.read.csv", but with no luck.
Regards, Jit
If the file has a header row, you can read the TSV directly without supplying an external schema:
df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')
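Since Spark is lazily evaluated, nothing is read until an action runs, and only the selected columns end up in the DataFrame. The file itself is still scanned in full, because TSV is a row-based text format, but recent Spark versions can skip parsing the columns you did not select. Note that without inferSchema=True or an explicit schema, every column is read as a string. Hope it helps.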