What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?
I have tried the "spark.read.options" and "spark.read.csv" commands, but with no luck.
Thanks.
Regards, Jit
You can read the TSV file directly, without providing an external schema, as long as it has a header row:
df = spark.read.csv(path, sep=r'\t', header=True).select('col1', 'col2')
Since Spark is lazily evaluated, it will only read the selected columns. Hope it helps.