I have a folder containing .txt and .csv files (with exactly the same column names).
However, when I try to read only the CSV files in PySpark with the code below, it reads and appends both the text and the CSV files together:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Reader").getOrCreate()
csv_path = "path/to/csv/folder"
df = spark.read \
.format("csv") \
.option("header", "true") \
.option("inferSchema", "true") \
.load(csv_path)
You can set the pathGlobFilter option to a glob pattern so that only files whose names match *.csv are read:
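# Same options as in the question, plus pathGlobFilter so the .txt files are skipped
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").option("pathGlobFilter", "*.csv").load(csv_path)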
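To get a feel for which file names a pattern like *.csv keeps, here is a small sketch using Python's fnmatch.fnmatchcase on some hypothetical file names. As far as I know, Spark applies pathGlobFilter with Hadoop's glob matching; fnmatchcase only approximates it (Hadoop globs also support {a,b} alternation, for example), and the sketch assumes the matching is case-sensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, for illustration only
files = ["sales_2023.csv", "sales_2023.txt", "notes.txt", "SALES_2024.CSV", "sales.csv.bak"]

def keep_matching(names, pattern="*.csv"):
    """Return the names that match the glob pattern (case-sensitive, like fnmatchcase)."""
    return [name for name in names if fnmatchcase(name, pattern)]

selected = keep_matching(files)
print(selected)  # ['sales_2023.csv']
```

Note that SALES_2024.CSV and sales.csv.bak are both dropped. If your folder might contain upper-case extensions, a character-class pattern such as *.[cC][sS][vV] should cover both spellings.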
Hope this helps. I found that option here: https://dbmstutorials.com/pyspark/spark-read-write-dataframe-options.html
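A different approach that avoids pathGlobFilter is to build the list of CSV files yourself and pass that list to the reader, since spark.read.csv accepts a list of paths as well as a single path. The sketch below assumes the folder is on the local filesystem (pathlib cannot list HDFS or S3 paths). The helper name list_csv_files is hypothetical, and the demo uses a throwaway temporary folder.

```python
import tempfile
from pathlib import Path

def list_csv_files(folder):
    """Return sorted paths of regular files directly inside `folder` ending in .csv."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == ".csv"
    )

# Demo on a temporary folder with hypothetical file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.csv", "b.txt", "c.csv"]:
        (Path(tmp) / name).write_text("id,name\n1,x\n")
    found = [Path(p).name for p in list_csv_files(tmp)]

print(found)  # ['a.csv', 'c.csv']

# With a SparkSession available, the list can be handed to the reader:
# df = spark.read.csv(list_csv_files(csv_path), header=True, inferSchema=True)
```

For non-local storage, putting the glob directly in the load path (for example csv_path + "/*.csv") should have a similar effect to pathGlobFilter.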