
Pyspark DataFrame Filtering


I have a dataframe as follows:

|Property ID|Location|Price|Bedrooms|Bathrooms|Size|Price SQ Ft|Status|

When I filter on Bedrooms or Bathrooms, it gives the correct answer:

df = spark.read.csv('/FileStore/tables/realestate.txt', header=True, inferSchema=True, sep='|')
df.filter(df.Bedrooms==2).show()

But when I filter on Property ID as df.filter(df.Property ID==1532201).show(), I get an error. Is it because there is a space between Property and ID?


Solution

  • Yes, dot notation breaks on column names that contain a space. You can use square bracket notation to select the column instead:

    df.filter(df['Property ID'] == 1532201).show()
    

    Or use a raw SQL expression string to filter (note the backticks around the column name):

    df.filter('`Property ID` = 1532201').show()
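
    A third option is pyspark.sql.functions.col, which accepts any column name as a string, spaces included. Below is a minimal, self-contained sketch using a tiny stand-in DataFrame (the sample rows and values are invented for illustration, since the original realestate.txt data is not shown):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.master("local[1]").appName("filter-demo").getOrCreate()

    # Hypothetical stand-in for the real estate data; note the space in "Property ID".
    df = spark.createDataFrame(
        [(1532201, "Downtown", 2), (1532202, "Suburb", 3)],
        ["Property ID", "Location", "Bedrooms"],
    )

    # col() takes the column name as a string, so spaces are not a problem.
    result = df.filter(col("Property ID") == 1532201).collect()
    ```

    All three forms (df['Property ID'], the backtick SQL string, and col("Property ID")) produce the same filter; col() is often preferred in longer chains because it does not require a reference to the DataFrame variable.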