
Pyspark DataFrame Filtering


I have a dataframe as follows:

|Property ID|Location|Price|Bedrooms|Bathrooms|Size|Price SQ Ft|Status|

When I filter on Bedrooms or Bathrooms, it gives the correct answer:

df = spark.read.csv('/FileStore/tables/realestate.txt', header=True, inferSchema=True, sep='|')
df.filter(df.Bedrooms==2).show()

But when I filter on Property ID as df.filter(df.Property ID==1532201).show(), I get an error. Is it because there is a space between Property and ID?


Solution

  • Yes, dot notation breaks on column names that contain a space. You can use square bracket notation to select the column instead:

    df.filter(df['Property ID'] == 1532201).show()
    

    Or use a raw SQL expression string to filter (note the backticks around the column name):

    df.filter('`Property ID` = 1532201').show()
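
    A third option is pyspark.sql.functions.col, which accepts any column name as a string, spaces included. Below is a minimal, self-contained sketch using a tiny stand-in DataFrame (the sample rows and values are invented for illustration, since the original realestate.txt data is not shown):

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.master("local[1]").appName("filter-demo").getOrCreate()

    # Hypothetical stand-in for the real estate data; note the space in "Property ID".
    df = spark.createDataFrame(
        [(1532201, "Downtown", 2), (1532202, "Suburb", 3)],
        ["Property ID", "Location", "Bedrooms"],
    )

    # col() takes the column name as a string, so spaces are not a problem.
    result = df.filter(col("Property ID") == 1532201).collect()
    ```

    All three forms (df['Property ID'], the backtick SQL string, and col("Property ID")) produce the same filter; col() is often preferred in longer chains because it does not require a reference to the DataFrame variable.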