PySpark: TypeError: 'str' object is not callable in dataframe operations

I am reading files from a folder in a loop and creating a dataframe from each one. However, I get the error TypeError: 'str' object is not callable. Here is the code:

for yr in range (2014,2018):
  cat_bank_yr = sqlCtx.read.csv(cat_bank_path+str(yr)+'_'+h1+'bank.csv000',sep='|',schema=schema)
  cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger"))))
  cat_bank_yr=cat_bank_yr.withColumn("category",trim(lower(col("category"))))

The code runs for one iteration and then stops at the line

cat_bank_yr=cat_bank_yr.withColumn("cat_ledger",trim(lower(col("cat_ledger"))))

with the above error.

Can anyone help out?

Solution

Your code looks fine. If the error really occurs on the line you indicate, you have probably overwritten one of the PySpark functions (col, trim, or lower) with a string somewhere, for example by assigning a string to a variable of the same name.
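To illustrate how this kind of shadowing produces exactly this error, here is a minimal sketch in plain Python, without Spark. The function lower below is a hypothetical stand-in for pyspark.sql.functions.lower, and the rebinding inside the loop is an assumption about what might be happening in code not shown in the question:

```python
def lower(value):
    # Hypothetical stand-in for pyspark.sql.functions.lower (not the real one).
    return value.lower()

outcomes = []
for yr in range(2014, 2016):
    try:
        # Works on the first pass, while `lower` is still a function.
        outcomes.append(lower("ABC"))
    except TypeError as exc:
        outcomes.append(str(exc))
    # Accidental rebinding (assumed): from here on, `lower` is a plain string.
    lower = "some_label_" + str(yr)

print(outcomes)
```

The first iteration succeeds and the second one fails, which matches the behaviour described in the question. Re-importing the function restores the name only until the next rebinding.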

To check this, put the following line directly above your for loop and see whether the code runs without an error now:

from pyspark.sql.functions import col, trim, lower

Alternatively, double-check that the code really stops on the line you mentioned, or check whether col, trim, and lower are what you expect them to be by evaluating each name in the interpreter, like this:

col

should return

<function pyspark.sql.functions._create_function.<locals>._(col)>
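The same check can be done for all three names at once. The helper below is only a sketch: find_shadowed is a hypothetical name, not part of PySpark, and the example dictionary stands in for a real namespace (in a notebook one would probably pass globals() instead):

```python
def find_shadowed(namespace, names=("col", "trim", "lower")):
    """Return {name: type name} for names bound to something that is not callable."""
    return {
        name: type(namespace[name]).__name__
        for name in names
        if name in namespace and not callable(namespace[name])
    }

# Hypothetical namespace: `lower` has been overwritten with a string.
example = {"col": len, "trim": str.strip, "lower": "oops"}
print(find_shadowed(example))
```

An empty result means none of the checked names is bound to a non-callable object, so the problem would lie elsewhere.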