Tags: dataframe, pyspark, aggregate, col

pyspark dataframe error: _() missing 1 required positional argument: 'col'


Does anyone know what is causing the error below?

Code:

from pyspark.sql import DataFrame
import pyspark.sql.functions as f

def writeDataToDBFS(data: DataFrame, dirName: str):
  # dir is the base DBFS path (defined elsewhere in the notebook)
  data.write.format("delta").mode("overwrite").save(dir + "/" + dirName)

ls = ["workspaceId"]
newls_2a = ls.copy()
newls_2a.append("date")
newls_2b = ls.copy()
Df = (DataDf.select(*(newls_2a))
            .distinct()
            .groupBy(*(ls)).agg(f.count().alias("numCnt")))
writeDataToDBFS(Df, "Df")

error:

_() missing 1 required positional argument: 'col'

Solution

  • f.count() requires a column to count over, but here it was called with no argument, hence the missing positional argument 'col'. (In older PySpark versions, functions like count are generated wrappers internally named _, which is why the traceback says _() rather than count().) Passing the column name solved it:

    Df = (DataDf.select(*(newls_2a))
            .distinct()
            .groupBy(*(ls)).agg(f.count("date").alias("numCnt")))
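
For reference, here is a minimal standalone sketch of the same fix (the SparkSession setup and sample rows are illustrative, not from the original post). Note that f.count("date") counts only the rows where date is non-null; to count every row in a group, f.count("*") works as well:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as f

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data standing in for DataDf
    DataDf = spark.createDataFrame(
        [("w1", "2021-01-01"), ("w1", "2021-01-02"), ("w2", None)],
        ["workspaceId", "date"],
    )

    # f.count() with no argument raises:
    #   TypeError: _() missing 1 required positional argument: 'col'
    # count("date") counts the distinct non-null dates per workspace
    Df = (DataDf.select("workspaceId", "date")
                .distinct()
                .groupBy("workspaceId")
                .agg(f.count("date").alias("numCnt")))
    Df.show()

    # count("*") counts all rows in each group, including NULL dates
    DataDf.groupBy("workspaceId").agg(f.count("*").alias("rowCnt")).show()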