python pyspark boolean numeric

How to count how many TRUE or FALSE values are in the Value column


I have a PySpark DataFrame with a column of strings. I have already worked out whether each value in that column is numeric or not, but now I want to count how many TRUE values are in the Value column.

values = [('25q36',),('75647',),('13864',),('8758K',),('07645',)]
df = sqlContext.createDataFrame(values,['ID',])
df.show()
+-----+
|   ID|
+-----+
|25q36|
|75647|
|13864|
|8758K|
|07645|
+-----+

I applied the following:

from pyspark.sql import functions as F

df.select(
  "ID",
  F.col("ID").cast("int").isNotNull().alias("Value")
).show()

+-----+-----+
|   ID|Value|
+-----+-----+
|25q36|false|
|75647| true|
|13864| true|
|8758K|false|
|07645| true|
+-----+-----+

Now I want to know how many TRUE and FALSE values are in that column.


Solution

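  • Group by the boolean column and count the rows in each group. Note that groupBy has to be called on a DataFrame that actually contains the Value column, not on the original df, which only has ID. Here flagged is just an illustrative name for that DataFrame:

    flagged = df.withColumn('Value', F.col('ID').cast('int').isNotNull())
    flagged.groupBy('Value').count().show()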