I have a PySpark DataFrame with a column of strings. I can already check whether each value in that column is numeric, but now I want to count how many TRUE values end up in the resulting Value column.
values = [('25q36',),('75647',),('13864',),('8758K',),('07645',)]
df = sqlContext.createDataFrame(values,['ID',])
df.show()
+-----+
|   ID|
+-----+
|25q36|
|75647|
|13864|
|8758K|
|07645|
+-----+
I applied the following:
from pyspark.sql import functions as F
df.select(
    "ID",
    F.col("ID").cast("int").isNotNull().alias("Value")
).show()
+-----+-----+
|   ID|Value|
+-----+-----+
|25q36|false|
|75647| true|
|13864| true|
|8758K|false|
|07645| true|
+-----+-----+
But now I want to know how many TRUE and FALSE values there are in that column.
Try something like this:
df.groupBy('Value').count().show()
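For that groupBy to work, the result of the select has to be kept as a DataFrame that actually has the Value column. A minimal sketch putting the pieces together (the intermediate name flagged is my own), plus an aggregate that returns only the number of TRUE rows:
from pyspark.sql import functions as F

# Keep the boolean flag on a new DataFrame (the name `flagged` is an assumption, not from the original)
flagged = df.select(
    "ID",
    F.col("ID").cast("int").isNotNull().alias("Value")
)

# One row per boolean value with its count
flagged.groupBy("Value").count().show()

# Or get only the number of TRUE rows: cast the boolean to int (true -> 1, false -> 0) and sum
flagged.agg(F.sum(F.col("Value").cast("int")).alias("num_true")).show()
The groupBy version gives you both counts in one pass; the agg version is handy if you only need the TRUE total as a single number.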