Tags: apache-spark, pyspark

Why is rdd.getNumPartitions() triggering a job in Spark?


Why does rdd.getNumPartitions() trigger a job in the code below?

Please consider this code:

employee_df = spark.read.format('csv') \
    .option('header', 'true') \
    .load('/FileStore/tables/employee.csv')

print(employee_df.rdd.getNumPartitions())

Output: 1

At this stage, employee_df.rdd.getNumPartitions() did not trigger any job and simply printed the number of partitions as 1.

But if I repartition the data and run employee_df.rdd.getNumPartitions() again as follows:

employee_df = employee_df.repartition(2)
print(employee_df.rdd.getNumPartitions())

Output:

(1) Spark Jobs
Job 14 (Stages: 1/1)
Stage 21: 1/1 succeeded
2

I see that a job has been triggered. From what I have read, rdd.getNumPartitions() is not an action, so why does it trigger a job? Does it have something to do with the repartitioning?


Solution

  • The job you are seeing is not caused by getNumPartitions() but by repartition(). getNumPartitions() itself only reads metadata from the plan. However, repartition() is lazy: it adds a full shuffle to the execution plan, and that shuffle actually runs when the repartitioned DataFrame is first materialized, here at the point where you convert it to an RDD and ask for its partition count. That shuffle is the job you see in the UI.