I am new to PySpark (and Spark, for that matter). I converted a Python list to an RDD:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

name_list_json = ['{"name": "k"}', '{"name": "b"}', '{"name": "c"}']
name_list_rdd = spark.sparkContext.parallelize(name_list_json)
print(name_list_rdd)
This prints "ParallelCollectionRDD[2] at readRDDFromFile at PythonRDD.scala:262". Two questions here:
What does the 2 in ParallelCollectionRDD[2] mean? Is that the number of partitions?
Also, why does readRDDFromFile show up here? Is that because the Python list is saved to a file and then loaded back from that file?
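For reference, here is a small sketch I can run to probe the two candidates directly (assuming id() and getNumPartitions() are the right RDD accessors for this):

# Probe what the bracketed number could correspond to.
print(name_list_rdd.id())                # unique ID of this RDD within its SparkContext
print(name_list_rdd.getNumPartitions())  # number of partitions the list was split into

Comparing these two values against the printed repr should show which one the bracketed number tracks, but I would still like to understand what is actually happening under the hood.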