python apache-spark pyspark apache-spark-ml

ERROR '_ImageSchema' object has no attribute 'readImages'


I am trying to load images from a folder in PySpark:

from pyspark.ml.image import ImageSchema
from pyspark.sql.functions import lit

zero_df = ImageSchema.readImages('../Transfer-Learning-PySpark/images/o').withColumn("label",lit(0))

This throws the following error:

     AttributeError                            Traceback (most recent call last)
     <ipython-input-9-29c9b120f9c2> in <module>
           2 from pyspark.sql.functions import lit
           3 
     ----> 4 zero_df = ImageSchema.readImages('../Transfer-Learning-PySpark/images/o').withColumn("label",lit(0))

     AttributeError: '_ImageSchema' object has no attribute 'readImages'

Environment: Python 3.8, Spark v3.0.2


Solution

  • Since Spark 2.4, images can be loaded directly with a DataFrameReader using the format image:

    zero_df = spark.read.format("image").load(<path to files>)
    

    More details can be found in the image data source section of the Spark MLlib data sources documentation.

    ImageSchema.readImages has been deprecated since Spark 2.4, and the method was removed in Spark 3.0.0, which is why the call fails on Spark v3.0.2.