Tags: java, apache-spark, apache-spark-sql, apache-spark-ml

How to read images in Apache Spark using the ImageSchema class in Java


I have a problem reading image files from HDFS with the ImageSchema class: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/image/ImageSchema.html I don't know how to get the image data or how to integrate it with the OpenCV library.

Thank you everyone


Solution

  • You can try the sample below, which reads image files using the ImageSchema.readImages method.

    import static org.apache.spark.sql.functions.col;
    import org.apache.spark.ml.image.ImageSchema;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
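    import org.apache.spark.sql.SparkSession;
    
    public class ReadImageExample {
        public static void main(String[] args) {
            // Local master for testing; drop .master(...) when submitting to a cluster.
            SparkSession spark = SparkSession.builder().appName("ReadImageExample").master("local").getOrCreate();
            // Arguments: path, session, recursive, numPartitions, dropImageFailures, sampleRatio, seed.
            // The path goes to Spark's file reader, so an hdfs:// URI should work in place of the local directory.
            Dataset<Row> ds = ImageSchema.readImages("C:\\temp", spark, false, 0, true, 1.0, 1);
            // Each row holds one "image" struct: origin, height, width, nChannels, mode, data.
            ds.printSchema();
            ds.select(col("image.width"), col("image.height"), col("image.mode")).show();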
        }
    }
    

    You will need the following dependencies.

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
    </dependencies>
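
As for getting at the image data itself: the `image.data` column is a byte array that Spark documents as OpenCV-compatible, that is, row-major with interleaved channels in BGR order (BGRA when there are four channels), and `image.mode` holds the matching OpenCV type code. The sketch below is plain Java with no Spark or OpenCV dependency and only illustrates that layout. The class name `ImageDataLayout`, its helper methods, and the 2x2 sample buffer are hypothetical stand-ins for the values you would read from a row.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark or OpenCV: it only illustrates
// how the bytes in image.data are laid out.
class ImageDataLayout {
    // OpenCV type codes that image.mode carries for 8-bit images.
    static final int CV_8UC1 = 0;   // 1 channel (grayscale)
    static final int CV_8UC3 = 16;  // 3 channels (BGR)
    static final int CV_8UC4 = 24;  // 4 channels (BGRA)

    /** Index of the given channel of pixel (x, y) in a row-major, interleaved buffer. */
    static int offset(int x, int y, int channel, int width, int nChannels) {
        return (y * width + x) * nChannels + channel;
    }

    /** Returns {red, green, blue} in 0..255 for pixel (x, y) of a BGR or BGRA buffer. */
    static int[] rgbAt(byte[] data, int x, int y, int width, int nChannels) {
        int base = offset(x, y, 0, width, nChannels);
        int blue = data[base] & 0xFF;       // mask because Java bytes are signed
        int green = data[base + 1] & 0xFF;
        int red = data[base + 2] & 0xFF;
        return new int[] {red, green, blue};
    }

    public static void main(String[] args) {
        // Assumed 2x2, 3-channel image standing in for the image.data bytes of one row.
        byte[] data = {
            (byte) 255, 0, 0,    0, (byte) 255, 0,   // row 0: blue pixel, green pixel
            0, 0, (byte) 255,    10, 20, 30          // row 1: red pixel, B=10 G=20 R=30
        };
        System.out.println(Arrays.toString(rgbAt(data, 1, 1, 2, 3))); // prints [30, 20, 10]
    }
}
```

Because the layout already matches OpenCV, the same array can in principle be copied into a Mat with the OpenCV Java bindings: create `new Mat(height, width, mode)` and call `mat.put(0, 0, data)` after loading the native library with `System.loadLibrary(Core.NATIVE_LIBRARY_NAME)`. That requires an OpenCV dependency and native library on every executor, which the dependency list above does not include.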