
How to write JSON string to parquet, avro file in scala without spark


I want to write a simple JSON string to the Parquet and Avro file formats in Scala without the Spark framework.

My JSON string looks like this:

    {"emp_id":"123","emp_name":"Mike","emp_status":"true"}

I did not find any solution for this. Is it possible to write Parquet and Avro files from a simple JSON string in Scala without the Spark framework?


Solution

  • Here is an example using the parquet4s library:

    build.sbt

    ThisBuild / version := "0.1.0-SNAPSHOT"
    
    ThisBuild / scalaVersion := "2.13.8"
    
    lazy val root = (project in file("."))
      .settings(
        name := "parquet",
        libraryDependencies ++= Seq(
          "com.github.mjakubowski84" %% "parquet4s-core" % "2.1.0",
          "org.apache.hadoop" % "hadoop-client" % "3.3.1"
        ),
      )
    
    

    Test.scala

    import com.github.mjakubowski84.parquet4s.{ ParquetReader, ParquetWriter, Path }
    
    // Case class describing the fields of the JSON record
    case class Emp(emp_id: String, emp_name: String, emp_status: String)
    
    object Test {
      def main(args: Array[String]): Unit = {
        val emps = Seq(
          Emp("123", "Mike", "true")
        )
    
        val path = Path("emp1.parquet")
    
        // Write the records to a Parquet file and close the writer
        ParquetWriter.of[Emp].writeAndClose(path, emps)
    
        // Read the file back to verify the contents
        val parquetIterable = ParquetReader.as[Emp].read(path)
        try {
          parquetIterable.foreach(println)
        } finally parquetIterable.close()
      }
    }
    

    And the output:

    Emp(123,Mike,true)
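    The example above hard-codes the `Emp` value rather than starting from the JSON string in the question. One way to bridge that gap (an assumption, not part of the original answer) is to parse the string with a small JSON library such as ujson (`"com.lihaoyi" %% "ujson"` in `libraryDependencies`) and map its fields onto the case class before writing; the file name `emp2.parquet` is arbitrary:

```scala
import com.github.mjakubowski84.parquet4s.{ ParquetWriter, Path }

// Same case class as in the Parquet example above
case class Emp(emp_id: String, emp_name: String, emp_status: String)

object JsonToParquet {
  def main(args: Array[String]): Unit = {
    val json = """{"emp_id":"123","emp_name":"Mike","emp_status":"true"}"""

    // Parse the JSON string and pull out each field as a string
    val parsed = ujson.read(json)
    val emp = Emp(parsed("emp_id").str, parsed("emp_name").str, parsed("emp_status").str)

    // Write the parsed record to a Parquet file
    ParquetWriter.of[Emp].writeAndClose(Path("emp2.parquet"), Seq(emp))
  }
}
```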
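    For the Avro part of the question, the plain `org.apache.avro` Java library (`"org.apache.avro" % "avro"` in `libraryDependencies`) can do this without Spark: its JSON decoder turns the JSON string directly into a `GenericRecord`, which can then be written to a standard Avro container file. A sketch under those assumptions (the schema and the file name `emp1.avro` are chosen here to match the question's fields):

```scala
import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{ DataFileReader, DataFileWriter }
import org.apache.avro.generic.{ GenericDatumReader, GenericDatumWriter, GenericRecord }
import org.apache.avro.io.DecoderFactory

object AvroTest {
  def main(args: Array[String]): Unit = {
    // Avro schema matching the fields of the JSON string
    val schemaJson =
      """{"type":"record","name":"Emp","fields":[
        |  {"name":"emp_id","type":"string"},
        |  {"name":"emp_name","type":"string"},
        |  {"name":"emp_status","type":"string"}
        |]}""".stripMargin
    val schema = new Schema.Parser().parse(schemaJson)

    val json = """{"emp_id":"123","emp_name":"Mike","emp_status":"true"}"""

    // Decode the JSON string directly into a GenericRecord
    val decoder = DecoderFactory.get().jsonDecoder(schema, json)
    val record: GenericRecord =
      new GenericDatumReader[GenericRecord](schema).read(null, decoder)

    // Write the record to an Avro container file
    val file = new File("emp1.avro")
    val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
    writer.create(schema, file)
    writer.append(record)
    writer.close()

    // Read the file back to verify the contents
    val reader = new DataFileReader[GenericRecord](file, new GenericDatumReader[GenericRecord](schema))
    while (reader.hasNext) println(reader.next())
    reader.close()
  }
}
```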