
Reading file contents with Casbah GridFS throws MalformedInputException


Consider the following sample code: it writes a file to MongoDB and then tries to read it back.

import com.mongodb.casbah.Imports._
import com.mongodb.casbah.gridfs.Imports._

object TestGridFS{
    def main(args: Array[String]){
        val mongoConn = MongoConnection()
        val mongoDB = mongoConn("gridfs_test")
        val gridfs = GridFS(mongoDB) // creates a GridFS handle on ``fs``
        val xls = new java.io.FileInputStream("ok.xls")
        val savedFile=gridfs.createFile(xls)
        savedFile.filename="ok.xls"
        savedFile.save
        println("savedfile id: %s".format(savedFile._id.get))
        val file=gridfs.findOne(savedFile._id.get)
        // this is the line that throws (test.scala:15 in the stack trace below)
        val bytes=file.get.source.map(_.toByte).toArray
        println(bytes)
    }
}

This yields:

gridfs $ sbt run
[info] Loading global plugins from /Users/jean/.sbt/plugins
[info] Set current project to gridfs-test (in build file:/Users/jean/dev/sdev/src/perso/gridfs/)
[info] Running TestGridFS 
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
savedfile id: 504c8cce0364a7cd145d5dc1
[error] (run-main) java.nio.charset.MalformedInputException: Input length = 1
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at java.io.BufferedReader.fill(BufferedReader.java:136)
    at java.io.BufferedReader.read(BufferedReader.java:157)
    at scala.io.BufferedSource$$anonfun$iter$1$$anonfun$apply$mcI$sp$1.apply$mcI$sp(BufferedSource.scala:38)
    at scala.io.Codec.wrap(Codec.scala:64)
    at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
    at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
    at scala.collection.Iterator$$anon$14.next(Iterator.scala:148)
    at scala.collection.Iterator$$anon$25.hasNext(Iterator.scala:463)
    at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:334)
    at scala.io.Source.hasNext(Source.scala:238)
    at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:334)
    at scala.collection.Iterator$class.foreach(Iterator.scala:660)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:333)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:99)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
    at scala.collection.Iterator$$anon$19.toBuffer(Iterator.scala:333)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
    at scala.collection.Iterator$$anon$19.toArray(Iterator.scala:333)
    at TestGridFS$.main(test.scala:15)
    at TestGridFS.main(test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[error] {file:/Users/jean/dev/sdev/src/perso/gridfs/}default-b6ab90/compile:run: Nonzero exit code: 1
[error] Total time: 1 s, completed 9 sept. 2012 14:34:22

I don't understand what the charset problem can be; I just wrote the file to the database. When querying the database I do see the files and chunks in there, but I can't seem to read them back.

I tried this with MongoDB 2.0 and 2.2, and Casbah 2.4 and 3.0.0-M2, to no avail, and I don't see what I could do to get the bytes. This is on Mac OS X Mountain Lion.
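
For what it's worth, here is roughly how I check that the files and chunks really are there (fs.files and fs.chunks are the default GridFS collections; this snippet is only a sanity check run e.g. from the sbt console, not part of the failing test):

import com.mongodb.casbah.Imports._

// sanity check: GridFS keeps file metadata in fs.files and the binary
// data in fs.chunks by default
val mongoDB = MongoConnection()("gridfs_test")
println("files stored:  " + mongoDB("fs.files").find().size)
println("chunks stored: " + mongoDB("fs.chunks").find().size)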

PS: To run the test, you can use the following build.sbt

name := "gridfs-test"

version := "1.0"

scalaVersion := "2.9.1"

libraryDependencies += "org.mongodb" %% "casbah" % "2.4.1"

libraryDependencies += "org.mongodb" %% "casbah-gridfs" % "2.4.1"

resolvers ++= Seq("Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/",
      "sonatype release" at "https://oss.sonatype.org/content/repositories/releases",
      "OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/")

Solution

  • I found a way to read the file contents back from MongoDB. The source method relies on underlying.inputStream, which is defined in GridFSDBFile. (Note that the InputStreamReader/StreamDecoder frames in the stack trace show the bytes being run through a charset decoder on the way out, which the binary content of an .xls file does not survive.)

    Every test I did that uses underlying.inputStream failed with the same error. However, the API offers another way to access the files: writeTo. writeTo does not use underlying.inputStream.

    Here is the "fixed" code from the question :

    import com.mongodb.casbah.Imports._
    import com.mongodb.casbah.gridfs.Imports._
    
    object TestGridFS{
        def main(args: Array[String]){
            val mongoConn = MongoConnection()
            val mongoDB = mongoConn("gridfs_test")
            val gridfs = GridFS(mongoDB) // creates a GridFS handle on ``fs``
            val xls = new java.io.File("ok.xls") // note: a java.io.File, where the question used a FileInputStream
            val savedFile=gridfs.createFile(xls)
            savedFile.filename="ok.xls"     
            savedFile.save
            println("savedfile id: %s".format(savedFile._id.get))
            val file=gridfs.findOne(savedFile._id.get)
            val byteArrayOutputStream = new java.io.ByteArrayOutputStream()
            // writeTo copies the raw GridFS chunks into the output stream,
            // with no charset decoding involved
            file.map(_.writeTo(byteArrayOutputStream))
            byteArrayOutputStream.toByteArray
        }
    }
    

    The last line, byteArrayOutputStream.toByteArray, gives you an array of bytes which can then be used however you see fit.
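
    As a quick round-trip check (my own addition; the output file name ok-copy.xls is made up), those bytes can be written straight back to disk and compared with the original ok.xls:

    import java.io.FileOutputStream

    // continues from the end of main above: dump the bytes recovered via
    // writeTo into a new file so it can be diffed against the original
    val bytes = byteArrayOutputStream.toByteArray
    val out = new FileOutputStream("ok-copy.xls")
    try out.write(bytes) finally out.close()
    println("read %d bytes back from GridFS".format(bytes.length))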