
scala.math.BigDecimal : 1.2 and 1.20 are equal


How can I keep the precision and the trailing zero when converting a Double or a String to scala.math.BigDecimal?

Use case: in a JSON message, an attribute is of type String and has a value of "1.20". But when I read this attribute in Scala and convert it to a BigDecimal, I lose the precision and it is converted to 1.2.
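To see where the trailing zero goes, here is a minimal sketch that uses only the standard library (the value names are illustrative, not taken from the question). It assumes the zero is lost because the value passes through a Double at some point. Building the BigDecimal directly from the String keeps the scale, while == still treats 1.2 and 1.20 as equal:

```scala
// Illustrative names; only scala.math.BigDecimal from the standard library is used.
val fromString = BigDecimal("1.20") // built from the decimal string, the scale is kept
val fromDouble = BigDecimal(1.20)   // built from a Double, which has no notion of trailing zeros

println(fromString)               // 1.20
println(fromString.scale)         // 2
println(fromDouble)               // 1.2
println(fromString == fromDouble) // true, Scala compares the numeric values
// The underlying java.math.BigDecimal also takes the scale into account in equals
println(fromString.bigDecimal.equals(fromDouble.bigDecimal)) // false
```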



Solution

  • @Saurabh What a nice question! It is crucial that you shared the use case!

    I think my answer solves it in the safest and most efficient way. In short:

    Use jsoniter-scala for parsing BigDecimal values precisely.

    Encoding to and decoding from JSON strings can be defined for any numeric type on a per-codec or per-class-field basis. Please see the code below:

    Add the dependencies to your build.sbt:

    libraryDependencies ++= Seq(
      "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core"   % "2.17.4",
      "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "2.17.4" % Provided // required only at compile time
    )
    

    Define data structures, derive a codec for the root structure, parse the response body and serialize it back:

    import com.github.plokhotnyuk.jsoniter_scala.core._
    import com.github.plokhotnyuk.jsoniter_scala.macros._
    
    case class Response(
      amount: BigDecimal,
      @stringified price: BigDecimal)
        
    implicit val codec: JsonValueCodec[Response] = JsonCodecMaker.make {
      CodecMakerConfig
        .withIsStringified(true) // switch it on to stringify all numeric and boolean values in this codec
        .withBigDecimalPrecision(34) // set a precision to round up to decimal128 format: java.math.MathContext.DECIMAL128.getPrecision
        .withBigDecimalScaleLimit(6178) // limit scale to fit the decimal128 format: BigDecimal("0." + "0" * 33 + "1e-6143", java.math.MathContext.DECIMAL128).scale + 1
        .withBigDecimalDigitsLimit(308) // limit a number of mantissa digits to be parsed before rounding with the specified precision
    }
      
    val response = readFromArray[Response]("""{"amount":1000,"price":"1.20"}""".getBytes("UTF-8"))
    val json = writeToArray(Response(amount = BigDecimal(1000), price = BigDecimal("1.20")))
    

    Print results to the console and verify them:

    println(response)
    println(new String(json, "UTF-8"))
    
    Response(1000,1.20)
    {"amount":1000,"price":"1.20"}   
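    The limits passed to CodecMakerConfig above are tied to the decimal128 format. As a quick sanity check, the expressions quoted in those code comments can be evaluated with the standard library alone (no jsoniter-scala needed). The value names below are illustrative:

```scala
import java.math.MathContext

// Precision of the decimal128 format, the value passed to withBigDecimalPrecision
val precision = MathContext.DECIMAL128.getPrecision
// Scale of a tiny value in the decimal128 range plus one, the value passed to withBigDecimalScaleLimit
val scaleLimit = BigDecimal("0." + "0" * 33 + "1e-6143", MathContext.DECIMAL128).scale + 1

println(precision)  // 34
println(scaleLimit) // 6178
```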
    

    Why is the proposed approach safe?

    Well... Parsing JSON is a minefield, especially when you need precise BigDecimal values afterwards. Most JSON parsers for Scala rely on Java's BigDecimal constructor for string representations, which has O(n^2) complexity (where n is the number of digits in the mantissa), and they do not round results to a safe MathContext (by default, Scala's BigDecimal constructors and operations use MathContext.DECIMAL128).

    This makes systems that accept untrusted input vulnerable to low-bandwidth DoS/DoW attacks. Below is a simple example of how it can be reproduced in the Scala REPL with the latest version of the most popular JSON parser for Scala on the classpath:

    ...
    Starting scala interpreter...
    Welcome to Scala 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222).
    Type in expressions for evaluation. Or try :help.
    
    scala> def timed[A](f: => A): A = { val t = System.currentTimeMillis; val r = f; println(s"Elapsed time (ms): ${System.currentTimeMillis - t}"); r } 
    timed: [A](f: => A)A
    
    scala> timed(io.circe.parser.decode[BigDecimal]("9" * 1000000))
    Elapsed time (ms): 29192
    res0: Either[io.circe.Error,BigDecimal] = Right(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999...
    
    scala> timed(io.circe.parser.decode[BigDecimal]("1e-100000000").right.get + 1)
    Elapsed time (ms): 87185
    res1: scala.math.BigDecimal = 1.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...
    

    On contemporary 1 Gbit networks, the 10 ms spent receiving a malicious message with a 1M-digit number can produce 29 seconds of 100% CPU load on a single core. More than 256 cores can be effectively DoS-ed at the full bandwidth rate. The last expression demonstrates how to burn a CPU core for ~1.5 minutes using a message with a 13-byte number if subsequent + or - operations are used with Scala 2.12.8.
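    A back-of-the-envelope version of this estimate, as a sketch: it assumes one byte per digit, ignores protocol overhead, and reuses the 29192 ms reported in the REPL session above. The value names are illustrative:

```scala
val digits = 1000000L           // one byte per digit in the JSON text (assumption)
val bitsPerSecond = 1000000000L // 1 Gbit/s link
val parseMillis = 29192L        // elapsed time reported in the REPL session above

// Time on the wire for one message, in microseconds
val transferMicros = digits * 8 * 1000000L / bitsPerSecond
// Number of cores one link can keep busy when such messages arrive back to back
val busyCores = parseMillis * 1000 / transferMicros

println(transferMicros) // 8000
println(busyCores)      // 3649
```

    Under these assumptions the result is well above 256 cores, so that figure is a conservative bound.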

    jsoniter-scala takes care of all these cases for Scala 2.11.x, 2.12.x, 2.13.x, and 3.x.
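    The rounding safeguard mentioned earlier can be illustrated without any JSON library. This is only a sketch of what rounding to MathContext.DECIMAL128 does to a long mantissa using java.math.BigDecimal, not jsoniter-scala's parsing code. The value names are illustrative:

```scala
import java.math.{BigDecimal => JBigDecimal, MathContext}

val digits = "1." + "3" * 50                                  // a 51-digit mantissa
val exact = new JBigDecimal(digits)                           // keeps every digit
val rounded = new JBigDecimal(digits, MathContext.DECIMAL128) // rounds to 34 significant digits

println(exact.precision)   // 51
println(rounded.precision) // 34
```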

    Why is it the most efficient?

    Below are charts with throughput (operations per second, so greater is better) results of JSON parsers for Scala on different JVMs when parsing an array of 128 small (up to 34-digit mantissas) BigDecimal values and a medium (128-digit mantissa) BigDecimal value, respectively:

    [Chart: throughput when parsing an array of 128 small BigDecimal values]

    [Chart: throughput when parsing a medium BigDecimal value]

    The parsing routine for BigDecimal in jsoniter-scala:

    • uses BigDecimal values with compact representation for small numbers up to 36 digits

    • uses more efficient hot-loops for medium numbers that have from 37 to 284 digits

    • switches to the recursive algorithm which has O(n^1.5) complexity for values that have more than 285 digits
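    To give a feel for the last bullet, here is a simplified sketch of a recursive (divide-and-conquer) digit parser. It is not jsoniter-scala's implementation: the object name, the method name, and the 18-digit cutoff are assumptions made for illustration, and it handles only a non-empty string of ASCII decimal digits. The idea is to split the digits in halves, parse each half recursively, and combine the halves with one big multiplication, so that most of the work lands in BigInteger multiplication, which recent JDKs implement with sub-quadratic algorithms for large operands:

```scala
import java.math.BigInteger

object DigitParsing {
  // Hypothetical helper: parses a non-empty string of ASCII decimal digits.
  def parseDigits(s: String): BigInteger =
    if (s.length <= 18) BigInteger.valueOf(s.toLong) // up to 18 digits always fit into a Long
    else {
      val mid = s.length / 2
      val lowLen = s.length - mid                 // number of digits in the lower half
      val high = parseDigits(s.substring(0, mid)) // most significant digits
      val low = parseDigits(s.substring(mid))     // least significant digits
      // value = high * 10^lowLen + low
      high.multiply(BigInteger.TEN.pow(lowLen)).add(low)
    }
}
```

    For example, DigitParsing.parseDigits("9" * 1000) yields the same value as new BigInteger("9" * 1000).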

    Moreover, jsoniter-scala parses and serializes JSON directly from UTF-8 bytes to your data structures and back, and does it crazily fast without using run-time reflection, intermediate ASTs, strings, or hash maps, and with minimal allocations and copying. Please see the results of 115 benchmarks for different data types and real-life message samples for GeoJSON, Google Maps API, OpenRTB, and Twitter API.