Tried to get the unique value from the BigQuery table using BeamSQL in Google Dataflow. Using Group By clause implemented the condition in BeamSQL (sample query below). One of the column has float data type. While executing the Job got below exceptions,
Caused by: org.apache.beam.sdk.coders.Coder$NonDeterministicException: org.apache.beam.sdk.coders.RowCoder@81d6d10 is not deterministic because: All fields must have deterministic encoding. Caused by: org.apache.beam.sdk.coders.Coder$NonDeterministicException: FloatCoder is not deterministic because: Floating point encodings are not guaranteed to be deterministic.
BeamSQL Query:
PCollection ST= mainColl.apply(SqlTransform.query("SELECT ID,ITEM,UNITPRICE FROM PCOLLECTION GROUP BY ID,ITEM,UNITPRICE"));
It would be good if some one help me to solve this issue.
Please note, if we remove the float column then BeamSQL query works fine.
This is indicating that you should not use floating point values (in this case probably UNITPRICE
value) in the aggregation (group by) scheme, because their output is non-deterministic (i.e. it can change based on the precision change). For instance, consider this example:
WITH
data AS (
SELECT 100 AS id, 'abc' as item, 0.3448473362800000001 AS unitprice
UNION ALL
SELECT 200 AS id, 'xyz' as item, 0.49300013 AS unitprice
UNION ALL
SELECT 500 AS id, 'pqr' as item, 0.67322332200000212 AS unitprice
)
select id, item, unitprice from data
group by id, item, unitprice
Output for this comes to be:
100 abc 0.34484733628
200 xyz 0.49300013
500 pqr 0.6732233220000021
in which, the unitprice
values look a bit different.
To avoid this, you can go two routes:
cast(unitprice as string) as unitprice
in your query.unitprice
as non-grouped entity (which is a logical option in most cases), and just do max(unitprice) as unitprice
or avg(unitprice) as unitprice
in your query, while grouping by id, item
.Hope this helps.