If I have a parquet file with columns of types such as Decimal(38, 22) or Decimal(20, 4), is there a way to compare them to the existing schema in a database from Python (for example, to check that a Decimal(38, 22) column corresponds to a column of type numeric(38, 22) in the db)? As far as I understand, pyarrow and Python in general read Decimal values as double. Is there a way to read the file, represent such values as Decimal, and compare them to the db schema, including the precision and scale?
You can use pyarrow to inspect the schema of a parquet file and find the precision and scale of each decimal field:
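import pyarrow as pa
import pyarrow.parquet as pq

# Opening the file reads only the footer metadata, not the column data.
parquet_file = pq.ParquetFile("table.parquet")
for field in parquet_file.schema_arrow:
    if pa.types.is_decimal(field.type):
        # Print in the same order as Decimal(precision, scale).
        print(field.name, field.type.precision, field.type.scale)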
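To go one step further and compare against the database, one option is to collect the (precision, scale) pairs from both sides into dictionaries and diff them. The following is only a sketch: `parquet_decimal_columns` and `compare_decimal_columns` are hypothetical helper names, and the `db_cols` dictionary is made-up sample data standing in for whatever your database reports (in many SQL databases, something like the `numeric_precision` and `numeric_scale` columns of `information_schema.columns`).

```python
def parquet_decimal_columns(path):
    """Return {column_name: (precision, scale)} for the decimal columns of a parquet file."""
    # Imported here so the comparison helper below can be used without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pq.ParquetFile(path).schema_arrow
    return {
        field.name: (field.type.precision, field.type.scale)
        for field in schema
        if pa.types.is_decimal(field.type)
    }


def compare_decimal_columns(parquet_cols, db_cols):
    """Return {column_name: (parquet_type, db_type)} for every mismatch.

    A column that is missing on one side is reported with None on that side.
    """
    mismatches = {}
    for name in sorted(set(parquet_cols) | set(db_cols)):
        parquet_type = parquet_cols.get(name)
        db_type = db_cols.get(name)
        if parquet_type != db_type:
            mismatches[name] = (parquet_type, db_type)
    return mismatches


# Hypothetical database side: column name -> (precision, scale).
db_cols = {"amount": (38, 22), "price": (20, 4)}

# Stand-in for parquet_decimal_columns("table.parquet"), so the example runs without a file.
parquet_cols = {"amount": (38, 22), "price": (20, 2)}

result = compare_decimal_columns(parquet_cols, db_cols)
print(result)  # {'price': ((20, 2), (20, 4))}
```

As for the values themselves, pyarrow keeps decimal columns as decimal128 rather than converting them to double, so for a hypothetical column named `amount`, `pq.read_table("table.parquet").column("amount").to_pylist()` gives you Python `decimal.Decimal` objects.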