I am creating a table with some known columns and some dynamic columns. I would like to specify the data types for the known columns and infer the data types for the unknown columns. Is there a way to do this?
If I create a schema with only the known columns, then the other columns are ignored when creating the table:
import pyarrow as pa

n_legs = pa.array([2, 4, 5, 100])
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
pydict = {'n_legs': n_legs, 'animals': animals}

# Schema that only covers the known column
partial_schema = pa.schema([('n_legs', pa.int32())])
pa.Table.from_pydict(pydict, schema=partial_schema)
pyarrow.Table
n_legs: int32
----
n_legs: [[2,4,5,100]]
^^^ The animals column was omitted instead of inferred.
One solution is to specify the data type for the known columns when you create the arrays, before building the table. Then you do not need to pass a schema at all:
import pyarrow as pa

# Give the known column an explicit type up front; the other columns are inferred
n_legs = pa.array([2, 4, 5, 100], pa.int32())
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
pydict = {'n_legs': n_legs, 'animals': animals}
pa.Table.from_pydict(pydict)
pyarrow.Table
n_legs: int32
animals: string
----
n_legs: [[2,4,5,100]]
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
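If the arrays are already built with inferred types, another option is to let from_pydict infer the full schema and then cast only the known columns to the types you want. This is a minimal sketch; the known_types mapping is a hypothetical name used just for illustration:

import pyarrow as pa

n_legs = pa.array([2, 4, 5, 100])
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
pydict = {'n_legs': n_legs, 'animals': animals}

# Infer every column first, then override the types of the known columns
table = pa.Table.from_pydict(pydict)

known_types = {'n_legs': pa.int32()}  # hypothetical mapping of known column types
fields = [pa.field(f.name, known_types.get(f.name, f.type)) for f in table.schema]
table = table.cast(pa.schema(fields))

print(table.schema)
# n_legs: int32
# animals: string

This keeps the inferred types for the dynamic columns while still forcing int32 for n_legs. The cast is safe here because the inferred int64 values fit in int32.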