I'm using TFX (more precisely TensorFlow Data Validation) with the infer_schema method documented here: https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/infer_schema. It generates a schema describing the column types from statistics computed over a CSV file.
It works well for floats, bytes, and categorical values, but I would also like to detect dates. I haven't found anything about this in the tutorials or guides. The generated proto message supports dates (see TimeDomain in https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto), so representing them would not be an issue.
I tried a CSV file in the following format (non-US date format), and the date column is recognized as BYTES:
date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235
The code is the same as in the tutorial, more or less:
import tensorflow_data_validation as tfdv

# Compute statistics over the CSV, infer a schema from them, then display it.
train_stats = tfdv.generate_statistics_from_csv(data_location="/content/csv_with_dates.csv")
schema = tfdv.infer_schema(statistics=train_stats)
tfdv.display_schema(schema=schema)
which displays:
Feature name  Type   Presence  Valency  Domain
'date'        BYTES  required           -
'amount'      FLOAT  required           -
Could I make it work? How?
Not at the moment; maybe in an upcoming version. If you check the link you mentioned, you'll find that features support only the following types (dates are not included):
enum FeatureType {
  TYPE_UNKNOWN = 0;
  BYTES = 1;
  INT = 2;
  FLOAT = 3;
  STRUCT = 4;
}
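Since inference stops at BYTES here, one possible workaround is to check outside TFDV which columns parse as dates, and then annotate the schema by hand. The sketch below is not part of TFDV: it uses only the Python standard library, and the helper name detect_date_columns and the default format "%d/%m/%Y" are assumptions chosen to match the sample CSV in the question.

```python
import csv
import io
from datetime import datetime


def detect_date_columns(csv_text, date_format="%d/%m/%Y"):
    """Return the names of columns whose non-empty values all parse with date_format.

    Hypothetical helper, not a TFDV function.
    """
    # skipinitialspace handles the "date, amount" style used in the question.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    candidates = set(reader.fieldnames or [])
    parsed_at_least_once = set()
    for row in reader:
        for name in list(candidates):
            value = (row[name] or "").strip()
            if not value:
                continue  # ignore missing values
            try:
                datetime.strptime(value, date_format)
                parsed_at_least_once.add(name)
            except ValueError:
                # One value that does not parse disqualifies the column.
                candidates.discard(name)
    return sorted(candidates & parsed_at_least_once)


SAMPLE = """date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235
"""

print(detect_date_columns(SAMPLE))  # prints ['date']
```

The columns it returns are the ones that would be candidates for a TimeDomain annotation in the schema; whether and how a given TFDV version validates that annotation is something to check against its documentation.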