I'm doing text classification using BERT.
Please help me solve this error.
This is my dataset:
content                                            catalogPath_level_1
mobility academy rolling stock technical and p...  TM
trackguard sicas ecc ...                           TM
zuverlässigkeit von verteiltem juridical recor...  TMR
wbt simis d hardware projektierung innenraum...    TMR
Functions that convert the text into tokens using the tokenizer:
def tokenize_function(text):
    return tokenizer(text.numpy(), padding=True, truncation=True, return_tensors='tf')

def tf_tokenize(text):
    result = tf.py_function(tokenize_function, [text], Tout=tf.int32)
    result.set_shape([None, None])
    return result
Layers:
text_input = tf.keras.layers.Input(shape=(), dtype=tf.float32, name='input_ids')
tokenized_input = tf.keras.layers.Lambda(tf_tokenize)(text_input)
outputs = bert_encoder(tokenized_input)
pooled_output = outputs[0][:, 0]
# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(pooled_output)
l = tf.keras.layers.Dense(4, activation='sigmoid', name="output")(l)
# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l])
Optimizer, loss, and training:
optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer=optimizer,
              loss=loss,
              metrics='accuracy')
model.fit(X_train, Y_train, epochs=10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[23], line 1
----> 1 model.fit(X_train, Y_train, epochs=10)
File ~\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~\AppData\Roaming\Python\Python310\site-packages\keras\engine\data_adapter.py:1082, in select_data_adapter(x, y)
1079 adapter_cls = [cls for cls in ALL_ADAPTER_CLS if cls.can_handle(x, y)]
1080 if not adapter_cls:
1081 # TODO(scottzhu): This should be a less implementation-specific error.
-> 1082 raise ValueError(
1083 "Failed to find data adapter that can handle input: {}, {}".format(
1084 _type_name(x), _type_name(y)
1085 )
1086 )
1087 elif len(adapter_cls) > 1:
1088 raise RuntimeError(
1089 "Data adapters should be mutually exclusive for "
1090 "handling inputs. Found multiple adapters {} to handle "
1091 "input: {}, {}".format(adapter_cls, _type_name(x), _type_name(y))
1092 )
ValueError: Failed to find data adapter that can handle input: , ( containing values of types {""})
I think the problem is with the data types. Some advice:
text_input: it should be dtype=tf.string, since your data are strings.
tokenize_function: it should process the string directly, with text = text.numpy().decode('utf-8') before the return.
Y_train: it should be one-hot encoded; try using to_categorical from tensorflow.keras.utils.
X_train: convert it too, with X_train_list = X_train.to_list(), before calling model.fit(...).
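As a rough illustration of the one-hot encoding step suggested above, here is a minimal pure-Python sketch. The label values are an assumption taken from the four sample rows in the question, and the helper name one_hot is hypothetical. Because the labels are strings, they are mapped to integer indices first, which to_categorical would also need.

```python
# Assumed sample labels, copied from the catalogPath_level_1 column in the question.
labels = ["TM", "TM", "TMR", "TMR"]

# Map each distinct string label to an integer index (sorted for a stable order).
classes = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(classes)}
label_ids = [class_to_index[name] for name in labels]


def one_hot(ids, num_classes):
    """Hypothetical helper: one row per id, 1.0 at the id's position, 0.0 elsewhere."""
    return [[1.0 if j == i else 0.0 for j in range(num_classes)] for i in ids]


Y_train_one_hot = one_hot(label_ids, num_classes=len(classes))

print(class_to_index)
print(Y_train_one_hot)
```

With TensorFlow installed, to_categorical(label_ids, num_classes=len(classes)) from tensorflow.keras.utils should give the same rows as a NumPy array. The output layer in the question has 4 units, so the full dataset presumably has four classes; the number of columns in the one-hot labels has to match that.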