python bert-language-model

Failed to find data adapter that can handle input: , ( containing values of types {""}) Content classification


I'm doing text classification using BERT, and model.fit fails with the error shown at the end. Please help me solve it.

my data set

This is my data set:

            content                                     catalogPath_level_1 
    mobility academy rolling stock technical and p...   TM
    trackguard sicas ecc ...                            TM  
    zuverlässigkeit von verteiltem juridical recor...   TMR 
    wbt simis d hardware projektierung innenraum...     TMR 

model

The part of the model that tokenizes the text:

def tokenize_function(text):
    return tokenizer(text.numpy(), padding=True, truncation=True, return_tensors='tf')

def tf_tokenize(text):
    result = tf.py_function(tokenize_function, [text], Tout=tf.int32)
    result.set_shape([None, None])
    return result

layers

text_input = tf.keras.layers.Input(shape=(), dtype=tf.float32, name='input_ids')
tokenized_input = tf.keras.layers.Lambda(tf_tokenize)(text_input)
outputs = bert_encoder(tokenized_input)
pooled_output = outputs[0][:, 0]

# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(pooled_output)
l = tf.keras.layers.Dense(4, activation='sigmoid', name="output")(l)

# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l]) 

optimizers

optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.CategoricalCrossentropy()

model.compile(optimizer=optimizer,
              loss=loss,
              metrics='accuracy')

fit

model.fit(X_train, Y_train, epochs=10)

error at fit

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[23], line 1
----> 1 model.fit(X_train, Y_train, epochs=10)

File ~\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\AppData\Roaming\Python\Python310\site-packages\keras\engine\data_adapter.py:1082, in select_data_adapter(x, y)
   1079 adapter_cls = [cls for cls in ALL_ADAPTER_CLS if cls.can_handle(x, y)]
   1080 if not adapter_cls:
   1081     # TODO(scottzhu): This should be a less implementation-specific error.
-> 1082     raise ValueError(
   1083         "Failed to find data adapter that can handle input: {}, {}".format(
   1084             _type_name(x), _type_name(y)
   1085         )
   1086     )
   1087 elif len(adapter_cls) > 1:
   1088     raise RuntimeError(
   1089         "Data adapters should be mutually exclusive for "
   1090         "handling inputs. Found multiple adapters {} to handle "
   1091         "input: {}, {}".format(adapter_cls, _type_name(x), _type_name(y))
   1092     )

ValueError: Failed to find data adapter that can handle input: , ( containing values of types {""})

Solution

  • I think the problem is with the data types. Some advice:

    • text_input should use dtype=tf.string, since your data are strings.
    • tokenize_function should decode the string before tokenizing it: add text = text.numpy().decode('utf-8') before the return.
    • Y_train should be one-hot encoded; try to_categorical from tensorflow.keras.utils.
    • Convert X_train to a list as well: X_train_list = X_train.to_list()
    • Remember to pass the converted values to model.fit(...).
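One step worth spelling out: to_categorical expects integer class indices, so string labels such as "TM" and "TMR" need to be mapped to integers before they are one-hot encoded. Below is a minimal sketch of that preparation in plain Python, so it runs without TensorFlow. The helper name encode_labels is hypothetical, the labels are the four sample rows from the question, and num_classes=4 is an assumption chosen to match the Dense(4) output layer.

```python
def encode_labels(labels, num_classes=None):
    """Map string labels to integer ids and one-hot rows (hypothetical helper)."""
    # Sort the distinct labels so the label -> id mapping is reproducible.
    classes = sorted(set(labels))
    class_to_id = {name: idx for idx, name in enumerate(classes)}

    if num_classes is None:
        num_classes = len(classes)
    if num_classes < len(classes):
        raise ValueError("num_classes is smaller than the number of distinct labels")

    # Integer ids: this is the form that to_categorical expects as input.
    ids = [class_to_id[name] for name in labels]

    # One-hot rows: a 1.0 in the column of the class id, 0.0 elsewhere.
    one_hot = [
        [1.0 if col == class_id else 0.0 for col in range(num_classes)]
        for class_id in ids
    ]
    return class_to_id, ids, one_hot


# Sample labels taken from the catalogPath_level_1 column in the question.
labels = ["TM", "TM", "TMR", "TMR"]

# num_classes=4 is an assumption that matches Dense(4) in the model.
class_to_id, label_ids, y_one_hot = encode_labels(labels, num_classes=4)

print(class_to_id)
print(y_one_hot[2])
```

With TensorFlow installed, passing label_ids and num_classes=4 to to_categorical should give the same one-hot rows as a NumPy array, which can then be used as the targets in model.fit(...).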