I'm trying to run the TensorFlow Ranking example with some custom data. The example works with the data provided with it.
Basically, I want to build a TensorFlow Dataset for TF-Ranking using tf.data.Dataset.from_generator().
I wrote the data file with:
from sklearn.datasets import dump_svmlight_file
dump_svmlight_file(X=X, y=y, f=f, query_id=query_id)
The resulting file looks like this:
0 qid:10 0:53156 1:6456 2:700
1 qid:10 0:48112 1:3535 2:700
2 qid:10 0:48112 1:3655 2:16500
3 qid:10 0:51641 1:8871 2:1200
4 qid:10 0:13207 1:2790 2:400
5 qid:10 0:8175 1:1656 2:700
6 qid:21 0:8175 1:1776 2:2700
7 qid:21 0:9620 1:2424 2:1600
8 qid:21 0:8079 1:2443 2:700
9 qid:25 0:13428 1:3777 2:800
I then create the dataset with the following code:
_NUM_FEATURES_OWN = 3
_LIST_SIZE_OWN = 10
train_dataset_OWN = tf.data.Dataset.from_generator(
    tfr.data.libsvm_generator(_TRAIN_DATA_PATH_OWN, _NUM_FEATURES_OWN, _LIST_SIZE_OWN),
    output_types=(
        {str(k): tf.float32 for k in range(1, _NUM_FEATURES_OWN + 1)},
        tf.float32
    ),
    output_shapes=(
        {str(k): tf.TensorShape([_LIST_SIZE_OWN, 1])
         for k in range(1, _NUM_FEATURES_OWN + 1)},
        tf.TensorShape([_LIST_SIZE_OWN])
    )
)
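A quick sanity check that may help here (it is not part of the original code) is to compare the feature ids in the data file with the keys declared in output_types and output_shapes. The sketch below uses only the standard library; the helper name feature_ids is hypothetical, and the sample line is copied from the file shown earlier.

```python
NUM_FEATURES = 3  # same value as _NUM_FEATURES_OWN in the post

# Keys produced by the dict comprehensions in output_types / output_shapes.
declared_keys = {str(k) for k in range(1, NUM_FEATURES + 1)}


def feature_ids(line):
    """Collect the feature ids from one LibSVM line, skipping label and qid."""
    ids = set()
    for token in line.split()[1:]:  # the first token is the label
        key, _, _ = token.partition(":")
        if key != "qid":
            ids.add(key)
    return ids


sample = "0 qid:10 0:53156 1:6456 2:700"
undeclared = feature_ids(sample) - declared_keys
print(sorted(declared_keys), sorted(undeclared))
```

Any id that shows up in the second list appears in the file but has no matching key in the declared dicts.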
The dataset is created without errors, but as soon as I try to iterate over it I get this error:
train_dataset_OWN.make_one_shot_iterator().get_next()
InvalidArgumentError: TypeError: 'NoneType' object does not support item assignment
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
ret = func(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 449, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/root/.local/lib/python3.6/site-packages/tensorflow_ranking/python/data.py", line 477, in inner_generator
num_features, list_size, doc_list)
File "/root/.local/lib/python3.6/site-packages/tensorflow_ranking/python/data.py", line 424, in _libsvm_generate
features.get(feature_id)[idx, 0] = value
TypeError: 'NoneType' object does not support item assignment
[[{{node PyFunc}}]] [Op:IteratorGetNextSync]
I've created an example notebook here: https://colab.research.google.com/drive/1hAVJrQmbXD5h1pZfCKpkvSJib4_OaL1J
I've been wrestling with the same problem.
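What worked for me was indexing the features from 1 instead of from 0. Judging by the traceback, the generator seems to look up each feature id in a dict whose keys start at 1, so id 0 comes back as None. For example: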
0 qid:10 1:53156 2:6456 3:700
1 qid:10 1:48112 2:3535 3:700
2 qid:10 1:48112 2:3655 3:16500
3 qid:10 1:51641 2:8871 3:1200
4 qid:10 1:13207 2:2790 3:400
5 qid:10 1:8175 2:1656 3:700
6 qid:21 1:8175 2:1776 3:2700
7 qid:21 1:9620 2:2424 3:1600
8 qid:21 1:8079 2:2443 3:700
9 qid:25 1:13428 2:3777 3:800
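If the file has already been written with 0-based ids, one option is to shift the ids afterwards. The following is a minimal sketch using only the standard library; the function name shift_feature_ids is an assumption made for illustration, not something from TF-Ranking or scikit-learn. Alternatively, dump_svmlight_file accepts a zero_based argument, and passing zero_based=False should write 1-based ids directly.

```python
import re

# Matches an "index:value" token such as "0:53156". "qid:10" is skipped
# because its left-hand side is not purely digits.
_FEATURE_TOKEN = re.compile(r"^(\d+):(.+)$")


def shift_feature_ids(line, offset=1):
    """Return `line` with every numeric feature index increased by `offset`."""
    shifted = []
    for token in line.split():
        match = _FEATURE_TOKEN.match(token)
        if match:
            index, value = match.groups()
            token = f"{int(index) + offset}:{value}"
        shifted.append(token)
    return " ".join(shifted)


sample = "0 qid:10 0:53156 1:6456 2:700"
converted = shift_feature_ids(sample)
print(converted)
```

Applying the function to every line of the original file and writing the result to a new file gives data in the 1-based layout shown above; the label and the qid token are left untouched.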