Tags: python, scikit-learn, crfsuite

My sklearn_crfsuite model does not learn anything


I'm trying to build an annotation prediction model, following the tutorial here, but my model doesn't learn anything. Here is a sample of my training data and labels:

```
[{'bias': 1.0, 'word.lower()': '\nreference\nissue\ndate\ndgt86620\n4\n \n19-dec-05\nfalcon\n7x\ntype\ncertification\n27_4-100\nthis\ndocument\nis\nthe\nintellectual\nprop...nairbrakes\nhandle\nposition\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n0\ntable\n1\n:\nairbrake\ncas\nmessages\n', 'word[-3:]': 'es\n', 'word[-2:]': 's\n', 'word.isupper()': False, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'POS', 'postag[:2]': 'PO', 'w_emb_0': 0.03418987928976114, 'w_emb_1': 0.6173382811066742, 'w_emb_2': 0.004420982990809508, 'w_emb_3': 0.08293022662242588, 'w_emb_4': 0.22162269482070363, 'w_emb_5': 0.4334545347397811, 'w_emb_6': 0.7844891779932379, 'w_emb_7': 0.028043262790094503, 'w_emb_8': 0.5233847386564157, 'w_emb_9': 0.9685677133128328, 'w_emb_10': 0.19379126558708126, 'w_emb_11': 0.2809608896964926, 'w_emb_12': 0.384759230815804, 'w_emb_13': 0.15385904662767336, 'w_emb_14': 0.5206500040610533, 'w_emb_15': 0.009148526006733215, 'w_emb_16': 0.5894118695171416, 'w_emb_17': 0.7356989708459056, 'w_emb_18': 0.5576774100159024, 'w_emb_19': 0.2185294430010376, 'BOS': True, '+1:word.lower()': 'reference', '+1:word.istitle()': False, '+1:word.isupper()': True, '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0, 'word.lower()': 'reference', 'word[-3:]': 'NCE', 'word[-2:]': 'CE', 'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'POS', 'postag[:2]': 'PO', 'w_emb_0': -0.390038, 'w_emb_1': 0.30677223, 'w_emb_2': -1.010975, 'w_emb_3': 0.3656154, 'w_emb_4': 0.5319459, 'w_emb_5': 0.45572615, 'w_emb_6': -0.46090943, 'w_emb_7': 0.87250936, 'w_emb_8': 0.036648277, 'w_emb_9': -0.3057043, 'w_emb_10': 0.33427167, 'w_emb_11': -0.19664396, 'w_emb_12': -0.64899784, 'w_emb_13': -0.1785065, 'w_emb_14': -0.117423356, 'w_emb_15': 0.16247013, 'w_emb_16': 0.11694676, 'w_emb_17': -0.30693895, 'w_emb_18': -1.0026807, 'w_emb_19': 0.9946743, '-1:word.lower()': '\nreference...n \n \n \n \n \n \n \n \n0\ntable\n1\n:\nairbrake\ncas\nmessages\n', '-1:word.istitle()': False, '-1:word.isupper()': False, '-1:postag': 'POS', '-1:postag[:2]': 'PO', '+1:word.lower()': 'issue', '+1:word.istitle()': False, '+1:word.isupper()': True, '+1:postag': 'POS', '+1:postag[:2]': 'PO'},
 {'bias': 1.0, 'word.lower()': 'issue', 'word[-3:]': 'SUE', 'word[-2:]': 'UE', 'word.isupper()': True, 'word.istitle()': False, 'word.isdigit()': False, 'postag': 'POS', 'postag[:2]': 'PO', 'w_emb_0': -1.2204882, 'w_emb_1': 0.8920707, 'w_emb_2': -3.8380668, 'w_emb_3': 1.5641377, 'w_emb_4': 2.1918254, 'w_emb_5': 1.8509868, 'w_emb_6': -2.0664182, 'w_emb_7': 3.1591077, 'w_emb_8': -0.33126026, 'w_emb_9': -1.4278139, 'w_emb_10': 0.9291533, 'w_emb_11': -0.6761407, 'w_emb_12': -2.9582167, 'w_emb_13': -0.5395561, 'w_emb_14': -0.8363763, 'w_emb_15': 0.25568742, 'w_emb_16': 0.4932978, 'w_emb_17': -1.6198335, 'w_emb_18': -4.183924, 'w_emb_19': 4.281094, '-1:word.lower()': 'reference', '-1:word.istitle()': False, '-1:word.isupper()': True, '-1:postag': 'POS', '-1:postag[:2]': 'PO', '+1:word.lower()': 'date', '+1:word.istitle()': False, '+1:word.isupper()': True, '+1:postag': 'POS', '+1:postag[:2]': 'PO'}, ...]

y_train = ['O', 'O', 'O', ..., 'I-data-c-a-s_message-type', ..., 'B-data-c-a-s_message-type']
```
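These dicts follow the tutorial's `word2features` pattern. For context, here is a minimal sketch of the kind of feature function that produces them; the `embeddings(word)` lookup used for the `w_emb_*` keys is a hypothetical stand-in for however the 20-dimensional word embeddings were actually computed:

```python
def word2features(sent, i):
    """Build the feature dict for token i of a sentence of (word, postag) pairs.

    Sketch following the sklearn-crfsuite tutorial; `embeddings` is a
    hypothetical word -> vector lookup, not part of the original post.
    """
    word, postag = sent[i]
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],
    }
    # One feature per embedding dimension, matching the w_emb_* keys above.
    for j, value in enumerate(embeddings(word)):
        features[f'w_emb_{j}'] = float(value)
    if i > 0:
        prev_word, prev_postag = sent[i - 1]
        features.update({
            '-1:word.lower()': prev_word.lower(),
            '-1:word.istitle()': prev_word.istitle(),
            '-1:word.isupper()': prev_word.isupper(),
            '-1:postag': prev_postag,
            '-1:postag[:2]': prev_postag[:2],
        })
    else:
        features['BOS'] = True  # beginning of sequence
    if i < len(sent) - 1:
        next_word, next_postag = sent[i + 1]
        features.update({
            '+1:word.lower()': next_word.lower(),
            '+1:word.istitle()': next_word.istitle(),
            '+1:word.isupper()': next_word.isupper(),
            '+1:postag': next_postag,
            '+1:postag[:2]': next_postag[:2],
        })
    else:
        features['EOS'] = True  # end of sequence
    return features
```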

and here is the model definition and training:

```python
import sklearn_crfsuite
from sklearn_crfsuite import metrics

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,                          # coefficient for L1 regularization
    c2=0.1,                          # coefficient for L2 regularization
    max_iterations=100,
    all_possible_transitions=True
)
crf.fit(X_train, y_train)

y_pred = crf.predict(X_test)

# Evaluate on the entity labels only ('O' excluded), grouped by entity type
labels = [label for label in crf.classes_ if label != 'O']
sorted_labels = sorted(labels, key=lambda name: (name[1:], name[0]))

msg = metrics.flat_classification_report(y_test, y_pred, labels=sorted_labels, digits=4)
print(msg)
```

Unfortunately, the model doesn't learn anything:

```
                           precision    recall  f1-score   support

B-data-c-a-s_message-type     0.0000    0.0000    0.0000        23
I-data-c-a-s_message-type     0.0000    0.0000    0.0000        90

                micro avg     0.0000    0.0000    0.0000       113
                macro avg     0.0000    0.0000    0.0000       113
             weighted avg     0.0000    0.0000    0.0000       113
```
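Since every score is exactly zero, it is worth checking whether the CRF learned any weights at all. Below is a minimal diagnostic sketch using the `transition_features_` and `state_features_` attributes that sklearn_crfsuite exposes after fitting (as in the tutorial); the helper function name is my own:

```python
from collections import Counter

def print_top_features(crf, n=10):
    # transition_features_: {(label_from, label_to): weight}
    # state_features_:      {(attribute, label): weight}
    print("Most likely transitions:")
    for (src, dst), w in Counter(crf.transition_features_).most_common(n):
        print(f"{src:>30} -> {dst:<30} {w:.4f}")
    print("Strongest state features:")
    for (attr, label), w in Counter(crf.state_features_).most_common(n):
        print(f"{w:8.4f} {label:<30} {attr}")

print_top_features(crf)
```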

Solution

  • The problem is solved. As you can see above, the support (the number of evaluation samples) totals 113, while the training set contained only about 14 samples, which is far too small; I simply hadn't noticed the mismatch. The training and test datasets had been swapped. After inverting them back (a quick size check, sketched below, would have caught this), the performance looks like this:

    ```
                               precision    recall  f1-score   support

    B-data-c-a-s_message-type     0.0000    0.0000    0.0000         0
    I-data-c-a-s_message-type     0.6364    1.0000    0.7778        14

                    micro avg     0.6364    1.0000    0.7778        14
                    macro avg     0.3182    0.5000    0.3889        14
                 weighted avg     0.6364    1.0000    0.7778        14
    ```
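For reference, a minimal sketch of the size check that would have caught the inverted split; `X` and `y` (the full dataset) and the 80/20 split are assumptions, not part of the original code:

```python
from sklearn.model_selection import train_test_split

# Sanity-check the split before training: here it would have shown
# ~14 training samples against 113 evaluation samples.
print(f"train: {len(X_train)}, test: {len(X_test)}")

# Hypothetical rebuild of the split from the full dataset (X, y),
# keeping ~80% of the sequences for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```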