# F1-Score and Accuracy for Text-Similarity

I am trying to understand how to calculate F1-Score and accuracy between texts while fine-tuning a QA model.

Let's assume we have this:

`labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]`

`preds = ["I am fine", "born in 1995", "Eiffel", "dog"]`

In this case, it is clear that the predictions are pretty accurate, but how can I measure the F1-Score here? "dog" and "dogs" are not an exact match, but they are very similar.

## Solution

One popular metric for text similarity is the Levenshtein distance or edit distance, which measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
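To make the metric concrete, here is a minimal pure-Python sketch of the edit-distance computation using the standard dynamic-programming recurrence (the `Levenshtein` package used further down does the same thing, just faster, in C):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("dogs", "dog"))        # → 1 (one deletion)
print(levenshtein("kitten", "sitting"))  # → 3
```

Dividing the distance by the length of the longer string and subtracting from 1 gives a similarity in [0, 1]; for `"dogs"` vs `"dog"` that is 1 - 1/4 = 0.75.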

Try the implementation below, adjusting `threshold` to suit your requirements.

```python
import Levenshtein

def text_similarity_evaluation(labels, preds, threshold=0.8):
    tp, fp = 0, 0
    for label, pred in zip(labels, preds):
        # Normalised similarity in [0, 1]: 1.0 means identical strings
        similarity_score = 1 - Levenshtein.distance(label, pred) / max(len(label), len(pred))
        if similarity_score >= threshold:
            tp += 1
        else:
            fp += 1
    # Every label not matched above the threshold counts as a miss
    fn = len(labels) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_score = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1_score

# Example usage
labels = ["I am fine", "He was born in 1995", "The Eiffel tower", "dogs"]
preds = ["I am fine", "born in 1995", "Eiffel", "dog"]
precision, recall, f1_score = text_similarity_evaluation(labels, preds, threshold=0.8)
print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1_score)
```
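Since the question concerns QA fine-tuning, it is worth noting that QA benchmarks such as SQuAD conventionally compute F1 at the token level rather than the character level: precision and recall are taken over the overlapping tokens between prediction and gold answer. A minimal sketch (lowercase whitespace tokenisation is an assumption; SQuAD's official script also strips punctuation and articles):

```python
from collections import Counter

def token_f1(label: str, pred: str) -> float:
    """Token-overlap F1 between a gold answer and a prediction."""
    label_tokens = label.lower().split()
    pred_tokens = pred.lower().split()
    # Multiset intersection: each token counted at most as often as it
    # appears in both strings
    overlap = sum((Counter(label_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("He was born in 1995", "born in 1995"))  # → 0.75
print(token_f1("dogs", "dog"))                          # → 0.0
```

Note the last case: token-level F1 gives "dogs" vs "dog" no credit at all, which is exactly where a character-level measure like Levenshtein similarity is more forgiving. Averaging `token_f1` over all pairs gives a single score for the set.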
