Tags: nlp, huggingface-transformers, nlp-question-answering

Fine-tune T5 pre-trained model on a specific domain for question answering


I need to build a question-answering system for a specific domain (finance), and I have a corpus of documents containing all the information about the field.

Can I fine-tune the T5 pre-trained model (large) with unsupervised training on these documents so that it can answer questions based on my document corpus? The corpus I have is quite large, so I cannot simply pass it as context to T5's current QA setup.

I am open to your suggestions!


Solution

  • What I found is that it is not really feasible to fine-tune T5's word embeddings in an unsupervised way; you can only provide context at inference time or fine-tune the model on a supervised QA dataset, not retrain it on a specific domain such as finance, which was my case.
    I ended up building the QA system with Haystack, an open-source library that provides the project architecture for building transformer-based NLP QA systems (retriever plus reader), where you can specify your own document corpus; a minimal sketch follows below.
    https://github.com/deepset-ai/haystack
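
For illustration, here is a minimal sketch of such a retriever-reader pipeline, assuming the Haystack 1.x API (InMemoryDocumentStore, TfidfRetriever, FARMReader, ExtractiveQAPipeline) and the deepset/roberta-base-squad2 reader model; class names and defaults may differ in other Haystack versions, and the document contents and query are placeholders.

```python
# Minimal extractive QA sketch with Haystack (1.x API); exact class names
# and defaults may differ across versions.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Index the finance documents: each dict becomes one Document in the store.
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Placeholder text of one finance document.", "meta": {"name": "doc_1"}},
    {"content": "Placeholder text of another finance document.", "meta": {"name": "doc_2"}},
])

# The retriever narrows the large corpus down to a few candidate passages,
# so the whole corpus never has to fit into the model's context window.
retriever = TfidfRetriever(document_store=document_store)

# The reader extracts an answer span from the retrieved passages.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipeline.run(
    query="Example finance question?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}},
)
print(result["answers"][0].answer)
```

For a large corpus you would likely swap the in-memory store and TF-IDF retriever for a scalable document store (for example Elasticsearch or FAISS) and a dense retriever such as Haystack's EmbeddingRetriever, but the overall retriever-reader structure stays the same.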