I need to build a question-answering system for a specific domain, finance. I have a corpus of documents containing all the information about the field.
Can I fine-tune the pre-trained T5 model (large) with unsupervised training on these documents, so that it can answer related questions based on my corpus?
The corpus is quite large, so I cannot simply pass it as the context in T5's existing QA setup.
I am open to your suggestions!
What I found is that it is not really feasible to fine-tune the word embeddings of the T5 LLM, that is, to retrain the model on a specific domain such as finance, which was my case. You can only supply a context at inference time or fine-tune the model on a QA dataset.
I ended up building the QA system with Haystack, an open-source library that provides a project architecture for building NLP QA systems on top of transformer models that you specify:
https://github.com/deepset-ai/haystack
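
To illustrate why this gets around the "corpus too large for the context" problem: the general idea, as I understand it, is to first retrieve the few passages most relevant to the question and pass only those to the transformer as context. Below is a rough, library-free sketch of that retrieve-then-read idea. It is not Haystack's API; `TinyRetriever`, `build_reader_input`, the BM25 parameters, and the sample passages are all hypothetical names and values I made up for illustration, and the `question: ... context: ...` prompt layout is an assumption about how the reader model expects its input.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyRetriever:
    """Hypothetical helper (not part of Haystack): ranks passages with BM25."""

    def __init__(self, passages, k1=1.5, b=0.75):
        if not passages:
            raise ValueError("at least one passage is required")
        self.passages = passages
        self.k1 = k1
        self.b = b
        # Per-passage term counts and lengths, computed once up front.
        self.term_counts = [Counter(tokenize(p)) for p in passages]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(passages)
        # Number of passages in which each term appears.
        self.doc_freq = Counter()
        for counts in self.term_counts:
            self.doc_freq.update(counts.keys())

    def score(self, query_terms, index):
        """BM25 score of one passage for the given query terms."""
        counts = self.term_counts[index]
        n = len(self.passages)
        norm = self.k1 * (1 - self.b + self.b * self.lengths[index] / self.avg_len)
        total = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            df = self.doc_freq[term]
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            total += idf * tf * (self.k1 + 1) / (tf + norm)
        return total

    def retrieve(self, question, top_k=2):
        """Return the top_k passages with the highest score for the question."""
        terms = tokenize(question)
        ranked = sorted(
            range(len(self.passages)),
            key=lambda i: self.score(terms, i),
            reverse=True,
        )
        return [self.passages[i] for i in ranked[:top_k]]


def build_reader_input(question, contexts):
    """Assumed prompt layout for the reader model; adjust to your model."""
    return "question: " + question + " context: " + " ".join(contexts)


# Made-up sample passages standing in for the finance corpus.
passages = [
    "A bond is a fixed-income instrument that represents a loan made by an investor to a borrower.",
    "A stock represents partial ownership of a company.",
    "The coupon rate is the annual interest paid on a bond, expressed as a percentage of face value.",
]
question = "What is the coupon rate of a bond?"
retriever = TinyRetriever(passages)
top_passages = retriever.retrieve(question, top_k=1)
reader_input = build_reader_input(question, top_passages)
```

Only `reader_input` (the question plus the selected passages) would then go to the transformer, so the model never has to see the whole corpus at once. In a real setup the library handles document storage, retrieval, and the reader model for you.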