nlp, huggingface-transformers, bert-language-model

Can we calculate feature importance in Huggingface Bert?


We can fit a LinearRegression model on a regression dataset and retrieve its coef_ attribute, which contains the coefficient found for each input variable. These coefficients can provide the basis for a crude feature importance score. This assumes that the input variables share the same scale or were scaled before fitting the model.
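To make the LinearRegression case concrete, here is a minimal sketch using scikit-learn. The synthetic dataset from make_regression, the StandardScaler step, and the use of the absolute coefficient as the score are assumptions made for illustration, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (assumption: 5 features, 3 of them informative).
X, y = make_regression(n_samples=200, n_features=5, n_informative=3, random_state=0)

# Put all inputs on the same scale so the coefficients are comparable.
X_scaled = StandardScaler().fit_transform(X)

model = LinearRegression().fit(X_scaled, y)

# Crude importance score: magnitude of each fitted coefficient (an assumption;
# the sign only tells the direction of the effect).
importance = np.abs(model.coef_)
ranking = np.argsort(importance)[::-1]  # most important feature first

for idx in ranking:
    print(f"feature {idx}: coef = {model.coef_[idx]:+.3f}")
```

The uninformative features should end up with coefficients close to zero, which is what makes coef_ usable as a rough importance ranking here.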

What about BERT? Can we get a coef_ attribute from the model and use it to calculate feature importance in a text classification task, as we can with a LinearRegression model?


Solution

  • Captum, an interpretability library from the PyTorch team at Facebook, is a prominent tool for interpreting transformers. A BERT model does not expose a coef_ attribute, but Captum can give you a score for how much the model relies on specific tokens at specific layers. See the tutorial at https://captum.ai/tutorials/Bert_SQUAD_Interpret or the repository at https://github.com/pytorch/captum