
Specifying feedforward size based on vocab size in crf_tagger using an AllenNLP config


I'm migrating my AllenNLP model from Python classes to a config file, and there's one last construct I'm having problems with.

I'm using a feedforward projection layer in my LSTM-CRF decoder, i.e.:

from allennlp.modules import FeedForward
from allennlp.nn import Activation
from allennlp_models.tagging import CrfTagger  # allennlp < 1.0: from allennlp.models import CrfTagger

vocab_size = vocab.get_vocab_size("tokens")

feedforward = FeedForward(
    input_dim=encoder.get_output_dim(),
    num_layers=2,
    hidden_dims=[text_field_embedder.get_output_dim(), vocab_size],
    # Activation.by_name(...) returns the class; call it to get an instance.
    activations=[Activation.by_name("relu")(), Activation.by_name("linear")()],
    dropout=[0.15, 0.15],
    )

model = CrfTagger(
    vocab=vocab,
    text_field_embedder=text_field_embedder,
    encoder=encoder,
    feedforward=feedforward,
    )

The issue I'm running into is how to express the last hidden dim (vocab_size) in the JSON config, since it depends on the runtime value of vocab.get_vocab_size("tokens").

It seems that I need to either construct the FeedForward inside CrfTagger (so I have access to vocab at runtime) or create my own FeedForward subclass.

I'm wondering if there's a cleaner way: is there a way I can register a constructor for FeedForward (essentially a factory function)?


Solution

  • Great question! From my exploration (and another team member's investigation), there doesn't currently appear to be a clean way to create a FeedForward with output_dim = vocab_size directly from a config file. The best option seems to be a subclass that constructs the FeedForward lazily, once vocab_size is known; see the sketch below.
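
    As a rough illustration of the lazy-construction idea (here via the "construct the FeedForward inside CrfTagger" route mentioned in the question), the sketch below builds the projection in a small CrfTagger subclass, where vocab is available at construction time. The class name VocabSizedCrfTagger and the registered name "vocab_sized_crf_tagger" are invented for this example, and the imports assume allennlp 2.x with the allennlp-models package:

from typing import Any

from allennlp.data import Vocabulary
from allennlp.models import Model
from allennlp.modules import FeedForward, Seq2SeqEncoder, TextFieldEmbedder
from allennlp.nn import Activation
from allennlp_models.tagging import CrfTagger


@Model.register("vocab_sized_crf_tagger")  # hypothetical name for this sketch
class VocabSizedCrfTagger(CrfTagger):
    """A CrfTagger that builds its projection FeedForward once vocab is known."""

    def __init__(
        self,
        vocab: Vocabulary,
        text_field_embedder: TextFieldEmbedder,
        encoder: Seq2SeqEncoder,
        **kwargs: Any,
    ) -> None:
        # vocab is handed to the model at construction time, so the
        # vocab-dependent output dim can be computed here instead of
        # being hard-coded in the config file.
        feedforward = FeedForward(
            input_dim=encoder.get_output_dim(),
            num_layers=2,
            hidden_dims=[
                text_field_embedder.get_output_dim(),
                vocab.get_vocab_size("tokens"),
            ],
            activations=[Activation.by_name("relu")(), Activation.by_name("linear")()],
            dropout=[0.15, 0.15],
        )
        super().__init__(
            vocab=vocab,
            text_field_embedder=text_field_embedder,
            encoder=encoder,
            feedforward=feedforward,
            **kwargs,
        )

    The model section of the config can then just be "model": {"type": "vocab_sized_crf_tagger", ...} with no feedforward entry at all, provided the module defining the subclass is made visible to AllenNLP, e.g. with allennlp train ... --include-package.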