Framework or toolkit for implementing an RNN transducer

I'm studying end-to-end architectures for automatic speech recognition (ASR) systems.

The RNN transducer (RNN-T) is one of the popular end-to-end methods, but it is difficult to train.

Therefore, I'm looking for a framework or toolkit that would let me easily implement a baseline model and then modify it as I wish.

Thanks in advance!

Solution

For those interested, I'm currently using the ESPnet toolkit, which focuses mainly on end-to-end speech recognition and end-to-end text-to-speech.
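
For anyone who wants to see what such a toolkit has to compute under the hood, here is a minimal sketch of the RNN-T loss (the forward algorithm over the time/label lattice) in plain Python. It is independent of ESPnet and is not ESPnet's API. The function names, the `log_probs[t][u][k]` layout, and the choice of blank index 0 are my own assumptions for illustration. Real toolkits use batched, GPU-friendly implementations, which is a large part of why a toolkit helps.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)); tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under an RNN-T output lattice.

    log_probs[t][u][k] is the log-probability of emitting token k at
    acoustic frame t after u labels have been emitted (assumed layout:
    T x (U + 1) x V, already log-softmax normalized over k).
    """
    T = len(log_probs)
    U = len(labels)
    # alpha[t][u]: log-probability of reaching lattice node (t, u).
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            stay = -math.inf  # arrive by emitting blank at (t - 1, u)
            emit = -math.inf  # arrive by emitting labels[u - 1] at (t, u - 1)
            if t > 0:
                stay = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            if u > 0:
                emit = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
            alpha[t][u] = logsumexp2(stay, emit)
    # Every alignment ends with a final blank from the last node (T - 1, U).
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])


# Toy check: 2 frames, 1 label, vocabulary of 3 tokens, uniform outputs.
# There are two alignments, each with probability (1/3)^3, so the
# likelihood is 2/27.
T, U, V = 2, 1, 3
uniform = [[[math.log(1.0 / V)] * V for _ in range(U + 1)] for _ in range(T)]
loss = rnnt_neg_log_likelihood(uniform, labels=[1])
assert math.isclose(loss, -math.log(2.0 / 27.0))
```

The double loop visits every (frame, label) node, so the cost grows as T x U per utterance, on top of the T x (U + 1) x V joint-network output that has to be held in memory.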