Project: WebSocket server built with Netty
The Netty client sends the request:
File file = new File("D:\\zh-16000-30s.pcm");
FileInputStream fis = new FileInputStream(file);
int length = 0;
int dataSize = 4096;
byte[] bytes = new byte[dataSize];
int status = 0;
// simulate an Android or iOS client pushing streaming audio
while ((length = fis.read(bytes, 0, dataSize)) != -1) {
    JSONObject jsonObject = new JSONObject();
    jsonObject.put("audio", Base64.getEncoder().encodeToString(Arrays.copyOf(bytes, length)));
    jsonObject.put("status", status);
    WebSocketFrame frame = new TextWebSocketFrame(jsonObject.toJSONString());
    ch.writeAndFlush(frame);
    status = 1;
}
// read() returned -1, so the whole file has been sent; signal the end of the stream
status = 2;
JSONObject jsonObject = new JSONObject();
jsonObject.put("audio", "");
jsonObject.put("status", status);
WebSocketFrame frame = new TextWebSocketFrame(jsonObject.toJSONString());
ch.writeAndFlush(frame);
fis.close();
Netty server handler:
protected void channelRead0(ChannelHandlerContext ctx, WebSocketFrame frame) throws Exception {
    // ping and pong frames are already handled
    if (frame instanceof TextWebSocketFrame) {
        String request = ((TextWebSocketFrame) frame).text();
        JSONObject jsonObject = JSONObject.parseObject(request);
        Integer status = jsonObject.getInteger("status");
        byte[] recByte = Base64.getDecoder().decode(jsonObject.getString("audio"));
        if (status.intValue() == 0) {
            // first packet: store the initial audio bytes on the channel
            // (AttributeKey.valueOf, not newInstance: newInstance throws an
            // IllegalStateException once a key with that name already exists)
            ctx.channel().attr(AttributeKey.<byte[]>valueOf("login")).set(recByte);
        } else if (status.intValue() == 1) {
            // middle packet: append the new bytes to the buffered audio
            byte[] a = (byte[]) ctx.channel().attr(AttributeKey.valueOf("login")).get();
            byte[] c = new byte[a.length + recByte.length];
            System.arraycopy(a, 0, c, 0, a.length);
            System.arraycopy(recByte, 0, c, a.length, recByte.length);
            ctx.channel().attr(AttributeKey.<byte[]>valueOf("login")).set(c);
        } else if (status.intValue() == 2) {
            // last packet: the end of the file or stream
            saveAudio((byte[]) ctx.channel().attr(AttributeKey.valueOf("login")).get());
        }
        // Echo the uppercased request back to the client.
        ctx.channel().writeAndFlush(new TextWebSocketFrame(request.toUpperCase(Locale.US)));
    } else {
        String message = "unsupported frame type: " + frame.getClass().getName();
        throw new UnsupportedOperationException(message);
    }
}
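The attr calls above use AttributeKey.valueOf inline; a tidier pattern is to declare the key once as a typed constant on the handler class. A sketch (the field name AUDIO_KEY is illustrative):

// Declared once on the handler class; valueOf reuses an existing key with the
// same name, unlike newInstance, which throws if the name is already taken.
private static final AttributeKey<byte[]> AUDIO_KEY = AttributeKey.valueOf("login");

// then in channelRead0:
// ctx.channel().attr(AUDIO_KEY).set(recByte);
// byte[] buffered = ctx.channel().attr(AUDIO_KEY).get();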
I want to use Microsoft Speech streaming recognition.
Sample code snippet 1:
// Creates an instance of a speech config with specified
// subscription key and service region. Replace with your own subscription key
// and service region (e.g., "westus").
SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Create an audio stream from a wav file.
// Replace with your own audio file name.
PullAudioInputStreamCallback callback = new WavStream(new FileInputStream("YourAudioFile.wav"));
AudioConfig audioInput = AudioConfig.fromStreamInput(callback);
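For context, the sample would then build a recognizer from these two configs and run a single recognition, roughly like this (a sketch; error handling omitted):

SpeechRecognizer recognizer = new SpeechRecognizer(config, audioInput);
// recognizeOnceAsync() returns after the first utterance ends.
SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
System.out.println("RECOGNIZED: " + result.getText());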
Sample code snippet 2:
public class WavStream extends PullAudioInputStreamCallback {
    private final InputStream stream;

    public WavStream(InputStream wavStream) {
        try {
            this.stream = parseWavHeader(wavStream);
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex.getMessage());
        }
    }

    @Override
    public int read(byte[] dataBuffer) {
        long ret = 0;
        try {
            ret = this.stream.read(dataBuffer, 0, dataBuffer.length);
        } catch (Exception ex) {
            System.out.println("Read " + ex);
        }
        // read() returns -1 at end of stream; the SDK expects 0 to mean "no more data"
        return (int) Math.max(0, ret);
    }

    @Override
    public void close() {
        try {
            this.stream.close();
        } catch (IOException ex) {
            // ignored
        }
    }
}
Question:
How can I continuously convert byte[] packets into an InputStream?
For example:
I speak 30 seconds of audio, and suppose the Netty server receives one packet per second.
The Netty server forwards each 1-second packet to Microsoft speech recognition.
The Microsoft speech server returns intermediate results as it goes.
By the time the Netty client finishes sending, Microsoft has finished recognizing.
Thanks
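One standard JDK way to expose successive byte[] packets as a single continuous InputStream is a PipedOutputStream/PipedInputStream pair. An illustrative sketch with hypothetical names (the writer and reader must run on different threads, since reads block until data arrives):

import java.io.PipedInputStream;
import java.io.PipedOutputStream;

PipedOutputStream source = new PipedOutputStream();
PipedInputStream sink = new PipedInputStream(source, 1 << 20); // 1 MB buffer

// In the Netty handler, on each packet:
source.write(recByte);
// On the final packet (status == 2):
source.close(); // subsequent read() calls on sink return -1

// On the recognizer thread, hand the read end to a PullAudioInputStreamCallback
// like the WavStream class above (skipping the WAV-header parse for raw PCM).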
Is your question about the Netty WebSocket server, or about the Speech SDK objects?
My recommendation for using the Speech SDK in this manner would be to use a push stream instead of a pull stream. Generally, it's easier to manage on your side. Pseudo-code:
// FOR SETUP... BEFORE you are accepting audio in your websocket server
// (or on first acceptance of the first packet of audio):
//     create push stream
//     create audio config from push stream
//     create speech config
//     create speech recognizer from speech config and audio config
//     hook up event handlers for intermediate results (recognizing events)
//     hook up event handlers for final results (recognized events)
//     start recognition (recognize once or start continuous recognition)
// ON EACH AUDIO packet your websocket server accepts:
//     push the audio data into the push stream with write()
// ON EACH recognizing event, send back the result.text to your client
// ON EACH recognized event, send back the result.text to your client
--rob chambers [MSFT]
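For concreteness, here is a minimal Java sketch of those steps wired to the handler from this thread. The Speech SDK calls (createPushStream, fromStreamInput, startContinuousRecognitionAsync, write, close) are real SDK APIs; ctx and recByte are assumed to come from the channelRead0 handler above, and the enclosing method would need to declare throws Exception for the .get() calls:

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;
import io.netty.handler.codec.http.websocketx.TextWebSocketFrame;

// SETUP, once per channel (e.g. on the first status == 0 packet).
// The default push-stream format is 16 kHz, 16-bit, mono PCM, which matches
// the zh-16000-30s.pcm file the client above is sending.
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
PushAudioInputStream pushStream = AudioInputStream.createPushStream();
AudioConfig audioConfig = AudioConfig.fromStreamInput(pushStream);
SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Intermediate results arrive while audio is still being pushed.
recognizer.recognizing.addEventListener((s, e) ->
        ctx.channel().writeAndFlush(new TextWebSocketFrame(e.getResult().getText())));
// Final results arrive once an utterance is complete.
recognizer.recognized.addEventListener((s, e) ->
        ctx.channel().writeAndFlush(new TextWebSocketFrame(e.getResult().getText())));
recognizer.startContinuousRecognitionAsync().get();

// ON EACH packet (status 0 or 1): feed the decoded bytes straight to the SDK
// instead of concatenating them into a channel attribute.
pushStream.write(recByte);

// ON the last packet (status 2): close the stream so recognition can drain,
// then stop the recognizer.
pushStream.close();
recognizer.stopContinuousRecognitionAsync().get();

With this approach the server never buffers the whole clip: the SDK recognizes while the client is still sending, which is the "recognized at the same time" behavior asked for above.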