java arrays audio javasound fast-forward

Fast-forward implementation for a real-time audio byte array in Java

I am capturing and playing audio with the Java Sound API (TargetDataLine and SourceDataLine). Suppose that, in a conference environment, one participant's audio queue grows beyond the jitter size (because of processing or network delay), and I want to fast-forward the audio bytes I hold for that participant so the queue becomes shorter than the jitter size again.

How can I fast forward the audio byte array of that participant?

I can't do it during playback, because the player thread normally just dequeues one frame from every participant's queue and mixes them for playing. Is the only option to dequeue more than one frame from that participant and mix(?) them into a fast-forwarded frame before mixing it with the single frame dequeued from each of the other participants? Thanks in advance for any help or advice.

Solution

I found a fantastic Git repo (the sonic library, mainly meant for audio players) that does exactly what I wanted, with plenty of controls. It accepts a whole .wav file or just chunks of audio byte arrays, and after processing the audio plays back sped up, among other effects. For real-time processing I call it on every chunk of the audio byte array.

I also found another approach: an algorithm that detects whether an audio chunk (byte array) contains voice. Depending on its result, I can simply skip playing the non-voice packets, which gives around a 1.5x speedup with less processing.

public class DTHVAD {
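    // NOTE: all state is static, so one DTHVAD tracks a single audio stream.
    public static final int INITIAL_EMIN = 100;
    public static final double INITIAL_DELTAJ = 1.0001;
    private static boolean isFirstFrame;
    private static double Emax; // highest frame energy seen so far
    private static double Emin; // lowest frame energy seen so far, slowly raised by DeltaJ
    private static int inactiveFrameCounter; // currently unused
    private static double Lamda; // weighting factor used to place the threshold between Emin and Emax
    private static double DeltaJ; // growth factor applied to Emin after every frame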

static {
    initDTH();
}

private static void initDTH() {
    Emax = 0;
    Emin = 0;
    isFirstFrame = true;
    Lamda = 0.950; // range is 0.950---0.999
    DeltaJ = 1.0001;
}

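public static boolean isAllSilence(short[] samples, int length) {
    // Check the buffer in 80-sample windows; any voiced window means "not silent".
    for (int l = 0; l < length; l += 80) {
        // isSilence uses its third argument as an exclusive end index;
        // clamp it so the last window never reads past 'length'.
        if (!isSilence(samples, l, Math.min(l + 80, length))) {
            return false;
        }
    }
    return true;
}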

// Note: the third argument is used as an exclusive end index, not a sample count.
public static boolean isSilence(short[] samples, int offset, int end) {

    boolean isSilenceR = false;
    long energy = energyRMSE(samples, offset, end);

    if (isFirstFrame) {
        Emax = energy;
        Emin = INITIAL_EMIN;
        isFirstFrame = false;

    }

    if (energy > Emax) {
        Emax = energy;
    }

    if (energy < Emin) {

        if ((int) energy == 0) {
            Emin = INITIAL_EMIN;

        } else {
            Emin = energy;

        }
        DeltaJ = INITIAL_DELTAJ; // Resetting DeltaJ with initial value

    } else {
        DeltaJ = DeltaJ * 1.0001;
    }

    long threshold = (long) ((1 - Lamda) * Emax + Lamda * Emin);
    // Update the weighting factor for the next frame; skip it while Emax is 0
    // to avoid dividing by zero.
    if (Emax > 0) {
        Lamda = (Emax - Emin) / Emax;
    }

    // Energy above the threshold marks voice; anything else is marked as noise.
    isSilenceR = energy <= threshold;

    Emin = Emin * DeltaJ;

    return isSilenceR;
}

// Root-mean-square energy of samples[offset, end); 'end' is an exclusive end index.
private static long energyRMSE(short[] samples, int offset, int end) {
    int n = end - offset; // number of samples in the window
    if (n <= 0) {
        return 0;
    }
    double sumSquares = 0;
    for (int i = offset; i < end; i++) {
        sumSquares += (double) samples[i] * samples[i];
    }
    // Divide by the window size, not by the end index, before taking the root.
    return (long) Math.sqrt(sumSquares / n);
}

}
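
The idea of skipping non-voice packets can be tied back to the jitter limit from the question. Below is a minimal sketch, not the code I actually run: it assumes every queued frame already carries a silence flag that was computed once when the frame arrived. The names JitterTrimmer, Frame, dropSilentFrames and maxFrames are made up for illustration.

```java
import java.util.Deque;
import java.util.Iterator;

final class JitterTrimmer {

    /** Hypothetical frame type; 'silence' is set once, when the frame is enqueued. */
    static final class Frame {
        final short[] samples;
        final boolean silence;

        Frame(short[] samples, boolean silence) {
            this.samples = samples;
            this.silence = silence;
        }
    }

    /**
     * Removes silent frames, oldest first, until the queue holds at most
     * maxFrames frames or no silent frame is left.
     * Returns the number of frames dropped.
     */
    static int dropSilentFrames(Deque<Frame> queue, int maxFrames) {
        int dropped = 0;
        Iterator<Frame> it = queue.iterator();
        while (queue.size() > maxFrames && it.hasNext()) {
            if (it.next().silence) {
                it.remove(); // voiced frames are never removed
                dropped++;
            }
        }
        return dropped;
    }
}
```

The player thread could call JitterTrimmer.dropSilentFrames(queue, maxFrames) for a participant before dequeuing that participant's next frame. If the queue is shared between the network thread and the player thread, the call has to be synchronized with whatever lock protects the queue. Voiced frames stay in order, so speech is not cut; only the pauses get shorter.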

To use DTHVAD, I convert my byte array to a short array and detect whether it is voice or non-voice with:

frame.silence = DTHVAD.isSilence(encodeShortBuffer, 0, shortLen);
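
The byte-to-short conversion itself is not shown above. One possible sketch follows, assuming the line delivers 16-bit signed little-endian PCM (check AudioFormat.isBigEndian() on your format and switch the byte order if it returns true). PcmBytes and toShorts are names I made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmBytes {

    /**
     * Converts the first byteLen bytes of 16-bit little-endian PCM into samples.
     * A trailing odd byte, if any, is ignored.
     */
    static short[] toShorts(byte[] pcm, int byteLen) {
        short[] samples = new short[byteLen / 2];
        ByteBuffer.wrap(pcm, 0, samples.length * 2)
                  .order(ByteOrder.LITTLE_ENDIAN) // assumption: little-endian input
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

With that helper, encodeShortBuffer could be produced by PcmBytes.toShorts(bytes, bytesRead) and shortLen would be encodeShortBuffer.length.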