Tags: java, arrays, audio, javasound, fast-forward

Fast-forward implementation for a real-time audio byte array in Java


I am managing audio capture and playback using the Java Sound API (TargetDataLine and SourceDataLine). Suppose that, in a conference environment, one participant's audio queue grows larger than the jitter size (because of processing or network delay), and I want to fast-forward that participant's buffered audio bytes to bring the queue back below the jitter size.

How can I fast forward the audio byte array of that participant?

I can't do it during playback, because normally the player thread just dequeues one frame from every participant's queue and mixes them for playing. Is the only way to dequeue more than one frame from that participant and mix(?) them for fast-forwarding, before mixing the result with the single dequeued frame from each of the other participants? Thanks in advance for any help or advice.


Solution

  • I found a fantastic Git repo (the sonic library, mainly intended for audio players) that does exactly what I wanted and offers a lot of control. It accepts a whole .wav file or chunks of audio byte arrays, and after processing it gives sped-up playback, among other things. For real-time processing, I called it on every chunk of the audio byte array.

    I also found another approach: an algorithm that detects whether an audio chunk (byte array) contains voice. Depending on its result, I can simply skip playing the non-voice packets, which gives around a 1.5x speedup with less processing.

    public class DTHVAD {
    public static final int INITIAL_EMIN = 100;
    public static final double INITIAL_DELTAJ = 1.0001;
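    // All state is static, so one detector is shared by every caller (not thread-safe).
    private static boolean isFirstFrame;
    private static double Emax;   // highest frame energy seen so far
    private static double Emin;   // adaptive estimate of the lowest (background) energy
    private static int inactiveFrameCounter; // not used in this snippet
    private static double Lamda;  // weight between Emin and Emax in the threshold
    private static double DeltaJ; // growth factor applied to Emin after each frame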
    
    static {
        initDTH();
    }
    
    private static void initDTH() {
        Emax = 0;
        Emin = 0;
        isFirstFrame = true;
        Lamda = 0.950; // range is 0.950---0.999
        DeltaJ = INITIAL_DELTAJ;
    }
    
    /**
     * Returns true only if every 80-sample sub-frame is classified as silence.
     * Each call updates the shared adaptive state (Emin, Emax, Lamda, DeltaJ).
     */
    public static boolean isAllSilence(short[] samples, int length) {
        for (int l = 0; l < length; l += 80) {
            // Pass a sample count, not an end index; the last sub-frame may be shorter.
            if (!isSilence(samples, l, Math.min(80, length - l))) {
                return false;
            }
        }
        return true;
    }
    
    /**
     * Classifies samples[offset .. offset + length - 1] as silence (true) or voice (false)
     * using a dynamic threshold between the tracked minimum and maximum energy.
     */
    public static boolean isSilence(short[] samples, int offset, int length) {
        long energy = energyRMSE(samples, offset, length);
    
        if (isFirstFrame) {
            Emax = energy;
            Emin = INITIAL_EMIN;
            isFirstFrame = false;
        }
    
        if (energy > Emax) {
            Emax = energy;
        }
    
        if (energy < Emin) {
            // A zero-energy frame would pin Emin at 0, so fall back to the initial floor.
            Emin = (energy == 0) ? INITIAL_EMIN : energy;
            DeltaJ = INITIAL_DELTAJ; // reset the growth factor on a new minimum
        } else {
            DeltaJ = DeltaJ * INITIAL_DELTAJ;
        }
    
        long threshold = (long) ((1 - Lamda) * Emax + Lamda * Emin);
        // Guard against division by zero while only zero-energy frames have been seen.
        if (Emax > 0) {
            Lamda = (Emax - Emin) / Emax;
        }
    
        boolean silence = energy <= threshold; // energy above the threshold marks voice
    
        Emin = Emin * DeltaJ; // let the minimum drift upward slowly
        return silence;
    }
    
    /** Root-mean-square energy of samples[offset .. offset + length - 1]. */
    private static long energyRMSE(short[] samples, int offset, int length) {
        if (length <= 0) {
            return 0;
        }
        double sumOfSquares = 0;
        // Iterate over `length` samples starting at `offset`, not up to index `length`.
        for (int i = offset; i < offset + length; i++) {
            sumOfSquares += (double) samples[i] * samples[i];
        }
        return (long) Math.sqrt(sumOfSquares / length);
    }
    

    }
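    The detector works on 16-bit samples, while the Java Sound lines deliver raw bytes. Below is a minimal sketch of that conversion, assuming 16-bit signed PCM in little-endian order (an AudioFormat with bigEndian = false). The class name PcmConverter and the method toShorts are hypothetical helpers, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmConverter {
    /**
     * Converts 16-bit signed PCM bytes to samples.
     * Assumption: little-endian byte order; switch to ByteOrder.BIG_ENDIAN
     * if the AudioFormat was created with bigEndian = true.
     */
    static short[] toShorts(byte[] pcm, int byteLen) {
        short[] samples = new short[byteLen / 2]; // a trailing odd byte is ignored
        ByteBuffer.wrap(pcm, 0, byteLen)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

    The resulting array and its length can then be passed to the detector in place of encodeShortBuffer and shortLen.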

    To use the detector, I convert my byte array to a short array and check whether the frame is voice or non-voice with:

    frame.silence = DTHVAD.isSilence(encodeShortBuffer, 0, shortLen);
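    Once each frame carries a silence flag, the player thread can drop silent frames whenever a participant's queue is deeper than the jitter size. The following is only a sketch of that idea: Frame, JitterQueue, jitterSize and pollForPlayback are hypothetical names, and the queue is assumed to be accessed from a single thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical frame holder; `silence` mirrors the flag set by the detector. */
final class Frame {
    final byte[] pcm;
    final boolean silence;

    Frame(byte[] pcm, boolean silence) {
        this.pcm = pcm;
        this.silence = silence;
    }
}

/** Hypothetical per-participant queue that catches up by skipping silence. */
final class JitterQueue {
    private final Deque<Frame> frames = new ArrayDeque<>();
    private final int jitterSize; // assumed target depth, measured in frames

    JitterQueue(int jitterSize) {
        this.jitterSize = jitterSize;
    }

    void offer(Frame f) {
        frames.addLast(f);
    }

    int size() {
        return frames.size();
    }

    /**
     * Returns the next frame to mix, or null if the queue is empty.
     * While the queue is deeper than jitterSize, silent frames at the head
     * are dropped so the participant catches up without losing speech.
     */
    Frame pollForPlayback() {
        while (frames.size() > jitterSize && frames.peekFirst().silence) {
            frames.pollFirst();
        }
        return frames.pollFirst();
    }
}
```

    The player thread would call pollForPlayback() once per participant per mixing cycle, exactly where it used to dequeue a single frame. Voice frames are never dropped, so a queue full of continuous speech will not shrink this way and would still need time-scale processing such as the sonic approach.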