Tags: java, audio, synchronization, audio-streaming, audio-recording

How to synchronize a TargetDataLine and SourceDataLine in Java (Synchronize audio recording and playback)


I am trying to create a Java application that plays an audio track, records the user's voice, and tells whether the user sings in tune and at the right time.

For the moment, I am focusing only on recording and playing audio (pitch recognition is out of scope).

For this purpose, I use TargetDataLine and SourceDataLine from the Java audio API. First I start the audio recording, then I launch the audio playback. Since I want to check that the user sings at the right time, I need to keep the recorded audio and the played audio synchronized.

For example, if the audio playback starts 1 second after the audio recording, I know that I can ignore the first second of data in the recording buffer.
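To make that offset concrete: with the 44.1 kHz, 16-bit, mono recording format used in the code below, a time offset maps to a fixed byte offset in the recording buffer. A minimal sketch (the class and method names are hypothetical, not part of the code below):

```java
public class RecordOffset {

    // Convert a time offset into a byte offset in the recorded data.
    // frameSize = channels * (sampleSizeInBits / 8); 16-bit mono -> 2 bytes/frame.
    public static long bytesToSkip(double offsetSeconds, float sampleRate, int frameSize) {
        long frames = Math.round(offsetSeconds * sampleRate);
        return frames * frameSize;
    }

    public static void main(String[] args) {
        // A 1-second playback delay at 44100 Hz, 16-bit mono:
        System.out.println(bytesToSkip(1.0, 44100f, 2)); // prints 88200
    }
}
```

The whole difficulty, of course, is knowing the offset precisely in the first place.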

I use the following code for my tests (the code is far from perfect, but it's just for testing purposes).

import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;

class AudioSynchro {

private TargetDataLine targetDataLine;
private SourceDataLine sourceDataLine;
private AudioInputStream ais;
private AudioFormat recordAudioFormat;
private AudioFormat playAudioFormat;

public AudioSynchro(String sourceFile) throws IOException, UnsupportedAudioFileException {
    ais = AudioSystem.getAudioInputStream(new File(sourceFile));

    recordAudioFormat = new AudioFormat(44100f, 16, 1, true, false);
    playAudioFormat = ais.getFormat();
}

//Enumerate the mixers
public void enumerate() {
    try {
        Mixer.Info[] mixerInfo = AudioSystem.getMixerInfo();
        System.out.println("Available mixers:");
        for (Mixer.Info info : mixerInfo) {
            System.out.println(info.getName());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

//Init datalines
public void initDataLines() throws LineUnavailableException {
    Mixer.Info[] mixerInfo = AudioSystem.getMixerInfo();

    DataLine.Info targetDataLineInfo = new DataLine.Info(TargetDataLine.class, recordAudioFormat);

    Mixer targetMixer = AudioSystem.getMixer(mixerInfo[5]);

    targetDataLine = (TargetDataLine)targetMixer.getLine(targetDataLineInfo);

    DataLine.Info sourceDataLineInfo = new DataLine.Info(SourceDataLine.class, playAudioFormat);

    Mixer sourceMixer = AudioSystem.getMixer(mixerInfo[3]);

    sourceDataLine = (SourceDataLine)sourceMixer.getLine(sourceDataLineInfo);
}

public void startRecord() throws LineUnavailableException {
    AudioInputStream stream = new AudioInputStream(targetDataLine);

    targetDataLine.open(recordAudioFormat);

    byte currentByteBuffer[] = new byte[512];

    Runnable readAudioStream = new Runnable() {
        @Override
        public void run() {
            int count = 0;
            try {
                targetDataLine.start();
                while ((count = stream.read(currentByteBuffer)) != -1) {
                    //Do something
                }
            }
            catch(Exception e) {
                e.printStackTrace();
            }
        }
    };
    Thread thread = new Thread(readAudioStream);
    thread.start();
}

public void startPlay() throws LineUnavailableException {
    sourceDataLine.open(playAudioFormat);
    sourceDataLine.start();

    Runnable playAudio = new Runnable() {
        @Override
        public void run() {
            try {
                byte[] abData = new byte[8192];
                int nBytesRead;
                while ((nBytesRead = ais.read(abData, 0, abData.length)) != -1) {
                    sourceDataLine.write(abData, 0, nBytesRead);
                }

                sourceDataLine.drain();
                sourceDataLine.close();
            }
            catch(Exception e) {
                e.printStackTrace();
            }
        }
    };
    Thread thread = new Thread(playAudio);
    thread.start();
}

public void printStats() {
    Runnable stats = new Runnable() {

        @Override
        public void run() {
            while(true) {
                long targetDataLinePosition = targetDataLine.getMicrosecondPosition();
                long sourceDataLinePosition = sourceDataLine.getMicrosecondPosition();
                long delay = targetDataLinePosition - sourceDataLinePosition;
                System.out.println(targetDataLinePosition+"\t"+sourceDataLinePosition+"\t"+delay);

                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    };

    Thread thread = new Thread(stats);
    thread.start();
}

public static void main(String[] args) {
    try {
        AudioSynchro audio = new AudioSynchro("C:\\dev\\intellij-ws\\guitar-challenge\\src\\main\\resources\\com\\ouestdev\\guitarchallenge\\al_adagi.mid");
        audio.enumerate();
        audio.initDataLines();
        audio.startRecord();
        audio.startPlay();
        audio.printStats();
    } catch (IOException | LineUnavailableException | UnsupportedAudioFileException e) {
        e.printStackTrace();
    }
}

}

The code initializes the two data lines, starts the audio recording, starts the audio playback, and displays statistics. The enumerate() method displays the mixers available on the system; you have to change the mixers used in the initDataLines() method to match your own system before running your own tests. The printStats() method starts a thread that polls the microsecond position of both data lines. This is the data I try to use to keep track of the synchronization. What I observe is that the two data lines do not stay synchronized over time. Here is a short extract of my console output:

130000  0       130000
150000  748     149252
170000  20748   149252
190000  40748   149252
210000  60748   149252
230000  80748   149252
250000  100748  149252
270000  120748  149252
290000  140748  149252
310000  160748  149252
330000  180748  149252
350000  190748  159252
370000  210748  159252
390000  240748  149252
410000  260748  149252
430000  280748  149252
450000  300748  149252
470000  310748  159252
490000  340748  149252
510000  350748  159252
530000  370748  159252

As you can see, the delay regularly varies by 10 milliseconds, so I can't tell precisely which position in the recording buffer matches the beginning of the playback buffer. In particular, in the example above, I don't know whether I should start at position 149252 or 159252. In audio processing, 10 milliseconds matters, and I would like something more accurate (1 or 2 milliseconds would be acceptable). Moreover, it seems odd that whenever two measurements differ, the gap is always exactly 10 milliseconds.

I then pushed my tests further, but I didn't get better results:

- Tried with bigger and smaller buffers.
- Tried a playback buffer twice as big: since the audio file is in stereo, more bytes are consumed (2 bytes/frame for recording and 4 bytes/frame for playing).
- Tried to record and play on the same audio device.

In my opinion, there are two strategies to synchronize the two buffers:

- What I am trying to do: determine precisely the position in the recording buffer where the playback starts.
- Synchronize the start of the recording and the playback.

In both of these strategies, I need to guarantee that the synchronization is maintained.

Have any of you ever experienced this type of problem?

At the moment, I use Java 12 and JavaFX for my application, but I am ready to use another framework. I have not tried them, but it may be possible to get better results and more control with LWJGL (https://www.lwjgl.org/, based on OpenAL) or Beads (http://www.beadsproject.net/). If anyone knows these frameworks and can give me feedback, I'm interested.

Finally, as a last resort, I would accept changing the programming language.


Solution

  • I've not done much with TargetDataLines yet, but I think I can offer a useful observation and suggestion.

    First, the test you have written is probably measuring variance in the thread scheduling, not slippage in the timing of the files. The way the JVM bounces back and forth between processing threads can be quite unpredictable. There is a good article on real-time, low-latency coding in Java that you might read for background information.

    Secondly, the way that Java uses blocking queues for audio I/O provides a lot of stability. If it didn't, we'd hear all sorts of audio artifacts during playback or on our recordings.

    Here is an idea to try: create a single runnable that has a while loop that processes an identical number of frames from both the TargetDataLine and the SourceDataLine in the same iteration. This runnable can be loosely coupled (use booleans to turn on/off the lines).

    The main benefit is that you know that every loop iteration is producing coordinated data.
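A rough sketch of that single-loop idea, assuming both lines are already open in their respective formats (the class, method names, and chunk size here are mine, not the answerer's code):

```java
import javax.sound.sampled.*;

public class LockstepLoop {

    // Bytes needed to carry a given number of frames in a given format.
    // The record format in the question is mono (2 bytes/frame) and the
    // playback format stereo (4 bytes/frame), so the byte counts differ
    // even when the frame counts match.
    public static int bytesForFrames(int frames, AudioFormat fmt) {
        return frames * fmt.getFrameSize();
    }

    // One loop, one thread: each iteration plays and records the same
    // number of frames, so the iteration count itself is the shared clock.
    public static void runLockstep(TargetDataLine in, SourceDataLine out,
                                   AudioInputStream playback, int framesPerChunk)
            throws Exception {
        byte[] recBuf  = new byte[bytesForFrames(framesPerChunk, in.getFormat())];
        byte[] playBuf = new byte[bytesForFrames(framesPerChunk, out.getFormat())];
        in.start();
        out.start();
        long framesProcessed = 0;
        int n;
        while ((n = playback.read(playBuf, 0, playBuf.length)) != -1) {
            out.write(playBuf, 0, n);          // send one chunk to the speakers
            in.read(recBuf, 0, recBuf.length); // pull the matching chunk from the mic
            framesProcessed += framesPerChunk; // elapsed time = framesProcessed / sampleRate
        }
        out.drain();
    }

    public static void main(String[] args) {
        AudioFormat mono   = new AudioFormat(44100f, 16, 1, true, false);
        AudioFormat stereo = new AudioFormat(44100f, 16, 2, true, false);
        // 512 frames: 1024 bytes in mono, 2048 bytes in stereo.
        System.out.println(bytesForFrames(512, mono) + " / " + bytesForFrames(512, stereo));
    }
}
```

Because both lines consume the same frame count per iteration, the recording position corresponding to any playback frame is known exactly: it is the same frame index, converted through each format's frame size.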


    EDIT: Here are a couple of examples of what I've done with frame counting: (1) I have an audio loop that counts frames as it processes. All timings are determined strictly by the number of frames processed; I never bother taking position readings from the SDL. I've written a metronome that initiates a synthesized click every N frames (where N is based on the tempo). At the Nth frame, the data for the synthesized click is mixed into the audio data being sent out of the SDL. The timing accuracy I have obtained with this method is outstanding.
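As a hypothetical illustration of that frame-counting approach (this is my own sketch, not the answerer's actual metronome code): the click interval is derived from the tempo, and the click samples are mixed in purely by frame index:

```java
public class Metronome {

    // Frames between two clicks for a given tempo at a given sample rate.
    public static long framesPerClick(double bpm, float sampleRate) {
        return Math.round(sampleRate * 60.0 / bpm);
    }

    // Mix a short click into outgoing 16-bit PCM samples, timed purely by
    // frame index: whenever (frame % framesPerClick) falls inside the click,
    // the corresponding click sample is added (with clipping).
    public static void mixClicks(short[] samples, long startFrame,
                                 long framesPerClick, short[] click) {
        for (int i = 0; i < samples.length; i++) {
            long offset = (startFrame + i) % framesPerClick;
            if (offset < click.length) {
                int mixed = samples[i] + click[(int) offset];
                samples[i] = (short) Math.max(Short.MIN_VALUE,
                                              Math.min(Short.MAX_VALUE, mixed));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(framesPerClick(120.0, 44100f)); // prints 22050
    }
}
```

No wall-clock time is involved anywhere: at 44100 Hz, frame 22050 *is* the half-second mark, by definition.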

    (2) In another application, on the Nth frame, I initiate a visual/graphical event. The graphics loop is usually set to 60 fps and the audio to 44100 fps. The initiation is handled via loose coupling: a boolean for the event is flipped by the audio thread (nothing more than that: cluttering the audio thread with extraneous activity is hazardous and can lead to stuttering and dropouts). The graphics processing loop (aka the "game loop") picks up the boolean change and handles it in its own time (60 fps). I've achieved some nice visual and aural synchronization this way, including objects whose brightness tracks the volume of the sound being played. This is similar to the digital VU meters that many have written in Java.
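The loose coupling described above can be as small as a single atomic flag; a minimal sketch (all names are mine):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FrameEventFlag {

    // Set by the audio thread, cleared by the graphics loop.
    public static final AtomicBoolean clickPending = new AtomicBoolean(false);

    // Called once per frame from the audio loop; flipping the flag is the
    // only extra work the audio thread does.
    public static void onFrame(long frameCount, long framesPerClick) {
        if (frameCount % framesPerClick == 0) {
            clickPending.set(true);
        }
    }

    public static void main(String[] args) {
        // Simulate one second of audio (44100 frames) with an event every
        // 22050 frames; the "graphics" side consumes the flag at its own pace.
        for (long f = 1; f <= 44100; f++) {
            onFrame(f, 22050);
            if (clickPending.getAndSet(false)) {
                System.out.println("event at frame " + f); // frames 22050 and 44100
            }
        }
    }
}
```

getAndSet(false) makes consuming the event atomic, so a click can never be handled twice even if the two loops run at very different rates.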

    Depending on the level of accuracy you are hoping for, I think frame counting can be sufficient. I don't know of any other way, with Java, that provides as much accuracy.