
Android Kotlin: Saving RecognizerIntent user's audio to app's cache folder


After reading dozens of posts on this topic, I've concluded that my goal is very difficult, though it shouldn't be impossible.

My app helps users learn and improve their English, and I'm trying to implement some "Listen and Repeat" audio lessons. The user listens to a paragraph spoken by Android's text-to-speech (I can already save the text-to-speech audio to the cache folder as a wav file) and then repeats the text. Once the recording is done, I'll use the musicg library to compare both audio tracks for similarity and give the user a score.

I tried this approach with no success. I managed to save a pcm audio file, but no player recognises it as audio, and I can't even play it with Android's AudioTrack class.

I also tried converting the pcm audio to wav with this link, to no avail. The generated .wav file plays neither on my Mac nor on my phone.
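For reference, a raw pcm file becomes a wav file once a 44-byte RIFF/WAVE header is written in front of the samples. A converted file that won't play may have header values (sample rate, channel count, bit depth) that don't match how the pcm was actually recorded, or a wrong byte order. Below is a minimal sketch of that conversion in plain Kotlin; the function names `wavHeader` and `pcmToWav` are hypothetical, and the defaults (16 kHz, mono, 16-bit) are assumptions that must be replaced with the real recording parameters.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM data.
// All multi-byte numeric fields are little-endian.
fun wavHeader(pcmBytes: Int, sampleRate: Int, channels: Int, bitsPerSample: Int): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    return ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
        put("RIFF".toByteArray(Charsets.US_ASCII))
        putInt(36 + pcmBytes)                       // size of everything after this field
        put("WAVE".toByteArray(Charsets.US_ASCII))
        put("fmt ".toByteArray(Charsets.US_ASCII))
        putInt(16)                                  // fmt chunk size for PCM
        putShort(1.toShort())                       // audio format 1 = PCM
        putShort(channels.toShort())
        putInt(sampleRate)
        putInt(byteRate)
        putShort(blockAlign.toShort())
        putShort(bitsPerSample.toShort())
        put("data".toByteArray(Charsets.US_ASCII))
        putInt(pcmBytes)                            // size of the sample data
    }.array()
}

// Hypothetical helper: copies a raw pcm file into a wav file with a matching header.
// The default parameters are assumptions, not values taken from the question.
fun pcmToWav(
    pcm: File,
    wav: File,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bitsPerSample: Int = 16
) {
    val data = pcm.readBytes()
    wav.outputStream().use { out ->
        out.write(wavHeader(data.size, sampleRate, channels, bitsPerSample))
        out.write(data)
    }
}
```

In an Activity this could be called as `pcmToWav(File(cacheDir, "voice.pcm"), File(cacheDir, "voice.wav"))`, where both file names are placeholders.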

I also tried the SpeechRecognizer class, but I couldn't save the recognised audio with it: onBufferReceived is never called.

My "last chance" is to use intent.getData() on the RecognizerIntent result to extract the audio uri and then use contentResolver to generate the audio file, but intent.getData() always returns null.

This is my Kotlin code:

class AudioLessonsActivity : AppCompatActivity(), RecognitionListener, TextToSpeech.OnInitListener {
    private val permission = 100
    private lateinit var returnedText: TextView
    private lateinit var toggleButton: ToggleButton
    private lateinit var progressBar: ProgressBar
    private lateinit var speech: SpeechRecognizer
    private lateinit var recognizerIntent: Intent
    private var logTag = "VoiceRecognitionActivity"

    private lateinit var ttobj: TextToSpeech
    private val mUtteranceID = "totts"

    lateinit var recordAudioResultLauncher: ActivityResultLauncher<Intent>

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            val result = ttobj.setLanguage(Locale.US)

            val toSpeak = "Hello my friend"
            ttobj.speak(toSpeak, TextToSpeech.QUEUE_FLUSH, null,"")
            saveToAudioFile(toSpeak)

            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS","The Language not supported!")
            }
        }
    }

    private fun setRecordAudio() {
        val recordAudioIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
        recordAudioIntent.putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
        recordAudioIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.US.toString())
        recordAudioIntent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say anything, please")
        recordAudioResultLauncher.launch(recordAudioIntent)
    }

    private fun promptSpeechInput() {
        recordAudioResultLauncher = registerForActivityResult(
            ActivityResultContracts.StartActivityForResult()
        ) { result ->
            if (result.resultCode == Activity.RESULT_OK) {
                val recordData = result.data?.data //=> always null

                //val data = result.data?.extras!![Intent.ACTION_REC] as Uri?
                //val bundle: Bundle? = recordData?.extras
                //ArrayList<String> matches = bundle.getStringArrayList(RecognizerIntent.EXTRA_RESULTS)

                val audioUri: Uri? = recordData
                val filestream: InputStream? =
                    audioUri?.let { contentResolver.openInputStream(it) }
            }
        }
    }

    private fun saveToAudioFile(text: String) {
        val mAudioFilename = this.cacheDir.toString() + "/file1.wav"
        ttobj.synthesizeToFile(text, null, File(mAudioFilename), mUtteranceID)
    }

    private fun doSpeech(){
        ttobj = TextToSpeech(this, this)
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_audiolessons)

        promptSpeechInput()
        doSpeech()

        title = "KotlinApp"
        returnedText = findViewById(R.id.textView)
        progressBar = findViewById(R.id.progressBar)
        toggleButton = findViewById(R.id.toggleButton)
        progressBar.visibility = View.VISIBLE
        speech = SpeechRecognizer.createSpeechRecognizer(this)
        Log.i(logTag, "isRecognitionAvailable: " + SpeechRecognizer.isRecognitionAvailable(this))
        speech.setRecognitionListener(this)
        recognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)

        recognizerIntent.putExtra("android.speech.extra.GET_AUDIO_FORMAT", "audio/AMR")

        // Only the last EXTRA_LANGUAGE value takes effect, so a single call is enough.
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.US.toString())
        //intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, Locale.UK.toString())

        recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        recognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3)
        toggleButton.setOnCheckedChangeListener { _, isChecked ->
            if (isChecked) {
                progressBar.visibility = View.VISIBLE
                progressBar.isIndeterminate = true
                ActivityCompat.requestPermissions(this@AudioLessonsActivity,
                    arrayOf(Manifest.permission.RECORD_AUDIO),
                    permission)
            } else {
                progressBar.isIndeterminate = false
                progressBar.visibility = View.VISIBLE
                speech.stopListening()
            }
        }
    }

    override fun onRequestPermissionsResult(requestCode: Int, permissions: Array<String?>,
                                            grantResults: IntArray) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        when (requestCode) {
            permission -> if (grantResults.isNotEmpty() && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                //speech.startListening(recognizerIntent)
                setRecordAudio()
            } else {
                Toast.makeText(this@AudioLessonsActivity, "Permission Denied!", Toast.LENGTH_SHORT).show()
            }
        }
    }
}

I should add that the file must be in .wav format, as musicg only supports wav files.


Solution

  • OK, I'll answer my own question. I finally got it working thanks to the great OmRecorder library.

    I can now get a playable .wav audio file with the user's voice in my app's cache folder.