python, tensorflow, audio, speech-recognition

Why am I getting a UnicodeDecodeError?


I'm following this tutorial: https://www.tensorflow.org/tutorials/audio/simple_audio. I'm at this part:

commands = np.array(tf.io.gfile.listdir(str(data_dir)))
commands = commands[(commands != 'README.md') & (commands != '.DS_Store')]
print('Commands:', commands)

When I run this file:

import os
import pathlib

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf


from tensorflow.keras import layers
from tensorflow.keras import models
from IPython import display

seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)


DATASET_PATH = 'data/mini_speech_commands'

data_dir = pathlib.Path(DATASET_PATH)
if not data_dir.exists():
  tf.keras.utils.get_file(
      'mini_speech_commands.zip',
      origin="http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip",
      extract=True,
      cache_dir='.', cache_subdir='data')


commands = np.array(tf.io.gfile.listdir(str(data_dir)))
commands = commands[(commands != 'README.md') & (commands != '.DS_Store')]
print('Commands:', commands)

I get this error:

Traceback (most recent call last):
  File "E:\Project አዝናኝ\AI\Assistant\keyword.py", line 30, in <module>
    commands = np.array(tf.io.gfile.listdir(data_dir))
  File "E:\Project አዝናኝ\AI\Assistant\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 767, in list_directory_v2
    if not is_directory(path):
  File "E:\Project አዝናኝ\AI\Assistant\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 689, in is_directory
    return is_directory_v2(dirname)
  File "E:\Project አዝናኝ\AI\Assistant\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 703, in is_directory_v2
    return _pywrap_file_io.IsDirectory(compat.path_to_bytes(path))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte
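For context, the message means the UTF-8 decoder hit a byte that cannot begin a character. The plain-Python illustration below (no TensorFlow involved) shows the mechanism only, not TensorFlow's exact code path: the Amharic letter አ from the project folder name encodes to three bytes ending in 0xa0, and decoding bytes that start at that continuation byte raises the same kind of error.

```python
# 'አ' (U+12A0) is the first letter of the folder name shown in the traceback.
encoded = "አ".encode("utf-8")
print(encoded)  # b'\xe1\x8a\xa0'

# Start decoding in the middle of the character. 0xa0 has the bit pattern
# 10xxxxxx, which UTF-8 reserves for continuation bytes, so it cannot start one.
fragment = encoded[2:]
try:
    fragment.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)  # 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
```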

Solution

  • Try moving your project to a directory whose full path contains only ASCII (English) characters, or rename the folders so that it does. The traceback shows that your project lives in E:\Project አዝናኝ\AI\Assistant, so the full path contains non-ASCII (Amharic) characters. The problem is that _pywrap_file_io.IsDirectory, the native function TensorFlow calls to check the directory, works with the full path internally, and when that path contains characters like these it fails to decode it as UTF-8, which raises the UnicodeDecodeError.
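A minimal sketch of how you might detect the problem and work around it, assuming the script runs from the project folder with the same relative DATASET_PATH as in the question. path_is_ascii and list_commands are hypothetical helper names, and listing the folder with pathlib instead of tf.io.gfile.listdir is a swapped-in standard-library workaround, not part of the tutorial. Later TensorFlow calls that receive the same path may still fail, so moving the project is the more reliable fix.

```python
import pathlib

import numpy as np

DATASET_PATH = 'data/mini_speech_commands'


def path_is_ascii(path: pathlib.Path) -> bool:
    """Hypothetical helper: True if the absolute form of `path` is pure ASCII."""
    return str(path.resolve()).isascii()  # str.isascii needs Python 3.7+


def list_commands(data_dir: pathlib.Path) -> np.ndarray:
    """Hypothetical stand-in for tf.io.gfile.listdir plus the tutorial's filter.

    Uses only the standard library, so TensorFlow's native file-system
    binding is never called. Entries are sorted for a stable order.
    """
    names = sorted(entry.name for entry in data_dir.iterdir())
    commands = np.array(names, dtype=str)  # dtype=str keeps empty dirs comparable
    return commands[(commands != 'README.md') & (commands != '.DS_Store')]


if __name__ == '__main__':
    data_dir = pathlib.Path(DATASET_PATH)
    if not path_is_ascii(data_dir):
        print('Warning: path contains non-ASCII characters:', data_dir.resolve())
    if data_dir.exists():
        print('Commands:', list_commands(data_dir))
```

The ASCII check is deliberately strict: it flags any non-ASCII character, even ones that TensorFlow might handle, because the goal is only to warn before the native call fails.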