python deep-learning data-extraction os.path

How do I resolve IndexError: list index out of range?


I am trying to replicate this repository: https://github.com/sujiongming/UCF-101_video_classification. I get the following error when I run 2_extract_files.py:

Traceback (most recent call last):
  File "2_extract_files.py", line 99, in <module>
    main()
  File "2_extract_files.py", line 96, in main
    extract_files()
  File "2_extract_files.py", line 38, in extract_files
    video_parts = get_video_parts(video_path)
  File "2_extract_files.py", line 76, in get_video_parts
    filename = parts[3]
IndexError: list index out of range

The code is as follows:

def extract_files():
    data_file = []
    folders = ['./train/', './test/']

    for folder in folders:
        class_folders = glob.glob(folder + '*')

        for vid_class in class_folders:
            class_files = glob.glob(vid_class + '/*.avi')

            for video_path in class_files:
                video_parts = get_video_parts(video_path)

                train_or_test, classname, filename_no_ext, filename = video_parts
                if not check_already_extracted(video_parts):

                    src = train_or_test + '/' + classname + '/' + \
                        filename
                    dest = train_or_test + '/' + classname + '/' + \
                        filename_no_ext + '-%04d.jpg'
                    call(["ffmpeg", "-i", src, dest])

                nb_frames = get_nb_frames_for_video(video_parts)

                data_file.append([train_or_test, classname, filename_no_ext, nb_frames])

                print("Generated %d frames for %s" % (nb_frames, filename_no_ext))

    with open('data_file.csv', 'w') as fout:
        writer = csv.writer(fout)
        writer.writerows(data_file)

    print("Extracted and wrote %d video files." % (len(data_file)))

def get_nb_frames_for_video(video_parts):
    train_or_test, classname, filename_no_ext, _ = video_parts
    generated_files = glob.glob(train_or_test + '/' + classname + '/' +
                                filename_no_ext + '*.jpg')
    return len(generated_files)

def get_video_parts(video_path):
    parts = video_path.split('/')
    filename = parts[3]
    filename_no_ext = filename.split('.')[0]
    classname = parts[2]
    train_or_test = parts[1]

    return train_or_test, classname, filename_no_ext, filename

Can anyone tell me what I'm doing wrong and how to get the list index right? Thanks in advance.

Windows 10
Python 3.7.6

Solution

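  • It is safer and more portable to use os.path.split(video_path) and os.path.splitext() than to split on '/' by hand. On Windows, glob returns paths that contain backslashes, so video_path.split('/') most likely yields fewer than four parts and parts[3] raises the IndexError. Work your way through the path from the right instead: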

    import os

    def get_video_parts(video_path):
        # Peel one component at a time off the right-hand end of the path.
        head, filename = os.path.split(video_path)
        filename_no_ext, ext = os.path.splitext(filename)
        head, classname = os.path.split(head)
        head, train_or_test = os.path.split(head)

        return train_or_test, classname, filename_no_ext, filename
    

    https://docs.python.org/3/library/os.path.html#os.path.split

    My knowledge may be a bit dated, so you might prefer pathlib for higher-level operations on path objects. In this case it would probably be a combination of path.stem, to get the last component without its extension, and path.parent, to go up one level. Both are properties, not methods, so they take no parentheses.

    https://docs.python.org/3/library/pathlib.html#module-pathlib
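
The pathlib variant mentioned above could look roughly like the sketch below. It is an illustration, not the repository's code. The `path_cls` parameter is a hypothetical addition so that Windows-style paths can be tried on any operating system, and the example file name only stands in for whatever glob returns on your machine.

```python
from pathlib import Path, PureWindowsPath


def get_video_parts(video_path, path_cls=Path):
    """Return (train_or_test, classname, filename_no_ext, filename).

    path_cls is a hypothetical parameter added for this sketch; it defaults
    to Path, which follows the conventions of the current operating system.
    """
    path = path_cls(video_path)
    filename = path.name                      # last component, with extension
    filename_no_ext = path.stem               # last component, extension removed
    classname = path.parent.name              # folder containing the video
    train_or_test = path.parent.parent.name   # one level further up
    return train_or_test, classname, filename_no_ext, filename


# A path shaped like the ones glob tends to return on Windows:
# forward slash from the pattern, backslashes from the directory listing.
windows_style = r'./train\ApplyEyeMakeup\v_ApplyEyeMakeup_g01_c01.avi'

# Splitting on '/' finds only one separator, so there is no parts[3].
print(len(windows_style.split('/')))

# PureWindowsPath treats both '/' and '\' as separators.
print(get_video_parts(windows_style, path_cls=PureWindowsPath))
```

On Windows the default `path_cls=Path` already behaves like the `PureWindowsPath` call, so the existing call site `get_video_parts(video_path)` would not need to change.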