python deep-learning data-extraction os.path

How do I resolve IndexError: list index out of range?

I am trying to replicate this repository: https://github.com/sujiongming/UCF-101_video_classification. I get the following error when I run the 2_extract_files.py file.

Traceback (most recent call last):
  File "2_extract_files.py", line 99, in <module>
    main()
  File "2_extract_files.py", line 96, in main
    extract_files()
  File "2_extract_files.py", line 38, in extract_files
    video_parts = get_video_parts(video_path)
  File "2_extract_files.py", line 76, in get_video_parts
    filename = parts[3]
IndexError: list index out of range

The code is as follows:

def extract_files():
    data_file = []
    folders = ['./train/', './test/']

    for folder in folders:
        class_folders = glob.glob(folder + '*')

        for vid_class in class_folders:
            class_files = glob.glob(vid_class + '/*.avi')

            for video_path in class_files:
                video_parts = get_video_parts(video_path)

                train_or_test, classname, filename_no_ext, filename = video_parts
                if not check_already_extracted(video_parts):

                    src = train_or_test + '/' + classname + '/' + \
                        filename
                    dest = train_or_test + '/' + classname + '/' + \
                        filename_no_ext + '-%04d.jpg'
                    call(["ffmpeg", "-i", src, dest])

                nb_frames = get_nb_frames_for_video(video_parts)

                data_file.append([train_or_test, classname, filename_no_ext, nb_frames])

                print("Generated %d frames for %s" % (nb_frames, filename_no_ext))

    with open('data_file.csv', 'w') as fout:
        writer = csv.writer(fout)
        writer.writerows(data_file)

    print("Extracted and wrote %d video files." % (len(data_file)))

def get_nb_frames_for_video(video_parts):
    train_or_test, classname, filename_no_ext, _ = video_parts
    generated_files = glob.glob(train_or_test + '/' + classname + '/' +
                                filename_no_ext + '*.jpg')
    return len(generated_files)

def get_video_parts(video_path):
    parts = video_path.split('/')
    filename = parts[3]
    filename_no_ext = filename.split('.')[0]
    classname = parts[2]
    train_or_test = parts[1]

    return train_or_test, classname, filename_no_ext, filename

Can anyone tell me what I'm doing wrong and how to get the list index right? Thanks in advance.

Windows 10
Python 3.7.6

Solution

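On Windows, glob returns paths that contain backslashes, so video_path.split('/') produces fewer than four parts and parts[3] raises the IndexError. Instead of splitting on a hard-coded separator, use os.path.split(video_path) and os.path.splitext() and work your way up the path. It is safer and also more portable: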

import os

def get_video_parts(video_path):
    # Peel components off the end of the path, one at a time.
    head, filename = os.path.split(video_path)
    filename_no_ext, ext = os.path.splitext(filename)
    head, classname = os.path.split(head)
    head, train_or_test = os.path.split(head)

    return train_or_test, classname, filename_no_ext, filename

https://docs.python.org/3/library/os.path.html#os.path.split
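To see the difference without a Windows machine, here is a small sketch. It assumes glob handed back a path shaped like the hypothetical one below (the file name is made up for illustration), and it uses ntpath, the Windows implementation of os.path, so it behaves the same on any OS:

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# Hypothetical path, shaped like what glob can return on Windows
video_path = './train\\ApplyEyeMakeup\\v_ApplyEyeMakeup_g01_c01.avi'

# Splitting on '/' finds only one separator, so parts[3] does not exist
parts = video_path.split('/')
print(len(parts), parts)

def get_video_parts(video_path):
    # Same logic as the os.path version, pinned to Windows rules via ntpath
    head, filename = ntpath.split(video_path)
    filename_no_ext, _ext = ntpath.splitext(filename)
    head, classname = ntpath.split(head)
    _head, train_or_test = ntpath.split(head)
    return train_or_test, classname, filename_no_ext, filename

print(get_video_parts(video_path))
```

In the real script you would keep plain os.path, which resolves to ntpath on Windows automatically.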

My knowledge here is a bit dated, so you may also want to try pathlib for higher-level operations on path objects. In this case it would probably be a combination of path.stem, to get the final component without its extension, and path.parent, to go up one level. Both are properties, so they are used without parentheses.

https://docs.python.org/3/library/pathlib.html#module-pathlib
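A minimal sketch of that pathlib variant might look like the following. It is an untested-against-the-repository assumption that the rest of the script only needs the same four strings back; the example path is hypothetical:

```python
from pathlib import PurePath

def get_video_parts(video_path):
    # PurePath picks the path rules of the current OS, so on Windows
    # both '/' and '\\' are treated as separators.
    path = PurePath(video_path)
    filename = path.name                      # e.g. 'clip.avi'
    filename_no_ext = path.stem               # name without the last suffix
    classname = path.parent.name              # folder holding the video
    train_or_test = path.parent.parent.name   # 'train' or 'test'
    return train_or_test, classname, filename_no_ext, filename

# Hypothetical example path
print(get_video_parts('./train/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi'))
```

One difference from the original code: filename.split('.')[0] cuts at the first dot, while stem (like os.path.splitext) cuts at the last one, which only matters if a file name contains extra dots.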