
Google Colab having problems with Drive folders containing lots of files


I have imported several folders from Drive onto Google Colab. The smaller folders work fine when listing directories, but when I try to list the directories in the larger folders, Colab gives me an error.

I am aware that there are other ways of listing directories, but this same issue is causing problems further down the line when I try to access the files for training.

I am using this to import the files:

from google.colab import drive
drive.mount('/content/drive')

And then defining the folder paths as follows:

TRAIN = '../content/drive/My Drive/train/'
TEST = '../content/drive/My Drive/test/'

When I try to do the following:

import os

print(os.listdir(TEST))
print(os.listdir(TRAIN))

TEST prints fine. It has circa 8000 files (all images).

TRAIN sometimes prints and sometimes doesn't! It has circa 32,000 files (also all images). When it fails, it raises:

OSError: [Errno 5] Input/output error: '../content/drive/My Drive/train/'

Does anyone know how to fix this in Google Colab?

I've found that if I wait a while after mounting and then run the prints, they succeed, which suggests that Colab takes a while to process the files from Drive even after the mounting cell has finished running.
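Based on that observation, one stopgap is to retry the listing until the Drive mount settles. This is just a sketch; the retry count and delay are guesses, not values from any documentation:

```python
import os
import time

def listdir_with_retry(path, retries=5, delay=30):
    """Retry os.listdir, since Drive FUSE reads can fail transiently
    with OSError (Errno 5) while the mount is still warming up."""
    for attempt in range(retries):
        try:
            return os.listdir(path)
        except OSError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# e.g. print(listdir_with_retry(TRAIN))
```

This only papers over the symptom, though; the accepted answer below addresses the underlying cause.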


Solution

  • Drive FUSE operations can time out when the number of files in a directory becomes large.

    I/O operations for Drive directories are proportional to the number of files in the directory. Since there's a fixed timeout in the FUSE client, when the number of files becomes large enough, operations in the directory will fail.

    A work-around is to organize your files into subdirectories so that the number of files or folders in a single directory doesn't become so large.
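A minimal sketch of that reorganization: move the files of a flat directory into numbered subdirectories. The shard size of 1000 is an assumed value, not a documented Drive limit; adjust it to your needs:

```python
import os
import shutil

def shard_directory(src, files_per_shard=1000):
    """Move files from a flat directory into numbered subdirectories
    so that no single directory holds enough entries to hit the
    Drive FUSE timeout. files_per_shard=1000 is an assumption."""
    files = sorted(f for f in os.listdir(src)
                   if os.path.isfile(os.path.join(src, f)))
    for i, name in enumerate(files):
        shard = os.path.join(src, f"shard_{i // files_per_shard:04d}")
        os.makedirs(shard, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(shard, name))

# e.g. shard_directory('/content/drive/My Drive/train/')
```

Run this once on the Drive folder (or locally before uploading), then point your training code at the shard subdirectories instead of the flat folder.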