Tags: python, google-colaboratory, txt

Read txt files in Google Colab using Google Drive


I would like to read txt files stored in Google Drive.

In my Google Drive I have a dataset in this folder: '/content/gdrive/My Drive/DATASETS/DOXES/*.txt'. The folder contents were shown in a screenshot (image not available).

1) For some reason the file names do not show the .txt extension. Is that a problem?

2) I would like to read these text files in Google Colab. I tried the following, but it's not working:

from google.colab import drive
drive.mount("/content/gdrive")

# Import dataset from google drive
dataset_filepaths = glob.glob('/content/gdrive/My Drive/DATASETS/DOXES/') 

print('dataset_filepaths:', len(dataset_filepaths))
> 1

Update: Thank you so much @Corralien for your help! I did the following to read the files:

for filepath in tqdm.tqdm(dataset_filepaths):
  # The with block closes each file after it has been read
  with open(filepath, "r") as f:
    print(f.read())

This works well. Is there any way to create a pandas DataFrame from all these txt files?


Solution

  • You have to use a wildcard so that glob lists the files inside the directory (and not the directory itself):

    import glob

    # You can also use *.txt instead of * to match only .txt files
    txt_files = glob.glob('/content/gdrive/My Drive/DATASETS/DOXES/*')  # <- HERE

    for filename in txt_files:
        ...  # do stuff here
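
To see why the original call returned a single path, here is a minimal, self-contained sketch. It uses a temporary directory as a hypothetical stand-in for the Drive folder, and the file names are made up for illustration. Without a wildcard, glob matches only the directory itself; with one, it matches the entries inside it.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for the Drive folder: a temporary directory with three files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.txt", "b.txt", "notes.md"):
        with open(os.path.join(folder, name), "w") as f:
            f.write("sample text\n")

    # No wildcard: the pattern matches only the directory itself.
    dir_only = glob.glob(folder + os.sep)
    # Wildcard: the pattern matches every (non-hidden) entry inside the directory.
    all_files = glob.glob(os.path.join(folder, "*"))
    # Extension filter: only the .txt files.
    txt_only = sorted(glob.glob(os.path.join(folder, "*.txt")))

    counts = (len(dir_only), len(all_files), len(txt_only))
    txt_names = [os.path.basename(p) for p in txt_only]

print(counts)
print(txt_names)
```

The first count corresponds to the `1` printed in the question: the single match is the folder path, not a file.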
    

    Update

    Is there any way to create a pandas DataFrame from all these txt files?

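    import pandas as pd
    import tqdm

    # dataset_filepaths: the list of file paths returned by glob.glob
    dfs = []
    for filepath in tqdm.tqdm(dataset_filepaths):
        df = pd.read_csv(filepath)  # assumes each file is CSV-formatted
        dfs.append(df)
    out = pd.concat(dfs, axis=0)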
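
If the files hold free text rather than comma-separated data, `pd.read_csv` may not parse them as intended. As an alternative sketch (not part of the answer above), one could store one row per file. The helper name `texts_to_frame`, the column names, and the sample files are all hypothetical, and UTF-8 encoding is an assumption.

```python
import glob
import os
import tempfile

import pandas as pd


def texts_to_frame(paths):
    """Build a DataFrame with one row per file: its name and its full text."""
    records = []
    for path in sorted(paths):
        # Assumes the files are UTF-8 encoded; adjust if they are not.
        with open(path, "r", encoding="utf-8") as f:
            records.append({"filename": os.path.basename(path), "text": f.read()})
    return pd.DataFrame(records, columns=["filename", "text"])


# Hypothetical sample data standing in for the Drive folder.
with tempfile.TemporaryDirectory() as folder:
    for name, body in (("a.txt", "first document"), ("b.txt", "second document")):
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write(body)
    df = texts_to_frame(glob.glob(os.path.join(folder, "*.txt")))

print(df)
```

In Colab, the same helper could be called with the list returned by `glob.glob` on the mounted Drive folder.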