machine-learning, deep-learning, fast-ai

FastAI Question on data loading using TextList


My end goal is to implement ULMFiT using FastAI to predict disaster tweets (as part of this Kaggle competition). I'm trying to read the tweets from a DataFrame, but for reasons unknown to me I'm stuck at the data loading stage. The method below does not work:

from fastai.text.all import *
train= pd.read_csv('../input/nlp-getting-started/train.csv')

dls_lm = (TextList.from_df(path,train,cols='text',is_lm=True)
            .split_by_rand_pct(0.1)
            #.label_for_lm()           
            .databunch(bs=64))

This line throws - NameError: name 'TextList' is not defined.

I'm able to work around this problem with the below code -

dls_lm = DataBlock(
        blocks=TextBlock.from_df('text', is_lm=True),
        get_x=ColReader('text'), 
        splitter=RandomSplitter(0.1) 
    # use only 10% of the data for validation in order to leave more for training
)
dls_lm = dls_lm.dataloaders(train, bs=64, seq_len=72)

Why does this work and not the previous method?

Notebook Link for reference.


Solution

  • Which version of fastai are you running?

    import fastai
    print(fastai.__version__)
    

    The TextList class is from fastai v1, but your import path (fastai.text.all) is the fastai v2 one. In v2, TextList was replaced by TextBlock (https://docs.fast.ai/text.data.html#TextBlock), so the name is never defined and you get the NameError. That is why the DataBlock version works, and it is the right way to handle this in v2.
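
As a rough sketch of the version check described above, the snippet below reads the installed fastai version without importing fastai itself and maps the major version to the matching text data-loading API. The helper names `text_api_for` and `installed_fastai_version` are made up for illustration and are not part of fastai; the mapping only covers the v1/v2 split discussed here.

```python
from importlib.metadata import PackageNotFoundError, version


def text_api_for(fastai_version: str) -> str:
    """Map a fastai version string to the text data-loading API it provides.

    Hypothetical helper for illustration; not part of fastai.
    """
    major = int(fastai_version.split(".")[0])
    if major == 1:
        # fastai v1: data block API built around TextList
        return "TextList"
    if major >= 2:
        # fastai v2: TextList is gone, replaced by DataBlock + TextBlock
        return "DataBlock + TextBlock"
    return "unknown"


def installed_fastai_version():
    """Return the installed fastai version string, or None if it is missing."""
    try:
        return version("fastai")
    except PackageNotFoundError:
        return None


found = installed_fastai_version()
if found is None:
    print("fastai is not installed in this environment")
else:
    print(f"fastai {found}: use {text_api_for(found)}")
```

If this reports a 2.x version, the `from fastai.text.all import *` import and the DataBlock code from the question are the matching combination; a 1.x version would instead go with TextList.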