Tags: machine-learning, keras, nlp, tensorflow2.0, tokenize

Why does my Python code raise "TypeError: 'dict' object is not callable" when loading a list of dictionaries into a Tokenizer object?


I am trying to build a sarcasm detection model in a Jupyter notebook, using the sarcasm dataset from Kaggle. I downloaded the dataset to my PC and converted it into a list of dictionaries. Each dictionary has three keys: article_link, is_sarcastic, and headline.
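The question does not show how the list of dictionaries was built. As a minimal sketch, assuming the downloaded file stores one JSON object per line (JSON Lines), the records can be collected into a list with the standard json module. The two sample records and the helper name load_json_lines are made up for illustration.

```python
import io
import json

# Hypothetical two-record sample in JSON Lines form (one object per line);
# in practice this would be the downloaded file opened with open(...).
raw = io.StringIO(
    '{"article_link": "https://example.com/a", "headline": "first sample headline", "is_sarcastic": 0}\n'
    '{"article_link": "https://example.com/b", "headline": "second sample headline", "is_sarcastic": 1}\n'
)

def load_json_lines(fileobj):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in fileobj if line.strip()]

data_set = load_json_lines(raw)
print(len(data_set))        # 2
print(sorted(data_set[0]))  # ['article_link', 'headline', 'is_sarcastic']
```

If the file has already been rewritten as a single JSON array, json.load on the whole file returns the list directly, which is what the code in the question does.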

My code below raises the following error:


TypeError                                 Traceback (most recent call last)
in
      7 tokenizer.fit_on_texts(sentences)
      8
----> 9 my_word_index=tokenizer.word_index()
     10
     11 print(len(word_index))

TypeError: 'dict' object is not callable
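The message comes from Python itself rather than from Keras: a dictionary is a plain container, so putting parentheses after one raises TypeError. A minimal reproduction with no TensorFlow involved, using a made-up word index as a stand-in:

```python
# A stand-in for a tokenizer's word index: a plain dict mapping words to ids.
word_index = {"<oov>": 1, "the": 2, "man": 3}

try:
    word_index()  # calling a dict, the same mistake as tokenizer.word_index()
except TypeError as err:
    message = str(err)

print(message)  # 'dict' object is not callable
```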

import os
import json

# Change to the folder that contains the downloaded dataset
os.chdir('C:/Users/IMALSHA/Desktop/AI content writing/Cousera Deep Neural Networks course/NLP lectures')

# Load the dataset (saved as a JSON array of dictionaries)
with open('Sarcasm_Headlines_Dataset.json','r') as json_file: 
    data_set=json.load(json_file)

# Collect each field into its own list
sentences=[]
labels=[]
urls=[]

for item in data_set:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])
    urls.append(item['article_link'])

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


tokenizer=Tokenizer(oov_token="<oov>")
tokenizer.fit_on_texts(sentences)

word_index=tokenizer.word_index()

print(len(word_index))
print(word_index)

sequences=tokenizer.texts_to_sequences(sentences)
padded=pad_sequences(sequences)

print(padded[2])
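To see why word_index is a dictionary rather than a method, here is a simplified pure-Python sketch of the steps the code above relies on. It approximates the Keras behaviour and is not its implementation: it lowercases, strips punctuation (Keras uses its own filter set, which differs slightly), counts words, ranks them by frequency with the OOV token at id 1, maps texts to id lists, and left-pads with zeros as pad_sequences does by default. The function names fit_word_index, texts_to_ids, and pad_pre are hypothetical.

```python
import string
from collections import Counter

_STRIP = str.maketrans("", "", string.punctuation)

def fit_word_index(texts, oov_token="<oov>"):
    """Build a word -> id dict; ids start at 1 and the OOV token gets id 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(_STRIP).split())
    # Most frequent words get the smallest ids; ties keep first-seen order.
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}

def texts_to_ids(texts, word_index, oov_token="<oov>"):
    """Replace each word by its id; unknown words map to the OOV id."""
    oov_id = word_index[oov_token]
    return [[word_index.get(w, oov_id) for w in t.lower().translate(_STRIP).split()]
            for t in texts]

def pad_pre(sequences, value=0):
    """Left-pad every sequence with `value` to the length of the longest one."""
    maxlen = max(len(s) for s in sequences)
    return [[value] * (maxlen - len(s)) + s for s in sequences]

sentences = ["The cat sat", "the dog sat!", "a cat"]
word_index = fit_word_index(sentences)
print(word_index)  # {'<oov>': 1, 'the': 2, 'cat': 3, 'sat': 4, 'dog': 5, 'a': 6}
sequences = texts_to_ids(sentences, word_index)
print(sequences)   # [[2, 3, 4], [2, 5, 4], [6, 3]]
print(pad_pre(sequences))  # [[2, 3, 4], [2, 5, 4], [0, 6, 3]]
```

The result of fitting is stored data, not behaviour, which is why Keras exposes it as an attribute you read rather than a method you call.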

Solution

  • The problem is this line:

    word_index=tokenizer.word_index()


    You probably want to store the tokenizer's word_index in the word_index variable. Instead, the parentheses call tokenizer.word_index as if it were a method or function, but it is a dictionary attribute that fit_on_texts fills in, and a dictionary cannot be called.

    To fix it, drop the parentheses and read the attribute directly:

    word_index=tokenizer.word_index
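Once the parentheses are gone, word_index behaves like any other dict. A short sketch, using a made-up index in place of the one Keras builds, of what the question's code goes on to do with it: take its length, look up ids, and invert it to turn an id sequence back into words. Keras also keeps a ready-made reverse mapping as tokenizer.index_word, but the inversion below works on any dict.

```python
# Stand-in for tokenizer.word_index after fit_on_texts (hypothetical values).
word_index = {"<oov>": 1, "the": 2, "cat": 3, "sat": 4}

print(len(word_index))    # 4, plain len() works on the attribute
print(word_index["cat"])  # 3

# Invert the mapping to decode a sequence of ids back into words.
index_word = {i: w for w, i in word_index.items()}
sequence = [2, 3, 4, 1]
decoded = " ".join(index_word[i] for i in sequence)
print(decoded)            # the cat sat <oov>
```

Id 0 is reserved for padding and is not a key in word_index, so skip zeros when decoding rows produced by pad_sequences.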