Search code examples
pythonazure-storagegensim

Azure blob storage model access for gensim in python


I am trying to load my model files using below code

import gensim
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open

azure_storage_connection_string = "DefaultEndpointsProtocol=https;AccountName=lnipcfdevlanding;AccountKey=xxxxxxxxx"
client = BlobServiceClient.from_connection_string(azure_storage_connection_string)
file_prefix="azure://landing/TechnologyCluster/VectorCreation/embeddings/"
fin = open(file_prefix+"word2vec.Tobacco.fasttext.model", transport_params=dict(client=client))
clustering.embedding = gensim.models.Word2Vec.load(fin)

But it is failing with below error AttributeError: '_io.TextIOWrapper' object has no attribute 'endswith'

I assume the way I am passing file to gensim.models.Word2Vec.load is not the right way. I could not find any good example that how to pass the filename which is on Azure blob storage, if I give complete uri it does not work, what is the right way to achieve this ?


Solution

  • Please check :

    AttributeError usually occurs when save() or load() function is called on an object instance instead of class as that is a class method..

    Please note that the information in file can be incomplete ,check if the binary tree is missing.In this case, the query may be obtained ,but you may not be able to continue with a model loaded with this way.

    Check if the file path must be saved in word2vec-format file

    binary is bool, if True, indicates data is in binary word2vec format.

    Note from: https://radimrehurek.com/gensim/models/keyedvectors.html.

    Check if this way can be worked around by importing required models and by checking version compliance : -

    gensim.models.Word2Vec.load_word2vec_format('model', binary=True)

    or withKeyedVectors.load in place of .Word2Vec.load_... according to what supports fasttext.

    Also check whether the model is correctly supports the function.

    References:

    1. python - AttributeError: 'Word2Vec' object has no attribute 'endswith' - Stack Overflow
    2. models.keyedvectors – Store and query word vectors — gensim (radimrehurek.com)