Tags: python, pandas, tensorflow, keras, one-hot-encoding

How to save one hot encoder?


I am trying to save a one-hot encoder from Keras so I can use it again on different texts while keeping the same encoding.

Here is my code:

import pandas as pd
from keras.preprocessing.text import one_hot

df = pd.read_csv('dataset.csv')
vocab_size = 200000
encoded_docs = [one_hot(d, vocab_size) for d in df.text]

How can I save this encoder and use it again later?

I found this in my research, but one_hot() seems to be a function and not an object (sorry if this is plain wrong, I am fairly new to Python).
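To see why there is no fitted state to save, here is a minimal sketch of a hash-based encoder that follows the same idea as Keras' one_hot (each word is hashed and mapped to hash % (n - 1) + 1). It is a hypothetical stand-in, not the Keras implementation: the name stable_one_hot is made up, the tokenization only roughly approximates Keras' default filters, and it uses md5 instead of Python's built-in hash() so the indices do not change between runs.

```python
import hashlib
import re


def stable_one_hot(text, n):
    """Hypothetical stand-in for Keras' one_hot: map each word to an int in [1, n - 1].

    Uses the hashing-trick formula hash % (n - 1) + 1, but with md5 instead
    of Python's built-in hash(), so results do not depend on PYTHONHASHSEED.
    """
    # Rough approximation of Keras' default tokenization:
    # lowercase, replace punctuation with spaces, split on whitespace.
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1
            for w in words]


if __name__ == "__main__":
    vocab_size = 200000
    first = stable_one_hot("The cat sat on the mat.", vocab_size)
    second = stable_one_hot("the CAT sat", vocab_size)
    # "the" gets the same index both times it appears ...
    print(first[0] == first[4])  # True
    # ... and the same words get the same indices in a separate call.
    print(first[:3] == second)  # True
```

Nothing is learned from the data, so the only things needed to reproduce the encoding are the function, vocab_size and a deterministic hash. If your Keras version still ships keras.preprocessing.text.hashing_trick, it accepts hash_function='md5' for the same purpose.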


Solution

  • Mentioning the answer here (although it is already present in the comments section) for the benefit of the community.

    To save the encoder, you can use the code below:

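    import pickle
    from keras.preprocessing.text import one_hot

    # Pickling a function stores only a reference to it (module and name), not a vocabulary
    with open("encoder", "wb") as f:
        pickle.dump(one_hot, f)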
    

    Then, to load the saved encoder, use the code below:

    import pickle
    with open("encoder", "rb") as f:  # reopen the file for reading
        encoder = pickle.load(f)
    encoded_docs = [encoder(d, vocab_size) for d in df.text]
    

    Since the function one_hot (from keras.preprocessing.text import one_hot) uses Python's built-in hash() to generate quasi-unique encodings, and string hashes are randomized for every new Python process, we need to fix the hash seed to reproduce our results (get the same encoding across multiple executions).

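    Set the hash seed in the terminal before starting Python by exporting the PYTHONHASHSEED environment variable (for example, export PYTHONHASHSEED=0 in a Unix shell). It has to be set before the interpreter starts, so changing it from inside the running script has no effect.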
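A quick way to check that a fixed seed is what makes hash() (and therefore one_hot) reproducible is the standard-library sketch below, which does not need Keras. It starts fresh Python processes with and without PYTHONHASHSEED set and compares hash() of the same word. The helper name hash_in_fresh_interpreter and the example word 'keras' are illustrative assumptions. Because one_hot's indices depend only on hash() and vocab_size, equal hashes across runs mean equal one_hot indices.

```python
import os
import subprocess
import sys

# Child program: print the built-in hash of an example word.
CHILD = "print(hash('keras'))"


def hash_in_fresh_interpreter(seed=None):
    """Run a new Python process and return hash('keras') computed there.

    If seed is given, PYTHONHASHSEED is placed in the child's environment
    before its interpreter starts; otherwise the variable is removed so the
    child uses a random hash seed.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    # With a fixed seed, two separate runs agree.
    print(hash_in_fresh_interpreter(0) == hash_in_fresh_interpreter(0))  # True
    # Without a seed, string hashes are randomized per process,
    # so these two values will almost certainly differ.
    print(hash_in_fresh_interpreter() == hash_in_fresh_interpreter())
```

Assigning os.environ["PYTHONHASHSEED"] inside an already-running script only affects child processes like the ones above, not that script's own hash() values. The values may also differ between Python versions even with the same seed.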