As described, I load a trained word2vec model through pyspark.
word2vec_model = Word2VecModel.load("saving path")
After using it, I want to delete it, since it takes up a lot of memory on a single node (I used the findSynonyms function, and the docs say it should only be used locally). I tried:
del word2vec_model
gc.collect()
but that doesn't seem to work. It's not an RDD, so I can't use .unpersist(), and I didn't find anything like an unload() function in the docs.
Could anyone help me or give me some advice?
You can ensure that the object is dereferenced by the py4j gateway by running the following statement:
Given word2vec_model, a pyspark Transformer:
With spark, a SparkSession:
spark.sparkContext._gateway.detach(word2vec_model._java_obj)
With sc, a SparkContext:
sc._gateway.detach(word2vec_model._java_obj)
Explanations:
The model is a Transformer, and each transformer holds an instance of JavaObject in a private _java_obj attribute.
The statements above call the SparkContext's py4j gateway.detach method on that wrapper object (an instance of JavaObject), which makes the gateway drop its reference to it.
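The detach call can be wrapped in a small helper so that it is skipped safely when the model has no JVM-side object. This is only a sketch: release_java_model is a hypothetical name, not part of pyspark, and it relies on the private _gateway and _java_obj attributes, which may change between Spark versions.

```python
def release_java_model(sc, model):
    """Ask the py4j gateway to dereference the JVM object behind `model`.

    Assumptions (not guaranteed by any public API):
    - `sc` is a SparkContext (or `spark.sparkContext`) exposing `_gateway`.
    - `model` is a pyspark.ml model exposing the private `_java_obj`.
    Returns True if a detach was requested, False if there was nothing
    to detach.
    """
    java_obj = getattr(model, "_java_obj", None)
    if java_obj is None:
        return False
    # Tell the gateway to drop its reference to the JVM-side object.
    sc._gateway.detach(java_obj)
    return True
```

With a live session this would be called as release_java_model(spark.sparkContext, word2vec_model), followed by del word2vec_model and gc.collect() on the Python side. Detaching only removes the gateway's reference; when the JVM actually reclaims the memory is up to its own garbage collector.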