adj_mx = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
adj_mx_serialized = tf.io.serialize_sparse(adj_mx).numpy().tobytes()
I did this and saved the result as a BLOB in a SQL database.
I know this was probably a mistake, but I didn't notice it at the time, and I would really love to be able to deserialize these blobs without having to serialize everything again, correctly this time. I have done this for half a million matrices, so I want to know if reversing this serialization is at all possible.
According to TensorFlow, the serialize_sparse function returns:
"A 3-vector (1-D Tensor), with each column representing the serialized SparseTensor's indices, values, and shape (respectively)."
#...
sparse_adj = nx.adjacency_matrix(G, weight='weight')
coo = sparse_adj.tocoo()
indices = np.mat([coo.row, coo.col]).transpose()
values = coo.data
dense_shape = (aa_limit, aa_limit)
adj_mx = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
This was the construction. I know how many indices there are (that is, I know len(coo.row), which equals len(coo.col)). The values are floats.
Do I have any chance?
What I've tried:
tf.io.deserialize_many_sparse(blob, dtype=tf.float32)
InvalidArgumentError: {{function_node __wrapped__DeserializeManySparse_device_/job:localhost/replica:0/task:0/device:CPU:0}} Serialized sparse should have non-zero rank [] [Op:DeserializeManySparse]
serialized_data = np.frombuffer(blob, dtype=np.uint8)
deserialized = tf.io.deserialize_many_sparse(serialized_data, dtype=tf.float32)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
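For reference, my understanding (I may be wrong) is that deserialize_many_sparse expects a rank-2 string tensor of shape [N, 3], i.e. the outputs of serialize_sparse kept as strings and stacked, not raw bytes. So a working round trip would presumably have looked like this:
serialized = tf.io.serialize_sparse(adj_mx)   # string tensor of shape [3]
batched = tf.expand_dims(serialized, axis=0)  # shape [1, 3], one row per tensor
# dtype has to match the values' dtype (float64 for my data, I think)
restored = tf.io.deserialize_many_sparse(batched, dtype=tf.float64)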
I'm afraid this is irreversible.
Serializing a sparse tensor produces a tensor of strings. Converting it to NumPy yields an array with object dtype. In NumPy, an array of object dtype does not contain the objects themselves; rather, it contains pointers to those objects.
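You can see the pointer behaviour with a plain object array (a toy sketch, unrelated to TensorFlow): the length of the underlying buffer depends only on the number of elements, never on their contents.
import numpy as np

a = np.array([b"short", b"a considerably longer byte string"], dtype=object)
print(a.itemsize)        # 8 on a 64-bit build: the size of one pointer
print(len(a.tobytes()))  # 16: two raw pointers; the string contents are absent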
To demonstrate this, here is a code example that prints the length of the serialized data. No matter how much data the tensor contains (which you can change via the aa_limit parameter), it will print a serialized length of 12 or 24 bytes: three pointers of 4 or 8 bytes each, depending on the platform.
import scipy.sparse
import tensorflow as tf
import numpy as np

aa_limit = 10

# Build a random sparse matrix and pull out the COO components,
# mirroring the construction from the question.
sparse_adj = scipy.sparse.random(aa_limit, aa_limit, density=0.5)
print(sparse_adj)
coo = sparse_adj.tocoo()
indices = np.mat([coo.row, coo.col]).transpose()
values = coo.data
dense_shape = (aa_limit, aa_limit)
adj_mx = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)

# Densify once, just to confirm the tensor is well-formed.
tf.sparse.to_dense(tf.sparse.reorder(adj_mx))

# .numpy() gives a 3-element object array; .tobytes() copies its pointers.
adj_mx_serialized = tf.io.serialize_sparse(adj_mx).numpy().tobytes()
print("serialized length", len(adj_mx_serialized))
So the step where you convert a NumPy array to bytes with .tobytes() is not reversible: NumPy does not follow those pointers and serialize the objects they point to. Unless the process that created these objects is still running, the pointers are useless. They record where the strings were located in memory, not their contents.
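For contrast, .tobytes() is perfectly reversible for arrays with a numeric dtype, because the buffer then holds the actual data:
import numpy as np

values = np.array([0.5, 1.5, 2.5], dtype=np.float64)
blob = values.tobytes()                  # 24 bytes of actual float data
restored = np.frombuffer(blob, dtype=np.float64)
print(np.array_equal(values, restored))  # True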
In the future, when you need to reversibly serialize NumPy arrays, I recommend looking into either np.save() (which is built into NumPy) or the msgpack-numpy package (which is faster and produces smaller output than np.save()).
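For example, here is a sketch of how that could have produced BLOB-friendly bytes, using np.savez() (a sibling of np.save() that bundles several arrays into one file); indices, values and dense_shape are the variables from your construction:
import io
import numpy as np
import tensorflow as tf

# indices, values, dense_shape: as built in the question's code
buf = io.BytesIO()
np.savez(buf, indices=indices, values=values, dense_shape=np.array(dense_shape))
blob = buf.getvalue()  # real bytes, safe to store as a SQL BLOB

# Later: rebuild the SparseTensor from the blob.
loaded = np.load(io.BytesIO(blob))
adj_mx = tf.SparseTensor(indices=loaded["indices"],
                         values=loaded["values"],
                         dense_shape=loaded["dense_shape"])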