I want to send a list of multi-dimensional NumPy arrays over a socket to my server and restore its original format on arrival. The list of arrays (the variable aggregated_ndarrays) looks like the following:
[array([[[[-1.04182057e-01, 9.81570184e-02, 8.69736895e-02,
-6.61955923e-02, -4.51700203e-02],
[ 5.26290983e-02, -1.18473642e-01, 2.64136307e-02,
-9.26332623e-02, -6.63961545e-02],
[-8.80082026e-02, 7.90973455e-02, -1.13944486e-02,
-1.51292123e-02, 7.65037686e-02],
[-9.15177837e-02, 7.08795676e-04, -1.08281896e-03,
8.65678713e-02, 6.68114647e-02],
[-8.45356733e-02, -6.90313280e-02, -5.81113175e-02,
-1.14920050e-01, -4.11906727e-02]],
...
3.35839503e-02, 6.30911887e-02, 4.10411768e-02,
-3.64055522e-02, -3.56383622e-02, 9.80690420e-02,
8.15757737e-02, -1.00057133e-01, 1.16158882e-02,
-9.82330441e-02, 9.00610462e-02, -1.01473713e-02,
-2.64037345e-02, 1.37711661e-02, 6.63968623e-02]], dtype=float32), array([-0.02089943, -0.0020895 , -0.00506333, 0.03931976, 0.04795408,
-0.01520141, -0.03287903, 0.0037387 , 0.01339047, -0.0576841 ],
dtype=float32)]
This is the client:
import socket
# deserialize the weights, so they can be processed more easily
# Convert `Parameters` to `List[np.ndarray]`
aggregated_ndarrays: List[np.ndarray] = parameters_to_ndarrays(aggregated_parameters)
# send the aggregated weights to the central server together with the number of training examples
print("Attempting server connection")
conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn.connect(("127.0.0.1", 8088))
conn.send(str((aggregated_ndarrays,n_examples_fit)).encode())
And this is the server:
sock.bind(("127.0.0.1", 8088))
sock.listen()
print("Created server socket and listening %s" % sock)
conn, addr = sock.accept()
print("Accepted client connection")
weights, n1 = conn.recv(999_999).decode().rsplit(" ", 1)
I previously tried to send the data over the socket with json.dumps, but I get the error TypeError: Object of type ndarray is not JSON serializable. When I instead send the data as an encoded string, what the server receives is just a plain decoded string rather than a list of multi-dimensional NumPy arrays.
I am using Python 3.10.
A numpy array has a .tobytes() method which converts it into a bytes object that can be transmitted. The function np.frombuffer() converts the bytes back to a numpy array, but the result is a single dimension and the data type defaults to float64. Other data must be sent to reconstruct the original data type and shape of the array.
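To make the point about metadata concrete, here is a minimal sketch of the round trip. The variable names are only for illustration.

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
raw = original.tobytes()  # 6 elements * 4 bytes each = 24 bytes, no shape or dtype inside

# Without metadata the bytes are read as a flat float64 array,
# so the 24 bytes become 3 meaningless values.
wrong = np.frombuffer(raw)

# With the dtype and shape supplied, the original array is recovered.
restored = np.frombuffer(raw, dtype=np.float32).reshape(2, 3)

# frombuffer returns a read-only view of the bytes object;
# take a copy if the array needs to be modified afterwards.
writable = restored.copy()
```

This is why the protocol has to carry the dtype and shape alongside the raw bytes.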
TCP is not a message-based protocol, so you cannot simply send the bytes and expect to receive them as a complete message in one recv() call. You must design a byte stream that has the information needed to determine a complete message has been received, and buffer received data until a complete message can be extracted.
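As an illustration of the buffering this requires, here is a rough sketch of a helper that keeps calling recv() until a known number of bytes has arrived. The name recv_exact is hypothetical, and socket.socketpair() is used only so the example runs without a separate server.

```python
import socket

def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:                 # b'' means the peer closed the connection
            raise ConnectionError(f'socket closed with {remaining} bytes still expected')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# Demonstration on a local connected pair; the sender splits the message in two.
a, b = socket.socketpair()
with a, b:
    a.sendall(b'hello ')
    a.sendall(b'world')
    message = recv_exact(b, 11)  # collects both pieces however they arrive
```

The receiver still needs some way to learn the length in advance, which is what a header is for.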
socket.makefile() is a method that wraps the socket in a buffered file-like object with the methods readline and read. The former reads newline-terminated data, and the latter reads a fixed number of bytes. Both may return less data than expected if the socket is closed first.
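A small sketch of how readline and read behave on such a file object, again using socket.socketpair() purely so it runs standalone. The byte strings are made up for the demonstration.

```python
import socket

a, b = socket.socketpair()
with a, b, b.makefile('rb') as rfile:
    a.sendall(b'header line\nPAYLOADtail')
    a.shutdown(socket.SHUT_WR)  # signal end of stream to the reader

    line = rfile.readline()     # reads up to and including the newline
    payload = rfile.read(7)     # reads exactly 7 bytes
    rest = rfile.read(100)      # stream ended, so fewer than 100 bytes come back
    eof = rfile.readline()      # empty bytes object once nothing is left
```

An empty result from readline is the usual signal that the peer has disconnected.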
Below is a simple protocol that uses a single newline-terminated line of JSON as a header with the metadata needed to reconstruct a numpy array and socket.makefile to read the header line and byte data and extract the numpy array:
server.py
import json
import numpy as np
import socket

with socket.socket() as s:
    s.bind(('localhost', 5000))
    s.listen()
    while True:
        client, addr = s.accept()
        print(f'{addr}: connected')
        with client, client.makefile('rb') as rfile:
            while True:
                # The header is one newline-terminated line of JSON.
                header = rfile.readline()
                if not header: break  # empty read means the client disconnected
                metadata = json.loads(header)
                print(f'{addr}: {metadata}')
                # Read exactly the number of bytes announced in the header.
                serial_data = rfile.read(metadata['length'])
                data = np.frombuffer(serial_data, dtype=metadata['type']).reshape(metadata['shape'])
                print(data)
        print(f'{addr}: disconnected')
client.py
import json
import numpy as np
import socket

def transmit(sock, data):
    serial_data = data.tobytes()
    # Metadata the server needs to rebuild the array from raw bytes.
    metadata = {'type': data.dtype.name,
                'shape': data.shape,
                'length': len(serial_data)}
    sock.sendall(json.dumps(metadata).encode() + b'\n')
    sock.sendall(serial_data)

with socket.socket() as s:
    s.connect(('localhost', 5000))
    data = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype=np.float32)
    transmit(s, data)
    data = np.array([[[1,2],[3,4]],[[5,6],[7,8]]], dtype=np.int16)
    transmit(s, data)
Output:
('127.0.0.1', 3385): connected
('127.0.0.1', 3385): {'type': 'float32', 'shape': [3, 3], 'length': 36}
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
('127.0.0.1', 3385): {'type': 'int16', 'shape': [2, 2, 2], 'length': 16}
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
('127.0.0.1', 3385): disconnected
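The question asks about a list of arrays plus an example count, so here is one possible way to extend the same header idea to cover that. This is only a sketch: send_arrays, recv_arrays and the header keys 'n_examples' and 'arrays' are names invented for this example, and socket.socketpair() stands in for the real client and server.

```python
import json
import socket
import numpy as np

def send_arrays(sock, arrays, n_examples):
    """Send one JSON header line describing every array, then the raw bytes."""
    payloads = [a.tobytes() for a in arrays]
    header = {'n_examples': n_examples,
              'arrays': [{'type': a.dtype.name, 'shape': a.shape, 'length': len(p)}
                         for a, p in zip(arrays, payloads)]}
    sock.sendall(json.dumps(header).encode() + b'\n')
    for p in payloads:
        sock.sendall(p)

def recv_arrays(rfile):
    """Read the header line, then rebuild each array from its announced length."""
    header = json.loads(rfile.readline())
    arrays = []
    for meta in header['arrays']:
        raw = rfile.read(meta['length'])
        arrays.append(np.frombuffer(raw, dtype=meta['type']).reshape(meta['shape']))
    return arrays, header['n_examples']

# Small stand-in for the real weights so the demo stays tiny.
weights = [np.arange(8, dtype=np.float32).reshape(2, 2, 2),
           np.array([0.5, -0.25], dtype=np.float32)]

a, b = socket.socketpair()
with a, b, b.makefile('rb') as rfile:
    send_arrays(a, weights, 120)
    received, received_n = recv_arrays(rfile)
```

A real server would also need to check for an empty header line, as in the loop above, before calling json.loads.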
pickle can be used for serialization as well. It has the advantage that metadata is built-in and it works nicely with the file-like stream created by socket.makefile. The disadvantage is that it isn't secure: unpickling data can execute arbitrary code, so a malicious client can take advantage of that. Only use it when both ends of the connection are trusted.
server.py
import pickle
import numpy as np
import socket

with socket.socket() as s:
    s.bind(('localhost', 5000))
    s.listen()
    while True:
        client, addr = s.accept()
        print(f'{addr}: connected')
        with client, client.makefile('rb') as rfile:
            while True:
                try:
                    data = pickle.load(rfile)
                except EOFError:  # Raised when the socket is closed and no data remains
                    break
                print(data)
        print(f'{addr}: disconnected')
client.py
import pickle
import numpy as np
import socket

def transmit(sock, data):
    serial_data = pickle.dumps(data)
    sock.sendall(serial_data)

with socket.socket() as s:
    s.connect(('localhost', 5000))
    data = np.array([[1,2,3],[4,5,6],[7,8,9]], dtype=np.float32)
    transmit(s, data)
    data = np.array([[[1,2],[3,4]],[[5,6],[7,8]]], dtype=np.int16)
    transmit(s, data)
Output:
('127.0.0.1', 3578): connected
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
('127.0.0.1', 3578): disconnected
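Because pickle handles nested Python objects, the list of arrays and the example count from the question could plausibly be sent as a single tuple. A minimal sketch, assuming both endpoints are trusted; the stand-in values and socket.socketpair() are only there to make it runnable on its own.

```python
import pickle
import socket
import numpy as np

# Stand-ins for aggregated_ndarrays and n_examples_fit from the question.
weights = [np.arange(8, dtype=np.float32).reshape(2, 2, 2),
           np.array([0.5, -0.25], dtype=np.float32)]
n_examples = 120

a, b = socket.socketpair()
with a, b, b.makefile('rb') as rfile:
    # One pickle carries the whole tuple, including dtypes and shapes.
    a.sendall(pickle.dumps((weights, n_examples)))
    received_weights, received_n = pickle.load(rfile)
```

The received list contains real numpy arrays again, with no manual reshaping needed.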