I'm trying to load a really big dataset (~28GB of RAM in an ndarray) into Theano shared variables, using borrow=True to avoid duplicating the memory. To do so, I'm using the following function:
import numpy as np
import theano

def load_dataset(path):
    # Load the dataset from disk into memory
    data_f = np.load(path+'train_f.npy')
    data_t = np.load(path+'train_t.npy')
    # Split into training and validation (the last 1000 rows are held out)
    return (
        (
            theano.shared(data_f[:-1000, :], borrow=True),
            theano.shared(data_t[:-1000, :], borrow=True)
        ), (
            theano.shared(data_f[-1000:, :], borrow=True),
            theano.shared(data_t[-1000:, :], borrow=True)
        )
    )
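As a quick sanity check that the slicing itself is not what duplicates the memory, here is a minimal sketch with a small stand-in array. The shape is hypothetical (the real array has 250*250*3 columns), and float32 is an assumption about floatX:

```python
import numpy as np

# Small stand-in for the real dataset (hypothetical shape, assumed float32).
data_f = np.zeros((5000, 12), dtype="float32")

# Same split as in load_dataset: the last 1000 rows are held out.
train = data_f[:-1000, :]
valid = data_f[-1000:, :]

# Basic slices are views onto the original buffer, not copies.
print(np.shares_memory(train, data_f))
print(np.shares_memory(valid, data_f))
print(train.shape, valid.shape)
```

If both checks report shared memory, any extra allocation has to come from what happens after the slices are handed over, not from the split.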
To avoid data conversions, I created the arrays with the correct dtype before saving them to disk (then filled them and wrote them out with np.save()):
data_f = np.ndarray((len(rows), 250*250*3), dtype=theano.config.floatX)
data_t = np.ndarray((len(rows), 1), dtype=theano.config.floatX)
It seems, though, that Theano tries to replicate the memory anyway, giving me the following error:
Error allocating 25594500000 bytes of device memory (out of memory). Driver report 3775729664 bytes free and 4294639616 bytes total.
Theano is configured to work on the GPU (GTX 970).
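To put the numbers from the error message in perspective, here is a rough back-of-the-envelope calculation. It assumes floatX is float32 (4 bytes per value) and that the failed allocation is the training slice of data_f; neither is stated explicitly above:

```python
# Figures copied from the error message.
BYTES_REQUESTED = 25594500000
BYTES_FREE = 3775729664

# Assumption: floatX is float32, so each value takes 4 bytes.
BYTES_PER_VALUE = 4
VALUES_PER_ROW = 250 * 250 * 3

bytes_per_row = VALUES_PER_ROW * BYTES_PER_VALUE

# How many rows the failed allocation corresponds to.
rows_requested = BYTES_REQUESTED // bytes_per_row

# How many rows would fit in the memory the driver reports as free.
rows_that_fit = BYTES_FREE // bytes_per_row

print("bytes per row:", bytes_per_row)
print("rows requested:", rows_requested)
print("rows that fit on the device:", rows_that_fit)
```

Under these assumptions the request divides evenly into whole rows, which suggests the whole training slice is being copied to the device in one piece, and only a small fraction of it could ever fit on a 4GB card.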
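When Theano is configured for the GPU, theano.shared moves floatX arrays into device memory, and borrow=True only avoids copies for data that stays on the CPU. Instead of using theano.shared, it is possible to use theano.tensor._shared to force the data to be allocated in CPU memory. The fixed code ends up like this: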
import numpy as np
import theano

def load_dataset(path):
    # Load the dataset from disk into memory
    data_f = np.load(path+'train_f.npy')
    data_t = np.load(path+'train_t.npy')
    # Split into training and validation; _shared keeps the data in CPU memory
    return (
        (
            theano.tensor._shared(data_f[:-1000, :], borrow=True),
            theano.tensor._shared(data_t[:-1000, :], borrow=True)
        ), (
            theano.tensor._shared(data_f[-1000:, :], borrow=True),
            theano.tensor._shared(data_t[-1000:, :], borrow=True)
        )
    )