I've modified torch-dataframe to return self instead of void in order to achieve simple method chaining for a torch.class. Unfortunately this seems to be wreaking havoc with memory:
th> require 'Dataframe'; df = torch.load('dataset_4_torch.t7')
[4.8434s]
th> b = df:create_subsets() -- Works
[0.7384s]
th> df:create_subsets() -- Fails even if called before the b = df:create_...
/home/max/tools/torch/install/bin/luajit: not enough memory
I've tried overriding the default print that is called on all returned objects, but it didn't help.
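In case it helps, this is roughly what I mean by overriding the print: trepl calls a `__tostring__` method (if one is defined) when it prints a returned object, so a short summary avoids dumping the full contents. A minimal sketch; the `n_rows` field and the summary format are illustrative:

```lua
-- Defining __tostring__ on the class makes the interactive REPL print
-- a short summary instead of the whole object when a method returns self
function Dataframe:__tostring__()
  return string.format("Dataframe with %d rows", self.n_rows or 0)
end
```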
Here's some memory profiling:
th> collectgarbage("count")
1836.24609375
[0.0002s]
th> require 'Dataframe'; df = torch.load('dataset_4_torch.t7')
[4.6875s]
th> collectgarbage("count")
59659.619140625
[0.0003s]
th> b = df:create_subsets()
[0.7571s]
th> collectgarbage("count")
62303.567382812
[0.0001s]
th> df:create_subsets()
/home/max/tools/torch/install/bin/luajit: not enough memory
If this problem is too hard to track down, I would appreciate an example of how to properly apply the method-chaining pattern to a torch.class.
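For reference, the pattern I'm aiming for looks roughly like this (a minimal sketch with a hypothetical class; torch-dataframe's own methods would return self the same way):

```lua
require 'torch'

-- Hypothetical class illustrating the return-self chaining pattern
local Pipeline = torch.class('Pipeline')

function Pipeline:__init()
  self.steps = {}
end

function Pipeline:add(name)
  table.insert(self.steps, name)
  return self  -- returning self (instead of nothing) enables chaining
end

-- Each call returns the same instance, so calls can be chained:
local p = Pipeline():add('load'):add('shuffle'):add('split')
```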
Thanks for helping me out. It turns out that at some point I had added an assertion whose error message included a tostring of the entire column. This worked in our testing environment, but on a real dataset it generated a massive string, hence the memory issue.
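The trap, sketched with hypothetical `isValid` and `column` names: Lua evaluates assert's message argument eagerly, so the huge string is built even when the assertion passes.

```lua
-- Problematic: the concatenation (and the tostring of the whole column)
-- runs on every call, even when the assertion succeeds
assert(isValid(column), "Invalid column: " .. tostring(column))

-- Safer: only build the expensive message on the failure path
if not isValid(column) then
  error("Invalid column: " .. tostring(column))
end
```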