My C library generates a very big array of POD structs. What is the most efficient way to pass it to the Ruby side? On the Ruby side a raw array of values is fine for me.
My current solution works by storing each element and field separately, and it is very slow. Profiling showed that this function takes about 15% of program time on average data, and it is not even the computational part.
I've read about Data_Wrap_Struct, but I'm not sure that I need it.
If I copy the raw void* buffer into a string and then unpack it on the Ruby side, will it be much faster?
struct SPacket
{
    uint32_t field1;
    uint32_t field2;
    uint16_t field3;
    uint8_t field4;
};
VALUE rb_GetAllData(VALUE self) // SLOOOW
{
    size_t count = 0;
    struct SPacket* packets = GetAllData(&count);
    VALUE arr = rb_ary_new2(count);
    for(size_t i = 0; i < count; i++)
    {
        VALUE sub_arr = rb_ary_new2(4);
        rb_ary_store(sub_arr, 0, UINT2NUM(packets[i].field1));
        rb_ary_store(sub_arr, 1, UINT2NUM(packets[i].field2));
        rb_ary_store(sub_arr, 2, UINT2NUM(packets[i].field3));
        rb_ary_store(sub_arr, 3, UINT2NUM(packets[i].field4));
        rb_ary_store(arr, i, sub_arr);
    }
    return arr;
}
Your method copies your C array into a Ruby array. You could avoid this by creating a Ruby collection class that wraps the C array using Data_Wrap_Struct and acts directly on it.
Data_Wrap_Struct is a macro that takes a Ruby class and a C struct (and optionally a couple of pointers to functions for memory management that I’m deliberately omitting) and creates an instance of that class that has the struct “attached”. In the functions that implement this class’s methods you then use Data_Get_Struct to “unwrap” the struct so that you can access it in the function.
In this case, something like this:
// declare a variable for the new class
VALUE rb_cSPacketCollection;
// a struct that will be wrapped by the class
struct SPacketDataStruct {
    struct SPacket * data;
    size_t count; // GetAllData() reports the count through a size_t*
};
VALUE rb_GetAllData(VALUE self) {
    // allocate the wrapper struct itself (not freed in this example)
    struct SPacketDataStruct* wrapper = malloc(sizeof(struct SPacketDataStruct));
    wrapper->data = GetAllData(&wrapper->count);
    return Data_Wrap_Struct(rb_cSPacketCollection, 0, 0, wrapper);
}
and in your Init_whatever() function you’ll need to create the class:
rb_cSPacketCollection = rb_define_class("SPacketCollection", rb_cObject);
This alone isn’t much use; you need to define some methods on this new class. As an example, you could create a [] method to allow access to the individual SPackets:
VALUE SPacketCollection_get(VALUE self, VALUE index) {
    // unwrap the struct
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);
    int i = NUM2INT(index);
    // bounds check (reject negative indices as well)
    if (i < 0 || (size_t)i >= wrapper->count) {
        rb_raise(rb_eIndexError, "Index out of bounds");
    }
    // just return an array in this example
    VALUE arr = rb_ary_new2(4);
    rb_ary_store(arr, 0, UINT2NUM(wrapper->data[i].field1));
    rb_ary_store(arr, 1, UINT2NUM(wrapper->data[i].field2));
    rb_ary_store(arr, 2, UINT2NUM(wrapper->data[i].field3));
    rb_ary_store(arr, 3, UINT2NUM(wrapper->data[i].field4));
    return arr;
}
and then in your Init_ function, after creating the class, you define the method:
rb_define_method(rb_cSPacketCollection, "[]", SPacketCollection_get, 1);
Note that Data_Get_Struct is a macro and its usage is slightly odd, in that it doesn’t return the unwrapped struct; instead it assigns the pointer to the variable you pass as the third argument.
Since you’ve started using Data_Wrap_Struct by this stage, you could go further and create a new class that wraps an individual SPacket struct and operates directly on it:
// declare a variable for the new class
VALUE rb_cSPacket;
// and a function to get a field value
// you'll need to create more methods to access the other fields
// (and possibly to set them)
VALUE SPacket_field1(VALUE self) {
    struct SPacket* packet;
    Data_Get_Struct(self, struct SPacket, packet);
    return UINT2NUM(packet->field1);
}
In your Init_ function, create it and define the methods:
rb_cSPacket = rb_define_class("SPacket", rb_cObject);
rb_define_method(rb_cSPacket, "field1", SPacket_field1, 0);
This may entail a bit of work to create all the getters and setters for the fields; how much will depend on how you’re using it. Something like ffi could help here, but I don’t know how ffi would deal with the collection class; it would probably be worth looking into.
Now change your [] function to return an instance of this new class:
VALUE SPacketCollection_get(VALUE self, VALUE index) {
    // unwrap the struct
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);
    int i = NUM2INT(index);
    // bounds check (reject negative indices as well)
    if (i < 0 || (size_t)i >= wrapper->count) {
        rb_raise(rb_eIndexError, "Index out of bounds");
    }
    // create an instance of the new class, and wrap it around the
    // struct in the array
    struct SPacket* packet = &wrapper->data[i];
    return Data_Wrap_Struct(rb_cSPacket, 0, 0, packet);
}
With this you can now do something like this in Ruby:
c = get_all_data # in my testing I just made this a global method
c[2].field1 # returns the value of field1 of the third SPacket in the array
It might be worth creating an each method on the collection class; then you can include the Enumerable module and make a load of methods available:
VALUE SPacketCollection_each(VALUE self) {
    // unwrap the struct as before
    struct SPacketDataStruct* wrapper;
    Data_Get_Struct(self, struct SPacketDataStruct, wrapper);
    size_t i;
    for(i = 0; i < (size_t)wrapper->count; i++) {
        // create a new instance of the SPacket class
        // wrapping this entry
        struct SPacket* packet = &wrapper->data[i];
        rb_yield(Data_Wrap_Struct(rb_cSPacket, 0, 0, packet));
    }
    return self;
}
in Init_whatever:
rb_define_method(rb_cSPacketCollection, "each", SPacketCollection_each, 0);
rb_include_module(rb_cSPacketCollection, rb_mEnumerable);
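To see what including Enumerable buys you, here is a minimal pure-Ruby sketch. FakePacketCollection is a hypothetical stand-in for the C-backed class (it wraps a plain Ruby array rather than the C array); the point is only that each is the single method written by hand.

```ruby
# Hypothetical pure-Ruby stand-in for the C-backed collection class.
# Only `each` is defined by hand; Enumerable supplies map, select, first, etc.
class FakePacketCollection
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Yield every element in order and return self, as the C version does.
  def each
    @rows.each { |row| yield row }
    self
  end
end

packets = FakePacketCollection.new([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

first_fields = packets.map { |row| row[0] }        # built on each
big          = packets.select { |row| row[0] > 4 } # built on each
```

The same calls should work on the real collection once each is defined in C and rb_mEnumerable is included.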
In this example I haven’t been concerned about things like object identity and memory management. With everything backed by the same array you could have multiple objects that all share the same data; you’ll have to consider whether this is okay for your use. Also, you may have noticed I’ve malloced but not freed. You’ll need to determine who “owns” the data array and make sure you don’t introduce any memory leaks. You can pass a function to Data_Wrap_Struct that will be called when the object is garbage collected to free memory.
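The shared-data point can be illustrated without any C at all. In this rough sketch PacketView is a hypothetical stand-in for the SPacket class, and a plain Ruby array of hashes plays the role of the C array; neither name comes from the extension above.

```ruby
# Hypothetical stand-ins: PacketView plays the role of SPacket, and
# `storage` plays the role of the C array that every wrapper points into.
class PacketView
  def initialize(record)
    @record = record # a reference to the shared record, not a copy
  end

  def field1
    @record[:field1]
  end

  def field1=(value)
    @record[:field1] = value
  end
end

storage = [{ field1: 10 }, { field1: 20 }]

# Two lookups of the same element produce two separate wrapper objects.
a = PacketView.new(storage[1])
b = PacketView.new(storage[1])

same_object = a.equal?(b) # false: distinct Ruby objects
a.field1 = 99
seen_by_b = b.field1      # 99: both wrappers read the same record
```

If you add setters to the C class, two wrappers for the same index will behave the same way, and neither will compare as identical to the other unless you define equality yourself.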
If you haven’t already seen it, the Pickaxe book has a good chapter on C extensions, and is now available online.
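As for the idea in the question of handing the raw buffer over as a string, the Ruby side might look something like the sketch below. It assumes the C side copies the array into a binary string (for example with rb_str_new), that the data is in native byte order, and that each struct SPacket occupies 12 bytes: 4 + 4 + 2 + 1 bytes of fields plus 1 byte of trailing padding. Check sizeof(struct SPacket) on your platform before relying on that layout. The string here is built in Ruby purely as a stand-in.

```ruby
# Assumed layout of one struct SPacket, in native byte order:
# two uint32 (L2), one uint16 (S), one uint8 (C), one padding byte (x).
RECORD_FORMAT = "L2SCx"
RECORD_SIZE   = 12

# Stand-in for the binary string the C side would return.
raw = [[1, 2, 3, 4], [5, 6, 7, 8]].map { |fields| fields.pack(RECORD_FORMAT) }.join

# Unpack every record in a single call, then regroup into 4-field rows.
count   = raw.bytesize / RECORD_SIZE
packets = raw.unpack(RECORD_FORMAT * count).each_slice(4).to_a
```

Whether this is actually faster than building the nested arrays in C is something to measure on your own data.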