Note: This post is similar, but not quite the same as a more open-ended questions asked on Reddit: https://www.reddit.com/r/rakulang/comments/vvpikh/looking_for_guidance_on_getting_nativecall/
I'm trying to use the md4c c library to process a markdown file with its md_parse
function. I'm having no success, and the program just quietly dies. I don't think I'm calling it with the right arguments.
Documentation for the function is here: https://github.com/mity/md4c/wiki/Embedding-Parser%3A-Calling-MD4C
I'd like to at least figure out the minimum amount of code needed to do this without error. This is my latest attempt, though I've tried many:
use v6.d;
use NativeCall;
sub md_parse(str, int32, Pointer is rw ) is native('md4c') returns int32 { * }
md_parse('hello', 5, Pointer.new());
say 'hi'; # this never gets printed
md4c is a SAX-like streaming parser that calls your functions when it encounters markdown elements. If you call it with an uninitialised Pointer, or with an uninitialised CStruct then the code will SEGV when the md4c library tries to call a null function pointer.
The README says:
The main provided function is md_parse(). It takes a text in the Markdown syntax and a pointer to a structure which provides pointers to several callback functions.
As md_parse() processes the input, it calls the callbacks (when entering or leaving any Markdown block or span; and when outputting any textual content of the document), allowing application to convert it into another format or render it onto the screen.
The function signature of md_parse is:
int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
In order for md_parse() to work, you will need to:
The 4th parameter to md_parse() is void* userdata
which is a pointer that you provide which gets passed back to you as the last parameter of each of the callback functions. My guess is that it's optional and if you pass a null value then you'll get called back with a null userdata parameter in each callback.
Followup
This turned into an interesting rabbit hole to fall down.
The code that makes it possible to pass a Raku sub as a callback parameter to a native function is quite complex and relies on MoarVM ops to build and cache the FFI callback trampoline. This is a piece of code that marshals the C calling convention parameters into a call that MoarVM can dispatch to a Raku sub.
It will be a sizeable task to implement equivalent functionality to provide some kind of nativecast
that will generate the required callback trampoline and return a Pointer
that can be assigned into a CStruct.
But we can cheat
We can use a simple C function to return the pointer to a generated callback trampoline as if it was for a normal callback sub. We can then store this pointer in our CStruct
and our problem is solved. The generated trampoline is specific to the function signature of the Raku sub we want to call, so we need to generate a different NativeCall binding for each function signature we need.
The C function:
void* get_pointer(void* p)
{
return p;
}
Binding a NativeCall sub for the function signature we need:
sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
Initialising a CStruct
attribute:
$!enter_block := get_enter_leave_fn(&enter_block);
Putting it all together:
use NativeCall;
enum BlockType < DOC QUOTE UL OL LI HR H CODE HTML P TABLE THEAD TBODY TR TH TD >;
enum SpanType < EM STRONG A IMG SPAN_CODE DEL SPAN_LATEXMATH LATEXMATH_DISPLAY WIKILINK SPAN_U >;
enum TextType < NORMAL NULLCHAR BR SOFTBR ENTITY TEXT_CODE TEXT_HTML TEXT_LATEXMATH >;
sub enter_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "enter block { BlockType($type) }";
}
sub leave_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "leave block { BlockType($type) }";
}
sub enter_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "enter span { SpanType($type) }";
}
sub leave_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "leave span { SpanType($type) }";
}
sub text(uint32 $type, str $text, uint32 $size, Pointer $userdata --> int32) {
say "text '{$text.substr(0..^$size)}'";
}
sub debug_log(str $msg, Pointer $userdata --> int32) {
note $msg;
}
#
# Cast functions that are specific to the required function signature.
#
# Makes use of a utility C function that returns its `void*` parameter, compiled
# into a shared library called libgetpointer.dylib (on MacOS)
#
# gcc -shared -o libgetpointer.dylib get_pointer.c
#
# void* get_pointer(void* p)
# {
# return p;
# }
#
# Each cast function uses NativeCall to build an FFI callback trampoline that gets
# cached in an MVMThreadContext. The generated callback code is specific to the
# function signature of the Raku function that will be called.
#
sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
sub get_text_fn(&func (uint32, str, uint32, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
sub get_debug_fn(&func (str, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
class MD_PARSER is repr('CStruct') {
has uint32 $!abi_version; # unsigned int abi_version
has uint32 $!flags; # unsigned int flags
has Pointer $!enter_block; # F:int ( )* enter_block
has Pointer $!leave_block; # F:int ( )* leave_block
has Pointer $!enter_span; # F:int ( )* enter_span
has Pointer $!leave_span; # F:int ( )* leave_span
has Pointer $!text; # F:int ( )* text
has Pointer $!debug_log; # F:void ( )* debug_log
has Pointer $!syntax; # F:void ( )* syntax
submethod TWEAK() {
$!abi_version = 0;
$!flags = 0;
$!enter_block := get_enter_leave_fn(&enter_block);
$!leave_block := get_enter_leave_fn(&leave_block);
$!enter_span := get_enter_leave_fn(&enter_span);
$!leave_span := get_enter_leave_fn(&leave_span);
$!text := get_text_fn(&text);
$!debug_log := get_debug_fn(&debug_log);
}
}
sub md_parse(str, uint32, MD_PARSER, Pointer is rw) is native('md4c') returns int { * }
my $parser = MD_PARSER.new;
my $md = '
# Heading
## Sub Heading
hello *world*
';
md_parse($md, $md.chars, $parser, Pointer.new);
The output:
./md4c.raku
enter block DOC
enter block H
text 'Heading'
leave block H
enter block H
text 'Sub Heading'
leave block H
enter block P
text 'hello '
enter span EM
text 'world'
leave span EM
leave block P
leave block DOC
In summary, it's possible. I'm not sure if I'm proud of this or horrified by it. I think a long-term solution will require refactoring the callback trampoline generator into a separate nqp op that can be exposed to Raku as a nativewrap
style operation.