
How can I use VTD-XML inside Perl with Inline::C?


I've recently discovered the power of the VTD-XML approach to XML parsing, mainly its speed. Just to be specific, I have built the C version 2.10 ( there are Java, C++ and C# implementations too ).

My objective is simple: I want to extract data from XML using VTD-XML for parsing, and use Perl to work with the data. The easy way may be to dump the data with a C program I wrote and send it through a pipe to the Perl program. Maybe not elegant, but it works.

Another, less easy way, consists of a Perl program that calls the C data collector subroutine using Inline::C.

So I started studying Inline::C and managed to do the basic things I need: passing data back to Perl from C subroutines using Perl C API functions. Problems arise at compile time, when I write the C collector subroutine in the C source under Inline::C control.

There are symbol conflicts like this: bind() is defined both in socket.h ( Perl ) and in autoPilot.h ( VTD-XML ). Symbol conflicts can be avoided by building VTD-XML as a shared library with an explicit export map ( gcc -Wl,-version-script=foo.map )... Is this the right way to go? Are there better ways?


Solution

  • I did reach my goal by adding a layer of indirection: awful as it seems to me, it works.

    First of all, I made a shared library containing the VTD-XML API. Building this shared object, I had to avoid global scope pollution, exporting only symbols needed.

    Then I built another shared library. This second shared library hides the VTD-XML API and is meant to be used from Perl via Inline::C. In this shared object I wrote a handful of functions that use the partially exposed API of libvtd.so.

    The idea looks like this:

    Perl -> Inline::C dynamic loader -> wrapper_API.so -> libvtd.so 
    

    Major issues came from runtime loading of shared libraries and from symbol collision/resolution.

    Here is how I built libvtd.so so that the so-called wrapper_API.so can use it easily.

    Unfortunately, the VTD-XML distribution doesn't build a libvtd.so shared object, so I had to build it myself by linking several .o object files together with gcc:

    gcc -shared -fPIC -Wl,-soname,libvtd.so.2.10 -Wl,--version-script=vtd-xml.map \
    -o libvtd.so.2.10 libvtd.o arrayList.o fastIntBuffer.o fastLongBuffer.o \
    contextBuffer.o vtdNav.o vtdGen.o autoPilot.o XMLChar.o XMLModifier.o intHash.o \
     bookMark.o indexHandler.o transcoder.o elementFragmentNs.o
    

    Symbol visibility was tuned with the linker option -Wl,--version-script=vtd-xml.map, where the map file is:

    {
        global:
            the_exception_context;
            toString;
            getText;
            getCurrentIndex;
            toNormalizedString;
            toElement;
            toElement2;
            createVTDGen;
            setDoc;    
            parse;
            getNav;
            freeVTDGen;
            freeVTDNav;
            getTokenCount;
        local:
            *;  
    };
    

    Global ( "exported" ) symbols are under the global: section, while the catchall * under local says all other symbols are only known locally.
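    Symbol visibility can also be controlled from the source side with GCC, by compiling with -fvisibility=hidden and marking only the public functions as visible. The sketch below is an illustration under assumptions, not what was done here: the macros API_EXPORT and API_LOCAL and the functions count_byte and demo_count_open_brackets are hypothetical names, not part of VTD-XML. Unlike the version script, this approach requires annotating the library sources.

    ```c
    #include <stddef.h>

    /* Hypothetical helper macros (not from VTD-XML). With GCC or Clang they
       expand to visibility attributes; elsewhere they expand to nothing. */
    #if defined(__GNUC__)
    #  define API_EXPORT __attribute__((visibility("default")))
    #  define API_LOCAL  __attribute__((visibility("hidden")))
    #else
    #  define API_EXPORT
    #  define API_LOCAL
    #endif

    /* Internal helper: when built into a shared object, a hidden symbol is
       not exported, so it cannot clash with symbols in the host process. */
    API_LOCAL size_t count_byte(const char *s, char c)
    {
        size_t n = 0;
        for (; *s != '\0'; ++s) {
            if (*s == c) {
                ++n;
            }
        }
        return n;
    }

    /* Public entry point: the only symbol meant to be exported. */
    API_EXPORT size_t demo_count_open_brackets(const char *xml)
    {
        return count_byte(xml, '<');
    }
    ```

    Built with gcc -shared -fPIC -fvisibility=hidden, only demo_count_open_brackets should show up as a GLOBAL entry in the dynamic symbol table; this can be checked with readelf in the same way as for the version script.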

    All object modules come from the VTD-XML distribution, with the exception of libvtd.o: this custom object was needed to address issues with the exception handling library cexcept.h. libvtd.c is only two lines of code:

    #include "customTypes.h"
    struct exception_context the_exception_context[ 1 ];
    

    In the compilation phase I had to adjust CFLAGS to produce Position Independent Code ( gcc -fPIC option ), which is required to build shared objects.

    The readelf tool was useful to check symbol visibility:

    readelf --syms libvtd.so.2.10

    Symbol table '.dynsym' contains 35 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
       ...
       280: 000000000000d010   117 FUNC    LOCAL  DEFAULT   12 writeIndex
       281: 000000000003c5d0   154 FUNC    LOCAL  DEFAULT   12 setCursorPosition
       282: 000000000003c1f0    56 FUNC    LOCAL  DEFAULT   12 resetIntHash
       ...
       331: 0000000000004f50  3545 FUNC    GLOBAL DEFAULT   12 toElement
       332: 00000000000071e0   224 FUNC    GLOBAL DEFAULT   12 getText
       333: 000000000000d420   114 FUNC    GLOBAL DEFAULT   12 freeVTDGen
       ...
       339: 000000000000b600   731 FUNC    GLOBAL DEFAULT   12 toElement2
       340: 000000000000e650   120 FUNC    GLOBAL DEFAULT   12 getNav
       341: 0000000000025750 70567 FUNC    GLOBAL DEFAULT   12 parse
    

    The wrapper_API.so consists of several functions that use the VTD-XML API and its custom types internally, but accept and return only standard C types and/or structs. The wrapper came straight from a former standalone C program.
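
    The shape of such a wrapper can be sketched as follows. This is a minimal illustration, not the actual wrapper code: every name in it (backend_doc, xw_handle, xw_open, xw_length, xw_copy_text, xw_close) is hypothetical, and backend_doc is a stand-in for whatever library-specific types the real wrapper keeps hidden. The point is that callers, such as C code under Inline::C, only see a pointer to the handle plus plain C types.

    ```c
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for a type owned by the hidden library. */
    typedef struct {
        char  *data;
        size_t len;
    } backend_doc;

    /* Handle given to callers. A real header would only forward-declare
       this struct, so that callers never see the library's types. */
    typedef struct xw_handle {
        backend_doc doc;
    } xw_handle;

    /* Create a handle holding a private copy of the document text. */
    xw_handle *xw_open(const char *xml)
    {
        xw_handle *h;
        if (xml == NULL) return NULL;
        h = malloc(sizeof *h);
        if (h == NULL) return NULL;
        h->doc.len = strlen(xml);
        h->doc.data = malloc(h->doc.len + 1);
        if (h->doc.data == NULL) { free(h); return NULL; }
        memcpy(h->doc.data, xml, h->doc.len + 1);
        return h;
    }

    /* Document length in bytes, or -1 for a NULL handle. */
    long xw_length(const xw_handle *h)
    {
        return h != NULL ? (long)h->doc.len : -1;
    }

    /* Copy at most `count` bytes starting at `offset` into `out`, always
       NUL-terminated. Returns the number of bytes copied, or -1 on bad input. */
    long xw_copy_text(const xw_handle *h, size_t offset, size_t count,
                      char *out, size_t out_size)
    {
        size_t n;
        if (h == NULL || out == NULL || out_size == 0 || offset > h->doc.len)
            return -1;
        n = h->doc.len - offset;
        if (n > count) n = count;
        if (n > out_size - 1) n = out_size - 1;
        memcpy(out, h->doc.data + offset, n);
        out[n] = '\0';
        return (long)n;
    }

    /* Release the handle and everything it owns. */
    void xw_close(xw_handle *h)
    {
        if (h != NULL) {
            free(h->doc.data);
            free(h);
        }
    }
    ```

    Because results come back as integers and caller-supplied char buffers, the Perl side does not need to include any of the library headers, which is what keeps conflicting declarations such as bind() out of the Inline::C compilation unit.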