Tags: c#, .net, dll, ikvm

.net: is dependency loading for DLLs different than for EXEs?


I have a very strange problem. I did some rather crazy stuff: using IKVM, I converted a fat uber-jar of Hadoop libraries (assembled with the sbt-assembly plugin) into a DLL. I wrote a small test program which boils down to the following:

var u = new java.net.URI("hdfs://my-namenode:8020/");
var fs = org.apache.hadoop.fs.FileSystem.get(u, new org.apache.hadoop.conf.Configuration());
foreach(var s in fs.listStatus(new org.apache.hadoop.fs.Path("/"))) {
    Console.WriteLine(s.getPath().toString());
}

When I run this in a console application with my hadoop.dll and the required IKVM DLLs added as references, it lists the contents of my HDFS.

However, when I wrap exactly this code in a DLL, add the SAME dependencies to that DLL, and call it from my console application, I get:

No FileSystem for scheme: hdfs

When I specify the correct class name in my Hadoop conf via the fs.hdfs.impl key, I get a ClassNotFoundException.

Are dependencies resolved differently in executables than in DLLs, or might this be IKVM-specific behaviour?

EDIT: Another strange behaviour: when I construct a FileSystem once in my console application and THEN call the method in the DLL, it runs.
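
Roughly, the call pattern that works looks like this; HdfsLister is just an illustrative name standing in for the class inside my wrapper DLL, not real code from my project:

using System;

// Illustrative sketch only; HdfsLister stands in for the class inside the wrapper DLL.
public static class HdfsLister
{
    public static void ListRoot()
    {
        var u = new java.net.URI("hdfs://my-namenode:8020/");
        var fs = org.apache.hadoop.fs.FileSystem.get(u, new org.apache.hadoop.conf.Configuration());
        foreach (var s in fs.listStatus(new org.apache.hadoop.fs.Path("/")))
            Console.WriteLine(s.getPath().toString());
    }
}

// Console application:
public static class Program
{
    public static void Main()
    {
        // Calling HdfsLister.ListRoot() directly fails with "No FileSystem for scheme: hdfs",
        // but constructing a FileSystem here first makes the subsequent DLL call succeed.
        org.apache.hadoop.fs.FileSystem.get(
            new java.net.URI("hdfs://my-namenode:8020/"),
            new org.apache.hadoop.conf.Configuration());

        HdfsLister.ListRoot();
    }
}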


Solution

  • I found the answer myself (again...)

    It has nothing to do with how .NET handles dependency loading; it is about how IKVM (and, with that, Java) handles dynamic class loading.

    I dug around in the Hadoop source code and found the following bit in the Configuration class:

    private ClassLoader classLoader;
    {
      classLoader = Thread.currentThread().getContextClassLoader();
      if (classLoader == null) {
        classLoader = Configuration.class.getClassLoader();
      }
    }
    

    The line classLoader = Thread.currentThread().getContextClassLoader(); is of special interest here. The context class loader of my console application is the application's own loading context, which has no reference to any of the Hadoop classes - hence the ClassNotFoundException when explicitly setting fs.hdfs.impl to org.apache.hadoop.hdfs.DistributedFileSystem.
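
    To make the mismatch visible from the C# side, one can print both loaders. This is just a diagnostic sketch of my own, assuming the IKVM core libraries are referenced so that java.lang.Thread is available:

    // Diagnostic sketch (not from the original code): compare the two class loaders involved.
    var contextLoader = java.lang.Thread.currentThread().getContextClassLoader();
    var conf = new org.apache.hadoop.conf.Configuration();
    var hadoopLoader = conf.getClass().getClassLoader();

    Console.WriteLine("context class loader: " + contextLoader);   // the console application's context
    Console.WriteLine("hadoop class loader:  " + hadoopLoader);    // the loader that knows the Hadoop classes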

    Fortunately, the Configuration class has a setClassLoader method, so doing this when constructing the configuration:

    var conf = new org.apache.hadoop.conf.Configuration();
    conf.setClassLoader(conf.getClass().getClassLoader());
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
    

    it works! This is because conf.getClass().getClassLoader() returns the class loader that loaded conf's class - i.e., the uber-jar converted to hadoop.dll, which does contain the class.
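
    Putting this together, the body of the DLL-side method looks roughly like this (a sketch, not verbatim from my project):

    // Inside the wrapper DLL: build the Configuration with the right class loader first.
    var conf = new org.apache.hadoop.conf.Configuration();
    conf.setClassLoader(conf.getClass().getClassLoader());            // loader of the converted hadoop.dll
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

    var fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://my-namenode:8020/"), conf);
    foreach (var s in fs.listStatus(new org.apache.hadoop.fs.Path("/")))
    {
        Console.WriteLine(s.getPath().toString());                    // now lists HDFS from inside the DLL
    }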

    It is still necessary to state the file system classes explicitly with fs.XXXX.impl, though, because the automatic file system resolution mechanism looks like this:

    private static void loadFileSystems() {
      synchronized (FileSystem.class) {
        if (!FILE_SYSTEMS_LOADED) {
          ServiceLoader<FileSystem> serviceLoader = ServiceLoader.load(FileSystem.class);
          for (FileSystem fs : serviceLoader) {
            SERVICE_FILE_SYSTEMS.put(fs.getScheme(), fs.getClass());
          }
          FILE_SYSTEMS_LOADED = true;
        }
      }
    }

    As you can see, the file systems are resolved here:

    ServiceLoader<FileSystem> serviceLoader = ServiceLoader.load(FileSystem.class);
    

    this method again uses Thread.currentThread().getContextClassLoader(), which in this case is the class loader of my console application - and that loader does not know the Hadoop classes.
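
    An alternative I have not verified in my setup: set the thread's context class loader to the Hadoop-aware loader before touching FileSystem, so that the ServiceLoader-based resolution above can also find the file systems:

    // Untested alternative sketch: make the context class loader point at the Hadoop classes.
    var conf = new org.apache.hadoop.conf.Configuration();
    java.lang.Thread.currentThread().setContextClassLoader(conf.getClass().getClassLoader());
    var fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://my-namenode:8020/"), conf);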

    So, tl;dr: after creating the Configuration, manually set its class loader to the class loader of the DLL that actually contains the Hadoop classes.