
SimString (Python) installation on Windows


I am trying to install the simstring Python wrapper on Windows from https://github.com/Georgetown-IR-Lab/simstring. On Linux it works fine, but on Windows I get an error while installing.

    D:\Users\source\repos>python setup.py install
    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.12.25827\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

After this I added iconv.h to the project, but now it shows a different error.

    running install
    running build
    running build_py
    running build_ext
    building '_simstring' extension
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win-amd64-3.6\Release\export.obj
    export.cpp
    d:\users\aki\source\repos\simstring\cdbpp.h(101): warning C4267: 'initializing': conversion from 'size_t' to 'uint32_t', possible loss of data
    export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
    export.cpp(37): note: Conversion loses qualifiers
    export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::string,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
            with
            [
                source_type=std::string,
                destination_type=std::wstring
            ]
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.12.25827\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

Any help or guidance is appreciated.


Solution

  • Ground notes:

    • I managed to get through the build process, but I got stuck at one point, which led me to create [SO]: Compile error for (char based) STL (stream) containers in Visual Studio (I spent quite some time on that issue). I got that working somehow, but there were other (similar?) errors when trying to build SimString, so I had to strip out some (Nix based) code that didn't compile

    • SimString is written in C++. When C++ (C) code is built on Win, the result is a PE (Portable Executable) file (.exe, .dll). Check [SO]: LNK2005 Error in CLR Windows Form (@CristiFati's answer) for more details regarding how code gets transformed. When dealing with an .exe that depends on (loads) .dlls, there are certain restrictions:

      • The architecture of the .exe (in this case python.exe), 032bit (pc032, x86) vs. 064bit (pc064, x64 or AMD64), must match that of every .dll in its dependency tree (any .dll that it loads, any .dll that a loaded .dll loads, and so on); otherwise the .dll won't load

      • The build configuration (Debug vs. Release) should match in some cases. Here's what could happen if it didn't: [SO]: When using fstream in a library I get linker errors in the executable (@CristiFati's answer), but I don't think that we are in that situation

      • The build tools should also match in some (other) cases. Examples:

        • Compiler type ([SO]: Python extensions with C: staticforward (@CristiFati's answer))

        • The CRT runtime ([SO]: Errors when linking to protobuf 3 on MS Visual C (@CristiFati's answer))

        • The CRT runtime version is important in our case. Check [Python.Wiki]: WindowsCompilers for compatibilities between Python and VStudio versions. Note that this only applies for Python versions downloaded and installed (if you built your Python from sources, then you should use the same build tool - but I guess it's not the case here)

          • I see you are using VStudio 2017, so the compatible versions are Python 3.5 and Python 3.6 (1). I have ~10 Python installations on my machine (some installed, some built by me with different compilers; most of them are pc064, and I also have some VEnvs, but that shouldn't make any difference). I also have 5 VStudio versions installed; in my case, setup.py automatically selects VStudio 2015 (but that's OK since, like VStudio 2017, it has a v14.x compiler toolset)
      • SimString depends on LibIconv, which also comes as a .dll (actually there are more, but we only care about one). Checking the .dll (see below) with Dependency Walker reveals that it's x86 (pc032) (2). That means that either:

        • Python 032bit (x86) should be used. This is the variant that I'm going with. Given (1) and (2), the only suitable version on my machine is Python 3.6 pc032 (Python 3.5 is my version of choice and I also have it in 032bit format, but I messed it up and didn't reinstall it)

        • LibIconv should be built from source (for pc064), which gets rid of restriction (2). But that could take time, and it's outside the scope of the current question. If there is a question about building it, I'll take some time and give it a shot, as I enjoy that kind of task ([SO]: How to build a DLL version of libjpeg 9b? (@CristiFati's answer))
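
The architecture restriction above can also be checked without Dependency Walker. Below is a minimal sketch (not part of the original answer) that reports the bitness of the running Python and reads the Machine field from a PE file's header. The function names python_bits and pe_machine are hypothetical, and only the two machine values relevant here are mapped.

```python
import struct

# Machine field values from the PE/COFF file header (only the two that matter here).
PE_MACHINES = {0x014C: "x86 (pc032)", 0x8664: "x64 (pc064)"}


def python_bits():
    """Return the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def pe_machine(path):
    """Return the architecture a PE file (.exe, .dll, .pyd) was built for."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file: missing MZ signature")
        # Offset 0x3C of the DOS header stores the file offset of the PE header.
        f.seek(0x3C)
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a PE file: missing PE signature")
        # The 2 byte Machine field comes right after the 4 byte PE signature.
        (machine,) = struct.unpack("<H", f.read(2))
    return PE_MACHINES.get(machine, hex(machine))


if __name__ == "__main__":
    print("Python:", python_bits(), "bit")
```

Calling pe_machine on the libiconv2.dll from the bin dir (see the walkthrough) and comparing the result with python_bits() should show whether the two can work together.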

    Walkthrough:

    • Create a dir and CD to it (should be empty). This will be the %ROOT_DIR%, and all the paths that I'm going to use will be relative to it (except of course for absolute ones), and this will be the default dir (when unspecified)

    • Download SimString sources ([GitHub]: Georgetown-IR-Lab/simstring - simstring-master.zip)

    • Unzip the archive; its content will end up in a dir named simstring-master (created automatically)

    • Create a dir libiconv. Inside it, download:

      1. [SourceForge]: gnuwin32/GnuWin - libiconv-1.9.2-1-lib.zip

      2. [SourceForge]: gnuwin32/GnuWin - libiconv-1.9.2-1-bin.zip

      3. Extract needed stuff from these files:

        • From #1.:

          • include dir - used at compile phase

          • lib dir - used at link phase

          • Both phases are performed by setup.py (below)

        • From #2.:

          • bin dir - used at runtime (when using (importing) the module)
    • CD to the simstring-master dir. To build the extension, I'm using setup.py's build_ext command (invoked indirectly by install, as seen in your output): [Python 3.Docs]: distutils.command.build_ext - Build any extensions in a package

    • Running build_ext will yield your error:

      export.cpp(7): fatal error C1083: Cannot open include file: 'iconv.h': No such file or directory
      

      That is because Python build system doesn't know what we did (in the libiconv dir). To let it know, pass the:

      1. -I (--include-dirs) - will be translated to [MS.Docs]: /I (Additional include directories)

      2. -L (--library-dirs) - will be translated to [MS.Docs]: /LIBPATH (Additional Libpath)

      3. -l (--libraries) - will be translated to [MS.Docs]: LINK Input Files

      flags (python setup.py build_ext --help will display all of them). For now, don't pass #2. and #3. because we won't get to the link phase (where they are required):

      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext -I"../libiconv/include"
      running build_ext
      building '_simstring' extension
      C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python\3.6\include -Ic:\Install\x86\Python\Python\3.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win32-3.6\Release\export.obj
      export.cpp
      export.cpp(112): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
      export.cpp(112): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
      export.cpp(126): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
      export.cpp(126): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
      export.cpp(37): error C2664: 'size_t libiconv(libiconv_t,const char **,size_t *,char **,size_t *)': cannot convert argument 2 from 'char **' to 'const char **'
      export.cpp(37): note: Conversion loses qualifiers
      export.cpp(140): note: see reference to function template instantiation 'bool iconv_convert<std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::wstring>(libiconv_t,const source_type &,destination_type &)' being compiled
      with
      [
          source_type=std::basic_string<char,std::char_traits<char>,std::allocator<char>>,
          destination_type=std::wstring
      ]
      error: command 'C:\\Install\\x86\\Microsoft\\Visual Studio Community\\2015\\VC\\BIN\\cl.exe' failed with exit status 2
      
    • Things to do (found by fixing the errors one by one; only export.cpp required changes):

      1. #define ICONV_CONST const (this libiconv declares the source buffer as const char **, and C++ doesn't implicitly convert char ** to const char **)

      2. #define __SIZEOF_WCHAR_T__ 2 (as sizeof(wchar_t) is 2)

      3. Strip out the code that doesn't compile (the one I mentioned at the beginning): STL containers with 4 byte chars don't compile on Win. I wanted to fix the code so that it compiles OOTB once Win supports such chars, but I wasn't able to, so I had to do whatever was done for OSX. As a consequence, #ifdef __APPLE__ should be replaced by #if defined(__APPLE__) || defined(WIN32) (5 occurrences)

      Note that #1. and #2. could (should) be done either from the cmdline (-D flag, but I wasn't able to specify a value for a defined flag), or in setup.py (so that they are defined only once, even if they are needed in lots of files), but I didn't spend too much time on it, so I'm defining them directly in the source code.

      Either apply the changes manually, or save:

      --- export.cpp.orig 2016-11-30 18:53:32.000000000 +0200
      +++ export.cpp  2018-02-14 13:36:31.317953200 +0200
      @@ -19,9 +19,18 @@
       #endif/*USE_LIBICONV_GNU*/
      
       #ifndef ICONV_CONST
      +#if defined (WIN32)
      +#define ICONV_CONST const
      +#else
       #define ICONV_CONST
      +#endif
       #endif/*ICONV_CONST*/
      
      +#if defined (WIN32)
      +#define __SIZEOF_WCHAR_T__ 2
      +#endif
      +
      +
       template <class source_type, class destination_type>
       bool iconv_convert(iconv_t cd, const source_type& src, destination_type& dst)
       {
      @@ -269,7 +278,7 @@
           iconv_close(bwd);
       }
      
      -#ifdef __APPLE__
      +#if defined(__APPLE__) || defined(WIN32)
       #include <cassert>
       #endif
      
      @@ -283,7 +292,7 @@
               retrieve_thru(dbr, query, this->measure, this->threshold, std::back_inserter(ret));
               break;
           case 2:
      -#ifdef __APPLE__
      +#if defined(__APPLE__) || defined(WIN32)
       #if __SIZEOF_WCHAR_T__ == 2
               retrieve_iconv<wchar_t>(dbr, query, UTF16, this->measure, this->threshold, std::back_inserter(ret));
       #else
      @@ -294,7 +303,7 @@
       #endif
               break;
           case 4:
      -#ifdef __APPLE__
      +#if defined(__APPLE__) || defined(WIN32)
       #if __SIZEOF_WCHAR_T__ == 4
               retrieve_iconv<wchar_t>(dbr, query, UTF32, this->measure, this->threshold, std::back_inserter(ret));
       #else
      @@ -317,7 +326,7 @@
               std::string qstr = query;
               return dbr.check(qstr, translate_measure(this->measure), this->threshold);
           } else if (dbr.char_size() == 2) {
      -#ifdef __APPLE__
      +#if defined(__APPLE__) || defined(WIN32)
       #if __SIZEOF_WCHAR_T__ == 2
               std::basic_string<wchar_t> qstr;
       #else
      @@ -333,7 +342,7 @@
               iconv_close(fwd);
               return dbr.check(qstr, translate_measure(this->measure), this->threshold);
           } else if (dbr.char_size() == 4) {
      -#ifdef __APPLE__
      +#if defined(__APPLE__) || defined(WIN32)
       #if __SIZEOF_WCHAR_T__ == 4
               std::basic_string<wchar_t> qstr;
       #else
      

      as simstring_win.diff. That is a diff. See [SO]: Run / Debug a Django application's UnitTests from the mouse right click context menu in PyCharm Community Edition? (@CristiFati's answer) (Patching UTRunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out).
      I also submitted this patch to [GitHub]: Georgetown-IR-Lab/simstring - Support for Win, and it was merged on 180222.

      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"c:\Install\x64\Cygwin\Cygwin\AllVers\bin\patch.exe" -i "../simstring_win.diff"
      patching file export.cpp
      
      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>rem Looking at export.cpp content, you'll notice the changes
      
      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" setup.py build_ext  -I"../libiconv/include" -L"../libiconv/lib" -llibiconv
      running build_ext
      building '_simstring' extension
      C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python\3.6\include -Ic:\Install\x86\Python\Python\3.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport.cpp /Fobuild\temp.win32-3.6\Release\export.obj
      export.cpp
      export.cpp(121): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
      export.cpp(121): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
      export.cpp(135): warning C4297: 'writer::~writer': function assumed not to throw an exception but does
      export.cpp(135): note: destructor or deallocator has a (possibly implicit) non-throwing exception specification
      C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I. -I../libiconv/include -Ic:\Install\x86\Python\Python\3.6\include -Ic:\Install\x86\Python\Python\3.6\include "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\INCLUDE" "-IC:\Install\x86\Microsoft\Visual Studio Community\2015\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /EHsc /Tpexport_wrap.cpp /Fobuild\temp.win32-3.6\Release\export_wrap.obj
      export_wrap.cpp
      C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\BIN\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:c:\Install\x86\Python\Python\3.6\Libs /LIBPATH:../libiconv/lib /LIBPATH:e:\Work\Dev\VEnvs\py36x86_test\libs /LIBPATH:e:\Work\Dev\VEnvs\py36x86_test\PCbuild\win32 "/LIBPATH:C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\LIB" "/LIBPATH:C:\Install\x86\Microsoft\Visual Studio Community\2015\VC\ATLMFC\LIB" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.16299.0\ucrt\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.16299.0\um\x86" libiconv.lib /EXPORT:PyInit__simstring build\temp.win32-3.6\Release\export.obj build\temp.win32-3.6\Release\export_wrap.obj /OUT:build\lib.win32-3.6\_simstring.cp36-win32.pyd /IMPLIB:build\temp.win32-3.6\Release\_simstring.cp36-win32.lib
         Creating library build\temp.win32-3.6\Release\_simstring.cp36-win32.lib and object build\temp.win32-3.6\Release\_simstring.cp36-win32.exp
      Generating code
      Finished generating code
      
      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>dir /b "build\lib.win32-3.6"
      _simstring.cp36-win32.pyd
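
If no patch.exe is at hand, the same three source changes could be applied with a few lines of Python. This is only a sketch: it assumes export.cpp contains exactly the strings shown in the diff above, the function name patch_export_cpp is hypothetical, and the result matches the diff up to blank lines.

```python
OLD_GUARD = "#ifndef ICONV_CONST\n#define ICONV_CONST\n#endif/*ICONV_CONST*/\n"
NEW_GUARD = (
    "#ifndef ICONV_CONST\n"
    "#if defined (WIN32)\n"
    "#define ICONV_CONST const\n"
    "#else\n"
    "#define ICONV_CONST\n"
    "#endif\n"
    "#endif/*ICONV_CONST*/\n"
    "\n"
    "#if defined (WIN32)\n"
    "#define __SIZEOF_WCHAR_T__ 2\n"
    "#endif\n"
)
OLD_IFDEF = "#ifdef __APPLE__"
NEW_IFDEF = "#if defined(__APPLE__) || defined(WIN32)"


def patch_export_cpp(text):
    """Return the content of export.cpp with changes #1., #2. and #3. applied."""
    if OLD_GUARD not in text:
        raise ValueError("ICONV_CONST guard not found (file already patched?)")
    # Changes #1. and #2.: extend the ICONV_CONST guard, define the wchar_t size.
    text = text.replace(OLD_GUARD, NEW_GUARD, 1)
    # Change #3.: take the OSX code path on Win as well (5 occurrences expected).
    return text.replace(OLD_IFDEF, NEW_IFDEF)
```

Usage would be along the lines of reading export.cpp (in text mode, so line endings are normalized), passing its content through patch_export_cpp, and writing the result back after keeping a copy of the original.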
      
    • Finally, it built. The .pyd is just a .dll. This is what it looks like in Dependency Walker:

      [Image: _simstring.pyd opened in Dependency Walker]

    • Let's try to see if we can use it:

      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
      Traceback (most recent call last):
        File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 18, in swig_import_helper
          fp, pathname, description = imp.find_module('_simstring', [dirname(__file__)])
        File "e:\Work\Dev\VEnvs\py36x86_test\lib\imp.py", line 296, in find_module
          raise ImportError(_ERR_MSG.format(name), name=name)
      ImportError: No module named '_simstring'
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "sample.py", line 3, in <module>
          import simstring
        File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 28, in <module>
          _simstring = swig_import_helper()
        File "E:\Work\Dev\StackOverflow\q048528041\simstring-master\simstring.py", line 20, in swig_import_helper
          import _simstring
      ModuleNotFoundError: No module named '_simstring'
      

      That is because when importing simstring, which in turn imports _simstring (the .pyd), Python doesn't find the latter. To fix this:

      • Add the .pyd path to %PYTHONPATH%

      • As seen in the pic, the .pyd depends on libiconv2.dll, so the OS must know where to look for it. The simplest way is to add its path to %PATH% ([MS.Docs]: Dynamic-Link Library Search Order)

      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>set PYTHONPATH=%PYTHONPATH%;build\lib.win32-3.6
      
      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>set PATH=%PATH%;..\libiconv\bin
      
      (py36x86_test) E:\Work\Dev\StackOverflow\q048528041\simstring-master>"e:\Work\Dev\VEnvs\py36x86_test\Scripts\python.exe" sample.py
      ('Barack Hussein Obama II',)
      ('James Gordon Brown',)
      ()
      ('Barack Hussein Obama II',)
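
The two set commands above could also be done from Python, before importing simstring. The sketch below is an assumption-laden alternative, not what was run above: prepare_import is a hypothetical helper name. Note that starting with Python 3.8, %PATH% is no longer searched for the dependencies of extension modules on Win, so os.add_dll_directory is used when it exists; on older versions (like the 3.6 used here) the sketch falls back to extending %PATH%.

```python
import os
import sys


def prepare_import(pyd_dir, dll_dir):
    """Make a built extension module and its dependent .dlls findable.

    pyd_dir: dir holding the .pyd (the role played by %PYTHONPATH% above).
    dll_dir: dir holding libiconv2.dll (the role played by %PATH% above).
    """
    pyd_dir = os.path.abspath(pyd_dir)
    dll_dir = os.path.abspath(dll_dir)
    if pyd_dir not in sys.path:
        sys.path.insert(0, pyd_dir)
    if hasattr(os, "add_dll_directory"):
        # Win, Python 3.8+: register the .dll dir explicitly.
        return os.add_dll_directory(dll_dir)
    # Older Python versions: extend %PATH% for the current process only.
    os.environ["PATH"] = dll_dir + os.pathsep + os.environ.get("PATH", "")
    return None
```

With the layout from this walkthrough, that would mean calling prepare_import("build/lib.win32-3.6", "../libiconv/bin") from the simstring-master dir, followed by import simstring.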
      

    Final notes:

    • The module produces some output, identical to the one on Nix (Ubuntu), where I also built it (and encountered no problems there). I'm not sure whether it's semantically correct or not

    • I didn't run setup.py's install command (and I'm not going to). One thing that I can think of that could go wrong (although I'm not sure it will) is libiconv2.dll not being copied / included into the .whl. If so, you'll probably need to modify setup.py (the changes should be minor)
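
Regarding moving the build settings into setup.py: the include / library dirs and the two macros (#1. and #2. from the walkthrough) could be passed to the Extension object instead of the cmdline and the source code. What follows is an untested sketch under stated assumptions: simstring_ext_kwargs is a hypothetical helper, the real setup.py may be organized differently, and the #ifdef __APPLE__ replacements (#3.) would still have to be made in export.cpp. It doesn't cover shipping libiconv2.dll.

```python
import sys


def simstring_ext_kwargs(iconv_root="../libiconv", windows=None):
    """Build keyword arguments for the _simstring Extension (hypothetical helper)."""
    if windows is None:
        windows = sys.platform == "win32"
    # The two sources are the ones compiled in the build output above.
    kwargs = {"sources": ["export.cpp", "export_wrap.cpp"]}
    if windows:
        kwargs.update(
            include_dirs=[iconv_root + "/include"],  # same as -I
            library_dirs=[iconv_root + "/lib"],      # same as -L
            libraries=["libiconv"],                  # same as -l
            # (name, value) tuples end up as /DNAME=VALUE for cl.exe.
            define_macros=[
                ("ICONV_CONST", "const"),
                ("__SIZEOF_WCHAR_T__", "2"),
            ],
        )
    return kwargs
```

In setup.py, these would then be used as Extension("_simstring", **simstring_ext_kwargs()). Defining ICONV_CONST this way relies on the #ifndef ICONV_CONST guard that export.cpp already has.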