
Compiling python project with external modules to machine code using Cython


I am looking for a way to protect the source code of my project so that I can distribute it. All of my projects target Windows exclusively.

I tried Nuitka and Pyarmor for this purpose, but neither solution fully suits me. I'm willing to put up with some of the inconveniences of packaging into a single executable file, but I need to understand whether this is even possible with Cython.

I've read a lot of forum threads and looked for tutorials on YouTube, but everything I found relates to projects involving complex mathematical calculations.

I don't care about the speed of the project. All I need is to compile the entire project into a single Windows executable file from which the Python source code cannot be recovered.

For instance, how can I use Cython to build a machine-code binary of this simple Python Telegram bot, which depends on the external library Pyrogram?

from pyrogram import Client, types, errors, filters

API_ID = 12345678
API_HASH = 'abcdefgijklmnop1234567890'

app = Client('account', API_ID, API_HASH)

@app.on_message(filters.outgoing & filters.text)
def handle_message(client: Client, message: types.Message):
    print(message.text)

app.run()

I tried using Cython to translate the project to C code first and then build an executable from it, but I constantly get ModuleNotFoundError when I try to run the built file.

However, when there are no external modules, the built executable works as long as Python is installed on the computer. Is it possible to at least build a standalone executable that uses only the Python standard library, without external modules?

I had almost given up, but some developers on YouTube claim that this is possible, although they provide no evidence or tutorials.


Solution

  • Thanks to GPT-4o I managed to solve my problem, and I am excited to share the solution with anyone wondering how to use Cython to keep Python sources secret and distribute standalone executables without exposing the source code.

    Step 1.

    We put all sensitive source code in bot.pyx and add a start function there, which we will import into the entry-point file, main.py.

    # bot.pyx
    from pyrogram import Client, types, errors, filters
    import pyrogram, pyaes
    
    API_ID = 12345678
    API_HASH = 'abcdefgijklmnop1234567890'
    
    app = Client('account', API_ID, API_HASH)
    
    @app.on_message(filters.outgoing & filters.text)
    def handle_message(client: Client, message: types.Message):
        print(message.text)
    
    def start():
        app.run()
    

    Step 2.

    We create a setup.py file to build our sources into a .pyd binary with Cython.

    # setup.py
    from setuptools import setup
    from Cython.Build import cythonize
    import os
    
    os.environ['PYTHONDEVMODE'] = '1'
    
    setup(
        ext_modules=cythonize('bot.pyx'),
    )
    

    Run this command:

    python setup.py build_ext --inplace
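Before packaging, it can help to confirm that `bot` resolves to the compiled extension rather than to a stray source file. The sketch below is only an illustration and is not part of the original steps; the helper names `is_compiled_extension` and `find_module_file` are hypothetical, and it relies only on the standard library.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def is_compiled_extension(path: str) -> bool:
    """Return True if the file name ends with one of this interpreter's
    extension-module suffixes (for example a .pyd suffix on Windows)."""
    name = Path(path).name
    return any(name.endswith(suffix)
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


def find_module_file(module_name: str):
    """Locate a top-level module without importing it.

    Returns the file path reported by the import system, or None if the
    module cannot be found.
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# Hypothetical usage, run from the folder that contains the built binary:
#     origin = find_module_file("bot")
#     print(origin, is_compiled_extension(origin) if origin else "not built")
```

If the reported path ends in `.py` or `.pyx` instead of a binary suffix, the build step probably did not produce the extension in the current folder.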

    Step 3.

    We create the entry-point file main.py, which calls our start function; that function in turn runs the compiled code.

    # main.py
    from bot import start # will be imported from `bot.cp311-win_amd64.pyd`
    import pyrogram, pyaes
    
    if __name__ == "__main__":
        start()
    

    Make sure you import all dependencies in main.py. In my case these are pyrogram and pyaes. PyInstaller cannot see the imports inside the compiled .pyd, so without them you will get ModuleNotFoundError or ImportError when you run the executable.

    Step 4.

    We build an executable with PyInstaller, not forgetting to include our .pyd binary.

    pyinstaller --onefile --add-binary "bot.cp311-win_amd64.pyd;." main.py
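The file name `bot.cp311-win_amd64.pyd` depends on the Python version and platform, so hard-coding it can break on another interpreter. As a rough sketch (not the method used above), the same build could be driven from Python. The function names `pyinstaller_args` and `build` are hypothetical, and the sketch assumes the extension was built by the same interpreter that runs the script. It also uses PyInstaller's `--hidden-import` option, an alternative to listing the dependencies in main.py.

```python
import importlib.machinery
import os


def pyinstaller_args(entry="main.py", module="bot",
                     hidden_imports=("pyrogram", "pyaes")):
    """Build the PyInstaller argument list for the compiled module."""
    # First suffix is the fully tagged one, e.g. ".cp311-win_amd64.pyd"
    # on 64-bit Windows with CPython 3.11 (assumption: same interpreter
    # as the one that ran the Cython build).
    binary = module + importlib.machinery.EXTENSION_SUFFIXES[0]
    # os.pathsep is ";" on Windows, matching the command shown above.
    args = ["--onefile", "--add-binary", f"{binary}{os.pathsep}."]
    for name in hidden_imports:
        # Tell PyInstaller about modules imported only inside the .pyd.
        args += ["--hidden-import", name]
    args.append(entry)
    return args


def build():
    """Run PyInstaller in-process; requires PyInstaller to be installed."""
    import PyInstaller.__main__
    PyInstaller.__main__.run(pyinstaller_args())


# Hypothetical usage: call build() from the project folder after Step 2.
```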

    That's it! We now have a standalone dist/main.exe file in which all sources were first translated to bot.c and then compiled to the bot.cp311-win_amd64.pyd binary.

    You can embed any logic in the compiled code with far less risk of it being read or modified. For example, if you have a commercial interest, you can bind the executable so that it works only on a certain machine.
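A minimal sketch of such a machine binding, assuming the check lives inside bot.pyx so that it is compiled along with the rest of the code. The names `machine_fingerprint` and `is_authorized` are hypothetical. `uuid.getnode()` usually returns the MAC address, but it can fall back to a random value and a MAC address can be spoofed, so treat this as a deterrent rather than a guarantee.

```python
import hashlib
import uuid


def machine_fingerprint() -> str:
    """Hash the hardware address reported by uuid.getnode()."""
    node = uuid.getnode()
    return hashlib.sha256(str(node).encode("ascii")).hexdigest()


def is_authorized(allowed_fingerprints, fingerprint=None) -> bool:
    """Check the current (or a given) fingerprint against an allow-list."""
    if fingerprint is None:
        fingerprint = machine_fingerprint()
    return fingerprint in allowed_fingerprints


# Hypothetical usage inside start():
#     if not is_authorized({"<fingerprint of the customer's machine>"}):
#         raise SystemExit("This build is not licensed for this machine.")
```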

    Proof of concept:

    I extracted all files from dist/main.exe using pyinstxtractor.py (you can find it here)

    python pyinstxtractor.py dist/main.exe

    I checked what had been extracted to the main.exe_extracted folder: there were only main.pyc, bot.cp311-win_amd64.pyd and other Python files unrelated to my source code.

    I decompiled the main.pyc bytecode to make sure it contains none of my main sources. I used the free PyLingual website (https://pylingual.io).

    And here is what we got:

    # Decompiled with PyLingual (https://pylingual.io)
    # Internal filename: main.py
    # Bytecode version: 3.11a7e (3495)
    # Source timestamp: 1970-01-01 00:00:00 UTC (0)
    
    from bot import start
    import pyrogram
    import pyaes
    if __name__ == '__main__':
        start()
    

    So no one can recover our original Python sources by decompiling bytecode, because they were converted to C code and then compiled to the binary .pyd file.

    But be aware that strings, in my case the one stored in the API_HASH variable, can still be extracted from .pyd binaries.

    Thus, to protect strings it is better to build them at runtime, e.g.
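To see which strings are readable in a built binary, one can scan it for runs of printable ASCII, much like the Unix `strings` tool. The helper `printable_strings` below is a hypothetical sketch; whether a given literal appears in plain form may depend on the Cython version and compiler settings.

```python
import re


def printable_strings(data: bytes, min_length: int = 6):
    """Return every run of at least min_length printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical usage against the built extension:
#     from pathlib import Path
#     data = Path("bot.cp311-win_amd64.pyd").read_bytes()
#     print([s for s in printable_strings(data) if "abcdefg" in s])
```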

    ...
    API_HASH = ''.join(['y','o','u','r','_','s','t','r','i','n','g'])
    ...
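Joining single characters keeps the whole string out of the binary, but the pieces are still stored there. A slightly stronger variant, sketched below under the assumption that light obfuscation is enough, stores the value XOR-ed with a key and decodes it at runtime. The names `xor_bytes`, `encode_secret` and `decode_secret` are hypothetical. This is obfuscation, not encryption: the key ships in the same binary, so a determined reverse engineer can still recover the value.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encode_secret(secret: str, key: bytes) -> list:
    """Run once at development time; paste the resulting list into bot.pyx."""
    return list(xor_bytes(secret.encode("utf-8"), key))


def decode_secret(encoded, key: bytes) -> str:
    """Run at program start to rebuild the original string in memory."""
    return xor_bytes(bytes(encoded), key).decode("utf-8")


# Hypothetical usage:
#     KEY = b"any-key"                              # illustrative value
#     ENCODED = encode_secret("your_string", KEY)   # do this offline
#     API_HASH = decode_secret(ENCODED, KEY)        # do this in bot.pyx
```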