When we write Cython code (with types), it is eventually compiled like any C code, and the source cannot be recovered (except by disassembling, which is then similar to disassembling C code), as discussed in "Are executables produced with Cython really free of the source code?".
But what happens when we write "normal Python code" (interpreted-style code without types) in a Cython .pyx file and produce an executable? How much of it will be visible in the strings of the executable?
Example:
import bottle, random, json
app = bottle.Bottle()
@bottle.route('/')
def index():
    return 'hello'
@bottle.route('/random')
def testrand():
    return str(random.randint(0, 100))
@bottle.route('/jsontest')
def testjson():
    x = json.loads('{ "1": "2" }')
    return 'done'
bottle.run()
In this case, I see the following in test.c:
static const char __pyx_k_1_2[] = "{ \"1\": \"2\" }";
static const char __pyx_k_json[] = "json";
static const char __pyx_k_main[] = "__main__";
static const char __pyx_k_name[] = "__name__";
static const char __pyx_k_test[] = "__test__";
static const char __pyx_k_loads[] = "loads";
static const char __pyx_k_import[] = "__import__";
static const char __pyx_k_cline_in_traceback[] = "cline_in_traceback";
So in this second case, won't all these strings be easily visible in the executable?
In general, you won't be able to avoid having those strings in the resulting executable. This is just how Python works: they are needed at run time.
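To check what a strings-style dump would actually reveal, here is a minimal sketch of such a scan in Python. The function name printable_strings and the sample bytes are made up for illustration. A real check would read the bytes of the built executable instead.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical stand-in for a piece of the executable's data section.
sample = b"\x00\x01json\x00loads\x00\x02hi\x00{ \"1\": \"2\" }\x00"

found = printable_strings(sample)
print(found)
```

With the sample above, the module name, the function name and the JSON literal all show up, while the two-character run is dropped because it is shorter than min_len.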
Consider a simple C program:
void do_nothing(){...}
int main(){
    do_nothing();
    return 0;
}
Suppose we compile it and link it statically. Once the linker is done, the call to do_nothing (let's assume it is not inlined or optimized out) is just a jump to a memory address. The name of the function is no longer needed and can be erased from the resulting executable.
Python works differently: there is no linker, and we don't use raw memory addresses at run time to call some functionality. Instead, the Python machinery finds it for us, given the name of the package/module and of the function. Thus this information, the names, must be present in the executable at run time.
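As a rough illustration of this name-based lookup, the following sketch does in plain Python what the generated code does through the C API: it starts from ordinary strings and asks the interpreter to resolve them. This is only an analogy, not the code Cython emits.

```python
import importlib

# The names are plain strings at run time, comparable to the
# __pyx_k_json / __pyx_k_loads constants in the generated C file.
module_name = "json"
function_name = "loads"

module = importlib.import_module(module_name)  # resolve the module by its name
loads = getattr(module, function_name)         # resolve the function by its name

result = loads('{ "1": "2" }')
print(result)
```

Without the strings "json" and "loads" being available at run time, neither lookup could succeed, which is why they end up in the binary.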
However, if you are willing to modify the generated C file, you could make the life of the "hacker" somewhat harder.
When a string is needed for calling Python functionality, Cython generates code like the following (e.g. for import json):
static const char __pyx_k_json[] = "json";
static PyObject *__pyx_n_s_json;
static __Pyx_StringTabEntry __pyx_string_tab[] = {
    ...
    {&__pyx_n_s_json, __pyx_k_json, sizeof(__pyx_k_json), 0, 0, 1, 1},
    ...
    {0, 0, 0, 0, 0, 0, 0}
};
static CYTHON_SMALL_CODE int __Pyx_InitGlobals(void) {
    if (__Pyx_InitStrings(__pyx_string_tab) < 0) __PYX_ERR(0, 1, __pyx_L1_error);
    ...
}
...
__pyx_t_1 = __Pyx_Import(__pyx_n_s_json, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
So one could store "json" as "irnm" (every character shifted by -1) and then restore the real name at run time, before __Pyx_InitStrings is called in __Pyx_InitGlobals.
Now, just dumping the strings in the executable would yield only meaningless combinations of characters. One could even go further and load the real names from somewhere else after the program has started, if this is worth the trouble.
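The shift itself can be sketched in Python, for example to produce the obfuscated literals that would be pasted into the C file. The helper name shift is hypothetical. The restore step in the generated file would have to be written in C, and the character array could presumably no longer be declared const, since it would be modified in place.

```python
def shift(name, offset):
    """Shift every character of name by offset code points."""
    return "".join(chr(ord(ch) + offset) for ch in name)

# What would be stored in the C file instead of the real name.
obfuscated = shift("json", -1)

# What the program would have to do before __Pyx_InitStrings runs.
restored = shift(obfuscated, +1)

print(obfuscated, restored)  # irnm json
```

This is only a simple substitution: it hides the names from a casual strings dump, but anyone who reads the restore code can undo it.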