python memory-management memory-leaks memory-address symbol-table

How can I access the symbol table in Python?


First of all, I apologize for the length of this post.

I'm studying the symbol table in Python and trying to extract the memory addresses of symbols by accessing the symbol table directly (without id()).

So I consulted Eli Bendersky's blog. As I understand it, PySTEntry_Type manages the symbol table (or is the symbol table itself). I therefore thought that the contents of PySTEntry_Type could be used to find the memory addresses of symbols without id().

I then started analyzing memory, but the values in memory don't match what I expected.

First, I investigated the symtable and _symtable_entry structures.

struct symtable {
    PyObject *st_filename;          /* name of file being compiled,
                                       decoded from the filesystem encoding */
    struct _symtable_entry *st_cur; /* current symbol table entry */
    struct _symtable_entry *st_top; /* symbol table entry for module */
    PyObject *st_blocks;            /* dict: map AST node addresses
                                     *       to symbol table entries */
    PyObject *st_stack;             /* list: stack of namespace info */
    PyObject *st_global;            /* borrowed ref to st_top->ste_symbols */
    int st_nblocks;                 /* number of blocks used. kept for
                                       consistency with the corresponding
                                       compiler structure */
    PyObject *st_private;           /* name of current class or NULL */
    PyFutureFeatures *st_future;    /* modules future features that affect
                                       the symbol table */
    int recursion_depth;            /* current recursion depth */
    int recursion_limit;            /* recursion limit */
};

typedef struct _symtable_entry {
    PyObject_HEAD
    PyObject *ste_id;        /* int: key in ste_table->st_blocks */
    PyObject *ste_symbols;   /* dict: variable names to flags */
    PyObject *ste_name;      /* string: name of current block */
    PyObject *ste_varnames;  /* list of function parameters */
    PyObject *ste_children;  /* list of child blocks */
    PyObject *ste_directives;/* locations of global and nonlocal statements */
    _Py_block_ty ste_type;   /* module, class, or function */
    int ste_nested;      /* true if block is nested */
    unsigned ste_free : 1;        /* true if block has free variables */
    unsigned ste_child_free : 1;  /* true if a child block has free vars,
                                     including free refs to globals */
    unsigned ste_generator : 1;   /* true if namespace is a generator */
    unsigned ste_coroutine : 1;   /* true if namespace is a coroutine */
    unsigned ste_comprehension : 1; /* true if namespace is a list comprehension */
    unsigned ste_varargs : 1;     /* true if block has varargs */
    unsigned ste_varkeywords : 1; /* true if block has varkeywords */
    unsigned ste_returns_value : 1;  /* true if namespace uses return with
                                        an argument */
    unsigned ste_needs_class_closure : 1; /* for class scopes, true if a
                                             closure over __class__
                                             should be created */
    unsigned ste_comp_iter_target : 1; /* true if visiting comprehension target */
    int ste_comp_iter_expr; /* non-zero if visiting a comprehension range expression */
    int ste_lineno;          /* first line of block */
    int ste_col_offset;      /* offset of first line of block */
    int ste_opt_lineno;      /* lineno of last exec or import * */
    int ste_opt_col_offset;  /* offset of last exec or import * */
    struct symtable *ste_table;
} PySTEntryObject;

PyAPI_DATA(PyTypeObject) PySTEntry_Type;

Then I extracted and organized the data in PySTEntry_Type using my code and gdb.

Data extracted from PySTEntry_Type:
0xa376e0 : PySTEntry_Type (PySTEntry_Object)
0xa3cde0 : PyType_Type
0x74690c : String data (0x74690c : "symtable entry")
0x5782f0 : .text section
0x49b56a : .text section
0x5cb440 : PyObject_GenericGetAttr
0xa301c0 : ????
gdb-peda$ x/100x 0xa376e0
0xa376e0 <PySTEntry_Type>:  0x00000001  0x00000000  0x00a3cde0  0x00000000
0xa376f0 <PySTEntry_Type+16>:   0x00000000  0x00000000  0x0074690c  0x00000000
0xa37700 <PySTEntry_Type+32>:   0x00000068  0x00000000  0x00000000  0x00000000
0xa37710 <PySTEntry_Type+48>:   0x005782f0  0x00000000  0x00000000  0x00000000
0xa37720 <PySTEntry_Type+64>:   0x00000000  0x00000000  0x00000000  0x00000000
0xa37730 <PySTEntry_Type+80>:   0x00000000  0x00000000  0x0049b56a  0x00000000
0xa37740 <PySTEntry_Type+96>:   0x00000000  0x00000000  0x00000000  0x00000000
0xa37750 <PySTEntry_Type+112>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37760 <PySTEntry_Type+128>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37770 <PySTEntry_Type+144>:  0x005cb440  0x00000000  0x00000000  0x00000000
0xa37780 <PySTEntry_Type+160>:  0x00000000  0x00000000  0x00040000  0x00000000
0xa37790 <PySTEntry_Type+176>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa377a0 <PySTEntry_Type+192>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa377b0 <PySTEntry_Type+208>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa377c0 <PySTEntry_Type+224>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa377d0 <PySTEntry_Type+240>:  0x00a301c0  0x00000000  0x00000000  0x00000000
0xa377e0 <PySTEntry_Type+256>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa377f0 <PySTEntry_Type+272>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37800 <PySTEntry_Type+288>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37810 <PySTEntry_Type+304>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37820 <PySTEntry_Type+320>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37830 <PySTEntry_Type+336>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37840 <PySTEntry_Type+352>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37850 <PySTEntry_Type+368>:  0x00000000  0x00000000  0x00000000  0x00000000
0xa37860 <PySTEntry_Type+384>:  0x00000000  0x00000000  0x00000000  0x00000000
#This is my code

from binascii import hexlify
from ctypes import string_at


def print_8byte(addr, size):
    """Dump `size` bytes starting at `addr`, 8 bytes (16 hex digits) per line."""
    binary = hexlify(string_at(addr, size))
    for i in range(size // 8):
        print(binary[i * 16:i * 16 + 16])


if __name__ == "__main__":
    # 0xa376e0 is the address of PySTEntry_Type in this particular build
    print_8byte(0xa376e0, 400)

    # Keep dumping whatever address and size are typed in
    while True:
        addr = int(input("addr : "), 0)
        size = int(input("size : "), 0)
        print_8byte(addr, size)

hash@hash-desktop:~$ python3 test.py
b'0100000000000000'
b'e0cda30000000000'    #0xa3cde0 : PyType_Type
b'0000000000000000'
b'0c69740000000000'    #0x74690c : String data (0x74690c : "symtable entry")
b'6800000000000000'
b'0000000000000000'
b'f082570000000000'    #0x5782f0 : .text section
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'6ab5490000000000'    #0x49b56a : .text section
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'40b45c0000000000'    #0x5cb440 : PyObject_GenericGetAttr
b'0000000000000000'
b'0000000000000000'
b'0000040000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'c001a30000000000'    #0xa301c0 : ?????
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
addr : 

However, none of these values seem to correspond to the fields of the symtable and _symtable_entry structures above.

Am I misunderstanding PySTEntry_Type? And even if I am, why don't the values in memory match the fields of the structures?

Sorry for the long text and data, and thank you for reading.

P.S. The data extracted with gdb and with my code are identical. The memory at 0xa301c0 is as follows, and can be checked with my code.

addr : 0xa301c0
size : 400
b'884e640000000000'
b'0600000000000000'
b'1000000000000000'
b'0100000000000000'
b'0000000000000000'
b'c4e2730000000000'
b'0600000000000000'
b'2000000000000000'
b'0100000000000000'
b'0000000000000000'
b'fd68740000000000'
b'0600000000000000'
b'1800000000000000'
b'0100000000000000'
b'0000000000000000'
b'ad68740000000000'
.
.
.
.


Solution

  • If you want to examine a CPython symbol table, use the symtable module. What you're doing doesn't make sense.

    Assuming you're actually looking at PySTEntry_Type and not some completely unrelated section of virtual memory, what you're looking at is the type object for low-level symbol table entry objects. This thing is to symbol table entries as int is to 12. It does not represent a symbol table or a symbol table entry. It contains information about the operations symbol table entries support.

    CPython does not preserve symbol tables beyond the bytecode compilation phase. You cannot examine the symbol tables for a running program, because they don't exist. You can use symtable to create symbol tables for a string representing Python code.
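
As a rough sketch of the symtable approach described above: the snippet below builds a symbol table from a source string and walks it. The sample source, the "<example>" filename, and the describe() helper are made up for illustration; only the symtable calls come from the standard library. Note that the code in the string is analyzed, not executed, so no objects exist and there are no memory addresses to find.

```python
import symtable

# Hypothetical sample program, used only for illustration.
source = """
x = 1

def add(a, b):
    total = a + b + x
    return total
"""

# Build the symbol table the compiler would build; `source` is never run.
module_table = symtable.symtable(source, "<example>", "exec")


def describe(table, depth=0):
    """Return one text line per scope and symbol, recursing into child scopes."""
    indent = "  " * depth
    lines = [f"{indent}scope {table.get_name()!r}"]
    for sym in table.get_symbols():
        if sym.is_parameter():
            kind = "parameter"
        elif sym.is_local():
            kind = "local"
        elif sym.is_global():
            kind = "global"
        else:
            kind = "other"
        lines.append(f"{indent}  {sym.get_name()}: {kind}")
    for child in table.get_children():
        lines.extend(describe(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(describe(module_table)))
```

Each entry records a name and its scope flags (parameter, local, global, and so on), which matches the "dict: variable names to flags" comment on ste_symbols in the struct quoted in the question. It does not record where any object lives at run time.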