TLDR: PyDict_SetItem
increments the key and the value, but where in the code does this happen?
PyDict_SetItem makes a call to insertdict
.
insertdict immediately performs Py_INCREF
on both the key and the value. However, at the end of the success path, it then does a Py_DECREF
on the key, (but not the value). There must be some part of this code that I am missing where it does an extra PY_INCREF
on the key, before it does this PY_DECREF
. My question is where and why does this extra PY_INCREF take place? Why is the initial Py_INCREF
at the start of insertdict
insufficient?
From this, it seems at first glance that PyDict_SetItem
only increases the reference count of the value, but not the key. This is not true, of course. For example, in PyDict_SetItemString, which takes a char *
, converts it into a PyObject via PyUnicode_FromString
(which returns a new value), performs a Py_DECREF
on that new value after calling PyDict_SetItem
. If PyDict_SetItem
does not increment the key, and PyDict_SetItemString
decrements the key it just created, the program may eventually segfault. Given that doesn't happen, it seems like I am missing something here.
Finally, this code should prove that PyDict_SetItem
increments both the key and the value, and that the caller should decref both the key and value unless they were borrowed references/ or will give the key and values to someone else.
#include <Python.h>
#include <stdio.h>
int main(void)
{
Py_Initialize();
PyObject *dict = NULL, *key = NULL, *value = NULL;
int i = 5000;
char *o = "foo";
if (!(dict = PyDict_New())) {
goto error;
}
if (!(key = Py_BuildValue("i", i))) {
goto error;
}
if (!(value = Py_BuildValue("s", o))) {
goto error;
}
printf("Before PyDict_SetItem\n");
printf("key is %i\n", key->ob_refcnt); /* Prints 1 */
printf("value is %i\n", value->ob_refcnt); /* Prints 1 */
printf("Calling PyDict_SetItem\n");
if (PyDict_SetItem(dict, key, value) < 0) {
goto error;
}
printf("key is %i\n", key->ob_refcnt); /* Prints 2 */
printf("value is %i\n", value->ob_refcnt); /* Prints 2 */
printf("Decrefing key and value\n");
Py_DECREF(key);
Py_DECREF(value);
printf("key is %i\n", key->ob_refcnt); /* Prints 1 */
printf("value is %i\n", value->ob_refcnt); /* Prints 1 */
Py_Finalize();
return 0; // would return the dict in normal code
error:
printf("error");
Py_XDECREF(dict);
Py_XDECREF(key);
Py_XDECREF(value);
Py_Finalize();
return 1;
}
You can compile like:
gcc -c -I/path/to/python/include/python3.7m dict.c
gcc dict.o -L/path/to/python/lib/python3.7/config-3.7m-i386-linux-gnu -L/path/to/python/lib -Wl,-rpath=/path/to/python/lib -lpython3.7m -lpthread -lm -ldl -lutil -lrt -Xlinker -export-dynamic -m32
The Py_DECREF(key);
in insertdict
doesn't happen for all successes. It happens on a success when an equal key was already present, either because there was an existing entry or because the dict was a split-table dict sharing keys with other dicts that had that key. On that path, the provided key isn't inserted, so the original Py_INCREF(key);
needs to be canceled.
On the key-not-present path, insertdict
hits a different return
statement and doesn't decref the key:
if (ix == DKIX_EMPTY) {
/* Insert into new slot. */
assert(old_value == NULL);
if (mp->ma_keys->dk_usable <= 0) {
/* Need to resize. */
if (insertion_resize(mp) < 0)
goto Fail;
}
Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
ep->me_key = key;
ep->me_hash = hash;
if (mp->ma_values) {
assert (mp->ma_values[mp->ma_keys->dk_nentries] == NULL);
mp->ma_values[mp->ma_keys->dk_nentries] = value;
}
else {
ep->me_value = value;
}
mp->ma_used++;
mp->ma_version_tag = DICT_NEXT_VERSION();
mp->ma_keys->dk_usable--;
mp->ma_keys->dk_nentries++;
assert(mp->ma_keys->dk_usable >= 0);
ASSERT_CONSISTENT(mp);
return 0;
}