python multithreading volatile memory-barriers

Is Python (CPython) behavior with respect to memory barriers, atomicity, etc. guaranteed?

I was wondering about the equivalent of Java's "volatile", and found this answer.

An equivalent to Java volatile in Python

It (basically) says that everything is effectively volatile in Python, at least in CPython, because of the GIL. That makes sense: everything is locked by the GIL, so there are no memory barriers to worry about. But I would be happier if this were documented and guaranteed by a specification, rather than being a result of the way that CPython happens to be implemented at the moment.

For example, say I want one thread to post data and others to read it. I could choose between something like these two classes:

import threading


class XFaster:
    def __init__(self):
        self._x = 0

    def set_x(self, x):
        self._x = x

    def get_x(self):
        return self._x


class XSafer:
    def __init__(self):
        self._x = 0
        self._lock = threading.Lock()

    def set_x(self, x):
        with self._lock:
            self._x = x

    def get_x(self):
        with self._lock:
            return self._x

I'd rather go with XFaster, or even not use a getter and setter at all. But I also want to do things reliably and "correctly". Is there some official documentation that says this is OK? What about, say, putting a value in a dict or appending to a list?

In other words, is there a systematic, documented way of determining what I can do without a threading.Lock (without digging through dis or anything like that)? Preferably in a way that won't break with a future Python release.

On edit: I appreciate the informed discussion in comments. But what I would really want is some specification that guarantees the following:

If I execute something like this:

# in the beginning
x.a == foo
# then two threads start

# thread 1:
x.a = bar

# thread 2
do_something_with(x.a)

I want to be sure that:

when thread 2 reads x.a it reads either foo or bar
if the read in thread 2 occurs physically later than the assignment in thread 1, then it actually reads bar

Here are some things I want not to happen:

the threads get scheduled on different processors, and the assignment x.a=bar from thread 1 isn't visible to the thread 2
x.__dict__ is in the middle of being re-hashed and so thread 2 reads garbage
etc

Solution

TLDR: CPython guarantees that its own data structures are thread-safe against corruption. This does not mean that any custom data structures or code are race-free.

The intention of the GIL is to protect CPython's data structures against corruption. One can rely on the internal state being thread-safe.

global interpreter lock (Python documentation – Glossary)

The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. [...]

This also implies correct visibility of changes across threads.
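As a rough illustration of the visibility claim, here is a minimal sketch of the question's scenario. The class Box and the names seen and after_write are hypothetical, and the threading.Event is only there so the readers know when the assignment has finished; it is not part of the original question.

```python
import threading


class Box:
    """Hypothetical stand-in for the object `x` in the question."""

    def __init__(self):
        self.a = "foo"


box = Box()
seen = set()          # every distinct value any reader observed
after_write = []      # one read per reader, taken after the write finished
done = threading.Event()


def writer():
    box.a = "bar"     # plain attribute assignment, no explicit lock
    done.set()        # signal that the assignment has completed


def reader():
    # Poll the attribute while the writer may still be running.
    while not done.is_set():
        seen.add(box.a)
    # This read is guaranteed to happen after the assignment.
    value = box.a
    seen.add(value)
    after_write.append(value)


threads = [threading.Thread(target=reader) for _ in range(3)]
threads.append(threading.Thread(target=writer))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
print(after_write)
```

Under these assumptions, every observed value is either "foo" or "bar", and every read made after the assignment sees "bar". A run like this cannot prove a guarantee; it only shows what the documented behavior looks like in practice.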

However, this does not mean that any isolated statement or expression is atomic: almost any statement or expression can execute more than one bytecode instruction, and a thread switch can occur between instructions. As such, the GIL explicitly does not provide atomicity for these cases.

Specifically, a statement such as x.a=bar may execute arbitrarily many bytecode instructions by invoking a setter via object.__setattr__ or the descriptor protocol. Even in the simplest case it executes at least three bytecode instructions: one to look up bar, one to look up x, and one to assign to a.
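The instruction count can be checked with the standard dis module. The sketch below is only illustrative: the function name assign is hypothetical, x and bar are deliberately left undefined because the function is compiled but never called, and the exact opcode names differ between CPython versions.

```python
import dis


def assign():
    # Never called: x and bar only need to compile, not to exist.
    x.a = bar


# Collect the opcode names that CPython generated for the function body.
ops = [ins.opname for ins in dis.get_instructions(assign)]

# Everything up to and including the attribute store belongs to `x.a = bar`.
store_index = ops.index("STORE_ATTR")
print(ops[: store_index + 1])
print(len(ops), "instructions in total")
```

The printed list shows separate load instructions for bar and x before the STORE_ATTR that performs the assignment, so another thread may run between any of them.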

As such, CPython guarantees visibility and consistency of its own data structures, but provides no guarantees against race conditions in your code. If an object is mutated concurrently, this must be synchronised for correctness.
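A typical case that needs synchronisation is a read-modify-write such as `value += 1`, which loads, adds and stores in separate steps. The following is a minimal sketch of guarding it with a threading.Lock; the class SafeCounter and the helper work are hypothetical names chosen for illustration.

```python
import threading


class SafeCounter:
    """Hypothetical counter whose increment is one critical section."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Load, add and store happen while holding the lock,
        # so no other thread can interleave between them.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def work(counter, n):
    for _ in range(n):
        counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=work, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Without the lock, the same code may or may not lose updates depending on the interpreter version and thread scheduling, which is exactly why it should not be relied upon.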