Are concurrent non-atomic writes to never-read memory safe?

Note: not a duplicate of this question.

Provided a region of memory is never read, when are concurrent non-atomic writes safe to it? This is language-agnostic and mostly related to CPU architectural constraints.

The use case is discarding bytes from non-seekable objects such as a TCP socket in a multi-threaded app, in cases where BPF drop filters aren't viable.

Are such writes okay at byte resolution? I do currently have non-mission-critical code in the wild assuming this (by doing concurrent read syscalls to it to sinkhole all received data entirely) and haven't had issues so far.
Do the writes have to be limited to their own cache lines? One core could write and the other read and they could run into consensus issues. However, I'd think this would only impact the (never-read) memory region?
Do the writes have to be limited to their own (usually 4 KiB) pages? I could see TLB issues in a few cases if the hardware assumes an unsynchronized instruction means no TLB synchronization is needed at all, though this is admittedly very unlikely (most of that same synchronization is needed just to handle concurrent requests with different addresses correctly).
Is it just never safe at all in general?

Solution

No mainstream CPU ISAs do hardware race detection, so it's safe in asm.
Atomic stores (not RMWs like CAS or ++atomic_var) use the same asm instructions as normal stores.

Cache is coherent on all CPU ISAs (not necessarily GPUs), at least between the cores that we run threads across. That's why C / C++ volatile works similarly to atomic<T> with memory_order_relaxed on normal compilers that treat volatile accesses as requiring a load or store in the asm. (When to use volatile with multi threading? - never, except in Linux kernel code, which rolls its own atomics with inline asm for ordering and relies on GNU C semantics for volatile, which include doing the load or store with a single instruction if possible.)

So for a CPU core to commit a store from its private store buffer into coherent L1d cache, it needs to get MESI Exclusive ownership of the cache line it's storing into before it can commit. In C++ terms, this is why a "modification order" exists for all atomic variables. In asm, such an order exists for every cache line.

If you had an ARM board with a microcontroller + DSP with non-coherent shared memory, you could still have both cores store to the same address without anything going wrong, other than getting potentially surprising values if you ever read that shared memory region. e.g. a core with cache would probably see the value it last stored, even if the other core had done a later store.

In high-level languages, it's generally not safe to assume that code using non-atomic assignments compiles to asm equivalent to an atomic store with memory_order_relaxed, unless you use something like C volatile on mainstream compilers that handle it the way you'd expect.
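As a minimal sketch of the portable way to express this in C++, several threads can store into the same bytes through std::atomic with memory_order_relaxed. The names sink and hammer_sink are made up for illustration, and the buffer size is arbitrary. This avoids data-race UB at the language level, and on mainstream ISAs each relaxed store should compile to an ordinary store instruction.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sink buffer: written by many threads, never read for its value.
static std::atomic<unsigned char> sink[64];

// Every thread stores into the same 64 bytes with relaxed ordering.
// Returns the total number of stores issued, so a caller can check
// that all threads ran to completion.
std::size_t hammer_sink(unsigned nthreads, std::size_t iters) {
    std::atomic<std::size_t> issued{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&issued, t, iters] {
            for (std::size_t i = 0; i < iters; ++i) {
                // Relaxed store: atomicity only, no ordering with other memory.
                sink[i % 64].store(static_cast<unsigned char>(t),
                                   std::memory_order_relaxed);
            }
            issued.fetch_add(iters, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return issued.load(std::memory_order_relaxed);
}
```

Since every access to sink goes through std::atomic, a build with -fsanitize=thread should have nothing to complain about here, unlike the same loop written with plain unsigned char.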

On a C implementation like clang -fsanitize=thread, the program may abort or print a data-race report to stderr when two threads concurrently write the same non-atomic int object.

I don't know what you're talking about with write system calls; that seems very distantly related to asm stores, and the kernel will need to implement POSIX semantics for write atomicity (atomic update of the file-position and the I/O, in case multiple processes or threads are sharing the same open file descriptor.) See https://man7.org/linux/man-pages/man2/write.2.html#BUGS for some mention of that, e.g. Linux 3.14 and later implement this correctly.
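If the goal is just to discard incoming bytes, one way to sidestep the shared-buffer question entirely is to give each thread its own scratch buffer. This is a different technique from what the question describes, not a way of making the shared buffer safe. A rough sketch for POSIX, where drain_fd is a hypothetical helper name and the 4096-byte size is an arbitrary choice:

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: read and discard everything from fd until EOF.
// The scratch buffer is thread_local, so no two threads ever have the
// kernel copy data into the same bytes.
// Returns the number of bytes discarded, or -1 on a read error other
// than EINTR (this includes EAGAIN on a non-blocking descriptor).
long long drain_fd(int fd) {
    thread_local unsigned char scratch[4096];
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n > 0) {
            total += n;      // data is simply overwritten on the next read
            continue;
        }
        if (n == 0) return total;      // EOF: peer closed the connection
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        return -1;
    }
}
```

The cost is one small buffer per thread instead of one shared buffer, which also keeps each thread's stores in cache lines no other core is writing.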

TLBs aren't relevant here.