Feature or enhancement
Proposal:
Right now, Py_SETREF is implemented in non-atomic terms, which makes it subject to operation reordering issues on weakly ordered architectures (which caused a problem fixing gh-149816, where even switching from separate Py_CLEAR and later assignment to set_sni_cb to a single "atomic" Py_SETREF didn't fix the issue on ARM, because SSL's callback was being invoked asynchronously).
Py_SETREF is naturally implemented in terms of an atomic exchange, and at least for the free-threaded builds, it seems like it should be implemented as such to avoid such raciness, supplementing the existing:
#ifdef _Py_TYPEOF
#define Py_SETREF(dst, src) \
do { \
_Py_TYPEOF(dst)* _tmp_dst_ptr = &(dst); \
_Py_TYPEOF(dst) _tmp_old_dst = (*_tmp_dst_ptr); \
*_tmp_dst_ptr = (src); \
Py_DECREF(_tmp_old_dst); \
} while (0)
#else
#define Py_SETREF(dst, src) \
do { \
PyObject **_tmp_dst_ptr = _Py_CAST(PyObject**, &(dst)); \
PyObject *_tmp_old_dst = (*_tmp_dst_ptr); \
PyObject *_tmp_src = _PyObject_CAST(src); \
memcpy(_tmp_dst_ptr, &_tmp_src, sizeof(PyObject*)); \
Py_DECREF(_tmp_old_dst); \
} while (0)
#endif
// Similarly long nonsense for Py_XSETREF
with a new atomic-based approach for free-threaded builds:
#define Py_SETREF(dst, src) Py_DECREF(_Py_atomic_exchange_ptr(&(dst), (src)))
#define Py_XSETREF(dst, src) Py_XDECREF(_Py_atomic_exchange_ptr(&(dst), (src)))
Please poke holes in this approach. I have a lot of experience with writing multithreaded C code, but will never be convinced I've covered every race condition no matter how simple the code.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Feature or enhancement
Proposal:
Right now,
Py_SETREFis implemented in non-atomic terms, which makes it subject to operation reordering issues on weakly ordered architectures (which caused a problem fixing gh-149816, where even switching from separatePy_CLEARand later assignment toset_sni_cbto a single "atomic"Py_SETREFdidn't fix the issue on ARM, because SSL's callback was being invoked asynchronously).Py_SETREFis naturally implemented in terms of an atomic exchange, and at least for the free-threaded builds, it seems like it should be implemented as such to avoid such raciness, supplementing the existing:with a new atomic-based approach for free-threaded builds:
Please poke holes in this approach. I have a lot of experience with writing multithreaded C code, but will never be convinced I've covered every race condition no matter how simple the code.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response