Ralf Hildebrandt <Ralf.Hildebrandt(a)charite.de> wrote
Fri, 7 Jul 2017 13:24:15 +0200:
> current git checkout crashes during peak time:
This issue is being tracked in RADSECPROXY-77.
Ralf has been very helpful in debugging this issue. I think we might've
found it -- realm data structures are reference counted (also in a
static configuration) but increasing and decreasing the count is not
protected by any lock.
Two threads trying to increase the counter (id2realm()) simultaneously
risk overwriting each other's update, resulting in the counter
being one less than expected. This in turn makes the refcount drop
to 0 too early (radsrv()), and the realm gets freed. After that it's
just a matter of time before malloc() hands out the memory previously
occupied by the realm and it gets overwritten.
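
To make the lost update concrete, here is a minimal single-threaded
replay of the unlucky interleaving. It is only an illustration, not
radsecproxy code: struct realm_demo and replay_lost_update() are
hypothetical names, and the four statements stand in for the
load/add/store steps that an unprotected "refcount++" breaks down into
when two threads run it at the same time.

```c
#include <assert.h>

/* Hypothetical stand-in for a realm; only the counter matters here. */
struct realm_demo {
    int refcount;
};

/* Replays, step by step, two unprotected increments that interleave
 * badly, and returns the resulting count. */
int replay_lost_update(struct realm_demo *r)
{
    int seen_by_a = r->refcount;   /* thread A loads the counter */
    int seen_by_b = r->refcount;   /* thread B loads the same value */

    r->refcount = seen_by_a + 1;   /* thread A stores its result */
    r->refcount = seen_by_b + 1;   /* thread B overwrites A's store */

    /* Two references were taken, but the count only went up by one. */
    assert(r->refcount == seen_by_a + 1);
    return r->refcount;
}
```

Starting from a count of 1, the realm now has three holders but a
count of 2, so the second release already brings it to 0 while one
holder is still using the memory.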
This is consistent with the observed partial overwriting of the realm
data structure and with the fact that it happens to only one of the two
realms in Ralf's config. It's also consistent with the observation that
this happens only once the frequency of requests is high enough for two
requests to be handled _simultaneously_ in two threads running on two
separate CPU cores.
Ralf is currently running with a patch that fixes this by taking a
separate mutex before increasing or decreasing the reference count for a
realm.
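
For readers who want to see the general shape of such a fix, below is a
rough sketch of mutex-protected reference counting with pthreads. It is
not the actual patch: struct realm_sketch, realm_new(), realm_ref() and
realm_unref() are made-up names, and putting one mutex inside each
realm is an assumption on my part (a single global mutex would work
along the same lines).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical realm with its own mutex guarding the counter. */
struct realm_sketch {
    pthread_mutex_t refmutex;
    int refcount;
};

/* Allocates a realm holding one reference; returns NULL on failure. */
struct realm_sketch *realm_new(void)
{
    struct realm_sketch *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    if (pthread_mutex_init(&r->refmutex, NULL) != 0) {
        free(r);
        return NULL;
    }
    r->refcount = 1;
    return r;
}

/* Takes one more reference; the load/add/store runs under the lock. */
struct realm_sketch *realm_ref(struct realm_sketch *r)
{
    pthread_mutex_lock(&r->refmutex);
    r->refcount++;
    pthread_mutex_unlock(&r->refmutex);
    return r;
}

/* Drops one reference; returns 1 if this call freed the realm. */
int realm_unref(struct realm_sketch *r)
{
    int remaining;

    pthread_mutex_lock(&r->refmutex);
    assert(r->refcount > 0);
    remaining = --r->refcount;
    pthread_mutex_unlock(&r->refmutex);

    if (remaining == 0) {
        /* No holder is left, so nobody else can touch the mutex. */
        pthread_mutex_destroy(&r->refmutex);
        free(r);
        return 1;
    }
    return 0;
}
```

The sketch assumes that a caller of realm_ref() already holds a valid
reference to the realm (or otherwise knows it cannot be freed
concurrently); the lock only makes the counter updates themselves
atomic with respect to each other.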