Steffen Klemer klemer@dfn.de wrote Mon, 20 Nov 2017 14:27:24 +0100:
This looks like RADSECPROXY-47 [0] which was identified earlier but sadly never addressed.
[0] https://project.nordu.net/projects/RADSECPROXY/issues/RADSECPROXY-47
Oh, as it was marked 'fixed' I thought that the whole if(client->ssl==NULL) stuff was the (non-complete) fix of this bug :).
It's not marked as fixed.
The attached patch against 1.6.9 fixes this while trying to be non-invasive to the code flow. There is still the chance of running SSL_write on a 'broken' tls connection but this is handled by openssl and 'cnt' anyway -- one could perhaps adjust the debug message then :).
How do you mean openssl and 'cnt' handle this?
IIRC (and from the behavior of the patched version running here) openssl notices on its own that the connection is gone and adjusts the return-value ('cnt'-variable) which we check anyway. This in mind the only thing we have to do is to wake up the sleeping process which is what I (try to) do with the sslalive variable.
Is this true for the case where we break out of the loop in tlsserverrd() because !radsrv(rq) too? I'd guess no.
An alternative fix might be to simply check for client->ssl != NULL before calling SSL_write() outside of the replyq->mutex lock. Did you consider this? Worth investigating?
The race would be closer but you could still have tlsserverrd() execute 'client->ssl=NULL' between your proposed check and the call to SSL_write(). If we want to go this way I think the only option is a new
You're right, of course.
mutex per tlsserverrd-tlsserverwr pair that hinders setting ssl=NULL before executing SSL_write() with it. But I don't see why this should be necessary as openssl seems to be clever enough.
Perhaps if we would explicitly close in the !radsrv(rq) case mentioned above.
While writing this: Where exactly do we ssl_free() the content of client->ssl?
Seems like tlsservernew() does that, once tlsserverrd() has returned, using another copy of the pointer to the SSL object (!).