Hey list,
on the very busy DFN radsecproxy (many threads, many packets, 8 cores)
we recently saw crashes like this one:
#0 0x00002aeb05fffe54 in SSL_write (s=0x0, buf=0x2aeb482a1d00, num=20) at ssl_lib.c:989
No locals.
#1 0x000000000040ce3c in tlsserverwr (arg=0x2aeb48177620) at tls.c:339
        cnt = 0
        error = <optimized out>
        client = 0x2aeb48177620
        replyq = 0x2aeb48291e60
        reply = 0x2aeb482c2190
#2 0x00002aeb06650064 in start_thread (arg=0x2aeb3212d700) at pthread_create.c:309
        __res = <optimized out>
        pd = 0x2aeb3212d700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {47189645776640, 1280935663696751663, 0, 47190015195840, 15, 47189645776640, 4904604809065540655, 4904632170130975791}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
I think this is a race condition between two threads at these lines:
https://git.nordu.net/?p=radsecproxy.git;a=blob;f=tls.c;h=567a6be3491751cb7…
and
https://git.nordu.net/?p=radsecproxy.git;a=blob;f=tls.c;h=567a6be3491751cb7…
(i.e. client->ssl is set to NULL before SSL_write() is called).
The attached patch against 1.6.9 fixes this while trying to stay non-invasive to the code flow. SSL_write() may still be called on a 'broken' TLS connection, but that case is handled by OpenSSL and the 'cnt' check anyway; one could perhaps adjust the debug message for it :).
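For readers unfamiliar with this kind of race, here is a minimal sketch of the general idea (not the attached patch, whose contents are not shown here): the writer checks and uses the connection pointer while holding the same lock that the teardown path takes before clearing it, so the pointer cannot become NULL between the check and the write. The struct, its `lock` member and `fake_write()` are hypothetical stand-ins; real code would hold an OpenSSL `SSL *` and call `SSL_write()`/`SSL_free()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for radsecproxy's per-client state.
 * A plain pointer replaces SSL * so this compiles without OpenSSL. */
struct client {
    pthread_mutex_t lock; /* hypothetical: guards conn */
    void *conn;           /* stands in for client->ssl */
};

/* Hypothetical stand-in for SSL_write(): pretends all bytes were sent. */
static int fake_write(void *conn, const void *buf, int num) {
    (void)conn;
    (void)buf;
    return num;
}

/* Writer thread: test and use conn under the lock, so another thread
 * cannot clear it between the NULL check and the write.
 * Returns the byte count, or -1 if the connection is already gone. */
int client_write(struct client *c, const void *buf, int num) {
    int cnt = -1;
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    if (c->conn != NULL)
        cnt = fake_write(c->conn, buf, num);
    pthread_mutex_unlock(&c->lock);
    return cnt;
}

/* Teardown thread: clear conn under the same lock.
 * Real code would SSL_free() the connection here. */
void client_close(struct client *c) {
    pthread_mutex_lock(&c->lock);
    c->conn = NULL;
    pthread_mutex_unlock(&c->lock);
}
```

One trade-off of this approach: holding the lock across a blocking write can delay the thread that wants to close the connection, which is presumably why a less invasive fix may be preferable in practice.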
The same problem exists in the DTLS case. I don't have a test case for it at the moment, but the patch would look exactly the same.
https://git.nordu.net/?p=radsecproxy.git;a=blob;f=dtls.c;h=f8660925ab28caef…
/Steffen
--
DFN-Verein Steffen Klemer
Alexanderplatz 1 +49 30 884299 307
10178 Berlin klemer(a)dfn.de
Germany http://www.dfn.de
Hi,
We have a setup where the RADIUS server changes its IP address
(while keeping the same DNS name) every night, because a new RADIUS server
instance is created and the old one is shut down.
When that happens, radsecproxy seems to fail to reconnect to the new server
and keeps trying to connect to the old IP address: it does not resolve
the server host name again before reconnecting.
Is there a configuration setting that enables this behaviour in
radsecproxy?
If not, I think this feature should be implemented: after a few failed
attempts, radsecproxy should resolve the server host name again before reconnecting.
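A minimal sketch of the proposed behaviour, assuming a hypothetical `struct server` and a hypothetical threshold `RERESOLVE_AFTER_FAILURES` (neither is radsecproxy's real code or configuration). It only uses the standard POSIX `getaddrinfo()`/`freeaddrinfo()` calls: failed connects are counted, and once the threshold is reached the host name is looked up again and the cached addresses are replaced.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical threshold: re-resolve after this many failed connects. */
#define RERESOLVE_AFTER_FAILURES 3

/* Hypothetical per-server state; not radsecproxy's real struct. */
struct server {
    const char *host;       /* DNS name from the configuration */
    const char *port;       /* service or port number as a string */
    struct addrinfo *addrs; /* currently cached resolution result */
    int failures;           /* consecutive failed connect attempts */
};

/* Look the host name up again and replace the cached addresses.
 * Returns 0 on success, -1 on failure (old addresses are kept). */
int server_resolve(struct server *s) {
    struct addrinfo hints, *res = NULL;

    assert(s != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* RadSec over TLS uses TCP */
    if (getaddrinfo(s->host, s->port, &hints, &res) != 0)
        return -1;
    if (s->addrs != NULL)
        freeaddrinfo(s->addrs);
    s->addrs = res;
    return 0;
}

/* Call after each failed connect attempt. Returns 0 if no lookup was
 * due yet, 1 if the name was re-resolved, -1 if the lookup failed. */
int server_connect_failed(struct server *s) {
    if (++s->failures < RERESOLVE_AFTER_FAILURES)
        return 0;
    s->failures = 0;
    return server_resolve(s) == 0 ? 1 : -1;
}
```

A real implementation would also reset `failures` to 0 after a successful connect, and keeping the old addresses when the lookup fails avoids losing a working server because of a transient DNS error.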
--
Cheers
Arun