Hi,
On 24/03/2021 08:49, Fabian Mauchle wrote:
Hi Paul,
On 23.03.21, 13:04, "Paul Dekkers" <paul.dekkers@surf.nl> wrote:
radsecproxy will start to send the request to fallback.server.here because the dynamic part hasn't resolved yet: it's not blocking. Only once the config for the dynamic realm is in place, i.e. when the dynamicLookupCommand has returned a result, will it continue with that host.
Just reading from the code:
The first request for a new realm will trigger the dynamic process. Essentially, a copy of the realm structure is created (for the actual realm to look up), including a copy of the dynamic server spec, but now including the realm to look up. This first request is implicitly placed in the queue for the dynamic server (before the lookupCommand has even been called). Now the server process is started, which first calls the dynamicLookupCommand. During this time, the server is in 'startup' state, in which it is not considered a valid server and will not get any more requests queued. If any more requests arrive for this realm, they will get sent to the other servers configured for the realm (if any). This is so that if the first request is retransmitted because it took too long to resolve and connect to the dynamic server, it will fall back to the others. The client's timeout acts as an implicit timeout for the dynamic lookup (really, if the lookup takes longer than the client's timeout, we shouldn't wait for it any longer).
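To make the sequence above easier to follow, here is a rough Python model of the routing decision as described. It is only a sketch of the behaviour, not radsecproxy's actual code: the class names, the `route` and `lookup_done` methods, the realm `example.org` and the request labels are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    state: str = "connected"  # "startup" until lookup and connect have finished
    queue: List[str] = field(default_factory=list)


@dataclass
class Proxy:
    fallback: Server
    # Hypothetical bookkeeping: one dynamic server copy per looked-up realm.
    dynamic: Dict[str, Server] = field(default_factory=dict)

    def route(self, realm: str, request: str) -> str:
        server = self.dynamic.get(realm)
        if server is None:
            # First request for this realm: create the dynamic server in
            # 'startup' state and queue the request on it right away.
            server = Server(name=f"dynamic:{realm}", state="startup")
            self.dynamic[realm] = server
            target = server
        elif server.state == "startup":
            # Lookup still running: the server is not valid yet, so the
            # request goes to the other server configured for the realm.
            target = self.fallback
        else:
            target = server
        target.queue.append(request)
        return target.name

    def lookup_done(self, realm: str) -> None:
        # Stands in for the lookup command returning and the connection coming up.
        self.dynamic[realm].state = "connected"


proxy = Proxy(fallback=Server("fallback.server.here"))
path = [
    proxy.route("example.org", "access-request-1"),
    proxy.route("example.org", "access-request-2"),
]
proxy.lookup_done("example.org")
path.append(proxy.route("example.org", "access-request-3"))
print(path)
# ['dynamic:example.org', 'fallback.server.here', 'dynamic:example.org']
```

In this model the second request takes the fallback path while the first and third use the dynamic server, which is the split-path situation discussed in this thread.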
This results in part of the conversation going via one path, part of it via another. This breaks "the first" authentication for a realm.
Of course there is a race condition: if two clients happen to send their requests within the time it takes to resolve the dynamic server, one will get sent to the fallback server immediately, and subsequent requests might get sent to the dynamic server later, once it is established. From what I've seen in production, this case has been very rare (but maybe in larger countries it's another story). However, even if the packets take different paths, this shouldn't break the authentication: changing paths due to lost requests (UDP) or reset connections (TLS) can happen at any time.
It is really just one request. It's very easy to reproduce for me in that the first EAP authentication just always fails.
I actually think it's rare on servers with load, with frequent lookups for realms, but common on servers with little load.
Paul