Hi,
On 24/03/2021 08:49, Fabian Mauchle wrote:
Hi Paul,
On 23.03.21, 13:04, "Paul Dekkers" <paul.dekkers(a)surf.nl> wrote:
radsecproxy will start to send the request to fallback.server.here
because the dynamic part hasn't resolved yet: it's not blocking. Only once
the config for the dynamic realm is in place, i.e. when the
dynamicLookupCommand has returned a result, will it continue with that host.
Just reading from the code:
The first request for a new realm will trigger the dynamic process. Essentially, a copy
of the realm structure is created (for the actual realm to look up), including a copy of
the dynamic server spec, but now including the realm to look up. This first request is
implicitly placed in the queue for the dynamic server (before the lookupCommand has even
been called).
Now the server process is started, which first calls the dynamicLookupCommand. During
this time, the server is in 'startup' state, in which it will not be considered a
valid server and will not get any more requests queued.
If any more requests arrive for this realm, they will get sent to the other servers
configured for the realm (if any). This is so that if the first request is retransmitted,
as it took too long to resolve and connect to the dynamic server, it will fall back to the
others. The client's timeout acts as an implicit timeout for the dynamic lookup (really,
if the lookup takes longer than the client's timeout, we shouldn’t wait for it any
longer).
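
To make the sequence above easier to follow, here is a toy model of it in Python. It is only a sketch of the behaviour as described in this mail, not the radsecproxy code (which is C): the names State, Server, Realm and route are hypothetical, and the real lookup and connection handling are reduced to a state flag.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    STARTUP = auto()    # dynamicLookupCommand still running / connecting
    CONNECTED = auto()  # lookup finished and connection established


@dataclass
class Server:
    name: str
    state: State = State.CONNECTED
    queue: List[str] = field(default_factory=list)


@dataclass
class Realm:
    name: str
    fallbacks: List[Server]
    dynamic: Optional[Server] = None  # created on the first request


def route(realm: Realm, request: str) -> Optional[Server]:
    """Pick a server for a request, following the description above."""
    if realm.dynamic is None:
        # First request for this realm: create the dynamic server in
        # 'startup' state and implicitly queue the request on it,
        # before the lookup command has produced any result.
        realm.dynamic = Server(f"dynamic:{realm.name}", State.STARTUP)
        realm.dynamic.queue.append(request)
        return realm.dynamic
    # Later requests: a server in 'startup' is not a valid target,
    # so the request goes to the first usable server instead.
    for server in [realm.dynamic] + realm.fallbacks:
        if server.state is State.CONNECTED:
            server.queue.append(request)
            return server
    return None


realm = Realm("example.org", [Server("fallback.server.here")])
first = route(realm, "request-1")
retry = route(realm, "request-1 (retransmit)")
realm.dynamic.state = State.CONNECTED  # lookup and connect finished
later = route(realm, "request-2")
print(first.name, retry.name, later.name)
```

In this model the first request waits on the dynamic server, its retransmit goes to the fallback, and anything arriving after the lookup completes goes to the dynamic server again.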
This results in part of the conversation going via one path, part of it
via another. This breaks "the first" authentication for a realm.
Of course there is a race condition: if two clients happen to send their requests within
the time it takes to resolve the dynamic server, one will get sent to the fallback server
immediately, and subsequent requests might get sent to the dynamic server later, once it is
established. From what I've seen in production, this case has been very rare (but
maybe in larger countries it's another story).
However, even if the packets take different paths, this shouldn’t break the
authentication - changing paths due to lost requests (UDP) or reset connections (TLS)
can happen at any time.
It is really just one request. It's very easy to reproduce for me in
that the first EAP authentication just always fails.
I actually think it's rare on servers with load, with frequent lookups
for realms, but common on servers with little load.
Paul