api-gateway: nginx will cache resolved DNS addresses
Activity

Marcin Chalczynski, August 8, 2017 at 8:48 AM
PR https://github.com/mendersoftware/mender-api-gateway-docker/pull/71
based on MaciekB's example.
verified to work both when scaling up, and scaling down, including killing off all instances of a service and restarting.
I noticed a few non-critical quirks, but I suspect they come from my local environment (which has been acting up lately):
- when a service goes away and pops back up, it never reappears in the compose output (the one you get attached to at {{./demo up}}); it is visible if you run {{./demo logs ... }} in a different terminal though, and everything works correctly
- sometimes when killing/restarting the service, there's an HTTP connection timeout and you get detached from the compose output; again, everything still works, as confirmed by reattaching or dumping logs
I'm mentioning this because I've never seen it with any previous POC, and I've done a lot of fiddling around with this.

Maciej Borzecki, August 7, 2017 at 12:09 PM
this might be enough
NAMES='mender-useradm mender-device-auth mender-device-adm mender-device-auth'
while true; do
    # keep only the answer records; sort so that a change in record order alone is not seen as a difference
    dig $NAMES | grep -v -e '^;' -e '^$' -e '^\.' | sort > /tmp/addrs.new
    if test -e /tmp/addrs; then
        if ! cmp /tmp/addrs.new /tmp/addrs; then
            echo '-- reload'
        else
            echo '-- no reload'
        fi
    fi
    mv /tmp/addrs.new /tmp/addrs
    sleep 10
done

Marcin Chalczynski, August 7, 2017 at 11:35 AM
ok, but I'd propose to have this optimisation in right from the start:
- have a primitive python script that runs nslookup on known services and dumps the IPs to a file
- it compares the current IPs to the previous ones; if any new IPs are detected, it reloads nginx
- have cron run it however often we want

Maciej, August 7, 2017 at 10:37 AM
@Maciej Borzecki for sure, probably doing the whole graceful shutdown and so on. @Marcin Chalczynski anything sane would probably be minutes, due to the cost
as mentioned, this is pretty much a hack; if we are out of options we can try this. It is still slightly more convenient than resetting the whole container
we could also be a bit smart and detect whether a reload is needed, if we want to spend more time on optimising

Marcin Chalczynski, August 7, 2017 at 10:21 AM (edited)
Well, what worries me is that the reload interval probably couldn't be on the order of seconds.
What would be a sane value here? 1 min, 5 mins? Anyway, the longer the better, but this means longer downtime for the user (EDIT: what I mean is that they're likely to get frustrated and just restart the whole setup in the meantime).

nginx caches resolved DNS names. As a result, when a number of backend services are restarted, nginx may direct proxied requests to the wrong place.
The reason is that each service may get a different IP address when it starts. In the best case, docker assigns a new (previously unused) IP address to the restarted service, in which case nginx will have no upstream to direct the requests to. In the worst case, a service that had one address before the restart gets an IP address that was previously used by another service. nginx will then direct requests to the wrong service, and accessing endpoints will return 404.
Possible solutions:
- try to figure out if proxy_pass with a variable and a resolver fixes the problem
- provide a helper that sends SIGHUP to nginx if addresses change
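
The second option could look roughly like the sketch below. It is only an illustration under assumptions, not the helper that was eventually merged: the function names, the state file and the pid file path are all hypothetical. It records each address together with the name it was resolved from, so that the worst case described above (two services swapping addresses) is still detected as a change.

```shell
#!/bin/sh
# Hypothetical helper sketch: reload nginx only when the name-to-address
# mapping of the backend services changes. All names and paths are assumptions.

# Print sorted "name address" pairs for every service name given as an argument.
resolve_addrs() {
    for name in "$@"; do
        dig +short "$name" A | while read -r ip; do
            printf '%s %s\n' "$name" "$ip"
        done
    done | sort -u
}

# Succeed (exit status 0) when the two mappings differ.
addrs_changed() {
    [ "$1" != "$2" ]
}

# Usage: reload_if_changed STATE_FILE PID_FILE NAME...
# Compares the current mapping with the one saved in STATE_FILE and, if it
# differs, saves the new one and sends SIGHUP to the pid stored in PID_FILE.
reload_if_changed() {
    state_file=$1
    pid_file=$2
    shift 2
    new=$(resolve_addrs "$@")
    old=$(cat "$state_file" 2>/dev/null || true)
    if addrs_changed "$old" "$new"; then
        printf '%s\n' "$new" > "$state_file"
        kill -HUP "$(cat "$pid_file")"
    fi
}
```

It could then be called periodically, from cron or from a loop with a sleep, for example as {{reload_if_changed /tmp/addrs /var/run/nginx.pid mender-useradm mender-device-auth mender-device-adm}} (the pid file location is an assumption and depends on the nginx configuration). On the first run there is no saved state, so one reload is always sent.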