=== rberger_ is now known as rberger | ||
=== mathiaz_ is now known as mathiaz | ||
erichammond | elmo: I just noticed that us-east-1.ec2.archive.ubuntu.com resolves to four IP addresses now (within EC2). | 22:31 |
---|---|---|
erichammond | elmo: Are these in different availability zones? | 22:31 |
elmo | erichammond: yep, they are! | 22:31 |
erichammond | nice | 22:31 |
erichammond | elmo: Are there DNS names for the individual hosts so that I can add failover to my apt.sources? | 22:32 |
erichammond | For example, us-east-1.ec2.archive.ubuntu.com would be the round robin for load balancing and the individual hosts might have names like us-east-1-mirror1, -mirror2, -mirror3, -mirror4 | 22:35 |
erichammond | If I just add us-east-1 to /etc/apt/sources.list (as defaulted in lucid AMI) this provides load balancing, but if the IP address I happen to get is down, then I have no failover. | 22:35 |
elmo | erichammond: hmm, I thought we tested this and if the IP address is down down, apt will give up and try the next one - or am I misremembering? | 22:36 |
erichammond | elmo: We tested it and it does not retry. In fact, the apt software may never even get the chance to see multiple IP addresses. | 22:37 |
erichammond | I'm currently using the RightScale Ubuntu mirrors which have the individual host names as well as the round robin name. | 22:37 |
elmo | really? sorry, can you remind me why it won't see the multiple IP addresses? | 22:37 |
erichammond | I might be wrong on that, but I thought it simply asks DNS for an IP address and gets one of them randomly. | 22:38 |
erichammond | I do know that I tested this when one of the Canonical archives was down and it did not retry with the archive that was up. | 22:39 |
erichammond | With RightScale I list the sequence: roundrobin, mirror1, mirror2, mirror3. | 22:39 |
erichammond | This gets load balancing from the "roundrobin" name. If the IP address I happen to request is down, it downloads packages from the next available mirror. | 22:40 |
elmo | it definitely gets all of the IPs back - and I know a web browser will retry the next IP if one IP of a round robin is down | 22:41 |
erichammond | There is a slight added expense of having to get the "apt-get update" from all mirrors, but at least the "upgrade" only comes from the first match. | 22:41 |
elmo | I'll check with apt; the reason I'm reluctant is that we use the same DNS round robin for failover for archive.ubuntu.com proper | 22:41 |
elmo | so if it really doesn't work with apt that's a big problem | 22:41 |
erichammond | fair 'nuff | 22:42 |
elmo | DNS RR isn't ideal, it doesn't cover the case of a server timing out rather than being completely down, but it definitely should do basic failover | 22:42 |
erichammond | It should be easy to test if you have access to a DNS server. | 22:43 |
elmo | sure - it's more that I need to pack and sleep - but I'll open an RT ticket about it and get someone on my (former) team to check into it - do you want to be Cc-ed? | 22:45 |
erichammond | elmo: I love being in the loop, thanks :) | 22:49 |
erichammond | elmo: Looks like there's no need to create an RT ticket. | 23:13 |
erichammond | apt-get in Ubuntu 1.04 Lucid does cycle through the different IP addresses when one or more are down. | 23:13 |
erichammond | It even shows you which one it's trying as it tests each one. | 23:13 |
elmo | \o/ | 23:13 |
erichammond | I'm pretty sure that it didn't do this back in Hardy, so it must have been added in the last two years. | 23:14 |
erichammond | Since I'm upgrading everything to Lucid (gradually) I'm not going to worry about it. | 23:14 |
elmo | cool | 23:14 |
erichammond | er, Ubuntu "10.04" | 23:14 |
erichammond | elmo: Looks like I'm going to have to lose face some more. I just did tests with apt-get on Hardy and it has the same failover behavior with round robin DNS entries. I have no explanation for the failure I remember. Hopefully I remember this test and conversation and don't bother you again in another year. | 23:26 |
erichammond | Hm, I wonder if there are different failure modes, some of which retry and some that don't. | 23:28 |
elmo | hehe | 23:28 |
elmo | there could be - in particular, a network failure that doesn't return immediate failure will still have bad behaviour | 23:28 |
erichammond | Yes, it's very slow (which allows me to see that it's trying different IP addresses) | 23:29 |
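
The failover behaviour discussed above (the resolver hands back every address behind a round-robin name, and the client moves on to the next one when a connection fails) can be sketched in a few lines of Python. This is only an illustration of the general technique, not apt's actual implementation. The function names `resolve_all` and `first_reachable`, the port, and the timeout value are assumptions made for the example. The per-address timeout also shows why a host that times out, rather than refusing the connection immediately, makes the whole process slow.

```python
import socket
from typing import Any, Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int = 80) -> List[Address]:
    """Return every distinct IPv4 address DNS reports for host (hypothetical helper)."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses: List[Address] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def first_reachable(
    addresses: Iterable[Address],
    timeout: float = 5.0,
    connect: Callable[[Address, float], Any] = socket.create_connection,
) -> Optional[Tuple[Address, Any]]:
    """Try each address in order and return the first one that accepts a connection.

    A refused connection fails fast. An unresponsive host costs the full
    timeout before the next address is tried.
    """
    for addr in addresses:
        try:
            conn = connect(addr, timeout)
        except OSError:
            # Covers refused connections, unreachable hosts and timeouts.
            continue
        return addr, conn
    return None
```

Calling `first_reachable(resolve_all("us-east-1.ec2.archive.ubuntu.com"))` would walk the round-robin addresses in the order the resolver returned them. The caller is responsible for closing the returned connection. With four addresses and a 5 second timeout, the worst case is roughly 20 seconds before the function gives up, which matches the "very slow" behaviour described in the conversation.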