[22:31] <erichammond> elmo: I just noticed that us-east-1.ec2.archive.ubuntu.com resolves to four IP addresses now (within EC2).
[22:31] <erichammond> elmo: Are these in different availability zones?
[22:31] <elmo> erichammond: yep, they are!
[22:31] <erichammond> nice
[22:32] <erichammond> elmo: Are there DNS names for the individual hosts so that I can add failover to my sources.list?
[22:35] <erichammond> For example, us-east-1.ec2.archive.ubuntu.com would be the round robin for load balancing and the individual hosts might have names like us-east-1-mirror1, -mirror2, -mirror3, -mirror4
[22:35] <erichammond> If I just add us-east-1 to /etc/apt/sources.list (as is the default in the Lucid AMI) this provides load balancing, but if the IP address I happen to get is down, then I have no failover.
[22:36] <elmo> erichammond: hmm, I thought we tested this and if the IP address is down, apt will give up and try the next one - or am I misremembering?
[22:37] <erichammond> elmo: We tested it and it does not retry.  In fact, the apt software may never even get the chance to see multiple IP addresses.
[22:37] <erichammond> I'm currently using the RightScale Ubuntu mirrors which have the individual host names as well as the round robin name.
[22:37] <elmo> really? sorry, can you remind me why it won't see the multiple IP addresses?
[22:38] <erichammond> I might be wrong on that, but I thought it simply asks DNS for an IP address and gets one of them randomly.
[22:39] <erichammond> I do know that I tested this when one of the Canonical archives was down and it did not retry with the archive that was up.
[22:39] <erichammond> With RightScale I list the sequence: roundrobin, mirror1, mirror2, mirror3.
[22:40] <erichammond> This gets load balancing from the "roundrobin" name.  If the IP address I happen to request is down, it downloads packages from the next available mirror.
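The ordering erichammond describes (round-robin name first, then each individual mirror) could be generated with something like the sketch below. The host names under example.com, the `/ubuntu/` path, and the suite and component defaults are placeholders chosen for illustration, not the real RightScale or Canonical mirror names, and `build_sources_list` is a hypothetical helper.

```python
def build_sources_list(round_robin, mirrors, suite="lucid",
                       components=("main", "universe")):
    """Return sources.list lines with the round-robin name first.

    Listing the round-robin host first gives load balancing; the
    individual mirrors that follow act as fallbacks, as described in
    the conversation. All host names passed in are caller-supplied.
    """
    hosts = [round_robin] + list(mirrors)
    comps = " ".join(components)
    return [f"deb http://{host}/ubuntu/ {suite} {comps}" for host in hosts]


if __name__ == "__main__":
    # Placeholder host names; substitute the mirrors you actually use.
    lines = build_sources_list(
        "roundrobin.example.com",
        ["mirror1.example.com", "mirror2.example.com", "mirror3.example.com"],
    )
    print("\n".join(lines))
```

As noted later in the log, the trade-off is that "apt-get update" then fetches indexes from every listed host.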
[22:41] <elmo> it definitely gets all of the IPs back - and I know a web browser will retry the next IP if one IP of a round robin is down
[22:41] <erichammond> There is a slight added expense of having to get the "apt-get update" from all mirrors, but at least the "upgrade" only comes from the first match.
[22:41] <elmo> I'll check with apt; the reason I'm reluctant is that we use the same DNS round robin for failover for archive.ubuntu.com proper
[22:41] <elmo> so if it really doesn't work with apt that's a big problem
[22:42] <erichammond> fair 'nuff
[22:42] <elmo> DNS RR isn't ideal, it doesn't cover the case of a server timing out rather than being completely down, but it definitely should do basic failover
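The "basic failover" elmo describes, where a client gets every address back from DNS and moves on to the next one when a connection fails, can be sketched roughly as follows. This is an illustration of the general technique, not apt's actual implementation; `resolve_all` and `connect_with_failover` are hypothetical names, and the 5-second timeout is an arbitrary assumption.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address DNS gives back for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_failover(host, port=80, timeout=5.0,
                          resolve=resolve_all,
                          connect=socket.create_connection):
    """Try each resolved address in turn; return (ip, socket) on success.

    resolve and connect are injectable so the loop can be exercised
    without touching the network.
    """
    last_error = None
    for ip in resolve(host, port):
        try:
            return ip, connect((ip, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(
        f"no address for {host} accepted a connection") from last_error
```

A host that refuses connections fails fast, but one that silently drops packets costs the full timeout before the next address is tried, which is the weakness elmo points out: with N dead addresses ahead of a live one, the wait is up to N times the timeout.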
[22:43] <erichammond> It should be easy to test if you have access to a DNS server.
[22:45] <elmo> sure - it's more that I need to pack and sleep - but I'll open an RT ticket about it and get someone on my (former) team to check into it - do you want to be Cc-ed?
[22:49] <erichammond> elmo: I love being in the loop, thanks :)
[23:13] <erichammond> elmo: Looks like there's no need to create an RT ticket.
[23:13] <erichammond> apt-get in Ubuntu 1.04 Lucid does cycle through the different IP addresses when one or more are down.
[23:13] <erichammond> It even shows you which one it's trying as it tests each one.
[23:13] <elmo> \o/
[23:14] <erichammond> I'm pretty sure that it didn't do this back in Hardy, so it must have been added in the last two years.
[23:14] <erichammond> Since I'm upgrading everything to Lucid (gradually) I'm not going to worry about it.
[23:14] <elmo> cool
[23:14] <erichammond> er, Ubuntu "10.04"
[23:26] <erichammond> elmo: Looks like I'm going to have to lose face some more. I just did tests with apt-get on Hardy and it has the same failover behavior with round robin DNS entries.  I have no explanation for the failure I remember.  Hopefully I remember this test and conversation and don't bother you again in another year.
[23:28] <erichammond> Hm, I wonder if there are different failure modes, some of which retry and some that don't.
[23:28] <elmo> hehe
[23:28] <elmo> there could be - in particular, a network failure that doesn't return immediate failure will still have bad behaviour
[23:29] <erichammond> Yes, it's very slow (which allows me to see that it's trying different IP addresses)