[00:10] alexisb: free now if you are
[00:11] anastasiamac: looks like you need to pull juju/testing etc to get latest versions
[00:12] wallyworld, ok, brt
[00:12] these elements that the build contains are in charm.v4
[00:13] but it looks like there is a conflict with gopkg.in/charm.v4 vs my github.com/juju/charm.v4?... have no idea how to resolve it
[00:13] wallyworld: ^^
[00:13] i'll try pulling testing though I cannot c if/how it's related..
[00:13] anastasiamac: the error indicates that juju/testing is out of date
[00:14] k. thnx ;)
[00:20] wallyworld: pulled testing but the error for charm.v4 is the same..
=== kadams54 is now known as kadams54-away
[00:25] wallyworld: SHAM-WOW! http://reviews.vapour.ws/r/658/
[00:42] wallyworld: https://bugs.launchpad.net/juju-core/+bug/1403689
[00:42] Bug #1403689: Server should store tools of unknown or unsupported series
[01:06] thumper: just finished meeting, thanks, will look
[01:06] np
[01:44] ericsnow: ping, in moonstone
[02:02] ok, I'm turning distractions off and going to try to focus for an hour
[02:02] now if only the kids comply...
[02:03] * thumper switches music from random to heavy mix
[02:07] thumper, morning - mind if this kid distracts you for a moment?
[02:07] * thumper looks at mattyw
[02:08] thumper, I made this small change in juju/utils the other day - I'm not sure if it belongs there - but as it was small I thought - better to ask forgiveness than permission - what do you think? http://reviews.vapour.ws/r/634/
[02:09] seems reasonable
[02:13] thumper, do I $$merge$$ utils?
[02:13] I don't remember
[02:13] try it and if the bot doesn't say anything
[02:13] manually do it
[02:14] thumper, thanks for the help - you can go back to
[02:19] thumper, me again - I don't have permission, can you hit merge for me?
[02:19] done
[02:19] thanks very much
[02:59] wallyworld, around?
[03:00] jog: in a meeting, can be with you soon
[03:01] I'm considering setting bug 1396099 as a blocker on master, please see my latest comment when you're available
[03:01] Bug #1396099: AWS/Joyent/manual/maas: juju deploy error "connection is shut down"
[03:05] so in cloud-init.log I see some log lines that read writing /home/ubuntu/.ssh/authorized_keys .. but when I look in the file, the keys don't match my jenv. Any pointers on where to start poking around?
[03:05] jog: can I send you some clickbait, or interesting wired articles or anything?
[03:06] jog: just trying to buy some time before CI gets blocked
[03:06] heh heh
[03:08] davecheney: hey just reviewing some of your comments
[03:09] davecheney: to maybe save you a bit of time: as this is refactoring, i don't want to change any existing code i don't have to touch
[03:09] davecheney: e.g.: eitherState (which i agree is a little strange)
[03:09] katco: if you don't clean it up now
[03:09] then when ?
[03:10] refactoring sounds like the perfect time to clean house
[03:10] davecheney: i will be making changes to this area for awhile longer
[03:10] davecheney: this piece is just so i can land some leadership functional tests
[03:11] davecheney: once that's done, i will be circling back and giving this package some more TLC :)
[03:12] sgtm
[03:12] davecheney: i do appreciate your thoughtful reviews
[03:12] davecheney: in fact, if you want to call anything out, but not open it as an issue, i'd love that so i can reference it
[03:29] folks, the latest version of juju/utils contains a call to errors.UserNotFound https://github.com/juju/utils/blob/master/file_unix.go. But that call doesn't exist
[03:30] I think we should just return err here - what does everyone else think
[03:40] wallyworld, jog: I wonder if that's the new certupdater work. it can lead to the API server being restarted soon after the machine agent starts up.
[03:41] mattyw: is errors.UserNotFound in a later version of juju/utils?
[03:41] mattyw: can we just update
[03:41] s/work/worker/
[03:41] mattyw: otherwise I'm in favor of just returning err too
[03:43] wallyworld, jog: that issue has certainly been happening a lot in CI lately
[03:44] yup, it's been a big problem for us lately
[03:46] jw4, is in the latest version
[03:47] jw4, you mean the latest version of juju/errors?
[03:48] jw4, ah yes, it's in the latest version of juju errors - I thought I'd tried that, but apparently not
[03:50] mattyw: I meant the latest version of juju/utils because I misunderstood you - but I'm glad you found it anyway :D
[03:52] davecheney: FYI, I've added a set.Ints (http://reviews.vapour.ws/r/659/) and updated the PortSet patch to use it (http://reviews.vapour.ws/r/617/)
[03:52] davecheney: thanks for the nudge
[03:52] ericsnow: nothing yet on why authorized_keys doesn't contain our keys
[03:52] wwitzel3: :(
[03:52] ericsnow: I did take care of the autoDelete though
[03:53] wwitzel3: sweet
[03:53] wwitzel3: I'm pretty sure once the connection issue is resolved we'll be bootstrapping on GCE!!!
[03:54] ericsnow: yeah, seems that way .. I'm going to login manually while the attempting to connect loops are running and add the key to authorized_keys and see if that fixes it
[03:55] wwitzel3: sneaky :)
[03:55] ericsnow: ta
[03:55] davecheney: it was easier than expected :)
[03:55] ericsnow: yeah, the cloud-init.log file looks good, it appears to update the auth_keys file for both ubuntu and root
[03:56] ericsnow: but our keys never end up in there :(
[03:56] wwitzel3: ?!?
[03:58] wwitzel3: where does cloud-init get the stuff it's supposed to add?
[03:58] ericsnow: so far, no luck, I've added the keys, but the Attempting to connect is still just hanging.
[03:58] ericsnow: it is from the jenv for gce
[03:58] thumper: i was trying to sort out my xmas leave
[03:58] https://sites.google.com/a/canonical.com/operations/people-and-culture/dashboard
[03:59] this page is now 404
[03:59] wwitzel3: oh, right
[03:59] where is the calendar that tells us what days we need to claim ?
[03:59] hr somewhere
[03:59] waigani: do you have that link somewhere?
[03:59] waigani: you had it the other day
[03:59] wwitzel3: what feeds that data to cloud-init on the new instance?
[03:59] waigani: the christmas leave page
[04:00] wwitzel3: I have a feeling it's that metadata :(
[04:00] ericsnow: error: Could not load host key: /etc/ssh/ssh_host_ed25519_key
[04:00] thumper: https://sites.google.com/a/canonical.com/operations/people-and-culture/general?pli=1
[04:01] davecheney: ^^^
[04:01] waigani: ta
[04:02] ericsnow: it looks like we are connecting successfully, but that error is resulting in a disconnect
[04:03] wwitzel3: ah
[04:03] ericsnow: the keys still aren't getting populated correctly, but that error is also preventing us from connecting
[04:04] ericsnow: I'm going to keep swinging my bat around in this china shop for a while, I'll let you know what i come up with in the morning
[04:04] ericsnow, hey
[04:04] wwitzel3: k
[04:04] dimitern: hey
[04:04] menn0: jog: sorry, just finished meeting
[04:04] ericsnow, can you have a quick look at this please? http://reviews.vapour.ws/r/656/
[04:05] dimitern: sure
[04:05] wallyworld, just trying to decide what to do about bug 1396099
[04:05] when juju starts, it will restart the state server api to accommodate newly known machine addresses
[04:05] Bug #1396099: AWS/Joyent/manual/maas: juju deploy error "connection is shut down"
[04:05] ericsnow, thanks
[04:06] jog: if the CI scripts immediately connect to the state server api, they could come undone as the state server api will be restarted very shortly after the state server comes up
[04:06] this restart is necessary to accommodate a new server certificate being generated
[04:07] the certificate regeneration is needed to allow https connections over the state server IP addresses
[04:08] jog: is it plausible to add a small delay to the CI script?
[04:08] wallyworld: how long does it go before restarting after coming up?
[04:08] wallyworld, yeah that sounds like what's happening, but if we add a sleep to the test script, customers who are also scripting will experience intermittent connection failures.
[04:08] could use a quick review on https://github.com/juju/charm/pull/81 if anyone has a few moments :)
[04:08] ericsnow: as soon as it learns about the state server machine addresses, not sure of the exact time
[04:08] * jog is actually testing a sleep now
[04:08] a worthy Actions schema format
[04:08] wallyworld, can't we use the upgrade blocker or something to block deployments until api server cert is regenerated?
[04:09] dimitern: it's a change listener, the addresses could change anytime
[04:09] thumper: ta
[04:09] wallyworld, hmm right
[04:09] dimitern: so before this, any https connection over the state server ip address would fail
[04:09] wallyworld: I've seen issues with the API client used in backups commands being good for only one request (and then the connection shows up as disconnected), so perhaps that's related
[04:10] wallyworld: ah
[04:10] ericsnow: the backup client, if run after the state server has fully come up, will not be affected by this
[04:11] wallyworld: yeah, that's what I gathered from what you just said :)
[04:11] well, by fail, i mean certificate verification would fail
[04:13] dimitern: LGTM
[04:13] ericsnow, cheers
[04:14] wallyworld, well, the api client could be made more robust I guess
[04:15] wallyworld, I mean since this happens at every bootstrap the initial connection (or say first 3 attempts) might fail, but shouldn't be logged as errors
[04:16] wallyworld, and the sleep could be in there, rather than in the ci script
[04:24] ericsnow: thanks for that clarification - copied:= *meta... looked like a pointer assignment to me
[04:26] dimitern: the issue is that if it is really quick, a client might grab a connection which is then lost with the restart. we could look at delaying the state server start until after the first address change
[04:26] wallyworld, that second part sgtm
[04:26] i'll raise a bug
[04:30] jw4: yep, * != &, but our brains don't handle that so well sometimes :)
[04:34] ericsnow: this should be on a t-shirt :D
[04:35] ericsnow: :-p
[04:35] ericsnow: yours did
[04:36] jw4: sure, this time... :)
[04:36] hehe
[05:07] jam1: hi, i see you guys did some work on the status spec. i made a couple of v1 compatibility comments near the top. i also see you didn't like "broken", which is to blocked as busy is to waiting
[05:08] wallyworld: so it isn't entirely, we did discuss it a bit
[05:08] specifically, Broken is actually "come look and fix something with *me*"
[05:08] which is what we had as broken
[05:08] yes
[05:08] just like busy, which is waiting on me
[05:08] (yes, it means you need to relate me to something else, but that distinction vs you need to fix my config isn't very compelling)
[05:08] wallyworld: so still, not exactly
[05:09] sorry, I had a typo
[05:09] not just my config, could also be disk space etc
[05:09] *blocked* is come look at me
[05:09] doesn't matter, we can go with what's there
[05:09] i see there's a lot of extra unit states
[05:11] wallyworld: so I would like you to understand how we put the rationale together, I realize in the scale of things the specifics aren't huge, but there was a fair discussion and I was happy with how the mapping worked out.
[05:11] i think i disagree that error shouldn't be on agent-state, as just because a hook fails, doesn't mean that the software isn't running, but you guys would have discussed that
[05:11] wallyworld: so one guiding thing there is that we want to move to where we have 1 Juju agent for the machine and all its units
[05:11] not 1 unit-agent per unit
[05:12] yes, that's true
[05:12] wallyworld: so you do need a place to say "this unit failed its hook", without saying that all units are dead
[05:12] so error is best done on unit then
[05:12] fair point
[05:12] wallyworld: IIRC there is *also* error on the agent for compat
[05:12] and then we drop that in favor of "failed"
[05:12] when the agent itself is unresponsive
[05:12] yes
[05:13] so you agree with my v1 compat comments?
[05:13] s/do/do
[05:13] s/so/do
[05:13] I haven't gotten to them yet, just chatting with you here
[05:13] sure, sorry
[05:14] wallyworld: np. So Mark feels strongly that we can drop pending, because nobody is actually depending on it, but we'll keep Started and Error
[05:14] We don't need Installed, because nothing stayed there very long
[05:14] Down is a fair point, though
[05:14] i figured that would be the case for Installed, just wanted to be 10000% sure
[05:15] i wasn't sure if we wanted to be really anal about keeping 100% compat
[05:15] i think we need to keep Stopped also
[05:15] s/need/should
[05:16] as we don't know who has scripts that depend on it
[05:16] wallyworld: this is agent-state stopped, right? Are we actually able to depend on it? We can run it by fwereade or someone, but I thought Stopped only existed as long as the Database hadn't cleaned up that unit yet
[05:17] We may need to keep it for the same purpose, though.
[05:17] jam1: yes, the current agent-state Stopped
[05:17] we might be able to safely drop it, as for Installed, just want to be sure
[05:49] I'd really like to land this so TheMue can land the changes in master tomorrow: https://github.com/juju/charm/pull/81
[05:49] it's a pretty simple change which tweaks the way Charm parses actions schemas from yaml
[05:50] so when cloud-init writes the authorized_keys file, where does it get the information it writes in to there from?
[05:51] s/tomorrow/today
[05:52] (but a huge usability improvement from the charm author's perspective)
[05:52] bodie_, looking
[05:53] wallyworld, looks like something with 24c1b80d is affecting upgrades across multiple substrates
[05:53] dimitern, thanks!
[05:53] let me look at what the rev is
[05:54] jog: this PR https://github.com/juju/juju/pull/1291 ?
[05:56] wallyworld, yes
[05:56] i have no knowledge of that work, what do the CI logs say is the problem?
[06:01] wallyworld, joyent, aws, hp, azure, maas, KVM, ... all timing out after waiting 10 minutes for juju status after the upgrade... so at a minimum the time for an upgrade to complete has increased
=== kadams54 is now known as kadams54-away
[06:02] wouldn't surprise me if it's more than that, more likely to be a breakage, the juju state server upgrade and machine log would be helpful
[06:03] so in the cloud-init log I see 014-12-18 05:51:56,497 - util.py[DEBUG]: Writing to /home/ubuntu/.ssh/authorized_keys - wb: [384] 1416 bytes
[06:03] which implies that it is writing the keys, but when I look at that file, there are only the keys from the provider
[06:03] I wonder if Google is overwriting them after we write them ..
[06:04] could be, i know nothing about gce
[06:06] hrmm actually I think maybe I am just not writing them to the proper place in the metadata
[06:06] wallyworld, I can attach log or if you want to look sooner one instance is under the artifacts here: http://juju-ci.vapour.ws:8080/job/aws-upgrade-precise-amd64/2176/
[06:06] but then what is cloud-init writing? .. hrmm
[06:07] jog: looking
[06:10] wallyworld, this looks like the same issue menn0 fixed yesterday " login blocked because upgrade is in progress"
[06:11] i wasn't aware of that fix - do the logs look the same or similar?
[06:13] i can see a test fix
[06:15] the logs sure do have a lot of terminated connections
[06:15] fwereade: ping for where you're at with active/goal
[06:16] bodie_, reviewed
[06:16] dimitern, thanks, have been following along making a few changes :)
[06:16] I really ought to call it a day though
[06:17] I think most of this should be very straightforward -- can you sync up with TheMue? I have to get up early to travel in the morning
[06:17] he offered to pick it up since I'm leaving town
[06:17] or... expect him to ping you back in there
[06:17] wallyworld, this was yesterday's fix https://github.com/mjs/juju/commit/f22f2f07ace804fbce81b66bfe938439a6878a29
[06:17] or something which works well and makes everyone happy ;)
[06:17] bodie_, ok, feel free to land this, but I'd like you to address the suggestions in a follow-up if you don't mind?
[06:19] bodie_, I will sync up with TheMue
[06:20] jog: could be related, but doesn't seem like it. i can't see off hand from the logs what the issue is. more detailed investigation is required
[06:20] are these upgrade failures intermittent?
[06:21] wallyworld, I might be able to help there, let me have a look at the logs
[06:21] ty :-)
[06:24] wallyworld, dimitern nearly all substrates started failing upgrade tests with pull 1291... so not intermittent, rather very consistent
[06:25] jog, why is the job destroying the environment? http://juju-ci.vapour.ws:8080/job/aws-upgrade-precise-amd64/2175/console after upgrade
[06:25] * wallyworld bbiab
[06:26] jog, this seems fishy 2014-12-18 03:27:58 INFO juju.provider.common destroy.go:15 destroying environment "aws-upgrade-precise-amd64"
[06:26] it waits 10 minutes checking 'juju status' and then gives up and destroys the environment, so the resources are available for the next test
[06:26] jog, ah, ok
[06:28] dimitern, wallyworld, I think we should block on this, it might be harder to figure out if additional code lands
[06:36] dimitern, that sounds perfect, much appreciated!
[06:39] jog, looking at the logs so far it seems the upgrade was completed on machine-0, but the upgrade block wasn't lifted
[06:43] fwereade: poke when you're awake
[06:44] man, it got late. We are so close to getting GCE to bootstrap.
[06:46] wwitzel3, \o/
[06:46] dimitern, I'm going to go ahead and ship the charm fix, and open a new PR with the requested changes referenced. I also have a branch for updating tests on master to reflect this stuff
[06:48] bodie_, sweet, thanks
[06:54] jog, this is the issue: cannot set agent version for machine 0: not found or dead
[06:54] jog, and machine-0 is obviously alive and well, so something around env-uuid changes in state recently does not work properly
[06:55] fwiw, the PR to fix the charm actions parsing in juju master is http://reviews.vapour.ws/r/661/ if anyone fancies having a look
[06:55] dimitern, I opened bug https://bugs.launchpad.net/juju-core/+bug/1403738
[06:55] Bug #1403738: upgrade tests fail on multiple substrates with revision 24c1b80d
[07:07] dimitern, do you need anything else from me? If not my day is long over.
[07:07] jog, can you re-run one of the failing jobs, but using logging-config: =TRACE in envs.yaml ?
[07:08] jog, with =INFO we're practically losing all context during the upgrade - all upgrade jobs should run with at least logging-config: =DEBUG
[07:09] ok
[07:09] wallyworld: fwereade: we feel pretty good about where the status spec is at (you should have gotten an email). So comments are welcome.
[07:10] jam1: ty, will look
[07:15] wallyworld: are you around at all during the break?
[07:15] I feel like it might be good to have a hangout to discuss finer points, but I know everyone is officially not-working
[07:17] jam1: sure, i can be available, what time is the break?
[07:18] wallyworld: I mean Holiday break
[07:18] eg, next week
[07:18] jam1: oh, right :-) that will be fine too
[07:19] maybe one evening my time, afternoon your time, which will be midday for william
[07:19] wallyworld: k, I know I'm out of town from 25-1st, but I'll have Monday/Tues that I'm just relaxing around the house
[07:19] ok, maybe aim for monday depending on william's availability?
=== urulama|out is now known as urulama
[07:22] fwereade: you free for a catchup in 10?
[07:38] jog, updated the bug
[08:36] jam1: storage phase 1 spec has openstack cinder volumes in scope, yet the in scope providers are listed as maas, local, aws. i thought openstack was out of scope for phase 1
[08:42] jam1: you may have missed my message when your connection bounced
[08:42] storage phase 1 spec has openstack cinder volumes in scope, yet the in scope providers are listed as maas, local, aws. i thought openstack was out of scope for phase 1
[09:04] wallyworld: thanks for the heads up, the network here likes to stay up for approximately 2min before needing to be reset…
[09:04] I did try to flag that with a comment, can you make sure there is a note if mine didn't go through ?
[09:04] sure, will do, just about to be called for dinner, will do it straight after
[09:26] morning
[09:32] TheMue: o/
=== rogpeppe3 is now known as rogpeppe
[09:48] dimitern: ping
[09:52] voidspace, pong
[09:54] dimitern: hey, hi
[09:54] dimitern: davecheney suggests that network.SubnetInfo should use net.IP for the AllocatableIPLow and High
[09:55] dimitern: what do you think?
[09:55] voidspace, sgtm
[09:55] dimitern: we have to convert back to strings where we *use them*
[09:55] dimitern: in state and on the wire
[09:55] dimitern: and it doesn't save us validation as constructing an IP doesn't return an error (you have to check for a nil value)
[09:56] so I'm not sure what it buys, beyond more conversions
[09:58] dimitern: TheMue: little girl just woken up - neighbour has agreed to babysit (wife out), but I have to set that up
[09:58] dimitern: TheMue: will take a few minutes, so will be late to standup again... sorry
[09:59] voi
[09:59] voidspace: ok
[10:05] is a unit's public-address also supposed to be accessible from within an environment?
[10:05] dimitern: ^
[10:07] TheMue: dimitern: babysitter here, omw
[10:07] rogpeppe, you mean like in ec2 automatic public ips ?
[10:07] the question is really: is it reasonable to have a single address for a service endpoint that works both within the environment (from unit to unit) and from outside it?
[10:07] rogpeppe, short answer - it depends
[10:08] dimitern: i mean the public-address as reported by the unit-get public-address charm tool
[10:08] rogpeppe, in joyent for example you can, but not in ec2 or openstack (depends on how floating ips are configured)
[10:09] dimitern: i thought it worked ok in ec2 as the public-address resolves correctly whether you're inside or outside the cloud
[10:09] TheMue: dimitern: struggling to join...
[10:09] dimitern: but perhaps i'm misremembering?
[10:09] "Trying to join the call. Please wait..."
[10:10] dimitern: i'm more concerned with the intra-environment behaviour here
[10:10] rogpeppe, you're talking about different things
[10:10] dimitern: as i already know that public-address might not be accessible from outside the env - that's an issue we always need to deal with
[10:10] rogpeppe, the ec2 instance dns name resolves to internal ip in ec2 or to the public ip outside
[10:11] dimitern: and that's what we report for public-address, right?
[10:11] dimitern: or has that changed?
[10:12] perhaps i should phrase the question like this: can i be sure that a unit can connect to another unit's public-address as well as its private-address?
[10:13] voidspace, dimitern: FWIW, I'm +1 on using net.IP when we know we've got IP addresses
[10:14] voidspace: and net.ParseIP does validate, even though it doesn't return an explicit error
[10:16] rogpeppe, it depends on the provider
[10:17] dimitern: hmm, that's not great
[10:18] rogpeppe, in standup now, i'll get back to you in a bit
[10:19] dimitern: ta
[10:45] another network-related question: is it possible for a unit to find out the public address of another unit that it's related to?
[10:45] dimitern, fwereade: ^
[10:46] rogpeppe, so you can rely on a split horizon dns name to work internally and externally in ec2 and openstack (if so configured)
[10:47] rogpeppe, not automatically - via relation settings
[10:47] dimitern: and public-address will always return a split-horizon dns name
[10:47] ?
[10:47] rogpeppe, I wouldn't say always
[10:47] dimitern: ok, so i can't rely on this at all then?
[10:48] rogpeppe, in juju status - most likely, as unit.PublicAddress - if set by the provider (IIRC some providers were explicitly changed to either return dns name or ip for various reasons)
[10:49] dimitern: my situation is that i have a web service that returns some data to the client which includes the address of another service for the client to connect to.
[10:49] dimitern: i'd like that client to work correctly whether it's inside the environment or outside it
[10:49] rogpeppe, right
[10:49] dimitern: it currently looks like that's not possible without returning more than one address
[10:50] rogpeppe, yes
[10:50] dimitern: and that a standard http relation isn't sufficient to find out the public address
[10:50] rogpeppe, hopefully this will change as more networking model stuff lands
[10:50] dimitern: i'm guessing that it will only change if this specific requirement is on the roadmap
[10:51] rogpeppe, alternatively, you can use custom networking config inside the charm
[10:51] dimitern: how would that work?
[10:52] rogpeppe, so your webservice charm returns this info via relation settings?
[10:52] dimitern: no, via http
[10:53] rogpeppe, ok, so the webservice is not running inside a charm?
[10:53] dimitern: it's part of the http API that it exposes
[10:53] dimitern: yes, it is running inside a charm
[10:53] dimitern: all the services here are running as charms, possibly excluding the client
[10:53] rogpeppe, and you want to return a single hostname/ip that's usable both internally and externally?
[10:54] dimitern: ideally, yes
[10:55] rogpeppe, so first, the only way I can think of is to use a split horizon dns name
[10:55] dimitern: but i can't rely on that, right?
[10:55] rogpeppe, and it needs to be supported either by the cloud itself (like ec2) or by another exposed service running a dns server
[10:57] rogpeppe, not right now, because the addresses we store in state are not in a single place
[10:58] rogpeppe, that's why it's unreliable - unit.PublicAddress() returns whatever its assigned machine's Addresses() method returns
[10:58] dimitern: an arbitrary selection, presumably?
[10:59] dimitern: because Addresses can return many addresses
[10:59] dimitern: i think i'll just go with returning several addresses to the client and relying on them to try all of them
[11:00] rogpeppe, nope
[11:00] dimitern: anything else i think is being unreasonably optimistic/platform-dependent
[11:00] rogpeppe, since a recent change I made all addresses are consistently ordered - public ips before hostnames, then cloud-local, etc.
[11:01] dimitern: ok, so it's still an arbitrary selection but at least a stable choice, then?
[11:01] rogpeppe, however the uncertainty still exists, as those addresses are merged from the instance addresses (coming from the provider) and machine ones (as discovered by the net package)
[11:01] rogpeppe, it is stable yes
[11:01] dimitern: hmm, so that means that IP addresses are always chosen over host names?
[11:02] dimitern: so you'll never get a split-horizon DNS name?
[11:03] rogpeppe, and provider addresses always shadow machine ones - so if the provider (e.g. like maas) adds dns names in addition to ips in response of calling instance.Addresses() - you'll get those, then machine addrs in a single list, ordered
[11:03] rogpeppe, effectively, since that change it's even less likely (can only happen if there are only hostnames)
[11:04] dimitern: so i'm right that it'll never return a DNS name when an IP address is available?
[11:04] dimitern: "it" == "unit-get public-address"
[11:04] rogpeppe, but this can change - preferring ips over hostnames was a requirement for api endpoints, but for intra-environment communication can be different
[11:05] dimitern: in this case, this is about from-the-outside access to the environment
[11:06] rogpeppe, it depends what ip - if it's public, yes - it will always come before any hostnames; if cloud-local - hostnames come first
[11:06] dimitern: aren't ip addresses less stable than dns names?
[11:07] rogpeppe, in maas'es case for example we have "vm0.maas" "192.168.10.1" and "127.0.0.1" - hostname will be chosen
[11:07] rogpeppe, ips are absolute (much more so at least than hostnames that can resolve to anything)
[11:07] dimitern: yeah, but ip addresses can be on short-term lease
[11:08] dimitern: in this case i need a stable address that can be used to contact a service
[11:08] rogpeppe, that's rarely an issue
[11:08] rogpeppe, I agree
[11:08] rogpeppe, and can suggest raising a bug about it :)
[11:09] dimitern: ok
[11:09] rogpeppe, i.e. have a way to get a hostname if possible for public-address
[11:10] dimitern: it would be quite nice if a unit was able to get the public address of a related unit as well as its private address too
[11:11] rogpeppe, it still can - if the remote unit sets its address into the relation settings
[11:12] rogpeppe, also, there might not be a public address to get
[11:12] dimitern: also, this means that "unit-get public-address" in ec2 will never return an address that is reachable from within the environment, right?
[11:12] rogpeppe, e.g. in maas all ips are cloud-local
[11:13] dimitern: that's true. i'm thinking that public-address is provided by default (when available) along with private-address
[11:14] rogpeppe, that's like this since several months now actually - after DNSName got dropped from Environ
[11:14] dimitern: just to be clear, there's currently no way to obtain the ec2 split-horizon DNS name within juju, right?
[11:14] rogpeppe, ec2 now only reports ips (public and private)
[11:15] rogpeppe, there are lots of ways :) - fetching a metadata url from the charm for example
[11:15] rogpeppe, but not a "usual" way
[11:16] dimitern: yeah, i don't want to write ec2-specific code in my charm
[11:16] dimitern: that kinda loses the point of juju
[11:16] rogpeppe, i can't recall what was the reasoning behind removing DNSName() from Environ
[11:17] rogpeppe, true, i'm not suggesting it seriously
[11:17] dimitern: i guess one way forward would be to expose a way to get all a unit's addresses from a unit, (and possibly from a related unit)
[11:20] rogpeppe, yeah - a client api call "get all unit addresses"
[11:21] dimitern: i'm actually thinking of: unit-get addresses
[11:21] rogpeppe, which then can be called from a hook tool that does that
[11:21] dimitern: and relation-get addresses
[11:22] rogpeppe, the address-get hook tool that's on the roadmap for 15.04 will do this
[11:22] dimitern: the client side is another question - the only way to get a unit's address is through status currently, right?
[11:22] rogpeppe, by default it will return a single address; you can specify -r -maybe --all as well
[11:23] dimitern: ah, replacing unit-get ?
[11:23] rogpeppe, yep - unit-get will only live for backwards compatibility but will get dropped and aliased to something else in the mean time (most likely address-get)
[11:50] mm, heston blumenthal mass-produces a pretty decent christmas cake
[12:05] fwereade: :)
[12:05] * rogpeppe has mince pies downstairs
[12:07] dimitern: as it turns out the public IP address also works ok internally in ec2
[12:08] rogpeppe, nice! :)
[12:14] fwereade: sounds yummy. feeling better today ?
[12:15] jam2, yeah, more-or-less with it again
[12:29] dimitern: fancy giving my PR another quick look-over (including string to net.IP change)
[12:29] dimitern: http://reviews.vapour.ws/r/644/
[12:30] voidspace, sure thing
[12:35] voidspace, great, just 1 typo
[12:35] well, not a typo - more like an omission
[12:36] dimitern: ok, cool - thanks
[12:44] dimitern: hmmm... looks like the change to SubnetInfo to use net.IP instead of string is causing a panic
[12:45] dimitern: ... Panic: runtime error: hash of unhashable type network.SubnetInfo (PC=0x414676)
[12:45]
[12:45] dimitern: investigating
[12:45] dimitern: hah, jc.SameContents is panicking trying to compare
[12:46] voidspace, use jc.DeepEquals instead?
[12:47] dimitern: I presume SameContents is being used to not be order dependent
[12:47] dimitern: (this is a pre-existing test)
[12:47] dimitern: I'll try it though
[12:47] it works
[12:47] ShipIt!
[12:53] voidspace, it's used for maps (or was it slices?) but works for gcc-go (ppc) and golang-go
[12:54] voidspace, \o/
[14:23] ericsnow: bug #1403662 lgtm, so I'm going to prepare the commit to backout the hack we have working around it
[14:24] ericsnow: also I still didn't get the ssh issue solved .. even with explicitly adding the sshKeys to the GCE metadata.
[14:24] ericsnow: it is like there is a step we are missing
[14:51] ericsnow: ok, so I have the keys being properly uploaded to GCE, now I'm just trying to resolve this "error: Could not load host key: /etc/ssh/ssh_host_ed25519_key"
[14:51] ericsnow: that is the error the juju ssh client keeps disconnecting on
[14:52] Good morning all.
[14:55] ericsnow: looks like we are running in to this bug https://bugs.launchpad.net/cloud-init/+bug/1382118 [14:55] Bug #1382118: Cloud-init doesn't support SSH ed25519 keys [15:13] wwitzel3: looks like you're on to something [15:15] well that looks less than promising [15:20] natefinch: it does mean we are nearly done with bootstrapping (I think) [15:21] ericsnow: awesome. Any idea if that bug is going to be a major problem for you guys? [15:22] natefinch: I imagine so but wwitzel3 may have a better idea of it [15:28] natefinch:I don't know much about cloud-init .. can we send it some a set of custom commands to run for us? [15:29] natefinch: we either need the sshd_config on the instance to remove the line expecting ed25519 keys or we need to issue a ssh-keygen to create the expected key file. [15:39] man I wish we used github for releases.... trying to figure out what code is in what release in launchpad is a huge pain [15:40] natefinch: tags dont have the same name as releases in lp? [15:41] perrito666: ha, I missed we had tags on juju... it's a separate tab in the list of branches. Thanks for that. === kadams54 is now known as kadams54-away === kadams54-away is now known as kadams54 [16:29] fwereade: PTAL: http://reviews.vapour.ws/r/663/ [16:30] fwereade: that drops the AZ from unit-get and adds it as an env variable [16:30] fwereade: let me know if that's what you had in mind [16:34] ericsnow, catching up wit hthe related code: I'm wondering (1) why we set context.availabilityZone in factory.updateContext, especially given the doc comment on updateContext and (2) why there aren't any tests for it in factory_test.go? [16:36] fwereade: I'll take a look [16:38] fwereade: FWIW, a lot of what I did for availability zones was following the precedent of what I considered to be the similar fields [16:40] ericsnow, hmm, what are the untested factory/context bits? 
I made an effort to fix them -- not saying I *did* catch them all, but I'd like to know so I can fix them ;) [16:42] fwereade: oh, I probably just missed addin the AZ-related tests [16:42] fwereade: if I see something missing I'll let you know :) [16:42] gah, trunk is currently closed? [16:42] the room's title lies! [16:42] ericsnow, I don't want to claim particular superiority of the existing code, though: in particular all the tests that directly construct a *HookContext are Bad Tests, and they're only like that because I was [focused on delivering business value elsewhere|too damn lazy] (strike out whichever does not apply) [16:43] we really should modify mup to report CI status. [16:43] katco: the only thing that never lies is this page: https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL [16:43] ericsnow, (context: I added Factory and moved NewHookContext into export_test.go, but didn't fix all the tests that used it) [16:43] * natefinch has that labeled in his bookmarks toolbar as "CI Blockers" [16:44] it seems like this question comes up CONSTANTLY and from different people [16:44] it's clearly something we need to make more clear somehow [16:44] katco: yeah, I don't know enough about launchpad to know if there's some way to create a status page or something [16:45] katco: rather than a random filter of existing bugs [16:45] natefinch: there is; i wrote some emacs lisp to interface w/ launchpad [16:45] natefinch: ah, yeah that's probably what it would be: does this query return anything? blocked. no? not blocked. [16:45] fwereade: got it [16:45] and then have mup periodically do that [16:45] and change the room topic [16:46] and maybe announce it [16:46] ~1m or so [16:46] please no announcements every minute :) [16:46] no i mean if it changes haha [16:47] "CI IS OPEN AND ALL IS WELL! HEAR YE HEAR YE!" 
[16:48] haha [16:48] ^ this guy gets it! [16:48] it would be nice, but not really sufficient, I think. It's easy to miss stuff in IRC, and you don't see the room title except on login.... I'm much rather have a webpage with a URL I can at least attempt to remember that I can point people to. [16:49] natefinch: +1; but fwiw, the topic is always up for me and i can set notifications on keywords/people [16:49] natefinch: if you use xchat you see the room title all the time [16:50] natefinch: but i website would be a great first step. i don't even care if it's blank and the background changes from #00FF00 to #FF0000 [16:50] Oh, yeah, there it is at the top of the screen... [16:50] and it should probably be hooked into reports.vapour or w/e [16:50] it's just like a foot above where I usually look on my IRC window [16:50] hah... [16:51] natefinch: you remember the question I asked you about netmasks the other day? How to work out the last ip in a subnet. [16:51] natefinch: I was taking the number of zeros in the netmask, then OR'ing (2 ** numZeros -1) with the first IP [16:52] natefinch: which works [16:52] natefinch: but instead you can do 1 << numZeros [16:52] natefinch: which gives you the number of IPs in the subnet [16:52] voidspace: ahh, bit twiddling [16:52] natefinch: and is a bit more elegant (then add that to the first IP) [16:52] natefinch: yeah, fun [16:52] voidspace: evidently useful for more than CS101 and job interviews ;) [16:53] hehe [16:53] and crypto [16:54] voidspace: well, I guess... though if you're writing your own bit twiddling code for crypto, you're probably doing it wrong. [16:54] natefinch: yeah probably [16:55] voidspace: but I get your meaning. Certainly bit twiddling is useful in many circumstances... I was mostly joking :) [16:55] natefinch: :-) [17:05] ericsnow, hey [17:05] dimitern: what's up? 
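Note on the 16:51-16:52 netmask discussion above: a minimal sketch of the arithmetic voidspace describes, for IPv4 only. numZeros is the number of zero bits in the netmask, 1 << numZeros is the number of addresses in the subnet, and the last address is the first address plus that count minus one (equivalently, the first address OR'ed with 2**numZeros - 1). The function name lastIPv4 is made up for this illustration and is not taken from the juju code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// lastIPv4 returns the highest address in an IPv4 CIDR block.
// It is an illustrative helper, not part of juju.
func lastIPv4(cidr string) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	first := ipnet.IP.To4()
	if first == nil {
		return nil, fmt.Errorf("%q is not an IPv4 subnet", cidr)
	}

	ones, bits := ipnet.Mask.Size()
	numZeros := uint(bits - ones)

	// Number of addresses in the subnet. uint64 avoids overflow for a /0.
	size := uint64(1) << numZeros

	// The low numZeros bits of the network address are all zero, so adding
	// size-1 gives the same result as OR'ing with size-1.
	base := binary.BigEndian.Uint32(first)
	last := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(last, base+uint32(size-1))
	return last, nil
}

func main() {
	for _, cidr := range []string{"10.0.0.0/24", "192.168.1.64/26"} {
		ip, err := lastIPv4(cidr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> last address %s\n", cidr, ip)
	}
}
```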
[17:06] ericsnow, mattyw asked a question today about which repos are handled by the automatic RB diff creation from PRs [17:06] dimitern: currently just core and utils [17:06] ericsnow, and now that I think of it - I wondered as well [17:07] ericsnow, ah, ok, thanks [17:07] dimitern: It's been on my todo list to add the rest of the ones that RB knows about [17:07] ericsnow, and where is the bot/script that does that live? [17:07] s/live// [17:08] ericsnow, iirc it's running on some ec2 instance [17:08] dimitern: it's a github webhook (pointing to a RB URL) [17:08] dimitern: oh, the GH bot? I don't know [17:08] ericsnow, ah right [17:09] ericsnow, so each repo has a webhook configured? [17:09] ericsnow, each := juju and utils I mean [17:09] dimitern: exactly [17:10] ericsnow, ok,thanks [17:10] dimitern: np [17:14] wwitzel3, you around? === kadams54 is now known as kadams54-away [17:19] alexisb: o/ [17:20] alexisb: hey, alexis - just saying hi [17:20] alexisb: happy christmas and see you next year :-) [17:20] alexisb: yes, the bug ended up not being a blocker for gce [17:21] wwitzel3: and you Wayne - I'm signing off for the year shortly... have a good holiday [17:22] voidspace: ahh, have a great one! :) [17:22] voidspace: happy new year! [17:22] katco: thanks :-) Have a good break and see you on the other side. [17:23] hey there voidspace we havent chatted in like forever [17:23] voidspace: you too! best to you and your family [17:23] you must not be on my calendar [17:23] howdy and happy holidays to you! [17:23] alexisb: I don't think I am... [17:23] alexisb: we can sort that out next year :-) [17:23] voidspace, I will fix that [17:23] coolio [17:41] mgosuites are a torture [18:03] so anyone have any pointers on why the /var/lib/juju folder and nonce.txt would not be being created? [18:04] when bootstrapping [18:11] anybody seen these types of errors out of go 1.4 [18:11] $ go get -u -v github.com/docker/swarm/... 
[18:11] package github.com/docker/swarm: /home/kapil/src/github.com/docker/swarm is from https://github.com/docker/swarm/, should be from https://github.com/docker/swarm [18:23] right, EOY [18:23] bye all [18:24] have a great holiday and new year, and see you there... [18:34] ericsnow: I'm going to grab some food, but I'll leave my session up [18:34] wwitzel3: k [18:34] wwitzel3: ping me when you're back [18:35] perrito666: so it looks like restore has to be all committed by Jan 9... [18:36] ericsnow: so it seems [18:36] perrito666: or we'll need to disable backups in 1.22 like we did in 1.21 [18:36] we are getting there if CI has enough non locked time [18:37] perrito666: :) [18:38] if anyone has a few minutes to spare, I could use a review on http://reviews.vapour.ws/r/659/ [18:39] it's a basically a copy of utils/set.Strings with s/String/Int/ :) [18:39] ericsnow: ship it! [18:39] natefinch: thanks [18:39] ericsnow: I had looked at it before but forgot to hit the ship it button [19:07] ericsnow: back [19:08] brt [19:13] during bootstrap what is responsible for creating the /var/lib/juju on the instance? [19:14] right now, during ssh, when we login, there is no /var/lib/juju/nonce.txt file [19:14] in fact, there is no /var/lib/juju folder at all [19:14] the finish bootstrap command expects that nonce.txt file [19:14] (on GCE) [19:14] so that is why GCE is failing atm [19:22] wwitzel3: cloud init makes it === kadams54-away is now known as kadams54 [20:16] is there a good, clear example of using suites to set up a full juju stack? i'm getting nil-reference exceptions and it's not exactly clear to me why [20:19] katco: mm? [20:19] There's the dummy provider, but proceed with caution... there be dragons [20:19] perrito666: so you know how we chain suites in tests? 
[20:19] katco: I have no clue [20:20] perrito666: i am probably wording it poorly [20:20] embed this suite in that suite etc etc [20:20] yeah [20:31] natefinch: I might have let you a review in priv [20:33] katco: I think it's the JujuConnSuite [20:33] natefinch: what is? [20:34] katco: the suite that gives you a bootstrapped env [20:34] katco: /home/nate/src/github.com/juju/juju/juju/testing/conn.go [20:34] natefinch: ah ironically that's where the panic happens :p [20:34] hello folks [20:34] thumper: morning mate [20:34] katco: what is your error? [20:34] natefinch: it's the confluence of using that package with another suite i think [20:34] thumper: http://reviews.vapour.ws/r/627/ [20:34] natefinch: btw, if we have blocking bugs the topic might benefit from that info [20:34] thumper: and follow up branch http://reviews.vapour.ws/r/657/ [20:35] waigani: I answered one question on one of your reviews [20:35] natefinch: is anyone on your team dealing with https://bugs.launchpad.net/juju-core/+bug/1396099 ? [20:35] Bug #1396099: AWS/Joyent/HP/manual/maas: juju deploy error "connection is shut down" [20:35] I don't always trust the assignee === kadams54_ is now known as kadams54-away [20:35] thumper: btw, thanks for taking over yesterday [20:36] perrito666: which review? [20:36] perrito666: got it [20:36] http://reviews.vapour.ws/r/645/diff/#http://reviews.vapour.ws/r/645/diff/# [20:36] perrito666: that's fine [20:36] waigani: that, but only once [20:37] * perrito666 lends his computer a second to his mother in law to look up the recipe of a sweet chrismas bread and is profusely ... criticized... for having the kb layout US [20:38] hehe [20:39] perrito666: where are the tests for the Restore func? [20:40] waigani: not yet pushed, sadly when I click fixed it sumits immediately instead of remaining in my draft [20:40] waigani: but you added a ? 
[20:40] which is not very clear [20:41] perrito666: sorry about that , I dropped it [20:42] thumper: so i'm trying to use the cmd/jujud/agent/testing/agent.go suite, and if i use that alone i get errors with various workers trying to operate on my host's real fs [20:42] thumper: and then the test just keeps trying to dial the state server which apparently didn't come up [20:43] thumper: so i tried to bring in juju/juju/testing, and that gives me a nil reference exception (hold) [20:43] katco: do you have the code handy? [20:43] thumper: do you feel like doing a peer coding session? [20:44] could do... === kadams54-away is now known as kadams54_ [20:44] thumper: it's not pushed up anywhere yet [20:44] ok [20:44] waigani: no problem I was curious if that actually was a way to say wtf === kadams54_ is now known as kadams54-away [20:57] ericsnow: my mind is a bit clouded, the cool upload is already landed right? [20:57] perrito666: right [20:59] tx man [21:02] wallyworld: ping? [21:03] ah wallyworld ping me with relative low priority when you have a moment [21:18] where does nonce.txt get written to the new instance during bootstrap? [21:18] ericsnow: it gets added to the stuff cloud init does [21:18] natefinch: but where? [21:19] ericsnow: github.com/juju/juju/environs/cloudinit/cloudinit_ubuntu.go#105 [21:20] natefinch: that doesn't write anything to the new host though, right? [21:21] natefinch: doesn't that happen in cloudinit/sshinit/configure.go? [21:21] ericsnow: yeah, it adds a line to cloud init... github.com/juju/juju/cloudinit/options.go#374 [21:21] natefinch: right [21:30] hey people heads up https://github.com/blog/1938-vulnerability-announced-update-your-git-clients [21:38] natefinch: is there an initial /var/lib/juju/nonce.txt included in the cloud images? [21:39] ericsnow: no clue [21:39] natefinch: :( [21:53] perrito666: hi [21:53] wallyworld: hi [21:53] wallyworld: ill privmsg you [22:04] hmmm.... 
I think I've found another upgrade regression that's unrelated to what I'm looking at [22:07] menn0: hi [22:08] wallyworld: hi [22:08] wallyworld: i think I figured out what i was going to ask you [22:08] ask away [22:08] wallyworld: but i'm about to ask you to review a fix to one of the CI blockers [22:08] sure [22:09] wallyworld: it's a bit of a reorg of the apiworker in the machine agent [22:09] * katco looks at menn0 nervously [22:09] is this the upgrade bug with uuid? [22:09] yep [22:10] menn0, travel approved [22:10] wallyworld: but that change has exposed a "bug" in the machine agent [22:10] alexisb: thanks [22:10] wallyworld: the container setup code was running during upgrades [22:10] menn0: so we are roomies, thumper will be jealous [22:11] ah, container setup code should run after upgrades shouldn't it [22:11] thumper has nothing to worry about. you're all his :-p [22:11] \o/ [22:11] wallyworld: yep [22:11] wallyworld: give me a few minutes and you can see what i've done. it's not completely straightforward to fix this. [22:12] i guess till now the container setup didn't really matter when it ran so we got away with it [22:12] wallyworld: well kinda, [22:12] wallyworld: there was always a lurking bug there [22:13] yeah [22:13] i men it was luck nothing broke till now [22:13] mean [22:13] wallyworld: if the machines collection migration happened at about the same time as the container setup it could have blown up anyway [22:13] wallyworld: it's just much more likely now [22:14] menn0: one thing also that needs doing is to delay the api worker start up until after the first address change has come in [22:14] to allow the server cert to be regenerated [22:14] with the machine ip addresses [22:15] thus avoiding a restart after any clients that manage to connect realy, really quickly [22:15] wallyworld: yeah, the other CI problem [22:15] oh, they raised a blocker for that? 
[22:15] wtf [22:15] sigh [22:15] that will not be an issue in practice [22:16] i wish blockers we added to the juju-dev topic [22:16] so we could see them [22:30] wallyworld: they are but it seems to be a manual process [22:30] wallyworld: a bot should do it [22:30] yes [22:32] wallyworld: here's that fix https://github.com/juju/juju/pull/1343 [22:33] menn0: ta, in ameeting will look soon [22:34] thumper: can you look at https://github.com/juju/juju/pull/1343 pls? wallyworld is probably best placed to review this but he's in a meeting but I'd like to get this CI blocker fixed. [22:37] menn0: ack [22:41] menn0: Ship It! [22:41] thumper: ta [22:50] when did juju rename its ssh keys to juju_id_rsa? [22:50] or has it always been like that [22:52] menn0: change looks good, thanks for fixing [22:52] wallyworld: great [22:52] stokachu: i'm not 100% sure, i thought it was always like that [22:52] wallyworld, thumper: thanks for the reviews [22:52] wallyworld: ok cool [22:53] wallyworld: this API server restart issue is affecting my personal test scripts too :) [22:53] wallyworld: it'll probably bite anyone who has scripted deployments [22:54] i'm about to relocate, will look at it as soon as i'm online again ain about 20 mins [22:55] i wonder, if using JUJU_HOME causes the sshkeys to be renamed [22:55] * thumper sighs [22:55] I need to take the dog to the vet [22:55] bbl [22:56] though i can't find anywhere in the code where juju_id_rsa is referenced [23:24] mm, making a function take a variadic argument does not imply I get a slice of that type right? [23:26] perrito666: the argument is used as a slice, but you are not guaranteed its length is > 0 [23:27] but, is it a slice? [23:27] yes [23:27] well... you mean like down at the AST level? [23:27] like if you used reflection would it be a slice? [23:27] katco: like I want to append it to a [][]string :) [23:27] katco: exactly [23:27] oh, yep. 
it's a slice :) [23:28] so I have a slice of string slices and I append the variadic arg to it (tests) [23:28] but when I try to DeepEqual each of the slices I get an error about the capacity [23:29] i am not sure how go constructs the capacity under the covers for variadic parameters [23:29] my guess is that it matches the number of parameters passed in exactly? [23:30] but it's almost certainly constant, so i'd allocate your test values with whatever capacity it says go has provided [23:38] katco: It seems that it expects the subslices to have been properly allocated [23:38] so I have to make them and then fill them with the contents of the variadic arg [23:38] perrito666: deep equals? [23:38] it does not support dirrect assignation [23:38] katco: yup [23:38] i guess that makes sense [23:39] I would expect go to have that kind of information for those slices [23:43] katco: https://github.com/juju/juju/pull/1326/files#diff-32f2baace5b89ccd33a7a5a4c0619b3bR67 [23:43] I end up having to do that [23:46] If anybody want to take a second look at http://reviews.vapour.ws/r/645/ Ill be thankful [23:47] perrito666: that code makes send to me. what is unintuitive about it? [23:47] katco: well it makes a string slice and copies a string slice [23:48] but I am most likely de-referencing some pointers there which might be what was breaking my code [23:49] oh, so if you just do a mgoArgs = append(mgoArgs, mongoRestoreArgs...) it complains? [23:49] I have not tried, if you look closely I dont want to append the elements of mongoRestoreArgs but the slice itself [23:50] perrito666: oh i think i see what you meant [23:51] perrito666: the variadic slice didn't have the correct capacity [23:51] katco: exactly, which is odd [23:51] perrito666: i think there is a way to size it down... hm [23:51] katco: well what I end up doing is clear so I will leave it that way [23:53] perrito666: maybe make the new slice, and then do a copy? [23:53] to elide the for loop? 
[23:54] I somehow fear copy will blow in a similar way? [23:54] perrito666: http://stackoverflow.com/questions/12768744/re-slicing-slices-in-golang [23:54] perrito666: well if it's truly the capacity that's erroring out deepequals, as long as your destination slice is the correct capacity i think it should be fine [23:55] looks like even better is this: mongoRestoreArgs = mongoRestoreArgs[0:] [23:55] katco: I might give it a try [23:55] * perrito666 tries [23:55] perrito666: should give you a slice ref with the correct cap. if not, try mongoRestoreArgs[0:len(mongoRestoreArgs)] [23:55] with a nice comment :) [23:57] katco: blows [23:57] perrito666: both? [23:57] katco: yup (I used a new variable because I find reassignation is a bit ugly)