[00:22] wallyworld: yeah, i plan to day [00:23] just waiting for a smoke test build to pass [00:23] davecheney: thumper and i had a couple of fixes [00:23] is it too late? [00:24] wallyworld: it is not too late [00:24] remember my tagging score card [00:24] 0 for 2 [00:25] davecheney: it would be great if you could +1 this then. you're the only person besides thumper who is around to ask sorry. https://codereview.appspot.com/11493043/ [00:25] no wories [00:26] thanks. do you have a link to the release notes? [00:26] https://docs.google.com/a/canonical.com/document/d/1ZBV6m0D1cfJQGoHW7EzJEc2qZeJFR38teHM_4OiYkeM/edit# [00:26] i'll add some shit [00:26] capital! [00:29] environs/azure isn't in environs/all [00:29] but it is leaking into jujud somewhere [00:29] i need to fix that quickly [00:52] davecheney: in my trunk, azure is in all [00:52] yeah, for some reason I couldn't see it last night [00:52] i'll have to submit a branch to remove it [00:52] +1 trivial from me [00:53] but I'll do it when you have it up [00:53] wallyworld: LGTM on your branch [00:53] ooops, sam is up [00:53] afk for a few mins [00:53] davecheney: thanks, will land [01:09] thumper: should i flip that mp to approved? iiuc there's a release about to be cut so maybe wait until after that? [01:11] https://codereview.appspot.com/11557043 [01:12] wallyworld: thumper could I give some love for this change ? [01:12] sure, looking [01:12] sidnei: sure, check it in [01:13] davecheney: oh, apparently i don't have perms. do the honors? https://code.launchpad.net/~sidnei/juju-core/mongo-2.4-compat/+merge/175653 [01:13] sidnei: screw that, i'll give you perms [01:13] /o\ [01:13] ok :) [01:14] try now [01:14] done [01:15] or. slow launchpad is slow. [01:15] now for realz. [01:17] davecheney: I had a patch from agl to handle tls renegs... can you take a look please? [01:17] bigjools: FUCKING A [01:17] you can say that if it works [01:18] check your email [01:18] bigjools: please send [01:18] got it [01:18] bigjools: speaking of azure signup [01:19] did some dude try to call your to welcome you to the azure failmy [01:19] family [01:19] using what number? *cough* [01:20] wallyworld: hangout? [01:20] i had to give a mobile number so they send me the activation code [01:20] ok [01:20] i guess that is how they got my phone number [01:20] i did use my @canonical email [01:20] maybe this is VIP treatment [01:25] ummmm maybe I did that [01:25] never got called [01:25] I can't remember what I did yesterday let alone 2 months ago [01:25] good point [01:26] bigjools: patch didn't apply cleanly, but i'll hack it in manually [01:26] ok [01:26] that's interesting [01:26] what is he hacking on... [01:27] thumper: url? [01:27] https://plus.google.com/hangouts/_/e105bdb56dcdc0f23d38c9cdafc18eea12967111?hl=en [01:27] no idea, the tls package hasn't received any updates for a looooong time [01:28] I do know that google have their own internal version of Go [01:28] but this isn't a fork [01:28] it's just the way they managage external source into the big mega google/ repo [01:28] so possibly he was working from that copy [01:28] anyhoo [01:30] davecheney: notice he said it's not suitable for upstream, so we still have a problem [01:30] yup [01:31] if they are that against it, fixing go-curl is the only realistic option [01:31] sidnei: thouest forget the commit message [01:31] bigjools: one elephant at a time [02:21] ok, tagging time [02:27] davecheney: did you get it working? 
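For readers trying to follow the environs/azure exchange above: environs/all exists purely to register each provider as an import side effect, so whether a provider ends up linked into jujud comes down to blank imports like the ones sketched below. This is a rough illustration of the pattern, not the file as it stood at the time; the package paths reflect the era's layout and may not be exact.

```go
// Package all registers the cloud providers purely by importing them for
// their side effects; nothing in it is called directly. A provider that is
// missing here (or that "leaks" in through some other import path) changes
// what ends up compiled into jujud.
package all

import (
	_ "launchpad.net/juju-core/environs/azure"
	_ "launchpad.net/juju-core/environs/ec2"
	_ "launchpad.net/juju-core/environs/openstack"
)
```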
[02:29] bigjools: not yet [02:29] it's next on my list after tagging 1.11.3 [02:29] kk [02:29] davecheney: after a spurious bit failure, my last branch for release finally merged [02:29] wallyworld: what is the push location for the _real_ trunk now ? [02:29] s/it/bot [02:30] ~gobot/juju-core of something i think [02:30] lucky(~/devel/juju-core) % bzr push lp:~gobot/juju-core [02:30] bzr: ERROR: Invalid url supplied to transport: "lp:~gobot/juju-core": No such person or team: gobot [02:31] ahh, hyphen [02:31] go-ot sorry [02:31] i think you also need to add credentials [02:31] not sure [02:31] lucky(~/devel/juju-core) % bzr push lp:~go-bot/juju-core/trunk [02:31] there's an email somewhere perhaps, i'll try and find it [02:31] bzr: ERROR: Cannot lock LockDir(chroot-76960720:///~go-bot/juju-core/trunk/.bzr/branch/lock): Transport operation not possible: readonly transport [02:32] yeah, you need credentials [02:32] jam said that my key shuld be there [02:32] tey fook [02:32] no, the push needs the correct form of the url [02:32] bzr+ssh://.... or something i think [02:33] lucky(~/devel/juju-core) % bzr push bzr+ssh://bazaar.launchpad.net/~go-bot/juju-core/trunk [02:33] bzr: ERROR: Cannot lock LockDir(chroot-92566480:///~go-bot/juju-core/trunk/.bzr/branch/lock): Transport operation not possible: readonly transport [02:34] davecheney: bzr+ssh://go-bot@bazaar.launchpad.net/~go-bot/juju-core/trunk [02:34] i think [02:34] lp:juju-core should work [02:34] bigjools: it's protected [02:35] oh you need to use go-butt [02:35] bigjools: is that like a ro-butt ? [02:35] you could bzr lp-login first then [02:36] then push to lp:juju-core [02:36] i didn't know you could do that [02:36] lucky(~/devel/juju-core) % bzr lp-login [02:36] dave-cheney [02:36] lucky(~/devel/juju-core) % bzr push bzr+ssh://bazaar.launchpad.net/~go-bot/juju-core/trunk [02:36] bzr: ERROR: Cannot lock LockDir(chroot-67593168:///~go-bot/juju-core/trunk/.bzr/branch/lock): Transport operation not possible: readonly transport [02:36] lucky(~/devel/juju-core) % bzr push lp:juju-core [02:36] bzr: ERROR: Cannot lock LockDir(chroot-92087248:///%2Bbranch/juju-core/.bzr/branch/lock): Transport operation not possible: readonly transport [02:36] oh, do I have to lp-login as go-but ? [02:36] davecheney: did you try my suggestion above? [02:37] add the go-bot@ [02:37] wallyworld: yes i tried your suggestion [02:37] oh, not that one [02:37] << tit [02:38] lucky(~/devel/juju-core) % bzr push bzr+ssh://go-bot@bazaar.launchpad.net/~go-bot/juju-core/trunk [02:38] Permission denied (publickey). [02:38] very close now [02:38] ConnectionReset reading response for 'BzrDir.open_2.1', retrying [02:38] key isn't actually there [02:38] oh, your key is not registered? [02:39] no, it's not, i just checked [02:40] wallyworld: i don't know how to answer that [02:40] nothing did not work before [02:40] davecheney: see https://launchpad.net/~go-bot [02:40] i can't see your key [02:42] ok, i'll fix it [02:42] ok, how do I fix it ? [02:42] ie, nothing didn't work before hand [02:43] looks like I need to login as go-bot and rummage around [02:57] i don't think i have perms to add your key [02:58] maybe ask in -ops [02:58] davecheney: If I use go.build package, build.ImportPath(".", 0) and use that in a test, will it always import the directory I think it should? [02:59] thumper: not 100% sure [02:59] never tried that [02:59] by rights it shuld use whatever the value of $(pwd) is [02:59] but that can sometimes be surprising [03:03] hmm... 
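On thumper's go/build question just above: the standard library has two relevant calls, and only the resolve-by-import-path form is independent of the working directory, which is what makes "." feel fragile from a test helper package. A minimal sketch, assuming GOPATH is set so the juju-core path resolves:

```go
package main

import (
	"fmt"
	"go/build"
	"log"
)

func main() {
	// ImportDir resolves a filesystem directory, so "." depends entirely on
	// the current working directory of the test process.
	local, err := build.ImportDir(".", 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("cwd package dir:", local.Dir, "imports:", local.Imports)

	// Import with a full import path searches GOROOT/GOPATH instead, so it
	// gives the same answer no matter where the test is run from.
	pkg, err := build.Import("launchpad.net/juju-core/agent", "", 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("agent imports:", pkg.Imports)
}
```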
[03:03] I want to write a test to make sure people don't mistakenly expand the dependencies of a package [03:04] so I want to get the list of imports for the current package [03:04] it seems to work [03:04] but the "." makes me hessitant [03:04] also annoying if I want to put the test in a helper package [03:04] in python there is __file__ [03:04] anything like that in go? [03:06] what about this [03:07] go list -f '{{ .Imports }}' $PKG [03:08] thumper: you want build.ImportPath("launchpad.net/juju-core/cmd/juju", 0) or something [03:09] that doesn't work [03:09] &os.PathError{Op:"open", Path:"launchpad.net/juju-core/agent", Err:0x2} ("open launchpad.net/juju-core/agent: no such file or directory") [03:10] ok [03:10] you are right to be concerned about ".' [03:11] it seems to work, but I'm not convinced it is always going to work [03:12] rightly so [03:13] hangon, [03:13] think I have it [03:39] davecheney: [03:39] func (*deps) TestPackageDependencies(c *gc.C) { [03:39] c.Assert( [03:39] FindJujuCoreImports(c, "launchpad.net/juju-core/agent"), [03:39] gc.DeepEquals, [03:39] []string{"log", "state", "version"}) [03:39] } [03:39] whaddaya think? [03:43] * thumper proposes [04:26] wallyworld: is that cost and constraint evaluation done in the core code or each provider? [04:27] (as opposed to defining the structs I mean) [04:27] umm. core code i think, let me check [04:27] that's cool if so [04:28] yeah, core code [04:28] there's glue in each provider to plug in the data [04:28] and the core code does the rest [04:28] so the common logic is all outside the providers [04:29] nice [04:29] so we just define the structs and Robert's your father's brother [04:30] that's the idea more or less [04:31] the openstack and ec2 implementations give some idea of what to do [04:37] yeah [04:37] wallyworld: oh, does cpupower have any meaning outside of ec2? [04:38] bigjools: nope [04:39] wallyworld: what should it be set to, anything? [04:39] as long as it's the same for each type I guess [04:39] no, don't set it. it will be ignored [04:39] ah ok [05:22] I used a ¢ symbol in a comment and now I get this "illegal UTF-8 encoding" compile error, what do I do? === tasdomas_afk is now known as tasdomas [06:03] bigjools: delete that character ? [06:03] did your editor put a BOM at the top fo the file ? [06:03] not that I can tell [06:04] like this ? http://play.golang.org/p/MHE29ZM7Ln [06:04] it was working ok until I did a go fmt [06:04] oooh interesting [06:04] is the \c all mangled as well ? [06:05] it now looks like: ¢ [06:05] if I remove the  Go bitches at me [06:07] bigjools: can you provoke the problem in another editor ? [06:07] * bigjools tries again [06:07] or can you send me the file pre fmt ? [06:09] I used vim to make a basic file [06:09] it works [06:09] post format it works [06:10] ¢ looks like latin1 encoding interpreted as utf-8 [06:10] except it added that  [06:11] hmmm vim adds it later [06:11] with no fmt [06:11] * bigjools sighs heavily [06:14] dimitern: the other way around, no? [06:14] jtv: ah, yes right [06:15] if it was interpreted as utf there will be things inside [07:02] fwereade_: ping [07:04] rogpeppe: hey [07:04] dimitern: yo! [07:05] rogpeppe: so about SetAgentAlive again [07:05] dimitern: okay... [07:06] rogpeppe: did we make a way to expose the same functionality through the api connection? 
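The TestPackageDependencies snippet pasted above relies on a FindJujuCoreImports helper that never appears in the log. One plausible shape for it is sketched below: resolve the package by its full import path (sidestepping the "." concern), keep only the juju-core imports, and return them sorted with the common prefix stripped. The real helper may well walk imports transitively; this version only looks at direct imports.

```go
package deps

import (
	"go/build"
	"sort"
	"strings"

	gc "launchpad.net/gocheck"
)

const jujuCorePrefix = "launchpad.net/juju-core/"

// findJujuCoreImports is a sketch of the helper used by the pasted test: it
// returns the direct juju-core imports of pkgPath, sorted, with the
// "launchpad.net/juju-core/" prefix stripped.
func findJujuCoreImports(c *gc.C, pkgPath string) []string {
	pkg, err := build.Import(pkgPath, "", 0)
	c.Assert(err, gc.IsNil)
	var imports []string
	for _, imp := range pkg.Imports {
		if strings.HasPrefix(imp, jujuCorePrefix) {
			imports = append(imports, imp[len(jujuCorePrefix):])
		}
	}
	sort.Strings(imports)
	return imports
}
```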
[07:06] rogpeppe: i think that was what we decided last [07:06] dimitern: i can't entirely remember if we decided to make it explicit or implicit [07:06] dimitern: both have advantages and disadvantages [07:06] rogpeppe: it was something like auto start a pinger on connect [07:07] dimitern: yeah, that's the implicit approach [07:07] dimitern: how do we currently deal with the machine nonce? [07:08] rogpeppe: we set it at provisioning time and can verify it in the agent [07:08] dimitern: yeah, but do we deal with it in the API at all yet? [07:08] rogpeppe: i don't think so, let me check [07:09] rogpeppe: no [07:09] rogpeppe: why do we need that? [07:10] rogpeppe: i was thinking to do the api migration in steps [07:10] rogpeppe: machiner doesn't need it, only the MA and provisioner [07:10] dimitern: that's the thing that ensures we don't have two machine agents connected at the same time [07:10] rogpeppe: how so? [07:11] dimitern: isn't that the whole reason for the nonce? [07:12] dimitern: my suggestion is that we integrate the nonce with the login request, and start the pinger then. [07:12] rogpeppe: expand please [07:13] dimitern: currently when you log in, you provide an entity tag and password, right? [07:13] rogpeppe: right [07:14] dimitern: so it's only at that stage that the API server knows who's at the other end [07:14] dimitern: we need to know who's at the other end if we're to start a pinger [07:15] dimitern: but we don't want to start a pinger if an agent comes up with the wrong nonce (because then they'll immediately go away again) [07:15] dimitern: so we could add the nonce to the login request, so the login will fail without the correct nonce. [07:15] dimitern: (for machine agents only) [07:15] morning [07:16] dimitern: and then start the pinger on a successful login request (and stop it when the connection goes away) [07:16] rogpeppe: hmm [07:16] rogpeppe: so add an optional field to apiinfo? [07:17] rogpeppe: that's alright, but how can juju status get this to show it (i.e. AgentAlive) ? [07:17] dimitern: we can still have an API AgentAlive call [07:18] rogpeppe: but for the CLI/clients only [07:18] dimitern: probably, yes [07:18] rogpeppe: and assuming we can start the same pinger out-of-order to get the already running one, should be fine [07:19] dimitern: i don't understand that [07:20] rogpeppe: i mean the pinger we start on MA login will be running, and then AgentAlive tries to find whether it's running, but it will be in another process perhaps [07:20] dimitern: that's always been the case [07:20] rogpeppe: so it's fine then [07:20] dimitern: yeah [07:21] dimitern: you don't even need a Pinger to ask if something's alive [07:22] rogpeppe: ok then, so the machiner can be simplified a bit [07:23] rogpeppe: sorry, the MA actually [07:23] dimitern: yeah [07:23] rogpeppe: no need to call CheckProvisioned - if it's not we won't be able to connect [07:23] dimitern: yup [07:23] rogpeppe: but the error was to be a different one from "login failed", so it won't retry [07:24] dimitern: good point, yes [07:25] rogpeppe: I think I can see it now, the CL I mean to enable all that [07:25] dimitern: great [07:25] rogpeppe: if you don't mind I'll try it out and propose it for discussion [07:26] dimitern: sounds good [07:26] rogpeppe: as a first step towards api -> machiner [07:29] dimitern: i'm not sure there's any problem with LongWait being 10 seconds BTW [07:29] dimitern: can you think of one? [07:29] rogpeppe: why so long? [07:29] rogpeppe: why not 1h then? 
:) [07:30] dimitern: because some it's important in some cases as a worst case; and it's not long to wait for a failed test. [07:30] dimitern: it will never happen when tests pass, so it doesn't impact anything in the usual case [07:31] rogpeppe: as long it's only a failure timeout then ok [07:31] dimitern: that's the whole point of LongWait, isn't it? [07:31] rogpeppe: i assumed so yes, but in some places it seemed less clear [07:32] dimitern: anywhere it's part of the test-passing path, i'd like to know [07:33] rogpeppe: it should be easy enough to spot - running go test before and after the branch changes a few times and comparing times [07:33] dimitern: yeah, change it to 1 minute and see which tests take longer [07:34] rogpeppe: have you tried that yourself? [07:34] dimitern: no - i'll do it now [07:40] * TheMue hates import cycles [07:51] dimitern: useful test - thanks for the suggestion. it found two places we were using LongWait inappropriately [07:52] rogpeppe: nice! [07:57] dimitern: make that three places - there were two related tests in state that were doing it wrong [07:58] rogpeppe: i'm glad i asked you to do this test then [07:58] dimitern: me too [08:02] Hi guys, would someone be available for a second review of https://codereview.appspot.com/11524044/ ? [08:06] rvba: I'll take a look, but it doesn't seem there is a first review there [08:08] dimitern: yeah, the first review is here https://codereview.appspot.com/11433044/. You know, I've got this problem that each time I run lbox propose it creates another MP. [08:08] dimitern: thanks! [08:09] rvba: you could probably just do bzr push and publish your comments manually from rietveld [08:10] rvba: the only drawback is you won't see the updated diff on rietveld i guess [08:10] dimitern: well, I think I'll live with the creation of an additional MP each time I run "lbox propose" :) [08:14] rvba: if you rebuild lbox from tip using go 1.1.2 (or whatever the latest release is) I think you should be fine - that's what I did and it works. [08:14] dimitern: okay, I'll try that. [08:15] rvba: but just in case it doesn't better keep a copy of the lbox binary around as backup [08:15] sure [08:15] rvba: and also set a separate $GOPATH for go 1.1.2, so it'll be easier to switch back to your current go version [08:16] Well, because of the go-curl problem I'm sort of stuck with the old version for now. [08:16] rvba: ah.. [08:17] rvba: reviewed [08:17] ta === racedo` is now known as racedo [09:11] mornin' [09:11] mgz: moin [09:13] rogpeppe: so far so good - changed OpenAPIAs and all relevent code to use machine nonce for MA logins, tests pass, only need to add the pinger and will propose [09:15] dimitern: cool [09:21] dimitern: final look? https://codereview.appspot.com/11529043/ [09:22] rogpeppe: sure [09:22] * TheMue removed cyclic dependency, but our packages are sometime really tight entangled [09:24] * fwereade_ out for breakfast, bbiab [09:28] TheMue: what was the issue in this case? [09:31] rogpeppe: reviewed [09:32] rogpeppe: I'm trying to extract the sync tools so that they are better reusable [09:33] rogpeppe: and my wish has been environs/tools [09:34] rogpeppe: sadly I'm using the EC2 HTTP storage reader, and environs/ec2 uses environs/tools. *rofl* [09:37] dimitern: replied [09:37] dimitern: (and i'm going to submit it) [09:40] dimitern: did jam tell you anything about his plans around upgrade worker (in particular the role of his upgrade handler stuff)? 
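For context on the LongWait exchange above: the point being settled is that LongWait should only ever elapse on the failure path, so its size has no effect on a passing run. A minimal sketch of the usual pattern (LongWait here is a stand-in for the shared testing constant):

```go
package sometest

import (
	"time"

	gc "launchpad.net/gocheck"
)

// LongWait stands in for the shared testing constant discussed above; it
// bounds how long a *failing* test waits before giving up.
const LongWait = 10 * time.Second

// waitFor blocks until the expected event arrives or the test fails.
func waitFor(c *gc.C, done <-chan struct{}) {
	select {
	case <-done:
		// Passing path: the event arrived, so LongWait never elapses and its
		// value makes no difference to how long the suite takes.
	case <-time.After(LongWait):
		c.Fatalf("timed out waiting for the event")
	}
}
```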
[09:45] fwereade_: ping [09:55] rogpeppe: not much that I remember off hand [09:56] dimitern: ok, i'll figure it out [09:56] rogpeppe: it model is similar to notifyWorker [09:56] dimitern: hmm, that seems a bit odd - isn't notifyWorker structured like that because it's generic? [09:57] dimitern: i think the upgrader is the same in every case, isn't it? [09:57] dimitern: i'm a bit concerned that we're taking time to refactor significant logic that really doesn't need refactoring at this stage [09:58] dimitern: when i tried to estimate the time to implement the API, i did not include time to refactor all the client code too [09:59] rogpeppe: i'm not sure, but i think the idea is to refactor the upgrader so it can use the notifyworker, like the machiner [10:00] dimitern: that's a different thing. [10:01] rogpeppe, pong [10:02] fwereade_: i was just wondering why params.AgentTools flattens out all the version fields [10:02] rogpeppe, I'd be +1 on changing that [10:03] I asked jam, he said it basically seemed like a good idea at the time but didn't have strong feelings either way [10:03] dimitern, rogpeppe: fwiw I am -1 on spending time making upgrader a handler-style worker [10:03] fwereade_: it seems like an embedded version.BinaryVersion would be more obvious [10:03] dimitern, rogpeppe: it's got 2 things to handle [10:03] fwereade_: me too [10:04] fwereade_: i'm not even sure it'll work well using NotifyWorker [10:04] rogpeppe, ah, we're on go 1.1 now, so marshalling embedded fields works properly, right [10:04] rogpeppe, hey wait [10:04] fwereade_: no we're not [10:04] rogpeppe, Upgrader... *is* that watching 2 things? oh yes it does the download stuff [10:04] rogpeppe, we're not? [10:05] rogpeppe, I thought we had everything building with go1.1 [10:05] fwereade_: oh, have i missed something? [10:05] rogpeppe, I *think* so [10:05] fwereade_: is 1.1 backported to precise etc? [10:05] rogpeppe, I suspect mgz will be able to give you the precise (heh) details [10:06] fwereade_: i thought tarmac was still running 1.0.2 [10:06] fwereade_: if we're really 1.1 throughout, then \o/ [10:06] rogpeppe, honestly for the tools I'd just as soon send a string [10:06] fwereade_: don't we need the version *and* the URL? [10:07] rogpeppe, sure, sorry, I mean send the version as a string [10:07] fwereade_: ah, you mean for AgentTools [10:07] fwereade_: version.BinaryVersion marshals as a string anyway AFAIR [10:07] rogpeppe, ah, that's nice [10:09] fwereade_: ah, i misremembered - it only does that in mongo (it has SetBSON etc methods) [10:09] dimitern, rogpeppe: the other option is to drop the un- or poorly-tested download-cancelling stuff, and then upgrader *would* make a nice NotifyWorker, I think [10:09] fwereade_: it looks like versions aren't currently used in the API yet, so we can still do it for JSON [10:10] rogpeppe, sgtm [10:11] fwereade_: the important thing, for the time being, is that the upgrader waits around to complete an upgrade for a while after it's killed (because of the one-kills-all logic in the machine agent) [10:11] fwereade_: when we fix things so that we can detect non-fatal errors, we can rethink that [10:11] rogpeppe, I thought we weren't doing that on the api side? [10:13] fwereade_: we are currently. we need to have a hard think about error handling before we change that. [10:13] rogpeppe, I don't think it's worth worrying about preserving that, so long as we just defer landing a NotifyWorker version until it's going via the api? 
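The aside above about being "on go 1.1 now" matters because encoding/json only started flattening embedded (anonymous) struct fields in Go 1.1; under Go 1.0 they were silently ignored, which is presumably why params.AgentTools originally spelled every version field out. A small illustration with generic stand-in types, not the real juju-core ones:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Number struct {
	Major, Minor, Patch int
}

type Binary struct {
	Number // embedded: promoted into the outer JSON object by Go 1.1
	Series string
	Arch   string
}

type AgentTools struct {
	Tag    string
	Binary // embedded again: the version fields appear at the top level
	URL    string
}

func main() {
	tools := AgentTools{
		Tag:    "machine-0",
		Binary: Binary{Number: Number{Major: 1, Minor: 11, Patch: 3}, Series: "precise", Arch: "amd64"},
		URL:    "https://example.com/tools.tgz",
	}
	out, _ := json.Marshal(tools)
	// Go 1.1 prints {"Tag":"machine-0","Major":1,"Minor":11,"Patch":3,...};
	// Go 1.0's encoding/json would have dropped the embedded fields entirely.
	fmt.Println(string(out))
}
```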
[10:13] rogpeppe, isn't the only reason for the delayed stop to work around the one-error-kills-everything behaviour? [10:14] fwereade_: not entirely [10:14] rogpeppe, go on [10:14] fwereade_: even if there is a fatal error due to the state going away, i think we want to allow the upgrade to proceed [10:15] fwereade_: because we're not downloading the tools from the state [10:16] rogpeppe, ok, but that just involves dropping the download-cancelling cleverness, right? if we just do a blocking download and return the error when it's done we're dorted [10:16] fwereade_: i think we should probably leave existing logic as is. i know it's tempting to do drive-by refactoring of everything, but it all eats into the critical path [10:16] fwereade_, rogpeppe: https://codereview.appspot.com/11424044 please take a look [10:16] fwereade_: i'm dorted anyway, dunno about you :-) [10:18] rogpeppe, I think that'd hold more water if upgrader were properly tested... as it is the tested bits needed changing, and if we dropped the untested bits we could also drop the somewhat astonishing delayed stop [10:18] rogpeppe, which is at least tested but appears not to be delivering much value now that it's only there to support the untested bits [10:20] * fwereade_ gets "dorted" now, haha :) [10:21] fwereade_, rogpeppe: ^^ that CL implements the first step towards ditching SetAgentAlive [10:21] fwereade_: the delayed stop isn't just there to support the download cancellation [10:21] fwereade_: it's there to make the system more robust in the face of stupid errors. [10:21] dimitern, cool, I probably won't get to it until later today I'm afraid [10:22] fwereade_: (and the download cancellation doesn't work anyway AFAIR because we don't provide a means the abort the http download) [10:23] rogpeppe, those errors being ones in parallel workers, right? we still depend on the outer agent mechanisms working right [10:23] fwereade_: when you can, np, i have a couple more coming [10:24] fwereade_: well yes, it has to get into the upgrader logic [10:30] rogpeppe, so... if it starts an upgrader correctly (which we depend on anyway) it won't stop the upgrader unless there's an API problem, right? if there *is* an api problem, no upgrades; if not, we'll get the upgrade, and if we ignore the tomb once we've started downloading there's no further need for delay... is there? [10:31] rogpeppe, the only thing that'll kill the upgrader is ErrTerminateAgent, and that's a potential stupid error the delayed stop doesn't defend against anyway [10:33] fwereade_: first: i'm not sure we want to ignore the tomb when we're downloading - a download can take a long time. [10:33] fwereade_: second: it will stop the upgrader if any API-based agent returns any error [10:34] rogpeppe, first, var upgraderKillDelay = 5 * time.Minute [10:34] rogpeppe, second: I thought the whole idea was that it *wouldn't* do that? [10:34] fwereade_: yes - that happens only for the very first download [10:35] fwereade_: in fact it doesn't happen if we come up successfully and the version is as expected [10:35] fwereade_: AFAIR... 
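For anyone trying to follow the upgraderKillDelay argument, the behaviour under debate is roughly the one below: a kill does not stop the worker at once but gives an in-flight upgrade a bounded window to finish. This is a simplified illustration of the delayed-stop pattern, not the actual upgrader code:

```go
package upgrader

import (
	"time"

	"launchpad.net/tomb"
)

var upgraderKillDelay = 5 * time.Minute // the value quoted in the discussion

type Upgrader struct {
	tomb tomb.Tomb
}

// fetchAndInstallTools stands in for the real download-and-unpack step.
func (u *Upgrader) fetchAndInstallTools() error { return nil }

// loop sketches the delayed stop: when the tomb is killed we do not return
// straight away, but keep servicing tools changes for up to upgraderKillDelay
// so that a pending upgrade has a chance to complete.
func (u *Upgrader) loop(changes <-chan struct{}) error {
	dying := u.tomb.Dying()
	var deadline <-chan time.Time // nil until we have been asked to stop
	for {
		select {
		case <-dying:
			deadline = time.After(upgraderKillDelay)
			dying = nil // stop selecting on the closed channel
		case <-deadline:
			return tomb.ErrDying
		case <-changes:
			if err := u.fetchAndInstallTools(); err != nil {
				return err
			}
		}
	}
}
```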
[10:35] rogpeppe, yeah, we do indeed usually not wait that long [10:36] rogpeppe, but if it's acceptable for the upgrader to block everything for 5 mins while it's figuring stuff out, I think it's also ok for it to wait for the duration of a download [10:37] fwereade_: the 5 minute wait only happens in a very specific circumstance [10:38] rogpeppe, likewise the wait for a download [10:38] fwereade_: it might actually be better structured as a statement outside the main upgrader loop actually [10:39] fwereade_: in my experience, a network download can take lots more than 5 minutes to time out [10:40] rogpeppe, let's try a different approach -- describe a bug that the delayed stop will actually defend against that *isn't* a bug in the mechanism that starts the upgrader anyway [10:41] fwereade_: say there's a compatibility problem in the unit agent, for example - it encounters an "impossible" state and bombs out [10:41] s/unit agent/uniter/ [10:41] rogpeppe, then the uniter waits 5s and gets restarted [10:42] rogpeppe, the upgrader continues to toddle along just fine [10:42] fwereade_: except that as things are *currently*, the upgrader is killed too [10:42] rogpeppe, we have been over that [10:42] fwereade_: because we have no way of distinguishing temporary from permanent errors [10:43] fwereade_: i'd like to fix the upgrader logic *after* we've added that error logic [10:43] rogpeppe, the upgrader is killed too because isFatal always returns true; but when it's an api worker, it won't kill everything [10:43] rogpeppe, we write a simple upgrader, and defer landing it until we can also land it as an api worker instead of a state worker [10:43] fwereade_: we don't use isFatal for the APIWorker runner [10:43] runner := worker.NewRunner(allFatal, moreImportant) [10:44] rogpeppe, WTF [10:44] fwereade_: it has to be like that [10:44] fwereade_: because we can't currently know when an API connection is borked [10:45] fwereade_: we do need to do that, but we haven't done it yet [10:45] rogpeppe, so maybe it's not hooked up yet, because we seem tohave some pathology where we still somehow write reams of code and never integrate it [10:46] fwereade_: what's not hooked up yet? [10:46] rogpeppe, but "does Ping return an error is I think adequate" [10:46] fwereade_: i don't believe it is [10:46] fwereade_: what happens if the underlying state connection that the API server is talking to goes away? [10:47] rogpeppe, then handling that is the api server's job [10:47] fwereade_: yes, and we don't know how to do that yet [10:47] rogpeppe, it is *not* the agent's concern though [10:48] fwereade_: it is currently - that's the only way we can recover from.... oh, in fact we never do recover. we're stuffed. [10:48] rogpeppe, the answer to possible crapness in one component is *not* to smear workarounds across the rest of the system, it's to fix the problematic component [10:49] fwereade_: i agree. but i don't want to remove our current workaround until we've fixed that problematic component [10:50] rogpeppe, using allFatal just makes the agent crap as well [10:52] fwereade_: so... it seems to me that there are two fixes we really need [10:53] fwereade_: 1) add another worker that watches api.State.Broken and exits with a fatal error if it is. 
[10:53] fwereade_: 2) change the api server so that if it sees an error that implies the mongo state connection is broken, it quits [10:54] rogpeppe, yeah, both good things [10:54] rogpeppe, more important than implementing the API workers how I asked more than a month ago? very much not so [10:55] fwereade_: i wasn't planning on rewriting all the workers to change them to use the API, and i don't think it's necessary [10:56] fwereade_: that's what i'm resisting [10:56] * TheMue smiles [10:56] tests are passing [10:56] rogpeppe, we *have* to rewrite the upgrader, because you implemented it so that we have to send secrets out to every single agent [10:57] rogpeppe, since we're doing it I would prefer to do it right [10:57] fwereade_: we don't have to change anything other than change it to read on a notify channel and read the tools when we get a notification [10:57] fwereade_: that's a fairly trivial change, i think [10:57] fwereade_: definitely not a rewrite [10:58] rogpeppe, apart from all the magic untested complex code in there [10:58] fwereade_: that's a totally orthogonal issue [10:58] fwereade_: it may be the case, but we don't need to fix everything now [10:59] rogpeppe, you know we could have written a NotifyWatcher implementation in the time you've been kicking up dust over this issue? [11:00] fwereade_: but it would probably be wrong in some subtle way. i like not rewriting things when possible. [11:02] rogpeppe, you are free to think I'm wrong about things [11:02] rogpeppe, you are not free to persistently go off and implement WTF you want to in the face of my instructions [11:03] fwereade_: no, i think you have good points, [11:03] fwereade_: and i think we want to go in the same direction [11:06] fwereade_: to get things straight here: your instructions here are to rewrite the upgrader? [11:09] rogpeppe, to (1) do the API runner as originally requested, so rogue tasks don't take the others down; to (2) implement a simpler upgrader as a NotifyWorker against the API, that does a blocking download and returns the appropriate upgrade error when done; and to (3) unhook the state one and activate the machine one, for the machine agent only [11:10] s/the machine one/the api one/ [11:10] fwereade_: ok, thanks [11:19] rogpeppe: ping [11:20] dimitern: pong [11:20] rogpeppe: how about that review? [11:20] ;) [11:20] dimitern: reviewed [11:20] rogpeppe: thanks [11:30] Would any of the Go experts here know how I'd go about debugging this crash? I went through it with gdb but it seems to be happening somewhere pretty deep down in the malloc logic... http://paste.ubuntu.com/5890497/ [11:31] I don't even know why it'd list two goroutines, since as far as I know this code doesn't even use goroutines. [11:34] fwereade_: standup? [11:40] Phew. Go 1.0.2 will produce a proper traceback. The 1.1.1 from the PPA we're using is a dev version, not a released one, right? [12:50] Anyone up for two really tiny reviews? https://codereview.appspot.com/11404044/ / https://codereview.appspot.com/11511043/ [12:59] rvba: will look after lunch [12:59] * rogpeppe goes for lunch [12:59] Thanks. [13:10] * TheMue steps out and continues later [13:28] rogpeppe, is there some sort of cache around the api? [13:29] rogpeppe, i can do actions, like deploy/add_units/add_relation, then do an all watcher immediately after and not get the results back [13:29] ie. the api is behaving as though its eventual consistent. [13:32] hmm.. 
nevermind maybe user error [13:52] looking for a second review on https://codereview.appspot.com/11424044/ and reviews on https://codereview.appspot.com/11572043/ [13:54] fwereade_: when you can as well ^^ === tasdomas is now known as tasdomas_afk === tasdomas_afk is now known as tasdomas [13:58] hazmat: the events coming from the allwatcher are only delivered when the global polling goroutine gets around to checking for changes [13:59] hazmat: the poll interval is 5s [13:59] hazmat: so that's your worst case delay [13:59] hazmat: although... [13:59] hazmat: the first result should come immediately [14:01] rogpeppe, immediately, but not nesc. reflective of current reality. [14:01] anybody else unable to connect to irc.canonical.com? [14:01] hazmat: no, it should be reflective of current reality [14:01] rogpeppe, because the status api call wasn't fully implemented i did status emulation on the watcher [14:02] using the initial watch response value [14:02] but that doesn't always seem to reflect the last ops done [14:02] running into this against some unit tests === tasdomas is now known as tasdomas_afk [14:05] rogpeppe, here's an example.. http://pastebin.ubuntu.com/ [14:05] whoops [14:05] http://pastebin.ubuntu.com/5890883/ [14:05] i deploy two services, add a relation, add a unit.. then do a status immediately after, it reflects only the new machines, a watcher initial result doesn't even show the machines. [14:06] ideally these should both show the services, units, relations, and machines of previous api ops [14:07] for this simple sync usage on the client, watches here use a separate client connection (env.status is a watch translator due to lack of details in rpc status) [14:11] hazmat: what does wait_for_units do? [14:12] rogpeppe, in this case nothing, because its waiting for units to be in the 'started' state, but it can't see any units, so its basically a no-op here. [14:12] hazmat: does it create a new watcher or use an existing one? [14:12] rogpeppe, normally its a watch against units agent-state to reach started [14:12] rogpeppe, new watcher [14:14] hazmat: it seems odd that the first event isn't producing the current status, because it gets the contents of all collections before producing it [14:14] hazmat: could you show me the whole code? (or just a trace of the API messages would be even better) [14:14] rogpeppe1, hence the question is there some sort of cache behavior? [14:14] rogpeppe1, okay.. [14:14] hazmat: ah... [14:15] hazmat: i've just realised [14:15] hazmat: yes, of course, there *is* some sort of cache behaviour! [14:15] hazmat: because the allwatcher is shared between all clients [14:15] hazmat: and we keep a single copy of all the data [14:15] rogpeppe1, i thought each client had independent pointer into that data [14:16] hazmat: yes, each client does [14:16] hazmat: but the underlying data is a cache that's only updated when new events arrive (by that 5s poll interval) [14:16] hazmat: sorry for the mis-info! [14:16] rogpeppe1, okay.. so i think the critical fix here then for would be for status op to report status instead of just machines. [14:17] hazmat: i think the status op should provide the whole status, yes [14:17] otherwise, there's no way a client can see its mods in th eenv. [14:17] hazmat: and then we can make the status command use that [14:17] hazmat: mgz is working on some stuff to enable that [14:18] mgz, ping :-) [14:18] hazmat: hey [14:20] mgz, rogpeppe1 mentioned you might be working on cli using api? 
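The cache behaviour rogpeppe1 describes above — one shared copy of the data, refreshed by a global poller roughly every 5 seconds, with each client holding only a pointer into it — is why hazmat's reads straight after a deploy look eventually consistent. A deliberately toy model of that arrangement, not the real allwatcher code:

```go
package main

import (
	"fmt"
	"time"
)

// backing plays the role of mongo: it is always current.
var backing = map[string]string{"machine-0": "started"}

// cache plays the role of the shared allwatcher store: watcher clients only
// ever see this copy, and it is refreshed solely by the global poller.
var cache = map[string]string{}

const pollInterval = 5 * time.Second // the interval mentioned above

func pollOnce() {
	cache = map[string]string{}
	for k, v := range backing {
		cache[k] = v
	}
}

func main() {
	pollOnce()
	backing["service-wordpress"] = "deployed" // a client deploys via the API
	fmt.Println(cache)                        // watcher view: change not visible yet
	// ...up to pollInterval later the poller runs again...
	pollOnce()
	fmt.Println(cache) // now the watcher view reflects the deploy
}
```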
[14:20] hazmat: no, that's not quite it [14:20] well, mostly as a side effect [14:20] specifically i was wondering about status support, since the client api is hard to use/verify ops without status working given 5s delay on watcher. [14:20] hazmat: the work mgz is doing makes it possible to get all the status info over the api, but not the status call itself [14:21] hazmat: if I read the log correctly, your issue is that just using watcher apis you don't get all the status info you could with manual `juju status` calls? [14:21] hazmat: you can get info on individual machines i think [14:21] rogpeppe1, that's not useful when i'm verifying services, units, relations.. [14:22] hazmat: one possibility would be to add a Sync call to the API [14:22] rogpeppe1, that would help.. but i'm not sure its something we want in the api [14:22] hazmat: to sync any watchers [14:23] hazmat: but yeah, i think you may be right [14:23] rogpeppe1, the watchers are already sync'd, its the a dos api against updating the stream. [14:24] hazmat: i don't know how much dos could be done this way that couldn't be done through other api calls. [14:24] which to combat would just introduce the delay elsewhere... what this usage needs is non-cached access. hence status [14:24] hazmat: yeah [14:24] rogpeppe1, well we have fetch service info, no fetch unit info, or machine afaik [14:25] hazmat: ironically my original plan was to do the status API call as the first actual use of the API [14:26] rogpeppe1: https://codereview.appspot.com/11572043/ ? [14:29] dimitern: i don't think NewAgentAPI is the right place to start the pinger [14:29] mgz, no.. the watch apis do give all the status, but their cached, so its eventual consistent behavior, which is causing issues with unit tests, the issue would be solved if the rpc status call actually returned status info, instead of just a partial machine info [14:29] rogpeppe1: why? [14:29] dimitern: because it's called once for each machiner API call [14:29] dimitern: so you're quickly going to have many many pingers :-) [14:30] rogpeppe1: isn't that the singleton root? [14:30] rogpeppe1: for the machiner [14:30] dimitern: nope [14:30] rogpeppe1: aaah [14:30] rogpeppe1: got you! [14:30] rogpeppe1: it's the entry point, but for every call [14:30] dimitern: yes [14:31] rogpeppe1: so then after the CheckProvisioned in admin then [14:31] dimitern: in srvAdmin.Login, you mean? [14:32] rogpeppe1: yeah? [14:32] dimitern: yes, i think that's the right place [14:32] rogpeppe1: ok, will move it there [14:32] hazmat: I see. seems fixable, but not really what I'm poking right now. [14:33] dimitern: or perhaps in apiRootForEntity [14:33] rogpeppe1: I've go another CL for you as well :) https://codereview.appspot.com/11574044/ [14:34] rogpeppe1: in apiRootForEntity with the idea that once it becomes a "root factory" it will be more appropriate? [14:34] dimitern: yeah [14:34] rogpeppe1: good point [14:34] dimitern: i've still got that CL around somewhere [14:35] rogpeppe1: it got rejected i think [14:35] dimitern: i don't think it was reviewed actually [14:35] rogpeppe1: ah, it might be [14:36] dimitern: yeah, no reviews: https://codereview.appspot.com/10684044 [14:36] rogpeppe1: I think fwereade mentioned something he didn't like there, but can't remember details [14:37] dimitern: oh, i'd like to know if there was [14:39] dimitern: this was the crux of it. your NewPinger code would go into the NewMachineRoot function. 
[14:39] https://codereview.appspot.com/10684044/diff/1015/state/apiserver/facades/machine.go [14:39] rogpeppe1: yeah, well we can still do it next week [14:40] rogpeppe1: haven't look into it in detail myself [14:45] dimitern: reviewed [15:05] I just did an lbox propose but it didn't make a Rietveld code review for me? [15:07] Anyone up for a review? https://codereview.appspot.com/11404044/ [15:07] hey guys [15:07] I'm about to go through the reviews from the top [15:08] rogpeppe1: thanks [15:08] rogpeppe1, I thought https://codereview.appspot.com/10684044 looked cool but hit the nice-to-have buttons not the must-have buttons, so I didn't want anyone to take it over [15:08] fwereade: yeah, i appreciate that [15:09] rogpeppe1, but since it's here, and you're here to respond, I'm on it now (it's at the top after all :)) [15:09] fwereade: that's appreciated. it wasn't complete though - in particular i hadn't added any more tests [15:11] rogpeppe1, ok, give me 5 mins to reload my state on it just in case there was something I didn't like, but I think I'm happy for you to continue with it, just make it a side task not the main one please ;) [15:11] fwereade: ta [15:11] fwereade, I just submitted a small proposoal but lbox propose didn't create a Rietveld code review for me? [15:11] fwereade: i haven't touched it since i got back BTW [15:11] mattyw, huh, weird [15:11] fwereade, I take it we're still using Rietveld [15:11] fwereade: a proposal against juju-core? [15:11] mattyw, I use +activereviews as my task list anyway so rest assured I'll see it [15:12] s/fwereade/mattyw/ [15:12] fwereade, ok, shall I try to make a Rietveld review for it or just ignore it? [15:12] rogpeppe1, yeah juju-core [15:12] mattyw: hmm, weird. can you run lbox propose --debug and paste the result? [15:13] mattyw: fwiw, rvba has been having some perhaps similar issues with lbox [15:15] rogpeppe1, hmm, running debug tells me it can't login - I guess that's the problem ;) [15:15] rogpeppe1, yeah, direction looks sane, thanks :) [15:16] mattyw: try removing ~/.lpad_oauth* and trying again [15:16] oh, incidentally, did everyone see http://foaas.herokuapp.com/ [15:16] fwereade: no [15:16] rogpeppe1, it made me smile [15:16] fwereade: me too === rogpeppe1 is now known as rogpeppe [15:17] LOL [15:18] * fwereade didn't really have a proper lunch and is off to have one now, reviews will have to wait a bit [15:18] rogpeppe, got it working without having to resort to removing .lpad_oauth [15:18] * fwereade will probably just have a sandwich at his desk though, so don;t fret ;p [15:40] jtv, ping [15:40] Hi fwereade [15:40] jtv, heyhey, it looks like you have 2 LGTMs on "Drop startInstanceParams" [15:41] Ah great, thanks [15:41] jtv, unless you particularly want to, please don't feel you need to wait for my review [15:41] Thans! [15:42] *Thanks [15:42] jtv, thank *you*, I'm really appreciating what you're doing [15:42] rogpeppe: all the other client-side facades for workers use the worker name as the facade type name [15:42] rogpeppe: and i think it's reasonable as well, for agent facades (currently MA only) State seems more appropriate [15:43] hi-ho, hi-ho, to tarmac-land we go [15:43] dimitern: i think they should all use State [15:43] rogpeppe: i don't agree [15:43] dimitern: that was my original plan, agreed with fwereade AFAIR [15:43] rogpeppe: why? [15:43] fwereade: ? 
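Pulling together the pinger-placement thread above (srvAdmin.Login versus apiRootForEntity versus a NewMachineRoot-style factory): the shape being converged on is roughly the sketch below, where a successful machine-agent login, with the provisioning nonce checked, starts a presence pinger whose lifetime is tied to the API connection. Every name and signature here is illustrative rather than the real apiserver code:

```go
package apiserver

import "fmt"

// entity is whatever Login authenticated; for machine agents it can report
// whether the presented nonce matches the one recorded at provisioning time.
type entity interface {
	Tag() string
	CheckProvisioned(nonce string) (bool, error)
}

// pinger stands in for the presence pinger; Stop must be called when the
// connection goes away so the agent stops showing as alive.
type pinger interface {
	Stop() error
}

type srvRoot struct {
	entity  entity
	pingers []pinger // cleaned up when the connection closes
}

// login sketches the flow discussed above: a machine agent with the wrong
// nonce is rejected outright (so it never registers as alive); otherwise a
// pinger is started and kept for the lifetime of the connection.
func (r *srvRoot) login(e entity, nonce string, startPinger func(tag string) (pinger, error)) error {
	ok, err := e.CheckProvisioned(nonce)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("machine %s is not provisioned with this nonce", e.Tag())
	}
	p, err := startPinger(e.Tag())
	if err != nil {
		return err
	}
	r.entity = e
	r.pingers = append(r.pingers, p)
	return nil
}

// close stops any pingers when the API connection is dropped.
func (r *srvRoot) close() {
	for _, p := range r.pingers {
		p.Stop()
	}
}
```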
[15:43] dimitern: because the Machiner state API facade is not itself a Machiner [15:43] dimitern: it is actually the state [15:44] dimitern: with a facade in front of it [15:44] rogpeppe: it's not state either [15:44] dimitern: so calling it State in all those places seems like a Good Thing to me [15:44] rogpeppe: it's api [15:44] rogpeppe, dimitern: I don't *love* state but IMO it's at least consistent with existing usage and as a name for the source of truth [15:44] dimitern: the API is just a front-end for the state [15:45] rogpeppe: so MachinerAPI then [15:45] dimitern: and naming it State means we don't need to rename all the variables in every piece of client code [15:45] rogpeppe: better than State [15:45] rogpeppe: that's not a big deal imo [15:45] rogpeppe: we don't have to rename them either way [15:45] dimitern: it means that it's potentially easy to move code from one worker to another [15:46] dimitern: or factor it out [15:46] rogpeppe, dimitern: I'm ok sticking with state to reduce churn, personally, but I don't have a strong opinion either way [15:46] rogpeppe: i don't see how this should be something to consider [15:46] rogpeppe: facades are specifically done for a worker/agent [15:46] dimitern: sure. but from the point of view of that agent, it's talking to the state [15:47] dimitern: so having it called, say machiner.State, seems like it works well to me [15:47] rogpeppe: i think it's misleading [15:47] dimitern: it means that when you read the code, it's fairly obvious that all these pieces of code are talking to the same underlying thing [15:47] rogpeppe: maybe not Machiner then, MachinerAPI is actually closer to the intent [15:48] rogpeppe: when it has API in its name it's pretty obvious [15:48] dimitern: i don't see a significant benefit from machiner.API (over machiner.State) (and i'm not keen on machiner.MachinerAPI [15:48] ) [15:49] dimitern: as fwereade says, it reduces churn, and that seems like a good thing [15:49] dimitern: i'd like to be able to change all the client code with minimal fuss [15:49] rogpeppe: who's going to change all the other facades then? [15:50] rogpeppe: which are already using these names [15:50] dimitern: i'm happy to do it. i've already done the upgrader one. [15:50] rogpeppe: ok then [15:50] dimitern: given that nothing is using them yet, it's not much of an issue [15:50] rogpeppe: i *still* don't agree [15:50] rogpeppe: but will do it [15:50] rogpeppe: 2:1 against :) [15:50] rogpeppe, did you assign yourself to the time.Format bug? [15:51] dimitern: thanks a lot. i do think it makes sense though, if only from a grandfathered-in p.o.v. [15:51] fwereade: no [15:51] fwereade: but i have merged a fix [15:51] fwereade: so should mark the bug fix committed [15:51] fwereade: have you got the link to hand? [15:52] rogpeppe, not offhand -- it's just that sidnei also fixed it but a bit late [15:52] fwereade: i think mark assigned the bug to me actually [15:53] mgz, gotcha.. could you clarify what your working on?.. from the kanban its not relatable at all? goamz for extra ip.. (which is vpc support incidentally) and lxc container addressability. [15:53] rogpeppe, oh, double-bad luck, sidnei found and fixed it completely independently [15:53] not sure how either of relates to the api bits [15:54] i did a branch on that one too (time.format), but couldn't ever the get the full test suite to pass.. 
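For what it's worth, the naming argument above is about the thin per-worker wrapper each agent gets on the client side; whichever name wins (State, API, or MachinerAPI), the shape is roughly the facade sketched below. The caller interface and method set are placeholders, not the real api package:

```go
package machiner

// caller is a placeholder for the API connection's RPC entry point.
type caller interface {
	Call(facade, id, method string, args, response interface{}) error
}

// State (the name being argued for above) is the machiner worker's view of
// the state, implemented as a facade over the API rather than as a direct
// mongo connection.
type State struct {
	caller caller
}

func NewState(c caller) *State {
	return &State{caller: c}
}

// Machine fetches the one machine this worker cares about; the request and
// result shapes are illustrative only.
func (st *State) Machine(tag string) (*Machine, error) {
	var result struct{ Life string }
	if err := st.caller.Call("Machiner", tag, "Life", nil, &result); err != nil {
		return nil, err
	}
	return &Machine{st: st, tag: tag, life: result.Life}, nil
}

type Machine struct {
	st   *State
	tag  string
	life string
}
```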
[15:54] hazmat: specifically I need a mechanism for keeping machine address details updated in state [15:54] rogpeppe, I've marked it fix committed [15:54] fwereade: thanks [15:55] which seems easiest to do by monitoring machines in the process of coming up and polling over the cloud's api for changes [15:56] rogpeppe: i've updated this as well https://codereview.appspot.com/11572043/ [15:56] dimitern: looking [15:56] then everything else can just watch state for address stuff, rather than having to go run something provider specific at the unit level [16:00] dimitern: reviewed [16:00] rogpeppe: thanks\ === flaviamissi is now known as flaviamissi_ [16:06] rvba, a question about https://codereview.appspot.com/11578043/ [16:08] mattyw, I think that one's a trivial, go ahead and merge it [16:08] mattyw, tyvm [16:09] fwereade, trivial is my middle name [16:09] mattyw, haha [16:09] fwereade, I don't think I can merge it, I only have the option to say it's already merged in lp [16:10] mattyw, ok, I've approved it for you [16:10] mattyw, keep half an eye on it just in case the bot doesn't like something [16:10] fwereade, thanks :) [16:19] Oh no. [16:20] Our branches are spending too long in review. Getting really hard conflicts popping up in the meantime. [16:25] dimitern, https://codereview.appspot.com/11424044/ LGTM, but ponder my comments before you merge [16:29] fwereade: thanks; looking [16:43] fwereade, dimitern: https://codereview.appspot.com/11586043 [16:43] fwereade: let's rename it to Nonce when we have UA connecting the same way? [16:43] dimitern: will we need a nonce for the unit agent? [16:43] rogpeppe: fwereade mentioned that [16:44] rogpeppe: looking [16:44] fwereade: +1 to omitempty - i nearly mentioned that [16:45] rogpeppe: will do [16:48] dimitern, let's rename it *now* if we're going to [16:48] dimitern, it may be that we never will [16:49] dimitern, but I suspect that one day units will be able to move from one machine to another [16:49] fwereade: will we? [16:49] fwereade: I prefer not to complicate more that CL renaming all MachineNonce instances to Nonce [16:50] dimitern, I remember thinking they might need it but I can't remember why [16:50] fwereade: but can do a follow up [16:50] dimitern, ok, so long as we don't release with MachineNonce in the api, or we'll never get away from it ;p [16:50] fwereade: why? [16:50] fwereade: it's just an argument [16:50] a field even [16:52] dimitern, that some clients will use one version of, and other clients will use another version of [17:00] fwereade: ok, will do a follow-up after the third branch has landed [17:01] dimitern, cool [17:01] fwereade: and rename *all* MachineNonce to Nonce (even at top-level in agent.Conf) [17:01] fwereade: or you meant only the one in API Info actually? [17:01] dimitern, fwereade: fairly trivial: https://codereview.appspot.com/11589043/ [17:02] dimitern, I really just meant the one in api info [17:02] dimitern, agent.conf is alredy compatibility-infected, best not touch it unless we have to [17:03] fwereade: ok [17:03] fwereade: if it's only that one, then I guess it's no bigger change to do it in the current CL [17:03] dimitern, that'd be great [17:03] dimitern, tyvm [17:03] rogpeppe: how will that make the api nicer? [17:04] rogpeppe, LGTM [17:04] rogpeppe: first review ;) [17:04] dimitern, check out params.AgentTools [17:04] rogpeppe: i really don't get that CL [17:04] sorry [17:05] dimitern: a version will be encoded as a nice string rather than several fields. 
params.AgentTools can also just use version.Binary rather than explicitly redeclaring all the version fields. [17:05] dimitern, it lets us just have a version.Binary in a params struct, and get it sent nicely over the wire, rather than as like 7 distinct fields [17:05] fwereade: thanks [17:06] rogpeppe, fwereade: ah! I see, thanks [17:08] dimitern, https://codereview.appspot.com/11572043/ also LGTM with one tweak [17:10] dimitern, and https://codereview.appspot.com/11574044/ LGTM too [17:11] fwereade: tyvm [17:11] right, that's 6pm: I've survived the week and I'm going to lie down in a park for a bit [17:11] I'm on holiday mon-weds but will probably be around mon am a bit [17:12] fwereade: enjoy [17:12] so I can talk to whoever's here about where we're going with environment config [17:12] fwereade: i'm also off now [17:12] happy weekends all! [17:12] fwereade: i've done the upgrader but not tested it... [17:12] rogpeppe, sweet, tyvm [17:12] g'night all [17:14] fwereade: why the need to test is the pinger a resource? [17:14] fwereade: enjoy your holiday [17:14] fwereade: I can add a test that after disconnecting it's no longer alive - will that be enought? [17:14] fwereade: ah, sorry didn't see you're previous msgs [17:26] so, time for me to step out too [17:26] enjoy your weekend, dimitern [17:26] TheMue: thanks, you too! [17:27] dimitern: will do so, tomorrow invented to a breakfast and later to a bbq ;) [17:27] invited [17:54] gary_poster: I'll take one
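On the version-as-a-string point that closes the discussion above (version.Binary already marshals as a single string for mongo via its SetBSON methods, and the suggestion is to do the same for JSON so params.AgentTools needn't redeclare every field): a minimal sketch of what that looks like, using simplified stand-in types rather than the real version package:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Binary is a stand-in for the real version type; the real one also splits
// the number into Major/Minor/Patch.
type Binary struct {
	Number string
	Series string
	Arch   string
}

func (v Binary) String() string {
	return v.Number + "-" + v.Series + "-" + v.Arch
}

// MarshalJSON sends the version over the wire as one string, such as
// "1.11.3-precise-amd64", instead of several separate fields.
func (v Binary) MarshalJSON() ([]byte, error) {
	return json.Marshal(v.String())
}

func (v *Binary) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parts := strings.SplitN(s, "-", 3)
	if len(parts) != 3 {
		return fmt.Errorf("invalid binary version %q", s)
	}
	v.Number, v.Series, v.Arch = parts[0], parts[1], parts[2]
	return nil
}

// AgentTools can then carry just the version and a download URL.
type AgentTools struct {
	Version Binary
	URL     string
}

func main() {
	out, _ := json.Marshal(AgentTools{
		Version: Binary{Number: "1.11.3", Series: "precise", Arch: "amd64"},
		URL:     "https://example.com/tools.tgz",
	})
	fmt.Println(string(out)) // {"Version":"1.11.3-precise-amd64","URL":"..."}
}
```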