[01:23] wallyworld: you back? [01:51] thumper: yes [02:00] wallyworld: cool, Rietveld: https://codereview.appspot.com/13690043 [02:00] looking [02:12] thumper: I missed a few minutes of the meeting yesterday when there was some talk about the SanityCheckConstraints stuff. Just wanted to check if it's doing what you want it to do, re containers [02:12] axw: a chat would be good :) [02:12] ok [02:13] let me just finish off what I'm doing [02:13] and I'll poke [02:13] nps [02:17] thumper: done. i have a small one for you if.when you have a moment https://codereview.appspot.com/13274045 [02:35] wallyworld: done, and one for you Rietveld: https://codereview.appspot.com/13691043 [02:35] axw: hangout? [02:35] thumper: yup [02:35] * wallyworld looks [02:36] axw https://plus.google.com/hangouts/_/111aeaf1ea142cc8d744a4af72f06a427b54f212?hl=en [02:39] thumper: https://code.launchpad.net/~axwalk/juju-core/sanity-check-constraints/+merge/185015 [02:49] thumper: did you forget a pre-req on your mp? [02:49] wallyworld: yes [02:49] oops [02:49] np :-) [02:49] i thought i saw the same code again :-) [02:53] DebugHook failure on bot :-( [02:54] resubmit time [03:23] wallyworld: https://codereview.appspot.com/13679044/ [03:24] looking [03:24] done [03:24] I thought it might be close to your heart [03:24] :-) [03:27] wallyworld: there is no content to check [03:27] wallyworld: it is a notify only, sends struct{} [03:28] oh? [03:28] wallyworld: basicly it says "hey, something changed" [03:28] yeah [03:28] that is how it [03:28] that's not good imo [03:28] is implemented [03:28] bad design :-( [03:28] it is what it is [03:28] introduces race conditions [03:28] bad, bad, bad [03:28] not that we really care about [03:28] not really [03:28] not in this use case at least [03:28] but yes [03:28] I do agree with you [03:29] * wallyworld sigh heavily :-( [03:36] wallyworld: so much for being funny [03:37] huh? [03:37] wallyworld: no one should call the api with no values [03:37] oh :-) [03:37] I'll add an early return [03:37] if you insist [03:37] only if you want [03:37] but it should never happen [04:19] thumper: small one? https://codereview.appspot.com/13272051 [04:21] * thumper looks at wallyworld's review [04:22] \o/ [04:24] land it [04:25] thumper: thanks :-) [04:58] * thumper actually tries his code [05:03] hmm... [05:03] why is everything written twice in the local log file [05:03] I don't recall this happening before [05:11] haha [05:11] oops [05:11] my fault [05:11] arse biscuits [05:12] hmm will be fixed in two commits [05:25] * thumper submits a fix [05:26] oh poo [05:28] * thumper has a solution to the poo [05:28] but not today [05:29] wallyworld: Rietveld: https://codereview.appspot.com/13694043 [05:38] * thumper EODs [05:38] wallyworld: and another Rietveld: https://codereview.appspot.com/13587045 [05:39] axw: sorry I didn't get to yours today, slightly more effort than I though actually getting through my stuff [05:39] have a good weekend folks [05:39] thumper: no worries [05:39] you too [05:39] wallyworld: if you want land the first one line fix for the duplicate lines, you should be fine. [05:41] looking [05:44] wallyworld: unfortunately I think the cloudinit tests are going to blow on that branch [05:44] * thumper checks [05:44] ah. i hit approved, so we'll see i guess [05:47] I just confirmed they break [05:47] committing a fix now [05:47] and running al the tests [05:48] ok [05:49] just one lot of fixes [05:50] hmm hasn't failed yet [05:51] i'll resubmit when it fails [05:51] ta [05:51] * thumper leaves now [06:06] mornin' all [07:04] quick review anyone? https://codereview.appspot.com/13300050/ [07:04] fwereade__, dimitern: ^ [07:21] rogpeppe, taking a look [07:21] rogpeppe, eh, reviewed already [07:21] fwereade__: two pairs of eyes wouldn't harm [07:21] * fwereade__ is enhappied by this [07:21] rogpeppe, sure [07:23] rogpeppe, LGTM too, nice [07:23] davecheney, fix-tools-release-script has been up for a while, is something blocking it? [07:24] davecheney, eh, that's not its name, but close enough [07:24] fwereade__: not following you, sorry [07:24] * davecheney does not feel blocked today [07:24] davecheney, https://code.launchpad.net/~dave-cheney/juju-core/153-fix-release-tools-script/+merge/183556 [07:24] * fwereade__ approves of this state of affairs [07:24] oh, i just missed that [07:24] or the bot was screwed [07:24] or seomthing [07:25] probably :) [07:25] done [07:25] thanks for the reminder [07:25] ah, now i remember [07:25] that also needs to be backported to 1.14.0 [07:34] hmm, actually, no need to backport [07:55] morning [07:57] dimitern: morning [08:01] dimitern, heyhey [08:02] dimitern, I was just thinking of provisioner and the authkeys-from-env-config thing [08:02] dimitern, at the api level I think there's a pretty serious difference between the lxc and the environ provisioners [08:03] fwereade__: i think i tend to agree - they could easily be two API facades [08:04] fwereade__: (LocalProvisioner and GlobalProvisioner perhaps) [08:04] fwereade__, rogpeppe, so 2 facades then [08:04] rogpeppe, dimitern [08:05] fwereade__ [08:05] rogpeppe, dimitern: sorry having trouble articulating [08:05] rogpeppe, dimitern: ok, they both watch the environment config [08:05] fwereade__: we really need to split the env config [08:05] rogpeppe, dimitern: the env prov because it needs it, the local one because that's where it gets its authkeys from [08:06] rogpeppe, dimitern: but in api terms [08:06] rogpeppe, dimitern: those auth keys should be expressed as a property of the *machine* [08:06] fwereade__: interesting; yes, that seems right to me [08:06] rogpeppe, dimitern: and in *both* cases we should be getting authkeys from a machiney api [08:07] fwereade__, sorry, I don't get that [08:07] rogpeppe, dimitern: then in the local case no need for an env config watch (other than that's how the machine auth watch is implemented in the background) [08:07] fwereade__, why of a machine? [08:07] dimitern, because it's obviously not an env property [08:08] dimitern, it's really a user property [08:08] dimitern, but it applies per machine [08:08] dimitern, and we need t figure it out in the context of provisioning an actual machine [08:08] fwereade__, sorry, still waking up - what auth keys are we talking about? [08:09] dimitern, ah ok -- the ssh authorized_keys is set up by cloudinit [08:09] dimitern, that information is stored in environ config [08:09] dimitern, because we smoke lots of crack [08:09] dimitern, and because of this the lxc provisioner depends on an environment watch that it should not [08:09] fwereade__, ah, ok, yeah [08:10] dimitern, and the env provisioner uses its env watch for both the env that it needs, and the authkeys that come almost accidentally [08:10] fwereade__, and you need the lxc provisioner to watch machines and get this info from a machine? [08:11] dimitern, possibly [08:11] dimitern, it may not in fact be practical given time constraints [08:11] dimitern, but you are in a better position to determine precise feasibility than I am [08:12] dimitern, so I wanted to make you aware of the forces that have shaped the current implementation and where I think they've led us astray [08:12] fwereade__, my plan for today is to land the uniter and finish of some misc cleanup tasks around the api, so i can start on the provisioner next week [08:12] dimitern, cool, just giving you a heads up :) [08:12] dimitern, thanks [08:12] fwereade__, cheers :) [08:13] fwereade__, I still need to do a ec2 live test in addition to the local one, as rogpeppe suggested https://codereview.appspot.com/13401050/ === tasdomas_afk is now known as tasdomas [08:20] fwereade__, btw I found and interesting panic in the cmd package yesterday [08:21] fwereade__, with a local environment, going into ~/.juju/local/log/ and then destroying the env; trying any juju command gives a panic "getcwd: .. something" === tasdomas is now known as tasdomas_afk [08:56] fwereade__: fairly trivial CL: https://codereview.appspot.com/13648048 [08:56] dimitern, TheMue, anyone: ^ [08:57] *click* [09:02] rogpeppe: reviewed [09:10] dimitern: there's an intermittent test failure in apiuniter - do you know about this? [09:10] https://code.launchpad.net/~axwalk/juju-core/sshstorage/+merge/184708/comments/421694 [09:11] axw, you should've left that one alone actually [09:11] ? [09:11] axw, it's going away and now I'm facing some conflicts with my branch that removes it [09:11] ok [09:11] mgz, how do you resolve a conflict like this? Conflict adding files to worker/apiuniter. Created directory. [09:12] mgz, I want it gone [09:12] sorry dimitern I'm not talking about the debug-hook one if that's what you think [09:12] this is different [09:12] dimitern: bzr rm the dir, then bzr resolve on that path [09:14] mgz, that worked, thanks [09:14] axw, worker/apiuniter is going away entirely [09:14] dimitern: okey dokey [09:15] axw, sorry, I should've mentioned that [09:15] nps [09:25] TheMue, ping [09:28] fwereade__, dimitern, TheMue: this is the avoid-allFatal CL for inclusion in 1.14: https://codereview.appspot.com/13696043 [09:28] fwereade__: pong [09:29] dimitern, would you take a look quickly? [09:29] TheMue, just wanted to check everything was clear re status data dict [09:30] TheMue, and to clarify something I may not have touched on [09:30] TheMue, the data dict will be used by the api to cross-reference what relation(s) may have encountered error(s) [09:30] fwereade__, rogpeppe, looking [09:31] TheMue, so the relation therein must be identified in a way that they can understand [09:31] dimitern: there's no new code in there, BTW [09:31] dimitern: just patches selectively applied from trunk [09:31] fwereade__: they = GUI? [09:31] TheMue, in the narrow, yes; in general, consumers of the public api [09:32] fwereade__: ok [09:32] rogpeppe, shouldn't CodeNotFound also be fatal? [09:32] TheMue, so your choices are key, tag, or id [09:33] TheMue, tag is most in keeping with the rest of the ui, but it's a strange thing to store inside state [09:34] dimitern: that's orthogonal to this fix, i think [09:34] fwereade__: never have been in contact with tags so far. short explanation? [09:34] rogpeppe, LGTM [09:34] TheMue, key is not at all in keeping with the rest of the api, *except* that it's how they already identify relations, because that's how relations are identified in the AllWatcher stream [09:34] dimitern: i wanted to port this change only, but if you think other stuff should go into 1.14 too, then it should be considered [09:34] dimitern: thanks [09:35] TheMue, unit-wordpress-7, machine-7-lxc-3, service-nyancat, user-fwereade [09:35] fwereade__: ah, thx [09:35] TheMue, relation-wordpress.db#mysql.server [09:35] rogpeppe, no, let's keep it simpler - we're close to releasing 2.0 (or whatever) soon anyway as stable [09:36] dimitern: thanks [09:36] dimitern: i'll just do a live test, as it is the release, then i'll merge [09:37] fwereade__: keys would be most natural for me [09:37] TheMue, so, anyway, I'm kinda -1 on keeping tags in the info dict -- storing tags in state feels like a layering violation [09:38] TheMue, key is a little bit sucky but convenient in many ways [09:38] fwereade__: why sucky? [09:41] TheMue, because the API should be using a consistent vocabulary [09:42] TheMue, tags were formalized for the api and are used here there and everywhere [09:42] TheMue, but very inconsistently in the case of the public api [09:42] TheMue, hence the key in the relation info from allwatcher [09:42] woohoo! finally the uniter got merged [09:42] * fwereade__ cheers at dimitern [09:43] this marks the end of a 3-week struggle :) [09:43] dimitern: *clap* *clap* [09:43] thanks guys [09:44] fwereade__: yep, just reading the API how it uses tags [09:46] TheMue, however it is not really appropriate for the uniter to be thinking in terms of key, because that's really an internal state thing [09:46] TheMue, so [09:47] TheMue, I am starting to feel that id is actually the best identifier [09:47] fwereade__: while digging deeper i'm getting the same idea [09:47] TheMue, the trouble with that is that we don't put it in the relation info dict from the allwatcher [09:48] fwereade__: one moment, will take a look there [09:48] TheMue, so if we do id (which which I *think* we actually should) we need to report it in RelationInfo [09:48] TheMue, and that's not hard but it does need to be done so we can construct a sensible status dict [09:51] rogpeppe, that LGTM too fwiw [09:51] fwereade__: thanks [09:54] TheMue, id is strictly superior to key anyway, because it's actually unique [09:57] fwereade__: so RelationInfo has to be extended by "Id int"? [09:58] TheMue, yeah [09:58] TheMue, should literally just be a matter of adding the field and trusting in the miracle of reflection [09:58] fwereade__: ok, trying to get a picture out of those puzzle pieces ;) [09:58] fwereade__: hehe [09:59] fwereade__: the hard part is, that the bso annotation of Key is _id :) [09:59] TheMue, I know, that code in particular is a sort of nexus of suck [10:00] *lol* [10:19] fwereade__: just seen, EntityId() of RelationInfo returns "relation" as kind and the key as id. now the struct has an Id field. how shall i explain this weird stuff to my children? :/ [10:19] TheMue, oh, hell, EntityId is *really* badly named then [10:20] TheMue, I guess that everywhere we use Id is really pretty wrong except for on Relation [10:20] TheMue, but relation has a Key that should be name and is serialized as _id [10:20] TheMue, however [10:21] fwereade__: so it's aboslutely logical *rofl* [10:21] TheMue, it's completely screwed up [10:21] fwereade__: it's called "semantical encryption" [10:22] TheMue, haha [10:27] TheMue, fwiw adding relation id to RelationInfo is a good in itself, I reckon that'd make a nice trivial CL in itself [10:30] fwereade__: ok, will separate it [10:37] TheMue, remember that will also need to be backported onto 1.14 [10:41] fwereade__: here i'll need assistance on how i have to apply the changes 1.14 [10:41] fwereade__, a small branch https://codereview.appspot.com/13474046 [10:41] dimitern, had literally just opened it :) [10:41] fwereade__: so far only worked on trunk [10:41] fwereade__, :) [10:45] dimitern, nice, LGTM; just consider using Satisfies, I think it'd read a bit nicer [10:46] fwereade__, you mean instead of jc.IsTrue ? [10:47] dimitern, yeah [10:47] dimitern, c.Check(err, jc.Satisfies, params.IsCodeUnauthorized) [10:47] fwereade__: standup? [10:47] dimitern, puts the err, which is what we're testing, front and centre [10:47] rogpeppe, oops [10:48] fwereade__, well I don't think it'll read much nicer - instead of c.Assert(params.IsCodeX(err), jc.IsTrue) it'll be c.Assert(err, jc.Satisfies, params.IsCodeX) [10:48] dimitern, it puts the value we're checking in the expected place [10:48] fwereade__, ok, I can do that [10:49] goooooogle [11:06] fwereade__, dimitern: https://codereview.appspot.com/13252050/ [11:07] TheMue, reviewed [11:07] dimitern: thx [11:52] * TheMue => lunch [11:53] trivial review, backport patch to 1.14, anyone? https://codereview.appspot.com/13251047 [12:05] rogpeppe, reviewed [12:05] dimitern: thanks! [12:10] fwereade__: you around for a quick chat about Prepare stuff? [12:18] * rogpeppe expected hundreds of conflicts and only got three. fantastic! [12:29] fwereade__: ping [12:31] dimitern: yeah, the point about jc.Satisfies is not particularly that the assert looks nicer, but that if it fails, gocheck will print the actual error value that failed [12:35] rogpeppe, ah, good to know [12:44] wallyworld, dimitern, fwereade__, TheMue, mgz: initial type declarations for proposed environment configuration storage scheme: https://codereview.appspot.com/13235054 [12:45] rogpeppe, will look a bit later [12:46] dimitern: thanks [12:46] rogpeppe, say, how come your gocheck -> gc branches missed a lot of stuff in environs? [12:46] dimitern: for example? [12:46] rogpeppe, or other places with more than 2 levels of nested dirs [12:46] rogpeppe, like environs/jujutest/tests.go [12:47] rogpeppe, or environs/jujutest/livetests.go [12:47] rogpeppe, pong, sorry [12:47] dimitern: it's because i excluded all files that didn't end in _test.go [12:47] rogpeppe, ah :) I suspected that [12:47] dimitern: it helped exclude false positives, but yeah there's a bit of cleaning up to do [12:47] fwereade__: quick chat? [12:49] rogpeppe, yeah, sounds good, I'm just looking at that types CL [12:49] fwereade__: ah, cool. there's a bit more than that here: http://paste.ubuntu.com/6101477/ [12:50] fwereade__: https://plus.google.com/hangouts/_/0d62d7c4bc1a5c5ba5bdc03d45dc0ad913031048?authuser=1 [13:42] fwereade__: btw, it looks like the stuff with boolean operators in maas tags was actually not implemented in maas... there's a comment next to that code that reads "Currently tag constraints consist of just comma and/or whitespace tag names, all of are required for a match. Extending this to support full boolean expressions would be possible, and some forward compatibility is attempted." [13:42] natefinch, that is *awesome* news [13:42] * fwereade__ is happy [13:43] fwereade__: yep. so, it really is just "match all these tag names" with comma or whitespace delimiters [13:48] natefinch: what a nice comment that code has [13:50] mgz: classic YAGNI though [13:51] mgz: or more likely, YAGII - You Ain't Gonna Implement It [13:52] hm, it might have happened, maas has been yoyoing between being worked on and not for like a year now [13:53] fwereade__: so when/if that types proposal ok with you, i'll push it, then strip it down in the next CL, leaving some bits as TODOs. [13:53] mgz: the docs don't say anything about it as far as I can tell, but that doesn't mean i wasn't implemented [13:53] most of the complexity in that code is just for the pre-checking tag name validity [13:53] which, if I understood the previous conversation corectly, we're just not going to do in juju-core [13:53] natefinch: it wasn't implemented [13:54] rogpeppe, I would honestly prefer it if you would implement subsets that completed defined tasks, and extend as necessary to fulfil the other requirements [13:54] mgz: it was my inclination not to check the tags before we make the API call. Seems like a waste. If you give a tag that doesn't exist, it won't match anything. Not sure if the API has a special response for using invalid tags [13:54] fwereade__: ok [13:55] it's historicailly not been a waste, because juju sucks at reporting errors when no machine mactches a constraint [13:55] rogpeppe, at the end you can point to how it exactly matched your original plan and I can be contrite :) [13:55] so, there was value in telling people early their constraint was impossible, rather than letting them sit looking at a pending machine in status for ever [13:56] fwereade__: :) [13:56] mgz, axw is pushing on that at the moment [13:56] fixing that issue is more relevent in juju-core, so less care up front is fine [13:56] mgz: yeah, i was going to say, that sounds like something we should fix outside of maas [13:57] so, pretty much all you want to do is pass the string through to the maas machine allocation, possibly with a tiny bit of normalisation [14:00] natefinch: fwiw, if you use tags which don't exist when acquiring a node in MAAS, you'll get a 400 error with the list of the non-existant tags in the response's content. [14:01] rvba: nice, I had hoped so :) [14:12] natefinch, rvba: is there a way to ask about a list of tags, for preflighting? [14:12] natefinch, rvba: failing as early as possible still remains a good thing to do [14:13] fwereade__: yes, there is an api method to get the list of all the tags. [14:13] fwereade__: yeah, but since tags can be added at any time, I'd feel better about just handling the error when we ask for a machine configuration that doesn't exist [14:13] What natefinch said, this would introduce a race. [14:13] natefinch, yeah, the SanityCheckConstraints stuff is definitely secondary [14:14] fwereade__: there should hopefully be very little visible difference to the user [14:14] rbva, tbh I'm not so concerned about people racing to use just-added tags ;p [14:14] rvba, but point taken nonetheless [14:14] fwereade__: the docs say that tags can take a minute to propagate... I can totally see someone adding a tag and then trying to use it within that time frame [14:15] Exactly what I was about to say. [14:15] fwereade__: though I guess it would probably just fail either way [14:15] If a tag uses a complex xpath expression in its definition, it can take a while to apply. [14:15] natefinch, rvba: ok, so it slightly increases the chance of us itting that window [14:16] (You can either define a tag manually (giving the list of nodes to which it applies) or define the tag with an xpath expression with will be matched against the hardware XML data [currently lshw + lldp] of all the tags) [14:16] fwereade__: we actually need to fail in both spots anyway, since you could have a tag defined that no machines match anyway [14:17] natefinch, indeed so, I have been tending to resist putting too much effort into early failure anyway, when late failure needs to be handled better and can't ever be 100% avoided whatever we do with preflighting [14:17] fwereade__: yep [14:18] natefinch, but, eh, if we have an early failure mechanism I'm not too bothered by juju saying "no" a tiny bit more often in very specific and somewhat questionable circumstances ;) [14:18] mgz, hey, machine address updates [14:19] if you want to catch bad tags, just copy the pyjuju logic straight [14:19] mgz, what's the latest story on what's responsible for what? [14:19] mgz, if it's provisioner-side that might also be useful for dealing with tags as they change [14:19] provisioner puts addresses in state, charms and api gets addresses from state [14:20] (mgz, natefinch: this is also secondary to the focus though) [14:20] mgz, cool [14:20] machiner/uniter do not put things in state [14:20] mgz, great, thanks [14:21] fwereade__: is it possible to change a service's constraints? [14:21] the case I'm most wrroied about with tags is you have a long-lived deployment, and find at some late point that you can't add units any more because the tags you originally specified no longer apply [14:27] mgz, certainly you can [14:28] mgz, the interesting case to me is when you have a deployed machine whose tags change while it's running [14:29] that's also interesting... [14:29] fwereade__: seems like that would be similar to the case of changing the constraints on a service. If you suddenly up the memory requirement on a service, I presume we don't remove it from machines it's on if they don't fit the constraint [14:30] natefinch, yeah, I'm thinking about subsequently matching units against already-provisioned machines [14:30] natefinch, you want it on a machine with tag X; mahcine 7 has tag X; or maybe it just had it when it was provisioned [14:57] I have a large review, but with mostly mechanical changes - anyone interested? https://codereview.appspot.com/13606045 [14:58] rogpeppe, ? ^^ [14:59] dimitern: looking [15:03] ooo, very virtuous dimitern [15:03] mgz, what is? [15:04] tackling import tidying all over the codebase [15:05] mgz, yeah, well, rogpeppe did almost all of the gocheck stuff already, and many modules are properly formatted [15:06] mgz, that'll give me some peace, rather than getting frustrated every now and again when I encounter a badly formatted file :) [15:07] dimitern: reviewed [15:07] rogpeppe, thanks [15:08] rogpeppe, you know, I raised a similar question some time ago and I was told it's about consistency and we should import even a single package n a block [15:08] dimitern: really? [15:08] rogpeppe, yep [15:08] dimitern: i think it should be just fine. it seems pernickety to the point of ridiculousness. [15:08] rogpeppe, these two places where the only ones that didn't follow the rule, and there are plenty of other files that do [15:09] dimitern: what if there are no imports at all? should we do import (\n) ? [15:09] rogpeppe, :) no [15:09] rogpeppe, so if you don't mind terribly let's have these 2 places follow the same pattern [15:09] rogpeppe, again, it's about consistency [15:10] dimitern: fair enough. bleh, though. [15:10] rogpeppe, great, thanks [15:46] rogpeppe, https://codereview.appspot.com/13700043/ last one, i promise :) [15:47] dimitern: so we don't allow any upgrading across more than two minor versions? [15:48] rogpeppe, that's the plan IIRC [15:48] dimitern: i'm a bit wary of doing this until our upgrade logic actually enforces that [15:48] rogpeppe, I think it does now [15:49] rogpeppe, there were some changes recently wrt that [15:49] * dimitern brb [15:49] dimitern: can you find the place that does that? [15:50] dimitern: (of course, even that's not enough, because someone might do the 2nd upgrade before the first one has completed) [16:04] rogpeppe, well, didn't we agree to remove these deprecated things? [16:04] dimitern: i'm just wary - would like to check that we are actually not breaking our compatibility promises [16:05] rogpeppe, how can you guarantee that anyway? [16:06] fwereade_, care to join in? ^^ [16:06] dimitern: well, if we remove these compatibility things and we haven't got anything that checks that we can't upgrade too much too soon, then we know we're potentially breaking something, in a hard-to-diagnose way [16:07] dimitern: did you find the MP that made the check? [16:08] rogpeppe, no, it was some time last week though [16:09] * fwereade_ reads back [16:09] dimitern, fwereade_, natefinch: first branch in the config storage series: https://codereview.appspot.com/13535045 [16:09] fwereade_: ping [16:10] TheMue, pong if it's quick, I might get deeply involved here [16:10] fwereade_: it's quick [16:10] * fwereade_ is listening [16:11] fwereade_: when changing the API we have also SetStatus with status and info as argument. adding data would make it incompatible [16:12] TheMue, how would it make it incompatible? [16:12] fwereade_: third argument where right now are two. [16:12] TheMue, old client -- sets empty data, no change in behaviour [16:13] TheMue, old server -- ignores what gets set, no other change [16:13] TheMue, worst case -- old environment lacks hook error info until someone retries the hook [16:13] TheMue, am I missing some interaction? [16:13] rogpeppe, dimitern: regardless of 1.14, 1.12 is a problem still [16:14] fwereade_: hmm, cannot detect right now. looks ok to me now [16:14] rogpeppe, dimitern: I have not seen that upgrade code landing, I'd need some evidence before I believed we had it [16:14] fwereade_, how so? [16:14] fwereade_, I looked and couldn't see it actually [16:14] dimitern, it'll jump versions [16:15] fwereade_, rogpeppe, didn't we agree to make upgrades possible only +/- 2 minor versions? [16:15] dimitern: yes, i think we did [16:15] dimitern, rogpeppe: we did, but I don't think it's enforced anywhere [16:15] dimitern: but i haven't seen that change [16:16] dimitern: also, as it happens, that's not actually sufficient - i think we need some more checks to make it good [16:16] fwereade_, rogpeppe, so we need that right now, yesterday even [16:16] dimitern, rogpeppe: I support the change you're making, but it needs verification (and probably backporting the fix into 1.14... and maybe even 1.12... if there's any way we can actually get that to people using it?) [16:16] dimitern, fwereade_: in particular, i think the upgrade-juju command should probably check that all agents are all at the current version [16:16] dimitern, yeah, exactly [16:17] rogpeppe, yeah, that sounds reasonable [16:17] rogpeppe, I suspect doing it non-racily will be challenging [16:18] fwereade_, rogpeppe, what do you mean all agents? [16:18] fwereade_: well, it can't work correctly if two clients are both using upgrade-juju [16:18] dimitern, every machine and unit [16:18] fwereade_: but i think it'll work ok otherwise [16:18] fwereade_, rogpeppe, we need to have a flag "upgradeable" somewhere and a worker setting that [16:18] fwereade_: i can't *think* of any particular race [16:19] rogpeppe, I was only even thinking about machines that are still being deployed from 2 upgrades ago ;p [16:19] dimitern: why so? [16:19] dimitern, I'm not sure about that [16:19] rogpeppe, so we know the env has a stable, single version across [16:19] dimitern, very hard to keep track of [16:19] dimitern, rogpeppe: actually, we can do it ok I think [16:19] fwereade_, why? the env provisioner can take care of that - or the master upgrader [16:20] dimitern, fwereade_: if we check that for all agents, their deployed version is the same as the global version, that's a global steady state [16:20] rogpeppe, dimitern: yeah, but the set of agents is not steady [16:20] rogpeppe, dimitern: an old machine might still be coming up with older code ;p [16:20] rogpeppe, dimitern: oh actually [16:20] fwereade_: is that a problem? they'll still have a state entry [16:21] fwereade_, rogpeppe , I see how this can be racy [16:21] rogpeppe, dimitern: they *won't* have their version set [16:21] fwereade_: so we can refuse to upgrade if we find any of those [16:21] rogpeppe, dimitern: so it involves ops on 2 collections, and the possibility of state changing after either, but probably nbd [16:21] fwereade_: (which is catered for by the same logic) [16:21] rogpeppe, dimitern: s/ops/reads/ [16:21] fwereade_: i'm not sure i understand [16:22] fwereade_, rogpeppe , I think we're overcomplicating things here [16:22] fwereade_: i can't see that we need to any more than: for a := range all agents {if a.version != globalVersion {abort}}; set globalVersion = newVersion [16:22] fwereade_, rogpeppe, it's like *any* upgrade is as complex as a major version upgrade [16:22] fwereade_: only one op required [16:23] rogpeppe, range all agents is volatile [16:23] dimitern: only if someone else is concurrently running add-unit [16:24] rogpeppe, or add-relation with subordinate [16:24] dimitern: well, yes, running operations concurrently with upgrade-juju is probably a bad idea [16:24] rogpeppe, dimitern: isn't it (1) a query for any machine docs without agent version X; (2) a query for any unit docs without agent version X; (3) a single op asserting that environment agent version is still what we thought when we started checking [16:24] rogpeppe, dimitern: so that may actually be non-racy [16:25] rogpeppe, dimitern: because any added machines will be using the environment's agent-version [16:25] rogpeppe, dimitern: and iirc any added units will just use their machine's tools [16:25] fwereade_, rogpeppe, sounds reasonable [16:26] fwereade_, rogpeppe, can we implement some kinds of lock - "don't add any machines or units while this is active" ? [16:26] dimitern, I don't think we need it [16:26] dimitern, but we certainly could if we really did [16:26] fwereade_: ... except then we'd need to be able to break the lock... [16:27] fwereade_, I think we do, at least for user-friendliness [16:27] dimitern, what's the user-facing consequence of what I suggested? I don't see how we can end up with bad machines that way [16:27] fwereade_, so any concurrent add-unit or machine ops will fail with a meaningful message [16:27] dimitern, why should we fail them? [16:28] fwereade_, because it takes more than a split second to upgrade all agents? [16:28] dimitern, they'll either get provisioned after we upgrade or before [16:28] dimitern, so they'll either get the new versoin, or the old version [16:28] dimitern, and if they get the old version they'll upgrade themselves as soon as they come up [16:28] fwereade_, hmm.. [16:29] fwereade_, dimitern: i *think* my original proposal still works [16:29] dimitern, and if they get provisioned with the old version, and take a long time to come up, they block the next upgrade because they haven't set any agent version so it certainly doesn't match new ;) [16:30] rogpeppe, I *think* I'm just talking around your suggestion -- check all agents [16:30] fwereade_, dimitern: the only difficulty is that if lots of people are adding units all the time, we won't be able to upgrade [16:30] rogpeppe, I originally thought it wouldn't work but now I think it will [16:30] fwereade_: cool [16:30] fwereade_, rogpeppe, so 1) get all agents != new version (units and machines docs), 2) assert the version != changed, 3) start the upgrade [16:30] dimitern, get all non-matching units, abort if any found [16:31] fwereade_, rogpeppe , if any agents come out of 1) - stop [16:31] dimitern, get all non-matching machines, abort if any found [16:31] dimitern, set agent version, asserting that it's still what it was when we first looked [16:31] dimitern, this will involve doing an end run around Settings, I suspect [16:31] fwereade_, rogpeppe , this sounds like a plan - where should we call this method? in upgrader? [16:31] dimitern: no, in upgrade-juju [16:32] dimitern, it's got to be in the state code that actually tries to do the upgrade, surely [16:32] dimitern: i don't think any server-side logic needs to change [16:32] dimitern: it's just a client-side sanity check [16:32] rogpeppe, if it's not enforced it's not much of a sanity check [16:32] rogpeppe, fwereade_, assuming we have this, what will it give us? [16:33] fwereade_, rogpeppe, aborting the upgrade if any agent is using a different version - that's one [16:33] rogpeppe, less unpredictably edge case combinations of versions [16:33] dimitern: the ability to enforce stepped upgrades [16:33] dimitern: reliably [16:33] rogpeppe, OTOH it prevents us from unfucking a fucked environment which already has a mix [16:33] fwereade_, rogpeppe, how about enforcing the newVersion is <= oldVersion.minor+2 [16:34] fwereade_: if there's an environment which already has a mix, we're fucked anyway because something isn't responding to the version [16:34] rogpeppe, or tools don;t exst for some particular arch/series [16:34] rogpeppe, we ought to be able to upgrade our way out of that situation [16:34] fwereade_: i think that's awkward [16:35] rogpeppe, I guess the total effort is going to be equivalent either way, you could probably just add the tools to the environment [16:35] fwereade_, rogpeppe, ISTM these 2 things are orthogonal - enforcing a minor +/- 2 versions and not doing an upgrade in a mixed env [16:35] fwereade_: yeah [16:35] dimitern, I'm not even sure about allowing downgrades [16:35] dimitern: i think we should probably be enforcing no downgrades too [16:35] ha [16:35] dimitern, yeah, let's say no downgrades [16:36] fwereade_, rogpeppe, ok then so +2 minor only [16:36] fwereade_, rogpeppe, ah, I see now - by ensuring the version we know of is the same across the env we can just do I check with the local version in upgrade-juju [16:36] dimitern, fwereade_: for bonus points, upgrade-juju could automatically upgrade via as many minor versions as needed to get to the destination version [16:37] s/do I/do one/ [16:37] dimitern, fwereade_: if we *don't* do that, then it's going to an an enormous pain for people that aren't tracking every upgrade [16:38] rogpeppe, fwereade_, ok, this deserves a feature card named "minor version upgrades" [16:38] rogpeppe, fwereade_, and can be a stepping stone towards major version upgrades [16:39] dimitern, fwereade_: rather than locking, i wonder if two explicit commands "halt/resume" might work (and be potentially useful in other situations too) [16:39] dimitern, fwereade_: (orthogonal issue, i know, but just thinking) [16:39] fwereade_, rogpeppe, halt/resume what? [16:40] dimitern: any changes to the state [16:40] dimitern: i.e. it would forbid any deploy, add-unit, destroy-unit etc etc [16:40] dimitern: (while "halted") [16:40] rogpeppe, well, not *any* changes surely - we still need to change some stuff in state during upgrades, no? [16:40] rogpeppe, "not now" to both [16:41] fwereade_: +1 [16:41] rogpeppe, automatic stepping would be pretty cool though [16:41] rogpeppe, and possibly halt/resume in the further future still [16:41] definitely not something to do now though [16:41] fwereade_, rogpeppe, ok, so regrettably we won't drop the deprecated stuff for now :/ [16:42] dimitern: think so, i'm afraid [16:42] rogpeppe, fwereade_, but this *needs* to be a blocker for 13.10 [16:42] fwereade_: in case you missed it: https://codereview.appspot.com/13535045 [16:42] rogpeppe, fwereade_ , if not major, at least minor version upgrades should work [16:49] rogpeppe: fwereade_: I notice that we're passing around constraints.Value in a lot of places, and not the pointer to it... is that on purpose? It's currently 6 pointers (and now a slice) and it's only likely to get larger [16:50] natefinch: i very much doubt that performance is an issue here [16:50] rogpeppe: just tweaks my optimization radar [16:50] natefinch: i switched mine off for juju long ago [16:50] natefinch: we are hideously inefficient in so many places [16:51] natefinch: but it really doesn't matter much - i'd measure in a real situation before optimising [16:51] rogpeppe: yeah, I'm sure it doesn't matter... it's not like we're making lists of thousands of these things or something [16:52] natefinch: think how long it takes to start an instance :-) [16:52] rogpeppe: maybe if we didn't copy so much memory everywhere.... ;) (joking, I know that's not the issue) === natefinch is now known as natefinch-afk [17:37] natefinch-afk, thus far the rationale has been that anywhere we have a constraints param we require actual constraints -- they may be empty, but they're not optional [17:38] natefinch-afk, they're *also* a type that's frequently tweaked before being operated on, and we'd need to audit quite carefully for changes [17:42] natefinch-afk, it is something that regularly surprises people though, so obviously it wasn't quite as transparently clear as I had hoped [17:48] right, time to stop [17:48] fwereade_: review of https://codereview.appspot.com/13535045/ appreciated if you have any time, thanks! [17:48] g'night all === natefinch-afk is now known as natefinch [17:54] fwereade_: I don't really see the difference between nil and the struct default... but I understand what you're saying [17:55] natefinch, there's something a bit off about having two legitimate ways to denote "no constraints" [17:59] fwereade_: yes and no... it's similar to *uintptr .... what's the real difference between nil and 0? [17:59] er *uint64 [18:00] fwereade_: I get that one is *unset* and one is "I specifically said 0" ... but in the case of constraints, they're usually the same thing [18:03] natefinch, yeah, it probably wasn't the smartest thing I ever did [18:04] fwereade_: every time I see the value getting passed, my eye twitches, that's all :) [18:08] fwereade_: ironically, I had to write IsEmpty(v constraints.Value) bool because we were doing straight struct comparison before, which don't work any more because the struct has a slice value in it now [18:09] so now we have !IsEmpty(v) instead of v != constraints.Value{} === BradCrittenden is now known as bac [18:27] so, good night and have a nice weekend all [18:27] TheMue: good night [18:38] When was debug-hooks implemented 1.13? [18:44] marcoceppi: code was merged in on 8/4 [18:45] natefinch: yeah, found the release notes, 1.13.1 [18:45] marcoceppi: Cool. [18:47] natefinch: no idea how well that feature is tested, but I've got someone in #juju whos trying it right now with maas [18:50] marcoceppi: I don't know... there are tests in the code, but for things like this, there can be some distance between testing and reality [18:50] natefinch: right, well, about to get some real data for you guys :) [18:51] marcoceppi: hey, I'm glad people are using it. definitely keep an eye on if he's having trouble with anything, and let us know. [18:52] natefinch: yeah, I told him to ping me if anything goes weird [18:53] natefinch: I just realized how terrifying this command is [18:54] marcoceppi: haha yeah, of the ones we have, that's a doozy === gary_poster is now known as gary_poster|away