/srv/irclogs.ubuntu.com/2012/03/20/#juju-dev.txt

hazmatniemeyer, there is some redundancy with the new format since we're showing the service relations : db : [myblog, teamblog] at the service level and showing the same for each unit as well in the relations/relation-errors block00:13
hazmatwe had talked about collapsing them, but per unit rel status is important00:23
* hazmat looks for previous context00:24
niemeyerhazmat: Wasn't the redundancy always there?01:42
niemeyerhazmat: The distinction is we show the active relations01:43
niemeyerhazmat: Maybe all we need is relation-errors?01:43
hazmatniemeyer, yeah.. that sounds reasonable01:43
hazmatthere's not really anypoint to showing it duplicated, and we flag errrors separately01:43
hazmats/their01:44
niemeyerhazmat: yeah.. and if we regret we can always go back01:44
hazmatsounds good, i'll do it up that way01:44
hazmatniemeyer, thanks01:45
niemeyerhazmat: np01:45
hazmathmm.. some of the dot rendering needs that info01:55
hazmatnm02:00
* hazmat yawns04:03
rogmornin' to anyone silly enough to be up at this hour07:04
rogandrewsmedina: pong07:04
bigjoolsspeak for yourself, it's 5:12pm here :)07:12
rogbigjools: :-) i knew it was a silly thing to say.07:16
rogbigjools: australia?07:17
bigjoolsyup07:17
bigjoolsgah, this whole juju branching from LP thing has really screwed my day07:18
rogbigjools: how's that?07:18
bigjoolsI waited for a bootstrap to finish while it was setting up a node (takes ages) and then:07:19
bigjoolsbzr: ERROR: http://bazaar.launchpad.net/%2Bbranch-id/348938/.bzr/repository/packs/c77bcca4ec91aee207a8f9d37b1a8608.pack is redirected to https://launchpad.net07:19
bigjoolswhich is a bug in LP/bzr somewhere07:19
bigjoolsand the whole bootstrap grinds to a halt07:19
rogaren't web services marvellous?07:20
bigjoolsnot today  :)07:20
rogbigjools: BTW how long does it take for you to bootstrap an environment?07:20
rog(usually)07:21
bigjoolsbut I'm still gobsmacked at juju branching off LP at all during deployment/bootstrap :/07:21
rogbigjools: depends on the environment settings, right?07:21
bigjoolsit depends.  I am testing maas, and if I wait for a machine to install, an hour, otherwise it depends on network speed for apt-get update/install etc.07:21
bigjoolsare you talking about juju-origin?07:22
rogbigjools: yeah07:22
bigjoolsI still think that's crazy07:22
rogbigjools: that you should have the option?07:22
bigjoolsthat it tries to check out a branch07:23
rogbigjools: does it do that even when juju-origin is distro or ppa?07:23
bigjoolsis this an artifact of my dev environment, or does it always do that by default?07:23
rogbigjools: i couldn't say i'm afraid07:24
bigjoolslet me rephrase - does it always bzr branch/checkout? or does it try to use other ways to get the code down?07:24
rogbigjools: i was under the impression that if you used juju-origin=distro or ppa that it would just do apt-get07:25
bigjoolsah ok, I didn't know that07:25
bigjoolsso just the default is crazy then :)07:25
rogbigjools: ah, the default depends on your local environment07:26
rogbigjools: i'd forgotten about that.07:26
bigjoolsah, that's what I was getting at07:26
* rog looks at the code07:26
rogbigjools: it looks at the output of apt-cache policy07:27
bigjoolsok07:28
rogbigjools: what does "apt-cache policy juju" produce for you when you run it?07:29
rogbigjools: to be honest, i didn't really understand the motivation behind the code when i ported it to Go. i just copied the semantics and the tests :-) insight into why it's good or bad would be useful...07:30
bigjoolsI really have no idea :/07:31
bigjoolsjuju is not installed locally, anyway, which is why I guess it branches it07:31
wrtpbigjools: no idea what happened there07:36
wrtpbigjools: last thing i saw from you was:07:36
wrtp[07:31] <bigjools> I really have no idea :/07:36
bigjoolswrtp: I just said that juju is not installed from a package07:37
wrtpbigjools: could you paste the exact output of apt-cache policy juju, please?07:37
wrtpbigjools: i'm just looking to see how the code would deal with it - it's got quite a few cases.07:38
bigjoolswrtp: http://pastebin.ubuntu.com/891862/07:38
wrtpbigjools: thanks07:41
wrtpbigjools: yeah, it's the "Installed: (none)" that triggers it07:42
bigjoolswrtp: right07:43
wrtpbigjools: set juju-origin and you'll be fine, i guess07:43
bigjoolswrtp: I am using it to pull my dev branch from LP07:43
bigjoolsbut got caught by that LP bug07:43
wrtpbigjools: the underlying problem is outlined in this email: https://lists.ubuntu.com/archives/juju/2012-March/001337.html07:43
bigjools:)07:43
TheMuerog, fwereade: moin07:50
fwereadeheya wrtp, TheMue, bigjools07:50
wrtpfwereade: yo!07:51
bigjoolshello fwereade07:51
bigjoolswrtp: I found what looks like a bug, can you verify this:07:55
bigjoolsthe user-data contains lines that append to the /etc/init/..conf files07:55
bigjoolsso on multiple boots you end up with it saying to start the provisioning agent etc multiple times07:56
wrtpbigjools: multiple juju bootstraps? or multiple boots of the machine?08:07
wrtpbigjools: BTW it's ironic that the problem i'm currently trying to debug with the Go port is directly to do with the issue you had above...08:08
bigjoolswrtp: heh - each boot uses the cloud-init user-date to append the same stuff to the conf files08:16
wrtpbigjools: hmm. i think juju usually assumes a fresh machine each time.08:17
bigjoolswrtp: not sure why it is using >> in the bash script then08:17
wrtpbigjools: is that in juju or in cloudinit.py ?08:18
bigjoolswrtp: the user-data's runcmd coming via cloudinit08:19
bigjoolsafk for a while08:20
wrtpbigjools: i don't see that, but maybe i'm looking in the wrong place.08:21
wrtpoh yeah, i do now.08:21
wrtpbigjools: i think that might have changed since i did the port. hmm.08:24
wrtpfwereade: you did the upstart stuff, right?08:27
fwereadewrtp, yeah, reading back...08:28
wrtpfwereade: from a test data file (cloud_init_ppa):08:28
wrtp    /var/log/juju, 'cat >> /etc/init/juju-machine-agent.conf <<EOF08:29
wrtpwhy the append?08:29
fwereadewrtp, because, apparently, the crack was strong with me that day, but let me check something08:29
fwereadewrtp, yeah, crack08:30
wrtpbigjools: ^08:30
wrtp:-)08:30
wrtpfwereade: do any of the tests try rebooting the instances?08:31
fwereadewrtp, hm, no08:31
wrtpfwereade: it seems to me that it might be a good idea to try to test this stuff08:31
fwereadewrtp, but rebooting them has AFAICT always worked in the past08:31
wrtpfwereade: lucky :-)08:32
fwereadewrtp, well, it tests that it has behaviour that has been experimentally verified to work, even if that was by sheer luck :/08:32
wrtpfwereade: :-)08:34
fwereadewrtp, it seems that ec2 runs user scripts only one anyway08:34
wrtpfwereade: that would be very sensible08:35
wrtpfwereade: maybe can say it's a bug in MaaS?08:35
fwereadewrtp, nah, my bug08:35
wrtpfwereade: but... isn't it up to the OS to decide whether the run the init scripts or not?08:35
fwereadewrtp, I think of it as being up to cloud-init, and it's not not-a-bug just because ec2 machines happen to be set to run them only once08:36
fwereadewrtp, it was hitherto a latent bug08:36
wrtpfwereade: BTW, looking at juju/providers/common/tests/data/cloud_init_ppa, it looks like the "--session-file" argument is on a different line to the "exec python -m juju.agents.machine" command.08:37
wrtpfwereade: how can that work then?08:37
fwereadewrtp, isn't that yaml being "clever"?08:38
fwereadewrtp, in that section line breaks are represented as pairs of line breaks, AFAICT08:38
wrtpfwereade: quite probably. i haven't found a good explanation of all yaml's cleverness anywhere yet08:39
wrtpfwereade: i'd have thought that stuff inside ' ... ' is treated literally.08:39
wrtpfwereade: jeeze who could ever think of calling it "simple"?08:41
fwereadewrtp, I basically treat yaml as a binary format08:41
fwereadewrtp, it passes through a library that causes it to make sense, somehow, and I'm happy enough with that :p08:41
wrtpfwereade: can we move to json sometime, please?08:41
fwereadewrtp, I have no idea, I think that's one to punt to niemeyer08:42
fwereadewrtp, I don;t know where the original yaml dependency came from08:42
wrtpfwereade: yeah. i think niemeyer likes the indentation-based format.08:42
fwereadewrtp, tbh I'm not *so* bothered by it -- in the paces where users might use it, it's usually pretty clear and obvious08:42
fwereadewrtp, but, yeah, it gets ungainly with our own data08:43
wrtp"In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces."08:45
wrtpfrom: http://www.yaml.org/spec/1.2/spec.html#id278809708:45
fwereadewrtp, OTOH note that cloud-init is not our code, and that needs to be yaml anyway08:45
wrtpfwereade: yeah, i realise that.08:45
wrtpfwereade: unfortunately.08:45
fwereadewrtp, so this specific example is not really directly relevant to the cause anyway ;)08:46
wrtpfwereade: yeah, it was more of a by-the-by.08:46
wrtpfwereade: i can't see where in that section it says that newlines are removed.08:48
wrtpfwereade: i suppose example 7.9 implies that though.08:49
fwereadewrtp, I'm afraid I have little stomach for a deep dive into the yaml spec, but the way it converted all the \ns to \n\n (in that section only) seems to me to be strong-enough circumstantial evidence for my position ;)08:49
wrtpfwereade: true 'nuff :-)08:50
fwereadewrtp, hmm, I wonder whether deploying machine constraints will break everyone's jujus again :/08:59
wrtpfwereade: it'll probably break the hacks that everyone's been using to get around the lack of machine constraints...08:59
fwereadewrtp, oh, *that* is guaranteed, but *hopefully* people have known they were going away since last year09:00
fwereadewrtp, I'm just worried about the impat on stuff that's already deployed from PPA, at the point it hits the PPA09:00
wrtpfwereade: hmm, yeah. is the zk schema backwardly compatible?09:02
fwereadewrtp, I need to double-check what happens if the constraints key is ever not found... but actually, that should be broken *already* if it's a problem09:03
=== jelmer_ is now known as jelmer
TheMuewrtp, fwereade_: a simple review: https://codereview.appspot.com/584306809:34
TheMuewrtp: btw, why today wrtp and not rog?09:34
fwereade_TheMue, the big question is "what will this be used for?"09:45
TheMuefwereade_: inside WaitAgentAlive() of Unit and Machine, like discussed yesterday with niemeyer09:46
fwereade_TheMue, ok, and we definitely need a timeout for that?09:49
TheMuefwereade_: we should leave waiting users the option to have a timeout (as usual in concurrent and distributed computing). which concerns do you have with timeouts?09:51
fwereade_TheMue, those waiting users being?09:52
fwereade_TheMue, my only concern is that it's code that won't be used, and we don't have enough information to choose a sensible timeout09:53
fwereade_TheMue, the only thinks that wait for agents are command-line tools09:53
fwereade_TheMue, I was under the impression that we deliberately *didn't* time those out so that people can write scripts that bootstrap and just keep going with the next command09:54
fwereade_TheMue, I can understand that there are other plausible scenarios in which we might want to do have a timeout, but they don;t exist yet09:55
fwereade_(sorry mangled english)09:55
TheMuefwereade_: one moment please, doing esta in parallel ;) back in a few seconds09:58
TheMuefwereade_: so, back again, started ESTA for UDS10:23
fwereade_TheMue, ah cool10:23
fwereade_TheMue, anyway it's not bad code; it's not a bad idea; it's just not something we have a reasonable certainty of needing imminently, so I'd prefer it if it just didn't exist10:24
fwereade_TheMue, every line of code is a small weight on each of our brains ;)10:24
TheMuefwereade_: WaitAlive() is the same functionality as it has been before in Unit and Machine. afaik a low level func like this one should provide the possibility for timeouts and it's task of its using code to determine how long it is willing to wait10:25
TheMuefwereade_: no more code duplication later for selects with time.After10:26
TheMuefwereade_: this argument by rog yesterday lead to the reduction to one func10:26
TheMuefwereade_: btw, why no concerns before when the same code has been in Unit and Machine?10:26
fwereade_TheMue, my point is that nobody's going to be doing anything other than "alive, ok = <-watch", without select, regardless10:26
TheMuefwereade_: even you in your tests worked with a timeout10:27
fwereade_TheMue, very specifically, only in the tests10:27
TheMuefwereade_: timeouts and there handlings are essential in distributed and concurrent programming10:27
fwereade_TheMue, by adding this mechanism, you *force* me to choose a timeout10:28
TheMuefwereade_: so you think it's ok that parts of the software block forever until anyone kills the process?10:28
fwereade_TheMue, exactly so10:29
TheMuefwereade_: i could add a -1 for automatic max duration ...10:29
TheMuefwereade_: but again, why hasn't been that question beforin when the same code has been in Agent or later in Unit or Machine?10:29
fwereade_TheMue, that's still forcing me to make a choice, it's just that there's still only 1 meaningful choice10:30
TheMuefwereade_: and btw, nobody is forced to use this function, anyone can use the other functions, they are unchanged10:30
fwereade_TheMue, if I failed to spot and complain about the timeout before, I apologise for my inattention10:31
fwereade_TheMue, sure; and if nobody does use the function, why have it in the first place?10:31
TheMuefwereade_: i will use it, in Unit and Machine10:31
fwereade_TheMue, to do what?10:31
TheMuefwereade_: like discussed yesterday with niemeyer and rog10:32
TheMuefwereade_: wait for an agent10:32
fwereade_TheMue, I'm sorry I missed that; what are the new use cases in which a timeout doesn't break user expectations?10:32
rogpeppeTheMue: out of interest, BTW, what code waits for an agent?10:32
TheMuerogpeppe: would have to look again who is calling it10:33
fwereade_TheMue, it's juju-ssh and juju debug-hooks10:33
fwereade_TheMue, and that's it10:33
rogpeppefwereade_: interesting. why do they wait for an agent? (and which agent do they wait for?)10:34
fwereade_TheMue, in both cases it's the command line and it will break existing behaviour that users expect10:34
TheMuerogpeppe: there are several callers of watch_agent() in todays py code10:34
fwereade_TheMue, I see two10:35
TheMuefwereade_: just counted the search results, they include the tests10:35
fwereade_rogpeppe, ssh waits for the machine agent10:35
rogpeppefwereade_: both of those occurrences just seem to be there to get the ip address of the machine10:36
fwereade_TheMue, yeah; there are 2 non-test uses10:37
fwereade_rogpeppe, I don't entirely agree with it in juju ssh10:37
fwereade_rogpeppe, but that's what was agreed10:37
rogpeppefwereade_: i suppose it's better than polling ec2 for the ip address to appear10:37
fwereade_rogpeppe, but it's needed for debug-hooks because it's utterly meaningless without an active unit agent at the other end effectively forwarding you a session10:38
rogpeppefwereade_: ah10:38
rogpeppeTheMue: the existing watch_agent doesn't seem to have a timeout10:39
TheMuefwereade_, rogpeppe: is anybody of you intereted in continue the agent method implementation (2 x 3 simple methods)? it's real fun, with a lot of principal discussions over different timezones and languages ...10:39
rogpeppe:-)10:40
TheMuerogpeppe: i exactly (!) implemented today behavior as a first draft. and that had to be changed.10:40
fwereade_TheMue, sorry, I am honestly trying to save you work, but I don;t think I'm succeeding :(10:40
rogpeppeTheMue: i seem to remember interfaces and embedding in the first draft :-)10:41
rogpeppeactually, no interface, probably10:42
TheMuerogpeppe: one interface (that only has been to verify that Unit and Machine provide the right methods, else useless) and three simple functions to directly use them10:42
TheMuerogpeppe: embed has already been another way, after also switching to presence10:42
TheMuefwereade_: which work do you wonna save? it's already done and changing it is new work. each time.10:44
TheMuefwereade_: the wish of reducing code duplication (yes, the one with (!) timeout) has been by rog, niemeyer followed it and my only part has been to move it to presence instead of one single func in any of the state files to provide this functionality (waiting with timeout) also for other users in future (afaik there are several more watches).10:47
fwereade_TheMue, I just don't understand why you need a timeout at all10:48
fwereade_TheMue, perhaps one day you will10:48
fwereade_TheMue, but I don't see what it gains us except lines of code10:48
fwereade_TheMue, I clearly failed to adequately express my concerns with the original watcher type10:49
TheMueUnit.WatchHookDebug(), Unit.WatchResolved(), but i don't know if they will base on presence, only some notes of methods that still have to be implemented10:49
fwereade_TheMue, they certainly won't wait on presence10:50
rogpeppeTheMue: i was under the impression that that only agents will be based on the presence package10:50
fwereade_TheMue, they're called by the unit agent itself10:50
fwereade_TheMue, it *knows* it exists ;)10:50
fwereade_rogpeppe, we also have presence nodes for unit relations as I recall10:50
rogpeppefwereade_: interesting.10:51
fwereade_rogpeppe, used to signal active participation in a relation (as distinct from "it may be working now by coincidence, but the unit won't react to changes")10:51
fwereade_TheMue, I seem to recall pointing out the very limited us cases for agent watching before, but perhaps that got lost in the noise10:53
rogpeppefwereade_: is that implied by the unit agent's presence, but duplicated in a different place for convenience? i haven't looked into how this stuff works at all.10:53
fwereade_rogpeppe, unit relations have state, of which "up" and "down" are generally relevant10:53
rogpeppefwereade_: a unit relation can be down when its unit agent is up?10:54
fwereade_rogpeppe, certainly; failed hook?10:54
rogpeppeah, sure10:54
fwereade_rogpeppe, the unit relation state has the last value written by the agent10:55
fwereade_rogpeppe, but if the agent isn't even well enough to maintain its presence node we can be pretty sure that something is rotten in the state of... um, the service10:55
rogpeppefwereade_: i suppose what i'm trying to work out is if the pinger thing is necessary in this case, or whether we can just use zk as usual.10:55
rogpeppeah, ephemeral nodes.10:56
fwereade_rogpeppe, we need some way to know that a remote thing is active10:56
fwereade_rogpeppe, indeed :)10:56
TheMuefwereade_: so the presence package is only for agents?10:56
fwereade_TheMue, no, it's a general replacement for ephemeral nodes, which we have decided aren't worth the trouble10:57
rogpeppefwereade_: if we know the unit agent is alive, doesn't that imply the unit relation is, erm, actively inactive?10:57
TheMuefwereade_: and there is no such usage of ephermeral nodes where the watcher isn't willing to wait endlessly?10:57
fwereade_rogpeppe, if the unit agent is alive we hope/trust that it's also watching its watches10:58
fwereade_TheMue, we don't wait on ephemeral nodes except in the 2 cases I mentioned10:58
rogpeppefwereade_: if the unit agent is dead, we can assume its unit relations are dead?10:59
fwereade_TheMue, and we only directly care about unit relation ephemeral nodes in the context of `juju status` in which we definitely don't want to wait on them10:59
rogpeppei'm speculating wildly. please ignore me.10:59
fwereade_rogpeppe, if it's dead then the service may well still be working correctly underneath11:00
fwereade_rogpeppe, but we can be sure that it won't respond correctly to settings changes etc11:00
fwereade_TheMue, the trouble is that this is not obvious from reading the python code :(11:01
rogpeppefwereade_: i'm trying to think slightly deeper about why we use a pinger. AFAICS it's to signify that there's something active pinging the node. given that (i think) the unit relation node is managed by the same code that manages the unit agent presence node, the activeness of the former could be used to imply the activeness of the latter, perhaps, is what i'm thinking.11:03
rogpeppeoops11:03
TheMuefwereade_: how do those other watches work? what do they watch? the presence of nodes or the change of node contents?11:03
rogpeppeactiveness of the latter could... activeness of the former11:03
fwereade_rogpeppe, I think that's what we already do11:04
rogpeppefwereade_: so perhaps an ephemeral/presence node is unnecessary for the unit relation?11:04
fwereade_TheMue, the only ephemeral watches I am aware of are those in ssh and debug-hooks, which simply wait for the presence of the thing at the other end11:04
fwereade_rogpeppe, hm, that's interesting11:05
TheMuefwereade_: so you say it's ok to wait forever there?11:05
fwereade_rogpeppe, something's knocking at the corner of my mind, gimme a sec11:06
fwereade_TheMue, yes, absolutely11:06
rogpeppefwereade_: i can imagine there might be race conditions that make it difficult11:06
TheMuefwereade_: how shall i test it without blocking the test forever?11:06
fwereade_TheMue, we explicitly got rid of timeouts on command line tools11:06
fwereade_TheMue, you do something like I did in the original presence node tests?11:07
rogpeppeTheMue: what i tend to do is to make sure that it blocks by waiting for a short period, then unblock it and check it unblocks.11:07
TheMuefwereade_: hmm, yes, could do so too, indeed11:08
rogpeppeTheMue: in fact you'll be testing almost exactly the same code...11:09
fwereade_rogpeppe, ServiceRelationState.get_unit_state also checks the ephemeral node, not going to analyse all the callers of that at this stage ;)11:10
rogpeppefwereade_: the unit relation ephemeral node?11:11
fwereade_rogpeppe, yeah11:11
fwereade_rogpeppe, (also we use an ephemeral node to signal presence of an active debug-hooks session, watched by the unit agent; that one should also wait forever)11:11
TheMuefwereade_: still got pain with code waiting for external events endlessly. it disregards the knowledge of > 20 yrs distributed and concurrent systems and caused so much pain.11:11
fwereade_TheMue, it *is* basically the underlying model of juju though11:12
TheMuefwereade_: but maybe indeed it's irrelevant for our system11:12
fwereade_TheMue, I can't actually think of many places where timeouts are appropriate at the juju level11:12
TheMuefwereade_: yeah, seams so11:13
fwereade_TheMue, ZK is keepalive like hell underneath, but from our perspective we can wait forever11:13
rogpeppewe've already got timeouts and retries happening at a low level11:13
fwereade_TheMue, if there are problems we trust ZK to tell us about it11:13
rogpeppefwereade_: yeah11:13
TheMuefwereade_: once again i'm driven by my history where this had been a bad behavior11:13
fwereade_TheMue, yeah, we all carry baggage11:14
fwereade_TheMue, it's hard for me to separate "what the python does" from "how juju should actually do it" ;)11:14
fwereade_TheMue, but I think this is a juju-level property not a code-level one11:14
fwereade_TheMue, if you see what I mean11:14
rogpeppeperhaps the question is not: "should there be a timeout?" but "where should the timeout be?"11:15
TheMuefwereade_: but i would like you to discuss it with niemeyer as he said "yes, implement it in presence" and i don't want to follow know your advise and get another one by him in the evening.11:15
fwereade_TheMue, I think it comes down to "do you trust zookeeper" ;)11:15
fwereade_TheMue, I can totally understand that11:15
rogpeppeTheMue: for now, why not just make a function in the state package, as suggested by niemeyer?11:16
fwereade_TheMue, niemeyer has firm opinions and the final say and while they are not arbitrary or capricious they are hard to predict with 100% accuracy11:16
TheMuerogpeppe: the one in presence as an own branch is by niemeyer11:16
TheMuerogpeppe: we talked about it yesterday when you stepped out11:17
TheMuerogpeppe: the problems of time zones ;)11:17
fwereade_TheMue, did he implement it because he knew it was needed himself, or as an alternative to your watcher proposal?11:17
fwereade_TheMue, on the assumption that it *was* a necessary feature11:17
TheMuerogpeppe: btw, now rogpeppe, this morning wrtp, yesterday rog. whay this?11:17
fwereade_TheMue, I always assumed it was low-level psyops to contribute to an aura of glamorous mystery11:18
rogpeppeTheMue: better than rogpeppe_ and rogpeppe__, i thought11:18
rogpeppefwereade_: that too11:18
TheMuefwereade_: no, the implementation is by me. niemeyer and i talked about moving it to presence yesterday.11:18
TheMuerogpeppe: and why not only one?11:19
rogpeppeTheMue: because my irc client sometimes reconnects when the irc server already thinks i'm connected, so it has to choose a different one11:19
TheMuerogpeppe: ic11:19
TheMuerogpeppe: thankfully it's seldom here. and short time after reconnect i get my nick back automatically11:20
TheMuerogpeppe: irc is sometimes really unstable, yep11:20
rogpeppefwereade_: i think we should let presence.WaitAlive through, despite misgivings. it can go later if we find it's never used.11:21
fwereade_rogpeppe, TheMue: I'm ok with that11:21
TheMuerog, fwereade_: hehe, and when i use it in Unit and Machine i'll get the next comments by you? *LOL*11:23
rogpeppeTheMue: i'll be interested to see what value you choose for the timeout :-)11:23
fwereade_TheMue, well, yes, because IMO if you're using timeouts by default then you're using ZK wrong11:23
* TheMue again wonders why the timeout hasn't been a topic before?11:24
fwereade_TheMue, and I don't think you can come up with a timeout value that is a 100% accurate indicator of "something is wrong" as opposed to "ec2 is taking ages, whaddayagonnado?"11:24
fwereade_TheMue, I'm pretty certain I have already talked with you about the very small set of clients for watch_agent, and their very limited use cases11:25
TheMuefwereade_: but not about the timeout11:25
rogpeppeyeah, i don't think we've really discussed timeouts before11:26
fwereade_TheMue, no; when I said that it was all unnecessary, and the presence package covers all the use cases already, that's what I meant11:26
TheMuefwereade_: i'm not good in intention reading ;)11:26
fwereade_TheMue, nor does it have a custom memory allocator, because it doesn't need that either ;p11:27
fwereade_TheMue, I think I honestly did try to direct you to the places you needed to see to understand the use cases11:27
TheMuefwereade_: indeed11:28
fwereade_TheMue, the fact that all user interactions must block forever is I suspect the crucial bit of floating context that you were missing11:28
fwereade_TheMue, I know that only because I was around when it was discussed and changed to that behaviour11:29
fwereade_TheMue, let's put it this way11:30
TheMuefwereade_: exactly, this "block forever" is hard to get. i know different behaviors from those systems i've done in the past.11:30
fwereade_TheMue, sorry I forgot what I was going to say11:30
TheMuefwereade_: hehe11:31
fwereade_TheMue, really it comes down to trusting ZK11:31
fwereade_TheMue, if we do, we should generally assume that the events will land according to ZK's limited guaranteed; and if we don't, we should panic11:31
TheMuefwereade_: a software written in java? no, never! *scnr*11:32
fwereade_TheMue, haha :)11:32
* TheMue has done JEE for > 7 yrs, it has been no (!) good time11:33
fwereade_TheMue, I can imagine11:33
fwereade_rogpeppe, TheMue: lunchtime :)11:34
TheMuefwereade_: enjoy11:34
rogpeppefwereade_: likewise11:34
rogpeppeTheMue, fwereade_: i'm off to get the train down to london now. will probably be incommunicado tomorrow morning too. see you tomorrow!12:42
TheMuerogpeppe: have fun, take videos, copy slides, publish everything. ;)12:43
hazmatfwereade_, ping13:14
hazmatfwereade_, i'm a little concerned about the ambiguity around the ec2-instance-type arch.. esp as its the most common way people will use constraints13:14
hazmaton ec213:14
fwereade_hazmat, pong13:15
fwereade_hazmat, there was a quiet discussion on the lists about how a 64-bit default would be a good idea, let's do it13:15
niemeyerMornings13:16
fwereade_hazmat, is there some other ambiguity?13:16
hazmatniemeyer, mornings13:16
fwereade_heya niemeyer13:16
hazmatfwereade_, ah right, yeah.. given 64bit on all types its not much of a concern13:18
fwereade_hazmat, I can accept that it's a bit annoying not to be able to type "ec2-instance-type=m1.medium arch=i386"13:18
fwereade_hazmat, but there are what, 4 arch-choice types13:18
fwereade_hazmat, t1.micro is a bit of a joke really13:19
fwereade_hazmat, m1.small is the default in many people's minds, and is what you get if you specify bare "arch=i386"13:19
fwereade_hazmat, I think the proportion of our user base who specifically need 32-bit m1.medium, c1.medium, and t1.micro may have to bear it13:20
hazmatfwereade_, fair enough13:21
fwereade_hazmat, in fact, they get to experience the awesome productivity shortcut of typing fewer characters in total! ("ec2-instance-type" is a bit of a mouthful...)13:21
fwereade_hazmat, niemeyer: of more concern is the HVM image thing13:22
niemeyerfwereade_: hm?13:23
hazmatfwereade_, people will choose the explicit when possible13:23
hazmatfwereade_ are there ubuntu images for hvm?13:23
fwereade_hazmat, niemeyer: I misread the EC2 information back in the day and somehow got the impression that hvm images were a nice improvement on cluster machines, not a hard requirement13:23
fwereade_hazmat, I think so, just a mo13:23
fwereade_hazmat, http://uec-images.ubuntu.com/query/oneiric/server/released.current.txt13:24
fwereade_hazmat, oneiricserverrelease20120222ebsamd64us-east-1ami-beba68d7hvm13:24
hazmatfwereade_, it looks like its only in us-east-113:25
fwereade_hazmat, just the one, but hopefully that's good enough13:25
fwereade_hazmat, cluster instances are only available in us-east-113:26
fwereade_hazmat, it does mean that get_image_id and get_instance_type are no longer independent, but they were already slightly uncomfortably linked13:28
fwereade_hazmat, if I manage to do that quickly this afternoon, would you try to review it in your afternoon?13:29
hazmatfwereade_,  cool, that makes more sense then, as for the image_id/instance_type.. that's fine.13:29
fwereade_hazmat, I also wanted to ask about your branch13:29
hazmatfwereade_, i could, but i'm wondering if its worth the trouble13:29
hazmatre cc larges13:30
hazmatit would be nice i guess13:30
fwereade_hazmat, I'd rather spend a couple of hours to make it not be *guaranteed* to break13:30
fwereade_hazmat, it's just way too shoddy to put up with13:30
hazmatfwereade_, if your up for it, it would be nice to round out our ec2 constraints support with  the support for the biggest baddest vm on the block ;-)13:30
fwereade_hazmat, exactly :)13:30
fwereade_hazmat, I already need to check that field anyway, so I don't *accidentally* get an hvm image that won't work with normal instances ;)13:31
niemeyerjimbaker: You've got a review on https://codereview.appspot.com/583604913:31
hazmatfwereade_, the series from charm branch?13:31
fwereade_hazmat, yeah13:32
fwereade_hazmat, just let me find it13:32
hazmatfwereade_, https://codereview.appspot.com/5845073/13:32
fwereade_hazmat, AFAICT it's a no-op13:32
niemeyerfwereade_: So, what's the deal about 386 vs. amd64?13:32
fwereade_niemeyer, you can choose arch on more instance types than just t1.micro now13:33
hazmatfwereade_, hmm. yeah. with_constraints returns the new constraint..13:33
fwereade_niemeyer, for that, I was already defaulting to 64-bit13:33
niemeyerfwereade_: Ok, so you were just wondering if it was fine to default to amd64 to all?13:33
fwereade_hazmat, and the series certainly is baked into the constraints; it's just happens not to be done at that point13:33
fwereade_niemeyer, I think we discussed that on the lists; seemed to get a muted "yeah, sounds good" sort of response13:34
niemeyerfwereade_: Yeah, it certainly sounds good to me13:34
niemeyerfwereade_: Is there any other contentious point?13:34
fwereade_niemeyer, it's a little ungainly specifying 32-bit instances other than m1.smalls, but I think that's an acceptable price13:34
fwereade_niemeyer, you'd have to ask for "arch=i386 cpu=5" instead of "arch=i386 instance-type=c1.medium"13:36
hazmatfwereade_, hmm.. ic, its part of the service state api13:36
fwereade_s/instance-type/ec2-instance-type/13:36
hazmatk, i'll yank that branch13:36
niemeyerfwereade_: Uh, why?13:36
fwereade_hazmat, yeah: the service knows the charm, the series isn't *important* until you've got an actual unit that needs to find a machine13:36
fwereade_niemeyer, because of the overlapping behaviour that I think we agreed on13:37
fwereade_niemeyer, I could just as easily decouple arch from ec2-instance-type13:37
fwereade_niemeyer, but that opens us up to "arch=i386 ec2-instance-type=c1.xlarge" which is still a nonsensical request13:38
niemeyerfwereade_: If there is a possibility of selecting the architecture, it means that there isn't an overlap13:38
fwereade_niemeyer, true13:39
fwereade_niemeyer, yep, that's what I should do then13:39
fwereade_niemeyer, when it was just t1.micro defaulting to 64-bit seemed to be sensible13:40
niemeyerfwereade_: Yeah, t1.micro is ridiculous enough that it doesn't matter much indeed13:40
niemeyerfwereade_: Please feel free to not fix it now, though13:40
niemeyerfwereade_: I don't know what stage this is, and it feels like a minor bug that could be filed on Launchpad and wait until someone has time to fix it13:41
fwereade_niemeyer, it should be no more than 2 lines of core code to fix the conflict and make arch explicitly default to amd6413:41
niemeyerfwereade_: Ah, ok, sounds less painful than filing a bug even! :)13:41
fwereade_niemeyer, there will be tests to fix ofc, but it really will be cheap :)13:42
fwereade_niemeyer, and it's a semantic bug really which always feels worse to let out into the wild13:42
niemeyerfwereade_: Superb13:42
fwereade_niemeyer, the other question that perhaps you missed is about cluster instances13:43
niemeyerfwereade_: Yeah, I saw it, but it wasn't clear to me what it was about13:43
fwereade_niemeyer, cluster instances require HVM images13:44
niemeyerfwereade_: They need a specific image, I suppose13:44
fwereade_niemeyer, I mistakenly though it was an option rather than a requirement13:44
fwereade_niemeyer, it's not quite so cheap to fix but it's still less than an afternoon's work all told13:44
niemeyerfwereade_: That's going to be interesting.. it means that getting an image has to cross over the charm URL and the constraints13:45
niemeyerfwereade_: I'd not fix that now..13:45
fwereade_niemeyer, they're combined at unit deploy time by the service13:45
fwereade_niemeyer, series is a constraint, just not one exposed directly to the user, because we know we'll get it from the charm13:45
niemeyerfwereade_: We can wait until someone reports a bug about it.. I suspect it will take a while, perhaps enough for it to be fixed already13:46
fwereade_niemeyer, I'm not really confident pressing too far on with the go code until I have a couple of reviews for the early steps... can I please take a little time to polish it up until I have a couple of reviews in?13:49
niemeyerfwereade_: Sure.. I was actually going to ask you about these branches in review13:50
fwereade_niemeyer, cool13:50
niemeyerfwereade_: In one of them you seem to have reverted back to kill the whole idea?13:50
fwereade_niemeyer, the hook context one?13:50
niemeyerfwereade_: Yeah13:50
fwereade_niemeyer, it became clear after sleeping on our discussion that it is *not* a *hook* context13:51
fwereade_niemeyer, it's a tool context13:51
fwereade_niemeyer, which happens to be useful for hooks13:51
fwereade_niemeyer, the hook package itself became unjustifiable; but it fits really nicely IMO as one of a number of things in a cmd/jujuc/server package13:51
fwereade_niemeyer, jujuc is the dumb main.main that calls the server package13:52
fwereade_niemeyer, the server package contains the various tools implemented as Command, and the RPC server that executes them13:52
fwereade_niemeyer, it's a good place for that particular set of, er, synergistic functionality13:53
niemeyerfwereade_: Sounds reasonable13:53
fwereade_niemeyer, and it'll be as convenient as anything to use once we have an agent far enough along that it's *actually* needing to run hooks13:53
niemeyerfwereade_: +113:53
fwereade_niemeyer, sweet13:53
fwereade_niemeyer, well, that's the point of that change :)13:53
niemeyerfwereade_: Cool, thanks13:54
fwereade_niemeyer, as for the others: go-add-cmd-context you looked at once and seemed generally happy with, but there may be contention on FlagSet output13:55
fwereade_niemeyer, go-tweak-supercommand sits on that and is pretty trivial really13:55
fwereade_niemeyer, and go-testing-charm has I think been fixed13:56
niemeyerfwereade_: I'll have a pass at it again as my next task right away13:56
fwereade_niemeyer, awesome, thanks13:57
niemeyerfwereade_: np, sorry for the delay.. I should probably have reviewed these before other things I've been reviewing13:57
niemeyerfwereade_: Btw, this change probably deserves a better summary13:58
fwereade_niemeyer, it's ok, I haven't been entirely blocked, there's always something worth my time :)13:58
niemeyerfwereade_: It's saying "remove hook package", but it's actually pushing things forward13:58
fwereade_niemeyer, haha, I hope the second part is generally assumed to be my intent ;)13:59
niemeyerfwereade_: right :)14:00
jimbakerniemeyer, thanks14:11
rogpeppeniemeyer: morning!14:11
niemeyerrogpeppe: Heya14:11
niemeyerjimbaker: np14:11
rogpeppehmm, not surprising the connection is unreliable on the train...14:13
niemeyerfwereade_: I see I didn't actually review the test files in the branch. I'll have a quick pass at that post LGTM14:14
niemeyerfwereade_: Cool, delivered14:16
niemeyerfwereade_: LGTM still14:16
niemeyerfwereade_: Thanks for the ride through that branch :)14:17
fwereade_niemeyer, excellent14:17
fwereade_niemeyer, and thank you, it was fun :)14:17
rogpeppelaunching ec2 instances from on the train feels kinda cool14:25
TheMuefwereade_: splitted the WaitAlive() into two variants. so i can use the one w/o timeout in Unit and Machine but we keep the one with timeout. if it never will be used it can be removed.14:25
fwereade_TheMue, cool14:37
niemeyerTheMue: Wait, what?14:55
niemeyerTheMue: Why do we need two variants?14:55
niemeyerTheMue: You actually needed the timeout, right?14:56
niemeyerTheMue: There's no other use of this logic today.. if we're finding it incorrect, let's fix the one we have14:57
TheMueniemeyer: this morning fwereade_ convinced me that we don't need a timeout14:58
niemeyerTheMue: Ok, I'm not disputing either way.. let's just make the one function we have suite our use case14:59
TheMueniemeyer: yes, i've done it that way and currently integrate it. so it behaves as we have it today.15:00
niemeyerTheMue: Awesome, thanks15:00
TheMueniemeyer: additionally i kept a version with timeout based on the one w/o. so if we once will use presence nodes and need timeouts we have it. i could also remove it, but i think the situation will come. ;)15:02
niemeyerTheMue: Reading it..15:02
niemeyerTheMue: Reviewed15:09
niemeyerTheMue: Let's drop the function we don't need.. we can merge the timeout on WaitAlive itself when we need it (if ever)15:09
TheMueniemeyer: ok *sigh*15:10
niemeyerTheMue: Yeah, sorry for coercing you into removing code you don't need..15:10
TheMueniemeyer: it's just that i still have problem with systems waiting endlessly. that's based on my experiences with other systems. it indeed seems to be ok for juju, but it will last until i'm feeling happy with it15:12
niemeyerTheMue: That's a different problem that I'm not trying to convince you on15:12
niemeyerTheMue: I'm happy to have a timeout, and it may indeed be the best thing.. maybe even a timeout by default?15:13
niemeyerTheMue: We'll see when we actually have to use this function15:13
TheMueniemeyer: the test for WaitAlive() is hidden in WaitAliveTimeout(). so now i'll move that internal code into the test. ;)15:13
niemeyerTheMue: What I'm unhappy about is having two functions just because we have no idea about what we need15:13
niemeyerTheMue: Heh15:13
niemeyerTheMue: and now you know in practice why "hidden tests" are a bad idea15:14
TheMueniemeyer: yes, you're right, having timeouts also needs to know what to do if a timeout happens15:14
niemeyerTheMue: I'm not against having a timeout.. we'll probably have to add one soon enough.15:16
TheMueniemeyer: but you also say that today the callers of WaitAlive() can wait endlessly?15:18
niemeyerTheMue: In fact, I start to wonder whether WaitAlive is even a good plan15:18
TheMueniemeyer: oh15:19
niemeyerTheMue: When are we going to be calling those methods on unit and machine?15:19
niemeyerTheMue: Have you had a look on the current code base to get an ide?15:19
niemeyeridea15:19
TheMueniemeyer: we found two places15:20
TheMueniemeyer: in juju.control.debug_hooks.py and juju.control.ssh15:21
TheMueniemeyer: debug_hooks waits in a while 1 loop15:23
niemeyerTheMue: Looking15:23
TheMueniemeyer: and ssh.y waits for the watch15:23
niemeyerTheMue: Why don't we just expose the results of AliveW?15:26
niemeyerTheMue: WatchAgentAlive(...) { return presence.AliveW(...) }15:27
TheMueniemeyer: can do so, yes, it's only a bit more inconvenient for the caller (handling err, alive and watch, like it is now done in one function)15:28
niemeyerTheMue: Ok.. I'm happy either wya15:28
niemeyerway15:28
niemeyerTheMue: Let's just not sprawl functions we have no use case for15:29
TheMueniemeyer: ok15:30
TheMueniemeyer: would only have expected to have no function w/o timeout. too much erlang influence, where receive has a timeout statement. ;)15:31
* wrtp thought that unit, machine, etc could just return a string path to the agent presence node. then clients could use presence on it to their hearts' desire.15:32
wrtpbut that's probably a bit subversive to say at this point, sorry.15:32
TheMuewrtp: shut up, you're sitting in a train, no proper working place. *rofl*15:33
TheMuewrtp: seems your connection is too good ;)15:33
TheMuewrtp: but as long as we're moving inside the state package it's no problem to get the path, yes. we have zkAgentPath() for it15:35
TheMuewrtp: only for callers outside of state it would be difficult. we would have to make the function public.15:36
wrtplol15:36
wrtpTheMue: that was my thought15:36
niemeyerTheMue: Let's have a timeout there then..15:36
niemeyerTheMue: Sounds totally reasonable..15:36
niemeyerTheMue: WaitAlive(conn, path, timeout)15:37
niemeyerTheMue: The only thing I've been saying is that we have to decide what we want, and go with it. Having multiple functions that have no use case is no good.15:38
TheMueniemeyer: totally agree15:38
* TheMue hugs niemeyer for the timeout variant ;)15:39
niemeyer:)15:40
hazmatbcsaller1, ping15:41
wrtpi wondered why environs tests were taking 45s to run locally - then realised that FindImageSpec does a http request on uec-images.ubuntu.com several times. perhaps i should allow tests to run connectionless somehow.15:55
hazmatbcsaller1, whenever your up... looks like a late night. i was checking out the subordinates15:59
niemeyerwrtp: Definitely16:05
niemeyerwrtp: The Python tests have a dump of the content locally to check that out16:05
* niemeyer => lunch!16:06
hazmatbcsaller1, there's a missing yield in process new relations around subordinate deploy.. and the other problem is that it looks like it ends  up attempting nested containers16:06
hazmatbecause unit deployer picks up the provider type local and tries to use containers, which we don't want for subordinates16:07
fwereade_allenap, ping16:10
wrtpniemeyer: we also have a local dump of the context to allow that (it's used in to test FindImageSpec), but i'm not quite sure what the best way to divert image requests only. perhaps have a variable holding the address of the image server, and change it for local tests. seems a pity to pollute the real code for testing, but maybe worth it.16:10
wrtps/in to/to/16:10
hazmatbcsaller1, nevermind, it is picking up the right deployer16:14
fwereade_hazmat, niemeyer: SpamapS makes a good point on the lists about default-instance-type, default-image-id16:15
fwereade_hazmat, niemeyer: namely that changing them at this stage will justifiably piss people off16:16
fwereade_hazmat, niemeyer: and that we should probably allow them but print deprecation warnings :((16:16
fwereade_hazmat, niemeyer: thoughts?16:16
hazmatfwereade_, he does make a good point, deprecation warnings, and using them still sounds reasonable, but it seems  less of a debt to see that folded into defaults for environment constraints rather than remaining as global overrides16:18
fwereade_hazmat, crap, good point, we *have* to break environments.yaml anyway16:18
fwereade_hazmat, so we should at least make sure to do it only once16:18
fwereade_hazmat, when can we expect that change?16:19
hazmatfwereade_, not sure16:19
hazmati wish i could get out of my juju talk this evening16:20
hazmati'll push forward on the specs16:20
hazmatfwereade_, btw thanks for the reviews16:20
fwereade_hazmat, cool, thanks16:20
fwereade_hazmat, a pleasure, I still haven't really figured out what I'm thinking re GC16:20
fwereade_hazmat, I have an incoherent draft email gathering dust :/16:21
hazmatfwereade_, for the most part its just watching the topology and cleaning out non referenced things, along with recorded actions that have been completed16:21
fwereade_hazmat, yeah, but the devil is in the details16:21
hazmatfwereade_, always :-)16:22
fwereade_hazmat, unit relations still referenced by other units need to survive until all other units have acked their departure16:22
hazmatfwereade_, speaking of which if you have a moment, i was hoping to get a review of https://code.launchpad.net/~hazmat/juju/scheduler-peek-list/+merge/98104 from either you or niemeyer.. i basically rewrote the relation hook scheduler, its much simpler and more robust now16:22
fwereade_hazmat, otherwise we'll get missing data in relation hooks on the other units16:23
hazmatand much better tested16:23
fwereade_hazmat, cool, I'll take a look16:23
hazmatfwereade_, yeah.. relations don't cleaned up  till their not referenced in the topology, which basically happens when either service endpoint leaves the rel16:24
hazmatfwereade_, thanks16:24
hazmatbcsaller1, can a subordinate open a port on the container?16:33
SpamapSHey, I was trying to look at how maas calculates unit addresses, but I don't see it listed in juju.unit.address.get_unit_address .. also.. that if/elif chain seems really wrong.. this method should be moved to the providers themselves.16:34
fwereade_hazmat, https://codereview.appspot.com/584108016:35
bcsaller1hazmat: they return information from their container when asked about networking. There might need to be a change in expose but really its the container service doing the work at that point. I hadn't considered if the implications of that are confusing in status wrt expose16:37
fwereade_SpamapS, concur16:37
=== bcsaller1 is now known as bcsaller
fwereade_hazmat, re unit relations: surely "leave them until the service endpoint dies" is problematic... it means that the number of unit relations for a long-lived service with lots of occasional clients will just grow and grow forever16:39
hazmatfwereade_, why? if the client leaves the relation is broken16:40
hazmateither the relation exists between the services, or it doesn't if one side leaves it in which case it can be gc'd16:41
fwereade_hazmat, but it can't, can it? not until we know that the units on the other side have run the appropriate broken/departed hooks16:42
fwereade_hazmat, or possibly have been GCed themselves16:42
fwereade_hazmat, if we don't wait for that we'll have hooks falling over needlessly16:42
fwereade_hazmat, maybe it doesn't matter if we know they're going to die soon anyway16:43
fwereade_hazmat, but I don't like the idea that a hook could fail to get state that exists according to its view of the world (ie the output of relation-list)16:44
fwereade_hazmat, actually, wait, I'm going to WIP that and reinstate the default-image-id and default-instance-type stuff16:47
wrtpright, train just arriving.16:48
fwereade_hazmat, SpamapS: wait again, I think we should talk about that16:48
fwereade_hazmat, SpamapS: actually, no, I think it's OK: we can just do a deprecation warning saying that using this field silently stomps over any constraints you may set at a later date16:50
hazmatfwereade_, they'll need to remove their presence nodes in the rel17:09
hazmatempty role containers is probably the the threshold for gc17:09
fwereade_hazmat, something will need to but if an agent itself becomes unresponsive before removal then it won't be able to clean out its own presence nodes17:10
fwereade_hazmat, sorry not presence nodes17:10
hazmatfwereade_, then it session ephemeral/pinger will expire naturally17:10
fwereade_hazmat, I misspoke, presence nodes are not relevant17:11
fwereade_hazmat, I meant settings nodes17:11
fwereade_hazmat, that's what other units may still be looking at for an arbitrarily long time17:11
fwereade_hazmat, we can't delete those until we know nobody will be looking at them any more17:12
fwereade_hazmat, a working agent can register disinterest itself, that's fine17:12
hazmatfwereade_, the settings nodes are gc material not coordination.. the presence nodes are for coordination, if their dead and the rel has been marked remove..17:12
fwereade_hazmat, a GC/machine agent that cleans up after a unit that won't die cleanly will have to clear those out itself17:13
fwereade_hazmat, the trouble is that a lack of presence doesn't *necessarily* imply that the unit agent won't come back, does it?17:13
fwereade_hazmat, ...but if it also doesn't exist in the topology it's a pretty safe bet17:14
fwereade_hazmat, hmm17:14
fwereade_hazmat, and it's easy if the rel has been marked for removal17:14
hazmatfwereade_, two scenarios, clean exit from unit relation by running unit, stalled exit by badly behaving unit, the former is easy17:14
niemeyerTheMue: LGTM, thanks!17:14
fwereade_hazmat, agreed17:15
hazmatso on the latter what does it see when it eventually comes alive.. or is terminated17:15
hazmatwe could have clean stop kill the settings node17:15
hazmatand then only gc the structure, and on bad unit, its broken rel context has the required local data17:16
hazmatfrom its settings node17:16
fwereade_hazmat, but surely other units can be behind? ie still running a relation-changed hook that requires that unit's settings17:16
fwereade_hazmat, if the unit clears out its own nodes it still has to wait for other units17:17
fwereade_hazmat, regardless of where the functionality lies, *something* has to keep unit relation settings around until it's safe to delete them17:17
hazmatfwereade_, true it has to wait for the stop of its execution17:17
fwereade_hazmat, no, *other* units' hook execution17:18
hazmater. hook17:18
hazmatfwereade_, ick ;-)17:18
fwereade_hazmat, that's the problem17:18
hazmatfwereade_, we can start with a much more pessimistic base version of gc17:19
hazmatie both endpoint services removed17:19
hazmatand by removed, i mean the services are destroyed17:19
fwereade_hazmat, true enough, nobody said this has to be the final version17:19
fwereade_hazmat, yeah, service destruction is easy17:20
fwereade_hazmat, I'm much more comfortable ignoring potential hook errors when *everything* related is going down as well ;)17:20
fwereade_SpamapS, hazmat, niemeyer: this is interesting, default-ami doesn't work17:23
fwereade_SpamapS, hazmat, niemeyer: this implies to me that we can basically just drop it without warning17:24
fwereade_SpamapS, hazmat, niemeyer: the ec2 provider was looking for default-image-id instead, and therefore never finding it :/17:25
niemeyerfwereade_: Nice, that makes things easier :)17:26
fwereade_niemeyer, ok, I'll just drop default-ami17:27
fwereade_niemeyer, is there a specific way I should be doing a deprecation warning?17:27
fwereade_niemeyer, I thought we had some already but I can't find them17:27
SpamapSfwereade_: default-ami is not valid. its always been default-image-id17:29
niemeyerfwereade_: If it doesn't work, there's no need to warn :)17:32
fwereade_SpamapS, are you 100% sure? it was default-ami in config back in july, and it's default-ami now17:34
fwereade_SpamapS, I suspect that in fact nobody has ever used it17:34
hazmatfwereade_, people have definitely used it, but with default-image-id17:34
hazmatfwereade_, that's whats documented17:35
fwereade_SpamapS, niemeyer: oh, wait: does config accept other random keys? if it does that would make sense17:35
hazmatfwereade_, it does the validation is only against known keys17:35
niemeyerfwereade_: default-image-id was always a hack, though.. I'd just send a polite message to the list pointing that the hack is being replaced by the real implementation17:36
hazmatfwereade_, SpamapS has a branch out there to fix it, but i think the notion was that it was going away...17:36
hazmatthe schema validation of it that is17:36
fwereade_hazmat, niemeyer, SpamapS: I see, thank you all17:36
fwereade_niemeyer, SpamapS's point is that dumping *any* enforced change on our users at this point is a bad thing17:37
fwereade_niemeyer, and I think he's right; and furthermore, the global env change will change environments.yaml as well17:38
fwereade_niemeyer, I don't think it's sensible to land this and then make everyone change *again* imminently17:39
niemeyerfwereade_: This isn't just "any change". This is a bug in juju that I've been pointing out since that flag was first introduced.17:39
niemeyerfwereade_: It *has* to be fixed, or juju won't work17:39
fwereade_niemeyer, yes: however much people hate it, we *must* land at least one environments.yaml change17:40
niemeyerAlso, https://code.launchpad.net/~clint-fewbar/juju/remove-default-ami/+merge/7127817:41
fwereade_niemeyer, however, dumping *multiple* changes, over a few days, at this stage in the cycle, is surely going to lead to calls for our heads on pikes etc17:41
niemeyerThis was the fix to the schema, that never landed..17:41
fwereade_niemeyer, the fix to the schema that was deferred until we had a replacement -- constraints -- which then sat in review purgatory for, what, 2 months17:42
niemeyerfwereade_: I don't have anything positive to say about that, I'm afraid..17:43
fwereade_niemeyer, in hindsight I should clearly have been making a big stink about it myself17:43
fwereade_niemeyer, but, well, I didn't :(. and we have to deal with the situation as we find it in addition to figuring out how we can avoid blundering into it again in the future17:45
fwereade_niemeyer, I think that we have left it too late to break them, but we can discourage their usage by complaining every time they're used17:46
fwereade_niemeyer, however... if I apply this same argument to the global-settings change... I get worried17:48
fwereade_niemeyer, because I think it *does* still apply17:48
niemeyerfwereade_: default-image-id must die, now.. there's no way to implement support for multiple distributions while supporting it17:49
fwereade_niemeyer, with default-series as well, you get something close enough: you just have to make sure the series matches the image17:50
fwereade_niemeyer, if I could be sure nobody was using it I would 100% agree17:51
niemeyerfwereade_: If someone is depending on it, they'll need to tell us, and we'll need to find something else17:51
niemeyerfwereade_: default-image-id can't be supported as it is.17:51
niemeyerfwereade_: We can talk about how to avoid introducing changes late in the cycle, we can talk about how to prioritize tasks appropriately over a cycle, but these are unrelated to the simple fact this has to be fixed.17:52
fwereade_niemeyer, would it not be sensible to add a deprecation warning and kill it for 12.04.1?17:53
fwereade_niemeyer, which robbiew tells me is the actual target for constraints17:54
niemeyerfwereade_: It's not about constraints.. it's about series. If I deploy cs:~fwereade/precise/foobar17:54
niemeyerfwereade_: foobar is in precise17:54
niemeyerfwereade_: default-image-id is broken, and has always been. I was against supporting it at first, and now it's time to kill it and implement the real thing.17:54
fwereade_niemeyer, I have no argument against "it must be fixed", and "it must die"17:55
niemeyerfwereade_: At least you haven't brought them up so far. :)17:55
niemeyerfwereade_: I can buy aguments like "we need to allow people to customize image selection"17:55
niemeyerfwereade_: I can't take "let's continue supporting default-image-id" as it is, because it's a relevant bug to have it.17:56
fwereade_niemeyer, nah, I *really* don't want anyone doing that17:56
fwereade_niemeyer, I'm suggesting we leave it in with a suitable dire deprecation warning now, and actually remove it in the release 3 months down the line17:57
niemeyerfwereade_: default-image-id = "ami-for-oneiric", deploy cs:~fwereade/precise/foobar, BOOM!17:57
fwereade_niemeyer, yeah, and that's bad, no argument17:57
hazmatwell more than a deprecation warning17:57
hazmatwe can issue the error message with the solution, and just refuse to honor it17:58
robbiew+117:58
fwereade_hazmat: so, a specific check during env parsing for those 2 keys, and immediate explosion saying "you must now use constraints"17:58
niemeyerfwereade_: That's my opinion, though. To be honest, this will bite hazmat and SpamapS more than me, so I'm leaving it up for them to decide what's the best approach.17:59
hazmatfwereade_, i'd keep it context relevant, to deploy/add-unit17:59
fwereade_hazmat, hmm, ok18:00
hazmathmm.. and bootstrap18:00
fwereade_hazmat, tbh it still feels like too sudden and painful a change for anyone depending on it18:00
niemeyerhazmat: That's a brand new feature being developed. -1 on deciding/supporting that at this stage in the cycle.18:00
hazmatfwereade_, i don't think people are depending on it.. it was mostly used a hack for lack of arch support18:01
fwereade_hazmat, if we can be reasonably sure that's the case, then great18:01
fwereade_hazmat, in that case I'm happy saying "tough, use constraints"18:02
hazmatfwereade_, default-instance-type is much more commonly used18:02
niemeyerThere's a collection of opinions on this here:18:03
niemeyerhttps://bugs.launchpad.net/juju/+bug/83099518:03
fwereade_hazmat, but I kinda feel it should happen at env-parsing time18:03
niemeyer"default-instance-type and default-image-id are inadequate" - 2011-08-2218:03
fwereade_hazmat, if we're going to force immediate action then we should probably just force immediate action18:03
niemeyer"I currently use this in development and testing of formulas in different ubuntu versions"18:04
niemeyer-- Juan18:04
niemeyer"Often need stacks with different types of machines deployed for different services."18:04
niemeyer-- Mark18:04
hazmatfwereade_, i'm just thinking about the existin environments, nothing to do unless you actually conflict18:04
hazmatall of which are addressed by constraints18:04
niemeyeretc .. etc..18:04
niemeyerRight.. they're all claiming for constraints.18:05
niemeyerAnd series selection18:05
niemeyerWhich is done via charm URL18:05
niemeyerWe're giving them what they want, not what they've asked for..18:05
hazmatfwereade_, but parse time is fine.18:05
fwereade_hazmat, OK18:06
fwereade_hazmat, now wrt to braking environments.yaml *twice*18:06
fwereade_breaking18:06
fwereade_hazmat, I really think we should avoid doing that18:06
fwereade_hazmat, so I think we have to coordinate it with the env-settings change18:07
hazmatfwereade_, the env-settings change is mostly relating to storage and syncing.. with the removal of image-id, instance-type, and default-series.. what remains in env.yaml is probably the same18:08
hazmatie. its a behavioral change not a structural config change18:09
hazmatalthough one more env.yaml that comes to mind is apt-proxy18:09
hazmaturl18:09
fwereade_hazmat, IIRC niemeyer was keen that we remove non-access settings from env.yaml18:10
hazmatfwereade_, apt-proxy-url is an access setting18:10
hazmatin envs that need it, we can't even bootstrap correctly without18:11
fwereade_hazmat, huh, seems you're right, I was sure we had more stuff gunking it up18:12
fwereade_hazmat, I wasn't thnking of apt-proxy, I was thinking we had other non-access ones in there18:12
fwereade_hazmat, OK, so we can be sure this is the *only* breaking change we can know we'll need to introduce?18:13
hazmathmm..18:14
hazmatthe subordinates increment the topology version but there's a transparent migration there18:14
fwereade_hazmat, apart from anything else the env change will change zookeeper storage so people will have to bounce the whole env to upgrade18:14
hazmathmm actually that won't be meanigful unless the cluster is on the same code rev.. ie. it would still be problematic18:15
fwereade_hazmat, (I think?)18:15
hazmatfwereade_, absent consistent code versions, yeah.18:15
hazmatwe can migrate state, but unless everything deployed understands the  newer version it doesn't really resolve anything18:16
fwereade_hazmat, so, OK, we do have 2 inevitable breaking changes18:20
fwereade_hazmat, I think it's very important that we reduce that to 118:20
fwereade_hazmat, opinion?18:25
fwereade_SpamapS, hazmat, niemeyer: just to confirm, btw: default-instance-id should be treated in exactly the same way we do default-image-id; right?18:27
hazmatfwereade_, yes imo.18:32
fwereade_SpamapS, negronjl: in light of your perspective, and beyond making sure there is only one of them: is there any way we can mitigate the impact of a necessary breakage?18:34
fwereade_hazmat, I have been speaking to niemeyer: consider me at your disposal for the next couple of weeks18:47
fwereade_hazmat, how an I have most impact?18:47
fwereade_hazmat, I *want* to do constraints-get and unit constraints as part of it but I'm not sure they're our highest priority18:48
hazmatfwereade_, awesome!18:49
hazmatfwereade_, wrt to constraints.. + env-constraints18:50
fwereade_hazmat, yep, the environment change seems to me to be the most critical thing18:51
fwereade_hazmat, once we have that we can add env constraints18:51
hazmatfwereade_, that sounds like a good place to start, env-constraints and settings changes18:51
hazmatwe should land those soon18:51
fwereade_hazmat, yep, landing them is the big one18:52
hazmati'm going to work with bcsaller on subordinates, i've been testing them last night and today, i see a few problems that need resolving18:52
fwereade_hazmat, ok, I will aim for a series of branches that culminate in env-constraints, along with complete removal of default-image-id and default-instance-type18:53
hazmatfwereade_, awesome thanks18:54
fwereade_hazmat, I think that I should push my current branch; with dii/dir printing loud deprecation warnings, but continuing to work as before (and completely overriding constraints)18:54
fwereade_dii/dit^^18:55
fwereade_hazmat, people on the bleeding edge thereby get a few days' warning, and they can start using constraints as soon as the cut the cord there18:55
hazmatfwereade_, that sounds reasonable, what's the default for the bootstrap node?18:55
hazmatoh.. its going to honor them still18:56
fwereade_hazmat, same as everything: effectively an m1.small18:56
hazmatfwereade_, for series?18:56
fwereade_hazmat, it's actually quite easy to keep them working18:56
fwereade_hazmat, IIRC it's taken from default-series18:56
fwereade_hazmat, hmmm18:56
fwereade_hazmat, that's not an access setting18:56
hazmatfwereade_, right.. but that's one of the things thats going away..18:57
fwereade_hazmat, I know there was something18:57
fwereade_hazmat, oh!18:57
hazmatand keep in mind osx clients, can't always take from current18:57
hazmatwe'll need to pass it on the cli for bootstrap i think18:57
fwereade_hazmat, what are we going to use instead?18:57
hazmator use the latest current18:57
fwereade_hazmat, required?18:57
fwereade_hazmat, env constraints will need to be settable at bootstrap time18:58
fwereade_hazmat, as will default series18:59
hazmatfwereade_, i'd suggest required in the absence of host inspection, but that still feels implicit18:59
hazmatSpamapS, any thoughts wrt to bootstrap node release series specification?18:59
fwereade_hazmat, I kinda feel we should default to latest released series19:00
hazmatfwereade_, custom introspection off cloud images?19:00
hazmaturl that is19:00
fwereade_hazmat, *however*, whatever we do, moving default-series is *another* env.yaml change19:00
hazmatfwereade_, yes, and it should be part of the warnings19:01
fwereade_hazmat, yep, I'll add that now19:01
fwereade_hazmat, so, for now: warnings on all three default-* keys, behaviour unchanged19:02
fwereade_hazmat, blast I have to go19:02
fwereade_hazmat, I'll try to be on later19:02
hazmatfwereade_, cheers, i'm out in 2hrs, back in 7hr19:05
niemeyerhazmat: Just submitted a review for https://codereview.appspot.com/5849054/19:24
niemeyerhazmat: Good stuff.. just a few details19:24
niemeyerhazmat: Let me know if you want to sync up on any of the points19:24
niemeyerfwereade_: This may be relevant to you as well ^19:24
hazmatniemeyer, thanks19:26
hazmati've switched tracks to getting pull together a presentation19:26
SpamapShazmat: was at lunch.. just read your question..19:42
SpamapSIMO, all those default-* settings that are being obsoleted have to stay for a little while, and where they contradict the new way, warnings should be emitted.19:43
niemeyerfwereade_: we should probably rename relation-list to relation-units in the future (keeping the former for compatibility)19:44
SpamapSIf somebody has asked for a default-image-id, we should honor that request, and warn them that we can't guarantee we're deploying the right charm on that image.19:44
niemeyerfwereade_: We'll likely add something like relation-ids, which will make relation-list ambiguous, and potentially error prone19:44
SpamapSIf somebody has a default-series of oneiric, so be it. I see no reason to *remove* that setting.19:44
SpamapSAnd for default-instance-type .. same deal.. leave it be and let it override the hard coded 'm1.small' as a default.. but let constraints override both of those.19:45
niemeyerSpamapS: It will be removed, because the behavior you just described is a bug19:45
niemeyerSpamapS: It doesn't have to be removed now, if you and hazmat agree that's the best approach19:45
niemeyerSpamapS: But the behavior you just described is a bug, which has to be fixed19:45
SpamapSI'm not asking you to keep cruft around forever. Just take on the pattern of deprecation before removal so people have time to adapt.19:46
niemeyerjimbaker: https://codereview.appspot.com/5837050/ reviewed19:47
hazmatSpamapS, so we can keep but deprecate instance-type, but image-id and series are more fundamentally problematic, because their just broken.. its not going to break an existing environment though, we're just forcing folks to update the config file.19:47
jimbakerniemeyer, cool19:47
SpamapSniemeyer: if the documentation says "this is deprecated don't use it" and it warns and says "This is deprecated don't use it" and the software still does what the user asked.. thats a reasonable method to fix a very sticky bug.19:48
SpamapShazmat: *that* is breaking an automated environment.19:49
niemeyerSpamapS: That's the approach I'd use for a silly option that I wanted to deprecate.  I'd not do that on an option that creates a bug for scenarios I care about.19:49
niemeyerSpamapS: Again, that's just my opinion, though.19:49
hazmatSpamapS, forcing any user interaction even keeping existing apps and services running is breaking?19:49
niemeyerSpamapS: I'm happy to have you and hazmat deciding how to handle the removal19:50
SpamapShazmat: at this point, if you error out because of options that were ok before.. you are breaking automated systems.19:50
SpamapSAnd at *least* deprecate the options in the documentation first.19:50
hazmatSpamapS, done19:52
SpamapSI'm probably overreacting. But I've been through this 4 or 5 times now.. backward incompatible breaks have been coming at a pretty steady rate.. and at this point, the impact causes ripples through quite a few peoples' workflow19:52
jimbakerniemeyer, thanks for the review. it does seem to me that relation-ids should take an --interface option, as i note in my response in the mp20:01
jimbakerotherwise, this looks like we are in consensus on this20:02
niemeyerjimbaker: Do you have a real use case where that's relevant20:03
niemeyer?20:03
jimbakerniemeyer, i mention it in the accompanying keystone example20:03
niemeyerjimbaker: That example seems wrong to me20:03
niemeyerjimbaker: You're blindly iterating over a list of relation ids, without any mention of what these ids are actually associated with20:04
niemeyerjimbaker: An interface defines a protocol20:04
niemeyerjimbaker: There's generally little reason to list things based on protocol20:04
niemeyerjimbaker: You'll generally want to know "where's my cache mongodb"20:04
niemeyerjimbaker: Not "where's any mongodb"20:05
jimbakerniemeyer, sounds good to me20:07
jimbakerniemeyer, i will further update the proposal, especially the example, with this in mind20:08
fwereade_SpamapS, hazmat: not sure if I can get back on properly today; but the first branch I propose will include: constraints updates; functioning default-* keys which override all constraints (so those who didn't specify them can use constraints straight off, but those who did don't have to deal with a *really* sudden behaviour change); and dire deprecation warnings of constraints incompatibility and immi20:34
fwereade_nent removal for all 3 default-* keys20:34
fwereade_SpamapS, hazmat: I think this will be safe to land quickly and is a necessary first step20:35
fwereade_SpamapS, hazmat: so ppa users who don't keep up with the lists get *some* warning20:35
fwereade_SpamapS, hazmat: if you forsee any problems with the above, please let me know :)20:35
SpamapSfwereade_: sounds very reasonable20:36
fwereade_SpamapS, cool, thanks20:37
fwereade_SpamapS, I'm sorry about all this :(20:37
fwereade_SpamapS, it also crosses my mind that nobody will actually be able to start using constraints without restarting their env20:38
SpamapSfwereade_: don't be! I appreciate the volume and quality of change, and I know its not possible to cover every case every time.20:38
SpamapSfwereade_: thats just the nature of the beast until we have an 'upgrade-environment' command20:38
fwereade_SpamapS, but we're cutting it fine with this one, and realistically we will have at least one more release which an environment cannot live through usefully20:39
fwereade_SpamapS, yeah, exactly, my feeling is that that's a critical for the next release20:39
fwereade_SpamapS, shame we couldn't get to it this cycle, but so it goes20:40
fwereade_SpamapS, once we have an upgrade-environment mechanism it will at least be *possible* to have non-breaking releases20:41
SpamapSfwereade_: breaking releases are different than not having a feature available to you in an existing environment though.20:44
fwereade_SpamapS, only slightly different, it's not like we can replace the PA20:45
fwereade_SpamapS, and, blast, I have thought of a possible breakage, the client (which does get updated) could try to get constraints info for a state that was never created with it20:46
fwereade_SpamapS, not hard to fix but not optional either20:47
fwereade_SpamapS, andway, sorry, gtg again20:47
SpamapSfwereade_: cheers!20:48
niemeyerTime for some outsiding..21:47
fwereade_if anyone's around, do you recall why default-series is optional for some providers but not apparently all?22:19

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!