=== kadams54 is now known as kadams54-away === kadams54-away is now known as kadams54 === kadams54 is now known as kadams54-away [01:49] ericsnow, ping [01:49] ericsnow, http://reviews.vapour.ws/r/1088/ , re-submitted, thanks ! [02:14] katco: just wondering if you got to the LeaderSettings watcher registration issue or whether I should be picking it up [02:20] jam: hey i'm working on that right now [02:20] katco: k [02:20] jam: it turned out to be a bit trickier to diagnose; the code was registering the worker correctly, but the worker itself didn't adhere to the state.NotifyWatcher interface [02:21] jam: wallyworld figured that one out [02:21] jam: so there's a few issues; one question is why is the type assertion failure not being bubbled up [02:21] jam: but at any rate, i'll have a patch momentarily. had to eat dinner. [02:22] katco: when you EOD can you make sure to push it up so I can pick it up if it isn't done? and a ping to me to look at it would be good [02:22] jam: it's 21:22 here, so i'll get a PR in before i stop :) [02:39] jam: http://reviews.vapour.ws/r/1141/ i will stay on for questions/fixes [02:39] katco: looking now [02:40] jam: wallyworld made an excellent suggestion for making the test better, but i don't have the time tonight to grok it; i can shore that up later: http://pastebin.ubuntu.com/10582820/ [02:41] katco: right. I was about to suggest using the NotifyAsserter directly on the Changes rather than a secondary stream [02:45] katco: reviewed [02:45] jam: ty looking [02:46] * katco smacks head [02:47] jam: so, you're absolutely right [02:47] jam: do you think i should just implement a full watcher instead of attempting to piggy-back off of settingsWatcher? [02:47] jam: or do you think a simple for loop wrapping the settings-watcher's change is sufficient? [02:48] katco: if all it needs to do is proxy another one, thats fine (I think). Is there a reason you can't just hand back the settings watcher itself? [02:48] jam: yeah, the state.NotifyWatcher signature demands a signature of type Changes() <-chan struct{} [02:49] katco: if you are embedding *settingsWatcher, doesn't that give you *its* Changes channel? [02:49] jam: and that's the bit preventing it from being registered [02:49] ah, so we have changes but its not a struct{} [02:49] k [02:49] jam: the settings watcher is not of type NotifyWatcher; it's signature is <-chan Setting something or other [02:49] katco: from being registered? or from being instantiated ? [02:49] jam: i believe from being registered [02:49] jam: we do some reflection from what wallyworld found [02:50] jam: and anything stuffed into resources which does not conform to NotifyWatcher will not be utilized [02:50] katco: right but that's not *registration* thats *extraction* [02:50] katco: because resources can be whatever [02:50] jam: right sorry; extraction [02:51] but when you go to look them up it checks that the thing that's there matches. [02:51] right [02:51] probably we should have good logging as well [02:51] something that says "I found something, but its not what you're looking for" [02:51] yeah we discussed that; it's a bit odd that this didn't trigger some kind of error/log, *something* [02:51] we generally don't send that as an error to the caller (though I think we could here) because of Permission/Priority concerns. 
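For reference, the extraction behaviour being described above, sketched against a generic resource registry rather than Juju's actual one (all names here are illustrative): registration accepts anything, so a wrong type only surfaces when the resource is pulled back out, which is where the suggested "I found something, but it's not what you're looking for" logging would go.

    package resources

    import (
    	"fmt"
    	"log"
    )

    // NotifyWatcher is the interface the caller expects back, modelled on the
    // Changes() <-chan struct{} signature discussed above.
    type NotifyWatcher interface {
    	Changes() <-chan struct{}
    	Stop() error
    }

    // Registry stores arbitrary resources; registration accepts anything,
    // so type problems only show up at extraction time.
    type Registry struct {
    	resources map[string]interface{}
    }

    // GetNotifyWatcher extracts a previously registered resource and checks its
    // type, logging when something is present but of the wrong kind.
    func (r *Registry) GetNotifyWatcher(id string) (NotifyWatcher, error) {
    	raw, ok := r.resources[id]
    	if !ok {
    		return nil, fmt.Errorf("no resource registered with id %q", id)
    	}
    	w, ok := raw.(NotifyWatcher)
    	if !ok {
    		log.Printf("resource %q is a %T, not a NotifyWatcher", id, raw)
    		return nil, fmt.Errorf("resource %q is not a NotifyWatcher", id)
    	}
    	return w, nil
    }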
[02:52] we also discussed maybe having separate register methods for different types to force this into a compile-time error [02:52] anyway, I'll be back in 10 [02:52] k, i'll implement the for loop [02:55] jam: i'm a bit concerned about what happens when the embedded settingsWatcher is told to stop; what would happen to the LeadershipSettingsWatcher's Changes loop? [03:01] katco: coming late to the party, but in such cases you'd implement Stop() on the outer struct and call the embedded Stop() as well as cleaning up anything else [03:02] axw: i've made a few comments on the gh pr [03:02] wallyworld: yeah the thought crossed my mind; but that means i need to probably implement a tomb of my own [03:03] wallyworld: thanks. I've gotta head out shortly, will respond when I get back [03:03] sure [03:05] katco: yeah maybe, i'd need to check the code in more detail. the reason for not using settings watcher directly is the changes method returns a settings delta, not just an event? and we want notify watcher samantics? [03:05] wallyworld: yeah [03:05] hmmmm [03:06] katco: could we just add a Changes() <-chan struct{} to settings watcher [03:06] to make it also a notify watcher [03:06] i don't think go allows overloading like that [03:06] naming conflict [03:06] it doesn't take into account the entire signature [03:06] ah, yeah [03:06] not i remember why i hate Go often [03:06] now [03:07] lol [03:07] that and no exceptions and no generics [03:07] it forces you to stay simple... which can be an immensely good thing :) [03:08] katco: so we could extract a base settings watcher and embed that in 2 variations - what we have now and a new notify variant [03:08] the 2 new versions of the containing struct would have the different Changes() signatures [03:08] wallyworld: hm. i'm kind of liking that idea. i do worry about unforeseen consequences [03:09] that's what tests are for :-D [03:09] i think we have pretty ok coverage in that area [03:10] maybe earlier today i would have agreed with that [03:10] :) [03:10] lol [03:10] i think i'll give that way a go though [03:11] good idea [03:11] good luck :-) [03:11] i think it will work fine, and solves the issues encountered [03:15] wallyworld: mmm... i think that just makes the issue worse. because i'd then have the same problem in 2 places. the base watcher would still have to notify on a <-chan settings [03:17] couldn't the base watcher use an embedded interface to call when it needs to notify? [03:17] i'd need to look at the code to map it out [03:18] wallyworld: hm. no that may work. [03:19] the one with the notify semantics would just discard the info given and shove struct{} onto a channel [03:20] wallyworld: right, but you're still talking to a channel; you lose the magic of select [03:21] so the way this would work is the i'm in the base type's select, specifically in the case statement that received a change [03:21] i then call the notify function that was passed to me [03:21] that notify function has to send on a channel [03:21] except that now if stop is called, it will block [03:22] as it's implemented now, the select would select the stop even if it was waiting on sending the notification [03:22] because it's in its select [03:22] hmmm [03:22] now mind you [03:22] my brain is revolting a bit at this time of night [03:23] wallyworld: do you have a link to the CI jobs on jenkins? 
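The wrapper that ends up being written is essentially a proxy with its own lifecycle: it drains the inner settings watcher's delta channel and forwards bare struct{} notifications, so it satisfies a Changes() <-chan struct{} interface, and stopping it tears down the embedded watcher as well. A minimal sketch, assuming a made-up settingsWatcher interface and using gopkg.in/tomb.v2 for the lifecycle plumbing (the real code of that era used a different tomb package):

    package watcher

    import (
    	"gopkg.in/tomb.v2"
    )

    // settingsWatcher stands in for the inner watcher discussed above, whose
    // Changes channel carries settings deltas rather than struct{}.
    type settingsWatcher interface {
    	Changes() <-chan map[string]interface{}
    	Stop() error
    }

    // NotifyProxy adapts a settingsWatcher to notify-watcher semantics.
    type NotifyProxy struct {
    	tomb  tomb.Tomb
    	inner settingsWatcher
    	out   chan struct{}
    }

    func NewNotifyProxy(inner settingsWatcher) *NotifyProxy {
    	p := &NotifyProxy{inner: inner, out: make(chan struct{})}
    	p.tomb.Go(p.loop)
    	return p
    }

    // Changes delivers an empty event for every delta seen on the inner watcher.
    func (p *NotifyProxy) Changes() <-chan struct{} { return p.out }

    // Stop kills the proxy and waits for its loop to finish.
    func (p *NotifyProxy) Stop() error {
    	p.tomb.Kill(nil)
    	return p.tomb.Wait()
    }

    func (p *NotifyProxy) loop() error {
    	// Stopping the inner watcher here ties its lifetime to the proxy's,
    	// which addresses the "what happens when the embedded settingsWatcher
    	// is told to stop" concern raised above.
    	defer p.inner.Stop()
    	for {
    		select {
    		case <-p.tomb.Dying():
    			return tomb.ErrDying
    		case _, ok := <-p.inner.Changes():
    			if !ok {
    				return nil // inner watcher stopped underneath us
    			}
    			// Forward a bare notification; keeping the send in a select
    			// with Dying means a blocked send never prevents shutdown.
    			select {
    			case p.out <- struct{}{}:
    			case <-p.tomb.Dying():
    				return tomb.ErrDying
    			}
    		}
    	}
    }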
[03:23] this is where generics would be wonderful [03:23] url hacking is failing me [03:23] thumper: o/ [03:23] hey katco [03:23] thumper: http://juju-ci.vapour.ws:8080/job/github-merge-juju/ ? [03:23] or http://juju-ci.vapour.ws:8080/ [03:24] the last one is the dashboard for all the jobs [03:24] that doesn't have the CI jobs on it [03:24] thumper: you need to be logged in [03:24] as who? [03:25] I don't have any creds for that AFAIK [03:25] thumper: everything got locked down due to the creds leakage issue raised last week [03:26] hey rick_h_ [03:26] katco: wanna leave it till tomorrow? I'll see if i can hack it a bit today [03:26] wallyworld: nah i'm close [03:27] wallyworld: i don't think there's a way around it. i have to do a copy/paste job. it needs its own tomb setup [03:27] yeah, could be right [03:27] thumper: howdy [03:27] this is when i think go's simplicity works against it [03:28] exactly [03:28] and i run into that *all* the time [03:28] i think they are trying to solve this still [03:28] for me, it's too simple [03:28] but code duplication of this complexity is annoying [03:28] leads to *lots* of cut and paste boilerplate [03:29] code duplication of any complexity can get annoying - if i never see another cut and paste arraycontainsstring() it will be too soon [03:32] FYI: both CI blockers have now been cleared [03:32] verified that the fix that landed earlier fixed the upgrade issue [03:32] and the power test isn't timing out on cmd/jujud/agent either [03:33] pretty sure those two things were related [03:38] wallyworld: jam: helper method for registering watcher: RegisterWatcher(...) or RegisterNotifyWatcher(...)? [03:39] Notify imo [03:39] as there are different kinds [03:39] eg StringsWatcher [03:39] Bug #1430791 changed: Upgrade-juju is broken on most/all substrates [03:39] Bug #1430898 changed: run-unit-tests ppc64el timeout [03:39] Bug #1431130 was opened: make kvm containers addressable (esp. on MAAS) [03:40] wallyworld: gosh... it makes apiserver/common depend on state [03:40] i don't know if it's worth that. [03:40] no [03:40] let's not do that [03:42] pushing up changes with a little piece of my rotted heart. [03:43] lol [03:43] http://reviews.vapour.ws/r/1141/ [03:43] i think we could probably do something better by passing in a channel into a base type [03:43] but i'm not sharp enough right now to conceive of that [03:48] katco: i know it's late for you, and i'm happy to do it, but i do think the test needs to use something like the pastebin i offered earlier [03:48] wallyworld: i was going to land that separately... if you want it in i'll have to hand off [03:48] wallyworld: otherwise i won't be fresh for tomorrow. b/c babies don't have a snooze ;) [03:49] katco: ok, so long as you land first thing tomorrow before we branch 1.23 :-) [03:49] wallyworld: land the separate branch? or land changes you will make? [03:49] the followup branch with revised tests [03:50] this branch at least will fix the issue itself [03:50] wallyworld: keep in mind i also have to do the amz fix [03:50] i can do the tests one [03:50] wallyworld: it would be appreciated... so does this one have a ship it then? [03:51] katco: just eyeballing changes - they are a cut and paste right? [03:51] wallyworld: yup [03:51] wallyworld: from settingsWatcher [03:52] katco: ok, tyvm for persevering with this, +1 [04:06] * katco touché, Jenkins. it is i who waits on you.
[04:09] Bug #1430898 was opened: run-unit-tests ppc64el timeout [04:09] Bug #1431134 was opened: fix container addressability issues with cloud-init, precise, when lxc-clone is true [04:09] wallyworld: I just realised there's more to do, I hadn't updated the apiserver/storageprovisioner code to use the new watchers [04:10] doing so now [04:11] gah [04:11] between the time i submitted and the job queued up that bug hit [04:11] and now it rejected the build [04:14] can someone land this when trunk opens up? https://github.com/juju/juju/pull/1813 [04:14] i need to go to bed [04:15] wallyworld: axw: anastasiamac ^^ [04:16] katco: ok, will keep an eye out [04:16] good night [04:16] axw: ty sir. have a good day [04:41] wallyworld: katco: so I looked at the change, and we need to tweak it a bit for not actually being a settings watcher, also it is completely untested as is. But if you want to hand it off to me, since I actually need it for my patch, I can pick it up. [04:41] wallyworld: would you rather just hand it off to me? [04:42] jam: if you want, i realise the tests need updating, but it was late for katco so i was going to follow up with a branch, but if you want to do it for your patch that would be fine with me [04:42] wallyworld: well both that we now have a lot of untested code in LeadershipSettingsWatcher, and the test needs tweaking for featuretests [04:43] jam: agreed, hence the tests were to be updated straight away [04:45] jam: i was looking to use something like statetesting.NewNotifyWatcherC as is used elsewhere [04:45] was that how you were going to redo the tests? [04:47] wallyworld: yeah, I'm grabbing your paste as a reference [04:47] oh, that paste was just an initial brain dump [04:47] wallyworld: I had forgotten that NotifyWatcherC was the original that I used to create the NotifyAsserterC channel stuff [04:47] but should be close [04:47] wallyworld: yeah, I'm tweaking it, but it gives a good reference [04:47] like, we have to import statetesting :) [04:47] of course :-) [04:48] i didn't compile it, just dumped the text into pastebin [04:49] jam: also for later, did you talk with mark about Properties vs Hints? [04:49] no, mark was out of town last week [04:50] ok, np [04:50] wallyworld: do you know why featuretests starts mongo in a different way? [04:51] no [04:51] i either didn't realise or forgot that it did [04:51] jam: but isn't this standard? coretesting.MgoTestPackage(t) [04:51] hmm… maybe it doesn't. I was running into problems with a --journal flag but only for the featuretests directory [04:51] in package_test [04:52] but they seem to be running now [04:52] maybe it was something else [04:52] symlink issue :-P [04:53] wallyworld: from what I've seen could be [04:53] oh, i was joking :-) [04:53] nope, still happening: [LOG] 0:03.670 ERROR juju.worker exited "state": cannot create database index: cannot use 'j' option when a host does not have journaling enabled [04:54] my state worker spins endlessly in those tests [04:54] hmmmm, not seen that before === beisner- is now known as beisner [05:40] wallyworld: can you please check my latest diff in http://reviews.vapour.ws/r/1128/ [05:40] wallyworld: I've updated storageprovisioner to match the scope changes [05:41] sure [07:20] dimitern: do you know why the blocking bug fails on ppc64 only?
[07:21] wallyworld, I don't know that for sure - it might be gccgo-specific [07:22] dimitern: it's hard to tell from the console log - there's so many go routines, most of which are blocked on a select; where did you see an concrete error info? [07:24] wallyworld, search for the test I mentioned in the log - it's there, about the newDummyWorker (or whatever, the second thing) - towards the end of the panic dump you can see upgrade_test.go:1779 (or was it machine_test.go) [07:24] dimitern: thank you, my eyes weren't working properly [07:27] wallyworld, oh, np - it's hard to see anything useful in most of these panics [08:00] wallyworld: we're going to need to expand the storage provider and provisioner. if a machine reboots, we'll have to recreate tmpfs/loop [08:22] morning o/ [08:27] TheMue, o/ [08:29] axw: unless they say they want transient right? [08:29] wallyworld: transient volumes should still be attached after restart, they just may be empty [08:29] wallyworld: at least, in the ec2 definition [08:29] yes, that is true [08:30] wallyworld: I'm thinking of folding the filesystem mounting into the storage provider/provisioner [08:30] wallyworld: and how to deal with remounting after restart. then I realised we need to do the same for tmpfs/loop [08:30] * TheMue just read the State discussion. yeah, there are always repeating names in many projects, e.g. also Context. [08:31] TheMue: it occurred to me as I used the word "context" a lot in that email ;) [08:31] ..ooOO( Data, Dump, Thing, Stuff, ... ) [08:31] axw: yeah, as "context of a state" [08:31] wallyworld: so, my current thinking is that we require Attach{Volume,Filesystem} to be idempotent [08:31] axw: i think it makes sense for the provisioner worker to have that responsibility of dealing with restart/remounting [08:32] yes [08:32] wallyworld: and then the first time we see an already provisioned storage, we request a re-attach [08:32] like unit hooks [08:32] which may be a no-op [08:32] yep [08:33] wallyworld: also, we don't need to think about htis too much now, but I think we're going to want to run Attach on both env + machine for env-scoped attachments [08:34] wallyworld: e.g. for Ceph where we'd want to prepare the resource in the env storage provisioner, then attach/mount in the machine storage provisioner [08:34] yeah, i haven't thought about that [08:34] not enough brain space [08:34] that can come later, I'm pretty confident we can work it in [08:35] wallyworld: anyway, I think I'll work on provisioning filesystems/filesystem attachments in the storage provisioner now [08:35] I won't worry about loop/tmpfs for the moment, will come back to them later [08:36] you mean the restart behaviout for tmpfs/loop? [08:37] wallyworld: we don't watch volume attachments in the storage provisioner yet, so we can't handle them until we do [08:37] wallyworld: if you prefer I can do it for volumes first, I was going to leave it so you could continue on it in parallel [08:38] axw: ok, start on fs, i have some health stuff to do but can then pick up volumes [08:38] though TBH, it's kinda much the same... I might see if I can abstract some stuff to make the core logic common [08:39] ok sounds good [09:03] ugh sleepless night in our household o.0 [09:04] going to work on goamz while i'm up [09:07] bodie_, here now. I'm on UK time this week and back to US eastern time next week. [09:09] katco: is that you or a bot? [09:10] beep boop [09:10] R2D2? 
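A small sketch of the idempotent re-attach idea described in the storage discussion above: on the first pass after a restart, the provisioner simply requests attachment again for storage it already knows to be provisioned, and the provider treats an already-attached volume as a no-op while recreating tmpfs/loop-style storage that did not survive the reboot. The VolumeSource interface and function here are invented for illustration, not taken from Juju:

    package storageprovisioner

    import "fmt"

    // VolumeSource is a cut-down, illustrative stand-in for a storage provider.
    type VolumeSource interface {
    	// AttachVolume is required to be idempotent: attaching a volume that is
    	// already attached succeeds without doing anything.
    	AttachVolume(volumeID, machineID string) error
    }

    // ensureAttachments re-requests attachment for every volume the provisioner
    // already knows to be provisioned, e.g. on the first pass after an agent or
    // machine restart. For volumes that survived the reboot this is a no-op;
    // for tmpfs/loop style storage it recreates what was lost.
    func ensureAttachments(src VolumeSource, machineID string, volumeIDs []string) error {
    	for _, id := range volumeIDs {
    		if err := src.AttachVolume(id, machineID); err != nil {
    			return fmt.Errorf("re-attaching volume %q to machine %q: %v", id, machineID, err)
    		}
    	}
    	return nil
    }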
[09:11] lol [09:11] i'm going to drink an r2d2 sized coffee [09:11] in a few hours [09:11] double shot [09:12] we think she is teething again [09:19] dimitern: if you get a chance [09:19] dimitern: http://reviews.vapour.ws/r/1138/ [09:20] voidspace, cheers, will look shortly === ashipika1 is now known as ashipika === perrito667 is now known as perrito666 [09:45] voidspace, dooferlad, TheMue, please take a look at this http://reviews.vapour.ws/r/1144/ [09:45] dimitern: *click* [09:51] dimitern: s/live tests on MASS/live tests on MAAS/ ? [09:51] dimitern: in PR description [09:52] dimitern: ah, dhcp leases... :-) [09:52] dimitern: sounds like you've been having fun working all this out [09:52] voidspace, :) oh yeah - ofc [09:53] voidspace, it was a fun week, yeah :D [09:53] TheMue, thanks! [09:53] voidspace: MASS is ok, Metal as Super-Service [09:53] * dimitern has bootstrapped a maas environment at least a 100 times in the past 4 days [09:53] ouch [09:59] voidspace, you've got a review [09:59] dimitern: thanks [10:34] dimitern: PR updated with extended test [10:34] voidspace, great! I'm having another look [10:39] voidspace, looks good to land [10:41] dimitern: cool, thanks [10:52] axw: ping [11:10] katco: axw is here now, if you ping him again, or at least rub his lamp, he may appear in a puff of smoke [11:10] or he was [11:10] lol [11:10] i think i figured it out, i just wanted to pick his brain [11:10] Bug #1431286 was opened: juju bootstrap fails when http_proxy is set in environments.yaml [11:18] wallyworld: got time for a review of goamz? [11:21] wallyworld: dimitern: https://github.com/go-amz/amz/pull/37 [11:25] katco, will have a look shortly [11:27] dimitern: thx [11:37] dimitern: ready with a first pass and like it so far. but would like a 2nd review [11:37] TheMue, sure, thanks so far :) [11:43] * TheMue steps out for lunch [11:44] * fwereade has confused his sleeping patterns again and would like to beg for someone to implement a trivial branch for him while he goes and has breakfast and stuff [11:44] katco: sorry, was afk. looking [11:44] katco: pong [11:44] wallyworld: np at all [11:44] * axw_ emerges in a puff of smoke [11:44] lol === axw_ is now known as axw [11:44] axw_: hey, was mostly going to pick your brain re: io.Copy moving the stream pointer of a request body to the end [11:44] * fwereade needs "leader-elected", "leader-deposed", and "leader-settings-changed" hooks added to charm/hooks [11:44] ah, awx has already reviewed [11:45] wallyworld: ah cool [11:45] katco: I just commentd on the PR [11:45] came on here to check if you were around :) [11:45] axw, jam, I have a couple of concerns re http://reviews.vapour.ws/r/959/ -- possibly unfounded? please consider [11:46] fwereade: okey dokey [11:47] axw: so i don't think making a type assertion for ReadSeeker will work, because the body of http.NewRequest wraps the incoming body in a ioutil.NopCloser [11:47] fwereade: I agree with you about axw's comment. I had one written myself, but forgot that publish button [11:47] why is it at the top....! [11:47] axw, actually I'm not sure I expressed them very well -- I just have a bit of a nervous about not coalescing [11:48] axw, it was an uphill struggle to convince myself it was ok in the client watcher code [11:48] katco: rats [11:48] axw: i know, i tried that at first =/ [11:48] fwereade: so the tick-tock stuff was just preserving what we had and going forward with it. I also wasn't entirely happy as it seems very different from our standard behavior. 
[11:48] axw: i also realize there is an API break; but we *just* cut v3 [11:49] axw: not really that sure what our stance is on that [11:49] katco: fair enough [11:50] axw: i suppose i'll let dimitern weigh in on that one [11:50] dimitern: Could you take a look at https://github.com/juju/testing/pull/54/files [11:50] fwereade: I don't really understand about the watcher ownership. if the source Stop()s it cleanly, why wouldn't its Err() be nil? [11:51] axw, IMO the source shouldn't stop its watcher until it's exiting its goroutine [11:51] katco: official versions is "what was in a stable release" IMO [11:51] dimitern: I tried running the .bat version and it doesn't work inside "wine cmd" so I am guessing it was broken anyway. I can test in Windows in a moment. [11:52] jam: stable release of juju? or of goamz? [11:52] jam: how did the leader settings work out btw. sorry i had to (try) and go to bed last night [11:52] katco: well goamz is trickier, as its a dependency. I thought it was an API version. [11:52] but if things are just broken in v3 seems like people can't use it [11:53] katco: working on the patch to tests, been very distracted this morning, but I can build on what you landed. [11:53] jam: well, not so much broken. v4 signing just uses more memory than it should. [11:53] axw, "just one deferred Stop, guaranteed to happen after the loop exits" feels much cleaner -- and thus indicates that any close of that channel inside the loop implies that *if* there's no error returned *someone else* has interfered with us, and we should complain [11:53] fwereade: I'm sure I'm being slow, but I still don't see how that affects the error state of the watcher. One way or another the source stops the watcher; if it's *asking* to stop it, why would the watcher stop with an error? [11:53] katco: does fixing the memory issue actually cause a change to the API? [11:53] jam: distracted? people on juju core? no! ;) [11:54] axw: if someone stops something you depend on, isnt that an error? [11:54] axw: (if I pull the handbrake while you're driving, I don't think you'd exit cleanly :) [11:54] jam: it does, because it necessitates some signature changes to take in io.ReadSeeker and not io.Reader [11:54] jam: we don't know how awesome axw is at drifting... [11:55] fwereade: so liveSource.Stop() should always error? [11:55] jam: if you're interested, here is one of the API breakers: https://github.com/go-amz/amz/pull/37/files#diff-7db3cb93944d57b2ffc803281c906018R209 [11:56] * axw is confused [11:56] axw, I think what would happen is [11:56] katco: so if you're really concerned, you could just add a new function PutReadSeeker [11:56] axw, liveSource.Stop calls q.tomb.Kill(nil); return q.tomb.Wait() [11:56] not an API break by most people's definition [11:56] axw, the watcher lives on happily in the background [11:56] I guess if the underlying code *requires* read seaker [11:56] jam: except the old function could not exist as is [11:56] seeker [11:57] jam: yeah exactly [11:57] katco: well you could always read and buffer [11:57] katco, dooferlad, ok, I'm back and responding in order :) [11:57] jam: well, that's what causes the memory concern. that's what we were doing prior [11:57] axw, the main loop observes that <-q.tomb.Dying(), and exits with ErrDying, whithout touching the watcher [11:57] fwereade: ahh, I see. it makes sense when taken with your other comment [11:57] dimitern: ty! 
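The memory issue behind the PutReader to PutReadSeeker change: v4 signing has to hash the whole request body, and a plain io.Reader can only be consumed once, so the old code buffered it all in memory. An io.ReadSeeker can be hashed and then rewound instead; and, as katco notes above, asserting a ReadSeeker back out of the built request does not work because http.NewRequest wraps a non-ReadCloser body in an ioutil.NopCloser. A minimal sketch of the hashing side only (this is not the actual goamz code and the function name is made up):

    package sign

    import (
    	"crypto/sha256"
    	"encoding/hex"
    	"io"
    )

    // payloadHash computes the SHA-256 payload hash the AWS v4 signature needs,
    // then rewinds the body so the HTTP client can still send it. This is why
    // the S3 Put path wants an io.ReadSeeker instead of buffering an io.Reader.
    func payloadHash(body io.ReadSeeker) (string, error) {
    	h := sha256.New()
    	if _, err := io.Copy(h, body); err != nil {
    		return "", err
    	}
    	if _, err := body.Seek(0, io.SeekStart); err != nil {
    		return "", err
    	}
    	return hex.EncodeToString(h.Sum(nil)), nil
    }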
[11:57] axw, cool :) [11:59] axw, and fwiw I don't think you're being slow, these types with internal goroutines are subtle and quick to anger [11:59] daughter is awake (at the appropriate time this time)... afk for a bit [11:59] dimitern: no rush - was about to get lunch anyway [12:00] dooferlad, cheers [12:04] katco, reviewed [12:04] katco, couldn't help much I'm afraid [12:06] jam: if you come up with something better than the tick/tock, I'm happy to change the storage source. I'll fix the EnsureErr/watcher.Stop bit tomorrow [12:06] axw: well that patch hasn't landed has it? [12:06] jam: which patch? [12:07] jam: I mean, I'll change the storage source to match whatever you do for the relation one [12:07] axw: ah [12:07] katco: so you can add a new memory-efficient API, and leave the old inneficient one around for compatibility [12:09] jam, just posted http://reviews.vapour.ws/r/1145/, please let me know if you spot any impedance mismatches with what you're doing with Filter [12:09] fwereade: if it is run-leader-elected, I just merged it, the API fits [12:09] jam, will be writing proper tests for it today once I'm ~functional again [12:09] jam, cool [12:09] jam, pushed an update or two since I last spoke to you [12:09] jam, but I don't think I touched leader-settingschanged [12:11] jam, would also appreciate a second opinion on http://reviews.vapour.ws/r/1134/ [12:13] jam, and if you run your merged branch and find that it looks like it roughly works, would you send jamespage a link to the code? [12:14] jam, l-s-c is the important bit for him AIUI, l-e is just a bonus, but giving the structure a bit of a kick around would be good all the same [12:21] dimitern: ty for the review [12:23] dimitern: so i can't keep the old version. it will error if we don't seek back to 0 in the stream [12:24] katco, right, and it can't be used only for signV2 ? [12:25] dimitern: correct. we deprecated v2 for s3 [12:26] katco, can you at least add PutReader that has the same signature, but panics when called (e.g. "use PutReadSeeker in v3") ? [12:26] dimitern: absolutely [12:27] katco, +1 [12:28] dimitern: well actually... typing this out. isn't a compile time error better than a runtime panic? [12:30] katco, depends I guess [12:31] katco, ok, let's suck it up then [12:31] katco, leave PutReader as you did it [12:31] dimitern: or i could change the name, and any references to the old name would be compile-time errors [12:31] katco, and add a comment about the change (was there an issue # as well?) - in case somebody wonders [12:32] dimitern: yes there was [12:32] katco, it will still be a compile time error if the name is the same but not the interface type [12:32] dimitern: ok, so land it as-is? [12:33] dimitern: or i'm sorry... with a comment [12:36] dimitern: updated: https://github.com/go-amz/amz/pull/37 [12:37] katco, with a comment, but how about those benchmarks? [12:37] fwereade: well. my tests are currently hanging right in the middle, which I don't understand, as they should at least be timing out, but I'm trying to sort through it. [12:37] I tried SIGQUIT but the scrollback is soo big [12:38] ah, debug has it, the Merge API request is hanging [12:38] katco: !!! :) [12:39] dooferlad, reviewed, how about you to review mine ? :) [12:39] jam: merge settings is hanging? 
[12:39] yep [12:39] I see the request go in, and no response [12:39] debugging now [12:40] katco: http://paste.ubuntu.com/10585240/ [12:40] dimitern: i don't know how long this will take, but we want these changes in before the cut of v1.23 today [12:40] dimitern: i need to help jam out right now [12:42] jam: do you have a link handy to show the code that generates this? [12:43] jam, ah, I think I know what you need [12:43] jam, run a WorkerLoop in the test [12:43] that would do it [12:44] fwereade: i don't think i've actually said hi to you in awhile [12:44] sorry i've been scattered all over the place [12:44] katco, not sure I have either, how goes it? [12:45] i am a bit o.0 [12:45] up all night with my daughter [12:45] katco, no worries, I have been in full basement-troll mode myself [12:45] katco, ouch [12:45] lol [12:45] can i be a basement elf instead? [12:45] Galadriel Cox-Buday [12:46] haha [12:46] high praise jw4 [12:46] :) [12:46] katco, ok, so land it then [12:46] dimitern: at worst it will not make things worse [12:46] katco, you have my lgtm and I get it :) [12:46] speaking of trolls, if anyone was reading hpmor and then kinda dropped out because update delays, it's finishing in 2 days [12:47] katco: so is there an easy way to update leadership settings outside the API? I'd probably just work around this, but we'd want to fix it [12:48] jam, I think we do want to fix it, but I'm shelving it under the general "we need to fix shared resources" heading in my mind [12:48] jam: sure, let me grab some code [12:48] fwereade: "fix it" being ? [12:48] jam, ie an apiserver needs an explicitly-created and managed lease-manager to run, rather than just hoping someone else has started one in the background [12:49] jam, katco: the action-at-a-distance thing of WorkerLoop is what makes me edgy [12:49] katco: so I added debugf statements, and I see it get to the point of checking if we're still the leader [12:49] then it hangs [12:49] without getting a failure or success [12:49] jam, are you running a WorkerLoop in the test? [12:49] fwereade: TestLeaderSettingsEvents code [12:49] do you need another worker running? [12:49] jam, yeah [12:50] workerLoop := lease.WorkerLoop(st) [12:50] return worker.NewSimpleWorker(workerLoop), nil [12:50] if you've got one of those running for the duration of the test it'll be fine [12:50] fwereade: for the *api* not to hang? [12:50] jam, yes [12:50] *ugh* [12:50] jam, hence what I was saying above about making the dependency explicit [12:50] at th ev [12:50] jam: the reason being there needs to be a lease manager instance running [12:50] at the very least dying an ugly death of "nobody responding" would be far better [12:50] jam: and the lease manager runs in a worker [12:52] katco: fwereade: wouldn't JujuConnSuite be the place to be starting this sort of thing? [12:52] jam: to answer your other question, you can get leadership settings off of state, and then update that [12:52] jam: currentSettings, err := st.ReadLeadershipSettings(serviceId) [12:52] currentSettings.Update(rawSettings) [12:52] _, err = currentSettings.Write() [12:53] jam, katco: I guess that's probably better but I'd almost rather have the ugliness front-and-centre so we maintain an incentive to notice and fix it [12:53] jam, katco: perhaps that is optimistic [12:53] fwereade: you are saying there is an issue running the lease manager in a worker? 
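Restating the snippet fwereade quotes above as a tiny helper a test could run for its duration; the import paths are assumptions about that era of the tree and the helper name is made up, but the two calls are the ones given in the conversation:

    package featuretests

    import (
    	// Import paths assumed for illustration; only lease.WorkerLoop and
    	// worker.NewSimpleWorker are taken from the discussion above.
    	"github.com/juju/juju/lease"
    	"github.com/juju/juju/state"
    	"github.com/juju/juju/worker"
    )

    // startLeaseManager starts the lease manager worker the leadership API
    // depends on; without one running, LeadershipSettings.Merge simply blocks.
    // The caller stops the returned worker when the test finishes.
    func startLeaseManager(st *state.State) worker.Worker {
    	workerLoop := lease.WorkerLoop(st)
    	return worker.NewSimpleWorker(workerLoop)
    }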
[12:54] jam, katco: but even so, what I hate about JCS is that it just sets up *everything* our code might need [12:54] katco, I'm saying that having the api able to hang because a completely separate worker hasn't been started is an issue [12:55] fwereade: i don't think we can do much better than an error there. the two are coupled [12:55] katco, we could start it explicitly inside apiserver and that'd be ~fine, modulo all the yuckiness of writing ad-hoc Runners into our workers [12:55] katco, or we could [write some more infrastructure] and have it created and passed into the apiserver worker, and that'd also be fine [12:55] katco, it's really just the action-at-a-distance thing [12:56] fwereade: what about simply returning bool if it's not setup yet, and then returning an error? [12:56] katco, (and, btw, it's not very safe -- I caused myself a bunch of confusion by accidentally starting two workers in a test) [12:56] fwereade: of course that should never happen in an actual environment [12:57] katco, doesn't address the "why does package X arbitrarily depend on someone else calling some function in package Y" [12:57] katco: fwereade: now I'm getting "failed to merge leadership settings" [12:57] seems wordpress/0 isn't the leader of wordpress ? [12:57] fwereade: so I need to claim leadership and all that stuff first? [12:57] I think I'll just write to state [12:57] far less setup overhead [12:58] fwereade: ideally this sort of test wouldn't even need a state running [12:58] fwereade: well, i mean that's just properly decoupled code, right? the leadership api has a dependency on the lease manager. best to keep those decoupled than to more tightly couple them for the sake of testing [12:58] jam: yeah you have to be leader [12:58] jam: dems the business rules [12:58] katco: being able to run an API server that will hang responding to API calls seems a bit broken [12:59] jam: i'm claiming that's a separate issue; we can easily return an error there [12:59] jam: also i'm questioning whether that situation will arise anywhere but full-stack testing scenarios [12:59] katco: I'm saying that having the Leadership API exposed means it should be running the stuff it needs underneath [12:59] IMO [12:59] katco, they're coupled, full stop. not very tightly, but they are; and so an explicit dependency, whether external ("you can't create an apiserver without a lease manager") or internal ("when starting an apiserver, start and track a lease manager") is desirable [13:00] katco, and IMO an external one is far superior [13:00] katco, that way you don't have the mess around monkey-patching global funcs to test the code with the dependency [13:00] fwereade: agreed (I think we're saying the same thing from different POV) [13:00] jam, yeah, I haven't perceived us disagreeing [13:01] fwereade: well I disagree with you all the time, but that's just when you're wrong :) [13:01] jam, this was one of the major motivations for what I've been saying about JFDIing a worker registry [13:01] jam, ;p [13:01] fwereade: on a different note, we have a small naming issue between LeadershipSetting and LeaderSetting [13:01] fwereade: I have the feeling we should try to standardize somewhere [13:01] jam, bugger, I have not been consistent, have I [13:02] fwereade: the API you requested from me was LeaderSettingsEvents [13:02] jam, I think LeaderSettings is perhaps more correct? 
[13:02] the uniter method is s.uniter.LeadershipSettings.Merge [13:02] jam, they're the settings that are/were set by the leader, not the settinngs that pertain to the concept of leadership [13:02] fwereade: I feel like LeadershipSettings would be "who is the leader" (settings about Leadership) vs "LeaderSettings" is what did the leader *say* [13:02] jam, paraphrase jinx [13:02] fwereade: :) [13:03] fwereade: so I'll stick with LeaderSettings at this level, and we can look to push that down into apiserver/leadership/settings.go [13:03] jam, shtm, thanks [13:03] sounds hot to me ? [13:04] jam, that was *exactly* what I though, and decided to let it stand :) [13:04] so hairy thank mary? [13:04] fwereade: sorry back [13:04] katco, np [13:05] fwereade: so one thing that surprised me [13:05] fwereade: Discard*Event [13:05] blocks trying to send the discard message [13:05] is that what we want? [13:05] dimitern: ping [13:05] jam, yes [13:05] I guess so [13:06] jam, it guarantees that by the time the func returns, the discard has completed, and we can safely read from the corresponding channel without getting another event until there's a real change [13:06] fwereade: i agree with creating an apiserver and passing in a lease manager. i don't think i knew how to do that given the constraints of the apiserver instantiation at runtime [13:06] fwereade: I really wish we didn't have 50 lines of simplestreams noise when a test fails... [13:06] jam, me too [13:06] (though that's a sign of running too much in jujuconnsuite [13:06] katco, yeah, exactly, that was where I fucked up [13:07] katco, I sorta threw it over the wall at you without making clear that the worker dependency was going to be a problem [13:07] fwereade: but you can at least see the nice dependency tree being built up; it's completely possible if apiservers allowed for it [13:07] voidspace, pong [13:07] i.e. leadershipservice takes a leadershipmanager takes a leasemanager [13:08] katco, oh absolutely -- I think all our workers need that [13:08] IoC? [13:08] or the n-tier thing [13:08] katco, pretty much -- I think the dependency chains are potentially long [13:09] katco, in terms of things that can be usefully modelled as persistent shared resources [13:09] that's where DI comes in usually... with go we'd have to use factories [13:09] katco, pretty much everything needs (1) an api connection and (2) an environment in which we can guarantee we won't be running upgrade steps [13:10] katco, the apiserver needs (2), and a state connection, and a lease manager [13:11] yeah [13:11] katco, so I *think* I have some worthwhile ideas here and I hope to get onto them soon [13:11] awesome :) [13:11] fwereade: so I'm trying to write a test that we can Discard the first leadership message [13:12] however, the test gets to DiscardLeaderSettingsEvent before it actually gets the underlying first watcher event [13:12] jam, don't start reading from the discard chan until you've got the first watcher event [13:12] fwereade: is that *correct* or just good for testing? 
[13:13] jam, I think it's correct [13:13] dimitern: when we call StopInstances on the kvm / lxc brokers we use instance ids not container tags [13:13] dimitern: so inside StopInstances we don't have that information [13:13] jam, otherwise no client can create,discard without races [13:13] dimitern: it's not obvious to me how to go from one to the other [13:14] dimitern: it's provisionerTask that calls StopInstances and it's called with instance.Instance which doesn't have container id either [13:14] voidspace, whatever did the StartInstance ought to have somehow recorded the instance id assigned to the container [13:14] fwereade: ok, I'll look - thanks [13:14] fwereade: k, it *is* what discardConfig does, but I felt having the extra local variable was clumsy. will do [13:14] voidspace, instance id is the responsibility of the broker layer -- it creates them and returns them and ought to be able to understand them when passed back in [13:15] jam, that whole type is clumsy :( [13:15] heh [13:15] fwereade: it's definitely "somehow recorded", it just might be that we need to change the provisioner api to take instance id and do the lookup itself [13:15] its only 1000 lines in a single select loop [13:15] surely that can't be bad [13:15] fwereade: the new provisioner api takes a container id [13:15] fwereade: I will say, the need to have everything in the select is a bit of a shame [13:15] you can't farm out to other selects [13:15] I guess if they all pipelined into a channel [13:15] * fwereade scratches head at voidspace -- link me code please? [13:15] fwereade: I'll look at how the two are associated when the instance is started [13:16] fwereade: this just landed this morning, a new ReleaseContainerAddresses api [13:16] fwereade: for releasing (with the provider and in state) and IP addresses statically allocated to the container [13:16] fwereade: so it needs to be called on StopInstances [13:16] dimitern: Oh, I had already looked at yours but I thought TheMue said he was doing the review. Looked fine, but I didn't dig into great detail in terms of code style, I was mostly interested in the new/changed functionality. [13:16] fwereade: https://github.com/juju/juju/pull/1810/files [13:17] * jam still loves that "remove my local changes" in git is "git co file" [13:17] I *guess* checkout the file again sort of makes sense if you cross your eyes and pull your brain out through your nose... [13:18] and it *is* the way you do it in SVN as well [13:18] dooferlad, ok, so you can approve it :) [13:19] voidspace, let me have a look [13:20] voidspace, it looks to me like that code is itself getting instance ids from container ids because those ids are recorded in the state anyway? 
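For the Want/Discard semantics being worked out above, a minimal sketch of the usual nil-channel select pattern (all names invented; this is not the actual uniter filter code): the send case is only armed while an event is both wanted and pending, Want(true) arms it immediately even without a fresh change, and discards are only honoured once a first event has been seen, which is what makes the blocking Discard send race-free.

    package filter

    type leaderSettingsFilter struct {
    	dying   <-chan struct{} // closed when the filter is shutting down
    	in      <-chan struct{} // raw events from the underlying watcher
    	out     chan struct{}   // events delivered to the client
    	want    chan bool       // WantLeaderSettingsEvents(true/false) requests
    	discard chan struct{}   // DiscardLeaderSettingsEvent requests
    }

    func (f *leaderSettingsFilter) loop() {
    	var outc chan struct{}     // nil disables the send case in the select
    	var discardc chan struct{} // stays nil until the first event arrives
    	wanted, pending := false, false
    	rearm := func() {
    		if wanted && pending {
    			outc = f.out
    		} else {
    			outc = nil
    		}
    	}
    	for {
    		select {
    		case <-f.dying:
    			return
    		case <-f.in:
    			pending = true
    			discardc = f.discard // only honour discards after a real event
    		case wanted = <-f.want:
    			if wanted {
    				pending = true // Want(true) delivers an event immediately
    			}
    		case <-discardc:
    			pending = false // drop the pending event
    		case outc <- struct{}{}:
    			pending = false // delivered; wait for the next change
    		}
    		rearm()
    	}
    }

Coalescing everything into a single pending flag is also what avoids the "might send 2 events" case jam mentions: a Want(true) and an already-pending change collapse into one delivery.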
[13:21] voidspace, would you explain the problem again, I think I'm confused [13:22] fwereade: that code needs the container id and the instance id [13:22] fwereade: the ip addresses are allocated to the container id in state and the provider needs the instance id to do the release on the provider side [13:22] fwereade: so we need both [13:22] voidspace, right, and given a container id it looks in the state to figure out the instance id assigned to it [13:22] fwereade: that api needs to be called when we shut down a container - so at StopInstances time [13:23] fwereade: yes, from a container id we can get instance id [13:23] fwereade: but at StopInstances time we have instance id and not container id [13:23] fwereade: so I either need the broker to go from instance id to container id [13:23] fwereade: or change the api to take instance ids [13:24] voidspace, I don't *think* so, but let me marshal my thoughts for a couple of minutes [13:24] cool [13:24] fwereade: so, WantLeaderSettingsEvents() I'd like to discuss the proper way to enabled/disable them. As I think we decided you definitely want an event immediately if you pass true, even if we didn't think there was an event. [13:24] fwereade: do you feel it is better to point-to/not-point to a Changes channel, always read from the channel but not send it on, or create a new watcher each time [13:25] certainly in handling a "true" case, we can just set outLeaderSettingsEventOn = outLeaderSettingsEvents and it will send an event. [13:25] but it *might* send 2 events if there is also a Changes() pending, right? [13:26] jam, queued [13:27] voidspace, context assumption [13:27] voidspace, we're stopping an instance, and we have to unassign all the addresses before we stop the instance, because we can't reference the instance once it's destroyed [13:27] voidspace, accurate? [13:28] fwereade: not quite, we can do it afterwards - order isn't important [13:28] fwereade: we just need to ensure that we release the ip addresses with the provider when we shut down the instance [13:29] Bug #1427342 changed: juju package should have strict version dep with juju-core [13:29] voidspace, ok, who is it that's assigning these addresses? [13:29] fwereade: the broker in StartInstance does [13:29] voidspace, dimitern: I can't remember how dynamic we're trying to be at the moment [13:30] fwereade: by calling the provisioner PrepareContainerInterfaceInfo which requests an IP address from the provider and associates it with host nic [13:30] voidspace, dimitern: ok, so, looking ahead a little [13:30] fwereade: the broker then sets up the iptables rules and container config so that IP address routes to the new container [13:31] voidspace, fwereade, so from an instanceId for an lxc container you can go back to the machine it [13:31] voidspace, dimitern: what's the plan for when we're adding new addresses to containers at runtime? [13:31] voidspace, fwereade it's always the same format: prefix+newMachineTag(containerMachineId).String() [13:32] voidspace, dimitern: that feels hard to integrate with the provisioner [13:32] fwereade, voidspace, we haven't go that far [13:32] gone [13:32] dimitern, [that feels to me like a broker implementation detail on which we should not be depending?] [13:33] I agree [13:33] fwereade, it's a container/lxc implementation detail [13:33] dimitern, voidspace: ok, still thinking [13:33] dimitern, voidspace: what's the lifecycle handling like for addresses? 
[13:34] voidspace, fwereade, you can get the prefix (usually "juju" but can be other as well) from the broker.manager.ContainerConfig() -> PopValue(container.ConfigName) [13:34] fwereade, voidspace, we release addresses when their assigned machine is no longer alive (at least for containers - that's the first step) [13:35] fwereade: the don't have a Life (as nothing else references them) - so they're created when allocated (currently only for a new container) and removed when the container is stopped [13:35] fwereade: or at least they will be removed... [13:35] dimitern, fwereade katco, mgz: do you have a minute to review a branch for the release of 1.22.0 http://reviews.vapour.ws/r/1146/ [13:36] sinzui, looking [13:36] dimitern, voidspace: ok, my main worry here is that the provisioner shouldn't really be responsible for it [13:36] dimitern wins... [13:36] dimitern, voidspace: looking ahead, we're going to want to be assigning and removing addresses dynamically [13:37] dimitern, voidspace: I know we're not doing that yet, so having the assignment in the provisioner *for now* is probably ok [13:38] sinzui, ship it [13:38] dimitern, voidspace: but if we're just implementing the removal now, I think it would probably be cleaner for machine death to queue a cleanup that directs all associated addresses to the attention of some address worker [13:38] fwereade, *for now* yeah - that's the key takeaway :) [13:38] voidspace, dimitern: yeah, I understand it's a viable shortcut for now [13:39] fwereade, the plan going forward is to resurrect and fix the networker, which will take care of the dynamic config at run time [13:39] voidspace, dimitern: I'm just suggesting that my spidey sense says this is a networker's job, and that it's better to implement a half-networker for cleanup that to add the cleanup to the provisioner as well [13:39] voidspace, dimitern: which is already creaking under the weight [13:40] voidspace, dimitern: and will need to have the address logic extracted regardless [13:40] voidspace, dimitern: am I making sense here? [13:41] fwereade: dimitern: so either put it in the broker for now and move it to the networker when that is resurrected, or create a new worker and move *that* to the networker later [13:42] the former seems like less busy work :-) [13:42] fwereade, I agree putting that responsibility into the provisioner is not ideal and can only be temporary [13:42] fwereade, voidspace, yes - I was going to suggest what voidspace just said [13:42] or resurrect the networker now with just this functionality enabled [13:42] "the former" I mean [13:42] jam, yes, it might send 2 events if a change happens soon enough before we discard [13:43] voidspace, unfortunately it's a wee bit beyond repair in its current state [13:43] ah, ok [13:43] fwereade: I'm trying to sort through it now, because my current code is sending 2 events when I set it to true [13:43] jam, I think for all practical purposes the problematic case is at startup, though, and we get good behaviour by only accepting discards once we've had an initial change [13:44] fwereade: I have that behavior no problem. I'm talking about Want(true) vs Want(false) [13:44] jam, is the watcher being read from continuously, or do you switch off reading from the source when the filter's client doesn't want the results? 
[13:44] fwereade: currently the watcher is read continuously but it doesn't forward on an event when it is set to false [13:45] fwereade, so can we agree, for now to leave the cleanup of addresses in the brokers with a TODO + bug filed to move it into the networker ASAP [13:45] fwereade: it seems to be a timing thing [13:45] specifically [13:45] if you start with Want(false) and then watch for changes, you get nothing as expected [13:45] and then making a change, again gets nothing as expected [13:45] but internally that change hasn't actually been seen [13:46] dimitern, if we risk messing around with magical conversions from instance id to container id, I'm not sure we can [13:46] dimitern, but wait [13:47] dimitern, the provisioner knows container id and instance id [13:47] dimitern, making StopInstances stop magically-associated addresses feels like a Very Bad Thing -- shouldn't the provisioner itself just loop through a dead machine;'s addresses, cleaning them up, and then finish off the machine? [13:47] we already have all the info we need there [13:48] fwereade, we absolutely can pass instanceIds to ReleaseContainerAddresses API [13:48] dimitern, but why is the api server talking to the env to release them in the first place? [13:48] dimitern, the provisioner has the env right there [13:48] fwereade, that way the provisioner having state access can find the container id from the instance id [13:48] dimitern, it has the instance ids [13:49] dimitern, the provisioner is already working in terms of container ids [13:49] fwereade, because addrs are given to juju by the environment, and we keep track of them in state - i.e. we need access to both [13:50] dimitern, the provisioner *already has an Environ* -- surely it should not be farming environ calls out to the api server [13:50] dimitern, it can say "hey tell me about the addresses for this container id" as well [13:51] dimitern, and then tell the Environ to clean up those addresses [13:51] fwereade, no it doesn't actually [13:51] fwereade, we're talking about the provisioner on the machine, which doesn't have access to the complete environ [13:51] dimitern, oh, hell, sorry [13:51] * fwereade recontextuallises [13:52] heh [13:53] fwereade, so that's the obstacle there - and it felt ok to me to do it in the env provisioner, where we have both access to state and the environ [13:53] voidspace, dimitern: so, a container has (let's say just) one address [13:54] voidspace, dimitern: when its host is finished with the container, it needs to somehow inform the state server that the address is no longer required [13:55] fwereade, voidspace, that's part of the story - the other part needs to happen on the host (cleaning up routes, etc.) [13:55] voidspace, dimitern: and the state server needs to somehow determine about that container's address, so that it can [unassign it from the instance?] 
and destroy the address itself [13:56] fwereade: the host, state and the provider all need action [13:56] fwereade: yes, currently that means - fetch all ip addresses associated with the container id from state, unassign them from *the host nic* with the provider and delete from state [13:57] voidspace, dimitern: I'm mainly focused on the interactions between the model and the environ here, I assume for the purposes of this discussion that host cleanup is fine and once it's no longer using the address it needs to make sure it's cleaned up somewhow [13:57] fwereade, voidspace, yeah, and that information for now is just a doc in the ipaddressesC [13:57] fwereade: I need to EOD, I think things are "working" but the double change is causing the test to break. Any chance I can hand off to you? [13:57] jam, go for it, just push and remind me the branch [13:58] dimitern, voidspace: forgive my slowness, but let me restate again [13:59] fwereade, no worries, should we do a g+ in fact? [13:59] dimitern, voidspace: we have: [a Dead container]; [a host that's just destroyed the container instance]; [a host machine with an address assigned for the dead container] [13:59] dimitern, voidspace: is irc ok? I feel like I'll need to be doing some looking back [13:59] irc is fine with me [14:00] fwereade, voidspace, fine with me too [14:00] fwereade, voidspace, so what do you mean by that last part [14:00] fwereade: you are correct, with the note that "host machine with an address assigned for the dead container" means that this address is assigned by the *provider* and needs unassigning there [14:01] voidspace, yeah, that was my point as well [14:01] voidspace, dimitern: so, this provisioner is *almost* ready to remove the Dead machine, but it doesn't want to do that until it's cleaned up everything that the machine depends on, lest they leak [14:02] voidspace, and it doesn't have an env connection itself [14:02] dimitern, ^^ [14:02] fwereade: we don't really care about order - but we do need to ensure they don't leak [14:02] fwereade, voidspace, so far, so good yeah [14:02] voidspace, that's what Dead means essentially [14:02] voidspace, nothing should be seeing or interacting with this entity except its appointed undertaker as it were [14:02] voidspace, ie the provisioner [14:03] heh to "appointed undertaker" [14:04] voidspace, dimitern: so it would seem like the thing to do would be to mark the address dying, and have another worker detect that, clean up the environ-side resource, and set it to dead [14:05] voidspace, dimitern: and in general to follow the model defined by machine, where there's a model object with a lifetime that may or may not have an associated environ reference [14:05] voidspace, dimitern: and in the long term that'd be how we'd create and assign them as well, I expect, but that doesn't have to happen now [14:05] fwereade, voidspace, that *does* make sense to me [14:06] voidspace, dimitern: so in that case we should just be able to Destroy() all those model-side addresses and have a worker to clean them all up separately [14:06] fwereade, voidspace, and fits well with the original job of the networker - watching for changes to NICs and addresses and reacting [14:06] except for allocation we can't add indirection - we must sychronously request the address and then do associated setup [14:06] as we don't know if allocation of any specific address will succeed [14:07] and we want the address at container setup to write the configs correctly [14:07] for death it's fine [14:07] yeah 
[14:07] voidspace, you don't have to do it now, but I don't think it's impossible -- in the easy static case you just gate container setup on address validity, and in the yucky dynamic case you're no more screwed than before ;p [14:08] fwereade: dimitern: which is fine for the apiserver implementation, but *still* leaves the same question about what information to pass from the machine worker to the apiserver [14:08] fwereade: yep, cool [14:08] and instance id is fine as we know the host / provider so the api server can get the container id [14:09] and that maybe the least effort to provide [14:09] fwereade, I'd rather *not* gate container creation on address availability [14:09] voidspace, I think it's just a matter of giving the addresses tags (which I'm pretty sure should not include their ip, they should probably just be address-) [14:09] fwereade, we'll end up in a world of pain [14:09] fwereade, even now, if we can't allocate a static IP we still start the container - it will be addressable from its host [14:10] voidspace, and then we can (1) ask for the machine's addresses, and get juju entity ids, which we then pass back up in a Destroy call [14:10] voidspace, I think? [14:10] dimitern, disagree, I think [14:10] dimitern, when we create a container, we know what addresses it'll need [in the static case] [14:10] fwereade: so, are you suggesting that we watch for model side container death to Destroy the addresses [14:11] fwereade, static addresses being available cannot be guaranteed [14:11] fwereade: or that the provisioner requests destruction? [14:11] because it would be much easier to not have the provisioner have to know all the addresses [14:11] voidspace, I think the provisioner requests destruction of those addresses because it knows they're no longer needed [14:11] dimitern, right, but consider [14:11] fwereade, yes, but we only try to allocate it when we're about to start it initially [14:11] so how is the provisioner going to know the addresses to request destruction of? [14:12] voidspace, I think it should be able to ask for them? [14:12] if it has to make an api call to get addresses associated with the container, and then request destruction of them [14:12] voidspace, by their life cycle value I guess [14:12] voidspace, although actually that is a pointless round trip [14:12] it might as well just make one call requesting the destruction of the addresses for this container [14:12] right [14:12] voidspace, dimitern: can we not just add a cleanup for machine that destroys the associated addresses and keep that all server-side? [14:13] voidspace, dimitern: the provisioner says it's done with the machine [14:13] if we're going to have a worker doing the cleanup it might as well just watch for machine lifecycle [14:13] fwereade, yes! 
indeed we can [14:13] voidspace, we'll need dynamic addresses soon enough, let's not couple address lifetime to machine lifetime unless we have to [14:14] voidspace, I was thinking about it but then I forgot [14:14] so machine destruction can request Destroy on all addresses and an address watcher do the cleanup [14:14] and that leaves us free to *also* destroy addresses dynamically [14:15] dimitern, about the gate-on-address-availability, I think I want to keep pushing a little [14:15] fwereade, voidspace, which means we shouldn't try to remove any machine that still has allocated addresses to cleanup, rather than "delete cascade" [14:16] dimitern, if we know we need a container with an address on X network, shouldn't we be getting that address before we even start to run the container? and shouldn't the lack of an address cause us to fail fast, ideally before the provisioner even sees the machine? [14:16] fwereade, so far CA is a new feature and unobtrusive - it will work or not, but won't make things worse than before CA got introduced [14:17] dimitern, hmm, good point [14:17] dimitern, have people been using unaddressable containers though? [14:17] fwereade, however, I can see your point as well - and makes sense [14:17] fwereade: people have been using containers [14:18] fwereade, yeah - except on maas they're unaddressable everywhere [14:18] fwereade: the fact that they're now meant to be addressable shouldn't cause them to fail to be created [14:18] fwereade, voidspace, exactly - at least at this point [14:18] dimitern, voidspace: huh? I thought they were always addressable on maas? [14:18] should we decide to make "have a static address on subnet X" a requirement (e.g. like having lxc-ls installed) - then, yes [14:18] dimitern, voidspace: by means of black magic and layer-breaking, true, but still [14:19] heh, right [14:19] dimitern, but ok, shouldn't that still be a model-level consideration either way? 
[14:19] fwereade, they were kinda addressable (mostly), before I started fixing stuff
[14:19] :)
[14:19] dimitern, so to preserve behaviour, you create machines with 0 addresses, so the provisioner doesn't have to wait for any of them
[14:19] dimitern, haha
[14:19] fwereade, it sounds like a constraint of a sort
[14:20] dimitern, well, yes -- I think the idea has always been that everything should have an address on the juju-private network
[14:21] dimitern, but yes there is a use case for unaddressable containers
[14:22] dimitern, and I suppose we shouldn't take away that ability
[14:22] dimitern, bah
[14:22] dimitern, I am still not comfortable with giving people unaddressable containers by default though
[14:22] fwereade, at least not right now I think :)
[14:22] dimitern, I really think that should be opt-in, and the fact that it's hitherto been the default is the bug
[14:23] fwereade, now we're not - we're making a best effort to allocate an address every time, if supported and addresses are available
[14:23] Bug #1430049 changed: unit "(AnyCharm)" is not assigned to a machine when deploying with juju 1.22-beta5
[14:23] Bug #1431372 was opened: juju restore fails with "/var/lib/juju/agents: No such file or directory"
[14:24] dimitern, yeah, and that's cool, but it's unpredictable in practice -- even if it works 90% of the time, that 10% will be baffling
[14:24] dimitern, it's not very useful if all we can say is "your container might get an address"
[14:24] fwereade, it's very well logged every step of the way, but it has to be exposed more
[14:25] dimitern, "your container will get an address, *or* fail, *or* not get an address because you explicitly said you didn't want it" is more palatable
[14:25] fwereade, we also can't guarantee you won't get a broken instance from the environ
[14:26] dimitern, yay good logging, but it's not there to support the UX -- that should surface the important things directly
[14:26] fwereade, but it happens now and then on EC2 with CI
[14:26] fwereade, yes, I agree - address allocation should be exposed with the status / health checks at some point
[14:27] dimitern, different class of problem -- that's equivalent to "the provider said it gave you X address but it lied"
[14:27] dimitern, but my point I think is that a failure to allocate an address does indeed need to feed back into an error on the machine
[14:27] fwereade, from UX POV it's not "the provider", it's juju's fault :)
[14:28] fwereade, ok, I'll keep this in mind
[14:28] so for the immediate problem
[14:29] fwereade, but unlike the explicit relation endpoint bindings in the net phase 1 model, we don't have a way for the user to express intent
[14:29] we keep the provisioner api, called by the provisioner/broker
[14:29] but change the server side to give addresses a life-cycle and a cleanup worker
[14:29] *or* we keep all the cleanup server side
[14:29] voidspace, fwereade, for the immediate problem, let's agree to have a cleanup procedure as part of the machine cleanup in the cleanup worker
[14:29] ok
[14:29] it will get veeeery clean
[14:30] :-)
[14:30] the cleaner the better
[14:30] so we can revert my last merge
[14:30] voidspace, dimitern: only concern: calling into the environ to do work from within cleanup
[14:31] voidspace, dimitern: would be much more comfortable with a lifecycle watcher for addresses that just worries about dying->dead
[14:31] fwereade, why a concern?
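The lifecycle-watcher shape fwereade prefers here, a worker that only has to notice addresses going dying->dead and release them, might look roughly like the sketch below. The Backend interface and the type names are invented for illustration; the real worker would sit on top of Juju's state watchers and environ APIs rather than this toy abstraction.

```go
// Sketch only: a watcher-driven loop that releases addresses once they are
// no longer Alive. Not Juju's real worker or watcher API.
package sketch

// Life mirrors the usual Alive/Dying/Dead progression (Alive is the zero value).
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// AddressInfo is what the worker needs to know about one address.
type AddressInfo struct {
	ID    string
	Life  Life
	Value string
}

// Backend abstracts the state and environ operations the worker needs.
type Backend interface {
	// Changes delivers batches of ipaddress ids whose documents changed,
	// in the style of a strings watcher.
	Changes() <-chan []string
	Address(id string) (AddressInfo, error)
	Release(addr AddressInfo) error // hand the address back to the provider
	Remove(id string) error         // delete the doc once released
}

// run processes watcher events until the stop channel closes.
func run(b Backend, stop <-chan struct{}) error {
	for {
		select {
		case <-stop:
			return nil
		case ids := <-b.Changes():
			for _, id := range ids {
				addr, err := b.Address(id)
				if err != nil {
					return err
				}
				if addr.Life == Alive {
					continue // nothing to do yet
				}
				if err := b.Release(addr); err != nil {
					return err
				}
				if err := b.Remove(id); err != nil {
					return err
				}
			}
		}
	}
}
```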
[14:31] fwereade, yeah, +1 - that will be the next step - networker
[14:32] fwereade, dimitern, anyone else: wwitzel3 and I are working on upgrading non-state servers into state servers... we have the code to update the data in mongo, is a watcher the correct way to have the intended server notice it's supposed to convert into a state server, or is there a better way?
[14:32] natefinch, +1 watcher, I think
[14:32] voidspace, it doesn't have to be reverted now - it can be molded into the cleanup procedure :)
[14:32] dimitern, voidspace: I am failing to articulate it, but it's really not the right layer
[14:33] dimitern, voidspace: we need to do dances to use the environs from inside state anyway
[14:33] natefinch, yeah
[14:33] dimitern, voidspace: I'd prefer not to encourage it except for really important reasons
[14:34] fwereade, how about a worker then - akin to the instance poller
[14:34] dimitern, voidspace: namely preflighting env requests to help give good fast-failing UX in response to nonsensical requests
[14:34] dimitern, perfect
[14:34] dimitern, and, yes, it's *another* Environ in the same process, which is bad and wrong, but I think I have a plan for that
[14:34] fwereade, voidspace, great - then it's decided
[14:34] * fwereade cheers
[14:34] thanks guys
[14:34] lovely
[14:34] dimitern: please summarise :-)
[14:35] a new worker to do address cleanup
[14:35] fwereade, thank you!
[14:35] addresses to have a lifecycle
[14:35] voidspace, just a sec
[14:35] ok
[14:36] voidspace, we can do it with a life cycle or otherwise, but essentially a strings/notify watcher worker
[14:36] voidspace, which listens to changes to ipaddresses and responds by releasing those no longer needed
[14:37] (hmm it can even eventually also handle the allocation i guess)
[14:38] voidspace, we need to have a way to signal that worker - having a life is the simplest and obvious choice
[14:38] voidspace, however, we already have AddressState
=== mgz is now known as mgz_
[14:38] voidspace, which is kinda like life - moving forward
[14:38] dimitern: although that changes more frequently
[14:38] dimitern: so the watcher would be triggered more often
[14:38] dimitern: but fine
[14:39] dimitern, voidspace: I would recommend using Life myself
[14:39] cool
[14:39] voidspace, sad to say that wouldn't help with frequent triggering, but for coincidental implementation reasons rather than fundamental design ones iyswim
[14:39] heh
[14:39] voidspace, dimitern: *unless* we put the life field in a distinct doc
[14:39] that all got a lot more meta than I anticipated
[14:39] but thanks
[14:39] fwereade, voidspace, ok, for pre-existing docs it won't be there, which would mean "alive" I guess.. hmmm smells like an upgrade step for sure
[14:39] right
[14:40] voidspace, dimitern: yeah, think so
[14:40] we could make alive the empty string
[14:40] then there's no upgrade needed...
[14:40] ;-)
[14:40] :) if only..
[14:40] voidspace, dimitern: I would quite like life in a distinct doc, because trigger patterns, but the inconsistency bugs me
[14:41] so a separate collection with an ipaddress reference and a life
[14:41] voidspace, fwereade, ok, that starts to sound to me like the old idea we had about an "environ-level networker"
[14:41] I would *really* like to go to lunch
[14:42] dimitern, voidspace: fwiw Alive *is* iota
[14:42] voidspace, sure :)
[14:42] so if there are any more changes, dimitern can communicate them to me - but it sounds like we're there
[14:42] fwereade, nice!
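For the upgrade step this exchange points at: pre-existing ipaddress documents have no life field, and although a missing field would unmarshal as the zero value (Alive is iota), the idea on the table is to backfill it explicitly. A minimal sketch using mgo, assuming an "ipaddresses" collection and a "life" field; both names are assumptions for illustration, not necessarily Juju's actual schema.

```go
// Sketch only: backfill life=Alive (0) on every ipaddress doc that does not
// have the field yet, as an upgrade step would.
package sketch

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// addAddressLifeField writes the missing life field so later code never has
// to rely on zero-value unmarshalling.
func addAddressLifeField(db *mgo.Database) error {
	coll := db.C("ipaddresses") // assumed collection name
	_, err := coll.UpdateAll(
		bson.M{"life": bson.M{"$exists": false}},
		bson.M{"$set": bson.M{"life": 0}}, // 0 == Alive
	)
	return err
}
```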
[14:42] yes I think so
[14:42] dimitern, voidspace: but I would rather just write the field properly than depend on unmarshalling, sometimes we write code that looks in the db not just the local docs, I think it's a landmine
[14:42] yep, agreed
[14:42] dimitern, voidspace: so upgrade step please
[14:43] fwereade, +1
[14:43] voidspace, I'll summarize the steps needed and we can discuss them tomorrow?
[14:53] Bug #1431401 was opened: Juju appears hung when using the local provider for the first time
[14:58] xwwt, alexisb dimitern natefinch fwereade : 1.23 has a pass. Do you want to branch? Now, or wait for more features/fixes to merge
[14:59] sinzui, let's branch
=== kadams54 is now known as kadams54-away
[15:00] sinzui, I have one more fix, but it will get merged after branching just the same
[15:00] /query alexisb
[15:00] dooferlad, that's yours ^^
=== kadams54-away is now known as kadams54
[15:00] dimitern, fixes can still go in 1.23 :)
[15:01] just no more features, with the exception of the ones the release team and I have agreed to :)
[15:01] katco, you looking for me?
[15:01] alexisb, yeah :)
[15:20] dimitern, master is 1 commit beyond our test. You need to merge dimitern/ca-fixes-clone into the 1.23 I create. If you are fine with this, I will create the branch this minute
[15:22] sinzui, ok, I can do this
[15:23] dimitern, done. https://github.com/juju/juju/commits/1.23
[15:24] sinzui, cheers
[15:36] sinzui, is the merge bot monitoring 1.23 yet?
[15:36] dimitern, adding it right now in fact
[15:36] it is
[15:36] :) great timing sinzui
[15:37] sinzui: branch it.
=== kadams54 is now known as kadams54-away
[15:49] jam, do you have a moment to do a review for me? http://reviews.vapour.ws/r/1005/
[16:02] sinzui, is landing in master blocked?
[16:03] sinzui, or rather - it appears to be blocked but I didn't think it was
[16:03] mattyw, no, but try again, the bug is fixed
[16:04] mattyw: Why does get meter status return a string for the code and not a MeterStatusCode? Why even have the type if you're not going to use it?
[16:05] natefinch, where are you looking? it used to return strings but in theory I've made changes to use the types now
[16:05] (I know that's not your change here, just curious)
[16:05] natefinch, you looking at 1005?
[16:07] mattyw: yes
[16:07] mattyw: https://github.com/juju/juju/pull/1677/files#diff-8319a651ebe0aac63caf50325cef6903R112
[16:29] Bug #1430898 changed: run-unit-tests ppc64el timeout
[16:29] Bug #1431444 was opened: juju run results contain extraneous newline
[16:31] you know your product is becoming successful when people start complaining about newlines
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[18:21] wow, just stumbled on cmd/jujud/agent/machine.go which has 77 import statements :/
[18:27] wow, google code is shutting down
[18:27] http://google-opensource.blogspot.com/2015/03/farewell-to-google-code.html
[18:28] davecheney, http://reviews.vapour.ws/r/1139/ should be satisfactory now :)
[18:28] natefinch ... wow
[18:31] natefinch: I think that is the nature of all google projects
[18:45] natefinch: is there a way to block on juju run until the service is up?
[18:45] or do i need to write some glue code to parse juju status until the state is "started"
[18:48] lazyPower: hmm
[18:54] yeah, i kinda thought i'd have to write some glue code.
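The glue code being asked about at the end here usually ends up as a small polling loop. A rough sketch, assuming the 1.x status output contains an "agent-state: started" line for the unit once it is up; the unit name, interval and timeout below are placeholders, and the parsing is deliberately crude.

```go
// Sketch only: shell out to "juju status" until the unit reports started.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// waitForStarted polls `juju status <unit>` until its output mentions
// "agent-state: started" or the timeout expires.
func waitForStarted(unit string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		out, err := exec.Command("juju", "status", unit).CombinedOutput()
		// Errors such as "No service" just mean the unit is not there yet,
		// so keep polling rather than giving up.
		if err == nil && strings.Contains(string(out), "agent-state: started") {
			return nil
		}
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("timed out waiting for %s to start", unit)
}

func main() {
	if err := waitForStarted("mysql/0", 10*time.Minute); err != nil {
		fmt.Println(err)
	}
}
```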
nbd - just wanted to make sure i wasn't reinventing something that already existed
[19:01] lazyPower: Juju run might work to keep your refresh of juju status from running more often than you need... but I'm not sure that juju run will block until the service is started.
[19:01] yeah it actually returns "No service" so it's getting run before the service is even deployed. Either way i'm going to have to block myself.
[19:01] and it doesn't make sense for juju run to block when the service doesn't even exist
[19:01] ahh yeah, that makes sense
[19:22] can apt proxy/mirror settings change while an instance is running?
[19:23] bogdanteleaga: probably
[19:24] natefinch: have any idea where the code that deals with it is?
[19:25] bogdanteleaga: I didn't say we had code to deal with it, I said I wouldn't be surprised if it could happen :)
[19:26] bogdanteleaga: what would you need to do in response to a change? Isn't the point of the proxy just that apt will now download stuff from somewhere else?
[19:27] natefinch: well I found some stuff that's done wrt it when the instance is booted through cloudinit
[19:28] natefinch: not sure about changes after that
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[20:25] thumper: got a little time to talk about converting servers into state servers? having an auth problem logging into the mongodb, wondering if you might have insight as to how to fix that up
[20:29] um... sure
[20:30] haha
[20:30] just hoping you'll think of something I've missed :)
[20:31] natefinch: hangout?
[20:31] thumper: https://plus.google.com/hangouts/_/canonical.com/moonstone?authuser=1
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[20:55] oh man, I was also interested in the natefinch and thumper conversation, I have a similar problem
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[21:04] perrito666: you missed an awesome conversation. Pure magic.
[21:18] thumper: Well I don't doubt there was magic :p I was hoping for a bit of engineering though ;)
[21:27] perrito666: nah, not much of that
[21:28] * perrito666 shrugs and puts on his gandalf suit
=== robbiew is now known as robbiew-afk
=== robbiew-afk is now known as robbiew
[21:47] I am going to ask a very generic question, so please don't hate me, and if you think you have any answer whatsoever that might help, please contribute it
[21:48] I want to change an api server machine tag; what else, besides changing the tag in the agent config, do you think I should do?
[22:13] thumper: ok, apparently my browser spell checker thinks your name should be Time, sorry
[22:13] ")
[22:13] perrito666: why change the machine tag?
[22:14] perrito666: you will also need to change the symlink for the tools
[22:14] thumper: I already have code that does that
[22:14] my question though is why?
[22:14] thumper: because, when you restore you cannot guarantee that the freshly created state server has the same tag as the old one
[22:14] * thumper has a theory
[22:14] hmm...
[22:15] buuut
[22:15] I do think there should be another way
[22:15] does the upstart script mention the machine id anywhere?
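On the apt proxy question above: if an agent did need to react to proxy changes after boot (rather than only via cloud-init), the mechanical part is just rewriting an apt configuration snippet. The file path below is an assumption for illustration; the Acquire::*::Proxy syntax is standard apt configuration.

```go
// Sketch only: rewrite an apt conf.d snippet with the current proxy settings.
package main

import (
	"fmt"
	"os"
)

// writeAptProxy writes proxy directives to the given file; empty values mean
// "no proxy configured for that scheme".
func writeAptProxy(path, httpProxy, httpsProxy string) error {
	var conf string
	if httpProxy != "" {
		conf += fmt.Sprintf("Acquire::http::Proxy %q;\n", httpProxy)
	}
	if httpsProxy != "" {
		conf += fmt.Sprintf("Acquire::https::Proxy %q;\n", httpsProxy)
	}
	return os.WriteFile(path, []byte(conf), 0644)
}

func main() {
	// Hypothetical path; juju's real file name may differ.
	if err := writeAptProxy("/etc/apt/apt.conf.d/95-juju-proxy", "http://proxy.example:3128", ""); err != nil {
		fmt.Println(err)
	}
}
```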
[22:15] thumper: not that I recall
[22:16] if you change the tag of the machine, it will no longer be able to connect to the api server
[22:16] because the password will not work
[22:16] I have to make a change in one of the two: either the backed-up data or the newly created machine
[22:17] a mongo user is created for the machine with a particular password
[22:17] that user is the machine tag IIRC
[22:17] yes
[22:17] on the physical machine, the places are the agent config file, and the link to the tools
[22:17] I think that is all
[22:17] actually
[22:17] I think the machine tag is mentioned in the service file
[22:18] what worries me is the values inside mongo
[22:18] at least the name of the script
[22:18] yes, the password will be wrong in mongo
=== kadams54 is now known as kadams54-away
[22:19] there are quite a few moving parts there
[22:19] you just made me realize that, since I am changing everything in the newly created machine, I could just change whatever reference is not replaced by the backup data
[22:19] and that would save me from actually changing the tag
[22:20] that leaves the question I asked in the email
[22:21] I need admin privileges to jumpstart the rs
[22:22] and iirc, some of the changes I do are also admin-ish
[22:43] perrito666: I only just noticed the email, been focused on doing HR stuff
[22:43] I'll think about it and try to have a well thought out response :)
[22:44] thanks, that way I can feel worse about just copy-pasting the email
[23:05] any chance we can get this bugfix merged for 1.23? https://bugs.launchpad.net/juju-core/+bug/1431612
[23:05] Bug #1431612: Action defaults don't work for nil params
[23:15] Bug #1431612 was opened: Action defaults don't work for nil params
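Pulling the moving parts from this exchange into one place: when a machine agent's tag changes, the on-disk pieces that embed the tag have to be renamed, while the agent.conf contents and the mongo user/password still need separate handling. The paths below follow the usual 1.x layout (agents and tools under /var/lib/juju, an upstart job under /etc/init) but should be treated as assumptions; this is a sketch of the idea, not how restore actually does it.

```go
// Sketch only: rename the on-disk bits that embed the machine tag.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// retagPaths renames the agent directory, the tools symlink and the upstart
// job. It does not touch the agent.conf contents or the mongo user; those
// are separate steps.
func retagPaths(dataDir, oldTag, newTag string) error {
	renames := [][2]string{
		{filepath.Join(dataDir, "agents", oldTag), filepath.Join(dataDir, "agents", newTag)},
		{filepath.Join(dataDir, "tools", oldTag), filepath.Join(dataDir, "tools", newTag)},
		{"/etc/init/jujud-" + oldTag + ".conf", "/etc/init/jujud-" + newTag + ".conf"},
	}
	for _, r := range renames {
		if err := os.Rename(r[0], r[1]); err != nil {
			return fmt.Errorf("renaming %s: %v", r[0], err)
		}
	}
	return nil
}

func main() {
	if err := retagPaths("/var/lib/juju", "machine-3", "machine-0"); err != nil {
		fmt.Println(err)
	}
}
```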