[00:09] davecheney: would you mind taking another look at http://reviews.vapour.ws/r/346/?
[00:21] wallyworld_: yeah, i can log a bug
[00:21] there isn't much we can do about it
[00:21] yeah :-(
[00:22] except fix our test fixture maybe
[00:22] the problem is there is no signal that says "the test suite is over, and there are no more test suites"
[00:22] the last bit is missing
=== kadams54-away is now known as kadams54
[00:54] wallyworld_, thumper: here's the other part of the upgrade steps work. http://reviews.vapour.ws/r/399/
[00:54] ack
[00:55] ok
=== kadams54 is now known as kadams54-away
[01:04] wallyworld_: https://bugs.launchpad.net/juju-core/+bug/1391353
[01:04] Bug #1391353: state: testing suite leaks ~35 mb (one mongodb database) per test run
[01:05] ta
[01:10] wallyworld_: i can think of two solutions, one that isn't possible with 1.4
[01:10] how severe do you think this is?
[01:10] the other solution is a fair bit of work
[01:10] hmmm, well it fails our landing runs every so often
[01:11] nah, i think that is different
[01:11] * wallyworld_ has to buy Foo Fighters tickets NOW, can't talk for a bit
[01:11] 35mb is peanuts compared to the 600 gb or something we get on /mnt on a c3 xlarge
=== menn0 is now known as menn0-afk
[01:44] yikes, big thunderstorm here, killed the power
[01:53] sooooo close
[01:53] one failing test in state
[01:53] and i can repropose my branch
[02:09] * thumper is crossing his fingers for 100Mb symmetric fibre UFB
=== whit_ is now known as Guest44905
[02:14] thumper: thanks for that feedback
[02:15] np
[02:15] thumper: even though davecheney has made the point to me several times (thanks for your patience, Dave), it finally clicked for me today right before your review showed up :)
[02:15] thumper: I'll be back online a bit later if you want to follow up
[02:15] kk
=== blackboxsw is now known as blackboxsw_away
=== menn0-afk is now known as menn0
[02:48] thumper: I still can't get fibre here in central christchurch
[02:49] thumper: parts of chch have it, but not our area yet
[02:50] wallyworld_: cheers for the review
[02:50] np
=== kadams54-away is now known as kadams54
=== urulama__ is now known as urulama
[04:29] thumper: about backups CLI
[04:30] thumper: is it worth finding an alternative to "juju backups ..."?
=== kadams54 is now known as kadams54-away
[07:21] morning all
[07:28] dimitern, o/
[07:29] fwereade, heyhey
[07:32] fwereade, do we have a precedent for extending the meaning of an environment setting? I'm working on bug 1367863 and I'm thinking of changing "use-floating-ip" from boolean to string with "always|never|state-server" values
[07:32] Bug #1367863: openstack: allow only bootstrap node to get floating-ip
[07:33] fwereade, or would you add a new setting instead?
[07:45] dimitern, hum, let me think
[07:46] dimitern, I *think* our upgrades are such that we could do that sanely now -- although we'd need to be prepared for the bools and able to convert them
[07:47] fwereade, yeah, so parsing needs to be a bit smarter
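A minimal sketch of what that smarter parsing could look like, assuming hypothetical names rather than the actual juju config code: accept both the legacy boolean and the new string form, mapping old values onto the new modes so existing environments keep their behaviour.

```go
// Sketch only: a hypothetical helper, not the real juju/environs/config code.
package main

import "fmt"

// parseUseFloatingIP normalises a raw "use-floating-ip" value to one of the
// proposed string modes. Legacy bools map to "always"/"never" so environments
// written before the change keep working after an upgrade.
func parseUseFloatingIP(raw interface{}) (string, error) {
	switch v := raw.(type) {
	case bool:
		if v {
			return "always", nil
		}
		return "never", nil
	case string:
		switch v {
		case "always", "never", "state-server":
			return v, nil
		}
		return "", fmt.Errorf("invalid use-floating-ip value %q", v)
	default:
		return "", fmt.Errorf("use-floating-ip must be a bool or string, got %T", raw)
	}
}

func main() {
	for _, raw := range []interface{}{true, false, "state-server", "sometimes"} {
		fmt.Println(parseUseFloatingIP(raw))
	}
}
```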
[07:47] dimitern, but would you double-check with menn0? I'm pretty sure everything'll go into upgrading mode before the changes start, and won't react until they have themselves upgraded
[07:48] fwereade, sure
[07:49] fwereade, upgrades do not change envs.yaml though.. how about when you try to upgrade an existing openstack environment using "use-floating-ip": true? After the upgrade it will effectively be the same as "use-floating-ip": "always", and if we change it to something else - then what?
[07:50] fwereade, it feels like this setting should be immutable once you bootstrap
[07:50] dimitern, well, envs.yaml becomes irrelevant once there's an env up
[07:51] dimitern, well *really* we should be assigning and unassigning them based on that setting and on services being exposed and unexposed
[07:51] dimitern, I don't suppose openstack does the floating-ip-on-expose thing yet though?
[07:52] fwereade, eventually yes, but for now I think it's safer not to allow changing the setting after bootstrap
[07:52] fwereade, yeah, it doesn't do it now
[07:53] dimitern, ok it mainly depends on the context of who needs it
[07:54] fwereade, so how about "always|never|state-server|auto" - the last meaning both "state-server" and add FIP on expose, remove it on unexpose?
[07:54] dimitern, that seems like the sanest option, especially considering the reference to bug 1287662 in there
[07:54] Bug #1287662: Add floating IP support
[07:54] fwereade, it's a bit more work, but I think it's useful
[07:55] dimitern, I *am* raising a slight eyebrow at this being at the top of the list, given that the original bug is given a somewhat woolly "ehh this might be useful to some people" characterisation
[07:56] dimitern, and, hmm, it looks like niedbalski just picked up that bug
[07:56] niedbalski, ping
[07:57] fwereade, that's true, the auto-assignment is nice, but not a priority
[07:58] fwereade, and we'd better do it sanely for all providers rather than piecemeal like this
[07:59] dimitern, well, I think there's some potentially serious collision between what you're doing and what niedbalski is doing, to begin with
[07:59] dimitern, but regardless, why is the state-server-only bit a priority?
[08:00] fwereade, the bug is triaged as high for 1.21
[08:00] dimitern, ISTM that it's really not -- while the manual floating ip assignment is a straight-up bad idea compared to assign-on-expose
[08:01] fwereade, and assign-on-expose *does* include the case for adding FIP for the state server, surely
[08:01] dimitern, and I'm just looking at "Just to be clear, I think the current behavior is great, this would be just another option that some orgs may want." in the bug
[08:03] fwereade, for canonistack or similar OS clouds with limited FIP pools it's very important, as it's barely usable otherwise (I can bootstrap, but can't deploy another node on lcy01 due to the FIP shortage, and lcy02 is 200% slower for me)
[08:04] fwereade, and it's actually very easy to implement the assign-FIP-only-for-JobManageState
[08:04] dimitern, yeah, but that's a pretty arbitrary hack, *especially* given what niedbalski seems to be going off and doing
[08:05] fwereade, ok, I'll put it on hold until I have a chat with niedbalski then
[08:05] dimitern, cool, thanks -- tbh I feel like those two really come down to "do floating ips like we should have in the beginning"
[08:06] fwereade, :) well said sir
[08:06] dimitern, so we should default to giving state servers floating ips, but not give them to anything else until they're running a unit of an exposed service
[08:07] dimitern, please push back hard on the "juju add-floating-ip" thing
[08:07] dimitern, because that's either going to take serious work in the model or get a bunch of NOT LGTMs from me
[08:07] dimitern, and that won't make anybody happy
[08:07] fwereade, of course, I never considered that sane
[08:08] dimitern, <3
[08:08] fwereade, it's a really ugly hack, and we did discuss having a better way to express this in the model, eventually
[08:09] dimitern, I *think* that expose is the right way to express it -- and I *think* that it's essentially consistent with the expose-as-relation plans too
[08:09] dimitern, so I'd really prefer to keep it under that umbrella until we've got really clear use cases that can't be solved that way
[08:10] fwereade, sure, we'll get there sooner or later
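The four modes floated above imply a small piece of decision logic. A sketch of that policy under the assumptions in the discussion — a hypothetical function, not juju's actual provisioner code:

```go
// Sketch of the policy implied by "always|never|state-server|auto" as
// discussed above; hypothetical names, not the actual juju code.
package main

import "fmt"

// wantsFloatingIP reports whether a machine should hold a floating IP.
// isStateServer is true for machines running JobManageState; exposedUnit is
// true while the machine hosts a unit of an exposed service.
func wantsFloatingIP(mode string, isStateServer, exposedUnit bool) bool {
	switch mode {
	case "always":
		return true
	case "state-server":
		return isStateServer
	case "auto":
		// State servers always get one; everything else only while it
		// hosts an exposed unit (assign on expose, release on unexpose).
		return isStateServer || exposedUnit
	}
	return false // "never", or an unrecognised mode
}

func main() {
	fmt.Println(wantsFloatingIP("auto", false, true))         // true
	fmt.Println(wantsFloatingIP("state-server", false, true)) // false
}
```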
=== alexlist` is now known as alexlist
[08:35] dimitern, do you have a few minutes to review http://reviews.vapour.ws/r/403/?
[08:35] fwereade, sure, looking
[08:46] morning all
[08:49] mattyw, morning
[08:49] morning all
[08:49] dimitern, voidspace morning
[08:49] fwereade, reviewed; a few suggestions, but nothing blocking
[08:49] voidspace, morning
[08:50] dimitern: saw the reviews from you and axw
[08:50] dimitern: I will look at the maas code and see if we need to call release on each one
[08:51] voidspace, great, thanks
[08:54] dimitern: we could also ask *why* they won't release a node in disk erasing state
[08:54] dimitern: although the original bug was about a node that was commissioning
[08:54] dimitern: so we would still have to handle that
[08:55] voidspace, well, strictly speaking until the disk is erased it still contains data relevant to the user that allocated it
[08:55] dimitern: but release could return as a no-op like it does when you call release on a node that is Ready
[08:56] dimitern: because it will return to the Ready state "soon"
[08:56] dimitern: I don't see what the benefit / purpose of the error 409 is
[08:56] dimitern: effectively the state is "releasing"
[08:56] voidspace, is it?
[08:56] dimitern: isn't it?
[08:56] :-)
[08:56] voidspace, so it's no longer Allocated?
[08:57] dimitern: it is "scheduled for release"
[08:57] voidspace, right
[08:58] voidspace, ISTM juju needs to grow the knowledge for all those states wrt StartInstance, StopInstance and AcquireInstance (as a special case of the former)
[08:59] voidspace, but it doesn't have to happen all at once
[08:59] dimitern: we don't look at the status when we call StopInstances
[09:00] voidspace, we don't because we don't really care - the only place we care about status is in the instance updater calling Instances() periodically
[09:00] right, so why would we grow knowledge about those states if we don't care
[09:01] we just care about success or fail
[09:01] unless secretly we do care ;-)
[09:01] voidspace, but yesterday as I was fixing that openstack issue, I would've wanted a way to trigger an update to the cached instance state, if I know something relevant happened (e.g. the agent just went down)
[09:02] morning
[09:02] TheMue: morning
[09:02] morning TheMue
[09:03] voidspace, we only care if it helps juju handle such corner cases better, like "what does 409 mean in this context"
[09:03] right
[09:04] dimitern: so we call StopInstances with multiple ids
[09:04] dimitern: which passes those ids (after converting to maas ids) through to MAASObject.CallPost("release", ids)
[09:04] dimitern: the MAAS code takes a *single* id
[09:05] dimitern: I can't yet see how the call is converted into multiple calls to release
[09:05] * TheMue thinks sometimes only time and a bit of sleep help. if I'm right I found why my vMAAS networking isn't working.
[09:05] voidspace, what if we continue calling it with multiple ids, unless we get 409, then we retry by calling it for each id
[09:05] dimitern: do you know the mechanism?
[09:05] dimitern: sure, I understand the suggestion
[09:05] voidspace, what, what?
[09:05] :) let me see the maas source
[09:05] dimitern: src/maasserver/api/nodes.py
[09:06] dimitern: NodeHandler.release
[09:07] something turns a single api call into multiple calls to the nodehandler methods
[09:07] it's not the operation decorator
[09:07] hmm.. still looking
[09:07] urls_api.py RestrictedResource maybe
[09:09] voidspace, btw which maas version are you testing on?
[09:09] dimitern: 1.7
[09:10] dimitern: found it
[09:10] dimitern: there's a release method that handles multiple nodes
[09:10] dimitern: it does them all separately
[09:10] voidspace, and returns the first error?
[09:11] dimitern: so one failing will cause an error to be raised, but the others will be done
[09:11] dimitern: well, it concatenates messages
[09:11] if any(failed): raise NodeStateViolation()
[09:11] NodeStateViolation is error 409
[09:12] voidspace, hmm.. so if we parse the response, can we tell which ones failed?
[09:12] dimitern: we could if we cared
[09:12] voidspace, we should care if all the rest are unchanged
[09:13] dimitern: the rest have been released
[09:13] voidspace, sweet!
[09:13] dimitern: error 409 means "Juju doesn't need to care about those nodes any more"
[09:13] and the rest are done
[09:13] voidspace, so we don't care then, but it deserves a comment about it
[09:13] ok, will do
[09:13] voidspace, cheers
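A sketch of what that comment-worthy behaviour could look like on the juju side — hedged: hypothetical wrapper code, not the actual maas provider, assuming the gomaasapi of the era (launchpad.net/gomaasapi) and its ServerError type, whose exact fields may differ between versions.

```go
// Sketch, not the real provider code: release nodes in bulk, treating a 409
// Conflict as "MAAS refused to release at least one node right now", which —
// as verified in the discussion above — still releases all the others.
package maasrelease

import (
	"net/http"
	"net/url"

	"launchpad.net/gomaasapi" // import path juju used at the time (assumed)
)

// releaseNodes posts a bulk "release" for the given node ids. MAAS replies
// 409 Conflict if any node is in a state that rejects release (commissioning,
// disk erasing, ...), but the rest are released anyway, so juju can treat a
// 409 as "nothing left for us to do": the holdouts return to Ready on their
// own once MAAS finishes with them.
func releaseNodes(nodes gomaasapi.MAASObject, ids url.Values) error {
	_, err := nodes.CallPost("release", ids)
	if err == nil {
		return nil
	}
	if svrErr, ok := err.(gomaasapi.ServerError); ok && svrErr.StatusCode == http.StatusConflict {
		// Some nodes couldn't be released immediately; the others have
		// been, and juju no longer needs to care about any of them.
		return nil
	}
	return err
}
```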
[09:15] fwereade, re bug 1301996 - I have a lingering feeling this is caused by the service settings reference counting we implemented post 1.16
[09:15] Bug #1301996: config-get error inside config-changed: "settings not found"
[09:16] dimitern, I think service settings refcounting has been around longer than that
[09:16] dimitern, I think I tried to track it down and couldn't
[09:16] dimitern, if you can that would *absolutely* be a good use of time though
[09:17] dimitern, and, yeah, it's a complex area, I won't swear to it being bug-free
[09:17] fwereade, when did we cache the settings in the context initially? at relation-joined?
[09:17] fwereade, hmm.. or maybe it was in EnterScope
[09:18] dimitern, config settings don't get cached except within a single hook execution
[09:18] dimitern, oo I just had a thought
[09:18] dimitern, order of execution in refcount ops interacting badly with ReadConfigSettings?
[09:19] fwereade, could be, but that would be hard to reproduce
[09:20] dimitern, the new settings doc needs to exist before the unit's charm-url field changes; the old settings doc needs to exist until after it changes; and we need to be prepared for a refresh/retry in ReadConfigSettings anyway
[09:20] dimitern, I bet we're missing one of those things
[09:20] dimitern, most likely the last one, I *think* I remember considering the first two when I wrote it
[09:20] fwereade, I'll have a deeper look
[09:21] dimitern, but I have seen arbitrary changes to order-of-operations landing too
[09:21] dimitern, so they're worth a look as well
[09:21] dimitern, tyvm
[09:29] fwereade, so it seems service.changeCharmOps shouldn't decref the old settings until the service's charm url has changed
[09:29] dimitern: any chance of a ship it on the updated diff http://reviews.vapour.ws/r/397/
[09:29] dimitern: when you have a minute
[09:29] and I'll get coffee
[09:30] voidspace, looking
[09:30] fwereade, hmm.. but it does add the decref ops at the very end
[09:30] voidspace: thanks for verifying the 409 thing.
[09:30] axw: no prob, thanks for the review
[09:31] dimitern, the unit is the tricky one re refcounting
[09:31] dimitern, the unit gets the settings according to its current charm url
[09:31] dimitern, not the service's
[09:31] voidspace, ship it! :)
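The ordering constraint fwereade spells out maps naturally onto mgo/txn operations. A minimal sketch under those assumptions — hypothetical collection and field names, not the real juju/state changeCharmOps:

```go
// Sketch of the op ordering discussed above: create/incref the new settings
// doc first, flip the charm-url field next (asserting it hasn't moved), and
// decref the old doc only at the very end. Hypothetical names throughout.
package refcount

import (
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

func changeCharmOps(svcName, oldKey, newKey string, newSettings bson.M, oldURL, newURL string) []txn.Op {
	return []txn.Op{{
		// 1. The new settings doc must exist before anyone can observe
		//    the new charm URL on the service.
		C:      "settings",
		Id:     newKey,
		Assert: txn.DocMissing,
		Insert: newSettings,
	}, {
		// 2. Flip the charm URL, aborting if a concurrent change won.
		C:      "services",
		Id:     svcName,
		Assert: bson.D{{"charmurl", oldURL}},
		Update: bson.D{{"$set", bson.D{{"charmurl", newURL}}}},
	}, {
		// 3. Decref the old settings doc last; it is only deleted (in a
		//    later txn) once its refcount reaches zero, so a hook reading
		//    config mid-upgrade still finds its settings.
		C:      "settings",
		Id:     oldKey,
		Assert: txn.DocExists,
		Update: bson.D{{"$inc", bson.D{{"refcount", -1}}}},
	}}
}
```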
[09:31] fwereade, ok, I'll have a look there as well
[09:34] dimitern: thanks
[09:45] fwereade, just looking at the code I can see 3 potential issues in changeCharmOps: 1) if the new settings do not exist yet, we'll generate an Insert op both with createSettingsOp and settingsIncRefOps(..., canCreate=true); 2) in SetCharm: the TODO comment from waigani is correct - an assert will trigger the code calling isAliveWithSession passing the service name as key to the settings collection, which will surely return an error; 3) in unit.SetCharmURL we're doing an incref(new settings) op, but not asserting the old settings haven't changed
[09:47] dimitern, I don't see why 3 is a problem (*except* when we're creating the new settings)
[09:47] dimitern, (2) at least should be easy to repro with a TxnHook test
[09:48] dimitern, for (1) I can't remember if we have export_tests for actual settings refcounts but I think we do, so that should be reproable as a unit test too
[09:48] fwereade, in unit.SetCharmURL we're never creating the settings, I suppose because we expect service.SetCharm to have done it
[09:49] dimitern, ahh, there could be some way to miss that, couldn't there
[09:49] fwereade, but what if the service charm url changes again during unit.SetCharmURL?
[09:49] dimitern, that would depend on a very annoying and quick sequence of service charm changes
[09:49] dimitern, I *think* that should never matter
[09:49] fwereade, right
[09:50] once a unit has set its charm url, it's got a ref to the settings, so the service changing charm url shouldn't be an issue, the refcount shouldn't hit 0
[09:50] fwereade, so fixing these 3 issues and writing proper tests should fix the bug.. but I need to find a way to reproduce it first
[09:51] dimitern, well, if you can come up with a precise sequence of unit tests that will repro it -- which you should be able to with txn-hooks, I think -- that's good enough for me
[09:51] f**k
[09:51] sorry
[09:51] dimitern, trying to repro it in a running system will be insanely timing-dependent, I think
[09:51] fwereade, if the service changes the url again that will decref the old settings, and since a unit is still upgrading to the old url it will incref the old settings and they'll stay there forever... hmm or perhaps not, because next time the unit upgrades it will decref them
[09:52] dimitern, that's the idea, yeah
[09:53] fwereade, well, you can always add sleeps to trigger it :) I'll dig in some more
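For readers unfamiliar with the txn-hook tests being referred to: juju's state tests could run a callback between a transaction's attempts, deterministically simulating the concurrent change that would otherwise be "insanely timing-dependent". A sketch, assuming the SetBeforeHooks helper and a SetCharm(charm, force) signature of the era — every name here is from memory and may differ:

```go
// Sketch of a txn-hook repro for the refcount ordering issues above; it
// assumes a test suite with State, service, and charm fixtures already set
// up. The hook runs a conflicting change just before the transaction under
// test applies, forcing the assert-failure/retry path.
func (s *ServiceSuite) TestSetCharmRetriesOnConcurrentChange(c *gc.C) {
	defer state.SetBeforeHooks(c, s.State, func() {
		// A concurrent charm change lands first, invalidating the
		// pending txn's asserts.
		err := s.service.SetCharm(s.otherCharm, false)
		c.Assert(err, jc.ErrorIsNil)
	}).Check()

	// This call should now exercise the broken assert path described
	// in issue (2) above.
	err := s.service.SetCharm(s.newCharm, false)
	c.Assert(err, jc.ErrorIsNil)
}
```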
[10:00] jam1, jam3, standup?
[10:05] dimitern: TheMue: I just got dumped out
[10:05] dimitern: TheMue: I think I have to authenticate again
[10:05] voidspace, ha, sorry
[10:08] jam1: ping
[10:08] jam1: standup?
[11:00] fwereade: you free now?
[11:01] wallyworld_, sure
[11:01] coolio, meet you in hangout
[11:25] wallyworld_, ah ffs, waiting for plus.google.com
[11:25] fwereade: connection sucks, but we had sorta finished anyway
[11:26] the url stuff looked ok at first go, so long as all existing tests are kept
[11:26] the existing tests reflect what we want/need to do
[12:15] fwereade: i know there was talk of allowing a charm to provide feedback to the client (eg. to indicate that something is wrong without necessarily entering a hook error state). has anything like that been implemented yet?
=== liam_ is now known as Guest26825
[12:53] dimitern: you're ocr today :-)
[12:53] dimitern: http://reviews.vapour.ws/r/404/
[12:53] my PR is 404...
[12:54] voidspace, sure, will look shortly
[12:55] dimitern: np
[12:55] dimitern: I'll be going on lunch in a bit, no hurry
[14:28] voidspace, sorry I got distracted; you've got a review
[14:32] dimitern: thanks
[14:32] ctrl-w in wrong window...
=== robbiew is now known as robbiew-afk
=== robbiew-afk is now known as robbiew
[16:38] dimitern: ping
[16:38] dimitern: the new maasEnviron.ListNetworks
[16:39] dimitern: other than the return type, how will it be different from maasEnviron.getInstanceNetworks?
[16:40] and the fact that it takes an instance id rather than an instance.Instance
=== kadams54 is now known as kadams54-away
=== LinStatSDR_ is now known as LinStatSDR
=== kadams54-away is now known as kadams54
[18:28] mfoord, sorry, was afk; getInstanceNetworks is close to what ListNetworks needs, but there's some CIDR parsing/validation logic in maasEnviron.setupNetworks which I'd like to happen when ListNetworks processes the result of getInstanceNetworks
[19:09] g'night all
[19:28] some strange relation behavior.. JUJU_REMOTE_UNIT is not in relation-list
[19:30] hmm
[19:52] fwereade: hey there
[19:52] fwereade: we should set up regular calls again
=== urulam___ is now known as urulama___
=== kadams54 is now known as kadams54-away
[20:11] menn0: thanks for offering to review that mega branch
[20:11] i'll address the copyright issues and push it again now
=== kadams54-away is now known as kadams54
[20:13] davecheney: ok, let me know when it's there
[20:14] menn0: done
[20:14] davecheney: looking
[20:15] davecheney: i don't see it on RB. is this a pre-RB branch?
[20:16] menn0: nope
[20:16] rb has shat itself
[20:16] see email from wallyworld_
[20:16] waigani_:
[20:16] davecheney: ok. i haven't seen that yet.
[20:16] it is possible that this branch was what caused the coronary
[20:17] heh
[20:18] davecheney: i don't have that email
[20:18] davecheney: but i'll review on GH
[20:19] Jesse Meek
[20:19] 6:28 AM (50 minutes ago)
[20:19] Reply to all
[20:19] to juju-dev
[20:19] The latest three reviews on GitHub (#1103, #1102, #1101) I cannot see in Review Board. Do we have a loose wire?
[20:19] davecheney: right, but you said wallyworld_ :)
[20:20] yes, i corrected that on the next line
[20:20] sorry for the confusion
[20:20] sorry, I thought you were intending to write something to jesse :)
[20:20] anyway... reviewing!
[20:21] davecheney: GH/my browser paused for much longer than normal when opening the diff :)
[20:22] davecheney: I don't think it was your branch, there are two PRs before yours that also have not popped up on RB
=== kadams54 is now known as kadams54-away
[20:24] menn0: it's a mere 2,400 lines of change
[20:25] davecheney: most of it is very mechanical though... you're wearing out my scroll wheel :)
[20:25] menn0: juju.go and constants.go at the root are the only real changes
[20:25] anything which referenced those types s/params/juju
[20:25] there are no other non-mechanical changes
[20:27] menn0: oh, you found that
[20:27] :)
[20:28] there are only two of those
[20:28] in the other places I jj "github.com/juju/juju"
[20:28] please don't mention the war
[20:28] (about packages)
[20:30] * menn0 goes to install more ram just so he can finish this review
[20:31] dat pagination
[20:31] btw, goimports is great
[20:32] alias gi='goimports -w .'
[20:32] edit edit,
[20:32] gi
[20:32] go test
[20:32] next package
[20:32] yeah... I have it hooked up to Emacs when I save a .go file.
[20:32] lifechanging
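For context on the workflow being praised: goimports rewrites a file's import block in place, whether run via the `gi` alias above or on save from an editor. A tiny illustration of its effect on a hypothetical file (not from juju):

```go
// After running `goimports -w .` on a file that called fmt.Println without
// importing "fmt" and imported "os" without using it: the unused import is
// gone and the missing one has been added. Hypothetical demo package.
package demo

import (
	"fmt" // added by goimports because the code below uses it
)

// Greet prints a greeting; the import block above is maintained by
// goimports rather than by hand.
func Greet() {
	fmt.Println("hello")
}
```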
[20:34] davecheney, waigani_ thanks for reviewing my id branch (#1081). i've pushed fixes but a couple of questions for y'all
[20:34] davecheney, question for you here, http://reviews.vapour.ws/r/338/#comment2894
[20:37] davecheney: done
[20:37] menn0: ta
[20:37] i've fixed that nit
[20:37] * menn0 dips his mouse in a glass of water to stop the smoke
[20:37] waigani_, question for you, more of a general login security thing: http://reviews.vapour.ws/r/338/#comment3469
[20:38] cmars: be with you in a sec
[20:41] menn0: http://reviews.vapour.ws/r/400/
[20:43] hmm, have all our bots died? https://github.com/juju/juju/pull/1103
[20:44] cmars: good point
[20:44] given that it needs to exist in that odd form
[20:44] just reply to that comment and tell me to pull my head in
[20:45] cmars: replied. Wrapping in common.ErrBadCreds makes sense, agreed
[20:46] davecheney, waigani_ thanks
[20:48] davecheney: have you tried using rbt to push to RB?
[20:50] waigani_: nup
[20:50] waigani_: but the bot i was worried about is the commit bot
[20:50] which hasn't picked up the $$merge$$ in 10 minutes
[20:51] oh..
[20:54] I need to get out of this house, going to work in town. I'll be offline for 20min while I drive in.
=== kadams54-away is now known as kadams54
[21:15] thumper: i found a bug in set.Strings yesterday
[21:15] http://paste.ubuntu.com/8948366/
[21:15] hold
[21:15] thumper: http://paste.ubuntu.com/8948372/
[21:15] ^ this one
[21:25] davecheney: not sure if you've seen, but it looks like the build bot worked eventually - there's test failures though
[21:26] yeah
[21:26] looking into the maas failures now
[21:38] thumper, ping
[21:38] alexisb: hey there
[21:38] hey thumper you have a second?
[21:38] for a hangout
[21:39] yep, just getting off with cmars
[21:50] anyone know how to use jc.TimeBetween?
[21:50] I think there might be a typo in its Check
=== LinStatSDR_ is now known as LinStatSDR
[21:59] thumper: ping, 08:15 < davecheney> thumper: http://paste.ubuntu.com/8948372/
[21:59] yep... otp right now, with you very soon
[22:00] kk
[22:00] just found this bug is affecting the maas code
[22:09] davecheney: I need to go get my daughter from school - she is unwell
[22:09] davecheney: can we chat when I get back?
[22:09] sure
[22:23] davecheney: our regular 1:1 hangout
[22:28] thumper: coing
[22:28] like a bird ;-)
[22:29] imma here
[22:29] did you mean the standup hangout?
[22:29] hmm... I'm there too.
[22:29] no, our 1:1
[22:29] * thumper tries again
=== kadams54 is now known as kadams54-away
[22:47] cmars: i'm looking at review 338
[22:47] cmars: has the "juju server" command been given general approval?
[22:51] menn0, might need more discussion, now that I think about it
[22:51] menn0, how about I break out the subcommand into a separate PR?
[22:52] cmars: or just disable it pending further discussion (but leave the code there)
[22:52] cmars: just don't hook it up
[22:55] menn0, it's currently not hooked up to the cmd/juju supercommand
[23:03] cmars: ok, that's fine then. I haven't gotten to that part of the PR yet. I was asking based on the description of the changes.
=== kadams54 is now known as kadams54-away
[23:23] menn0: waigani thumper https://github.com/juju/utils/pull/84
[23:23] davecheney: still doing another review but will get there soon
[23:23] kk
[23:23] there is no rbt on that repo
[23:24] oh
[23:24] actually there is
[23:24] wadda you know
[23:24] http://reviews.vapour.ws/r/406/
[23:24] yay, rb is over its hangover
[23:24] \o/
=== kadams54-away is now known as kadams54
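The pastes above have since expired, so the exact set.Strings bug davecheney found (fixed in juju/utils#84) isn't recoverable here. For context, a hedged sketch of how the github.com/juju/utils set package of that era was typically used — method names from memory, so treat them as assumptions:

```go
// Minimal usage sketch of set.Strings, the type under discussion above.
package main

import (
	"fmt"

	"github.com/juju/utils/set"
)

func main() {
	// Build two string sets and exercise the common operations.
	toRelease := set.NewStrings("node-1", "node-2", "node-3")
	failed := set.NewStrings("node-2")

	toRelease.Remove("node-3")
	fmt.Println(toRelease.Contains("node-2"))                  // true
	fmt.Println(toRelease.Difference(failed).SortedValues())   // [node-1]
}
```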