[00:03] <wallyworld> veebers: do you know if bug 1621576 is in progress?
[00:03] <mup> Bug #1621576: get-set-unset config will be renamed <juju-ci-tools:Triaged> <https://launchpad.net/bugs/1621576>
[00:09] <veebers> wallyworld: it looks like curtis landed the fix in the tests for that. Seems he didn't update the bug. I'll confirm with him that it's finished (but it looks like it is)
[00:09] <wallyworld> veebers: awesome. the reason for asking is that we will be looking to land the code changes to juju to use the new command syntax
[00:09] <wallyworld> and without the ci script changes, ci will break
[00:15] <veebers> wallyworld: ack, I'll have confirmation for you tomorrow :-) But I'm pretty sure that fix is complete
[00:15] <wallyworld> ty
[00:16] <wallyworld> the juju pr needs a little fixing, so that won't be ready until the US comes back online anyway
[00:45] <mup> Bug #1560487 changed: local provider fails to create lxc container from template <canonical-is> <local-provider> <juju-core:Won't Fix> <juju-core 1.25:Triaged by alexis-bruemmer> <OPNFV:New> <https://launchpad.net/bugs/1560487>
[01:56] <menn0> axw: easy one: http://reviews.vapour.ws/r/5646/
[01:57] <axw> menn0: LGTM
[01:59] <menn0> axw: thanks
[02:50] <perrito666> Nites, does anyone happen to know dimitern's mobile?
[03:06] <menn0> wallyworld: another migration cleanup: http://reviews.vapour.ws/r/5648/
[03:07] <wallyworld> ok
[03:11] <anastasiamac> wallyworld: reinstatement (?) of vsphere supported architecture - http://reviews.vapour.ws/r/5649/
[03:12] <wallyworld> ok, on my list
[03:12] <anastasiamac> :)
[03:15] <rick_h_> perrito666: not here, you make it in?
[04:02] <rick_h_> wallyworld: do you have the config changes spec handy? I thought application config was just config now?
[04:02] <wallyworld> rick_h_: it is
[04:02] <wallyworld> but not in beta18
[04:02] <rick_h_> wallyworld: did it not make b18? oh crap
[04:02] <wallyworld> PR up today, will land tonight
[04:02] <rick_h_> ah, thought b18 got all but one commit
[04:02] <rick_h_> ah, gotcha
[04:02] <wallyworld> rick_h_: sorry :-(
[04:03] <wallyworld> we just ran out of time
[04:03] <rick_h_> wallyworld: all good, just working on my slides for tomorrow and checking my thoughts vs reality in the beta
[04:03] <wallyworld> needed to coordinate with CI etc
[04:03] <rick_h_> wallyworld: understand
[04:03] <wallyworld> put an asterisk :=)
[04:03] <rick_h_> yep, will work it out
[04:04] <wallyworld> rick_h_: also, i will have a fix today for machines not reporting as Down when they get killed
[04:04] <wallyworld> just a cosmetic thing, but very annoying
[04:04] <rick_h_> wallyworld: <3
[04:05] <wallyworld> especially if you are an admin trying to script whether to enable ha or not
[04:13] <anastasiamac> ... if CI gets a run without a failure... all landings I've seen today report similar failures :)
[04:17] <perrito666> Rick_h_: just getting out of the airport after 1h or more in the immigration queue. I wanted to message him to get dinner, but I guess I'll be arriving too late
[04:19] <rick_h_> perrito666: gotcha, sucky on the queue fun
[04:21] <perrito666> Happens :) seems I picked an especially busy day
[04:23] <perrito666> Juju is in town so all these people are coming for the charmer summit, evidently :p
[04:27] <menn0> wallyworld, axw: do you know if anyone is looking into all the test timeouts in apiserver/application?
[04:27] <menn0> it's happened to me and to lots of other merge attempts, it seems
[04:27] <axw> don't know
[04:27] <wallyworld> menn0: i'm not, i haven't been monitoring landing bot today
[04:27] <veebers> menn0: you're seeing this in the merge job? (anastasiamac ^^)
[04:27] <menn0> wallyworld: ok... i'll start looking
[04:27] <wallyworld> damn, something broke
[04:28] <wallyworld> menn0: i am fixing the annoying go cookies issue
[04:28] <menn0> veebers: yep, most merge attempts today have failed because of this
[04:28] <menn0> so someone managed to land something which is failing most of the time
[04:28]  * menn0 hopes it wasn't him :)
[04:28] <veebers> menn0: right, I was checking to see if it was CI/infra related. I've changed which machine the next run will happen on in hopes it might help.
[04:29] <menn0> veebers: ok thanks.
[04:29] <menn0> veebers: I can't repro the problem locally of course
[04:30] <veebers> menn0: heh :-\ always the way. FYI the last merge that passed on that job was: "fwereade charm-life" (http://juju-ci.vapour.ws:8080/job/github-merge-juju/9167/)
[04:30] <veebers> menn0: I'll track the next queued up job that will run on the older machine and let you know how it gets on
[04:30] <menn0> wallyworld, axw, anastasiamac: the stuck test appears to be TestAddCharmConcurrently if that rings any bells?
[04:31] <anastasiamac> menn0: no bells but veebers pointed out the commit ^^ that seems to be the culprit :D
[04:32]  * anastasiamac has to get a kid from school, b back l8r
[04:33] <menn0> veebers: cool, I'll start looking at that merge
[04:36] <anastasiamac> wallyworld: m considering removing arch caching from vsphere on the current pr as well.. any idea how heavily supported-architectures retrieval is used?
[04:37] <anastasiamac> wallyworld: it'll be calling simplestreams image retrieval every time the constraints validator is constructed...
[04:37] <wallyworld> in a couple of places
[04:37] <wallyworld> twice in one api call
[04:37] <wallyworld> when adding a machine i think
[04:38] <anastasiamac> wallyworld: k.. i'll leave it cached for now.. let's tackle it later for 2.1 maybe...
[04:38] <wallyworld> that's from memory though
[04:38] <wallyworld> would need to check code again
[04:39] <anastasiamac> wallyworld: k.. i've created a separate bug for it and we'll address separately then
[04:39] <anastasiamac> maybe we'll even have some help with performance benchmarking (veebers :D) to determine how much better/worse we'd do without caching supported architectures :)
[04:40] <veebers> heh :-)
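
As background for the trade-off being weighed above, here is a hedged sketch of what "arch caching" amounts to: memoise the simplestreams lookup so that constructing a constraints validator doesn't refetch image metadata every time. All names are illustrative, not the actual vsphere provider code.

```go
// Illustrative sketch of caching supported architectures; not the
// real vsphere provider code.
package archcache

import "sync"

type archCache struct {
	mu     sync.Mutex
	arches []string
}

// supportedArchitectures returns the cached list, fetching it once
// via the supplied lookup (e.g. a simplestreams image-metadata
// query). Without the cache, every constraints-validator
// construction would pay for that remote lookup.
func (c *archCache) supportedArchitectures(fetch func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.arches == nil {
		arches, err := fetch()
		if err != nil {
			return nil, err
		}
		c.arches = arches
	}
	return c.arches, nil
}
```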
[04:45] <wallyworld> menn0: you're busy, if you get a chance later, here's that status fix http://reviews.vapour.ws/r/5651/. If no time, I can ask elsewhere
[04:47] <menn0> wallyworld: looking now
[04:48] <wallyworld> ta
[04:55] <menn0> wallyworld: good stuff.
[04:55] <menn0> wallyworld: i'm creating a card now as the migrations prechecks will need to use this too
[04:55] <wallyworld> menn0: thanks menno. btw did you book your flights yet? when you arriving/leaving?
[04:57] <menn0> wallyworld: I've sent the email to the agent but haven't heard back yet (unsurprisingly since they're not at work yet)
[04:57] <wallyworld> you looking to arrive the first sat and leave the following sat?
[04:58] <menn0> wallyworld: I'm likely to be leaving on Saturday night, which gets me in on Sunday evening
[04:58] <menn0> wallyworld: leaving the sprint on Saturday morning
[04:58] <wallyworld> you'll miss drinks :-)
[04:58] <menn0> wallyworld: possibly
[04:58] <menn0> wallyworld: sometimes they're a bit later
[04:59] <wallyworld> depending on flights, i'm going to try and arrive sat evening
[05:15] <menn0> wallyworld: so it looks like Will hit the timeout in apiserver/application twice while trying to merge. He assumed it was bug 1596960
[05:15] <mup> Bug #1596960: Intermittent test timeout in application tests <tech-debt> <unit-tests> <juju:Triaged> <https://launchpad.net/bugs/1596960>
[05:15] <menn0> but that one says it's only windows
[05:15] <menn0> I'm guessing his changes have made it more likely to happen
[05:15] <wallyworld> damn, sounds plausible
[05:17] <wallyworld> looks messy as well
[05:54] <axw> wallyworld: will you have a chance to look at that ca-cert issue? I'm trying to stay focused on azure
[05:55] <wallyworld> axw: yeah, i can look
[05:56] <wallyworld> axw: just read emails, so the cert issue is just disabling the validation check IIANM
[05:57] <axw> wallyworld: see uros's latest email, there's also an issue with credentials looking up provider based on controller cloud
[05:57] <axw> which seems wrong...
[05:59] <wallyworld> yeah
[07:26] <blahdeblah> Quick Q: in order for a unit to log to rsyslog on node 0, should there be a rule in the secgroup that allows access to tcp port 6514?  And should juju add this automatically?
[07:52] <wallyworld> urulama: http://reviews.vapour.ws/r/5652/ FYI
[07:54] <urulama> thanks
[07:54] <wallyworld> blahdeblah: units can ask for ports to be opened on a bespoke basis
[07:55] <wallyworld> it's not something we'd do unilaterally
[07:55] <blahdeblah> wallyworld: so it wouldn't be done as part of add-unit when a machine is added via the manual provider?
[07:55] <urulama> wallyworld: been running it with that fix since axw pointed it out :)
[07:57] <wallyworld> blahdeblah: not that i am aware of. manual provider assumes pretty much that everything is in place. juju tends to try not to mess with manual machines
[07:57] <blahdeblah> wallyworld: OK - thanks
[07:58] <wallyworld> urulama: i was hinting for a review from your folks :-)
[07:59] <wallyworld> axw: fyi urulama thinks that add-model issue may be with the controller proxy, so we're off the hook for now
[08:00] <axw> wallyworld urulama: yeah, I think it's most likely due to something around Cloud.DefaultCloud and/or Cloud.Cloud
[08:00] <wallyworld> axw: yep, i traced it to the cli making an api call to client.Cloud() and it's all good in core
[08:00] <wallyworld> but something missing in proxy most likely
[09:55] <voidspace> babbageclunk: https://github.com/juju/juju/compare/master...voidspace:1534103-run-action
[09:57]  * frobware needs to run an errand; back in an hour.
[10:04] <fwereade> voidspace, may I have a 5-second review of http://reviews.vapour.ws/r/5653/ please?
[10:04] <fwereade> voidspace, apparently it has been failing a bunch
[10:05] <voidspace> fwereade: ok
[10:12] <voidspace> fwereade: LGTM
[10:13] <fwereade> voidspace, ta
[11:55] <mup> Bug #1594977 changed: Better generate-image help <helpdocs> <oil-2.0> <v-pil> <juju:Triaged> <https://launchpad.net/bugs/1594977>
[11:55] <mup> Bug #1622581 opened: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581>
[12:19] <mup> Bug #1622581 changed: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581>
[13:05] <fwereade> is anyone free for a ramble about cleanups with a detour into refcounting? axw, babbageclunk?
[13:12] <babbageclunk> yup yup
[13:14] <fwereade> babbageclunk, so, the refcount stuff I extracted
[13:14] <fwereade> babbageclunk, short version: it's safe in parallel but not in serial
[13:14] <babbageclunk> babbageclunk: ?
[13:15] <natefinch> fwereade: that is impressive
[13:15] <voidspace> that's impressive
[13:15] <babbageclunk> I didn't think that was a thing we needed to worry about.
[13:15] <voidspace> hard to do
[13:15] <natefinch> voidspace: hi5
[13:15] <voidspace> o/
[13:15] <fwereade> babbageclunk, i.e. refcount is 2; 2 separate transactions decref; one will fail, reread with refcount 1, successfully hit 0 and detect
[13:15] <voidspace> natefinch: :-)
[13:15] <fwereade> voidspace, natefinch: I'm rather proud of it, indeed
[13:15] <natefinch> lol
[13:16] <babbageclunk> but isn't serial just slow parallel?
[13:16] <fwereade> babbageclunk, refcount is 2, one transaction gets composed of separate ops that hit the same refcount: they'll decref it to 0, but won't ever "realise" they did so, so there's no guaranteed this-will-hit-0 detection
[13:17] <babbageclunk> ugh
[13:17] <fwereade> babbageclunk, we're always composing transactions from ops based on a read state from before the txn started
[13:17] <babbageclunk> All the asserts happen before all of the ops?
[13:17] <fwereade> babbageclunk, yeah
[13:17] <fwereade> babbageclunk, that's how it works
[13:18] <babbageclunk> of course. ouch. so each assert passes, but they leave it at 0 with no cleanup
[13:18] <fwereade> babbageclunk, yeah, exactly
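
For readers following along, a minimal Go sketch of the two cases fwereade describes, written against gopkg.in/mgo.v2/txn; the "refcounts" collection and field names are illustrative, not Juju's actual schema.

```go
// Illustrative only: collection and field names are made up.
package refcounts

import (
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

// decRefOp decrements a refcount that was read as greater than 1.
// The Assert is checked when the transaction runs, but it was
// *chosen* from a read taken before the txn started.
func decRefOp(key string) txn.Op {
	return txn.Op{
		C:      "refcounts",
		Id:     key,
		Assert: bson.D{{"refcount", bson.D{{"$gt", 1}}}},
		Update: bson.D{{"$inc", bson.D{{"refcount", -1}}}},
	}
}

// finalDecRefOp decrements a refcount that was read as exactly 1;
// when it applies, the caller knows the count is hitting 0 and can
// bundle removal/cleanup ops into the same transaction.
func finalDecRefOp(key string) txn.Op {
	return txn.Op{
		C:      "refcounts",
		Id:     key,
		Assert: bson.D{{"refcount", 1}},
		Update: bson.D{{"$inc", bson.D{{"refcount", -1}}}},
	}
}

// Parallel (safe): two separate transactions built against refcount=2
// each contain decRefOp. One commits, leaving 1; the other aborts on
// its assert, re-reads refcount=1, retries with finalDecRefOp, and so
// reliably detects the drop to 0.
//
// Serial (the bug): decoupled code paths compose ONE transaction with
// two decRefOps for the same doc, both built from a refcount=2 read.
// All asserts are checked before any update applies, so both pass,
// the count lands at 0, and neither op ever "realises" it did so.
func buggyCombined(key string) []txn.Op {
	return []txn.Op{decRefOp(key), decRefOp(key)}
}
```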
[13:19] <voidspace> fwereade: you have two days to fix this, right
[13:19] <fwereade> voidspace, perhaps :)
[13:19] <voidspace> :-)
[13:20] <fwereade> voidspace, stateful refcount thingy is one, wanton spamming of possibly-needed cleanups is another
[13:20] <fwereade> voidspace, I'm slightly hopeful that you have a third?
[13:20] <voidspace> I hope so too
[13:21] <fwereade> voidspace, oh
[13:21] <voidspace> oh, no
[13:21] <voidspace> sorry
[13:21] <fwereade> voidspace, days, I thought you said ways
[13:21] <voidspace> fwereade: I hope I have at least a third day
[13:21] <fwereade> voidspace, I would imagine so ;)
[13:21]  * babbageclunk lols sadly.
[13:21] <voidspace> fwereade: unless they purge juju of everyone you know...
[13:21] <voidspace> fresh start and all that
[13:22] <fwereade> voidspace, we have always been at war with...
[13:22] <voidspace> :-)
[13:23] <fwereade> voidspace, babbageclunk: so on the one hand there is this problem with txns
[13:24] <fwereade> voidspace, babbageclunk: and it's one that bites us increasingly hard as we try to isolate and decouple individual changes to the db
[13:25] <fwereade> voidspace, babbageclunk: and I don't really have an answer to either the problem or the increased risk we take on as we further isolate ops-generation
[13:28] <babbageclunk> Why do we compose these operations into one transaction? Shouldn't they be multiple transactions?
[13:28] <mup> Bug #1622136 changed: Interfaces file source an outside file for IP assignment to management interface <juju:Triaged by rharding> <https://launchpad.net/bugs/1622136>
[13:28] <fwereade> babbageclunk, that is basically where I'm going
[13:28] <babbageclunk> Not sure how we could prevent it though
[13:29] <fwereade> babbageclunk, I cannot, indeed, think of a reason that the app remove ops have to be bundled into the final-unit-remove ops
[13:30] <fwereade> babbageclunk, and in fact, that approach is itself vulnerable to that style of bug -- if we wrap up the final *2* unit-removes, we'd miss the app-remove code
[13:30] <babbageclunk> fwereade: But it would be nice if the transaction system could prevent you from combining these transactions together somehow since they're not valid.
[13:32] <fwereade> babbageclunk, that would probably be sensible, but I can't see any non-blunt ways of doing it -- only one op per doc, I guess? but that works *hard* against any prospect of decomposition
[13:33] <fwereade> babbageclunk, the usual escape valve is cleanup ops, ofc -- you can apply a partial change and leave a note to pick it up later, and that's great
[13:33] <babbageclunk> fwereade: can it be more fine-grained than that - one op touching any attribute of a doc in one transaction?
[13:34] <fwereade> babbageclunk, perhaps so, but it sorta sucks not to be able to incref unitcount by 5, for example
[13:34] <babbageclunk> (Not sure how easy that would be to do in the mongo expression lang)
[13:34] <babbageclunk> true
[13:35] <fwereade> babbageclunk, and anything at the txn layer has sort of lost the real context of the ops, so it's likely hard/impossible to DTRT re compressing ops into one
[13:36] <fwereade> babbageclunk, (I heartily support this style of thinking, I just don't think I can do much about it in 2 days, hence cleanups)
[13:37] <babbageclunk> fwereade: yeah, it seems like it would be hard to do that in a generic way - I can see it working for refcounts, but I'm sure the same problem can come from other things harder to reason about.
[13:37] <babbageclunk> so, cleanups!
[13:38] <fwereade> babbageclunk, so, if we simplify unit removal (and relation removal, same latent bug) such that it doesn't even care about app refcounts, and just reads life and drops in a maybe-clean-the-app-up op
[13:38] <fwereade> babbageclunk, the cleanups will run and everyone is happy
[13:39] <fwereade> babbageclunk, except that the time taken to remove a service once its last unit goes away has gone from ~0s to 5-10s
[13:39] <fwereade> babbageclunk, because the cleanups run on the watcher schedule
[13:39] <babbageclunk> fwereade: Oh, 'cause that's when a cleanup will run.
[13:39] <voidspace> so that's the "spam extra cleanup checks" approach
[13:40] <voidspace> but removing the service once the units have gone is *mostly* admin right
[13:40] <fwereade> babbageclunk, yeah -- and the more we do this, the better our decoupling but the more we'll see cleanups spawning cleanups and require ever more generations to actually get where we're going
[13:40] <voidspace> or is there resource removal that only happens at cleanup time too?
[13:40] <fwereade> voidspace, yeah, but you can't deploy another app with the same name, for example
[13:40] <voidspace> right
[13:41] <voidspace> is that a common need?
[13:41] <voidspace> maybe I guess
[13:41] <babbageclunk> do watcher polling more frequently!
[13:41] <babbageclunk> ;)
[13:42] <fwereade> babbageclunk, that is certainly an option, and it does speed things up, but it's also the sort of tuning parameter that I am loath to fiddle with without paying close attention to the Nth-order effects at various scales and so on
[13:43] <babbageclunk> What about rather than dropping a cleanup you drop another txn that does the removal, with an assert that the refcount's 0?
[13:43] <fwereade> babbageclunk, can't guarantee they both apply -- that is the purpose of a cleanup, to queue another txn, really
[13:43] <babbageclunk> Ah, no - the cleanup gets created in the txn, right?
[13:44] <fwereade> babbageclunk, and you can't really write ops for future execution in the general case -- if they fail, there's no attached logic to recreate or forget about them, we can only forget
[13:45] <fwereade> babbageclunk, voidspace: anyway, one watcher-tick delay is not so terrible
[13:46] <babbageclunk> no
[13:47] <fwereade> babbageclunk, voidspace: so I was thinking I could just tweak the cleanup worker: expose NeedsCleanup, and check it in a loop that cleans up until nothing's left
[13:48] <fwereade> babbageclunk, voidspace: which at least gives us freedom to explore more-staggered cleanup ops without macro-visible impact
[13:48] <fwereade> babbageclunk, voidspace: and which I can probably get done fairly quickly
[13:48] <voidspace> sounds reasonable
[13:49] <babbageclunk> +1
[13:49] <fwereade> babbageclunk, voidspace: barring unexpected surprises in trying to separate service-remove from unit-remove
[13:49] <fwereade> babbageclunk, voidspace: excellent, thank you
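
A rough sketch of the cleanup-worker tweak fwereade proposes here; NeedsCleanup doesn't exist yet at this point in the conversation, so its shape below is an assumption rather than current juju/state API.

```go
// Sketch only: NeedsCleanup is the method fwereade proposes exposing;
// its signature here is an assumption.
package cleaner

type cleanupState interface {
	Cleanup() error              // run all currently-queued cleanups
	NeedsCleanup() (bool, error) // any cleanup docs still outstanding?
}

// runCleanups keeps cleaning until no cleanup docs remain, so
// cleanups that spawn further cleanups are drained within a single
// watcher tick instead of one tick per generation.
func runCleanups(st cleanupState) error {
	for {
		if err := st.Cleanup(); err != nil {
			return err
		}
		needed, err := st.NeedsCleanup()
		if err != nil {
			return err
		}
		if !needed {
			return nil
		}
	}
}
```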
[14:11] <voidspace> # github.com/juju/juju/cmd/jujud
[14:11] <voidspace> /usr/lib/go-1.6/pkg/tool/linux_amd64/link: running gcc failed: fork/exec /usr/bin/gcc: cannot allocate memory
[14:12] <mgz> yeah, I suffer a fair bit from that
[14:12] <mgz> linking with 1.6 takes a lot of memory
[14:12] <voidspace> time to switch to 1.7 then I guess
[14:16] <voidspace> I haven't seen it before and now I'm seeing it consistently with master
[14:55] <voidspace_> ok, so a reboot fixed the memory issues
[15:34] <dimitern> frobware: hey, not sure if you've seen my PM
[15:34] <dimitern> frobware: here's the PR I'm talking about: https://github.com/juju/juju/pull/6219
[15:51] <fwereade> babbageclunk, voidspace_: I think I have a happier medium, in case I don't land anything else: http://reviews.vapour.ws/r/5644/
[15:51] <fwereade> babbageclunk, voidspace_: would either of you be free to take a look before EOD?
[15:52] <babbageclunk> fwereade: Sure, looking now
[15:52] <fwereade> babbageclunk, tyvm
[16:01] <redir> morning juju-dev
[16:48] <babbageclunk> fwereade: Sorry, I got distracted - still looking!
[16:51] <voidspace> fwereade: you still here?
[17:00] <fwereade> babbageclunk, voidspace: heyhey
[17:01] <voidspace> fwereade: so this implementation of a failaction operation seems to work and "do the right thing" https://github.com/juju/juju/compare/master...voidspace:1534103-run-action#diff-ae955475ac58e0d2683d2cfd6101b3f7R1
[17:03] <voidspace> fwereade: which is mostly copied from runaction.go
[17:07] <fwereade> voidspace, that certainly looks sane to me
[17:07] <voidspace> fwereade: cool, it seems to fix the bug and behave sanely - so I'll add tests and propose
[17:16] <fwereade> voidspace, cool, tyvm
[17:31] <perrito666> hey, juju restore survives suspending the machine for 10 mins, sweet
[17:55] <perrito666> does anyone know if there is a way to list all models?
[17:55] <perrito666> fwereade: ?
[17:56] <fwereade> perrito666, I thought there was literally a list-models?
[17:56] <perrito666> fwereade: sorry I meant in state :p
[17:56] <fwereade> perrito666, not sure offhand, how does the list-models apiserver do it?
[17:56]  * perrito666 accidentally mixed chai and earl grey and is not happy about the result
[17:57] <perrito666> fwereade: an ugly thing that gets models for a user
[17:57] <perrito666> I was trying to avoid constructing another one of those
[17:57] <perrito666> :p
[17:59] <perrito666> hey, there is an AllModels here
[17:59] <perrito666> nice
[18:01] <fwereade> perrito666, well, the raw collection is pretty terrible
[18:02] <fwereade> perrito666, but, resolved anyway ;p
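
For reference, roughly how the AllModels helper perrito666 found would be used; the signature is inferred from this exchange and should be treated as an assumption rather than a checked API reference.

```go
// Assumed shape of state.AllModels, inferred from the discussion above.
package example

import (
	"fmt"

	"github.com/juju/juju/state"
)

func listAllModels(st *state.State) error {
	// AllModels reads the raw models collection, avoiding another
	// hand-rolled "models for a user" query.
	models, err := st.AllModels()
	if err != nil {
		return err
	}
	for _, m := range models {
		fmt.Printf("%s (%s)\n", m.Name(), m.UUID())
	}
	return nil
}
```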
[18:52] <mbruzek> hmo: http://ppa.launchpad.net/juju/devel/ubuntu/pool/main/j/juju-core/juju-core_2.0-beta15-0ubuntu1~16.04.1~juju1.debian.tar.xz
[20:01] <perrito666> is the message "Contacting juju controller <private ip>" correct here? http://pastebin.ubuntu.com/23170667/
[20:08] <natefinch> perrito666: buh... that can't be right, unless somehow you can connect to the private address of AWS from where you're running the client
[20:09] <perrito666> I cant
[20:12] <natefinch> perrito666: weird then.  probably just posting the first address in whatever list
[20:12] <perrito666> yep, after a restore, juju status will also show that address
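
The fix natefinch is gesturing at would look something like the following: prefer a public-scope address for user-facing output rather than taking the first list entry. The types are simplified stand-ins for juju's network address types, not the real API.

```go
// Simplified stand-in types; juju's real network.Address is richer.
package addrpick

type address struct {
	value string
	scope string // e.g. "public", "local-cloud"
}

// bestAddress prefers a public-scope address for display, falling
// back to the first entry only when no public address exists.
func bestAddress(addrs []address) string {
	for _, a := range addrs {
		if a.scope == "public" {
			return a.value
		}
	}
	if len(addrs) > 0 {
		return addrs[0].value
	}
	return ""
}
```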
[20:39] <mup> Bug #1622738 opened: Multi-series charms failing in 1.25.6 <juju-core:New> <https://launchpad.net/bugs/1622738>
[22:33] <wallyworld> redir: i am free for a bit but need coffee, so give me 5 if you still have a question
[22:33] <redir> cool
[22:33] <redir> I'm here
[22:36] <redir> but going to make tea while you make coffee
[22:55] <perrito666> wallyworld: hey, this https://bugs.launchpad.net/juju/+bug/1595720 is still happening but now it's a big issue since admin users are hitting this :(
[22:55] <mup> Bug #1595720: Problems using `juju ssh` with shared models <ssh> <usability> <juju:Triaged> <https://launchpad.net/bugs/1595720>
[23:12] <wallyworld> perrito666: damn, i'll add to the list of todo items for rc1, yay
[23:16] <wallyworld> thumper: standup?
[23:36] <thumper> review up: http://reviews.vapour.ws/r/5657/
[23:40] <marcoceppi>  rick_h_ https://bugs.launchpad.net/juju/+bug/1622787
[23:40] <mup> Bug #1622787: If you name a credential with an @ Juju barfs <juju:New> <https://launchpad.net/bugs/1622787>
[23:41] <rick_h_> marcoceppi: lol ty for keeping the barf part
[23:43] <thumper> o/ marcoceppi