wallyworld | veebers: do you know if bug 1621576 is in progress? | 00:03 |
---|---|---|
mup | Bug #1621576: get-set-unset config will be renamed <juju-ci-tools:Triaged> <https://launchpad.net/bugs/1621576> | 00:03 |
veebers | wallyworld: it looks like curtis landed the fix in the tests for that. Seems he didn't update the bug. I'll confirm with him that it's finished (but it looks like it is) | 00:09 |
wallyworld | veebers: awesome. the reason for asking is that we will be looking to land the code changes to juju to use the new command syntax | 00:09 |
wallyworld | and without the ci script changes, ci will break | 00:09 |
veebers | wallyworld: ack, I'll have confirmation for you tomorrow :-) But I'm pretty sure that fix is complete | 00:15 |
wallyworld | ty | 00:15 |
wallyworld | the juju pr needs a little fixing, so it won't be ready until the US comes back online anyway | 00:16 |
mup | Bug #1560487 changed: local provider fails to create lxc container from template <canonical-is> <local-provider> <juju-core:Won't Fix> <juju-core 1.25:Triaged by alexis-bruemmer> <OPNFV:New> <https://launchpad.net/bugs/1560487> | 00:45 |
menn0 | axw: easy one: http://reviews.vapour.ws/r/5646/ | 01:56 |
axw | menn0: LGTM | 01:57 |
menn0 | axw: thanks | 01:59 |
perrito666 | Night all, does anyone happen to know dimitern's mobile? | 02:50 |
menn0 | wallyworld: another migration cleanup: http://reviews.vapour.ws/r/5648/ | 03:06 |
wallyworld | ok | 03:07 |
anastasiamac | wallyworld: reinstatement (?) of vsphere supported architectures - http://reviews.vapour.ws/r/5649/ | 03:11 |
wallyworld | ok, on my list | 03:12 |
anastasiamac | :) | 03:12 |
rick_h_ | perrito666: not here, you make it in? | 03:15 |
rick_h_ | wallyworld: do you have the config changes spec handy? I thought application config was just config now? | 04:02 |
wallyworld | rick_h_: it is | 04:02 |
wallyworld | but not in beta18 | 04:02 |
rick_h_ | wallyworld: did it not make b18? oh crap | 04:02 |
wallyworld | PR up today, will land tonight | 04:02 |
rick_h_ | ah, thought b18 got all but one commit | 04:02 |
rick_h_ | ah, gotcha | 04:02 |
wallyworld | rick_h_: sorry :-( | 04:02 |
wallyworld | we just ran out of time | 04:03 |
rick_h_ | wallyworld: all good, just working on my slides for tomorrow and checking my thoughts vs reality in the beta | 04:03 |
wallyworld | needed to coordinate with CI etc | 04:03 |
rick_h_ | wallyworld: understand | 04:03 |
wallyworld | put an asterisk :=) | 04:03 |
rick_h_ | yep, will work it out | 04:03 |
wallyworld | rick_h_: also, i will have a fix today for machines not reporting as Down when they get killed | 04:04 |
wallyworld | just a cosmetic thing, but very annoying | 04:04 |
rick_h_ | wallyworld: <3 | 04:04 |
wallyworld | especially if you are an admin trying to script whether to enable ha or not | 04:05 |
anastasiamac | ... if CI gets a run without a failure, that is; all landings I've seen today report similar failures :) | 04:13 |
perrito666 | rick_h_: just getting out of the airport after an hour or more in the immigration queue. I wanted to message him to get dinner but I guess I'll be arriving too late | 04:17 |
rick_h_ | perrito666: gotcha, sucky on the queue fun | 04:19 |
perrito666 | Happens :) seems I picked an especially busy day | 04:21 |
perrito666 | Juju is in town so all these people are coming for the charmer summit, evidently :p | 04:23 |
menn0 | wallyworld, axw: do you know if anyone is looking into all the test timeouts in apiserver/applications | 04:27 |
menn0 | it's happened to me and lots of other merge attempts it seems | 04:27 |
menn0 | apiserver/application | 04:27 |
axw | don't know | 04:27 |
wallyworld | menn0: i'm not, i haven't been monitoring landing bot today | 04:27 |
veebers | menn0: you're seeing this in the merge job? (anastasiamac ^^) | 04:27 |
menn0 | wallyworld: ok... i'll start looking | 04:27 |
wallyworld | damn, something broke | 04:27 |
wallyworld | menn0: i am fixing the annoying go cookies issue | 04:28 |
menn0 | veebers: yep, most merge attempts today have failed because of this | 04:28 |
menn0 | so someone managed to land something which is failing most of the time | 04:28 |
* menn0 hopes it wasn't him :) | 04:28 | |
veebers | menn0: right, I was checking to see if it was CI/infra related. I've changed which machine the next run will happen on in hopes it might help. | 04:28 |
menn0 | veebers: ok thanks. | 04:29 |
menn0 | veebers: I can't repro the problem locally of course | 04:29 |
veebers | menn0: heh :-\ always the way. FYI the last merge that passed on that job was: "fwereade charm-life" (http://juju-ci.vapour.ws:8080/job/github-merge-juju/9167/) | 04:30 |
veebers | menn0: I'll track the next queued up job that will run on the older machine and let you know how it gets on | 04:30 |
menn0 | wallyworld, axw, anastasiamac: the stuck test appears to be TestAddCharmConcurrently if that rings any bells? | 04:30 |
anastasiamac | menn0: no bells but veebers pointed out the commit ^^ that seems to b the culprit :D | 04:31 |
* anastasiamac have to get a kid from school, b back l8r | 04:32 | |
menn0 | veebers: cool, I'll start looking at that merge | 04:33 |
anastasiamac | wallyworld: m considering removing arch caching from vsphere on the current pr as well.. any idea how heavily supported architectures retrieval is used? | 04:36 |
anastasiamac | wallyworld: it'll b calling simplestreams image retrieval every time the constraints validator is constructed... | 04:37 |
wallyworld | in a couple of places | 04:37 |
wallyworld | twice in one api call | 04:37 |
wallyworld | when adding a machine i think | 04:37 |
anastasiamac | wallyworld: k.. i'll leave it cached for now.. let's tackle it later for 2.1 maybe... | 04:38 |
wallyworld | that's from memory though | 04:38 |
wallyworld | would need to check code again | 04:38 |
anastasiamac | wallyworld: k.. i've created a separate bug for it and we'll address separately then | 04:39 |
anastasiamac | maybe we'll even have some help with performance benchmarking (veebers :D) to determine how much better/worse we'd do without caching supported architectures :) | 04:39 |
veebers | heh :-) | 04:40 |
wallyworld | menn0: you're busy, if you get a chance later, here's that status fix http://reviews.vapour.ws/r/5651/. If no time, I can ask elsewhere | 04:45 |
menn0 | wallyworld: looking now | 04:47 |
wallyworld | ta | 04:48 |
menn0 | wallyworld: good stuff. | 04:55 |
menn0 | wallyworld: i'm creating a card now as the migrations prechecks will need to use this too | 04:55 |
wallyworld | menn0: thanks menno. btw did you book your flights yet? when are you arriving/leaving? | 04:55 |
menn0 | wallyworld: I've sent the email to the agent but haven't heard back yet (unsurprisingly since they're not at work yet) | 04:57 |
wallyworld | you looking to arrive the first sat and leave the following sat? | 04:57 |
menn0 | wallyworld: I'm likely to be leaving on Saturday night, which gets me in on Sunday evening | 04:58 |
menn0 | wallyworld: leaving the sprint on Saturday morning | 04:58 |
wallyworld | you'll miss drinks :-) | 04:58 |
menn0 | wallyworld: possibly | 04:58 |
menn0 | wallyworld: sometimes they're a bit later | 04:58 |
wallyworld | depending on flights, i'm going to try and arrive sat evening | 04:59 |
menn0 | wallyworld: so it looks like Will hit the timeout in apiserver/application twice while trying to merge. He assumed it was bug 1596960 | 05:15 |
mup | Bug #1596960: Intermittent test timeout in application tests <tech-debt> <unit-tests> <juju:Triaged> <https://launchpad.net/bugs/1596960> | 05:15 |
menn0 | but that one says it's only on windows | 05:15 |
menn0 | I'm guessing his changes have made it more likely to happen | 05:15 |
wallyworld | damn, sounds plausible | 05:15 |
wallyworld | looks messy as well | 05:17 |
axw | wallyworld: will you have a chance to look at that ca-cert issue? I'm trying to stay focused on azure | 05:54 |
wallyworld | axw: yeah, i can look | 05:55 |
wallyworld | axw: just read emails, so the cert issue is just disabling the validation check IIANM | 05:56 |
axw | wallyworld: see uros's latest email, there's also an issue with credentials looking up provider based on controller cloud | 05:57 |
axw | which seems wrong... | 05:57 |
wallyworld | yeah | 05:59 |
blahdeblah | Quick Q: in order for a unit to log to rsyslog on node 0, should there be a rule in the secgroup that allows access to tcp port 6514? And should juju add this automatically? | 07:26 |
=== frankban|afk is now known as frankban | ||
wallyworld | urulama: http://reviews.vapour.ws/r/5652/ FYI | 07:52 |
urulama | thanks | 07:54 |
wallyworld | blahdeblah: units can ask for ports to be opened on a bespoke basis | 07:54 |
wallyworld | it's not something we'd do unilaterally | 07:55 |
blahdeblah | wallyworld: so it wouldn't be done as part of add-unit when a machine is added via the manual provider? | 07:55 |
urulama | wallyworld: been running it with that fix since axw pointed it out :) | 07:55 |
wallyworld | blahdeblah: not that i am aware of. manual provider assumes pretty much that everything is in place. juju tends to try not to mess with manual machines | 07:57 |
blahdeblah | wallyworld: OK - thanks | 07:57 |
wallyworld | urulama: i was hinting for a review from your folks :-) | 07:58 |
wallyworld | axw: fyi urulama thinks that add-model issue may be with the controller proxy, so we're off the hook for now | 07:59 |
axw | wallyworld urulama: yeah, I think it's most likely due to something around Cloud.DefaultCloud and/or Cloud.Cloud | 08:00 |
wallyworld | axw: yep, i traced it to the cli making an api call to client.Cloud() and it's all goo in core | 08:00 |
wallyworld | good | 08:00 |
wallyworld | but something missing in proxy most likely | 08:00 |
voidspace | babbageclunk: https://github.com/juju/juju/compare/master...voidspace:1534103-run-action | 09:55 |
* frobware needs to run an errand; back in an hour. | 09:57 | |
fwereade | voidspace, may I have a 5-second review of http://reviews.vapour.ws/r/5653/ please? | 10:04 |
fwereade | voidspace, apparently it has been failing a bunch | 10:04 |
voidspace | fwereade: ok | 10:05 |
voidspace | fwereade: LGTM | 10:12 |
fwereade | voidspace, ta | 10:13 |
mup | Bug #1594977 changed: Better generate-image help <helpdocs> <oil-2.0> <v-pil> <juju:Triaged> <https://launchpad.net/bugs/1594977> | 11:55 |
mup | Bug #1622581 opened: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581> | 11:55 |
mup | Bug #1622581 changed: Cryptic error message when using bad GCE credentials <juju-core:New> <https://launchpad.net/bugs/1622581> | 12:19 |
fwereade | is anyone free for a ramble about cleanups with a detour into refcounting? axw, babbageclunk? | 13:05 |
babbageclunk | yup yup | 13:12 |
fwereade | babbageclunk, so, the refcount stuff I extracted | 13:14 |
fwereade | babbageclunk, short version: it's safe in parallel but not in serial | 13:14 |
babbageclunk | babbageclunk: ? | 13:14 |
natefinch | fwereade: that is impressive | 13:15 |
voidspace | that's impressive | 13:15 |
babbageclunk | I didn't think that was a thing we needed to worry about. | 13:15 |
voidspace | hard to do | 13:15 |
natefinch | voidspace: hi5 | 13:15 |
voidspace | o/ | 13:15 |
fwereade | babbageclunk, i.e. refcount is 2; 2 separate transactions decref; one will fail, reread with refcount 1, successfully hit 0 and detect | 13:15 |
voidspace | natefinch: :-) | 13:15 |
fwereade | voidspace, natefinch: I'm rather proud of it, indeed | 13:15 |
natefinch | lol | 13:15 |
babbageclunk | but isn't serial just slow parallel? | 13:16 |
fwereade | babbageclunk, refcount is 2, one transaction gets composed of separate ops that hit the same refcount: it will decref it to 0, but won't ever "realise" it did so, so there's no guaranteed this-will-hit-0 detection | 13:16 |
babbageclunk | ugh | 13:17 |
fwereade | babbageclunk, we're always composing transactions from ops based on a read state from before the txn started | 13:17 |
babbageclunk | All the asserts happen before all of the ops? | 13:17 |
fwereade | babbageclunk, yeah | 13:17 |
fwereade | babbageclunk, that's how it works | 13:17 |
babbageclunk | of course. ouch. so each assert passes, but they leave it at 0 with no cleanup | 13:18 |
fwereade | babbageclunk, yeah, exactly | 13:18 |
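
To make the race above concrete, here is a minimal Go sketch of the pattern being described, written in the style of mgo/txn ops. The collection names, document IDs, assert shape, and the `removeApplicationOps` helper are hypothetical illustrations, not Juju's actual state code.

```go
package refcount

import (
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

// decRefOps builds ops from a refcount read *before* the transaction runs,
// which is how ops-generation works throughout this discussion.
func decRefOps(readCount int) []txn.Op {
	ops := []txn.Op{{
		C:      "refcounts",
		Id:     "application#mysql",
		Assert: bson.D{{"refcount", readCount}},
		Update: bson.M{"$inc": bson.M{"refcount": -1}},
	}}
	if readCount == 1 {
		// Parallel case: two independent decref txns race; the loser's assert
		// (stale count) aborts it, it rebuilds with readCount == 1, takes this
		// branch, and the zero is detected. Safe.
		//
		// Serial case: one txn is composed of two of these op sets, both built
		// from readCount == 2. Per the discussion above, the asserts are all
		// checked against the pre-txn state, so both pass, the doc ends at 0,
		// and this branch is never taken -- nothing notices the count hit zero.
		ops = append(ops, removeApplicationOps("mysql")...)
	}
	return ops
}

// removeApplicationOps stands in for whatever final-removal ops would apply.
func removeApplicationOps(name string) []txn.Op {
	return []txn.Op{{C: "applications", Id: name, Remove: true}}
}
```
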
voidspace | fwereade: you have two days to fix this, right | 13:19 |
fwereade | voidspace, perhaps :) | 13:19 |
voidspace | :-) | 13:19 |
fwereade | voidspace, stateful refcount thingy is one, wanton spamming of possibly-needed cleanups is another | 13:20 |
fwereade | voidspace, I'm slightly hopeful that you have a third? | 13:20 |
voidspace | I hope so too | 13:20 |
fwereade | voidspace, oh | 13:21 |
voidspace | oh, no | 13:21 |
voidspace | sorry | 13:21 |
fwereade | voidspace, days, I thought you said ways | 13:21 |
voidspace | fwereade: I hope I have at least a third day | 13:21 |
fwereade | voidspace, I would imagine so ;) | 13:21 |
* babbageclunk lols sadly. | 13:21 | |
voidspace | fwereade: unless they purge juju of everyone you know... | 13:21 |
voidspace | fresh start and all that | 13:21 |
fwereade | voidspace, we have always been at war with... | 13:22 |
voidspace | :-) | 13:22 |
fwereade | voidspace, babbageclunk: so on the one hand there is this problem with txns | 13:23 |
fwereade | voidspace, babbageclunk: and it's one that bites us increasingly hard as we try to isolate and decouple individual changes to the db | 13:24 |
fwereade | voidspace, babbageclunk: and I don't really have an answer to either the problem or the increased risk we take on as we further isolate ops-generation | 13:25 |
babbageclunk | Why do we compose these operations into one transaction? Shouldn't they be multiple transactions? | 13:28 |
mup | Bug #1622136 changed: Interfaces file source an outside file for IP assignment to management interface <juju:Triaged by rharding> <https://launchpad.net/bugs/1622136> | 13:28 |
fwereade | babbageclunk, that is basically where I'm going | 13:28 |
babbageclunk | Not sure how we could prevent it though | 13:28 |
fwereade | babbageclunk, I cannot, indeed, think of a reason that the app remove ops have to be bundled into the final-unit-remove ops | 13:29 |
fwereade | babbageclunk, and in fact, that approach is itself vulnerable to that style of bug -- if we wrap up the final *2* unit-removes, we'd miss the app-remove code | 13:30 |
babbageclunk | fwereade: But it would be nice if the transaction system could prevent you from combining these transactions together somehow since they're not valid. | 13:30 |
fwereade | babbageclunk, that would probably be sensible, but I can't see any non-blunt ways of doing it -- only one op per doc, I guess? but that works *hard* against any prospect of decomposition | 13:32 |
fwereade | babbageclunk, the usual escape valve is cleanup ops, ofc -- you can apply a partial change and leave a note to pick it up later, and that's great | 13:33 |
babbageclunk | fwereade: can it be more fine-grained than that - one op touching any attribute of a doc in one transaction? | 13:33 |
fwereade | babbageclunk, perhaps so, but it sorta sucks not to be able to incref unitcount by 5, for example | 13:34 |
babbageclunk | (Not sure how easy that would be to do in the mongo expression lang) | 13:34 |
babbageclunk | true | 13:34 |
fwereade | babbageclunk, and anything at the txn layer has sort of lost the real context of the ops, so it's likely hard/impossible to DTRT re compressing ops into one | 13:35 |
fwereade | babbageclunk, (I heartily support this style of thinking, I just don't think I can do much about it in 2 days, hence cleanups) | 13:36 |
babbageclunk | fwereade: yeah, it seems like it would be hard to do that in a generic way - I can see it working for refcounts, but I'm sure the same problem can come from other things harder to reason about. | 13:37 |
babbageclunk | so, cleanups! | 13:37 |
fwereade | babbageclunk, so, if we simplify unit removal (and relation removal, same latent bug) such that it doesn't even care about app refcounts, and just read life and drop in a maybe-clean-the-app-up op | 13:38 |
fwereade | babbageclunk, the cleanups will run and everyone is happy | 13:38 |
fwereade | babbageclunk, except that the time taken to remove a service once its last unit goes away has gone from ~0s to 5-10s | 13:39 |
fwereade | babbageclunk, because the cleanups run on the watcher schedule | 13:39 |
babbageclunk | fwereade: Oh, 'cause that's when a cleanup will run. | 13:39 |
voidspace | so that's the "spam extra cleanup checks" approach | 13:39 |
voidspace | but removing the service once the units have gone is *mostly* an admin action, right? | 13:40 |
fwereade | babbageclunk, yeah -- and the more we do this, the better our decoupling but the more we'll see cleanups spawning cleanups and require ever more generations to actually get where we're going | 13:40 |
voidspace | or is there resource removal that only happens at cleanup time too? | 13:40 |
fwereade | voidspace, yeah, but you can't deploy another app with the same name, for example | 13:40 |
voidspace | right | 13:40 |
voidspace | is that a common need? | 13:41 |
voidspace | maybe I guess | 13:41 |
babbageclunk | do watcher polling more frequently! | 13:41 |
babbageclunk | ;) | 13:41 |
fwereade | babbageclunk, that is certainly an option, and it does speed things up, but it's also the sort of tuning parameter that I am loath to fiddle with without paying close attention to the Nth-order effects at various scales and so on | 13:42 |
babbageclunk | What about rather than dropping a cleanup you drop another txn that does the removal, with an assert that the refcount's 0? | 13:43 |
fwereade | babbageclunk, can't guarantee they both apply -- that is the purpose of a cleanup, to queue another txn, really | 13:43 |
babbageclunk | Ah, no - the cleanup gets created in the txn, right? | 13:43 |
fwereade | babbageclunk, and you can't really write ops for future execution in the general case -- if they fail, there's no attached logic to recreate or forget about them, we can only forget | 13:44 |
fwereade | babbageclunk, voidspace: anyway, one watcher-tick delay is not so terrible | 13:45 |
babbageclunk | no | 13:46 |
fwereade | babbageclunk, voidspace: so I was thinking I could just tweak the cleanup worker: expose NeedsCleanup, and check it in a loop that cleans up until nothing's left | 13:47 |
fwereade | babbageclunk, voidspace: which at least gives us freedom to explore more-staggered cleanup ops without macro-visible impact | 13:48 |
fwereade | babbageclunk, voidspace: and which I can probably get done fairly quickly | 13:48 |
voidspace | sounds reasonable | 13:48 |
babbageclunk | +1 | 13:49 |
fwereade | babbageclunk, voidspace: barring unexpected surprises in trying to separate service-remove from unit-remove | 13:49 |
fwereade | babbageclunk, voidspace: excellent, thank you | 13:49 |
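
A rough sketch of the worker tweak proposed above: keep running cleanup passes until NeedsCleanup reports nothing left, so chained cleanups complete within a single watcher tick. The `StateCleaner` interface and function names here are assumptions for illustration; the real cleaner worker's API may differ.

```go
package cleaner

import "time"

// StateCleaner is a hypothetical stand-in for the state methods the cleanup
// worker would need; only NeedsCleanup is named in the discussion above.
type StateCleaner interface {
	Cleanup() error              // run one pass over the queued cleanup docs
	NeedsCleanup() (bool, error) // report whether any cleanup docs remain
}

// cleanupUntilDone runs cleanup passes until no cleanup documents remain, so
// cleanups that enqueue further cleanups (e.g. removing an application after
// its last unit dies) finish in one watcher tick instead of one tick per
// generation of cleanups.
func cleanupUntilDone(st StateCleaner) error {
	for {
		if err := st.Cleanup(); err != nil {
			return err
		}
		more, err := st.NeedsCleanup()
		if err != nil {
			return err
		}
		if !more {
			return nil
		}
		// Brief pause so a pathological self-perpetuating cleanup cannot spin hot.
		time.Sleep(10 * time.Millisecond)
	}
}
```
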
voidspace | # github.com/juju/juju/cmd/jujud | 14:11 |
voidspace | /usr/lib/go-1.6/pkg/tool/linux_amd64/link: running gcc failed: fork/exec /usr/bin/gcc: cannot allocate memory | 14:11 |
mgz | yeah, I suffer a fair bit from that | 14:12 |
mgz | linking with 1.6 takes a lot of memory | 14:12 |
voidspace | time to switch to 1.7 then I guess | 14:12 |
voidspace | I haven't seen it before and now I'm seeing it consistently with master | 14:16 |
voidspace_ | ok, so a reboot fixed the memory issues | 14:55 |
dimitern | frobware: hey, not sure if you've seen my PM | 15:34 |
dimitern | frobware: here's the PR I'm talking about: https://github.com/juju/juju/pull/6219 | 15:34 |
=== hml_ is now known as hml | ||
fwereade | babbageclunk, voidspace_: I think I have a happier medium, in case I don't land anything else: http://reviews.vapour.ws/r/5644/ | 15:51 |
fwereade | babbageclunk, voidspace_: would either of you be free to take a look before EOD? | 15:51 |
babbageclunk | fwereade: Sure, looking now | 15:52 |
fwereade | babbageclunk, tyvm | 15:52 |
redir | morning juju-dev | 16:01 |
babbageclunk | fwereade: Sorry, I got distracted - still looking! | 16:48 |
voidspace | fwereade: you still here? | 16:51 |
fwereade | babbageclunk, voidspace: heyhey | 17:00 |
voidspace | fwereade: so this implementation of a failaction operation seems to work and "do the right thing" https://github.com/juju/juju/compare/master...voidspace:1534103-run-action#diff-ae955475ac58e0d2683d2cfd6101b3f7R1 | 17:01 |
=== frankban is now known as frankban|afk | ||
voidspace | fwereade: which is mostly copied from runaction.go | 17:03 |
fwereade | voidspace, that certainly looks sane to me | 17:07 |
voidspace | fwereade: cool, it seems to fix the bug and behave sanely - so I'll add tests and propose | 17:07 |
fwereade | voidspace, cool, tyvm | 17:16 |
perrito666 | hey, juju restore survives suspending the machine for 10 mins, sweet | 17:31 |
perrito666 | does anyone know if there is a way to list all models? | 17:55 |
perrito666 | fwereade: ? | 17:55 |
fwereade | perrito666, I thought there was literally a list-models? | 17:56 |
perrito666 | fwereade: sorry I meant in state :p | 17:56 |
fwereade | perrito666, not sure offhand, how does the list-models apiserver do it? | 17:56 |
* perrito666 accidentally mixed chia and earl grey and is not happy about the result | 17:56 | |
perrito666 | fwereade: an ugly thing that gets models for a user | 17:57 |
perrito666 | I was trying to avoid constructing another one of those | 17:57 |
perrito666 | :p | 17:57 |
perrito666 | hey, there is an AllModels here | 17:59 |
perrito666 | nice | 17:59 |
fwereade | perrito666, well, the raw collection is pretty terrible | 18:01 |
fwereade | perrito666, but, resolved anyway ;p | 18:02 |
mbruzek | hmo: http://ppa.launchpad.net/juju/devel/ubuntu/pool/main/j/juju-core/juju-core_2.0-beta15-0ubuntu1~16.04.1~juju1.debian.tar.xz | 18:52 |
perrito666 | is the message "Contacting juju controller <private ip>" correct here? http://pastebin.ubuntu.com/23170667/ | 20:01 |
natefinch | perrito666: buh... that can't be right, unless somehow you can connect to the private address of AWS from where you're running the client | 20:08 |
perrito666 | I can't | 20:09 |
natefinch | perrito666: weird then. probably just posting the first address in whatever list | 20:12 |
perrito666 | yep, after a restore, juju status will also show that address | 20:12 |
=== rmcall_ is now known as rmcall | ||
mup | Bug #1622738 opened: Multi-series charms failing in 1.25.6 <juju-core:New> <https://launchpad.net/bugs/1622738> | 20:39 |
wallyworld | redir: i am free for a bit but need coffee so give me 5 if you still have a question | 22:33 |
redir | cool | 22:33 |
redir | I'm here | 22:33 |
redir | but going to make tea while you make coffee | 22:36 |
perrito666 | wallyworld: hey, this https://bugs.launchpad.net/juju/+bug/1595720 is still happening but now it's a big issue since admin users are hitting it :( | 22:55 |
mup | Bug #1595720: Problems using `juju ssh` with shared models <ssh> <usability> <juju:Triaged> <https://launchpad.net/bugs/1595720> | 22:55 |
wallyworld | perrito666: damn, i'll add it to the list of todo items for rc1, yay | 23:12 |
wallyworld | thumper: standup? | 23:16 |
thumper | review up: http://reviews.vapour.ws/r/5657/ | 23:36 |
marcoceppi | rick_h_ https://bugs.launchpad.net/juju/+bug/1622787 | 23:40 |
mup | Bug #1622787: If you name a credential with an @ Juju barfs <juju:New> <https://launchpad.net/bugs/1622787> | 23:40 |
rick_h_ | marcoceppi: lol ty for keeping the barf part | 23:41 |
thumper | o/ marcoceppi | 23:43 |