axw | wallyworld: I think you misunderstood me about the API | 00:52 |
---|---|---|
axw | I just meant to combine them in the CLI code | 00:52 |
wallyworld | ah, damn | 00:52 |
axw | i.e. create an interface that you can satisfy by embedding both the facades into one struct | 00:52 |
wallyworld | oh well, what's there is ok i think? | 00:53 |
wallyworld | we use that same pattern elsewhere too | 00:53 |
axw | wallyworld: I think it's fine | 00:53 |
axw | wallyworld: ModelManager makes more sense for this command anyway I think | 00:53 |
wallyworld | yeah, agree | 00:53 |
rick_h_ | wallyworld: is the model destroy blocking something communicated to other teams? | 01:23 |
rick_h_ | wallyworld: could that impact things like the gui? | 01:23 |
wallyworld | rick_h_: it was a stakeholder bug from OIL I think. I am not sure who else knows | 01:23 |
wallyworld | i am not sure about GUI - I'll send an email | 01:24 |
wallyworld | rick_h_: but the blocking is only CLI | 01:24 |
rick_h_ | wallyworld: right, i'm +1000 on the change, but want to make sure we chat with other clients | 01:24 |
rick_h_ | wallyworld: ah ok | 01:24 |
wallyworld | the api behaves the same | 01:24 |
rick_h_ | wallyworld: nvm then | 01:24 |
wallyworld | fair enough question to asl | 01:25 |
wallyworld | *ask | 01:25 |
menn0 | thumper: i've fixed the racy pingerSuite tests: https://bugs.launchpad.net/juju/+bug/1625214 | 01:48 |
mup | Bug #1625214: pingerSuite.TestAgentConnectionsShutDownWhenAPIServerDies write: broken pipe <ci> <intermittent-failure> <regression> <unit-tests> <juju:In Progress by menno.smits> <https://launchpad.net/bugs/1625214> | 01:48 |
menn0 | thumper: what I meant to paste: https://github.com/juju/juju/pull/6410/commits/f9ee5dba43e0fa032fe67d14d1c89cdc1c1e1a55 | 01:48 |
thumper | looking | 01:49 |
thumper | approve | 01:52 |
menn0 | thumper: cheers | 01:52 |
menn0 | wallyworld: https://github.com/juju/juju/pull/6411 please | 02:01 |
wallyworld | menn0: sure, just finishing another | 02:02 |
menn0 | wallyworld: ignore the first commit. that one is merging as part of another PR now. | 02:02 |
wallyworld | ok | 02:02 |
wallyworld | menn0: could you take a look at this one by christian? i think your +1 would be valuable. I've already had a look https://github.com/juju/juju/pull/6408 | 02:19 |
wallyworld | also, your 6411 pr has conflicts | 02:20 |
menn0 | wallyworld: will do. he emailed me about it too. | 02:20 |
wallyworld | ta | 02:20 |
menn0 | wallyworld: conflicts fixed | 02:21 |
wallyworld | ta | 02:21 |
axw | menn0: are you investigating that cert issue? | 02:53 |
menn0 | axw: I was about to take a short break and then get stuck into it | 02:54 |
menn0 | axw: I still have a controller up where that's happened | 02:54 |
menn0 | axw: have you seen it too? | 02:54 |
axw | menn0: ok. would it be helpful if I looked in parallel? | 02:54 |
axw | menn0: no, but it didn't sound hard to repro | 02:54 |
menn0 | axw: could be. if you could take a look too that would be great. | 02:55 |
axw | menn0: ok. I'll let you know if I find anything | 02:55 |
menn0 | axw: here's what the failure looks like: http://paste.ubuntu.com/23301342/ | 02:55 |
axw | okey dokey | 02:56 |
axw | menn0: and all you did was bounce the controller agent? | 02:56 |
menn0 | axw: I stopped the lxd machine and started it again | 02:57 |
axw | ah yes | 02:57 |
axw | ok | 02:57 |
menn0 | the controller instance | 02:57 |
* menn0 will be back in 15 mins | 02:57 | |
axw | wallyworld: what did you want me to test with add-model? something like this? bootstrap rc2, upgrade to my branch, then add-model foo && add-model bar ap-southeast-2 | 03:17 |
wallyworld | axw: yeah, check that the endpoint fixing is done for addmodel also | 03:17 |
wallyworld | axw: or even first confirm that it is broken without your fix | 03:18 |
wallyworld | so we can be sure that the fix is relevant and valid | 03:18 |
axw | wallyworld: it would be broken if you tried to upgrade from rc2 | 03:18 |
axw | wallyworld: but if you added a model to a running rc2 without upgrading, it should work. anyway, I'll try what I described | 03:18 |
wallyworld | ta, i think we need to be 10000% sure | 03:19 |
axw | menn0: I strongly suspect -> https://github.com/juju/juju/commit/0d64508f8005ea84c141ed392389fbaef0e70b30 | 03:40 |
axw | menn0: in particular, "info.Cert = existing.Cert" without also setting Key | 03:40 |
blahdeblah | Hi everyone, which ppa should I be using to get the RC version of juju 2.0 so I can try https://docs.google.com/document/d/1yT5pvS38g9Z9SviI9NPmhIZrsO8drwHDPNWAMSmCedE/edit# ? | 03:51 |
menn0 | axw: looks like there was already a ticket for this issue: https://bugs.launchpad.net/juju/+bug/1631145 | 03:52 |
mup | Bug #1631145: rc2 upgrade to rc3 failed with cannot start api server worker: cannot set initial certificate: cannot create new TLS certificate: crypto/tls: private key does not match public key <juju:New> <https://launchpad.net/bugs/1631145> | 03:52 |
menn0 | axw: i've just bumped the priority etc | 03:52 |
axw | okey dokey | 03:54 |
menn0 | axw: any luck replicating? | 03:54 |
axw | menn0: not yet | 03:55 |
axw | blahdeblah: according to release email, ppa:juju/devel | 03:55 |
blahdeblah | axw: cool - thanks | 03:55 |
wallyworld | axw: with maas, do you know what happens if you attempt to acquire a node without specifying an architecture? does it just pick some arbitrary machine so long as any other constraints match? | 03:58 |
axw | menn0: hrm. I am seeing "getsockopt...connection refused" without the other error though | 03:58 |
axw | and status isn't working now | 03:58 |
axw | wallyworld: I don't recall, sorry | 03:58 |
menn0 | axw: that's what you'll see at the client | 03:58 |
menn0 | axw: what about in the controller's logs? | 03:58 |
wallyworld | maybe menn0 recalls? | 03:59 |
axw | menn0: this is in the controller machine log | 03:59 |
menn0 | axw: hmm ok | 03:59 |
axw | menn0: it's failing to connect to mongo... | 03:59 |
menn0 | axw: possibly a different problem? | 03:59 |
axw | yes, possibly | 03:59 |
anastasiamac | wallyworld: re: maas arch.. yes, i believe so.. at least this is what i've seen | 04:00 |
menn0 | axw: not ideal | 04:00 |
axw | menn0: although... do we use the same cert/key for mongo? | 04:00 |
wallyworld | menn0: ta, ok | 04:00 |
menn0 | wallyworld: sorry, not sure about maas | 04:00 |
wallyworld | np, i'll ask the maas qguys | 04:00 |
wallyworld | guys | 04:00 |
menn0 | axw: maybe we do use the same cert/key... worth checking | 04:01 |
axw | menn0: it does appear to be the case | 04:03 |
axw | when we update the cert, we write out to server.pem | 04:04 |
menn0 | axw: ok, so probably a different manifestation of the same issue | 04:04 |
axw | yup | 04:04 |
menn0 | axw: i'm tracing through where the cert and key that apiserver.NewServer is given come from | 04:04 |
menn0 | axw: ok so that's out of StateServingInfo in the agent config | 04:06 |
menn0 | axw: perhaps one is being updated without the other? | 04:06 |
menn0 | axw: this could be it: https://github.com/juju/juju/blob/master/cmd/jujud/agent/machine/servinginfo_setter.go#L67-L74 | 04:14 |
menn0 | axw: should existing.PrivateKey get copied too? | 04:15 |
menn0 | axw: I'd also be a lot happier if most of what the stateservinginfo_setter did happened /inside/ the ChangeConfig | 04:19 |
menn0 | axw: same for the method which does cert updates for the apiserver | 04:19 |
menn0 | looks racy otherwise | 04:19 |
menn0 | axw: the more I look, the more I think this is the bug | 04:24 |
* menn0 fixes | 04:24 | |
axw | menn0: sorry, I just realised you lost connection after I pasted the commit before :/ | 04:25 |
menn0 | axw: did you come to the same conclusion? | 04:25 |
* menn0 has been having wifi problems | 04:25 | |
axw | [11:40:32] <axw> menn0: I strongly suspect -> https://github.com/juju/juju/commit/0d64508f8005ea84c141ed392389fbaef0e70b30 | 04:25 |
axw | [11:40:57] <axw> menn0: in particular, "info.Cert = existing.Cert" without also setting Key | 04:25 |
menn0 | axw: ha! did see that. we independently found the same thing | 04:25 |
menn0 | axw: s/did/didn't/ | 04:26 |
menn0 | axw: do you also agree that much of the logic in stateservinginfo_setter should be /inside/ the ChangeConfig func? | 04:27 |
menn0 | axw: same for MachineAgent.upgradeCertificateDNSNames | 04:28 |
axw | menn0: yes, I think so | 04:28 |
axw | anything looking at current config | 04:28 |
menn0 | axw: ok, let's fix it. | 04:29 |
menn0 | axw: have you already started? | 04:29 |
axw | menn0: nope, eating lunch atm | 04:29 |
wallyworld | blahdeblah: i so want to mark your bug as invalid. using firefox as default browser should be invalid :-) | 04:30 |
menn0 | axw: ok, i'll start on servinginfo_setter.go. | 04:30 |
menn0 | axw: i'm going to have to stop soon but will be back on later this evening | 04:30 |
blahdeblah | wallyworld: Feel like taking the gloves off today, eh? :-) | 04:32 |
axw | menn0: ok. let me know where you leave off, I can pick it up | 04:32 |
menn0 | axw: sounds good | 04:32 |
wallyworld | blahdeblah: i used to run firefox until it started to grind everything to a halt with many tabs open and other performance things | 04:32 |
blahdeblah | wallyworld: It has never done that for me. I've heard many claims of such, but never seen it personally, despite having 4 windows with 100-odd tabs open right now. I thank NoScript for that. | 04:33 |
wallyworld | blahdeblah: yeah, that may well be it. sadly lots of sites require js, including lp. well lp doesn't need it but sucks without it | 04:34 |
blahdeblah | I use js on lots of sites; I just don't let it do that by default | 04:34 |
wallyworld | i find it too hard to whitelist everything. well maybe am too lazy | 04:35 |
blahdeblah | anyhew, my bug is valid, and I'll fight you for it! :-P | 04:35 |
wallyworld | adblock works a treat though | 04:35 |
wallyworld | oh i know it is valid | 04:35 |
wallyworld | was just stirring | 04:35 |
blahdeblah | I see your stirring, and raise you lp:1587644, which got me out of bed at least once this weekend | 04:36 |
anastasiamac | menn0: wallyworld: considering u've re-viewed the fix for bug 1625774 | 04:37 |
mup | Bug #1625774: memory leak after repeated model creation/destruction <ateam> <eda> <oil> <oil-2.0> <uosci> <juju:In Progress by 2-xtian> <https://launchpad.net/bugs/1625774> | 04:37 |
anastasiamac | menn0: wallyworld: is it possible/wanted/needed on 1.25 to solve some of the memory leakage there too? | 04:38 |
wallyworld | not sure, depends on when state pool was introduced. not sure if it was for multi model or not | 04:39 |
wallyworld | would have to dig in and look | 04:39 |
menn0 | anastasiamac: it could be applied there but I don't think multi-model was supported without a feature flag was it? the fix would have minimal utility if that's the case. | 04:39 |
wallyworld | yes, multi model in 1.25 is ff | 04:45 |
anastasiamac | wallyworld: even server-side stuff? we usually ff at cli level... | 04:46 |
wallyworld | not for multi model | 04:46 |
wallyworld | IIRC | 04:46 |
menn0 | yeah, I'm pretty sure the feature flag was applied throughout the server too | 04:47 |
menn0 | axw: I have the ssi setter fixes done, pushing now | 04:52 |
axw | cool | 04:52 |
menn0 | axw: can I leave the MachineAgent.upgradeCertificateDNSNames fix to you? | 04:52 |
axw | menn0: yep sure | 04:53 |
menn0 | axw: https://github.com/juju/juju/pull/6413 | 04:54 |
axw | menn0: I think the only thing to do there is to not use "si.Cert", but get fresh value inside ChangeConfig? | 04:55 |
axw | ta | 04:55 |
mup | Bug #1630728 changed: remove user needs better message that user is made inactive <usability> <juju:Triaged> <https://launchpad.net/bugs/1630728> | 04:55 |
menn0 | axw: shouldn't cert.NewDefaultServer be called inside the ChangeConfig though? | 04:57 |
axw | menn0: yes, I mean everything from "Parse the current certificate to get the current dns names." down should be inside | 04:57 |
menn0 | axw: +1 that's what I was thinking | 04:58 |
menn0 | should be easy | 04:58 |
menn0 | axw: just trying to QA this PR before I get called to help out with other stuff | 04:58 |
axw | menn0: except maybe the mongo bit. might want to do that after writing agent conf | 04:58 |
axw | menn0: okey dokey, thanks | 04:58 |
menn0 | axw: maybe... it might be ok to do it in the changeconfig though | 04:59 |
menn0 | axw: actually, i'm out of time. if you could review that PR, I'll check back in later and do the QA before merging | 04:59 |
axw | menn0: approved | 05:00 |
axw | menn0: with one late comment | 05:01 |
menn0 | axw: I made that change and QAed. Merging now. | 06:30 |
axw | menn0: thanks | 06:33 |
axw | menn0: I'm just QAing my change now | 06:33 |
menn0 | axw: cool. i can review when it's up. i've got to do dishes etc but will check back in later. | 06:33 |
axw | menn0: https://github.com/juju/juju/compare/master...axw:lp1631145-upgradecertificatednsnames?expand=1 if you want to take quick look | 06:33 |
axw | ok, no worries | 06:33 |
axw | wallyworld: got anything smallish you'd like help? | 07:14 |
wallyworld | axw: not off hand. I am finishing a partial list-clouds fix pending input from rick et al. i think there's one remaining bug on alexis' list to do with agent restart but not sure how big it is | 07:16 |
axw | wallyworld: sounded big, I'll take a look | 07:16 |
wallyworld | yeah, it did at first look | 07:17 |
wallyworld | axw: is your cert fix just because we need to recover from a borked cert caused by rc3? | 07:52 |
axw | wallyworld: yes. and also we should be doing stuff inside ConfigChanged in general. doesn't matter in this case because the function is only called at startup | 07:52 |
axw | but good to keep it clean, to avoid copy&paste errors | 07:53 |
wallyworld | +1 | 07:53 |
wallyworld | lgtm | 07:53 |
hoenir | could someone review my PR ? https://github.com/juju/juju/pull/6414 | 08:05 |
hoenir | also note that this PR will be the foundations of a new feature I wish to unlock on juju, enabling manual provisioning for windows machines. | 08:06 |
hoenir | foundation* | 08:07 |
mup | Bug #1420996 opened: Default secgroup reset periodically to allow 0.0.0.0/0 for 22, 17070, 37017 <canonical-is> <juju-core:New> <https://launchpad.net/bugs/1420996> | 08:11 |
=== akhavr1 is now known as akhavr | ||
=== stokachu_ is now known as stokachu | ||
=== meetingology` is now known as meetingology | ||
=== ejat_ is now known as ejat | ||
=== Ursinha_ is now known as Ursinha | ||
dooferlad | mgz: is the github jujubot happy? It seems to be ignoring my $$merge$$ on https://github.com/juju/juju/pull/6406 | 08:58 |
babbageclunk | wallyworld: ping? | 09:20 |
wallyworld | yo | 09:20 |
babbageclunk | wallyworld: I think I agree with you about Release vs Put - do you think menn0 will sulk if I change it? | 09:20 |
wallyworld | babbageclunk: he can sulk, i'll get the popcorn ready :-D | 09:21 |
babbageclunk | :) | 09:21 |
wallyworld | babbageclunk: or even Done() | 09:21 |
babbageclunk | wallyworld: Done isn't verby enough for my taste. Get/Done doesn't seem right. | 09:25 |
wallyworld | fair point, i was trying to find an alternative to Release() to alleviate objections :-) | 09:25 |
babbageclunk | always the peacemaker | 09:27 |
wallyworld | oh, i have been known to enjoy poking the hornet's nest :-D | 09:27 |
babbageclunk | wallyworld: do you think I should rip the non-strict model uuid checking out of validateModelUUID? | 09:42 |
babbageclunk | wallyworld: There are still a few places that use it. | 09:43 |
wallyworld | babbageclunk: i *think* so - maybe in a followup after checking with tim/menno. i am pretty sure it was just to support really old clients | 09:43 |
babbageclunk | wallyworld: Ok - I'll do it as a separate change. | 09:44 |
wallyworld | sgtm | 09:44 |
mup | Bug #1631899 opened: juju show-controller --show-password does not show the password <juju-core:New> <https://launchpad.net/bugs/1631899> | 10:02 |
babbageclunk | wallyworld: Want to take another look at that PR, or are you ok for me to merge it? | 10:21 |
wallyworld | babbageclunk: i can take a quick look, menno was happy with it | 10:22 |
babbageclunk | wallyworld: cool thanks - I think I've gone through all of your comments. | 10:25 |
wallyworld | babbageclunk: i think it looks good to go. thanks for doing the fix; was a challenge to get all the bits lined up so you did well | 10:28 |
babbageclunk | wallyworld: cheers! | 10:28 |
beisner | hi all, last week we started seeing sha256 mismatches when units try to download the charm from the controller (1.25.6). it's prevalent in openstack ci. ie. shutting down: ModeInstalling ... failed to download charm ... expected sha256 FOO, got BAR | 11:02 |
beisner | is there a known issue or are we special? :) more detail @ http://pastebin.ubuntu.com/23302619/ | 11:02 |
mup | Bug #1631899 changed: juju show-controller --show-password does not show the password <juju-core:New> <https://launchpad.net/bugs/1631899> | 11:11 |
mup | Bug #1631899 opened: juju show-controller --show-password does not show the password <juju-core:New> <https://launchpad.net/bugs/1631899> | 11:23 |
mup | Bug #1631899 changed: juju show-controller --show-password does not show the password <juju-core:New> <https://launchpad.net/bugs/1631899> | 11:26 |
mup | Bug #1541482 opened: unable to download local: charm due to hash mismatch in multi-model deployment <2.0-count> <juju-release-support> <juju:Fix Released by menno.smits> <juju-core:New> <https://launchpad.net/bugs/1541482> | 11:26 |
mup | Bug #1629951 changed: cannot specify subnet to create controller in on bootstrap <juju:Triaged> <https://launchpad.net/bugs/1629951> | 12:17 |
mup | Bug #1630029 changed: models should inherit vpc-id from controller <juju:Triaged> <https://launchpad.net/bugs/1630029> | 12:17 |
dooferlad | rick_h_: 1:1 today? | 13:00 |
rick_h_ | dooferlad: /me checks thought he was in it | 13:01 |
rick_h_ | dooferlad: oh hmm, stuck at "requesting to join the video call" | 13:01 |
rick_h_ | katco``: dimitern voidspace mgz natefinch ping for standup | 14:00 |
mgz | omw | 14:00 |
dimitern | omw | 14:00 |
rick_h_ | oh right, US away so ignore me katco`` and nate | 14:00 |
voidspace | rick_h_: omw | 14:02 |
frobware | jamespage: do you often run into this issue: https://bugs.launchpad.net/juju/+bug/1600546 | 14:49 |
mup | Bug #1600546: lxd subnet setup by juju will interfere with openstack instance traffic <2.0> <network> <sts> <juju:Triaged by rharding> <nova-compute (Juju Charms Collection):New> <https://launchpad.net/bugs/1600546> | 14:49 |
voidspace | rick_h_: ping | 14:51 |
rick_h_ | voidspace: pong | 14:51 |
rick_h_ | dooferlad: ping | 15:21 |
dooferlad | rick_h_: hello | 15:21 |
rick_h_ | dooferlad: have more time to chat? | 15:21 |
dooferlad | rick_h_: yes | 15:22 |
rick_h_ | dooferlad: k, meet you back in the 1-1 room | 15:22 |
voidspace | rick_h_: 30 seconds for a bikeshed needed if you have it | 15:32 |
voidspace | rick_h_: last comment on bug 1602192 is the output I've implemented | 15:34 |
mup | Bug #1602192: when starting many LXD containers, they start failing to boot with "Too many open files" <lxd> <juju:In Progress by rharding> <lxd (Ubuntu):Confirmed> <https://launchpad.net/bugs/1602192> | 15:34 |
rick_h_ | voidspace: otp | 15:38 |
rick_h_ | voidspace: will look in a sec | 15:38 |
voidspace | rick_h_: ok, np - I'll PR it and the reviewer can bikeshed it | 15:38 |
jamespage | frobware, I've never run into that issue | 16:00 |
frobware | jamespage: thanks. just trying to understand the severity and whether we address it "now". | 16:00 |
jamespage | but apparently trent has hit it a lot | 16:00 |
frobware | jamespage: any feeling whether this is just happening for just the training setup? | 16:01 |
jamespage | frobware, I'm not quite close enough to know the answer to that | 16:01 |
jamespage | sorry | 16:01 |
frobware | rick_h_: ^^ fyi | 16:02 |
voidspace | dimitern: if you have any ideas on how to test this, I'm all ears | 16:02 |
voidspace | dimitern: https://github.com/juju/juju/pull/6419 | 16:02 |
dimitern | voidspace: looking | 16:02 |
voidspace | dimitern: except providing something that stubs out PrintLn. | 16:02 |
voidspace | dimitern: the important bit is in provider/common/bootstrap.go | 16:03 |
dimitern | voidspace: how about adding an interface with BootstrapMessage() method and then check in common/bootstrap if it's implemented by the Environ ? | 16:04 |
dimitern | voidspace: then you could test it with the dummy provider, but no need to change all providers? | 16:04 |
voidspace | dimitern: right, but what does that enable me to test? | 16:05 |
voidspace | dimitern: I could then pass in a fake environ with a custom message, but what do I test? | 16:05 |
voidspace | dimitern: unless I replace the call to fmt.Println with something that can be mocked - which doesn't seem very useful | 16:05 |
voidspace | dimitern: I'm kind of arguing that as this is a ui change it needn't be tested | 16:05 |
voidspace | dimitern: maybe I can look in featuretests to see if something like this is covered | 16:06 |
dimitern | voidspace: only that common.Bootstrap() calls the optional method, when it's there, and checking the ctx.Stdout() to ensure it's there? | 16:06 |
rick_h_ | voidspace: wfm for now, thank you | 16:06 |
voidspace | rick_h_: cool, thanks | 16:06 |
rick_h_ | frobware: yea, so I'm -1 on that being the biggest issue atm | 16:06 |
* rick_h_ goes to get lunchables | 16:06 | |
voidspace | dimitern: in which case I can test that *already* | 16:06 |
voidspace | dimitern: and put a non-empty string in dummy provider to test it | 16:07 |
voidspace | dimitern: let me look - thanks | 16:07 |
voidspace | dimitern: if we're calling fmt.Println will that be in ctx.Stdout ? | 16:07 |
dimitern | voidspace: alternatively, you could go with BootstrapContext.Infof() only called in provider/lxd ? | 16:07 |
dimitern | voidspace: since it's very specific and not needed everywhere | 16:08 |
dimitern | voidspace: check e.g. PrepareForBootstrap in provider/ec2 and the code around validateBootstrapVPC() | 16:09 |
dimitern | voidspace: (bootstrap)ctx.Infof() is already used for similar messages during bootstrap in some providers | 16:10 |
voidspace | kk | 16:10 |
voidspace | dimitern: using Stderr on the context I can test that the message is output | 16:33 |
dimitern | voidspace: \o/ nice! | 16:34 |
=== frankban is now known as frankban|afk | ||
=== dames is now known as thedac | ||
rogpeppe1 | katco``: i've made a bunch of comments and changes in response to your review of https://github.com/juju/juju/pull/6407. PTAL. | 17:55 |
rick_h_ | rogpeppe1: she's on US holiday today | 17:55 |
rick_h_ | rogpeppe1: and with the EU folks EOD'ing will have to catch tomorrow it looks like sorry for the delay | 17:56 |
arosales | If any folks have access to s390 we are seeing https://bugs.launchpad.net/juju/+bug/1632030 | 18:18 |
mup | Bug #1632030: juju-db fails to start -- WiredTiger reports Input/output error <juju> <juju-db> <mongodb> <s390x> <juju:New> <https://launchpad.net/bugs/1632030> | 18:18 |
* rick_h_ goes to get the boy home from school, biab | 19:09 | |
veebers | alexisb: In a normal (test) run, should I see the log message: "ERROR cmd supercommand.go:458 creating API connection: ..."? (I'm seeing this error in the failing tests: ERROR cmd supercommand.go:458 creating API connection: EOF) | 20:23 |
alexisb | not on a normal test run you are expecting to pass without error | 20:24 |
veebers | I never see "creating API connection" in a passing test run (for grant-revoke) | 20:24 |
alexisb | veebers, when did you start seeing the new failures? | 20:24 |
veebers | alexisb: what is it a symptom of? One sec will check times | 20:24 |
veebers | alexisb: looks to be about Oct 7 (Jenkins time, so UTC?) had 2 passes of 12 runs since then, the failures all include that error message | 20:25 |
alexisb | menn0, ^^^ | 20:28 |
alexisb | I am wondering if this lines up with this merge: https://github.com/juju/juju/pull/6400 | 20:28 |
menn0 | veebers: where are you seeing that message? from the juju client or in agent logs? | 20:32 |
menn0 | veebers: a certain amount of such messages are expected when controllers come up and when workers restart after new controllers are added | 20:33 |
veebers | menn0: um, I'm pretty sure the client? (I'm seeing it in the log output from juju command) http://reports.vapour.ws/releases/4467/job/functional-grant-revoke/attempt/721 if you search for "connection: EOF" | 20:33 |
* menn0 looks | 20:34 | |
veebers | menn0: this is after a controller comes up, users have been added etc. | 20:34 |
veebers | menn0: Would this be related to the ping changes too? http://reports.vapour.ws/releases/4467/job/run-unit-tests-race/attempt/1949 (FAIL: monitor_internal_test.go:69: MonitorSuite.TestLaterPingFails) | 20:40 |
menn0 | veebers: that's a new test. i'll take a look. | 20:41 |
menn0 | veebers: the other issue appears to be macaroons related. here's the related bit in the controller's logs: | 20:41 |
menn0 | 2016-10-10 16:17:39 INFO juju.apiserver request_notifier.go:70 [1E] API connection from 10.0.8.1:33086 | 20:41 |
menn0 | 2016-10-10 16:17:39 INFO juju.apiserver admin.go:102 login failed with discharge-required error: verification failed: no macaroons | 20:41 |
menn0 | 2016-10-10 16:17:39 INFO bakery service.go:366 server attempting to discharge "eyJUaGlyZFBhcnR5UHVibGljS2V5IjoieVZlUEU3cUNJMm5nZWZuUXo2NlFPWVEwanlMVnVDanMwZGhMa0VUa3dRST0iLCJGaXJzdFBhcnR5UHVibGljS2V5IjoiNzF5NnNIOExNNUU1OHYxaGpGek9JdCtCZGRLRHR6SE5pN0ExQ1dBWHZqND0iLCJOb25jZSI6InpoMllTNnZ1bFFHNUlPdVRucXY4U0hVNW5MQWlSaE43IiwiSWQiOiJlSHM0Y3dERDRiOCs5TXBDUG1RakwvenVFdXZhV3dSbmhkY0tXWmtGQjA5MENaMXYxcnBYZ0RNczRsTTUrc2FSS2l2RGt0OFhlOG5BZTNxVG5 | 20:41 |
menn0 | 4Y1YwMXhIUWZxTXRmWHk4TUdnNmRvSWZlZWpSZmJnby9pdjIxSmZiU2FkQjNwdXR6ZmJ2ekRvbGh1aDNPcGdCbTRQd2pSRGx4cGFEQnZaL0lwZGhoL0lQb3dMdXc9PSJ9" | 20:41 |
menn0 | 2016-10-10 16:17:39 INFO juju.apiserver request_notifier.go:80 [1E] API connection terminated after 28.244335ms | 20:42 |
menn0 | 2016-10-10 16:17:39 INFO juju.apiserver request_notifier.go:70 [1F] API connection from 10.0.8.1:33096 | 20:42 |
menn0 | 2016-10-10 16:17:39 INFO juju.apiserver request_notifier.go:80 [1F] user-admin API connection terminated after 43.24719ms | 20:42 |
menn0 | veebers: i'll dig a bit more into that one to see if I can figure it out | 20:43 |
veebers | menn0: I see a macaroons change about 3 days ago, "remove macaroons on logout" | 20:43 |
menn0 | veebers: ok, could be related. those messages might also be normal. I'll do some digging. | 20:44 |
veebers | menn0: Hmm, it's possible that the test misbehaves with those recent changes. Just looking now and if it's running on lxd (which this test is) it doesn't handle entering the password. | 20:46 |
veebers | menn0: Did that behaviour change recently? (i.e. lxd not need password) | 20:47 |
menn0 | veebers: hmm, ok. that could be it. | 20:47 |
menn0 | veebers: I have no idea. | 20:47 |
* veebers builds latest juju to test | 20:47 | |
veebers | menn0: let me test that and I'll get back to you | 20:47 |
rick_h_ | veebers: hmm, yea the branch nate did made sure to clear cookies when you logout | 20:47 |
rick_h_ | veebers: menn0 which meant that you had to actually login again after a logout for a change | 20:47 |
veebers | rick_h_: that's probably it then, I should be able to confirm shortly | 20:48 |
menn0 | veebers: is there a ticket yet for the TestLaterPingFails intermittent failure? | 20:49 |
veebers | menn0: not yet, I can make one right now | 20:49 |
menn0 | veebers: ok thanks. i've just reproed it here | 20:50 |
menn0 | grrr. even with injected clocks it's still too easy to introduce these intermittent timing issues .... | 20:51 |
veebers | menn0: fyi https://bugs.launchpad.net/juju/+bug/1632105 | 20:52 |
mup | Bug #1632105: Test MonitorSuite.TestLaterPingFails fails <ci> <regression> <unit-tests> <juju:Confirmed> <https://launchpad.net/bugs/1632105> | 20:52 |
menn0 | veebers: cheers | 20:53 |
menn0 | veebers: ok i've got the fix for TestLaterPingFails done... will propose shortly | 21:14 |
veebers | menn0: sweetbix | 21:15 |
menn0 | veebers: thanks for pointing it out | 21:16 |
veebers | menn0: heh, no worries, it popped up in the 'unknowns' that I'm wrangling | 21:17 |
menn0 | good to get on top of these intermittent failures quickly | 21:18 |
veebers | agreed | 21:18 |
menn0 | thumper: easy review pls: https://github.com/juju/juju/pull/6420 | 21:23 |
* thumper looking | 21:23 | |
thumper | +1 | 21:24 |
veebers | menn0: ugh, sorry for the noise earlier, the fix for the grant/revoke/macaroons stuff was in the CI test. | 21:27 |
menn0 | veebers: ok good to know | 21:27 |
menn0 | thumper: it's probably too late to expose the HighAvailability facade for controller logins isn't it? | 21:40 |
thumper | menn0: yes and no | 21:41 |
thumper | you could create a new one | 21:41 |
thumper | but still need to support the old one | 21:41 |
menn0 | thumper: i'm working on the ticket to allow juju enable-ha to just work regardless of the current model | 21:41 |
thumper | I guessed | 21:42 |
menn0 | thumper: the client side work is a piece of cake but you then end up with a controller login | 21:42 |
menn0 | thumper: the alternative is to do something tricky in the client so it logs into the controller model | 21:42 |
thumper | Add the new facade | 21:42 |
* thumper thinks | 21:42 | |
menn0 | I don't think that would be hard to do and it's possibly lower risk | 21:43 |
thumper | you need to have admin access in the controller model to do it | 21:43 |
* menn0 checks what the current story with access is | 21:44 | |
thumper | or perhaps just write? | 21:44 |
menn0 | you currently need superuser access | 21:45 |
menn0 | that probably makes sense TBH | 21:46 |
menn0 | thumper: i don't *think* a new facade is required, the existing HighAvailability facade just needs to be exposed for controller logins and (controller) model logins (for compatibility) | 21:50 |
thumper | ok | 21:50 |
menn0 | thumper: the client will have to try both approaches (in case a newer client is talking to an older server) | 21:50 |
* thumper nods | 21:50 | |
menn0 | thumper: with a preference for a controller login | 21:50 |
menn0 | ok | 21:51 |
menn0 | thumper: thanks rubber duck :) | 21:51 |
veebers | thumper: you recently worked on some unit tests wrt to certificates/keys etc? Fyi I just filed this bug which may be of interest: https://bugs.launchpad.net/juju/+bug/1632127 | 21:59 |
mup | Bug #1632127: Unittest "MachineSuite.TestCertificateDNSUpdatedInvalidPrivateKey" fails on multiple archs <ci> <unit-tests> <juju:Confirmed> <https://launchpad.net/bugs/1632127> | 21:59 |
thumper | veebers: as much as I love fixing bugs, I'm focusing on a different piece of work this week | 21:59 |
veebers | thumper: ack, only wanted to bring it to your attention in case my assumption of you working in that area was correct :-) | 22:01 |
menn0 | veebers: TestCertificateDNSUpdatedInvalidPrivateKey was added by axw yesterday | 22:21 |
menn0 | it just failed for me on a merge attempt too | 22:21 |
veebers | menn0: ack thanks, oh he's now on leave right? | 22:23 |
menn0 | veebers: ah yes... could be | 22:23 |
menn0 | veebers: assign it to me... I was working with him in that area yesterday | 22:23 |
anastasiamac | menn0: veebers: Roger filed a bug for the failure.. i've marked Chris's as a duplicate... | 22:23 |
menn0 | anastasiamac: ok cool. bug number? | 22:24 |
veebers | menn0: anastasiamac points out I filed a dupe bug | 22:24 |
anastasiamac | menn0: veebers: https://bugs.launchpad.net/bugs/1631990 | 22:24 |
mup | Bug #1631990: cmd/jujud/agent: sporadic test failure in MachineSuite.TestCertificateDNSUpdatedInvalidPrivateKey <ci> <unit-tests> <juju:Triaged> <https://launchpad.net/bugs/1631990> | 22:24 |
veebers | ah, beat me to it :-) | 22:24 |
menn0 | veebers: yep got that... I was after the original ticket | 22:25 |
menn0 | anastasiamac, veebers: thanks | 22:25 |
jamespage | hey menn0 - I see you picked up bug 1541482 | 22:28 |
mup | Bug #1541482: unable to download local: charm due to hash mismatch in multi-model deployment <2.0-count> <juju-release-support> <uosci> <juju:Fix Released by menno.smits> <juju-core:Triaged by menno.smits> <https://launchpad.net/bugs/1541482> | 22:28 |
menn0 | jamespage: yep. I fixed the same/similar issue for 2.0 so I seemed like the right person to deal with it for 1.25. | 22:29 |
jamespage | menn0, awesome | 22:29 |
jamespage | menn0, it's killing our final release testing atm | 22:29 |
jamespage | menn0, any ideas what triggers the race that causes it? | 22:29 |
jamespage | we don't see it all of the time | 22:29 |
menn0 | jamespage: IIRC it's to do with the apiserver's on disk charm cache | 22:30 |
menn0 | jamespage: the solution is to remove the cache. it's not necessary any more (already done for 2.0) | 22:30 |
jamespage | menn0, yeah - I saw the fix you did for 2.0 | 22:31 |
menn0 | jamespage: there might be some other related fixes in that area too. I need to review previous PRs. | 22:32 |
jamespage | menn0, ok - I'll leave it with you | 22:32 |
menn0 | jamespage: actually, I just checked and I've already done that | 22:32 |
menn0 | jamespage: those changes just haven't been released yet | 22:33 |
menn0 | jamespage: we need a 1.25.7 release | 22:33 |
jamespage | yp! | 22:33 |
menn0 | jamespage: if it would help I can supply binaries for you to use in the mean time (to validate that the problem is indeed fixed for you) | 22:34 |
thumper | jamespage: how urgent is the need for 1.25.7 for your team? | 22:36 |
jamespage | thumper, we release thursday; currently we're doing final sweep of amulet testing tidy including disabling old release combos and enabling new ones | 22:37 |
jamespage | thumper, but that final sweep has been going for a lot longer than normal as we can't get a full amulet run through consistently atm | 22:38 |
* thumper nods | 22:38 | |
jamespage | thumper, we tried a workaround, but we're having to unpick that now | 22:38 |
jamespage | thumper, as that did something quite different to what we expected | 22:39 |
jamespage | thumper, so fairly urgent - I'll just see if we can splice in custom binaries to our Charm CI or not | 22:39 |
thumper | jamespage: well, there is no way it is happening this week :( | 22:41 |
thumper | so, sorry | 22:41 |
jamespage | that was my guess | 22:41 |
alexisb | menn0, ping | 23:18 |
anastasiamac | jamespage: plz email alexisb and rick_h_ re: urgency for 1.25.7 to come out \o/ | 23:22 |
mwhudson | alexisb: well i started on that bug and now i'm reading the kernel source :) | 23:42 |
thumper | menn0: are you working on https://launchpad.net/bugs/1631990 now? | 23:45 |
mup | Bug #1631990: cmd/jujud/agent: sporadic test failure in MachineSuite.TestCertificateDNSUpdatedInvalidPrivateKey <ci> <unit-tests> <juju:Triaged by menno.smits> <https://launchpad.net/bugs/1631990> | 23:45 |
menn0 | thumper: I haven't gotten to it yet | 23:47 |
menn0 | thumper: did you want to pick it up, or were you just bitten by it? | 23:48 |
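
axw's suggestion at 00:52 (define an interface in the CLI code and satisfy it by embedding both facades in one struct) can be sketched in Go as follows. The facade types and method names here are hypothetical and chosen only to show the embedding; they are not the real juju API clients.

```go
package main

import "fmt"

// ModelManagerClient is a hypothetical stand-in for one API facade client.
type ModelManagerClient struct{}

// DestroyModel pretends to destroy the named model.
func (ModelManagerClient) DestroyModel(name string) error {
	if name == "" {
		return fmt.Errorf("model name is required")
	}
	return nil
}

// ControllerClient is a hypothetical stand-in for a second facade client.
type ControllerClient struct{}

// ModelStatus pretends to report the status of the named model.
func (ControllerClient) ModelStatus(name string) (string, error) {
	return name + ": alive", nil
}

// DestroyAPI lists only the methods the CLI command needs,
// regardless of which facade provides them.
type DestroyAPI interface {
	DestroyModel(name string) error
	ModelStatus(name string) (string, error)
}

// combinedClient embeds both facade clients; their methods are promoted,
// so the struct satisfies DestroyAPI without any forwarding code.
type combinedClient struct {
	ModelManagerClient
	ControllerClient
}

// Compile-time check that the embedding really satisfies the interface.
var _ DestroyAPI = combinedClient{}

// destroy is the command logic, written against the interface only.
func destroy(api DestroyAPI, name string) (string, error) {
	status, err := api.ModelStatus(name)
	if err != nil {
		return "", err
	}
	if err := api.DestroyModel(name); err != nil {
		return "", err
	}
	return "destroyed (was " + status + ")", nil
}

func main() {
	msg, err := destroy(combinedClient{}, "foo")
	fmt.Println(msg, err)
}
```

Because the command depends only on `DestroyAPI`, a test could pass in a fake that implements the same two methods, which is the usual reason for combining facades in the CLI rather than in the API itself.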