[00:34] thumper: It should take about how it's used by the tests (not juju) to store the details for substrates to test against
=== wgrant_ is now known as wgrant
[01:16] veebers: do you have a few minutes
[01:16] thumper: I do
[01:16] veebers: HO?
[01:16] thumper: sounds good, up release-call?
[01:16] sure
[01:21] anastasiamac: small one https://github.com/juju/juju/pull/9085
[01:22] * anastasiamac looking
[01:24] wallyworld: lgtm as long as 'hooks' dir is created.. m guessing it is otherwise it would not have worked for u :D
[01:24] the python code does all that
[01:24] you just need to assign the hooks
[01:24] yep. i assumed so :) thanks for a quick fix!!
[01:24] np, i should do the same fix for the other charm in the same pr
[01:24] the constraints one
[01:25] will be the same thing
[01:25] yes, all charms should have it now...
[01:25] unless thay r testing the actual failure to deploy invalid charms and I do not think we have ci tests for that, only unit ones)
[01:36] wallyworld, thumper it seems the upgrade test failure for 2.4 is legit, upgrade commands states to use proposed, controller logs show it's looking in released: https://pastebin.canonical.com/p/NG8PRCg26f/
[01:37] shit eh
[01:38] lucky we have tests
[01:46] I've noted in the doc, moving on to the next one
[01:48] team, what's the haps with https://bugs.launchpad.net/juju/+bug/1782803 (just noticed it as I was filing a bug)
[01:48] Bug #1782803: juju 2.4.1: juju status failure
[01:48] just noticed it was critical*
[01:55] veebers: from memory we told them it wasn't for juju to retry
[01:55] if it's the one i'm thinking of
[01:56] anastasiamac: i pushed a couple of small fixes for the other 2 CI failures
[01:56] veebers: it was marked as New overnight but it should not be critical
[01:57] wallyworld: ok, the bug has been updated 6 hours ago. It might not be clear there as it's still marked crit
[01:57] ack anastasiamac ^^
[01:57] veebers: it's something with their setup and yes, wallyworld is right - not on us
[01:58] looks like they've reopened it will logs attached
[01:59] *with
[01:59] it can be looked at but IMO we'll push back as not a release stopper
[01:59] wallyworld: m not convinced that the api change is needed but m not too attached to it :D so my +1 still stands unless u want multiple +1 from me :D
[02:00] anastasiamac: what api change?
[02:00] "zip file spec 4.4.17.1 says that separators are always "/" even on Windows."
[02:00] ok that. that's why the unit tests are failing on windows
[02:00] we were looking for a hooks\install
[02:00] wallyworld: oh ic... good to know
[02:01] i have not rerun the windows unit tests but that *has* to be the reason i think
[02:01] we'll see soon enough
[02:01] :)
[02:02] kelvinliu__: nice work with the enable-condition
[02:04] kelvinliu__: has that fix above landed? if so i'll strike out the issue in the doc
[02:05] * thumper groans
[02:06] veebers: the test failed with an unrelated failure AFAICT
[02:06] veebers, wallyworld just in 1:1 meeting with Tim. going to land it now,
[02:06] gr8 ty
[02:07] thumper: which test
[02:07] veebers, I deployed the RunFunctionaltests-amd64 job with the fix, and tested
[02:07] thumper: ah log rotation right
[02:07] yeah...
[02:07] kelvinliu__: sweetbix
[02:08] I'm just deploying the charm myself locally and testing that way
[02:08] wallyworld, landed and tested. going to re-test crd now
[02:08] ty
[02:09] thumper: what was the new failure?
[02:10] https://pastebin.canonical.com/p/rQfKs22C4d/
[02:12] thumper: you weren't expecting "ERROR:root:Wrong unit name: Expected: /var/log/juju/machine-0.log, actual: /var/log/juju/machine-lock.log" ?
[02:14] veebers: oh, I was just looking at the last error...
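A side note on the zip separator point raised at 02:00: a minimal Python sketch, assuming the charm archive is an ordinary zip and the hook sits at `hooks/install` (the path named in the discussion). The helper names `build_charm_zip` and `has_entry` are hypothetical and this is not the juju code; it only illustrates that entry names inside a zip use "/" and that a lookup built with a backslash misses.

```python
import io
import zipfile


def build_charm_zip():
    """Build a tiny in-memory zip with one hooks/install entry (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Entry names inside a zip use forward slashes on every platform.
        zf.writestr("hooks/install", "#!/bin/sh\n")
    return buf.getvalue()


def has_entry(archive_bytes, name):
    """Return True if the archive has an entry with exactly this name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return name in zf.namelist()


archive = build_charm_zip()
found_forward = has_entry(archive, "hooks/install")     # zip-style separator
found_backslash = has_entry(archive, "hooks\\install")  # Windows-style separator
print(found_forward, found_backslash)
```

A lookup path assembled with the platform separator on Windows would produce the second form, which is consistent with the Windows-only unit test failure described above.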
[02:15] thumper: ack, that last failure is probably jujupy choking because you used --existing and it screwed up and got confused :-|
[02:15] ah
[02:26] wallyworld: quick call?
[02:27] ok
[02:27] wallyworld: release call HO?
[02:31] wallyworld: https://github.com/juju/juju/pull/9086
[02:50] lxc list
[02:50] lol, wrong window
[02:58] veebers: lolo :) at least no password... we've all putour password into irc chat at least once :)
[02:58] hah, I have done that too :-P
[02:59] or perhaps 'lxc list' *is* my password >_>
[03:00] hmmm k that would b pretty sad pass phrase :) altho m not better - i usually use song lyrics as my pass phrases :)
[03:01] like a variation on 'a spoonful of sugar' :D
[03:06] ^_^
[03:08] ok, I'm redoing how we do the manual provider test, it's silly how we're currently doing it
[03:13] Did someone clean up the GCE addresses? The quota is saying 4/23 in use.
[03:15] babbageclunk: I didn't, is it split by region?
[03:16] veebers: I think so, but this is for the us-central1 region that's in the error.
[03:16] huh, curiouser and curiouser.
[03:16] babbageclunk: hmm odd, Perhaps it's was a perfect storm and there was heaps of jobs running in that region at the time and we got unlucky to run out
[03:17] Yeah, maybe.
[03:19] babbageclunk: could be worth checking what regions are used in tests and perhaps manually distributing them out a bit?
[03:20] veebers: ok, just looking at the job config to understand what it's doing.
[03:20] babbageclunk: heh, let me know if you need anything clarified :-) Most the job configs are setup, the test run is a single build step
[03:22] Thanks, I'll have a go at working it out first before roping you in! :)
[03:22] is it possible to set a UserKnownHostsFile option for juju (i.e. ssh option)?
[03:51] veebers: juju help ssh says yes
[03:52] i assume you are talking about for running juju ssh
[03:52] wallyworld: I meant for everything ssh that juju does (i.e. with a manual provider how it gets into the machine)
[03:53] oh, juju use of ssh internally. i think that's all fixed
[03:53] it's ok I've gone with a different approach that'll work. It's just not so fancy
[03:53] fixed as in hard coded
[03:53] wallyworld: ack, thanks for confirming. I've got something working though
[03:54] (the reason was: I was 'lxc copy'-ing new machines from a base, but need to auth them to ssh in, using a generated known_hosts key would work, but need to set which file that actually is).
[03:54] veebers: i left a comment on that upgrade bug - not something we can fix quickly / easily sadly IIANM
[03:54] I've since created manual tests for the different clouds and locked down which IPs they start with. The lxd network management seems pretty nifty https://stgraber.org/2016/10/27/network-management-with-lxd-2-3/
[03:55] wallyworld: oh, :-(
[03:55] wallyworld: it worked previously though right?
[03:55] not that i can see
[03:55] i can't have
[03:55] it
[03:56] ah ok
[03:56] oh, it works in develop though
[03:56] simple controller model owrks
[03:56] but not agents on machines
[03:56] probably broken in devel too, or not?
[03:57] need to check but if it works in develop my theory is wrong
[03:59] veebers: is the pexpect() stuff a substring match? eg does child.expect('(?i)password') match "some text here password:"
[04:02] wallyworld: that test is green for develop branch (upgrades)
[04:03] wallyworld: re: pexpect, should just be regex IIRC
[04:03] it could be green because the agent binaries get cached
[04:03] so my theory could be wrong. the code looks correct though
[04:04] for the controller be use the supplied agent stream
[04:05] wallyworld: FYI https://pexpect.readthedocs.io/en/stable/api/pexpect.html#pexpect.spawn.expect
[04:06] hmmm, that test should work then
[04:07] unless it needs ^.* etc
[04:09] we should make it as promiscuous as possible, we only care if its asking for a password
[04:12] wallyworld: FYI I found a 2.4 branch run of the upgrade tst that passed: http://10.125.0.203:8080/job/nw-upgrade-juju-amd64-lxd/199/console (2.4-rc2)
[04:12] veebers: it could be the error is misleading then
[04:13] the agents will only look in release streams
[04:13] but if the controller has been done successfully first, the agents will be cached
[04:14] although this one fails as we're seeing now: http://10.125.0.203:8080/job/nw-upgrade-juju-amd64-lxd/233/console
[04:25] I can't seem to get the dbLog feature tests that fail intermittently to fail on my machine at all
[04:26] veebers: if my reading of the pexpect doc is correct, our test is broken. http://pexpect.sourceforge.net/pexpect.html#spawn-expect seems to say that expect("bar") will not match "foobar". so our expect("password") will not match "Enter a password:"
[04:29] wallyworld: huh, that seems to be the case if we're just passing in the string. We could pass in a compiled regex instead
[04:29] is the tst really just using ("password")? that sucks
[04:30] child.expect('(?i)password')
[04:30] which hopefully is treated as an uncompiled regex
[04:31] alol other usages seem to do the right thing and use the whole prompt
[04:32] eg
[04:32] child.expect('Enter client-email:')
[04:32] did someone delete that GCE quota?
[04:32] not me said the duck
[04:32] babbageclunk: I haven't touched it
[04:32] Weird, it's not listed on the quota page anymore. :/
[04:33] wallyworld: "Strings will be compiled to re types"
[04:33] babbageclunk: that's really odd
[04:33] wallyworld, the crd works as expected.
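On the pexpect question: a minimal sketch using only Python's `re` module, assuming (per the "Strings will be compiled to re types" line quoted from the pexpect docs) that a string pattern is compiled and then searched for anywhere in the child's output rather than matched against the whole text. The prompt strings are the ones mentioned in the discussion; `prompt_matches` is a hypothetical helper, not part of pexpect or the test suite.

```python
import re


def prompt_matches(pattern, output):
    """Return True if the regex pattern is found anywhere in the output."""
    return re.compile(pattern).search(output) is not None


# Loose pattern: only cares that a password is being asked for.
loose = prompt_matches("(?i)password", "Enter a password:")
mixed_case = prompt_matches("(?i)password", "Enter a Password:")

# Exact prompt: escaped so the text is matched literally.
exact = prompt_matches(re.escape("Enter client-email:"), "Enter client-email:")
reworded = prompt_matches(re.escape("Enter client-email:"), "Enter the client email:")

print(loose, mixed_case, exact, reworded)
```

Under that assumption the loose pattern survives prompt rewording while the exact prompt fails as soon as the text changes, which is the trade-off between the two conventions being weighed here.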
[04:34] veebers: ok, i'll look to follow convention elsewhere and use the exact prompt
[04:34] kelvinliu__: awesome ty
[04:34] wallyworld, np
[04:34] wallyworld: a regex would be better surely? so we don't get tripped up by minor text changes
[04:35] our preferred convention elsewhere (in juju also) is to use exact text
[04:35] so we get breakages
[04:35] so we think about the consequences of changing
[04:35] and also so we can see when error messages are dumb
[04:36] if you just match on a small regexp, you miss things like "could not do this because: could not do this: because could not do it" etc
[04:37] wallyworld: ack
[04:37] good point
[04:39] thumper: it seems like the commands in that job are failing which feeds bad input into the next command. one sec I'll line something up
[04:48] * thumper nods
[04:59] vinodhini: looks like the timeout extension worked, it needed an extra 10 minutes apparently
[04:59] 100 minutes is a long time for that test though, maybe there is an issue with azure-arm. Did you try a different region too? Perhaps the default we use is slow etc.
[04:59] i didnt try diff region.
[05:00] its just timeout period i incresed first in default reg
[05:00] http://localhost:18080/job/nw-model-migration-amd64-azure-arm/647/console
[05:05] vinodhini: I would attempting trying a different region see if that goes faster; having a test take 1hr 40 min is a bit gross :-)
[05:07] i will try with actual time period and diff region
[05:07] i mean the orig time period
[05:16] veebers: just a quick clarification plz correct me if i am wrong here - ENV=parallel-azure-arm -- iam setting this to different region. and i am listing out the regions from juju list-region azure
[05:16] vinodhini: no, that env stays the same (it's the part that says run this test in azure-arm). just below that should be the assess_ call, that should take a --region arg
[05:16] one sec, let me check
[05:17] veebers: a small PR for the pexpect fix
[05:17] https://github.com/juju/juju/pull/9087
[05:17] ok. iam seeing in acceptance test assess_model_migration
[05:17] i got that.
[05:18] --region is option which overrides it.
[05:18] it alright thanks veebers
[05:18] vinodhini: yeah --region should be there for the model migration test
[05:18] sweet :-_
[05:18] wallyworld: ack, looking
[05:18] wallyworld: you've used a json query CLI tool before? something like jq or so?
[05:18] i have
[05:19] can't remember the syntax though
[05:19] been a while but very useful
[05:20] its ok. i verified in py script
[05:20] wallyworld: ack cool I'll look it up, Should be able to use this 5 piped command using grep/sed/head etc. ^_^
[05:20] now i have set time 90 and diff region and started it
[05:20] lets see
[05:20] veebers: yep, i pipe from stdin etc when i used it
[05:22] veebers: i thought about controller_name but that is the one bit we don't really care about that could change
[05:22] wallyworld: ack, fair enough
[05:22] and it may not be contreoller_name
[05:23] the test should be using a different controller
[05:23] for true multi-controller cmr
[05:35] Is anyone else getting gocomplaints from gometalinter about gomocks-generated files not being goimported?
[05:44] wallyworld: ^
[05:44] I've updated the nw-bootstrap-constraints-maas-2-2 job so it should get the right input for the test, going to have tea will check back in later on.
[05:53] babbageclunk: i haven't so far
[05:53] kelvin added some new micks yesterday
[05:53] but they are all committed in tree
[05:54] wallyworld: I tried running it again and it went away, so I don't know what was happening there.
[05:54] * wallyworld shrugs
[05:54] * babbageclunk also
[06:25] wallyworld: don't forget to propse your fixes to develop too :-)
[06:30] veebers: are u strill ard. i did revert back the time qnd changed the region and its all good Success.
[06:30] http://localhost:18080/job/nw-model-migration-amd64-azure-arm/648/console
[06:48] wallyworld: looks like veebers not ard
[06:49] i would like to know abt this azure failure which is actually fine if we change the regin.
[06:50] go go gadget gometalinter
[06:50] so what shd be the solution ? i have made the modification directly in Web UI
[06:59] I have updated the doc.
[07:15] vinodhini: not sure, i'll have to read the failure, i am not faimi9lair with it
[07:18] vinodhini: wouldn't it be better to increase the test timeout? that's what i seem to recall may have been discussed this morning
[07:43] wallyworld: i was away to get some dinner.
[07:43] Yes. initially i increased the timeout period and it was successful.
[07:43] but veebers was asking me not to do that way
[07:52] vinodhini: ok, i'm surprised at that. i'll talk to him tomorrow. just changing the region is quite fragile as that coud slow down also
[07:52] thanks for looking into it
[07:52] its ok.
[07:53] i was working in credentialsd part its was just side by side running.
[07:53] good plan
[07:53] this is not potential failure. its slow thats why its an issue
[07:53] yeah, azure is very slow at instance creation/destroy
[07:54] So we arent doing release today ?
[07:54] I am sure veebers will look into the status a bit later :-)
[07:54] maybe, maybe not, depends on how the other guys go with the remaining issues. i'd say not today but tomorrow if i had to guess
[07:55] In this case how to target the solution iam not sure. Modifying a config option is not a fix.
[07:55] So we should focus on solution.
[07:57] ok. wallyworld. I am drafting a mail to you. I wont be there tomorrow morning hours as i have appoinment with Indian consulate.
[07:59] it depends on the root cause. if the substrate is slow, then increasing a timeout seems reasonable to me
[08:04] vinodhini, wallyworld: The timeout is already 90 minutes, any more seems like a huge amount. My suggestion was to try a different region in case the original is having troubles etc.
[08:04] wow 90 minutes!!!
[08:04] fark
[08:05] wallyworld: if it's still taking ages in another region there is an issue there
[08:06] yeah, let's see
[08:06] yeah, it times out after 90 :-) Takes about 1hr 45 min for a successful run
[08:07] veebers: do you know the gce quota status? was that sorted?
[08:08] wallyworld: no idea sorry. I know babbageclunk was looking. We thought perhaps it was bad timing and we had a bunch of stuff all running the same region etc. Not sure if the suggestion to check which region is used across tests (with the thought to share it out a bit) went
[08:08] ok, np
[08:10] wallyworld: the jq way is much better: https://github.com/CanonicalLtd/juju-qa-jenkins/pull/81/files
[08:56] veebers: looks good
[09:10] wallyworld: this is an easy one: https://github.com/juju/juju/pull/9088
[09:10] looking
[09:11] lgtm ty
[10:56] manadart: you got 5 minutes for a quick HO?
[11:04] * stickupkid gone for lunch
[11:14] morning party folks
[11:30] stickupkid: morning
[11:31] stickupkid: can I ask you to pause WIP and grab an issue from the release blocking doc please?
[12:11] sure can
[12:12] stickupkid: ty, the other side of the world cranked out a lot of notes/fixes and we need to help move forward today.
[12:12] rick_h_: just reading up on the doc
[12:12] stickupkid: k, let me or hml know if you have any questions/issues
[14:04] externalreality: Approved #9084
[14:15] manadart, cool. I spoted that I attempted to push the removal of the Id feild did not make it in. Gonna add that before attempting to land.
[15:16] externalreality: Didn't quite get all of my PR done before EoD, but I've put it up as a WIP, if you are able to review: https://github.com/juju/juju/pull/9090
[15:27] stickupkid: quick pr pls: https://github.com/juju/juju/pull/9091
[15:28] hml: done
[15:28] stickupkid: ty
[15:30] stickupkid: i’m off to long lunch shortly. do you have anything for me to review?
[15:30] hml: nope, nothing atm, just digging
[15:30] pretty sure I'm just making the hole deeper
[15:30] stickupkid: ha!
[15:34] manadart, reviewing now
[15:38] stickupkid: "I'm gonna need a bigger shovel!"
[15:39] rick_h_: true
[15:52] has anyone seen this recently "16:51:11 DEBUG juju.provider.common bootstrap.go:575 connection attempt for 10.156.96.10 failed: /var/lib/juju/nonce.txt does not exist" - it's been happening a couple of times today
[15:52] ?
[15:53] Just doing a "juju bootstrap localhost --debug" on the 2.4 branch
[15:53] it works in the end, but really takes it's time...
[15:57] stickupkid: looks like some history https://bugs.launchpad.net/juju-core/+bug/1314682
[15:57] Bug #1314682: Bootstrap fails, missing /var/lib/juju/nonce.txt (containing 'user-admin:bootstrap')
[15:57] rick_h_: nice, i'll give that a read
[16:00] rick_h_: so i guess the retry that's implemented to fix this, does work... maybe my computer was just being slow...
[16:01] stickupkid: yea, not sure.
[16:01] * stickupkid back to digging...
=== beisner_ is now known as beisner
[20:55] Morning o.
[20:58] wheeeee
[21:20] wallyworld, kelvinliu_, knobby: This call reminded me of this, if you haven't seen it: https://www.youtube.com/watch?v=JMOOG7rWTPg :p
[21:24] cory_fu, ^.@
[21:24] wallyworld, veebers: I had a look at the GCE quota thing. As far as I could see the quota was now fine - IP addresses in use was fluctuating between 4 and 0 when the test was running. I couldn't change the region tests were using because it's defined as us-central1 in environments.yaml. Maybe I could duplicate parallel-gce as parallel-gce-us-east1 and move some jobs to use that instead?
[21:25] babbageclunk: using --region with an assess script should overwrite that IIRC
[21:25] babbageclunk: it looks like we may hit it when there are two ci-run going at the same time
[21:26] babbageclunk: that’s what was giong on when it was hit again in run 1089
[21:26] veebers: ah, thanks - so if I change the jobs to use different regions that might avoid it? It definitely looks like a per-region quota.
[21:27] veebers: ok, I'm going to do that now.
[21:29] (dumb question, but what does the nw- prefix mean?)
[22:31] gah, my brain's stopped accepting "likelihood" as a real word.
[22:31] likeli
[22:32] it does look strange written out
[22:46] veebers: can you take a look at https://github.com/CanonicalLtd/juju-qa-jenkins/pull/83 ? I've checked there are no errors from jenkins-jobs.
[22:46] babbageclunk: can do
[22:46] ta
[22:46] After it's deployed I'll make sure to run each of the changed jobs, just in case I missed a \
[22:48] babbageclunk: LGTM. a redeploy should be just doing nw-* so it redploys all the functional jobs (no need to screw around cherry picking names etc.)
[22:49] veebers: more detail? Hang on, I'll read more of the readme.
[22:50] babbageclunk: oh, your question earlier re: nw_ prefix; hah it's because while we where spinning up the new CI run bits we continued to run the original jobs; You couldn't run both at the same time as they stomped on each other (workspace/$JOBNAME is the working dir for a job). So I added nw- (new world), it was supposed to be changed when we did the roll over but never was
[22:51] babbageclunk: ah sorry, hah yeah the arg for jenkins-job . . . . -r jobs/ci-run nw-*
[22:52] Ah, ok - so running `jenkins-jobs update` like in the deploying jobs section, but with a wildcard to do all the new-world jobs.
[22:52] veebers: coolthanks!
[22:52] babbageclunk: yep that's the one
[22:54] veebers: ok, having a go at deploying them now.
[22:55] babbageclunk: sweet, let me know when it's done as I'm deploying and testing some changes I'm making
[23:43] how do you add a private key interactively (in juju add-credential)? Remove all the linebreaks?