/srv/irclogs.ubuntu.com/2016/11/23/#juju-dev.txt

blahdeblahaxw: Does jujud-machine-0 need to contact itself on 127.0.0.1:17070?  I'm seeing those connections pop up even after seeing remote ones drop away.00:09
blahdeblahaxw: i.e. should I allow the loopback ones?00:10
axwblahdeblah: it does yeah, maybe just stop the agents *except* jujud-machine-0 because jujud-machine-0 talks to itself over loopback also00:13
blahdeblahaxw: Righto; so allow loopback, and stop the other agents on the machine 000:14
axwblahdeblah: yup00:14
blahdeblahaxw: OK - that's done; how long shall I leave it before checking for recovery?00:16
axwblahdeblah: I would expect it to be fairly quick if it's going to fix the issue. 10-15 mins00:23
blahdeblahOK - time to make a coffee and see how it goes then. :-)00:23
blahdeblahaxw: load dropped off pretty dramatically: https://libertysys.com.au/imagebin/2iugcW6L.png00:41
blahdeblahand network traffic, but that's pretty much expected given I'm stopping the packets: https://libertysys.com.au/imagebin/B8B9g8F7.png00:41
blahdeblahaxw: and here's your smoking gun: network https://libertysys.com.au/imagebin/Dqygzb0N.png00:42
blahdeblahaxw: what next - start the local agents back up?00:43
axwblahdeblah: yes, unblock them and see what happens please00:45
blahdeblahOK - that's done00:48
* blahdeblah starts writing an update for the bug report00:48
axwblahdeblah: thanks for that00:49
blahdeblahaxw: So, where to next with this?  The agents have gone straight back to spamming the controller's logs...00:58
blahdeblahlots of these: unit-ntpmaster-5[2554]: 2016-11-23 00:58:53 INFO juju.worker.dependency engine.go:351 "uniter" manifold worker stopped: dependency not available00:59
blahdeblahand these:00:59
blahdeblahunit-nrpe-6[2568]: 2016-11-23 00:59:20 DEBUG juju.worker.dependency engine.go:301 starting "uniter" manifold worker00:59
blahdeblahunit-nrpe-6[2568]: 2016-11-23 00:59:20 DEBUG juju.worker.dependency engine.go:268 "uniter" manifold requested "agent" resource00:59
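To put numbers on the log spam above, here is a minimal Python sketch that tallies agent log lines by agent and level. It assumes the `agent[pid]: date time LEVEL module file:line message` layout seen in the pasted lines. The function name `count_by_agent_level` is hypothetical and not part of juju.

```python
import re
from collections import Counter
from typing import Iterable

# Layout assumed from the pasted lines, e.g.:
#   unit-nrpe-6[2568]: 2016-11-23 00:59:20 DEBUG juju.worker.dependency engine.go:301 <message>
LINE_RE = re.compile(
    r"^(?P<agent>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<source>\S+) (?P<message>.*)$"
)


def count_by_agent_level(lines: Iterable[str]) -> Counter:
    """Count (agent, level) pairs; lines that don't match the layout are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("agent"), m.group("level"))] += 1
    return counts
```

Feeding it a saved machine-0 log (for example `count_by_agent_level(open(path))`) shows which agents dominate the controller's logs.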
blahdeblahaxw: And we're straight back to previous load & network traffic levels01:19
blahdeblahRAM still creeping up01:19
axwblahdeblah: are there errors in the log? about lease manager not being available or whatever it is01:20
thumperFAIL: pinger_test.go:104: pingerSuite.TestAgentConnectionDelaysShutdownWithPing01:57
thumperseriously?01:57
thumperthis test is still failing?01:58
thumperFFS01:58
menn0thumper: yep... I thought I had it licked but no01:58
anastasiamacthumper: menn0: funny how it's not on my hit list of intermittents that block promotion from develop to staging...02:00
anastasiamacthumper: menn0: do u have time to pick it up?02:00
thumperdealing with migration stuff right now02:01
menn0anastasiamac: ditto... I'm not supposed to pick up anything else02:01
anastasiamacawesome \o/02:01
blahdeblahaxw: Sorry - missed that message earlier.  Should the messages be of type ERROR?02:04
axwblahdeblah: yes02:04
axwblahdeblah: in the agent logs02:04
blahdeblahaxw: not machine 0?02:04
axwblahdeblah: there might be some in there too, but the errors I'm thinking of are the cause of the (non-controller) agent restarts02:05
menn0wallyworld: review done02:05
wallyworldtyvm02:05
blahdeblahaxw: no sign of any errors to do with lease manager02:24
axwblahdeblah: hmm, ok. no errors at all in the agents, suggesting they're restarting?02:25
blahdeblahaxw: only the ones relating to the connection going down when I blocked traffic02:25
blahdeblahyou want me to collect full logs from machine 0 for you?02:25
axwblahdeblah: yeah wouldn't hurt, thanks02:26
anastasiamacthumper: i wonder what the magic incantation is - still can't reproduce it after 520 runs under stress02:27
thumpermenn0: https://github.com/juju/juju/pull/6597 easy one02:30
menn0thumper: looking02:30
menn0thumper: QA steps?02:31
thumperrun the tests02:31
thumperI made sure it failed first02:31
menn0thumper: ok fine02:31
menn0thumper: how serious is the problem that's being fixed?02:32
menn0seems problematic02:32
thumpermenn0: migrations will fail02:32
thumperif anyone has ever removed a unit that had an action02:32
thumperas the cleanups need to be empty for the model02:32
thumperalso, data just hangs around02:32
thumperplan on backporting to the 2.0 branch...02:33
menn0thumper: glad you found it now then02:33
thumperwell02:33
thumperare we going to do another 2.0 release?02:33
menn0nfi02:33
menn0thumper: ship it02:33
thumpermenn0: found it as part of the b7 upgrade02:33
thumperas horrible as that work has been, found many issues02:34
anastasiamacthumper: another 2.0.x release may be a possibility; it's on a per-need basis...02:38
anastasiamacthumper: original plan is to support 2.0.x until feb02:38
wallyworldmenn0: tech board?03:03
menn0wallyworld: just when I was actually starting to make some progress today...03:03
wallyworldyeah, always the way03:03
=== frankban|afk is now known as frankban
macgreagoirvoidspace frobware: I've a couple of IPv6-related PRs in, if you have a minute. Both required for the next juju PR.08:41
macgreagoirhttps://github.com/juju/testing/pull/11708:41
macgreagoirhttps://github.com/juju/replicaset/pull/308:42
macgreagoirBoth simple too.08:42
jamfrobware: ping11:21
jamjust wondering what you discovered WRT the bridge script in your testing yesterday11:21
jammacgreagoir: just to mention, you can ping me as well now. :)11:36
macgreagoirjam: Of course :-) cheers.11:37
frobwarejam: well, one thing is that if there's nothing to do we still run ifupdown - which was fine before, but not now.11:42
frobwarejam: was offline - powercut11:42
jamfrobware: good catch11:42
jamgood thing you're using 4g everywhere, right? :)11:42
jammacgreagoir: reviewing11:42
frobwarejam: I got sidetracked by landing https://github.com/juju/juju/pull/659911:43
frobwarejam: we will need the script to be present from now on so it seemed prudent to make that so...11:43
jamsure, though maybe we should just trigger writing it from jujud itself? given we have a copy inside the jujud binary11:44
frobwarejam: can do. baby steps.11:45
jam:)11:45
jamI also wonder if we ultimately want it as a python script if we are moving to doing it ourselves. It made sense when it was driven by cloud-init. Stuff to consider, at least.11:45
frobwarejam: I got a little annoyed it wasn't on there during my testing.11:45
frobwarejam: yes, something I had considered too.11:45
frobwarejam: as it stands now, it will work for dynamic and our current static bridging.11:46
macgreagoirfrobware: replicaset comments amended.11:59
jammacgreagoir: commented.12:00
frobwaremacgreagoir: was still looking at this12:00
frobwaremacgreagoir: reviewed both12:15
frobwarejam, macgreagoir, voidspace: PTAL @ https://github.com/juju/juju/pull/659912:15
frobwareit seems the pre-flight PR checks are failing12:27
frobwaretake a look at: http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-err.log12:27
frobwarescripts/setup-lxd.sh: line 10: lxd: command not found12:27
voidspacerick_h: grabbing coffee will be 2 mins late to first standup12:29
rick_hvoidspace: rgr12:29
jammgz: http://juju-ci.vapour.ws/job/github-check-merge-juju/268/ is that just out-of-disk space error?12:54
jamlxd-out.log at least has: "Failed to copy file. Source: /var/lib/jenkins/cloud-city/jes-homes/merge-juju-lxd/models/cache.yaml Destination: /var/lib/jenkins/workspace/github-check-merge-juju@2/artifacts/lxd/controller"12:54
jamand why does http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-out.log have "17.04,Angsty Antelope,angsty,2016-10-23,2017-04-30,2018-01-29 "12:55
jam++ lxd --version12:58
jamscripts/setup-lxd.sh: line 10: lxd: command not found12:58
jamseems fishy12:58
perrito666hey jam are you on our tz this week? or just working incredibly late?12:58
jamperrito666: it is just hitting 5pm here.12:59
jamI'm UTC+4, so not *that* far away from your TZ, just not very close.12:59
perrito666it's 7h, that's far enough12:59
perrito666also I forgot it was morning here :p13:00
mgzjam: the Angsty thing is a hack to make sure distro-info behaves13:00
perrito666I might be needing vacations13:00
jamperrito666: when you start mixing up day and night...13:01
jammgz: k, just thinking that since it is labeled 17.04 it will become wrong pretty soon.13:01
perrito666jam: just morning and afternoon, to be fair 10AM and 8PM are not really different around here13:01
jamperrito666: more drinking at 10am?13:01
perrito666I won't admit or deny that13:02
jammgz: so I think the final thing is that trusty somehow is missing LXD, (not installed from trusty-backports?)13:03
mgzjam: the thing is, it works a portion of the time13:06
mgzand the script isn't changing13:06
mgzso I don't at a glance know what's up13:06
mgzsomeone just needs to spend some time to debug it, happens enough that it really needs resolving13:07
mgzI'll see if there's a bug already13:07
jammgz: I don't see anything in http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-out.log that indicates it is installing lxd13:07
jamis it doing something like accidentally using the Xenial script where lxd is preinstalled ?13:08
* mgz finds a passing one to compare13:08
jammgz: also, looking at the apt-update HTTP list, I don't see trusty-backports in there13:08
jamtrusty and trusty-updates, but not trusty-backports13:08
jammgz: I remember there were some issues with particular Trusty MAAS images that didn't have trusty-backports enabled by default, which differed from CPC images13:09
mgzha, a passing one also fails there13:09
mgzbut exits 013:09
jammgz: so passing is because it failed-to-fail ?13:09
mgzsee 26713:10
mgzno, it ran the tests, by the trusty-out.log and they pass13:10
jammgz: yeah, I see the "lxd not found" line.13:10
mgzso, the fatal error is actually the "Connection to ...amazonaws.com closed by remote host."13:11
mgzI don't see a bug, so will file one.13:12
mgzatm check job history is mostly red, and I suspect that's largely spurious13:12
jamfrobware: commented on https://github.com/juju/juju/pull/659913:21
voidspaceevery time the SSO two factor login annoys me, I remember I implemented it...13:33
voidspacedammit13:33
mgzvoidspace: aha! does this mean we can all come to you with complaints? :)13:34
voidspacemgz: haha, I'm not on that team any more13:34
voidspacemgz: but feel free to complain, sure13:34
gnuoydo juju 1.X and 2.X both use the same version of goose ?13:35
voidspacemgz: ^^^^13:36
voidspacegnuoy: I doubt it13:36
voidspacegnuoy: different revisions of goose.v113:38
gnuoyvoidspace, hmm, ok, ta13:38
voidspacegnuoy: http://pastebin.ubuntu.com/23522141/13:38
voidspacegnuoy: 2.0 is on a 2016-11 revision13:38
voidspacegnuoy: 1.25 a 2015-11 revision13:39
gnuoyvoidspace, ok thanks, that's v. helpful13:39
mgzgnuoy: nope13:42
* mgz was late13:42
mgzgnuoy: we can bug fix goose for 1.25 if needed though13:42
mgzbut it notably isn't going to have the new neutron stuff13:42
gnuoymgz, I think the bug impacts both tbh but I've only confirmed on 1.X13:42
gnuoymgz, the new neutron stuff has landed?13:43
mgzgnuoy: yes, I've delayed sending out announcement as there was some fallout to handle13:43
mgzbut will do so.13:43
voidspacerick_h: I'm out of bourbon, can you bring me a nice bottle to Barcelona and I'll recompense you there... (pretty please :-)13:45
voidspacerick_h: if you can remember and be bothered of course...13:45
rick_hVoidspace hah, what is your preferred one?13:47
voidspacerick_h: any reasonable kentucky bourbon will be fine, I'm no expert13:48
voidspacerick_h: the one I've just finished that I liked13:48
voidspacerick_h: was bulleit (or something like that)13:48
voidspacerick_h: scotch is too rough for me, but you Americans can make nice whisky13:48
frobwarejam: there is a wrapper around the bridge script which invokes it at its new location13:49
frobwarejam: that wrapper always existed. It looks to see what version of python is available, and whether all interfaces should be bridged.13:50
frobwarejam: and then runs: /path/to/python /path/to/the/bridge/script.py13:51
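The wrapper frobware describes (pick an available Python, decide whether to bridge all interfaces, then run `/path/to/python /path/to/the/bridge/script.py`) can be sketched as below. This is a hypothetical Python rendering, not the actual wrapper: the interpreter search order and the `--bridge-all` option name are assumptions made purely for illustration.

```python
import shutil
from typing import Callable, List, Optional


def build_bridge_command(
    script_path: str,
    bridge_all: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Pick an available Python interpreter and build the bridge script command line."""
    # Assumed preference order: python3 first, then python2, then plain "python".
    for candidate in ("python3", "python2", "python"):
        interpreter = which(candidate)
        if interpreter:
            break
    else:
        raise RuntimeError("no Python interpreter found")
    cmd = [interpreter, script_path]
    if bridge_all:
        cmd.append("--bridge-all")  # hypothetical option name, not the real script's interface
    return cmd
```

Injecting `which` keeps the interpreter lookup testable without touching the real `PATH`.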
gnuoymgz, fwiw https://bugs.launchpad.net/juju/+bug/1625624/comments/713:54
mupBug #1625624: juju 2 doesn't remove openstack security groups <ci> <landscape> <openstack-provider> <juju:Triaged by alexis-bruemmer> <https://launchpad.net/bugs/1625624>13:54
voidspacerick_h: and if there's anything I can bring from the UK just let me know, scotch, marmite, scottish shortbread....13:55
mgzgnuoy: thanks, having a look13:56
frobwarenote bug 1643057 was fixed in MAAS 2.1.213:56
mupBug #1643057: juju2 with maas 2.1.1 LXD containers get wrong ip addresses <cdo-qa-blocker> <landscape> <juju:Invalid> <MAAS:Fix Committed by mpontillo> <MAAS 1.9:Won't Fix> <MAAS 2.0:Won't Fix> <MAAS 2.1:Fix Released by mpontillo> <https://launchpad.net/bugs/1643057>13:56
mgzgnuoy: do you know if this is tied to a particular openstack version?13:57
mgzbecause goose hasn't changed the underlying http handling in a long time I believe13:57
gnuoymgz, that is an excellent question. I believe this broke for us when we went to Mitaka13:57
mgzgnuoy: that sounds very likely13:58
mgzmost of our CI testing is against older (much older) versions13:58
balloonsahh frobware, you need to look at http://juju-ci.vapour.ws/job/github-check-merge-juju/269/artifact/artifacts/trusty-out.log/*view*/. You have unit test failures. Everything else is noise14:08
frobwareballoons: heh, none of these seem to be related to what I changed.14:22
mgzballoons: 268 is the one to look at. it either dced after build or something else similar.14:22
mgz269 is perrito666's14:22
mgz(which does have real unit test failures)14:22
mupBug #1625624 opened: juju 2 doesn't remove openstack security groups <ci> <landscape> <openstack-provider> <juju:Triaged by alexis-bruemmer> <juju 2.0:Triaged> <juju-core:Triaged> <https://launchpad.net/bugs/1625624>15:09
katconatefinch: on that branch; not shown: i have to land the CallMocker type into juju/testing15:20
natefinchkatco: ahh, thanks, that would be a key piece :)15:21
katconatefinch: yeah sorry i just ran out of time last night15:21
katconatefinch: you can see what it looks like because it's removed from deploy_test.go15:21
katconatefinch: but doesn't build atm. feel free to wait on review if you like15:21
katconatefinch: i will endeavor to get that landed around lunch15:22
natefinchkatco: I can start the review now, that's no problem15:25
natefinchwow that call mocker adds a lot of complexity15:27
katconatefinch: going into a meeting, but would appreciate more thoughts on that15:30
perrito666mgz: ??16:31
perrito666mgz: I dont have a merge in progress16:32
mgzperrito666: the check on pr #656416:36
perrito666mgz: ah yes, I know16:38
=== frankban is now known as frankban|afk
mupBug #1644333 opened: !amd64 controller on MAAS cloud requires constraints <juju-core:New> <https://launchpad.net/bugs/1644333>18:45
babbageclunkhey menn020:24
menn0babbageclunk: howdy20:24
babbageclunkso my logtransfer stuff is failing because the migrationmaster worker is connecting as a machine agent but the debuglog endpoint expects a user.20:25
katcomenn0: hey, curious what you think of this https://github.com/juju/testing/pull/118/files in relation to this https://github.com/juju/juju/pull/6598/files20:26
babbageclunkmenn0: I guess I should add another httpContext authentication method that has an authfunc for any one of machine/unit/user?20:27
menn0babbageclunk: well just machine+user would be enough right?20:27
babbageclunkmenn0: oh, right20:28
menn0katco: i'll take a look shortly20:28
katcomenn0: no rush, ta20:28
babbageclunkmenn0: I was going to use stateForRequestAuthenticated, but that doesn't check permissions20:28
menn0babbageclunk: but yes that seems ok. there's not much danger in opening up that endpoint to machines20:29
babbageclunkmenn0: Ok, thanks, I'll do that. Tempted to do it as a function that accepts the tag kinds and rework the others in the same way.20:29
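babbageclunk's idea of an authentication helper parameterised by the permitted tag kinds could look roughly like this. It is a hypothetical Python sketch: the real juju code is Go and has its own tag types. Here a tag is assumed to be a string such as `machine-0` or `user-admin` whose kind is the text before the first hyphen, and `tag_kind` and `auth_for_kinds` are illustrative names.

```python
from typing import Callable


def tag_kind(tag: str) -> str:
    """Return the kind prefix of a tag string, e.g. "unit-nrpe-6" -> "unit"."""
    kind, sep, _ = tag.partition("-")
    if not sep:
        raise ValueError(f"malformed tag: {tag!r}")
    return kind


def auth_for_kinds(*kinds: str) -> Callable[[str], bool]:
    """Build an auth check that accepts only tags of the given kinds."""
    allowed = frozenset(kinds)

    def check(tag: str) -> bool:
        try:
            return tag_kind(tag) in allowed
        except ValueError:
            return False  # malformed tags are rejected rather than raising

    return check


# The debug-log endpoint would accept machine agents as well as users:
debuglog_auth = auth_for_kinds("machine", "user")
```

Existing single-kind checks would then reduce to calls like `auth_for_kinds("user")`, which is the rework babbageclunk mentions.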
menn0babbageclunk: that could be nice as long as it's not too big a change20:30
menn0babbageclunk: i.e. don't get too distracted20:30
babbageclunkmenn0: No, it's simple.20:32
menn0babbageclunk: cool, then go for it20:32
babbageclunkmenn0: I'm so close to getting this working I can taste it!20:32
menn0babbageclunk: what does it taste like? :)20:33
babbageclunkmenn0: it tastes like... victory21:34
perrito666babbageclunk: for a moment I thought we were still talking about what came out of your keyboard :p21:40
babbageclunkperrito666: ugh, who put all this cat hair on my keyboard!?21:41
babbageclunkperrito666: never clean anything, it only leads to heartache21:41
thumperanyone else noticed a plethora of ERRORs now being written to the logs like this:  11:17:30 ERROR juju.rpc error writing response: write tcp 127.0.0.1:17070->127.0.0.1:52408: write: connection reset by peer22:23
thumpernormally about 10 at a time22:23
babbageclunkthumper: yeah, I think I've seen those - I didn't realise they were new though22:42
babbageclunkaxw, thumper, menn0: can someone take a look at this? https://github.com/juju/juju/pull/660622:47
menn0katco: I like the approach. i've added a bunch of suggestions let's get that in.23:00
menn0babbageclunk: looking23:00
thumpermenn0: oh fuck...23:01
* menn0 cringes23:02
thumpermenn0: I've just hit something quite interesting23:02
thumperquick HO to discuss?23:02
* menn0 doesn't like interesting23:02
menn0thumper: no :)23:02
thumper:(23:02
* menn0 buries head in sand23:02
menn0thumper: see you in 1:123:02
thumperack23:02
thumpermachine-0: 12:01:20 ERROR juju.worker.migrationmaster:9097b3 source prechecks failed: unit ubuntu/0 not idle (failed)23:04
babbageclunkthumper: ping?23:20
thumperbabbageclunk: hey23:20
babbageclunkthumper: my migration master's getting an error when connecting to the logtransfer endpoint - the target model isn't importing (since it's already at success).23:22
thumperah...23:22
babbageclunkthumper: how do you think I should handle that - another phase?23:22
thumperwhat comes after success?23:23
babbageclunkthumper: ...profit?23:23
thumperheh23:23
menn0thumper, babbageclunk: SUCCESS -> LOGTRANSFER23:23
babbageclunkmenn0: but is the target model in that state?23:23
thumperI think perhaps the method should take the expected import state23:24
thumperstateForMigration should take an expected state23:24
* menn0 agrees with thumper23:24
thumperand check against that23:24
thumperrather than just assuming importing23:24
menn0the logtransfer endpoint should only work if the migration state is "none"23:25
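The fix agreed here, passing the expected migration mode into the lookup instead of always assuming "importing", can be sketched as follows. The enum values and the names `check_migration_mode` and `WrongMigrationMode` are illustrative stand-ins (menn0 notes "none" is not the exact name), not juju's real API.

```python
from enum import Enum


class MigrationMode(Enum):
    # Illustrative values only; the real constant names differ.
    NONE = ""
    IMPORTING = "importing"
    EXPORTING = "exporting"


class WrongMigrationMode(Exception):
    pass


def check_migration_mode(model_mode: MigrationMode, expected: MigrationMode) -> None:
    """Reject the request unless the target model is in the mode the caller expects."""
    if model_mode is not expected:
        raise WrongMigrationMode(
            f"migration mode is {model_mode.name}, expected {expected.name}"
        )
```

Under this sketch the import endpoints would keep passing `MigrationMode.IMPORTING`, while the logtransfer endpoint, which runs after SUCCESS, would pass `MigrationMode.NONE`.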
babbageclunkthumper, menn0 - yeah, makes sense. In this case that's MigrationModeNone23:25
menn0(that's not the exact name)23:25
thumperwhy is the state none?23:25
babbageclunkok, thanks chaps - doing that now23:25
thumperif it is still migrating?23:25
babbageclunkmenn0: ^23:26
thumpermenn0: FYI, the precheck is failing because the machine instance status is "started" not "running"23:26
menn0thumper: ok cool .. easy to fix then23:27
thumperI'm looking at what full status is looking at23:27
* thumper is still digging23:27
menn0thumper: fullstatus should be using the same thing to generate the status but perhaps it's the interpretation of the status23:28
thumperyeah, that's what I'm checking23:28
menn0babbageclunk: review done. looks great.23:29
babbageclunkmenn0: thanks!23:29
axwmgz: thanks for the review, PTAL at my response when you have a moment23:33
perrito666axw, wallyworld: I need to skip standup, sorry. I'll update you guys upon return23:34
perrito666axw: good catch with region check23:34
axwperrito666: ok, ttyl23:35
axwperrito666: BTW see validateCloudRegion in state/model.go for one place we're doing validation of region. I think we just want to copy that logic into environs.MakeCloudSpec23:37
axwwe're validating when we *do* pass a region, but not when we don't23:37
thumpermenn0: status always shows "started" never "running"23:37
* thumper sighs23:43
thumperapiserver converts running -> started somewhere23:44
* thumper still digging23:44
menn0thumper: i'll show you where after the standup23:46
thumperok23:46
wallyworldperrito666: standup?23:47
thumpermenn0: fyi https://bugs.launchpad.net/juju/+bug/162043823:55
mupBug #1620438: model-migration pre-checks failed to catch issue: not idle <ci> <intermittent-failure> <regression> <juju:Triaged by menno.smits> <https://launchpad.net/bugs/1620438>23:55
mupBug #1640521 changed: Unable to deploy Windows 2012 R2 on AWS <juju-core:Won't Fix> <https://launchpad.net/bugs/1640521>23:59
mupBug #1644333 changed: !amd64 controller on MAAS cloud requires constraints <juju:New> <https://launchpad.net/bugs/1644333>23:59
