babbageclunk | wallyworld: Can you take another look at https://github.com/juju/juju/pull/7340? I think I've resolved your comments. | 00:01 |
---|---|---|
wallyworld | sure | 00:01 |
babbageclunk | Thanks | 00:05 |
wallyworld | babbageclunk: done | 00:19 |
babbageclunk | wallyworld: cheers. | 00:19 |
babbageclunk | wallyworld: I've already done the migration/dump-model work! :) Although migration_import refuses to import a model with remote applications - I don't want to just enable it because it probably won't just work. | 00:22 |
wallyworld | sure, ok | 00:22 |
babbageclunk | I guess mainly because at the moment we're talking about cross-model/same-controller, and migration is inherently between controllers. | 00:24 |
* thumper takes dog for a walk | 01:20 | |
babbageclunk | wallyworld, menn0: huh, haven't seen a test fail like this before: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10891/artifact/artifacts/xenial.log | 02:40 |
babbageclunk | Do you think it's a problem? | 02:41 |
menn0 | babbageclunk: well it's certainly a problem :p | 02:42 |
menn0 | babbageclunk: maybe mention it to mwhudson ? | 02:42 |
mwhudson | eh that's just a nil pointer deference isn't it? | 02:42 |
wallyworld | menn0: babbageclunk: that's a known bug, let me look at the number | 02:42 |
mwhudson | dereference | 02:42 |
wallyworld | bug 1686720 | 02:43 |
mup | Bug #1686720: worker/lease: NewDeadManager returns manager that crashes <juju:Triaged> <https://launchpad.net/bugs/1686720> | 02:43 |
wallyworld | it's not a go thing, it's a juju npe IIANM | 02:43 |
babbageclunk | wallyworld: ok, if it's got a bug I'mm'a just rerun. :/ | 02:44 |
=== thumper is now known as thumper-afk | ||
menn0 | wallyworld, babbageclunk or anastasiamac: quick help fix. https://github.com/juju/juju/pull/7343 | 03:48 |
wallyworld | sure | 03:48 |
menn0 | axw: ping | 03:49 |
axw | menn0: pong | 03:49 |
menn0 | axw: I had a poke at this: http://reports.vapour.ws/releases/5227/job/run-unit-tests-win2012-amd64/attempt/3720 | 03:50 |
menn0 | i know what the problem is but I don't know what the correct fix is | 03:50 |
babbageclunk | menn0: reading now | 03:50 |
axw | doh | 03:50 |
axw | menn0: I can take it over if you like | 03:50 |
menn0 | axw: ok, happy to hand it over (there's a LK card already) | 03:50 |
axw | okey dokey | 03:51 |
menn0 | axw: i'm curious though, are those paths supposed to be unix only? | 03:51 |
axw | menn0: yep | 03:51 |
axw | menn0: storage is not currently supported on windows | 03:51 |
menn0 | axw: ok, then the tests are slightly wrong | 03:51 |
menn0 | axw: the tests shouldn't use filepath.FromSlash | 03:52 |
wallyworld | menn0: i added a suggestion, but lgtm | 03:52 |
menn0 | axw: and BlockDevicePath shouldn't use filepath.Join | 03:52 |
menn0 | wallyworld: cheers | 03:52 |
axw | menn0: yup. I'll fix that up, thanks. | 03:53 |
menn0 | axw: cheers | 03:53 |
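For context, a minimal Go sketch of the distinction behind menn0's suggestion: `filepath.Join` and `filepath.FromSlash` follow the host OS separator, whereas `path.Join` always uses forward slashes, so the latter suits paths that only ever exist on a Linux machine. The device path below is purely illustrative, not the real storage code.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// filepath.Join uses the host OS separator: on Windows this prints
	// \dev\sda1, which is not a meaningful Linux device path.
	fmt.Println(filepath.Join("/dev", "sda1"))

	// filepath.FromSlash likewise rewrites separators on Windows, which is
	// why the tests are better off without it for unix-only paths.
	fmt.Println(filepath.FromSlash("/dev/sda1")) // \dev\sda1 on Windows

	// path.Join always joins with forward slashes, regardless of the OS
	// the tests (or the code under test) happen to run on.
	fmt.Println(path.Join("/dev", "sda1"))
}
```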
menn0 | wallyworld: good call. doing that now. | 03:56 |
babbageclunk | menn0: D'oh, took too long with my comments. | 04:03 |
anastasiamac | babbageclunk: but ur comments were totally worth it :) i wonder if i could give menn0 a resources related bug for jumping the gun?... babbageclunk, want to negotiate it for me? :D | 04:07 |
axw | menn0: mind reviewing? https://github.com/juju/juju/pull/7344 | 04:09 |
menn0 | axw: otp but yes afterwards | 04:10 |
menn0 | babbageclunk: good comments, i'll follow up | 04:10 |
babbageclunk | wallyworld: got a moment for a sanity check? Or are you also otp? | 04:10 |
babbageclunk | menn0: ok thanks! | 04:10 |
babbageclunk | menn0: especially on the diaeresis one. | 04:10 |
wallyworld | babbageclunk: otp, give me a little time | 04:11 |
babbageclunk | wallyworld: I mean, you're my second choice in this case anyway - is thumper-afk in the same call? | 04:15 |
axw | thanks jam | 04:25 |
wallyworld | babbageclunk: tim isn't on this call | 04:29 |
babbageclunk | ah, ok - he's actually afk then, presumably | 04:30 |
wallyworld | babbageclunk: free now | 04:56 |
menn0 | babbageclunk: luckily the merge attempt failed so I can get your suggestions in | 04:57 |
menn0 | but not the diaeresis one :p | 04:57 |
babbageclunk | menn0: tsk! | 04:57 |
menn0 | pre-existing maybe? | 04:57 |
babbageclunk | menn0: I mean, I was just kidding with the tsk! | 04:59 |
babbageclunk | I'm fine with no hyphen. | 05:00 |
babbageclunk | wallyworld: hey, so my confusion was about this bit of code: https://github.com/juju/juju/blob/develop/worker/provisioner/provisioner_task.go#L739 | 05:01 |
babbageclunk | I'd expect that I'd be able to see those "failed to start instance" messages from the provisioner in the log, but I don't | 05:01 |
babbageclunk | (Oh, this is re:that maas timeout bug btw) | 05:02 |
wallyworld | did you figure it out? | 05:02 |
babbageclunk | wallyworld: no | 05:02 |
wallyworld | are you sure that code is being run? | 05:02 |
babbageclunk | wallyworld: Well, that's the bit that thumper-afk was saying was the problem, and the thing that would be affected by changing the retry count. | 05:03 |
wallyworld | sure, but maybe the code path isn't hitting that line in your deployment. since when do you believe what tim says :-P | 05:05 |
babbageclunk | wallyworld: it makes sense that that would be the problem, but if it is I don't understand why I can't see that in the log. | 05:05 |
babbageclunk | wallyworld: Oh, I mean in the log file attached to the bug. | 05:05 |
wallyworld | in the past i've had similar issues and it did turn out the code wasn't running. are you sure the right agent binary is being used? | 05:05 |
thumper-afk | :P | 05:05 |
wallyworld | have you updated your source code post beta4 | 05:06 |
wallyworld | otherwise you'll be using agent from simplestreams | 05:06 |
wallyworld | unless you used --build-agent | 05:06 |
babbageclunk | wallyworld: I'm not running anything - I don't know how to make my maas timeout in the same way. | 05:06 |
wallyworld | babbageclunk: does the log indicate that this error bubbles up? "cannot start instance for machine" | 05:08 |
wallyworld | i'm trying to establish if we really do get into that retry loop | 05:08 |
babbageclunk | Right - I don't think we do, because we don't see that logging. | 05:08 |
babbageclunk | Which would indicate that the retry count isn't the problem. (Unless we wouldn't expect to see the message in this log for some reason.) | 05:09 |
wallyworld | or maybe we do and err is nil | 05:10 |
wallyworld | maas says it will start the instance | 05:10 |
wallyworld | it then goes through a lifecycle of allocating->deployed or something | 05:10 |
babbageclunk | wallyworld: in which case changing the retry count won't help. | 05:10 |
wallyworld | so juju never gets to retry as it doesn't think it needs to | 05:10 |
wallyworld | exactly | 05:10 |
wallyworld | it may be the wrong bit of code is being looked at here? | 05:11 |
wallyworld | what does juju do after starting the instance - does it poll it until it goes to deployed state | 05:11 |
wallyworld | doesn't look like it | 05:12 |
wallyworld | it just assumes it is started | 05:12 |
babbageclunk | wallyworld: hang on - trying to chase through the provider code. | 05:12 |
wallyworld | maybe that's a bad assumption. i know with openstack startinstance returns but that just means the start request is accepted | 05:13 |
wallyworld | the instance itself then goes through a lifecycle - building, blah.... | 05:13 |
wallyworld | before it is ready | 05:13 |
babbageclunk | wallyworld: ok, thanks for confirming my thinking - still chasing then | 05:19 |
wallyworld | np, good luck! | 05:19 |
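To make the diagnosis in this exchange concrete, here is a self-contained sketch of the general shape of the retry loop under discussion; the names are hypothetical and this is not the actual provisioner_task.go code. The point is that the "failed to start instance" log line and the retries only happen when the provider call returns an error, so a MAAS that accepts the request and only fails later never enters the loop at all - consistent with the message being absent from the attached log.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// provisionWithRetry sketches the retry shape being discussed. start stands
// in for the provider's StartInstance call; in the MAAS case it can return
// nil as soon as the request is accepted, before the node has actually
// reached the deployed state.
func provisionWithRetry(start func() error, attempts int, delay time.Duration) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		err := start()
		if err == nil {
			// Request accepted: no retry happens here, even if deployment
			// subsequently times out on the MAAS side.
			return nil
		}
		lastErr = err
		fmt.Printf("failed to start instance (%v), retrying in %v (%d more attempts)\n",
			err, delay, attempts-i-1)
		time.Sleep(delay)
	}
	return fmt.Errorf("cannot start instance for machine: %v", lastErr)
}

func main() {
	err := provisionWithRetry(func() error { return errors.New("no matching nodes") }, 3, time.Second)
	fmt.Println(err)
}
```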
jam | axw: ping | 07:11 |
axw | jam: pong | 07:11 |
jam | axw: you have a PR #6326 that is about 8 months old | 07:11 |
jam | axw: azure using simplestreams | 07:11 |
jam | is there something we should close/unblock/??? | 07:11 |
axw | jam: yeah, it got stalled numerous times. we need to ensure streams.canonical.com has the azure bits. probably not going to happen until 2.3 at the earliest now | 07:12 |
axw | jam: we were close to a resolution at barcelona, but now aaron's gone and he was handling the streams side | 07:12 |
jam | axw: should we close that PR and come back to it? | 07:12 |
axw | jam: can do I guess, though the code shouldn't change much | 07:13 |
axw | jam: is it harmful leaving it there? distracting? | 07:13 |
jam | I'd like to get us to PRs that are actionable | 07:13 |
jam | so when we do a scan we can know rather than having a bunch of stuff we have to ignore | 07:13 |
axw | jam: fair enough. no drama to recreate it later, I'll close | 07:13 |
jam | we're a bit far from that, but it would be nice | 07:13 |
axw | I agree | 07:14 |
jam | axw: you also have a 1.8 branch that didn't land 3 days ago? | 07:14 |
axw | jam: yep, I will get back to that one real soon. needs some fixes in the 1.25 branch | 07:15 |
* axw creates a card to get it done | 07:15 | |
=== frankban|afk is now known as frankban | ||
axw | wallyworld_: can you please review https://github.com/juju/juju/pull/7346 tomorrow, unless some kind soul does it overnight? the meat and potatoes are in the first two commits | 09:44 |
wallyworld_ | axw: looking | 11:16 |
wpk | Any volunteers? https://github.com/juju/juju/pull/7348 | 13:35 |
gsamfira | heya | 16:28 |
gsamfira | anyone want to look over a small PR? https://github.com/juju/utils/pull/279 | 16:29 |
=== frankban is now known as frankban|afk | ||
balloons | menn0, not sure who all else is in the tech board, but I'd appreciate a look at https://github.com/juju/juju/pull/7350 | 21:50 |
wallyworld | babbageclunk: did you find where juju is not sufficiently retrying the provisioning? | 22:04 |
babbageclunk | wallyworld: the Juju log attached to that bug doesn't show the provisioning failure - I've asked for another one that does. I want to check with thumper that he's sure that's the problem - it seems likely, but it'd be good to have the log showing it. | 22:13 |
wallyworld | babbageclunk: it would be good yes. we could though make a preemptive fix and increase retry timeout based on knowledge that machines can take a while to provision; this will unblock the fix and if/when we get better logs we can confirm | 22:15 |
babbageclunk | wqll | 22:15 |
babbageclunk | oops | 22:15 |
babbageclunk | wallyworld: yeah, that's my thinking too. | 22:16 |
babbageclunk | wallyworld: https://github.com/juju/juju/pull/7351 | 22:16 |
wallyworld | looking | 22:16 |
babbageclunk | It's pretty short | 22:17 |
wallyworld | babbageclunk: i could comment but i won't :-) | 22:17 |
* babbageclunk sniggers | 22:17 | |
wallyworld | babbageclunk: is 10 x 10s enough? | 22:18 |
wallyworld | babbageclunk: i wonder also, we should surface the retry attempts in status | 22:18 |
wallyworld | there's an api to surface progress | 22:19 |
wallyworld | we could ensure the message says "Waiting for machine to be provisioned, attempt 1 or 10" or something | 22:19 |
babbageclunk | wallyworld: machine.SetInstanceStatus - yeah, could do. I'll add that. | 22:19 |
wallyworld | so the user can see stuff is happening | 22:19 |
wallyworld | 1 of 10 | 22:20 |
babbageclunk | According to thumper, this happens occasionally (I think he said something like 10% of the time?), so normally 3 tries is enough. 10 seems like a reasonable number to bump it to in that case? | 22:22 |
babbageclunk | wallyworld: oh, it turns out it already does that: failed to start instance (%s), retrying in %v (%d more attempts) | 22:24 |
wallyworld | cool, ok | 22:24 |
wallyworld | let's land it and take a view | 22:25 |
babbageclunk | wallyworld: kind of an annoying amount of digging around in logs for a 1-line PR | 22:26 |
wallyworld | it's the old mechanic paradigm - the fix is trivial, you just have to know where to apply the hammer | 22:26 |
babbageclunk | sure | 22:27 |
babbageclunk | wallyworld: ok, I'll move on to getting rid of that annoying unknown collection "remoteApplications" message. | 22:28 |
wallyworld | +1 ty | 22:29 |
wallyworld | assuming it was the last pr to land which introduced it | 22:29 |
wallyworld | i think it must have been because it was all ok before then IIANM | 22:29 |
babbageclunk | wallyworld: I don't think it was the last PR - it's in the logs on bug 1688028 too | 22:34 |
mup | Bug #1688028: Juju fails to request deploy after busy node returns OK <cdo-qa> <cdo-qa-blocker> <juju:In Progress by 2-xtian> <https://launchpad.net/bugs/1688028> | 22:34 |
wallyworld | ok, otp, will catch up soon | 22:34 |
gsamfira | anybody want to take a stab at a couple of really tiny PRs? :) | 22:35 |
babbageclunk | gsamfira: sure! | 22:52 |
wallyworld | thumper: i had also pinged ante and andres in #juju to confirm that maas storage issue is fixed so hopefully we'll be able to close the issue real soon | 22:57 |
thumper | wallyworld: I updated the bug to incomplete :) | 22:58 |
wallyworld | yay :-) | 22:58 |
wallyworld | hopefully nothing more to do | 22:58 |
gsamfira | babbageclunk: https://github.com/juju/utils/pull/279 and https://github.com/juju/charm/pull/228 | 23:10 |
gsamfira | babbageclunk: thanks! ^_^ | 23:10 |
babbageclunk | gsamfira: Why do you think using defer might lead to leaked file handles? | 23:22 |
gsamfira | babbageclunk: It was an improper use of the word. The whole defer fh.Close() pattern, while I get that it's useful for closing a file even if a panic happens along the way, can cause ulimit errors when, for example, unarchiving an archive with many files. | 23:27 |
babbageclunk | gsamfira: ok, I thought it might be something like that - so in a function with a loop extracting many files. | 23:28 |
gsamfira | babbageclunk: yap | 23:28 |
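A minimal illustration of the ulimit problem gsamfira is describing, with made-up helper names rather than the code from the PR: a `defer f.Close()` issued inside a loop does not run until the whole function returns, so extracting a large archive holds every file handle open at once; closing per iteration keeps at most one open.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// extractAllLeaky defers every close, so nothing is closed until the function
// returns - with thousands of archive entries this can exhaust the process's
// open-file limit (the ulimit errors mentioned above).
func extractAllLeaky(names []string) error {
	for _, name := range names {
		f, err := os.Create(name)
		if err != nil {
			return err
		}
		defer f.Close() // piles up; only runs when extractAllLeaky returns
		// ... write this entry's contents to f ...
	}
	return nil
}

// extractAll closes each file as soon as it is written, so only one handle is
// open at a time and a failed Close is reported instead of being dropped.
func extractAll(names []string) error {
	for _, name := range names {
		f, err := os.Create(name)
		if err != nil {
			return err
		}
		// ... write this entry's contents to f ...
		if err := f.Close(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, _ := os.MkdirTemp("", "extract-demo")
	defer os.RemoveAll(dir)
	names := []string{filepath.Join(dir, "a"), filepath.Join(dir, "b")}
	fmt.Println(extractAll(names))
}
```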
gsamfira | babbageclunk: there is another issue with that pattern. On Windows for example, if you try something like (pseudo code follows: fh := Open("myfile"); defer fh.Close(); fh.Write("hello"); io.Move("myfile", "awesome_name")) | 23:30 |
gsamfira | it will fail | 23:30 |
babbageclunk | gsamfira: That said, I think the places you're changing in the utils PR would be better done by changing the deferred closes to be sync+closes. The loop problem doesn't apply there. | 23:30 |
gsamfira | (the move at least) | 23:30 |
babbageclunk | gsamfira: oh, right. Is that why you moved the sync+close to be before the call to the change func in AtomicWriteFileAndChange? | 23:30 |
gsamfira | would it not be useful to return any errors while doing a sync/close? | 23:30 |
gsamfira | yap | 23:31 |
gsamfira | also in zip.go. in the case of copyTo | 23:31 |
babbageclunk | gsamfira: for returning errors, yes, but not if we're already returning an error - we wouldn't want to clobber it. | 23:33 |
thumper | menn0, wallyworld: I won't be at the tech-board meeting today | 23:34 |
thumper | have taxi duty :-| | 23:34 |
menn0 | thumper: ok | 23:34 |
wallyworld | thumper: no worries, we'll cover for you :-) | 23:34 |
gsamfira | babbageclunk: not really clobbering, I think. We still return the original error if that's where it happens. But we also get an error returned if we fail to flush to disk, or close the file...which I think is an important error to know :) | 23:35 |
gsamfira | otherwise we cannot guarantee that the config file or state file is consistent | 23:35 |
babbageclunk | Oh, I mean, it would be better to make the defer do sync+close and return any error (avoiding clobbering an error already coming back) than just putting the sync+close on the happy path. | 23:36 |
babbageclunk | gsamfira: ^ | 23:37 |
babbageclunk | does that make sense? | 23:37 |
babbageclunk | gsamfira: I'll put what I mean into the review. | 23:40 |
gsamfira | babbageclunk: yup. Makes sense | 23:40 |
babbageclunk | gsamfira: cool | 23:40 |
gsamfira | babbageclunk: sounds good. helps to add context :) | 23:40 |
gsamfira | I'll address it in the morning. Grabbing some sleep for tonight :) | 23:41 |
gsamfira | thanks for looking over it! | 23:41 |
gsamfira | babbageclunk: ^ | 23:41 |
babbageclunk | gsamfira: no worries! I started typing it up and it's unwieldy enough that I don't think it's worth it - I think yours is fine in that case. | 23:49 |
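For completeness, roughly the "unwieldy" version babbageclunk was typing up: a deferred sync+close that reports its own error through a named return value, but never overwrites an error that is already being returned. The function name and signature here are illustrative, not the utils API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileSafely is a hypothetical example of the deferred sync+close
// pattern: Sync and Close errors are surfaced via the named return value,
// but only when nothing earlier in the function has already failed, so the
// original error is never clobbered.
func writeFileSafely(name string, data []byte) (err error) {
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer func() {
		syncErr := f.Sync()
		closeErr := f.Close()
		if err == nil {
			if syncErr != nil {
				err = syncErr
			} else {
				err = closeErr
			}
		}
	}()
	_, err = f.Write(data)
	return err
}

func main() {
	if err := writeFileSafely(filepath.Join(os.TempDir(), "demo.txt"), []byte("hello\n")); err != nil {
		fmt.Println("write failed:", err)
	}
}
```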