/srv/irclogs.ubuntu.com/2017/05/16/#juju-dev.txt

babbageclunkwallyworld: Can you take another look at https://github.com/juju/juju/pull/7340? I think I've resolved your comments.00:01
wallyworldsure00:01
babbageclunkThanks00:05
wallyworldbabbageclunk: done00:19
babbageclunkwallyworld: cheers.00:19
babbageclunkwallyworld: I've already done the migration/dump-model work! :) Although migration_import refuses to import a model with remote applications - I don't want to just enable it because it probably won't just work.00:22
wallyworldsure, ok00:22
babbageclunkI guess mainly because at the moment we're talking about cross-model/same-controller, and migration is inherently between controllers.00:24
* thumper takes dog for a walk01:20
babbageclunkwallyworld, menn0: huh, haven't seen a test fail like this before: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10891/artifact/artifacts/xenial.log02:40
babbageclunkDo you think it's a problem?02:41
menn0babbageclunk: well it's certainly a problem :p02:42
menn0babbageclunk: maybe mention it to mwhudson ?02:42
mwhudsoneh that's just a nil pointer deference isn't it?02:42
wallyworldmenn0: babbageclunk: that's a know bug, let me look at the number02:42
mwhudsondereference02:42
wallyworldbug 168672002:43
mupBug #1686720: worker/lease: NewDeadManager returns manager that crashes <juju:Triaged> <https://launchpad.net/bugs/1686720>02:43
wallyworldit's not a go things, it's a juju npe IIANM02:43
babbageclunkwallyworld: ok, if it's got a bug I'mm'a just rerun. :/02:44
=== thumper is now known as thumper-afk
menn0wallyworld, babbageclunk or anastasiamac: quick help fix. https://github.com/juju/juju/pull/734303:48
wallyworldsure03:48
menn0axw: ping03:49
axwmenn0: pong03:49
menn0axw: I had a poke at this: http://reports.vapour.ws/releases/5227/job/run-unit-tests-win2012-amd64/attempt/372003:50
menn0i know what the problem is but I don't know what the correct fix is03:50
babbageclunkmenn0: reading now03:50
axwdoh03:50
axwmenn0: I can take it over if you like03:50
menn0axw: ok, happy to hand it over (there's a LK card already)03:50
axwokey dokey03:51
menn0axw: i'm curious though, are those paths supposed to by unix only?03:51
axwmenn0: yep03:51
axwmenn0: storage is not currently supported on windows03:51
menn0axw: ok, then the tests are slightly wrong03:51
menn0axw: the tests shouldn't use filepath.FromSlash03:52
wallyworldmenn0: i added a suggestion, but lgtm03:52
menn0axw: and BlockDevicePath shouldn't use filepath.Join03:52
menn0wallyworld: cheers03:52
axwmenn0: yup. I'll fix that up, thanks.03:53
menn0axw: cheers03:53
menn0wallyworld: good call. doing that now.03:56
babbageclunkmenn0: D'oh, took too long with my comments.04:03
anastasiamacbabbageclunk: but ur comments were totally worth it :) i wondr if i could give menn0 a resources related bug for jumping the gun?... babbageclunk, want to negotiate it for me? :D04:07
axwmenn0: mind reviewing? https://github.com/juju/juju/pull/734404:09
menn0axw: otp but yes afterwards04:10
menn0babbageclunk: good comments, i'll follow up04:10
babbageclunkwallyworld: got a moment for a sanity check? Or are you also otp?04:10
babbageclunkmenn0: ok thanks!04:10
babbageclunkmenn0: especially on the diaeresis one.04:10
wallyworldbabbageclunk: otp, give me a little time04:11
babbageclunkwallyworld: I mean, you're my second choice in this case anyway - is thumper-afk in the same call?04:15
axwthanks jam04:25
wallyworldbabbageclunk: tim isn't on this call04:29
babbageclunkah, ok - he's actually afk then, presumably04:30
wallyworldbabbageclunk: free now04:56
menn0babbageclunk: luckily the merge attempt failed so I can get your suggestions in04:57
menn0but not the diaeresis one :p04:57
babbageclunkmenn0: tsk!04:57
menn0pre-existing maybe?04:57
babbageclunkmenn0: I mean, I was just kidding with the tsk!04:59
babbageclunkI'm fine with no hyphen.05:00
babbageclunkwallyworld: hey, so my confusion was about this bit of code: https://github.com/juju/juju/blob/develop/worker/provisioner/provisioner_task.go#L73905:01
babbageclunkI'd expect that I'd be able to see those "failed to start instance" messages from the provisioner in the log, but I don't05:01
babbageclunk(Oh, this is re:that maas timeout bug btw)05:02
wallyworlddid you figure it out?05:02
babbageclunkwallyworld: no05:02
wallyworldare you sure that code is being run?05:02
babbageclunkwallyworld: Well, that's the bit that thumper-afk was saying was the problem, and the thing that would be affected by changing the retry count.05:03
wallyworldsure, but maybe the code path isn't hitting that line in your deployment. since when do you believe what tim says :-P05:05
babbageclunkwallyworld: it makes sense that that would be the problem, but if it is I don't understand why I can't see that in the log.05:05
babbageclunkwallyworld: Oh, I mean in the log file attached to the bug.05:05
wallyworldin the past i've had similar issues and it did turn out the code wasn't running are you sure the right agent binary is being used05:05
thumper-afk:P05:05
wallyworldhave you updated your source code post beta405:06
wallyworldotherwise you'll be using agent from simplestreams05:06
wallyworldunless you used --build-agent05:06
babbageclunkwallyworld: I'm not running anything - I don't know how to make my maas timeout in the same way.05:06
wallyworldbabbageclunk: does the log indicate that this error is bubbles up? "cannot start instance for machine"05:08
wallyworldi'm trying to establish if we really do get int that retry loop05:08
babbageclunkRight - I don't think we do, because we don't see that logging.05:08
babbageclunkWhich would indicate that the retry count isn't the problem. (Unless we wouldn't expect to see the message in this log for some reason.)05:09
wallyworldor maybe we do and err is nil05:10
wallyworldmaas says it will start the instance05:10
wallyworldit then goes through a lifecycle or allocating->deployed or something05:10
babbageclunkwallyworld: in which case changing the retry count won't help.05:10
wallyworldso juju never gets to retry as it doesn't think it needs to05:10
wallyworldexaxtly05:10
wallyworldit may be the wrong bit of code is being looked at here?05:11
wallyworldwhat does juju do sfter starting the instance - doe sit poll it until it goes to deployed state05:11
wallyworlddoesn't look like it05:12
wallyworldit just assumes it is started05:12
babbageclunkwallyworld: hang on - trying to chase through the provider code.05:12
wallyworldmaybe that's a bad assumption. i know with openstack startinstance returns but that just means the start request is accepted05:13
wallyworldthe instance itself then goes through a lifecycle - building, blah....05:13
wallyworldbefore it is ready05:13
babbageclunkwallyworld: ok, thanks for confirming my thinking - still chasing then05:19
wallyworldnp, good luck!05:19
jamaxw: ping07:11
axwjam: pong07:11
jamaxw: you have a PR #6326 that is about 8 months old07:11
jamaxw: azure using simplestreams07:11
jamis there something we should close/unblock/???07:11
axwjam: yeah, it got stalled numerous times. we need to ensure streams.canonical.com has the azure bits. probably not going to happen until 2.3 at the earliest now07:12
axwjam: we were close to a resolution at barcelona, but now aaron's gone and he was handling the streams side07:12
jamaxw: should we close that PR and come back to it?07:12
axwjam: can do I guess, though the code shouldn't change much07:13
axwjam: is it harmful leaving it there? distracting?07:13
jamI'd like to get us to PRs that are actionable07:13
jamso when we do a scan we can know rather than having a bunch of stuff we have to ignore07:13
axwjam: fair enough. no drama to recreate it later, I'll close07:13
jamwe're a bit far from that, but it would be nice07:13
axwI agree07:14
jamaxw: you also have a 1.8 branch that didn't land 3 days ago?07:14
axwjam: yep, I will get back to that one real soon. needs some fixes in the 1.25 branch07:15
* axw creates a card to get it done07:15
=== frankban|afk is now known as frankban
axwwallyworld_: can you please review https://github.com/juju/juju/pull/7346 tomorrow, unless some kind soul does it overnight? the meat and potatoes are in the first two commits09:44
wallyworld_axw: looking11:16
wpkAny volunteers? https://github.com/juju/juju/pull/734813:35
gsamfiraheya16:28
gsamfiraanyone want to look over a small PR? https://github.com/juju/utils/pull/27916:29
=== frankban is now known as frankban|afk
balloonsmenn0, not sure who all else is in the tech board, but I'd appreciate a look at https://github.com/juju/juju/pull/735021:50
wallyworldbabbageclunk: did you find where juju is not sufficiently retrying the provisioning?22:04
babbageclunkwallyworld: the Juju log attached to that bug doesn't show the provisioning failure - I've asked for another one that does. I want to check with thumper that he's sure that's the problem - it seems likely, but it'd be good to have the log showing it.22:13
wallyworldbabbageclunk: it would be good yes. we could though make a preemptive fix and increase retry timeout based on knowledge that machines can take a while to provision; this will unblock the fix and if/when we get better logs we can confirm22:15
babbageclunkwqll22:15
babbageclunkoops22:15
babbageclunkwallyworld: yeah, that's my thinking too.22:16
babbageclunkwallyworld: https://github.com/juju/juju/pull/735122:16
wallyworldlooking22:16
babbageclunkIt's pretty short22:17
wallyworldbabbageclunk: i could comment but i won't :-)22:17
* babbageclunk sniggers22:17
wallyworldbabbageclunk: is 10 x 10s enough?22:18
wallyworldbabbageclunk: i wonder also, we should surface the retry attempts in status22:18
wallyworldthere's an api to surface progress22:19
wallyworldwe could ensue the message says "Waiting for machine to be provisioned, attempt 1 or 10" or something22:19
babbageclunkwallyworld: machine.SetInstanceStatus - yeah, could do. I'll add that.22:19
wallyworldso the user can see stuff is happening22:19
wallyworld1 of 1022:20
babbageclunkAccording to thumper, this happens occasionally (I think he said something like 10% of the time?), so normally 3 tries is enough. 10 seems like a reasonable number to bump it to in that case?22:22
babbageclunkwallyworld: oh, it turns out it already does that: failed to start instance (%s), retrying in %v (%d more attempts)22:24
wallyworldcool, ok22:24
wallyworldlet's land it and take a view22:25
babbageclunkwallyworld: kind of an annoying amount of digging around in logs for a 1-line PR22:26
wallyworldit' the old mechanic paradigm - the fix is trivial, you just have to know where to apply the hammer22:26
babbageclunksure22:27
babbageclunkwallyworld: ok, I'll move onto getting rid of that annoying unknown collection "remoteApplications" message.22:28
wallyworld+1 ty22:29
wallyworldassuming it was the last pr to land which introduced it22:29
wallyworldi think it must have been because it was all ok before then IIANM22:29
babbageclunkwallyworld: I don't think it was the last PR - it's in the logs on bug 1688028 too22:34
mupBug #1688028: Juju fails to request deploy after busy node returns OK <cdo-qa> <cdo-qa-blocker> <juju:In Progress by 2-xtian> <https://launchpad.net/bugs/1688028>22:34
wallyworldok, otp, will catch up soon22:34
gsamfiraanybody want to take a stab at a couple of really tiny PRs? :)22:35
babbageclunkgsamfira: sure!22:52
wallyworldthumper: i had also pinged ante and andres in #juju to confirm that maas storage issue is fixed so hopefully we'll be able to close the issue real soon22:57
thumperwallyworld: I updated the bug to incomplete :)22:58
wallyworldyay :-)22:58
wallyworldhopefully nothing more to do22:58
gsamfirababbageclunk: https://github.com/juju/utils/pull/279 and https://github.com/juju/charm/pull/22823:10
gsamfirababbageclunk: thanks! ^_^23:10
babbageclunkgsamfira: Why do you think using defer might lead to leaked file handles?23:22
gsamfirababbageclunk: It was an improper use of the word. The whole defer fh.Close() pattern, while I get is useful to close a file even if a panic happens along the way, can cause ulimit errors when, for example unarchiving an archive with many files.23:27
babbageclunkgsamfira: ok, I thought it might be something like that - so in a function with a loop extracting many files.23:28
gsamfirababbageclunk: yap23:28
gsamfirababbageclunk: there is another issue with that pattern. On Windows for example, if you try something like (pseudo code follows: fh := Open("myfile"); defer fh.Close; fh.Write("hello"); io.Move("myfile", "awesome_name")23:30
gsamfirait will fail23:30
babbageclunkgsamfira: That said, I think the places you're changing in the utils PR would be better done by changing the deferred closes to be sync+closes. The loop problem doesn't apply there.23:30
gsamfira(the move at least)23:30
babbageclunkgsamfira: oh, right. Is that why you moved the sync+close to be before the call to the change func in AtomicWriteFileAndChange?23:30
gsamfirawould it not be useful to return any errors while doing a sync/close?23:30
gsamfirayap23:31
gsamfiraalso in zip.go. in the case of copyTo23:31
babbageclunkgsamfira: for returning errors, yes, but not if we're already returning an error - we wouldn't want to clobber it.23:33
thumpermenn0, wallyworld: I won't be at the tech-board meeting today23:34
thumperhave taxi duty :-|23:34
menn0thumper: ok23:34
wallyworldthumper: no worries, we'll cover for you :-)23:34
gsamfirababbageclunk:  not really clobbering, I think. We still return the original error if that's where it happens. But we also get an error returned if we fail to flush to disk, or close the file...which I think is an important error to know :)23:35
gsamfiraotherwise we cannot guarantee that the config file or state file is consistent23:35
babbageclunkOh, I mean, it would be better to make the defer do sync+close and return any error (avoiding clobbering an error already coming back) than just putting the sync+close on the happy path.23:36
babbageclunkgsamfira: ^23:37
babbageclunkdoes that make sense?23:37
babbageclunkgsamfira: I'll put what I mean into the review.23:40
gsamfirababbageclunk: yup. Makes sense23:40
babbageclunkgsamfira: cool23:40
gsamfirababbageclunk: sounds good. helps to add context :)23:40
gsamfiraI'll address it in the morning. Grabbing some sleep for tonight :)23:41
gsamfirathanks for looking over it!23:41
gsamfirababbageclunk: ^23:41
babbageclunkgsamfira: no worries! I started typing it up and it's unwieldy enough that I don't think it's worth it - I think yours is fine in that case.23:49

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!