/srv/irclogs.ubuntu.com/2015/06/29/#juju-dev.txt

menn0thumper: the next fix for bug 1468994: http://reviews.vapour.ws/r/2042/00:13
mupBug #1468994: leadership settings documents written out without env-uuid field <juju-core:In Progress by menno.smits> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1468994>00:13
menn0thumper: also, not sure if this is relevant to what you're looking at: https://bugs.launchpad.net/juju-core/+bug/1457225/comments/900:14
thumperdon't think that is related to this current issue00:15
menn0thumper: k00:18
thumperdavecheney: I don't suppose there is a way for a function in Go to annotate the call stack?00:51
thumperdavecheney: that would be so handy when analysing panics00:51
davecheneythumper: no, sorry, there is not01:30
davecheneyi did see one tool published01:30
davecheneybut I cannot remember its name at the moment01:30
davecheneyit didn't annotate stack traces01:30
thumperoh well01:30
davecheneybut it did do some kind of analysis/grouping01:31
thumperdavecheney: \o/01:35
thumperI have a panic with the workers showing restarting01:35
* thumper digs01:35
davecheneythumper: paste ?01:36
thumpersure...01:36
* thumper finds the start01:37
thumperdavecheney: http://paste.ubuntu.com/11791044/01:38
thumperdavecheney: it is even one of the tests known to cause issues01:38
thumperdavecheney: line 301 is the a.Stop call from the test (pretty sure)01:39
davecheneyoww, this  paste is hurting my machine01:40
thumperline 347 shows a worker has said it started after everyone had been killed01:41
davecheneyis this the "did not reach expected state" ?01:41
thumperline 255 also looks weird01:42
davecheneythe line of the output ?01:42
thumperdavecheney: I think it is the case of the goroutine not fully starting before being told to stop01:42
davecheneyor another line ?01:42
thumper355 sorry01:42
thumperdavecheney: pretty sure this is all due to timing around the workers that had been asked to start haven't fully started01:44
thumperdavecheney: and then told to die01:44
thumperand they all get confused01:44
* thumper digs into timing a bit more01:44
davecheneyhow are workers told to stop ?01:44
* thumper rages01:46
thumperdavecheney: remember when I said that this would be due to a worker finishing with a non-fatal error?01:46
thumperlook at line 57901:46
davecheneyuh h01:47
davecheneyuh oh01:47
davecheneyi think i even logged an issue about that permission denied error01:47
thumperand another...01:47
davecheneyasking, why does it do this01:47
thumper616 - 61901:48
thumperNFI01:48
thumpertest isolation failure01:48
thumperwhere it shouldn't be looking at /var/lib/juju01:48
thumperhowever, that isn't the only problem01:49
davecheneyisFatal is passed into the runner01:49
davecheneyso there is no knowing what it considers fatal01:49
thumperI'm prepared to bet money that this is another case of multiple unexpected issues causing problems with each other01:49
* thumper digs more01:49
davecheneyi'd back that01:49
thumperI'm so happy to get this logging...01:50
davecheneyso, i see lots of start "worker"01:51
davecheneybut no corresponding stopped "worker"01:52
davecheneythumper01:55
davecheneyhow does killWorker work ?01:55
thumperyou should see killing and exited01:56
thumperkilling is the request to kill it01:56
davecheneyinfo.worker.Kill() isn't plumbed through to the worker01:56
thumperand exited when it is done01:56
davecheneyit's not passed to start()01:56
thumperit is all a bit convoluted01:56
davecheneythumper: hangout call ?01:58
thumpergimmie a few minutes... in the middle of something01:58
davecheneyi can see a case where the started worker's kill channel will not be hooked up correctly02:01
davecheneyie, the kill signal is delivered to the previous invocation of the worker02:01
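[Editor's sketch] The failure mode davecheney describes above can be illustrated with a minimal Go program. This is not the juju runner code: the worker type, the registry map, and the "upgrader" id are hypothetical stand-ins, assumed only to show how a kill aimed at a handle captured before a restart lands on the previous invocation while the restarted worker keeps running.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a runner-managed worker.
type worker struct {
	name string
	kill chan struct{}
}

func newWorker(name string) *worker {
	return &worker{name: name, kill: make(chan struct{})}
}

// Kill signals the worker by closing its kill channel.
func (w *worker) Kill() { close(w.kill) }

// killed reports whether Kill has been called on this worker.
func (w *worker) killed() bool {
	select {
	case <-w.kill:
		return true
	default:
		return false
	}
}

// staleKill keeps a handle taken before the restart and kills through it,
// so the signal reaches the previous invocation only.
func staleKill() (oldKilled, newKilled bool) {
	registry := map[string]*worker{}
	registry["upgrader"] = newWorker("upgrader#1")
	stale := registry["upgrader"]                  // captured before the restart
	registry["upgrader"] = newWorker("upgrader#2") // the worker is restarted
	stale.Kill()
	return stale.killed(), registry["upgrader"].killed()
}

// freshKill looks the worker up at kill time, so the current invocation
// receives the signal.
func freshKill() (oldKilled, newKilled bool) {
	registry := map[string]*worker{}
	first := newWorker("upgrader#1")
	registry["upgrader"] = first
	registry["upgrader"] = newWorker("upgrader#2")
	registry["upgrader"].Kill()
	return first.killed(), registry["upgrader"].killed()
}

func main() {
	oldKilled, newKilled := staleKill()
	fmt.Println("stale handle:", oldKilled, newKilled)
	oldKilled, newKilled = freshKill()
	fmt.Println("fresh lookup:", oldKilled, newKilled)
}
```

In the stale case the restarted worker never sees the kill, which would match the symptom discussed here of workers reporting that they started after everything had been told to stop.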
thumperdavecheney: 1:102:02
davecheney                        workerInfo.worker = nil02:09
davecheney                        go runner.runWorker(workerInfo.restartDelay, info.id, workerInfo.start)02:09
davecheney                        workerInfo.restartDelay = RestartDelay02:09
davecheney                                } else {02:14
davecheney                                        logger.Errorf("exited %q: %v", info.id, info.err)02:14
davecheney                                        workerInfo.worker = nil02:14
davecheney                                }02:14
davecheneythumper, menn0 https://github.com/juju/juju/pull/266802:43
menn0davecheney: thumper has lost power at home02:44
menn0davecheney: looking02:44
davecheneyshtter02:53
menn0davecheney: I don't completely understand this change02:57
davecheneymenn0: hangout ?02:57
menn0davecheney: AFAICS killAll is only called when the runner's tomb is dying02:57
davecheneycorrect02:57
davecheneyit's hard to explain in irc02:57
davecheneyit's taken thumper and I all morning02:58
menn0davecheney: ok standup hangout02:58
thumperpower is back03:02
thumpermenn0, davecheney: I found something very interesting03:03
thumperapi worker starts: upgrader, upgrade-steps, and api-post-upgrade03:03
thumperhowever when the kill order is given03:03
thumperwe see:03:04
thumperkilling upgrader03:04
thumperkilling api-post-upgrade03:04
thumperugh03:04
thumperjust worked out that the upgrade-steps have just finished03:04
thumperhowever03:04
thumpereven though we say killing api-post-upgrade03:04
thumperit doesn't kill any of its workers03:04
thumpermenn0: standup hangout?03:07
thumpermenn0, davecheney: logging for 1.24 http://reviews.vapour.ws/r/2045/05:12
thumperhaven't been able to reproduce the panic yet with this logging in05:12
thumperbut still running tests05:12
menn0thumper: reviewed, just some small suggestions05:18
thumperk05:18
davecheneythumper: looking05:25
thumperdavecheney: sok, menn0 reviewed05:25
thumperdavecheney: however with this logging, can't reproduce the error05:25
davecheneythumper: http://paste.ubuntu.com/11791616/05:32
davecheneyrun it under the stress05:32
davecheneycopy that to ~/stress.bash05:33
davecheneycd $PATH05:33
davecheneybash ~/stress.bash05:33
menn0thumper: with that you might be able to reproduce the bug more quickly by just running one of the tests that is known to trigger the issue instead of all the tests (maybe?)05:34
thumperok05:34
* thumper stresses the test05:53
thumperdavecheney: even with the stress script running just the upgrade suite, it isn't failing06:00
davecheneyshitter06:02
thumper25 times in a row it is good now06:06
* thumper tries the entire package06:08
thumperI think I have it again06:17
thumperbut we are no further than we were before...06:17
thumpercan't tell... waiting for CI to throw it up06:19
* thumper is done06:19
dimiternjam, are you around for our 1:1?07:05
jamdimitern: just trying to connect now07:05
jamdimitern: I think google's creds need me to login again, but it isn't giving me the window07:06
dimiternjam, ok07:06
mupBug #1453096 opened: Machine agent state changes not included in the mega-watcher <cloud-installer> <landscape> <juju-core:New> <juju-gui:Triaged> <https://launchpad.net/bugs/1453096>08:21
dimiterndooferlad, TheMue, PTAL http://reviews.vapour.ws/r/2049/08:56
katcoericsnow: standup14:04
mupBug #1469731 opened: Leader should probably change when removing unit <juju-core:New> <https://launchpad.net/bugs/1469731>14:16
wwitzel3katco: you enjoyed canceling those, I can tell15:03
katcowwitzel3: haha15:03
katcowwitzel3: i always enjoy canceling meetings ;)15:04
mupBug #1469777 opened: juju-local falsely claimed to be missing <ci> <local-provider> <juju-core:New> <https://launchpad.net/bugs/1469777>16:23
mupBug #1469799 opened: Agent tests fail with no output <ci> <ppc64el> <test-failure> <juju-core:Incomplete> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469799>16:53
mupBug #1469799 changed: Agent tests fail with no output <ci> <ppc64el> <test-failure> <juju-core:Incomplete> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469799>16:59
mupBug #1469799 opened: Agent tests fail with no output <ci> <ppc64el> <test-failure> <juju-core:Incomplete> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469799>17:02
mupBug #1469807 opened: ListSuite.TestOutputFormats fails on Windows <ci> <test-failure> <windows> <juju-core:Incomplete> <https://launchpad.net/bugs/1469807>17:02
mupBug #1469807 changed: ListSuite.TestOutputFormats fails on Windows <ci> <test-failure> <windows> <juju-core:Incomplete> <juju-core net-cli:Triaged> <https://launchpad.net/bugs/1469807>17:14
mupBug #1469807 opened: ListSuite.TestOutputFormats fails on Windows <ci> <test-failure> <windows> <juju-core:Incomplete> <juju-core net-cli:Triaged> <https://launchpad.net/bugs/1469807>17:23
=== kadams54 is now known as kadams54-away
katcoericsnow: got time to chat about the api server abstractions? wwitzel3 natefinch welcome as well18:31
ericsnowkatco: sure18:31
katcoericsnow: moonstone18:31
mupBug #1469844 opened: Juju bootstrap fails with nonce mis-match error <bootstrap> <ci> <intermittent-failure> <maas-provider> <juju-core:New> <juju-core devices-api-maas:New> <juju-core feature-proc-mgmt:New> <https://launchpad.net/bugs/1469844>18:56
=== kadams54-away is now known as kadams54
katconatefinch: hey do you have a branch i can look at for the server methods? i need to see how you're using the api server as an example for how to generalize my stuff19:06
katcojam: hey you around for a pretty simple question about registering facades?19:07
natefinchkatco: yeah19:07
jamkatco: what's up?19:07
katcojam: here: https://github.com/juju/juju/blob/master/apiserver/common/registry.go#L12619:08
katcojam: is there any reason newFunc couldn't be of type func(*state.State, *common.Resources, common.Authorizer) (interface{}, error)?19:08
jamkatco: so this isn't the code as I wrote it, so I'll have to figure out why it isn't19:09
jamsomeone added the "ForFeature" versions, and changed something.19:09
natefinchkatco: sorta rough still and in the middle of it, so it doesn't compile, but the code pretty much makes sense: https://github.com/natefinch/juju/blob/proc-server-api/process/api/server/server.go   and https://github.com/natefinch/juju/blob/proc-server-api/process/api/server-args.go19:09
katcojam: natefinch no worries at all. the stuff i'm looking for is hand-wavey anyway19:10
jamkatco: I remember now19:10
jamconcrete return types19:10
jamkatco: specifically, our NewFoo() functions returns *Type objects, and we reflect on that Type to determine the API19:11
katcojam: can't we cast the interface to the concrete type using reflection?19:11
jamkatco: you can't pass a function that returns a concrete type to an interface that says it returns interface{}19:11
katcojam: sorry, i meant have the functions go ahead and return interface{}, but where the type of that interface is important, discover its true type via reflection19:12
jamkatco: http://play.golang.org/p/XjucCOijN919:12
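[Editor's sketch] jam's point about function types can be checked with the reflect package. The example below is an illustration only and is not the content of the playground link above; Foo, NewFoo, and the helper names are hypothetical. It shows that a func() *Foo is not assignable to func() interface{}, even though a *Foo value is assignable to interface{}, and that an explicit wrapper closure is needed to bridge the two.

```go
package main

import (
	"fmt"
	"reflect"
)

// Foo and NewFoo are hypothetical; they stand in for a facade type and
// its constructor.
type Foo struct{}

func NewFoo() *Foo { return &Foo{} }

// assignableToGeneric reports whether fn could be passed where a
// func() interface{} is expected.
func assignableToGeneric(fn interface{}) bool {
	generic := reflect.TypeOf((func() interface{})(nil))
	return reflect.TypeOf(fn).AssignableTo(generic)
}

// wrapNewFoo adapts NewFoo to the generic signature with a closure.
func wrapNewFoo() func() interface{} {
	return func() interface{} { return NewFoo() }
}

func main() {
	fmt.Println(assignableToGeneric(NewFoo))       // concrete return type
	fmt.Println(assignableToGeneric(wrapNewFoo())) // wrapped to interface{}
}
```

Go function types are only assignable when their parameter and result types are identical, so the first check reports false and the second reports true.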
jamkatco: we can't do registration time lookup for the signature of the API, (we have to actually call it)19:13
jamkatco: but more than that19:13
jamif you want to test NewFoo19:13
jamyou really want a Foo there19:13
jamnow we can rewrite all of the tests to use NewFoo().(Foo)19:13
katcojam: well the tests are the easy problem, right? yeah what you just said19:13
katcojam: don't want to take too much of your time after-hours... we're just fiddling with that area of the code and thought it might make more sense to be concrete about the function type. if i can get the reflection to work out, are you opposed to that approach?19:15
jamkatco: so there is RegisterFacade which takes the exact thing you were asking19:15
jambut it also requires you to pass in a reflect.Type19:15
katcojam: ah so there is19:15
jamkatco: IMO its really nice to just be able to have a normal NewFoo() function19:16
jamand then be able to Register(NewFoo)19:16
jamand both NewFoo acts like other NewFoo functions (returning a Foo) and it has the type that you need to register, so you don't have to import reflect into all of your packages.19:16
katcojam: the only issue is that we've taken a (potential -- if the reflection bit works out) compile-time error and made it runtime19:16
jamkatco: how do you mean a compile time error? because you're changing the signature into interface{}19:17
jamwhich means it has no signature19:17
jamso the thing you *actually* care about19:17
jamwhich is that you got the right type when you call it19:17
jamis only determined *when it is called*19:17
jamkatco: with this way, you find out during Register()19:17
jamwhich happens during init()19:18
jamwhich means that every juju process finds out really fast19:18
katcojam: so, iirc, the rpc stuff expects a type of interface{} anyway, so it can reflect and find the methods, etc.19:18
katcojam: it's the registration that requires the specific signature, right?19:19
jamkatco: its the registration that allows a Type which it will then use to expose the API and it confirms at actual call time that the object conforms to the concrete Type that was registered.19:19
katcojam: so the bit i'm thinking we can still do that, even if the factory methods returned an interface19:20
katcojam: oops s/the bit//19:20
jamso there is an 'interface{}' level under the covers, but it means that the infrastructure helps you DTRT (I think)19:20
jamkatco: so you can certainly do it, and use RegisterFactory instead of RegisterStandardFactory19:21
katcojam: i'm wondering if you can do it w/o passing in the reflect.Type as well19:21
jamkatco: how do you know what methods will be available in order to define the API?19:21
katcojam: i'd have to do some research, but my gut says yes. this code says someone else did the research and said no however ;)19:22
jamkatco: so the original code was all static analysis19:22
katcojam: i'm thinking you can reflect on an interface{}, get its real type, and then do as we do now19:22
jamwe had 1 top level type that exposed lots of GetFoo() Type methods19:22
jamkatco: you could decide that you can call any method you like from the API and we'll call the function and then later tell you "the object we created doesn't support the method you wanted"19:23
katcojam: that's what we have now anyway isn't it? the api layer just uses strings to call methods?19:24
jamkatco: it rejects the requests before calling the factory19:24
katcojam: ah i see, so the type isn't constructed before failing19:24
jamkatco: *object* here19:24
katcojam: well, hows this: its not critical to what i'm working on. but if i find some time, i'll do some experiments, and if i think its headed in a beneficial direction given what points you've raised, i'll submit a pr and we can shoot holes in it19:25
katcos/its/it's/19:26
wwitzel3katco: I'm back if you still want to chat19:29
katconatefinch: ty... so how you're using state hasn't really been addressed yet?19:29
katcowwitzel3: ty, ericsnow and i talked. think i'm good for now19:29
natefinchkatco: you mean like passing in an interface?  No, I hadn't written that yet, but you can probably figure it out by looking at the commented out functions that call state functions which ericsnow is writing.19:35
katcoericsnow: natefinch: can we hop in moonstone rq? i want to run an approach by you. wwitzel3 you are welcome too19:36
natefinchkatco: seems like the RegisterStandardFacade function should be in charge of that... it's already passing in state. It could instead pass in an interface19:36
ericsnowkatco: k19:36
natefinchkatco: sure...  I have to go in 25 minutes, though, have a quick errand to run.19:37
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
thumpermramm2: weren't we going to have a chat about now?20:28
wwitzel3ericsnow: ping if you're around20:29
ericsnowwwitzel3: sure20:29
=== kadams54 is now known as kadams54-away
mbruzek1is the juju-dev@lists.ubuntu.com a private or public email list?21:08
mbruzek1nevermind it looks public to me21:08
mbruzek1https://lists.ubuntu.com/archives/juju-dev/2015-June/thread.html21:09
mupBug #1437445 changed: worker/uniter: FAIL: util_test.go:665: "never reached desired status" <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1437445>21:39
katcoanastasiamac: running just a little late. brt22:02
anastasiamackatco: nps22:03
sinzuithumper: ppc64el is retesting with a cap on procs22:05
thumpersinzui: I read that as "crap on procs"22:05
sinzui:)22:05
sinzuithumper: also, maybe we can get the mem increased to 16 for this one instance22:06
=== kadams54-away is now known as kadams54
thumpersinzui: is the power test running?23:07
thumpersinzui: how's it going?23:07
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
=== kadams54 is now known as kadams54-away
