[00:13] thumper: the next fix for bug 1468994: http://reviews.vapour.ws/r/2042/
[00:13] Bug #1468994: leadership settings documents written out without env-uuid field
[00:14] thumper: also, not sure if this is relevant to what you're looking at: https://bugs.launchpad.net/juju-core/+bug/1457225/comments/9
[00:15] don't think that is related to this current issue
[00:18] thumper: k
[00:51] davecheney: I don't suppose there is a way for a function in Go to annotate the call stack?
[00:51] davecheney: that would be so handy when analysing panics
[01:30] thumper: no, sorry, there is not
[01:30] i did see one tool published
[01:30] but I cannot remember its name at the moment
[01:30] it didn't annotate stack traces
[01:30] oh well
[01:31] but it did do some kind of analysis/grouping
[01:35] davecheney: \o/
[01:35] I have a panic with the workers showing restarting
[01:35] * thumper digs
[01:36] thumper: paste ?
[01:36] sure...
[01:37] * thumper finds the start
[01:38] davecheney: http://paste.ubuntu.com/11791044/
[01:38] davecheney: it is even one of the tests known to cause issues
[01:39] davecheney: line 301 is the a.Stop call from the test (pretty sure)
[01:40] oww, this paste is hurting my machine
[01:41] line 347 shows a worker has said it started after everyone had been killed
[01:41] is this the "did not reach expected state" ?
[01:42] line 255 also looks weird
[01:42] the line of the output ?
[01:42] davecheney: I think it is the case of the goroutine not fully starting before being told to stop
[01:42] or another line ?
[01:42] 355 sorry
[01:44] davecheney: pretty sure this is all due to timing, where the workers that had been asked to start haven't fully started
[01:44] davecheney: and are then told to die
[01:44] and they all get confused
[01:44] * thumper digs into timing a bit more
[01:44] how are workers told to stop ?
[01:46] * thumper rages
[01:46] davecheney: remember when I said that this would be due to a worker finishing with a non-fatal error?
[01:46] look at line 579
[01:47] uh h
[01:47] uh oh
[01:47] i think i even logged an issue about that permission denied error
[01:47] and another...
[01:47] asking, why does it do this
[01:48] 616 - 619
[01:48] NFI
[01:48] test isolation failure
[01:48] where it shouldn't be looking at /var/lib/juju
[01:49] however, that isn't the only problem
[01:49] isFatal is passed into the runner
[01:49] so there is no knowing what it considers fatal
[01:49] I'm prepared to bet money that this is another case of multiple unexpected issues causing problems with each other
[01:49] * thumper digs more
[01:49] i'd back that
[01:50] I'm so happy to get this logging...
[01:51] so, i see lots of start "worker"
[01:52] but no corresponding stopped "worker"
[01:55] thumper
[01:55] how does killWorker work ?
[01:56] you should see killing and exited
[01:56] killing is the request to kill it
[01:56] info.worker.Kill() isn't plumbed through to the worker
[01:56] and exited when it is done
[01:56] it's not passed to start()
[01:56] it is all a bit convoluted
[01:58] thumper: hangout call ?
[01:58] gimmie a few minutes... in the middle of something
[02:01] i can see a case where the started worker's kill channel will not be hooked up correctly
[02:01] ie, the kill signal is delivered to the previous invocation of the worker
[02:02] davecheney: 1:1
[02:09] workerInfo.worker = nil
[02:09] go runner.runWorker(workerInfo.restartDelay, info.id, workerInfo.start)
[02:09] workerInfo.restartDelay = RestartDelay
[02:14] } else {
[02:14] logger.Errorf("exited %q: %v", info.id, info.err)
[02:14] workerInfo.worker = nil
[02:14] }
[02:43] thumper, menn0 https://github.com/juju/juju/pull/2668
[02:44] davecheney: thumper has lost power at home
[02:44] davecheney: looking
[02:53] shtter
[02:57] davecheney: I don't completely understand this change
[02:57] menn0: hangout ?
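(Editor's sketch, not part of the log.) The race described around 02:01, where a kill request lands while a worker is between invocations and the restarted worker never hears about it, can be illustrated with a small Go model. This is a simplified sketch and not juju's worker.Runner: the `worker` and `slot` types and their methods are hypothetical names invented for illustration, and the restart is modelled as plain sequential calls rather than goroutines.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for one invocation of a restartable worker.
type worker struct {
	gen    int
	killed bool
}

func (w *worker) Kill() { w.killed = true }

// slot tracks one named worker across restarts (hypothetical type).
type slot struct {
	current  *worker // nil while a restart is pending
	stopping bool    // set once a kill has been requested
	nextGen  int
}

// kill requests a stop. It records the request even when no worker is
// currently running, so a later start cannot miss it.
func (s *slot) kill() {
	s.stopping = true
	if s.current != nil {
		s.current.Kill()
	}
}

// started records a freshly started worker. If a kill arrived while the
// restart was pending, the new worker is killed straight away.
func (s *slot) started() *worker {
	s.nextGen++
	w := &worker{gen: s.nextGen}
	s.current = w
	if s.stopping {
		w.Kill()
	}
	return w
}

// exited clears the slot when the running worker finishes.
func (s *slot) exited() { s.current = nil }

func main() {
	s := &slot{}
	first := s.started()
	s.exited() // first invocation exits with a non-fatal error; a restart is pending
	s.kill()   // the kill order arrives while nothing is running
	second := s.started()
	fmt.Println(first.killed, second.killed)
}
```

Without the `stopping` flag, `second` would start after the kill order and run on unnoticed, which matches the "started after everyone had been killed" symptom mentioned earlier in the log.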
[02:57] davecheney: AFAICS killAll is only called when the runner's tomb is dying
[02:57] correct
[02:57] it's hard to explain in irc
[02:58] it's taken thumper and me all morning
[02:58] davecheney: ok standup hangout
[03:02] power is back
[03:03] menn0, davecheney: I found something very interesting
[03:03] api worker starts: upgrader, upgrade-steps, and api-post-upgrade
[03:03] however when the kill order is given
[03:04] we see:
[03:04] killing upgrader
[03:04] killing api-post-upgrade
[03:04] ugh
[03:04] just worked out that the upgrade-steps have just finished
[03:04] however
[03:04] even though we say killing api-post-upgrade
[03:04] it doesn't kill any of its workers
[03:07] menn0: standup hangout?
[05:12] menn0, davecheney: logging for 1.24 http://reviews.vapour.ws/r/2045/
[05:12] haven't been able to reproduce the panic yet with this logging in
[05:12] but still running tests
[05:18] thumper: reviewed, just some small suggestions
[05:18] k
[05:25] thumper: looking
[05:25] davecheney: sok, menn0 reviewed
[05:25] davecheney: however with this logging, can't reproduce the error
[05:32] thumper: http://paste.ubuntu.com/11791616/
[05:32] run it under the stress
[05:33] copy that to ~/stress.bash
[05:33] cd $PATH
[05:33] bash ~/stress.bash
[05:34] thumper: with that you might be able to reproduce the bug more quickly by just running one of the tests that is known to trigger the issue instead of all the tests (maybe?)
[05:34] ok
[05:53] * thumper stresses the test
[06:00] davecheney: even with the stress script running just the upgrade suite, it isn't failing
[06:02] shitter
[06:06] 25 times in a row it is good now
[06:08] * thumper tries the entire package
[06:17] I think I have it again
[06:17] but we are no further than we were before...
[06:19] can't tell... waiting for CI to throw it up
[06:19] * thumper is done
[07:05] jam, are you around for our 1:1?
[07:05] dimitern: just trying to connect now
[07:06] dimitern: I think google's creds need me to login again, but it isn't giving me the window
[07:06] jam, ok
[08:21] Bug #1453096 opened: Machine agent state changes not included in the mega-watcher
[08:56] dooferlad, TheMue, PTAL http://reviews.vapour.ws/r/2049/
[14:04] ericsnow: standup
[14:16] Bug #1469731 opened: Leader should probably change when removing unit
[15:03] katco: you enjoyed canceling those, I can tell
[15:03] wwitzel3: haha
[15:04] wwitzel3: i always enjoy canceling meetings ;)
[16:23] Bug #1469777 opened: juju-local falsely claimed to be missing
[16:53] Bug #1469799 opened: Agent tests fail with no output
[16:59] Bug #1469799 changed: Agent tests fail with no output
[17:02] Bug #1469799 opened: Agent tests fail with no output
[17:02] Bug #1469807 opened: ListSuite.TestOutputFormats fails on Windows
[17:14] Bug #1469807 changed: ListSuite.TestOutputFormats fails on Windows
[17:23] Bug #1469807 opened: ListSuite.TestOutputFormats fails on Windows
=== kadams54 is now known as kadams54-away
[18:31] ericsnow: got time to chat about the api server abstractions? wwitzel3 natefinch welcome as well
[18:31] katco: sure
[18:31] ericsnow: moonstone
[18:56] Bug #1469844 opened: Juju bootstrap fails with nonce mis-match error
=== kadams54-away is now known as kadams54
[19:06] natefinch: hey do you have a branch i can look at for the server methods? i need to see how you're using the api server as an example for how to generalize my stuff
[19:07] jam: hey you around for a pretty simple question about registering facades?
[19:07] katco: yeah
[19:07] katco: what's up?
[19:08] jam: here: https://github.com/juju/juju/blob/master/apiserver/common/registry.go#L126
[19:08] jam: is there any reason newFunc couldn't be of type func(*state.State, *common.Resources, common.Authorizer) (interface{}, error)?
[19:09] katco: so this isn't the code as I wrote it, so I'll have to figure out why it isn't
[19:09] someone added the "ForFeature" versions, and changed something.
[19:09] katco: sorta rough still and in the middle of it, so it doesn't compile, but the code pretty much makes sense: https://github.com/natefinch/juju/blob/proc-server-api/process/api/server/server.go and https://github.com/natefinch/juju/blob/proc-server-api/process/api/server-args.go
[19:10] jam: natefinch no worries at all. the stuff i'm looking for is hand-wavey anyway
[19:10] katco: I remember now
[19:10] concrete return types
[19:11] katco: specifically, our NewFoo() functions return *Type objects, and we reflect on that Type to determine the API
[19:11] jam: can't we cast the interface to the concrete type using reflection?
[19:11] katco: you can't pass a function that returns a concrete type to an interface that says it returns interface{}
[19:12] jam: sorry, i meant have the functions go ahead and return interface{}, but where the type of that interface is important, discover its true type via reflection
[19:12] katco: http://play.golang.org/p/XjucCOijN9
[19:13] katco: we can't do registration time lookup for the signature of the API, (we have to actually call it)
[19:13] katco: but more than that
[19:13] if you want to test NewFoo
[19:13] you really want a Foo there
[19:13] now we can rewrite all of the tests to use NewFoo().(Foo)
[19:13] jam: well the tests are the easy problem, right? yeah what you just said
[19:15] jam: don't want to take too much of your time after-hours... we're just fiddling with that area of the code and thought it might make more sense to be concrete about the function type. if i can get the reflection to work out, are you opposed to that approach?
[19:15] katco: so there is RegisterFacade which takes the exact thing you were asking
[19:15] but it also requires you to pass in a reflect.Type
[19:15] jam: ah so there is
[19:16] katco: IMO it's really nice to just be able to have a normal NewFoo() function
[19:16] and then be able to Register(NewFoo)
[19:16] and NewFoo both acts like other NewFoo functions (returning a Foo) and has the type that you need to register, so you don't have to import reflect into all of your packages.
[19:16] jam: the only issue is that we've taken a (potential -- if the reflection bit works out) compile-time error and made it runtime
[19:17] katco: how do you mean a compile time error? because you're changing the signature into interface{}
[19:17] which means it has no signature
[19:17] so the thing you *actually* care about
[19:17] which is that you got the right type when you call it
[19:17] is only determined *when it is called*
[19:17] katco: with this way, you find out during Register()
[19:18] which happens during init()
[19:18] which means that every juju process finds out really fast
[19:18] jam: so, iirc, the rpc stuff expects a type of interface{} anyway, so it can reflect and find the methods, etc.
[19:19] jam: it's the registration that requires the specific signature, right?
[19:19] katco: it's the registration that allows a Type which it will then use to expose the API and it confirms at actual call time that the object conforms to the concrete Type that was registered.
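(Editor's sketch, not part of the log.) The point being argued here, that a factory with a concrete return type lets the registry learn the facade's type and methods at registration time without calling the factory, can be shown with Go's reflect package. This is a minimal illustration and not juju's registry code: `Foo`, `NewFoo`, `NewOpaque`, `register` and `hasMethod` are hypothetical names, and the factories take no arguments for brevity, whereas the real ones discussed above take state, resources and an authorizer.

```go
package main

import (
	"fmt"
	"reflect"
)

// Foo is a hypothetical facade type; its exported methods form the API.
type Foo struct{}

func (*Foo) Ping() string { return "pong" }

// NewFoo is an ordinary constructor returning a concrete type.
func NewFoo() (*Foo, error) { return &Foo{}, nil }

// NewOpaque returns interface{}, hiding the concrete type until it is called.
func NewOpaque() (interface{}, error) { return &Foo{}, nil }

var errorType = reflect.TypeOf((*error)(nil)).Elem()

// registry maps a facade name to the type its factory promises to return.
var registry = map[string]reflect.Type{}

// register inspects the factory's signature without calling it.
func register(name string, factory interface{}) error {
	t := reflect.TypeOf(factory)
	if t == nil || t.Kind() != reflect.Func {
		return fmt.Errorf("%s: factory is not a function", name)
	}
	if t.NumOut() != 2 || t.Out(1) != errorType {
		return fmt.Errorf("%s: factory must return (T, error)", name)
	}
	if t.Out(0).Kind() == reflect.Interface {
		return fmt.Errorf("%s: result type %v hides the concrete facade type", name, t.Out(0))
	}
	registry[name] = t.Out(0)
	return nil
}

// hasMethod reports whether a registered facade exposes a method,
// again without constructing the facade.
func hasMethod(name, method string) bool {
	t, ok := registry[name]
	if !ok {
		return false
	}
	_, found := t.MethodByName(method)
	return found
}

func main() {
	fmt.Println(register("Foo", NewFoo))
	fmt.Println(register("Opaque", NewOpaque))
	fmt.Println(hasMethod("Foo", "Ping"), hasMethod("Foo", "Missing"))
}
```

In this sketch the interface{}-returning factory is rejected when it is registered, and an unknown method can be refused before any facade object is built, which is the trade-off jam describes: the mistake shows up during init() rather than on the first API call.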
[19:20] jam: so the bit i'm thinking we can still do that, even if the factory methods returned an interface
[19:20] jam: oops s/the bit//
[19:20] so there is an 'interface{}' level under the covers, but it means that the infrastructure helps you DTRT (I think)
[19:21] katco: so you can certainly do it, and use RegisterFacade instead of RegisterStandardFacade
[19:21] jam: i'm wondering if you can do it w/o passing in the reflect.Type as well
[19:21] katco: how do you know what methods will be available in order to define the API?
[19:22] jam: i'd have to do some research, but my gut says yes. this code says someone else did the research and said no however ;)
[19:22] katco: so the original code was all static analysis
[19:22] jam: i'm thinking you can reflect on an interface{}, get its real type, and then do as we do now
[19:22] we had 1 top level type that exposed lots of GetFoo() Type methods
[19:23] katco: you could decide that you can call any method you like from the API and we'll call the function and then later tell you "the object we created doesn't support the method you wanted"
[19:24] jam: that's what we have now anyway isn't it? the api layer just uses strings to call methods?
[19:24] katco: it rejects the requests before calling the factory
[19:24] jam: ah i see, so the type isn't constructed before failing
[19:24] katco: *object* here
[19:25] jam: well, hows this: its not critical to what i'm working on. but if i find some time, i'll do some experiments, and if i think its headed in a beneficial direction given what points you've raised, i'll submit a pr and we can shoot holes in it
[19:26] s/its/it's/
[19:29] katco: I'm back if you still want to chat
[19:29] natefinch: ty... so how you're using state hasn't really been addressed yet?
[19:29] wwitzel3: ty, ericsnow and i talked. think i'm good for now
[19:35] katco: you mean like passing in an interface? No, I hadn't written that yet, but you can probably figure it out by looking at the commented out functions that call state functions which ericsnow is writing.
[19:36] ericsnow: natefinch: can we hop in moonstone rq? i want to run an approach by you. wwitzel3 you are welcome too
[19:36] katco: seems like the RegisterStandardFacade function should be in charge of that... it's already passing in state. It could instead pass in an interface
[19:36] katco: k
[19:37] katco: sure... I have to go in 25 minutes, though, have a quick errand to run.
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[20:28] mramm2: weren't we going to have a chat about now?
[20:29] ericsnow: ping if you're around
[20:29] wwitzel3: sure
=== kadams54 is now known as kadams54-away
[21:08] is the juju-dev@lists.ubuntu.com a private or public email list?
[21:08] nevermind it looks public to me
[21:09] https://lists.ubuntu.com/archives/juju-dev/2015-June/thread.html
[21:39] Bug #1437445 changed: worker/uniter: FAIL: util_test.go:665: "never reached desired status"
[22:02] anastasiamac: running just a little late. brt
[22:03] katco: nps
[22:05] thumper: ppc64el is retesting with a cap on procs
[22:05] sinzui: I read that as "crap on procs"
[22:05] :)
[22:06] thumper: also, maybe we can get the mem increased to 16 for this one instance
=== kadams54-away is now known as kadams54
[23:07] sinzui: is the power test running?
[23:07] sinzui: how's it going?
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
=== kadams54 is now known as kadams54-away