[00:07] mwhudson: nope
[00:07] well i tried to run the tests a while back
[00:08] but a cavelcade of blocking issues have meant I haven't looked at arm64 for going on 6 weeks
[00:08] fair enough
[00:33] thumper: sorry i missed the standup
[00:33] there are two main races remaining
[00:33] one the cert update worker, which I don't think I can fix
[00:33] and a race counting outstanding connections in the api server
[00:33] which I think I can fix
[00:34] as I'm going to be travelling for the next three weeks, i'd like to talk to you about how to fix the cert update worker
[00:34] it's a big job
[00:34] when I say fix, it needs to be rewritten
[01:53] Bug #1470297 opened: worker/uniter/storage: data race in test
[03:24] thumper: ok I have managed to eventually reproduce 1469199 by doing some very nasty things to mongodb
[03:24] oh?
[03:24] hang on
[03:25] thumper: http://paste.ubuntu.com/11802709/
[03:25] that's the script that does it
[03:25] the env that triggered the bug report only has one state server
[03:26] but mongodb briefly went from PRIMARY to SECONDARY and back again in 1 sec
[03:26] after a little playing around I found that any change to the replicaset config causes the primary to drop to secondary for a bit
[03:26] that script keeps causing that to happen
[03:27] most of the time Juju handles the mongodb disconnect as it should
[03:27] until it doesn't
[03:27] I now have an env which is in the same state as described in the bug
[03:27] no API server running
[03:28] the api worker is continually trying to get a connection but failing
[03:28] I actually think it's the state worker which is stuck/brokeen
[03:28] possibly related:
[03:28] 2015-07-01 03:20:57 DEBUG juju.mongo open.go:122 TLS handshake failed: tls: first record does not look like a TLS handshake
[03:28] 2015-07-01 03:20:57 DEBUG juju.mongo open.go:122 TLS handshake failed: local error: unexpected message
[03:29] that happened shortly before things went bad
[03:29] might be a red herring though
[03:29] * menn0 goes to instrument the state and apiserver workers
[03:29] could be...
[03:29] but then again...
[03:29] menn0: have you dropped the idea of internally catching and retrying the 'bad record MAC'?
[03:30] thumper: I'm not sure if that's the root cause here so I'm not focussing on that at the moment
[03:31] ok
[04:25] thumper: http://paste.ubuntu.com/11802397/
[04:25] current status
[04:25] some failures when run under -race
[04:25] davecheney: did you want to have a chat?
[04:25] nah
[04:25] i'm working on the apirserver race
[04:26] we'll talk tomorrow about the cert updater issue
[04:27] ok
[04:28] davecheney: I noticed a few other failures in that listing that don't look racy
[04:28] davecheney: like the runner_test.go:134: runnerSuite.TestOneWorkerStartWhenStopping one
[04:28] I might poke that with a stick
[04:29] yes, that is what I mean by some failures
[04:29] and this one: FAIL: ssh_gocrypto_test.go:137: SSHGoCryptoCommandSuite.TestCommand
[04:29] i shold have said "-race triggers some other unrelated filaures"
[04:29] SHUT
[04:29] SHIT
[04:29] that is supposed to be fixed
[04:30] which?
[04:30] oh, ignore
[04:30] and is it just with -race?
[04:30] i fixed a differernt issue with that package
[04:30] that is a straight up failure
[04:30] kk
[04:30] * thumper fetches his pointy stick
[04:32] :-(
[04:32] davecheney: when I run utils/ssh with -race I get a race
[04:35] pasetbin ?
[04:35] there was one race that needed the update to gocheck
[04:35] is there another one ?
[04:35] ah... I may not have the latest
[04:36] I'm in the 1.24 branch
[04:36] should I backport the gocheck dep fix ?
[04:36] can't hurt
[04:36] we'll be stick with 1.24 for a while
[04:39] davecheney: http://paste.ubuntu.com/11802877/
[04:39] davecheney: that is with the gocheck fix as far as I can tell
[04:39] ok
[04:39] i've seen that one before
[04:39] it doesn't happened every time
[04:39] davecheney: I went into the gopkg.in/check.v1 dir, and did git pull origin v1
[04:39] i'll put it on the list
[04:39] davecheney: I'm using go 1.4.2
[04:48] thumper: this is a proper heisenbug... I can't trigger it when I add extra logging
[04:48] yay... NOT
[04:49] thumper: https://github.com/juju/juju/pull/2694
[04:50] i wont submit this one til roger signs off on it
[04:50] * thumper looks
[04:50] tbh, i'm not 100% on the description of the fix
[04:50] but the race was 100% reproducible before
[04:50] and now it is not
[04:53] davecheney: how does this change anything?
[04:53] I don't get it
[04:53] so, there is a difference between waiting on Dying
[04:53] and waiting on Dead
[04:53] Dying happens when you use tomb.Kill
[04:53] Dead happens when someone calles tomb.Done
[04:53] so there is a window where some waiting on the tomb, on tomb.Done are still running
[04:54] also
[04:54] tbh, I cannot explain it fully
[04:54] but the logic now is straight forward
[04:54] we only start the defer chain when the listener is shutdown
[04:54] I would have thought that the wait group will be executed first (LIFO)
[04:55] ???
[04:55] defer srv.wg.Wait() // wait for any outstanding requests to complete.
[04:55] it will be, then tomb.done
[04:55] right
[04:55] but somehow the http server is accepting a final request
[04:55] i cannot explain how
[04:55] but in this new form, the listener is 100% closed before calling wg.Wait()
[04:57] seems weird, but ok
[04:57] anyway, i want roger to have a look at it
[04:57] agreed
[04:58] thumper: https://canonical.leankit.com/Boards/View/115065967/115808142
[04:58] what do you want to do with this card ?
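Editor's sketch, not the code from the pull request: the [04:54]-[04:55] exchange above turns on two things, that Go runs deferred calls in last-in, first-out order, and that the new form closes the listener before that chain starts. The record helper and the step labels below are hypothetical stand-ins for the real listener, wait group and tomb; only the ordering is meant to be illustrative.

```go
package main

import "fmt"

// record appends a step label so the execution order can be inspected.
// It is a hypothetical helper, standing in for the real shutdown calls.
func record(steps *[]string, step string) {
	*steps = append(*steps, step)
}

// shutdown mimics the ordering discussed in the log: the listener is
// closed explicitly, and only then does the deferred chain unwind.
func shutdown() []string {
	var steps []string
	func() {
		defer record(&steps, "tomb.Done") // registered first, runs last
		defer record(&steps, "wg.Wait")   // registered last, runs first (LIFO)
		record(&steps, "listener.Close")  // plain call, runs before any deferred call
	}()
	return steps
}

func main() {
	fmt.Println(shutdown())
}
```

Under these assumptions the wait for outstanding requests can only begin once nothing new can be accepted, which matches the property claimed at [04:55], though the log itself leaves open why the original ordering let a final request through.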
[04:58] * thumper looks
[04:58] I think we can remove it
[04:58] having found a different way to deal with it
[04:59] how do we record time for a card that was not landed
[04:59] how do we record time for a card that was not landed ?
[04:59] also, what's going on with the in review column
[04:59] there is more work there than in progress
[05:00] davecheney: we don't care about recording time for a card not landed
[05:00] fairy'nuff
[05:08] * thumper found a real race in the runner code
[05:08] which only raised it's ugly head in the -race test runs
[05:09] * thumper is submitting
[05:10] this *may* be the cause of the timeout we were seeing on ppc
[05:10] no... don't thinkso
[05:11] runner code ?
[05:11] a data race
[05:11] or a change in timing ?
[05:11] http://reviews.vapour.ws/r/2073/diff/#
[05:11] timing change
[05:12] but not when dying
[05:12] calling start worker, then stop worker real quick
[05:12] will not stop the worker
[05:12] because the worker hasn't told the runner it has started
[05:13] right
[05:13] that is because there is a nil info.Worker field
[05:13] ship that shit
[05:14] that code needs a shotgun rennovation
[05:33] Bug #1470345 opened: provider/local: test failure
[05:54] Bug #1470345 changed: provider/local: test failure
[05:57] Bug #1470345 opened: provider/local: test failure
=== ashipika1 is now known as ashipika
[07:18] * dimitern TIL: given func f1(strings ...string); f1([]string(nil)...) works just as well as f1([]string{}...)
=== mup_ is now known as mup
[08:58] gsamfira, ping?
[09:00] mattyw: pong
[09:01] dimitern: ping
[09:02] jam, standup?
[09:02] gsamfira, hey there, I'm just doing a code review (not yours) but there's a suggestion of something that might break windows, so I wanted your thoughts: http://reviews.vapour.ws/r/2030/diff/#
[09:02] mattyw: looking
[09:09] mattyw: Should be fine as far as I can tell.
[09:10] mattyw: should be easy to test if there are any doubts. GOOS=windows go install github.com/juju/juju/...
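Editor's sketch of dimitern's [07:18] TIL: a nil slice can be spread into a variadic parameter just like an empty one. The describe function is a hypothetical stand-in for the f1 in the log; it also reports whether the callee sees a nil slice, which is the one place the two calls differ.

```go
package main

import "fmt"

// describe is a hypothetical variadic function standing in for f1.
// It returns how many arguments arrived and whether the slice is nil.
func describe(args ...string) (n int, isNil bool) {
	return len(args), args == nil
}

func main() {
	fmt.Println(describe([]string(nil)...)) // nil slice spread as arguments
	fmt.Println(describe([]string{}...))    // empty, non-nil slice
	fmt.Println(describe())                 // no arguments at all
}
```

All three calls see zero arguments, so code that only ranges over or measures the parameter behaves the same; only an explicit nil check inside the callee can tell the forms apart.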
[09:10] gsamfira, does it look like a reasonable thing to do in windows?
[09:12] mattyw: I can not promise that errors will be the same in this scenario on both windows and Linux. So while it will probably not error out, you might not catch the error you are expecting.
[09:13] mattyw: also, debuglog is not yet supported on windows.
[09:13] gsamfira, ah ok
[09:14] mattyw: it relies on tmux and ssh, both of which are missing from windows :)
[09:14] gsamfira, good point, tmux is the thing I miss most
[09:24] gsamfira, thanks for your help
[09:26] mattyw: my pleasure :)
[10:41] TheMue, hey
[10:41] dimitern: yup
[10:41] TheMue, here's a sketch of what I think is needed - http://paste.ubuntu.com/11803914/
[10:42] TheMue, feel free to modify/simplify it, but it should include all the relevant pieces
[10:43] fwereade, have a look if you can whether I missed something? ^^
[10:43] dimitern, sure
[10:43] dimitern: looks indeed similar to my IPAddressWatcher
[10:43] dimitern: only missing the generic part and having a concrete one instead
[10:43] dimitern: nice
[10:44] TheMue, cool
[10:45] dimitern, strong +1 to :190
[10:46] fwereade, cheers
[10:47] dimitern, TheMue: otherwise looks sane to me
[10:47] TheMue, fwereade, in that case the api client-side will treat both EntityWatcher and StringsWatcher apiserver facades the same
[10:47] (save for the used facade name)
[10:48] dimitern, I think they *are* pretty much the same though -- it's just that an EntityWatcher makes more generally helpful guarantees about the semantic content of its values
[10:48] dimitern, the one quibble is whether the client-side one should be returning actual parsed tags
[10:48] yep, more than just *any* string
[10:49] fwereade: would expect it (parsed tags)
[10:49] TheMue, dimitern: yeah
[10:51] TheMue, dimitern: it would be a shame to copy-paste the client-side strings watcher just for that though -- please avoid duplication where you can
[10:52] fwereade, agreed
[10:52] fwereade: hey, that's HA by redundancy
[10:52] LOL
[10:53] fwereade, that's a fair point, but if we return []names.Tag we can't reuse the client-side strings watcher proxy
[10:54] dimitern, TheMue: yeah -- and it may be that golang effectively just forces us to copy-paste anyway
[10:54] TheMue, tags are serialized as []string in params, that's intentional as in other cases, but the client-side could very well convert them to []names.Tag
[10:54] dimitern, TheMue: but please see what you can do :)
[10:54] fwereade, sure
[10:55] dimitern: yes, paring on client side would have been my approach. I like to keep the wire protocol as simple as possible
[10:56] s/paring/parsing/
[10:59] TheMue, +1
[10:59] does anyone know if there's a time when we'd call state.Open *without* knowing the environment and state server uuids?
[11:01] because there's this really bloody inconvenient bit where we call .StateServingInfo somewhere inside Open, and use it to fill in missing fields
[11:02] * TheMue is afk, lunchtime
[11:03] and I'd like to pass it in from outside, but this branch is a monster already, so if anyone knows why it's a bad idea before I start, please tell me :-/
[11:03] but, for now, lunch sounds good :)
[11:04] fwereade, AFAIK for backwards compatibility with older versions
[11:05] fwereade, but that shouldn't make the interface awkward to use and test IMO
[11:08] fwereade, 2 minutes?
[11:12] bogdanteleaga, is this still needed? https://github.com/juju/charm/pull/47
[11:13] mattyw: nope, they seem to be already updated
[11:13] mattyw: somehow :)
[11:13] bogdanteleaga, magic :)
[11:49] dimitern: generating devices from a template isn't too hard
[11:49] dimitern: https://code.launchpad.net/~mfoord/gomaasapi/devices/+merge/263370
[11:49] dimitern: see newDeviceHandler
[11:50] dimitern: little bit of work needed on that (generating a proper system id), but should be plain sailing from here
[12:00] voidspace, awesome!
[12:05] fwereade, sorry do you have 2 minutes?
[12:35] fwereade, ping?
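Editor's sketch of the approach settled on between [10:48] and [10:55]: keep plain strings on the wire and convert them to typed tags on the client side. The Tag type and the parseTag and parseTags functions are hypothetical stand-ins written for this note, not the juju names package, and the "kind-id" split is a simplification of whatever validation the real tags need.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag is a hypothetical, simplified stand-in for a parsed entity tag.
type Tag struct {
	Kind string
	ID   string
}

// parseTag splits a wire string such as "machine-0" into kind and id.
// Assumption: the kind is everything before the first dash.
func parseTag(s string) (Tag, error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return Tag{}, fmt.Errorf("%q is not a valid tag", s)
	}
	return Tag{Kind: parts[0], ID: parts[1]}, nil
}

// parseTags converts one batch of watcher changes, failing on the first
// string that does not parse so callers never see a partial batch.
func parseTags(changes []string) ([]Tag, error) {
	tags := make([]Tag, 0, len(changes))
	for _, change := range changes {
		tag, err := parseTag(change)
		if err != nil {
			return nil, err
		}
		tags = append(tags, tag)
	}
	return tags, nil
}

func main() {
	tags, err := parseTags([]string{"machine-0", "unit-mysql-1"})
	fmt.Println(tags, err)
}
```

A thin conversion layer like this could sit on top of an existing strings watcher, which is one way to limit the copy-paste fwereade asks about at [10:51]; whether that fits the real client-side proxy is not something the log settles.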
[13:11] voidspace, ping
[13:23] dimitern: pong
[13:25] Bug #1455623 changed: TestPingCalledOnceOnlyForSeveralWorkers fails
[13:26] voidspace, did you manage to sort out your issue with maas 1.9 ?
[13:27] dimitern: no, still talking to Raphael about it
[13:27] dimitern: a migration got changed in the daily builds
[13:27] dimitern: which screwed things up
[13:27] dimitern: we're on step 5 of trying to sort it out
[13:27] voidspace, I see
[13:27] voidspace, I hope it works then :)
[13:28] me too :-/
[13:43] Bug #1463910 changed: Upgrade tests timeout on ppc64
[14:25] Bug #1470526 opened: Juju bootstrap with local provider fails on precise
=== Ursinha_ is now known as Ursinha
=== hazmat_ is now known as hazmat
=== cppforlife__ is now known as cppforlife_
=== kadams54_ is now known as kadams54-away
[16:02] ericsnow: meeting
[16:03] katco: just when I thought it was safe to go back in the water :)
[16:03] ericsnow: haha sorry =/
=== kadams54-away is now known as kadams54_
[16:43] ericsnow: you have 2 ship its
[16:43] katco: thanks!
[16:44] ericsnow: i'm going to place a card on our kanban to ensure we revisit the state stuff
[16:44] ericsnow: we don't want that to fall off our radar
[16:44] katco: good idea
[16:46] ericsnow: also: http://reviews.vapour.ws/r/2075/
[16:46] katco: I'll take a look
=== liam_ is now known as Guest12050
[18:05] Bug #1470601 opened: UniterSuite.TestLeadership fails on windows
=== kadams54_ is now known as kadams54-away
[18:35] ericsnow: you around?
[18:35] natefinch: yep
[18:35] ericsnow: I noticed that the ListProcesses state method (as I last saw it) is effectively a bulk call into state... but it doesn't seem to have a way to return multiple errors. How will it handle getting called with a mix of valid and invalid IDs, for instance?
[18:36] natefinch: it just returns the ones it found
[18:36] natefinch: it's up to the caller to sort that out
[18:36] natefinch: FYI, I've already fixed that up in my "adjustments" patch
=== \b is now known as benonsoftware
[19:04] ericsnow: so if i propose my branch again the feature branch, will the RB review be made for me?
[19:05] s/again/against
[19:05] wwitzel3: yep
[19:09] ericsnow: I need to add a few more, but no reason it can't be up for review while I take care of that.
[19:10] wwitzel3: k
[19:10] tests that is
[19:10] well maybe I do, I don't know
[19:10] I think I'm actually exercising all of the client paths .. oh error paths, I need to do those
[19:10] ok, yeah
[19:10] good talk
[19:10] wwitzel3: :)
[19:20] Bug #1470220 changed: Juju-deployer incorrectly reporting errors in juju 1.24.0 environment
[19:37] natefinch: ping
[19:38] natefinch: the card I was doing, the API client abstractions, is the card you are doing :)
[19:38] natefinch: I think we just copied too many of the server cards
[19:38] wwitzel3: oops!
[19:39] natefinch: my branch is up, I just have to add one more method and a couple tests, but the PR is up
[19:39] wwitzel3: I'll take a look and see if I have anything to add
[19:41] wwitzel3: that looks great, and pretty much just what I was doing.
[19:43] natefinch: actually I can ask you, I didn't see a stand alone Status on the API server side?
[19:43] natefinch: I am assuming ListProcess is filling that role?
[19:45] wwitzel3: I was just exposing what ericsnow had in state... I think list process is the thing, though maybe we want a shortcut for a brief status message
[19:45] natefinch, wwitzel3: yeah, we might want that
[19:46] natefinch, wwitzel3: we can tackle that when we add support to juju status
[19:47] ericsnow: is your state work merged into the feature branch?
[19:47] natefinch: not yet
[19:47] natefinch: multi-environment stuff is killing me
[19:48] ericsnow: is there anything I can do to help?
[19:49] natefinch: I don't think so
[19:49] natefinch: I've almost got it
[19:51] ericsnow: awesome
[20:20] bogdanteleaga: ping
[20:20] thumper: pong
[20:20] bogdanteleaga: hey there
[20:20] "2015-07-01 17:27:20 WARNING utils.featureflag flags_windows.go:34 Failed to open juju registry key HKLM:\\SOFTWARE\\juju-core; feature flags not enabled\n"
[20:20] I don't think we should emit warnings if the registry key isn't there
[20:21] is there a difference between non existent and can't open?
[20:21] yes
[20:21] bogdanteleaga: this made master fail CI BTW
[20:21] I got a fix for it though
[20:21] awesome
[20:22] here you go https://github.com/juju/juju/pull/2699
[20:23] thumper: since you're here a fast review wouldn't hurt :p
[20:23] bogdanteleaga: looking now
[20:27] bogdanteleaga: shipit
[20:28] thumper: cool, thanks
[20:28] bogdanteleaga: np
[20:32] sinzui: I'm assuming the upgrade testing for 1.24.2 all went well?
[20:32] thumper: sorry? I don’t know which testing that is
[20:33] sinzui: didn't you say that you were going to make sure that the proposed release goes through additional CI testing around upgrades?
[20:33] thumper: yes, I did the 1.22.6 -> 1.24.2
[20:33] cool
[20:34] I was so happy to finally see that bless come through yesterday evening
[20:34] sinzui: I tried doing an upgrade from 1.25 to 1.26 using zip for 1.26 and metadata taken from simplestreams on a webserver set up by me; can you think of anything else I should test?
[20:37] bogdanteleaga: 1.24.x -> .1.26.x. I expect a message that there are no candidates (no crash) also, we need to bootstrap juju 1.18.1 and 1.20.11 and confirm they do not choke on streams with zips
[20:48] wwitzel3: got you a review on the client stuff. There's a couple pretty important changes that need to be made.
[20:50] natefinch: thanks, the second comment, not sure what you mean? Isn't it up to the caller to inspect the results for errors?
[20:51] natefinch: oh, I see what i did, heh
[20:51] I stipped them out beofre the caller gets a chance to act on them
[20:52] yeah, I think we can just return the API objects that the API calls return.. maybe stripped of an extraneous containing struct (like if the struct is just holding a slice of something)
[20:53] ahh hmm.... error in the code ProcessResults should not have an Error value itself
[20:53] or maybe ericsnow added that on purpose
[20:53] though I kind of thought that was what the error return from the API call was for
[20:54] natefinch: yeah, I added that on purpose
[20:54] natefinch: ProcessResult or ProcessResults?
[20:55] my understanding is that the error return is more for errors in the machinery, not in the handling logic of the request
[20:59] ericsnow: other bulk calls don't seem to have an Error on the top level result
[21:00] natefinch: k
[21:00] I have to go make dinner, but I'll be on later.
=== natefinch is now known as natefinch-afk
=== kadams54 is now known as kadams54-away
[21:32] thumper, I'm here if you're free
[21:32] fwereade: otp now, release call
[21:34] thumper, np, ping me when you're free, I may or may not be around :)
[21:42] thumper: there at lots of calls to os.Exit(2) inside cmd/go
[21:42] the error could be coming from there
[21:52] https://bugs.launchpad.net/juju-core/+bug/1470601
[21:52] Bug #1470601: UniterSuite.TestLeadership fails on windows
[21:52] how long until the build is unblocked ?
[22:34] wwitzel3: meeting
=== kadams54 is now known as kadams54-away
[22:57] anastasiamac: axw I am going to bail today the context switch would be really expensive today
[22:58] perrito666: nps :D could u send a briefing email? tyvm :D
=== kadams54-away is now known as kadams54
[22:59] anastasiamac: I will
[22:59] perrito666: \o/ have fun!
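Editor's sketch of the bulk-call shape debated at [18:35] and again from [20:50] to [20:59]: each requested ID gets its own result carrying its own error, and the function's error return is kept for failures of the machinery. ProcessResult is a name from the log, but its fields here, the listProcesses function and the map-backed store are all hypothetical and chosen only to illustrate the split.

```go
package main

import (
	"errors"
	"fmt"
)

// ProcessResult is a hypothetical per-item result: exactly one of
// Status or Err is meaningful for a given ID.
type ProcessResult struct {
	ID     string
	Status string
	Err    error
}

// errNotFound marks an ID that the store does not know about.
var errNotFound = errors.New("process not found")

// listProcesses looks up every ID and reports per-item outcomes.
// The returned error is reserved for the store itself being unusable.
func listProcesses(store map[string]string, ids []string) ([]ProcessResult, error) {
	if store == nil {
		return nil, errors.New("process store unavailable")
	}
	results := make([]ProcessResult, len(ids))
	for i, id := range ids {
		results[i].ID = id
		status, ok := store[id]
		if !ok {
			results[i].Err = errNotFound // item-level failure, call still succeeds
			continue
		}
		results[i].Status = status
	}
	return results, nil
}

func main() {
	store := map[string]string{"proc-a": "running"}
	results, err := listProcesses(store, []string{"proc-a", "proc-b"})
	fmt.Println(results, err)
}
```

With this shape a mix of valid and invalid IDs yields one entry per ID in request order, so the caller can act on the errors instead of having them stripped out before it sees them.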
[23:00] anastasiamac: I actually am having fun :) txx
[23:00] * perrito666 is having the fun of finally cracking a problem
[23:01] completely unrelated question, is anyone using vim and a decent buffer switcher
[23:12] katco: shoot, sorry
[23:12] katco: still going?
[23:15] katco: sorry, I'm sure the plan was "Wayne will do everything" since I wasn't there .. I deserve that
[23:16] wwitzel3: FYI, I'm merging the state patch right now
[23:16] ericsnow: good deal
[23:16] wwitzel3: you'll be manning the booth by yourself wed-fri, we'll do the rest
[23:16] ;)
[23:16] axw: haha
[23:17] axw: yeah, I finished a meeting for the dockercon wrapup doc, and in my brain, that was my late meeting, so I ran to the store
[23:17] just completely spaced
=== kadams54 is now known as kadams54-away
[23:18] wwitzel3: nps, otp, will fill you in later
[23:18] axw: thanks
[23:19] sinzui: we have a problem: http://data.vapour.ws/juju-ci/products/version-2844/kvm-deploy-trusty-amd64/build-1206/consoleText
[23:20] sinzui: there are conflict markers in the main CI branch code
[23:20] thumper: we have just fixed it and and queued the retest
[23:20] sinzui: cool
[23:20] thumper: and it is just on the kvm-slave
[23:20] sinzui: I was trying to get a jump on looking at master CI :)
[23:20] thumper: yeah, my bad, I forgot I needed to revert a hack before doing the real landing
[23:21] ha
[23:21] np
[23:24] wwitzel3: we are putting the hardware requirements for the demo on you, though. So if we need any systems, monitors, etc, get that info to alexisb by tomorrow.
[23:25] heh thumper you really want to see that bless :)
[23:27] oh ye
[23:27] s