[00:02] perrito666: i think i see what's going on
[00:02] perrito666: http://play.golang.org/p/VzqUYG-D2y
[00:02] perrito666: the capacity fluctuates depending on how far into the slice you start your window, so you pretty much have to create a new slice and copy into it
[00:03] it might be looking into the capacity and trying to get as many objects instead of lenght objects
[00:04] man, I lova play.golang so much
[00:04] hehe
[00:04] it is nice
[00:04] love*
[00:05] its like a playable pastebin
[00:05] :)
[00:05] sandboxed too
[00:05] i think rob pike did an interesting write-up on it somewhere
[00:05] or maybe it was russ cox
[00:06] wallyworld: got it figured out, the user-data needed to be base64 encoded to be properly stored on GCE
[00:06] wallyworld: but the cloudinit script doesn't expect it to be base64 encoded (afaict), so that is the issue.
[00:06] wwitzel3: it's been a while since i looked, isn't is base64 encoded for all providers?
[00:07] wallyworld: it is, but that is actually handled in the provider libraries for most, not in juju
[00:08] wallyworld: go's gce library doesn't enforce how you insert metadata like go amz
[00:08] so we should update the gce provider then
[00:08] sorry, api libraries, no the providers
[00:08] i guess we don't have access to api lib
[00:08] but yes, we updated the provider to base64 encode
[00:09] but cloud-config is still nil, so we are looking at the cloudinit GCE datastore to see if it does indeed deal with the user-data being base64
[00:12] wallyworld: so we are on the right path, thanks for letting me bug you earlier :)
[00:12] np, i didn't help much
[00:13] I'm used to that when I talk to you
[00:13] ;D
[00:13] wwitzel3: no one disrespects a tanzanite team member without disrespecting all of us! ;)
[00:14] katco: haha, I'll remember that for next time
[00:21] ok, being 9:20 pm I feel I should EOD cheers all
[00:21] perrito666: tc
[00:21] perrito666: i'll see you tomorrow yeah?
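
The slice capacity behaviour described at 00:02 can be reproduced with a short program. This is a minimal sketch and not the contents of the playground link; the helper names `window` and `copyWindow` are made up for illustration.

```go
package main

import "fmt"

// window returns a sub-slice that shares the backing array of s.
// Its capacity is cap(s) - start, so it shrinks as start moves right.
func window(s []int, start, n int) []int {
	return s[start : start+n]
}

// copyWindow returns a fresh slice whose capacity equals its length,
// so nothing beyond the window is reachable through it.
func copyWindow(s []int, start, n int) []int {
	out := make([]int, n)
	copy(out, s[start:start+n])
	return out
}

func main() {
	s := make([]int, 10)
	for start := 0; start <= 6; start += 3 {
		w := window(s, start, 2)
		c := copyWindow(s, start, 2)
		// cap(w) is 10, 7, 4 for start 0, 3, 6; cap(c) is always 2.
		fmt.Println(start, len(w), cap(w), len(c), cap(c))
	}
}
```

Anything that sizes its work from `cap` rather than `len` would see a different number for each window, which is why copying into a new slice gives a stable result.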
[00:22] indeed
[00:23] ok, then no happy holidays for you yet :)
[00:23] heh ah true down under people are in their friday
[00:31] if i'm getting this error: juju.worker exited "environ-provisioner": no state server machines with addresses found
[00:31] but i have created a machine with the factory and job of JobManageEnviron, what am i doing wrong?
[01:27] wallyworld: I get this in any unit after upgrading from 1.20 to 1.22
[01:28] exited "uniter": failed to initialize uniter for "unit-mediawiki-0": cannot initialize hook commands in "1.22-alpha1.1-trusty-amd64": symlink 1.22-alpha1.1-trusty-amd64/jujud 1.22-alpha1.1-trusty-amd64/action-fail: no such file or directory
[01:28] haven't we seen that before?
=== menn0_ is now known as menn0
[01:31] yes
[01:32] yes we have
[01:32] menn0: hangout?
[01:33] menn0: the 1.20 upgrade worker creates a relative symlink
[01:34] I think the fix I did for 1.21 didn't get landed in master because "we didn't support 1.20 -> 1.22 upgrades"
[01:34] new code expects absolute
[01:34] the 1.21 upgrade worker makes an absolute symlink
[01:36] thumper: sorry, just saw this
[01:36] thumper: standup channel?
[01:36] ack
[02:01] * thumper turns off distractions...
[02:01] if people super need me, text
[02:38] wallyworld: are you looking at bug 1396099?
[02:38] Bug #1396099: AWS/Joyent/HP/manual/maas: juju deploy error "connection is shut down"
[02:38] menn0: yes, but been in meetings for the past 2 hours
[02:38] partly done, but i'm not happy with the soluton
[02:39] as we can get address changes at any time
[02:39] and there's potential for at least 2 restarts as the state server comes up
[02:39] wallyworld: is there a way to avoid restarting the API server and just have the new cert be used for new connections?
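
For the relative versus absolute symlink point at 01:33, here is a small sketch of what each kind of link stores. It does not reproduce the juju upgrade worker; the directory layout, the link names, and the function `linkTargets` are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkTargets creates a scratch directory containing a "jujud" file and two
// symlinks to it, then returns the target text stored in each link.
func linkTargets() (relative, absolute string, err error) {
	dir, err := os.MkdirTemp("", "tools")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	jujud := filepath.Join(dir, "jujud")
	if err := os.WriteFile(jujud, nil, 0o755); err != nil {
		return "", "", err
	}

	// A relative target is stored verbatim and resolved against the
	// directory holding the link, not the process's working directory.
	relLink := filepath.Join(dir, "action-fail-relative")
	if err := os.Symlink("jujud", relLink); err != nil {
		return "", "", err
	}
	// A target given as a full path is stored as that full path.
	absLink := filepath.Join(dir, "action-fail-absolute")
	if err := os.Symlink(jujud, absLink); err != nil {
		return "", "", err
	}

	if relative, err = os.Readlink(relLink); err != nil {
		return "", "", err
	}
	if absolute, err = os.Readlink(absLink); err != nil {
		return "", "", err
	}
	return relative, absolute, nil
}

func main() {
	rel, abs, err := linkTargets()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("relative link stores:", rel)
	fmt.Println("absolute link stores:", abs)
}
```

Code that reads the stored target and expects a full path will get only `jujud` back from the first link, so the two conventions cannot be mixed without converting one to the other.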
[02:40] that's what i need to dig into
[02:40] when i first looked, it didn't seem plausible
[02:40] but i will have to look a bit harder
[02:40] wallyworld: that would be slickest solution (but I can see how it might be hard/impossible)
[02:41] yes, agreed. i did initially want to do that but since it appeared hard, it was easier to restart
[02:41] i didn't think it would matter
[02:41] as there would only be a very short window when the state server comes up
[02:48] wallyworld: turns out computers are pretty fast :-p
[02:49] menn0: scripting, yes
[02:49] wallyworld: that's what I mean
[02:49] i didn't think we'd be polling state server to jump in and grab a connection
[03:17] wallyworld: I don't think anything is polling
[03:17] wallyworld: as soon as the bootstrap command returns the tests (and my script) attempt to deploy a charm
[03:17] wallyworld: and that's often the same time the API server gets restarted
[03:24] so if I want to make a change to the cloudinit DataSourceGCE python, is there an easy way to test that with Juju?
[03:24] ok, I have given up on getting this done today
[03:25] menn0: yeah, sorry, i didn't mean polling in the strictly literal sense, just a client that connected immediately after the state server comes up.
[03:25] however, I do have it to the state where I just need the tests
[03:25] still in meeting s o i haven't looked anymore yet
[03:25] and break it in half
[03:25] found that I have some state methods needed
[03:26] so I can land them independantly
[03:57] thumper: sorry, he's all mine!
[03:59] he who?
[03:59] thumper: ian
[03:59] katco: he's all yours, with my blessing
[03:59] * thumper ditches him while he has the opportunity
[03:59] no backsies
[03:59] thumper: i thought you'd be more jealous
[04:00] rofl
[04:01] happy holidays everyone
[04:01] I'll be back Jan 5th
[04:01] thumper: happy holidays
[04:02] see ya thumper, have a good one
[04:02] thumper: be safe, have fun
[04:02] I foresee much hacking in the next two weeks
[04:02] and none of it in go
[04:02] python, javascript and C#
[04:02] laters
[04:03] nice
[05:00] anastasiamac: oh and i sincerely apologize for monopolizing ian's time. sorry about that
[05:05] ok off to bed... tc and happy holidays to all of those for whom it is friday!
[05:06] katco: thanks... have a good break
[05:06] menn0: you too good sir!
[05:06] wallyworld: I think I see a way to swap out certificates on the fly
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[05:08] wallyworld: if you look at the implementation of the listener returned by tls.NewListener
[05:09] wallyworld: you can see that the config (which contains the cert) is passed to Server for each inbound connection
[05:09] wallyworld: so if you can change the cert in the config, new connections will use the new cert
[05:10] wallyworld: of course the config is kept private so we'd have to implement our own tls listener
[05:10] wallyworld: but that's trivial (it's tiny) and the part that does the hard work (Server) is public
[05:28] menn0: yep, i came to the same conclusion, but haven't been able to finish implementing because i've been ina meeting all day
[05:28] will complete the fix tonight
[05:29] i was wary of changing the tls config on the fly
[05:29] but on reading the tsl code in the std lib i think it will be ok
[05:29] it still feels a bit dirty, but hopfully will work
[05:30] wallyworld: there's a bit of restructuring to do to make the api server accessible to the certupdater isn't there?
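
The idea from 05:08 to 05:10, a custom listener that passes the current config to `tls.Server` for each accepted connection, might look roughly like the sketch below. It is not the fix that landed in juju; the type `certListener`, its methods, and `demo` are hypothetical, and guarding the config with a mutex and replacing it with a clone are choices made for this example.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"sync"
)

// certListener wraps a plain listener and hands every accepted connection
// to tls.Server with whatever config is current at accept time.
type certListener struct {
	net.Listener
	mu     sync.Mutex
	config *tls.Config
}

func newCertListener(inner net.Listener, config *tls.Config) *certListener {
	return &certListener{Listener: inner, config: config}
}

// Accept reads the config under the lock so it can be swapped between
// connections without racing with an in-flight accept.
func (l *certListener) Accept() (net.Conn, error) {
	conn, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	l.mu.Lock()
	config := l.config
	l.mu.Unlock()
	return tls.Server(conn, config), nil
}

// setCertificate installs a cloned config carrying the new certificate.
// Connections accepted earlier keep the config they were given.
func (l *certListener) setCertificate(cert tls.Certificate) {
	l.mu.Lock()
	defer l.mu.Unlock()
	updated := l.config.Clone()
	updated.Certificates = []tls.Certificate{cert}
	l.config = updated
}

// demo swaps the certificate once and reports how many certificates the
// original and the current config hold. No network traffic is involved.
func demo() (oldCerts, newCerts int) {
	original := &tls.Config{}
	l := newCertListener(nil, original)
	l.setCertificate(tls.Certificate{})
	return len(original.Certificates), len(l.config.Certificates)
}

func main() {
	oldCerts, newCerts := demo()
	fmt.Println(oldCerts, newCerts) // prints "0 1"
}
```

Replacing the pointer instead of mutating the shared config means a handshake that already holds the old config never observes a half-updated certificate list.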
[05:30] yes, but i'm part way through doing it
[05:30] katco: np :) i got the whole 15 min of his time :)
[05:30] wallyworld: also, you'll probably need a mutex to protect updates to the config
[05:30] probably yes
[05:31] although i'm thinking of using a channel
[05:31] wallyworld: and I'd be tempted to take copies of the config so that existing connections keep their previous config
[05:31] there's only one server though
[05:32] the cert is only ever augmented with new addresses
[05:32] so i should just be able to replace on the fly
[05:32] wallyworld: there's one server but each time a connection is accepted a new Server instance is created and it gets a pointer to the config
[05:33] really?
[05:33] yep
[05:33] have a look at crypto/tls/tls.go in the stdlib
[05:33] the worker (which is the server) is run and then stays running
[05:33] the listener.Accept method
[05:34] it shouldn't matter as a new cert will accepts connections made previosuly
[05:34] wallyworld: you're probably right
[05:34] as we are just adding to the address list
[05:34] wallyworld: if that's the only change made to the cert
[05:35] that's the only change
[05:35] wallyworld: and the cert probably doesn't even get looked at once the connection is set up
[05:35] i can always iterate if we find an issue
[05:35] yes, that's my understanding
[05:36] i gotta run to relocate, will finish the fix tonight now that all my meetings are done
[05:36] wallyworld: I am still worried about what happens if the cert gets accessed (for a new connection) just as it's being updated
[05:36] me too, i have to think it through
[05:36] wallyworld: b/c I think the handling of connection happens in it's own goroutine
[05:36] wallyworld: np
[05:36] wallyworld: have a great xmas
[05:36] wallyworld: i'm about done
[05:37] menn0: you too, thanks for thinking through this issue with me, good that we converged on a similar approach
[05:37] have a good break
[05:37] wallyworld: yep that is a good sign
[05:37] wallyworld: (or
we're both equally stupid)
[05:37] lol, well maybe
[05:37] wallyworld: it was fun to think about
[05:37] i will do it and test and see how it works
[05:38] i just wish i had more time today out of meetings to actually do the fix
[05:38] anyway, gotta run, be back later
[05:39] wallyworld: cool, speak later
[09:10] davecheney, was just looking at state/multiwatcher, and wondering what RelationUnitsChange was doing in there?
[10:17] morning all
[10:18] morning perrito666
[10:35] fwereade: is there already a plan for importing the jenv files generated by "juju user add" to the environments directory of the juju home? perhaps something like "juju user import pat/to/jenv"? I am asking because we added support for using jenv files without a corresponding entry in environments.yaml in quickstart and that would complete the story.
[10:36] path/to/jenv even
[10:38] frankban, yes, that's the heart of it, but I think we're calling it `juju connect`
[10:39] frankban, thumper will have a better idea of when we can expect that than I do, though
[12:27] fwereade: you know who I should talk to about a cloud-init question? smoser is out already
[12:27] wwitzel3, hmm, not really :(
[12:28] fwereade: dang, ok, thanks :)
[12:28] wwitzel3, possibly gsamfira_ might know about it, what with cloudbase-init?
[12:32] fwereade: ok, cool. I'm trying to figure out why my stored user-data on GCE isn't unpacking in to cloud-config
[12:33] wwitzel3, sorry, nothing really springs to mind
[12:34] fwereade: it is probably my fault, I think I might need package some headers with the base64 blob
[12:34] fwereade: I will probably talk you throughout this process, you can ignore me, or just make unattentive bu supportive comments
[12:35] talk at you that is
[12:35] wwitzel3, have you tried dephlogisticating the grommets?
[12:35] (sorry)
[12:35] (talk away :))
[12:36] fwereade: haha, no, that's perfect
[12:36] fwereade: at least you weren't on the hangout with me yesterday, after doing nothing but hitting my desk and swearing for 5 minutes I told ericsnow "I should probably take a break"
[12:37] wwitzel3, ouch
[12:37] fwereade: but, when I came back, in like 5 minutes, I sovled the problem
[12:38] wwitzel3, yeah, tuning those things is tricky though
[12:38] wwitzel3, give up too early and the frustration just comes right back when you return
[12:38] wwitzel3, wish I could tell what the optimal take-a-break moments are
[12:40] fwereade: in my experience, you have to go until you break mentally *lolcry* .. that is how your brain resets and lets you look at the problem from a different view.
[12:40] fwereade: or add booze ;)
[12:40] wwitzel3, haha, yeah
[12:44] for me the trigger is when the thought "this cant be done" arises
=== kadams54 is now known as kadams54-away
[13:15] ericsnow: so looking at ec2's userdata, we send it up Base64 encoded, but it is stores decoded.
[13:16] ericsnow: so I feel I've done proper due diligence in that we are doing the right thing in the GCE provider.
[13:16] ericsnow: We need to update DataSourceGCE.py to b64decode the incoming raw user data from GCE.
[13:30] fwereade: hi, i have a fix for the CI blocker, would love it if you could take a look
[13:30] http://reviews.vapour.ws/r/667/
[13:32] perrito666: i didn't get a chance to look at the pr you mentioned sorry, spent all day in meetings and then had to fix a CI blocker
[13:32] wallyworld: dont worry man
[13:44] wallyworld: lgtm
[13:44] TheMue: awesome, tyvm
[13:45] wallyworld: yw, nice solution.
only stumble first over the call of processCertChanges and the go statement afterwards, but then I saw that it starts an anonymous goroutine too
[13:46] yes, sorry if it was unclear
[13:47] wallyworld: it's fine, I like it
[13:50] wallyworld, I'm not totally in love with the stop chan, wouldn't a tomb be a bit nicer?
[13:51] guess so, seemed like overkill
[13:51] the stop chan concept is use delsewhere
[13:51] rogpeppe, if you could switch your examples to not being the store charm, it would be nice to send that email to the public list
[13:52] perrito666: are you by any chance talking about the failing test when you set /utils to the latest commit?
[13:52] aznashwan1: I currently have no clue what you are talking about
[13:52] wallyworld, yeah, I know, but it's generally vulnerable to panic when double-closed, where tomb is fine with double-killing
[13:53] hazmat: yeah
[13:53] perrito666: because I ran into that on my latest PR and fixed it
[13:53] hazmat: i need a decent example
[13:53] fwereade: fair point, i was assuming that listener close would only be done once
[13:54] fwereade: can i land to unblock and follow up? the listener is not closed during the normal operation of the state server
[13:54] perrito666: there was a commit in /utils which changed a message and a test failed in core because of it, thus preventing you to get up to date
[13:54] wallyworld, yeah, you're probably right, but that feels to me like exactly the sort of invariant that it's easy to break accidentally
[13:55] wallyworld, yeah go for it
[13:55] ok, ta. i need to tweak the lxc cache.
so i'll fix in that branch
[13:56] perrito666: nothing major, just an ErrorMatches failing because an error was slightly modified, but it's solved now, and my latest PR will have utils set to its current HEAD
[14:03] aznashwan1, that's *exactly* why we need to get utils and other subrepos under CI integration tests with core
[14:10] dimitern: you make a very good point, /utils needing this most of all the others...
[14:11] aznashwan1, we're discussing it and I'll push for it to happen sooner rather than later
[14:16] dimitern: that would be awesome and I wish you the best of luck towards it :D
[14:16] aznashwan1, thanks :)
[14:34] natefinch: coming?
[14:47] wallyworld: seen, the changeCertListener.Close() panics
[14:47] wallyworld: ?
[15:16] wallyworld: need someone to pick up that PR and try to push it through?
[15:17] mgz_: looks like CI says no space left on device.... want to show me how to fix it so I can do it myself later?
[15:33] you mean the gating job?
[15:33] mgz_: http://juju-ci.vapour.ws:8080/job/github-merge-juju/1691/console
[15:34] that tends to just be a symptom of having got a bad instance on aws
[15:34] mgz_: interesting
[15:35] I don't think there's anything to do but wait for a bit, then try relanding later
[15:35] mgz_: kk
[15:35] it may also be the branch is bad
[15:36] first panic is "runtime error: close of nil channel"
[15:36] hmm
[15:37] which is the cause and which is the effect? That's the question.
[15:37] it's possible panics are going to a drive on the instance without much room
[15:37] which would be why we see it sometimes
[15:37] (aart from the known we got a crummy instance thing)
[15:38] does go dump itself somewhere by default during tests?
[15:42] what do you mean dump itself?
[15:44] stick anything on disk
[15:44] a la segfault
[15:50] mgz_: nope, the panic output is all you get
[16:03] so it sounds like we're still working on getting CI unblocked?
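
Both failure modes that came up around the stop channel, closing a channel that was never initialised and closing one twice, panic at run time in Go. The sketch below shows them alongside one guard that needs only the standard library. It uses `sync.Once` as a stand-in, not the tomb package fwereade suggested, and `stopper` and `closePanics` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper owns a stop channel that is always initialised and is closed at
// most once, however many times Close is called.
type stopper struct {
	stop chan struct{}
	once sync.Once
}

func newStopper() *stopper {
	// Allocating the channel here avoids the "close of nil channel" panic
	// that a zero-value struct would hit.
	return &stopper{stop: make(chan struct{})}
}

// Close is safe to call repeatedly; only the first call closes the channel.
func (s *stopper) Close() {
	s.once.Do(func() { close(s.stop) })
}

// closePanics reports whether closing ch panics.
func closePanics(ch chan struct{}) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	close(ch)
	return false
}

func main() {
	var unset chan struct{} // declared but never made: nil
	fmt.Println("close of nil channel panics:", closePanics(unset))

	closed := make(chan struct{})
	close(closed)
	fmt.Println("second close panics:", closePanics(closed))

	s := newStopper()
	s.Close()
	s.Close() // no panic
	fmt.Println("guarded double Close is fine")
}
```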
[16:04] katco: alas, yes
[16:05] natefinch: Shower Thought: You and your wife might create a new human before we create the next version of Juju.
[16:05] katco: haha yep
[16:09] katco: they will, next version of juju is due jan 9
[16:09] perrito666: :)
[16:16] mgz_: I think I see the bug in wallyworld's code. I'll fix it and see if I can repropose.
[16:16] natefinch: ace
[16:20] mgz_: he was closing a channel he never gave a value, which is why it was "close of a nil channel"
[16:20] natefinch: that's not an error is it?
[16:20] katco: sorry, when I say "never gave a value", I mean, he never initialized it to a valid channel
[16:21] natefinch: ah gotcha, sorry
[16:25] hmm... so my wife is having some contractions, not at all sure it's labor, but it's something. Can someone else pick this up? I made a comment at the bottom of the PR: https://github.com/juju/juju/pull/1346
[16:26] good luck natefinch
[16:26] thanks... might be nothing, she's had them before, but I probably shouldn't be on the computer ;)
[16:26] :)
[16:26] someone will take care of it, go go
[16:35] mgz_: so catching up: will the tests fail on my machine for this bug?
[16:37] katco: I expect the first panic to happen for you, the following fallout is mongo dependent, so may all be different
[16:37] boy this was a catastrophic failure lol
[16:38] * katco is scrolling through the jenkins log
[16:39] katco: yeah, the only bit you really need to care about is the first panic
[16:39] yeah trying to figure out what the heck package failed first... looks like api?
[16:39] the rest is pretty much just mongo falling over due to poor test isolation, which apparently also fills up our allocated /tmp space
[16:39] yeah, api
[16:39] the pr is 1346
[16:39] ericsnow: https://bugs.launchpad.net/cloud-init/+bug/1404311
[16:39] Bug #1404311: gce metadata api doesn't properly stream binary data
[16:40] wwitzel3: nice!
[16:41] mgz_: blindly trying nate's suggestion
[16:41] wwitzel3: you posting that to #cloud-init?
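
On the user-data thread: the provider stores the blob base64 encoded, so whatever reads it back has to decode it before parsing. cloud-init's DataSourceGCE is Python; the Go sketch below only illustrates the round trip, and the function names are made up for this example.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeUserData is the step the provider side performs before storing.
func encodeUserData(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// decodeUserData is the matching step the consumer has to perform; without
// it the consumer sees an opaque string and not a cloud-config document.
func decodeUserData(stored string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(stored)
}

func main() {
	raw := []byte("#cloud-config\npackages: []\n")
	stored := encodeUserData(raw)
	decoded, err := decodeUserData(stored)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("stored form starts with #cloud-config:", stored[:1] == "#")
	fmt.Println("round trip matches:", string(decoded) == string(raw))
}
```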
=== kadams54 is now known as kadams54-away
[16:57] mgz_: found a few other issues. net.Serve closes the listener so we were also double-closing the channel, also our changeCertListener methods were operating on copies of the struct which meant they were locking copies of mutexes
[16:57] mgz_: api package tests look like they're succeeding; i'll submit a pr after that to let CI run the full suite
[17:00] I have a gramatical issue on my code
[17:10] mgz_: PR incoming.
[17:22] alrighty all, today is the last day before our holiday shutdown. Please feel free to start you celebrations early once you get to a good stopping point on what you are working on
[17:22] and enjoy your time off with family and friends!
[17:23] alexisb: thank you!
[17:23] alexisb: you too
[17:23] alexisb: happy holidays to you and your family :)
[17:23] I am very much looking forward to the great work we will be doing in 2015
[17:23] it will be an exciting year!
[17:23] alexisb: thanks, also happy holidays to you and your family
[17:27] katco: you added line 162 in apiserver.go based on nates diagnosis?
[17:27] TheMue: correct
[17:28] katco: ok, then lgtm, seen the other parts earlier
[17:28] TheMue: as well as a few other things. look like a test in cmd/jujud/agent needs to be updated as well
[17:28] TheMue: ty
[17:28] katco: yw
[17:29] TheMue: so you might be more familiar with these changes. i need to pass a channel in cmd/jujud/agent/upgrade_test.go:493: not enough arguments in call to a.apiserverWorkerStarter
[17:30] TheMue: it doesn't look to me like the test cares, so i was just going to pass in a channel and do nothing with it.
[17:30] katco: looking
[17:37] mgz_: i'm in the process of fixing this, but why does it look like the CI run has frozen? http://juju-ci.vapour.ws:8080/job/github-merge-juju/1693/console
[17:48] katco: machine.go line 798 the method defined in line 838 is used, the channel created one line before
[17:50] TheMue: hm i see that.
but the test that needed updating didn't look like it was trying to test cert upgrades at all, right?
[17:50] TheMue: so we don't really care if certs are changed, right?
[17:51] katco: we already cared before, by stoping and restarting the apiserver
[17:52] katco: the test for this isn't changed, but the way how it is realized, now w/o a restart
[17:55] TheMue: i'm sorry, i'm not understanding. i'm looking at TestLoginsDuringUpgrade in upgrade_test.go. I don't see any API server restarts in there?
[17:56] TheMue: and i'm completely missing why a test for logging in during an upgrade would be concerned with cert changes?
[18:01] katco: the important change is here: http://reviews.vapour.ws/r/667/diff/# machine.go line 802
[18:01] katco: but I have to admit I don't know where it has been tested so far
[18:02] katco: the old version does the restart, the new one signals the cert change, so that the listener reacts
[18:03] katco: looks like one hang test, probably the reason for no output for ages
[18:03] katco: that's in apiserver.go line 102, where the goroutine is started
[18:06] TheMue: i'm still not understanding what all of that has to do with this test though? the test is not compiling because it doesn't have enough arguments anymore, and as far as i can tell it doesn't need to receive anything on this channel to test what it's trying to
[18:10] katco: ah, I see, sorry, yes. This test hasn't been changed or the local change of wallyworld hasn't been checked in. so you have to change the test, sorry.
[18:10] TheMue: oh no worries. i just was missing what you were trying to communicate :)
[18:11] katco: no, you absolutely have been right, I missed the point regarding the test
[18:11] TheMue: so my original question is: can you see any reason not to just pass it a channel and then not do anything with that channel? trying that, the tests hang
[18:11] TheMue: trying to figure out why...
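
The 16:57 remark about methods "operating on copies of the struct" and "locking copies of mutexes" describes what happens when a type holding a `sync.Mutex` uses value receivers. This is a generic sketch and not the changeCertListener code; `counter` and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type counter struct {
	mu sync.Mutex
	n  int
}

// incByValue has a value receiver, so it locks and increments a private
// copy of the struct. The caller's mutex and field are never touched.
// (go vet's copylocks check flags this pattern.)
func (c counter) incByValue() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// incByPointer has a pointer receiver, so every caller shares one mutex
// and one field.
func (c *counter) incByPointer() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func main() {
	c := &counter{}
	c.incByValue()
	fmt.Println("after incByValue:", c.n) // prints "after incByValue: 0"
	c.incByPointer()
	fmt.Println("after incByPointer:", c.n) // prints "after incByPointer: 1"
}
```

With value receivers two goroutines can each hold "the" lock at once, because each is holding its own copy.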
[18:13] katco: if you only pass in a channel and nobody listens a sending will block the system
[18:14] TheMue: i created a buffered channel
[18:14] make(<-chan params.StateServingInfo, 1)
[18:15] katco: ok, then I would have to digg deeper too
[18:15] TheMue: but conceptually, you don't see ignoring that channel as an issue?
[18:16] katco: based on the title of this tests it's focussing the login during upgrade, no cert change
[18:17] katco: so yes, this shouldn't be an issue here
[18:18] TheMue: great, ty... i'll continue trying to figure out why the test hands with a buffered channel =/
[18:19] katco: I'm lurking here into the channel a bit longer, so feel free to ping me
[18:19] TheMue: thanks, i appreciate it :)
[18:19] katco: yw
[18:19] TheMue: if i end up not talking with you again, happy holidays!
[18:20] katco: thanks. also to you happy holidays.
[18:20] TheMue: ty!
=== mup_ is now known as mup
=== mup_ is now known as mup
[19:05] found the issue
[19:06] deleted the closing of the stop channel from the wrong place. re-running api tests now
=== meetingology` is now known as meetingology
=== ev_ is now known as ev
[19:11] any CI people around? if this fix works, i'd like to open the trunk
[19:12] mgz_: ^
=== mup_ is now known as mup
[19:27] katco: go for it
[19:28] mgz_: i didn't realize i could open the trunk. how do i do that?
[19:28] * perrito666 is back from 2 hs without internet nor power... gotta love summer
[19:28] wb perrito666
[19:30] katco: you wait for the rev to get through gating, then go through the rest of the jobs (can track on jenkins or the report site), then mark bug as fix released when the revision is blessed
[19:30] ericsnow: ping
[19:30] katco: tx
[19:30] mgz_: oh cool. i don't think i've ever done that.
[19:30] perrito666: hey
[19:31] mgz_: will you be around if i have questions? or is anyone else familiar with this process?
[19:32] i don't know what reports to look at. i'm guessing 1.22-alpha1 for a bless?
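
TheMue's point at 18:13 about sends blocking when nobody listens, and the effect of a one-slot buffer, can be checked with a non-blocking send. This is a generic sketch; `trySend` is a hypothetical helper and plain `int` stands in for `params.StateServingInfo`.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it went
// through. A false result means a plain send would have blocked.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 1)

	fmt.Println(trySend(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 1))   // true: the value fits in the buffer
	fmt.Println(trySend(buffered, 2))   // false: the buffer is already full
}
```

One detail of the expression quoted at 18:14: a channel created with the type `<-chan T` is receive-only, so no code holding it can send on it or close it, and a receive from it never completes. Whether that played any part in the hang is not established by the log.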
[19:33] oh i need to backport this to 1.21 don't i.
[19:33] for beta 5?
[19:33] I tried to make a nap during the blackout but was recruited by my wife -> https://twitter.com/majomalnis/status/546021647632052225 you have no idea how hard is to make those with >30C
[19:34] perrito666: yum :)
[19:34] well, the bug is only marked for 1.22, so i guess i don't have to backport this
[19:34] katco: oh, they are tasteless :p until you add the white stuff
[19:34] perrito666: are they sugar cookies?
[19:35] katco: no, animal cookies actually
[19:35] they must be glaced to have some taste
[19:35] perrito666: i don't think i know what that is lol
[19:35] really? I thought they where fairly common its an english recipe
[19:36] small animal shaped cookies
[19:36] ohhh
[19:36] like animal crackers perhaps?
[19:38] http://books.google.com.ar/books?id=OlIOxYU2YWsC&pg=PT73&source=gbs_toc_r&cad=3#v=onepage&q&f=false
[19:38] * perrito666 shares the source code :p
[19:38] haha
[19:49] so, working with red wine surely is no good base for proper code. ;) working day is over. i wish you all a merry christmas and a happy new year. see you all freshly charged in the new year. o/
[19:51] TheMue: likewise, enjoy that wine
[19:51] TheMue: tc! happy holidays!
[20:38] ok, attempt n 19831287319238 to finish today's work without being interrupted
[20:39] * katco is trying to figure out where a test is hanging, and go test is not giving me any output
[20:42] I'm trying to figure out how to get a state testing unit's charm URL
[20:43] ericsnow: still here?
[20:43] perrito666: yep
[20:43] metadata know about series?
[20:46] it seems that the testing unit doesn't have the charm URL in its doc, but I'm not sure how to get that where it needs to go. I'm using state.AddCustomCharm
[20:53] ericsnow: ?
[20:53] hey
[20:56] is anyone familiar with the dummy provider?
[20:58] katco: other than that we use it in our tests, sorry no
[20:58] bleh
[21:04] why is -gocheck.vv not scrolling output for me.
it's making it impossible to diagnose this
[21:05] katco: the other day nate said that I should also -check.v=true to get that working
[21:05] perrito666: sweet let me try that
[21:09] nothing =|
[21:09] go test github.com/juju/juju/cmd/jujud/agent/... -check.v=true -check.vv=true -gocheck.f=TestManageEnvironV
[21:10] (note that i modified the name of that test to include a V to differentiate between other tests)
[21:12] katco: I see things are going as well now as when I left earlier
[21:13] natefinch: hey welcome back
[21:13] natefinch: i fixed some things but there is a test hanging somewhere and i can't get any output from go test so i can put tracers in
[21:14] natefinch: if i pepper in panics, i can figure out what's hanging, but c.Log, fmt.Println, doesn't show anything. and no amount of v's coerce gocheck into giving me anything.
[21:14] katco: I think you need to give it test.v before anything shows
[21:14] natefinch: yeah i did
[21:14] hmm
[21:14] natefinch: here are the changes: https://github.com/katco-/juju/commit/9e7758f27625a139ee8f8c87827278c2f8b18559
[21:15] natefinch: and something is hanging in MachineSuite in cmd/jujud/agent/machine_test.go; i think it's TestManageEnviron
[21:16] natefinch: i'm toying with the idea of landing my refactoring changes, because it makes it cleaner to get a machineagent which may help this problem along
[21:17] natefinch: b/c right now it appears (???) to be hanging on a newAgent
[21:22] hmm the agent tests pass for me on your branch
[21:22] http://juju-ci.vapour.ws:8080/job/github-merge-juju/1694/console
[21:23] i'll submit again in the off chance it was a fluke, but i can't get them to pass on my machine now
[21:25] katco: your comment says net.Server was closing the stop channel already? I don't see how that's possible
[21:27] natefinch: where did i see that...
[21:29] natefinch: http://golang.org/pkg/net/#Conn
[21:29] natefinch: https://github.com/katco-/juju/commit/9e7758f27625a139ee8f8c87827278c2f8b18559#diff-19f910c28876da8a8d94937e102b2ebeR96
[21:31] katco: yeah but that doesn't close the stop channel, which is what kills that goroutine. It'll just stick around forever... but I think the close of the channel should be there.
[21:31] code freeze is today, yes?
[21:32] I had a branch which fwereade requested in rather strong terms to be in the release
[21:32] it seems there might have been a miscommunication with TheMue, I was traveling yesterday and thought we were in sync to get it landed
[21:32] https://github.com/juju/juju/pull/1341
[21:33] natefinch: sorry trying to remember my reasoning. with that in there, we definitely get a double close error
[21:33] this is important to me if it's still possible to land -- it's merely a few testing changes to master and the inclusion of a new charm version which includes a MUCH cleaner ux for Actions
[21:33] katco: it's probably the Close() method on changeCertListener getting called more than once.
[21:33] * bodie_ nudges natefinch in the hopes of getting a pair of in-crowd eyes on this...
[21:33] natefinch: tomb was calling it, and then i though the listener
[21:34] katco: but right now nothing is going to be closing the channel
[21:34] bodie_: trunk is blocked on this bug =/
[21:34] and unfortunately I have to run....
[21:34] gotya, does that mean this can't land before freeze or what should I expect?
[21:35] or who should I talk to?
[21:35] bodie_: no idea. If we want it in, it'll get im/
[21:35] in
[21:35] bodie_: i've been trying my best to open trunk, don't know if i'll succeed.
i'm running into issues
[21:35] okey doke
[21:35] bodie_: I've never met a code freeze that wasn't more of a slushy
[21:36] the addition of pressure lowers the freezing point of code :P
[21:36] lol exactly
[21:36] ok gotta run before I get divorced
[21:36] lol XD
[21:37] katco, lmk if I can pitch in anywhere -- I'm with my family, but if it's crucial I'm sure I can squeeze in a couple of hours
[21:37] bodie_: will do... reevaluating what nate said
[21:48] bodie_: do you know what to do when CI runs out of space?
[21:48] mgz_: ^^ on the off chance you're still on
[21:48] ohh cmooon again?
[21:48] CI was out of space this AM
[21:48] =/
[21:51] katco: if you mean the landing job, see the log
[21:51] perrito666: what fixes it?
[21:51] it's an ephemeral image just for the tests
[21:51] it doesn't run out of space unless the instance happens to be bad, or the tests fail horribly enough to fill /tmp
[21:51] *instance
[21:52] mgz_: ah ok. so then a fresh tests gets a new image?
[21:52] well that was failing this am already
[21:52] katco: yes
[21:52] mgz_: ok that makes sense
[21:52] mgz_: ty :)
[21:52] perrito666: the same branch has been failing
[21:52] land anything else it's almost certainly fine
[21:52] * katco is hoping she finally has it fixed, running tests locally again atm
[22:11] mgz_, I made a tweak to charm which fwereade was really keen on, and opened a PR to fix testing in master
[22:11] mgz_, I thought TheMue and I were in sync to get it landed yesterday, but I think we had a misunderstanding since I landed the charm changes but not the ones in juju core
[22:12] mgz_, I'm wondering how I should make sure this is in 1.22 since it's a user-facing issue we really want to get in
[22:12] I don't know who to talk to or what to do about this and it's making my skin crawl.. :S
[22:15] bodie_: i'm submitting another test run. the test is hanging on my machine, but there's no reason not to try it while i fiddle with things.
if you want to help out, you can try testing cmd/jujud/agent/... and find out where it's hanging
[22:15] bodie_: sorry on this PR: https://github.com/juju/juju/pull/1347
[22:16] bodie_: i cannot for the life of me get gocheck to feed me verbose output.
[22:17] sure, let me give that a spin. can I get a quick glance over RB 661? it's really quite trivial
[22:17] bodie_: sure
[22:20] ok fine gentlemen and ladies, EOD
[22:20] bodie_: not exactly sure what i'm looking at. does the new version of charm change the outfile key to snapshot?
[22:20] have all a nice end of year and overall holidays whatever you belief is
[22:20] perrito666: happy holidays, tc
[22:22] katco, basically it just reduces the boilerplate for actions.yaml
[22:22] katco, the change to do it is already landed in charm; this just updates master for the required testing changes
[22:23] bodie_: seems benign. lgtm
[22:23] ^_^
[22:23] bodie_: does it have an associated bug on launchpad?
[22:23] bodie_: especially since it is tweaking the tests
[22:23] bodie_: if so, target it to the milestone, otherwise you can file a bug to go along with maybe
[22:23] that sounds like a good option
[22:24] thanks mgz_
[22:24] there's not a bug
[22:24] mgz_: ty for being so helpful past your EOD the last day of the year
[22:24] :)
[22:25] indeed
[22:26] katco, driveby: `go test -test.v -gocheck.vv` should give you enough?
[22:26] fwereade: ty, but this gives me nothing until the process times out: go test -v github.com/juju/juju/cmd/jujud/agent/... -check.v=true -check.vv=true
[22:32] bodie_: are you experiencing the same issue with the test hanging?
[22:32] * bodie_ hails fwereade -- https://bugs.launchpad.net/juju-core/+bug/1404397
[22:32] Bug #1404397: charm actions.yaml simplification breaks master tests
[22:32] katco, just taking a crack at it after opening the bug
[22:33] bodie_: no worries, thank you
[22:37] katco, go version?
[22:37] I don't suppose it makes a difference since CI isn't taking it [22:38] bodie_: 1.3.3 [22:38] bodie_: i made 1 change since i resubmitted; we'll see if this latest version works out [22:45] * katco sighs [22:46] the CI test is now failing in api/watcher [22:47] O.o [22:47] that seems fairly random to me [22:47] it timed out [22:48] katco, fetching but not seeing your change [22:49] it's an ammendment [22:49] ah [22:52] katco, api/watcher/... passes for me on your branch [22:52] let's see what happens when I do a whole-hog juju test [22:53] bodie_: i also resubmitted the job. again. [22:53] bodie_: it hadn't failed in those places yet. provider/dummy also failed [22:54] bodie_: which is currently hung, but with the output i couldn't get from cmd/jujud/agent [22:56] either i really didn't get enough sleep last night or this is all fairly random [22:59] you just haven't sacrificed to the loa recently [23:00] lol [23:01] fwiw, my laptop is just about exactly as fast as the CI server.... [23:02] lol [23:02] if this fails i'm going to EOD. i was on very late last night and i don't think i'll solve this [23:06] so, on that note, my anemic laptop is well past api/watcher [23:07] and CI is still on the package after api/watcher [23:07] oh and there we go. something blew up [23:07] i think it's going to time out [23:08] test ran too long on mine. [23:08] :/ [23:08] I'll give it a whack on the digitalocean 8-core instance [23:11] where did you time-out? [23:11] still running, but here's the output so far [23:11] not sure why it's still running come to think of it [23:13] looks like apiserver, but not really obvious where or why [23:13] http://paste.ubuntu.com/9574300/ [23:13] that's where CI is going to hang. there's a bug for sure. 
it's where the changes are centered [23:22] yup CI has failed [23:24] katco, my hunch is that whatever was supposed to release the semaphore hold didn't [23:25] that might be obvious [23:25] it's definitely something along those lines [23:25] i am too fried to debug threading atm -.- [23:26] so i'm going to EOD. sorry i couldn't get trunk unblocked [23:26] it seems there are many goroutines which were in a semacquire state at time of death [23:26] take care :) and merry christmas / happy festivus [23:26] happy solstice in my case :) [23:26] but happy holidays to you [23:30] perhaps it would be useful in debugging to keep track somehow of who has the lock [23:31] I see only three goroutines which were killed while waiting for lock in apiserver.(*changeCertListener).Close [23:32] which traces to http://golang.org/src/sync/mutex.go line 66 [23:32] fwiw [23:32] but that's not where the problem is; that's merely the symptom [23:32] whoever had the lock perhaps was waiting on io [23:33] and there are an enormous number of IO Wait-ing goroutines [23:33] perhaps the cert was never sent [23:38] I sense a perturbance in jujud/agent/upgrade_test.go line newApiserverWorker line 493 (nil channel) [23:38] katco ^ hope that helps [23:38] but I'm not certain what the intent is [23:38] EOD/holiday for me, take care all! [23:40] * bodie_ would like someone to vet bug 1404397 since it's a big Actions usability improvement [23:40] Bug #1404397: charm actions.yaml simplification breaks master tests
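Note on the hang discussed above: the symptoms bodie_ describes (goroutines stuck in semacquire under a Close call, plus a suspected nil channel) match a common Go pattern where one goroutine holds a mutex while receiving from a channel that was never initialised. A receive on a nil channel blocks forever, so anything else that needs the lock never gets it. The following is a minimal sketch of that pattern only. The `listener` type, its fields, and `closeCompletes` are hypothetical names invented for illustration; this is not the juju `changeCertListener` code, and the log does not confirm that this is the actual cause.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// listener is a hypothetical stand-in for a worker that guards its state
// with a mutex and waits for change notifications on a channel.
type listener struct {
	mu      sync.Mutex
	changes <-chan struct{} // nil if the caller never wired it up
	locked  chan struct{}   // closed once loop holds the mutex
}

// loop takes the lock and then waits for a change while still holding it.
func (l *listener) loop() {
	l.mu.Lock()
	defer l.mu.Unlock()
	close(l.locked)
	<-l.changes // a receive on a nil channel never returns
}

// Close needs the same lock, so it hangs for as long as loop holds it.
func (l *listener) Close() {
	l.mu.Lock()
	defer l.mu.Unlock()
}

// closeCompletes starts loop, waits until it holds the mutex, and reports
// whether Close returned within a short timeout.
func closeCompletes(changes <-chan struct{}) bool {
	l := &listener{changes: changes, locked: make(chan struct{})}
	go l.loop()
	<-l.locked

	done := make(chan struct{})
	go func() {
		l.Close()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	ready := make(chan struct{})
	close(ready) // a closed channel is always ready to receive from

	fmt.Println("closed channel:", closeCompletes(ready)) // prints "closed channel: true"
	fmt.Println("nil channel:", closeCompletes(nil))      // prints "nil channel: false"
}
```

In the nil case the loop goroutine and the Close goroutine are both leaked on purpose, which is roughly what a test timeout dump would show: one goroutine parked on a nil channel receive and the rest waiting on the mutex. Wrapping a suspect call in a select with a timeout, as `closeCompletes` does, is one way to turn a silent hang into a failing assertion while debugging.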