[03:10] davecheney: so how do we define a custom error type again?
[03:10] davecheney: what do I search for in code?
[03:13] davecheney: actually nm.
[03:14] davecheney: the error I care about is a NotFoundError, which seems sane...
[03:26] anything that has an Error() string method satisifies the error interface
[03:31] gah...
[03:31] tests seem to be blocking again for some reason
[03:31] how the hell can I know if the tests have stopped running?
[03:32] I think it is getting deadlocked somewhere
[03:32] cntl-\
[03:33] speaking of wtf
[03:33] oh, handy
[03:33] we have a method to override the default dial timeout when calling the state.Open
[03:33] BUT IT IS NEVER USED !!!
[03:33] well, that isn't strictly true, it is used inconsistnetly
[03:34] hmm...
[03:34] it seemed to be stuck in the guts of go
[03:34] not in our code
[03:34] wtf?
[03:35] davecheney: can I pastebin for you to take a look at?
[03:35] kk
[03:35] http://paste.ubuntu.com/5633106/
[03:35] you might get a cleanre stack trace if you send SIGQUIT to the test process
[03:35] in go 1.0
[03:36] cntl-\ will panic both the test program, and the test runner
[03:36] so you get two stack traces intermixed
[03:36] i fixed that for 1.1
[03:36] try again, and high the $PKG.test with a SIGQUIT
[03:36] kk
[03:38] s/high/hit
[03:40] http://paste.ubuntu.com/5633115/
[03:41] davecheney: ^^
[03:43] davecheney: can you see what it is doing?
[03:45] sorry, i think that is the wrong process
[03:45] that is the `go` process
[03:45] you want it's child
[03:47] * thumper sighs
[03:47] I seem to have a bucketload of mongod processes and go test processes running
[03:48] you will have at least 8
[03:50] I had about 30 something tests, and 4 different mongod processes with 8 children each
[03:50] I've killed them all, and will try again
[03:51] might be a good idea to clean out your /tmp
[03:51] there will be lots of junk in there
[03:51] hmm...
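A minimal sketch for the custom error type question raised at the top of this log: any type with an `Error() string` method satisfies Go's built-in `error` interface. The `NotFoundError` name comes from the discussion; its `Name` field and the `find` and `isNotFound` helpers are made up for illustration and are not juju-core code.

```go
package main

import (
	"errors"
	"fmt"
)

// NotFoundError is a hypothetical custom error type. The name matches the one
// mentioned in the log; the Name field is an assumption for illustration.
type NotFoundError struct {
	Name string
}

// Error makes *NotFoundError satisfy the built-in error interface.
func (e *NotFoundError) Error() string {
	return fmt.Sprintf("%s not found", e.Name)
}

// find is a hypothetical lookup that always fails with a *NotFoundError.
func find(name string) error {
	return &NotFoundError{Name: name}
}

// isNotFound reports whether err is a *NotFoundError, using a type assertion.
func isNotFound(err error) bool {
	_, ok := err.(*NotFoundError)
	return ok
}

func main() {
	err := find("machine-0")
	fmt.Println(err, isNotFound(err))
	fmt.Println(isNotFound(errors.New("some other failure")))
}
```

Callers that care about the specific failure can branch on `isNotFound(err)`, while everything else keeps treating the value as a plain `error`.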
[03:52] there is a lot of mongo*.sock
[03:55] davecheney: ok, it has blocked again...
[03:55] davecheney: using htop to look at the process tree
[03:56] davecheney: it seems that go test ./... has five different /tmp/go-build/.... something.test children
[03:56] I thought it wouldn't run in parallel?
[03:56] it does not run the tests inside a package in parallel
[03:57] but it will test packages in parallel
[03:57] ok
[03:57] go test -p 1 ./...
[03:57] well, they seem to have somehow deadlocked each other
[03:57] will disable
[04:01] oh ffs
[04:01] even with -p 1, the tests hang
[04:01] which test is is
[04:01] there will be one
[04:05] can't tell
[04:05] * thumper is busy killing things
[04:06] I just killed the system mongod process
[04:06] and rerunning the tests after switching to trunk
[04:07] best to start from a mostly known state
[04:07] getting further this time
[04:07] btw, there is a working mongo PPA now
[04:07] * thumper hates how gocheck doesn't clean up it's test directories
[04:07] if you want to switch to that
[04:07] davecheney: what was the problem?
[04:07] no idea
[04:08] so... what changed?
[04:08] the latest build jools did worked
[04:08] dunno, ask him
[04:08] i just copied his packge into our ppa
[04:08] not for me it didn't
[04:09] https://launchpad.net/~juju/+archive/experimental
[04:09] pls try this one
[04:09] I timed the tests here:
[04:09] real 2m42.342s
[04:09] user 5m34.444s
[04:09] sys 0m26.396s
[04:16] gah...
[04:16] second run through fails
[04:18] do you have the test output ?
[04:18] by fails I mean hangs
[04:18] I had switched to my branch though
[04:18] I'm back on trunk again, and trying that
[04:19] I'm beginning to wonder if it is my work
[04:19] which while not impossible
[04:19] would suprise me
[04:19] as I didn't think I was doing anything weird
[04:19] but I must be
[04:19] trunk worked again
[04:20] well, got further than before
[04:57] * thumper back for the meeting later
=== thumper is now known as thumper-afk
[04:57] kk
[05:56] jam: can you change ownership of goose to ~juju instead of ~gophers since i can triage bugs etc anymore
[05:57] wallyworld: I'm not in ~gophers anymore either. I thought you said you had super powers still for LP
[05:57] i was wrong :-(
[05:58] jam: but you registered the project, so you should be able to do it
[05:58] change ownership
[05:58] https://launchpad.net/~gophers/+members#active looks like only Gustavo is in ~gophers. I'll try
[05:59] wallyworld: shouldn't it be an edit link beside "Maintaners" ?
[05:59] jam: yeah, if you can't see it then you don't have permission. just to check, the administer link
[06:00] yeah, I tried that one too
[06:00] just lets me set "aliases" for the project
[06:00] balls, i thought you would be able to
[06:00] I can apparently set whether it tracks bugs in launchpad, but not change who can access those
[06:00] oh well, we'll have to ask gustavo to change ownership etc
[06:01] i filed a bug before but couldn;t triage or set importance etc
[06:03] * wallyworld relocates
[06:39] wallyworld_: seconded your rsynclogd stuff
[07:02] ta, will land
[07:45] mornin' all
[07:46] rogpeppe: morning
[07:46] dimitern: hiya
=== thumper-afk is now known as thumper
[07:51] hi rogpeppe, dimitern
[07:51] thumper: hya
[07:51] thumper: hi, even
[07:56] thumper: yo!
[07:58] hey all
[07:59] fwereade__: heyhey
[07:59] dimitern, heyhey
[08:09] morning mgz
[08:10] wallyworld: side note I didn't want to forget about. Did we need to change the default firewall ports to include the rsyslogd port?
[08:10] jam: i didn't change anything, but it worked regardless
[08:11] jam: i had a mysql node and it logged back to bootstrap with no fw changes
[08:11] wallyworld: so supposedly they are all in the same security group, so everything can talk to everything in the group, I wasn't sure if that was true on HP/Canonistack
[08:11] it appears to be true it works by trying it out :-)
[08:43] jam: can blue take this one ? https://canonical.leankit.com/Boards/View/103148069/104151606
[08:46] thumper: that looks like one that is intended for us, but I have a really hard time going from URL links to finding the actual cards on the board
[08:46] jam: I'll move it to blue backlog
[08:46] jam: it has been moved to blue todo
[08:51] jam: mgz: dimitern: this is a critical bug for release tomorrow, could you +1 it and i'll land after the standup later? https://codereview.appspot.com/7937044/
[08:51] wallyworld: I have an in-progress review of it. I'll make sure to finish that.
[08:51] your String() function looks like it exists in 2 places.
[08:51] ah, nm
[08:51] yes, different structs
[08:51] genericId vs genericInstanceId
[08:51] wallyworld: i'm on it
[08:51] thanks guys :-)
[09:08] rogpeppe: you submitted https://code.launchpad.net/~rogpeppe/juju-core/212-api-doc/+merge/147919 as of yesterday, can we move your 'write API design' card to Merged?
[09:09] jam: i wasn't aware that i'd submitted it
[09:09] jam: i still haven't seen two LGTMs
[09:11] jam: in fact, i definitely haven't submitted it
[09:12] rogpeppe: yeah sorry, I was looking at https://codereview.appspot.com/7919043/ and saw the last thing was submitted, but clearly that was the wrong link I had followed.
[09:12] jam: np
[09:12] rogpeppe: as a doc, which is better than having nothing, I'm willing to have you land it with a single +1
[09:12] jam: thanks for your review BTW
[09:12] so 'trivial'
[09:12] jam: ok, thanks. will do.
[09:13] TheMue: looking here: https://launchpad.net/~gophers/+related-projects there is "golxc" is that something that should stay in ~gophers or move to ~juju?
[09:13] jam: did my responses to your questions seem reasonable, BTW?
[09:15] rogpeppe: so I think the answer is: (a) it is stateful, but we can just put more processes/machines in front
[09:15] (b) as a websocket, you maintain the connection until it gets interrupted, in which case you have to set up the state from scratch again
[09:16] (c) we know this design is a bit worriesome because it needs to know about all changes, but that is a whole different discussion I'm happy to defer for now :)
[09:16] jam: a) i'm not sure "in front" is right there - we'd be replicating the API server itself, not putting more things in front of it.
[09:16] rogpeppe: in front of mongo
[09:17] jam: ah yes, indeed
[09:17] rogpeppe: on that point, could you put haproxy sort of thing to load balance?
[09:17] jam: yeah, that's what i'd do
[09:17] as long as it new to maintain the websocket to the same api server?
[09:17] jam: yup
[09:17] I'm not as familiar with websocket, but presumably it is "give me a connection and then stop pretending I'm talking HTTP"
[09:18] jam: yeah, it hijacks the connection AFAIR
[09:19] jam: but it has some cruft in the middle too (it does packets)
[09:19] rogpeppe: so i'm happy to chat about the design, etc, but none of that blocks the landing of a doc that describes what we have/you are actually building.
[09:19] jam: sure
[09:32] fwereade__: Rietveld: https://codereview.appspot.com/7943043
[09:34] * thumper is done for the day now
[10:14] dimitern, ping
[10:14] fwereade__: pong
[10:15] dimitern, was the source of that bug yesterday clear?
[10:15] fwereade__: which one?
[10:15] dimitern, the one roger saw, that cascaded nastily
[10:15] dimitern, rogpeppe: actually would you just update me quickly on what's done/planned wrt that issue?
[10:16] fwereade__: not really, no (for me at least, and couldn't reproduce it)
[10:16] fwereade__: i just disabled the test
[10:16] fwereade__: and filed a bug
[10:16] fwereade__: assigned to dimitern :-)
[10:16] fwereade__: i filed the one about mgo today btw
[10:16] dimitern: what one was that?
[10:17] rogpeppe: bug 1158190
[10:17] <_mup_> Bug #1158190: intermittent failure with go tip and GOMAXPROCS=5 < https://launchpad.net/bugs/1158190 >
[10:18] dimitern, cool, thanks
[10:18] rogpeppe, ok, cool, thanks
[10:18] dimitern: which revision of mgo are you using?
[10:18] rogpeppe: how can i check? bzr info doesn't say
[10:19] dimitern: i use bzr log
[10:20] rogpeppe: ah, bzr info -h also shows that i found. so rev 183
[10:20] dimitern: ok, cool, that's the same one as me
[10:20] sorry, that was bzr info -v
[10:21] dimitern: that's weird then; i can't see how it could get a nil pointer error on that line
[10:21] rogpeppe: panic dumps don't lie :)
[10:22] dimitern: actually, they can quite often be out by a line
[10:22] rogpeppe: oh, i didn't know this
[10:25] dimitern: it's *usually* fairly obvious
[10:25] dimitern: and in this case, i can't see an obvious candidate (servers is not nil, and that's the only way that line can panic AFAICS)
[10:26] rogpeppe: weird..
[10:26] dimitern: tip has had some very significant changes recently. i wouldn't entirely rule out some memory-corruption issue.
[10:48] long lunch today, bbl
[11:03] lunchtime too
[11:06] dimitern, fwreade: i just saw another uniter test failure in trunk : http://paste.ubuntu.com/5633753/
[11:06] rogpeppe: this looks like the same issue
[11:07] dimitern: i'm not sure. it first dies in a different test, for a different initial reason
[11:07] dimitern: (it first dies in "hook error service dying")
[11:09] rogpeppe: is it consistently failing?
[11:10] dimitern: nope
[11:10] dimitern: i saw that one after about 10 runs
[11:10] dimitern: all with different values for GOMAXPROCS
[11:10] dimitern: (that one was with GOMAXPROCS=60)
[11:11] rogpeppe: so values of n > 2xnumber of cores still work?
[11:11] dimitern: yeah, you can have any number
[11:11] dimitern: it's just the number of processes that can be running cpu-bound stuff at once
[11:11] dimitern: i'm continually running dfc's stresstest shell script
[11:11] rogpeppe: i see, so what should we do about it?
[11:12] dimitern: we should delve in and try to understand what's happening
[11:13] dimitern: i'd start by comparing the logs from the passing test with the logs from the failing test and see where they diverge
[11:14] rogpeppe: ok
[11:25] rogpeppe, fwereade__: I'm trying to make the change to default to "precise". I can easily update the test. However, when I change it, it only tests the logic inside Config.New() (where if the value is empty it gets auto-set to a new value)
[11:25] it does not test the value in schema.Defaults
[11:25] do you know how to trigger schema.Defaults?
[11:26] jam: schema.Defaults is triggered when a config attribute isn't specified, no?
[11:27] rogpeppe: I would think so, but if I just change the line in "New" then all paths return the value I specify
[11:27] * rogpeppe goes to look
[11:27] So, AIUI, there should be "not specified" as separate from "specified as the empty string"
[11:28] rogpeppe: ah, maybe I'm on crack because "default-series": version.Current.Series *is* precise on my machine.
[11:28] And even though I'm monkey-patching the value during testing, it is too late, because the value is already in the map
[11:28] jam: ha! (you're still running precise?)
[11:29] rogpeppe: the last one to support Unity-2D, and 3D doesn't work very well in a VM
[11:29] jam: ah, i see
[11:29] jam: i didn't realise you used a VM
[11:29] not always, but I'm in Windows to do the windows building, etc.
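The schema.Defaults question above turns on the difference between an attribute that is "not specified" and one that is "specified as the empty string". A minimal sketch of that distinction, assuming a plain `map[string]interface{}` for config attributes: `applyDefaults` and `fillEmpty` are hypothetical helpers written for this example and are not the juju-core `schema` or `config` APIs.

```go
package main

import "fmt"

// applyDefaults is a hypothetical helper: it copies attrs and fills in a
// default only for keys that are absent, so an explicit "" is preserved.
func applyDefaults(attrs, defaults map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(attrs)+len(defaults))
	for k, v := range attrs {
		out[k] = v
	}
	for k, v := range defaults {
		if _, ok := out[k]; !ok {
			out[k] = v
		}
	}
	return out
}

// fillEmpty is a hypothetical second pass, standing in for the kind of fix-up
// described for Config.New: an empty string value is replaced by fallback.
func fillEmpty(attrs map[string]interface{}, key, fallback string) {
	if s, ok := attrs[key].(string); ok && s == "" {
		attrs[key] = fallback
	}
}

func main() {
	defaults := map[string]interface{}{"default-series": "precise"}

	unset := applyDefaults(map[string]interface{}{}, defaults)
	empty := applyDefaults(map[string]interface{}{"default-series": ""}, defaults)
	fmt.Printf("unset=%q empty=%q\n", unset["default-series"], empty["default-series"])

	fillEmpty(empty, "default-series", "precise")
	fmt.Printf("after fix-up: %q\n", empty["default-series"])
}
```

When the default and the fix-up value are the same string, both paths end in the same result, which is one way a test can fail to tell which path supplied the value.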
[11:29] but I can't run the test suite there
[11:29] so VM
[11:30] of course
[11:30] I do have a raring VM which works ok, but the "3D" support is pretty poor for virtualbox
[11:30] even with the "allow 3D for guest" checked.
[13:11] fwereade__, fwiw, bug 1131608 is still a blocker for us to fully deliver (developing the charm needs it)
[13:11] <_mup_> Bug #1131608: deployed series is arbitrary < https://launchpad.net/bugs/1131608 >
[13:13] gary_poster, I *think* that, as of thumper's branch proposed this morning, it should actually be resolved
[13:13] fwereade__, oh, that would be great, thanks.
[13:13] gary_poster, I'm pretty sure that's the last piece that needed to be added
[13:18] * TheMue missed to restart irc after reboot *facepalm*
=== wedgwood_away is now known as wedgwood
[14:10] rogpeppe, ping
[14:10] fwereade__: pong
[14:10] rogpeppe, kanban :)
[14:10] fwereade__: oh, bugger
[15:20] rogpeppe, ping
[15:20] fwereade__: pong
[15:21] rogpeppe, since you can repro it, would you try out a fix for that uniter test please?
[15:21] fwereade__: sure
[15:21] rogpeppe, line 570 should read:
[15:22] fwereade__: let me just clone $GOROOT :-)
[15:22] info: "upgrade failed",
[15:22] ...and that's it
[15:22] rogpeppe, I *think*
[15:26] fwereade__: so where's the source of the indeterminacy?
[15:26] fwereade__: i'm first verifying i can still reproduce the bug; then i'll try the fix.
[15:27] rogpeppe, that we wait until a few steps *before* the point we care about, and then assert we're at that point
[15:27] fwereade__: ah, i think i see - "hook failed: "start"" is just a stage on the way to "upgrade failed"
[15:28] fwereade__: yeah
[15:29] rogpeppe, but actually wait
[15:29] rogpeppe, I am suddenly very confused by the test
[15:29] rogpeppe, even if it works, something's funny
[15:30] rogpeppe, fuck, it's harder than I thought
[15:32] fwereade__: yeah, sorry, still fails
[15:32] fwereade__: it would be great to sort out the test isolation issue too
[15:33] rogpeppe, yeah, I'm feeling a bit blocked on the thing I picked up, might take a proper look t both of those
[15:38] pwd
[15:52] * rogpeppe wishes that gocheck printed searchable-for string with every assertion failure
[15:53] rogpeppe: like what exacty?
[15:53] mgz: like "assertion failed"
[15:54] ah, you mean just searchable in the output
[15:54] mgz: yeah
[15:54] yeah.
[15:54] mgz: 'cos some tests (looking at uniter here) produce heroic quantities of output, and finding the errors is not easy!
[15:55] mgz: ah, this'll work pretty well: search for _test\.go
[15:55] ha
[15:56] fwereade__: i just saw another uniter test failure (in trunk, or nearly trunk, this time). in steadyUpgradeTests.
[15:56] fwereade__: same symptom (never got expected hooks)
[15:56] rogpeppe, cool, I'll take a look there too, might be similar
[15:57] fwereade__: will paste you the output if you want
[15:57] rogpeppe, please
[15:58] fwereade__: http://paste.ubuntu.com/5634430/
[15:58] rogpeppe, thanks
[16:03] pretty simple cleanup CL, if anyone wants to take a look: https://codereview.appspot.com/7945044
[16:46] rogpeppe, when you have a moment, would you try lp:~fwereade/juju-core/fix-1157898 please?
[16:47] fwereade__: running tests on it now
[16:47] rogpeppe, that other one you saw is profoundly weird... it looks like the hook is (maybe?) running but the juju-log tool is not
[16:49] rogpeppe, if that fix works I might add a couple of logging lines before I propose, to help diagnose the other one if we see it again
[16:52] fwereade__: looking good so far - three full tests without incident
[16:52] rogpeppe, excellent
[16:52] fwereade__: so what's the signifcant of the code move in Uniter.deploy ?
[16:53] rogpeppe, to make the things I said in the test true -- specifically, by delaying the SetCharm until the operation is not stoppable
[16:53] rogpeppe, the critical point is doing so after the download has complete, after that it doesn't check Dying until it's done
[16:54] fwereade__: so is that a genuine bug?
[16:54] rogpeppe, motivationwise, a bit of a hack; otherwise a reasonable change, I think... could definitely be argued to be more correct to not set a charm until you actually *have* the charm in hand
[16:55] fwereade__: sounds reasonable to me
[16:55] rogpeppe, I think it is arguable that it was -- I'm not quite sure how it would behave after having set a charm but finding itself unable to download
[16:56] rogpeppe, but the test was definitely racy, and I think that now it no longer is
[16:56] fwereade__: 5 iterations good so far
[16:56] rogpeppe, awesomesauce
[16:56] fwereade__: and the Reset fix looks... why didn't we see that before? :-)
[16:57] 6
[16:57] pwd
[16:59] rogpeppe, haha :)
[17:01] I'm seeing test failures on trunk, is that a known thing? After skimming the scroll-back I didn't see mention of it.
[17:02] benji, I am not aware of any myself, would you paste them please?
[17:02] fwereade__: http://paste.ubuntu.com/5634632/
[17:03] I'll check back after lunch and see if things are better.
[17:04] benji, I think that might be jam's change
[17:04] jam, are you (1) around and (2) running on precise?
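The racy uniter test discussed above waited for an intermediate status and then asserted on a later one. A minimal sketch of the general alternative, waiting for the status the test actually cares about and skipping the stages on the way: the two status strings are taken from the log, while `waitFor`, `defaultTimeout`, and the channel-based status feed are assumptions for illustration and not the uniter test harness. This is not the fix proposed in lp:~fwereade/juju-core/fix-1157898, which moved the SetCharm call in Uniter.deploy.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeout is an arbitrary value chosen for this sketch.
const defaultTimeout = time.Second

// waitFor drains statuses until it sees want, the channel is closed, or the
// timeout expires. Intermediate statuses are skipped rather than asserted on.
func waitFor(statuses <-chan string, want string, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for {
		select {
		case s, ok := <-statuses:
			if !ok {
				return false // feed ended before the wanted status appeared
			}
			if s == want {
				return true
			}
		case <-deadline:
			return false
		}
	}
}

func main() {
	statuses := make(chan string, 2)
	statuses <- `hook failed: "start"` // a stage on the way, not the end state
	statuses <- "upgrade failed"
	close(statuses)
	fmt.Println(waitFor(statuses, "upgrade failed", defaultTimeout))
}
```

The design point is that the helper never asserts on an intermediate stage, so the order in which a worker passes through earlier stages cannot make the check fail spuriously.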
[17:05] fwereade__: i think he mentioned he's running on precise
[17:05] dimitern, grumble grumble, I bet that's it
[17:06] fwereade__: 12
[17:07] rogpeppe, more awesomeness, I'm starting to feel good about it
[17:07] `; beep` ha, a sign of a test suite that takes too long to run #;0~~
[17:07] fwereade__: i think frankban could reproduce these failures as well
[17:09] dimitern, good point
[17:09] frankban, am I right in thinking you were seeing the uniter test failure that rog disabled yesterday?
[17:10] confrimed that's a regression due to r1044 on quantal
[17:10] shall I just back the change out for now, as it's jam's eod?
[17:10] fwereade__, dimitern: yes you are
[17:10] fwereade__: 8 failures in uniter_test
[17:10] frankban, would you try out lp:~fwereade/juju-core/fix-1157898 please?
[17:10] fwereade__: ^
[17:10] fwereade__: sure
[17:11] mgz, yes please
[17:11] mgz, I'll write him a bug
[17:11] mgz, hmm, should it be a bug? I'll just mail him
[17:11] mail or note in the review should be fine I'd say
[17:14] proposed, and will go ahead and submit
[17:15] benji: please pull lp:juju-core
[17:16] jam: merge lp:~gz/juju-core/backout_r1044 into your feature branch, revert+fix, and repropose
[17:17] mgz: i doubt jam is around to do this at this time
[17:17] he has the log :)
[17:18] I don't expect him to do it till his next work day
[17:18] (true, this kind of thing could also go to the list)
[17:18] but that would be sunday, right? i think it's better to back out that now, since trunk is broken
[17:19] dimitern, it is backed out :)
[17:19] fwereade__: tests pass in your branch
[17:19] frankban, sweet, tyvm
[17:19] fwereade__: np
[17:20] fwereade__: oh, ok :) sorry i missed this
[17:24] fwereade__: i just saw this after 22 successful runs: http://paste.ubuntu.com/5634687/
[17:25] rogpeppe, that's outside the scope of what I did, do I think I'm going to propose it as is for now
[17:25] fwereade__: sure. just so's you know :-)
[17:28] rogpeppe: are you running the tests with dave's stress testing script?
[17:28] dimitern: yeah
[17:28] rogpeppe: cool! so it's as stable as it gets for now after 22 runs
[17:29] dimitern: yeah
[17:29] dimitern: it's still not good though
[17:29] * rogpeppe says, with a serious face on
[17:29] rogpeppe: why? have you seen the same issue again
[17:30] dimitern: no, but any intermittently failing test is Bad
[17:30] dimitern: even if it's "only" once every 22 times
[17:30] rogpeppe: ah, sure :) but this one at least seems fixed
[17:30] dimitern: yup
[17:31] dammit, must dash: if anyone fancies https://codereview.appspot.com/7950043 I'd be really happy
[17:31] fwereade__: will take a look
[18:06] reminder to self: never interrupt bzr at work
[18:17] g'night all
[18:18] i have a few CLs up for review if anyone fancies taking a look; they all have kanban tickets
[18:18] fwereade__: the trunk appears happy now, thanks
=== deryck is now known as deryck[lunch]
=== deryck[lunch] is now known as deryck
[20:29] morning
[20:36] fwereade__: don't suppose you are around?
[21:47] * thumper wonders if there was anything else to email the list with...
[22:01] thumper, heyhey
[22:02] thumper, not really, but kinda
[22:08] * thumper was confused there for a minute
[22:08] fwereade__: I'm just wondering about testing the tools selection for start instance
[22:08] fwereade__: also, tools are specified by bootstrap...
[22:09] fwereade__: trying to finish this off before looking at the machine things we talked about last night my time
[22:10] thumper, still thinking about the tests, but not quite following on bootstrap
[22:11] well... you said that the tools weren't set in the start instance params
[22:11] well, they aren't if coming from StartInstance, but are from Bootstrap
[22:11] thumper, tools selection with --upload-tools should Just Work once we get rid of the weird multi-bucket fallback stuff in tools
[22:12] thumper, assuming we set default-series and agent-version at upload time, anyway
[22:12] thumper, ofc none of that is written yet
[22:12] :)
[22:13] thumper, but I think it's the stuff I was talking about in the saner bits of my last email
[22:13] I'm not clear on why the multibucket stuff needs to change
[22:13] fwereade__: which we do, except for it being mildly broken
[22:13] fwereade__: by version.Current anyway
[22:13] thumper, mainly because the falling back is deeply confusing to me
[22:14] thumper, so long as we only use the first bucket that has any tools, I can figure it out quite easily
[22:14] thumper, but the way bad matches from closer buckets beat good matches from distant buckets breaks my brain
[22:14] hmm...
[22:15] so, here is a question...
[22:15] if I have a development version of tools I have uploaded
[22:15] how does this interact with start instance?
[22:15] when it is looking for tools
[22:15] this bit still confuses me
[22:16] but I could probably just talk with dave about that when he starts
[22:16] if we just used the version defined in agent-version, we may not get my special uploaded tools
[22:16] thumper, I *think* that if we do agent-version right, we can easily pick the tools right
[22:17] thumper, but I confess to some uncertainty around the magic insertion of the dev-version flag
[22:17] thumper, there might be a "development" field in env-config that comes into play somewhere
[22:18] thumper, but I shouldn't really be getting into this now tbh
[22:18] :)
[22:18] thumper, if I'm still awake when nobody else is I'll swing by again
[22:18] fwereade__: just sleep :)
[22:18] thumper, if I don't, happy weekend :)
[22:18] you too
[22:18] long weekend for me
[22:56] davecheney: morning
[22:56] davecheney: I have a question (or two)
[22:56] thumper: shoot mate
[22:56] davecheney: apart from all those other questions on the email list
[22:56] i haven't got through all the correspondence on the list yet
[22:57] davecheney: https://canonical.leankit.com/Boards/View/103148069/104140367 is this done by the change I recently landed in worker/provisioner/provisioner.go line 264?
[22:57] davecheney: in trunk as of this morning
[22:57] gimme a sec
[22:57] if so, yay, another thing done quickly
[22:57] i suspect it is
[22:57] the provisioner calls StartInstance
[22:57] cool, I'll move it to done
[22:58] please hold, confirming
[22:59] thumper: LGTM,
[23:00] * thumper holds (even though he has already moved the card)
[23:01] * davecheney has much love for bzr log --show-diff
[23:02] wow, 3 LGTMs on the logging bikeshed
[23:02] fuck yeah
[23:02] i'm gonna submit that before anyone changes their mind
[23:05] :)
[23:05] davecheney: I saw two, but thought I'd throw mine in too for good measure
[23:06] wallyworld_: has a point
[23:06] but I prefer cut -f for log splitting
[23:06] so that is how I wrote it
[23:06] I don't think wallyworld_ has a point
[23:07] don't put the colon in
[23:07] better without
[23:07] given a defined timestamp, and a defined severity
[23:07] it is trivial to get those two out, or ignore them if need be
[23:07] adding a colon adds a character for no real benefit
[23:07] okdie dokes
[23:08] shit, all the lanes are full
[23:08] better get cracking
[23:08] one reason I did a few reviews :)
[23:08] i'm going to move rogers api doc back out of review
[23:08] * thumper heads out for lunch
[23:08] back later
[23:08] it's been in there since before atlanta
=== thumper is now known as thumper-lunch