[07:40] mornin' al [07:40] l [07:40] where *is* al? [08:30] wrtp, over the hills and far away ;p [08:31] TheMue, were there meant to be some prereqs on your CLs? Just did one, moved on to the next, and saw the same changes... [08:31] fwereade_ is back in Go-land, yay! [08:38] wrtp, yeah, and I can almost remember what the hell I'm doing ;) [08:38] fwereade_: bonus [08:40] fwereade_, wrtp: moin [08:40] TheMue: yo! [08:40] TheMue, heyhey [08:41] fwereade_: I would like you to not do something regarding those watchers. It's a sequence of branches based on each other. So I have to realize them bottom-up. [08:42] fwereade_: Base is the enhancement of the content watcher (will go in again in a few moments) [08:44] TheMue, ah, ok, sorry; I hope the comments on the one I did might be useful anyway [08:45] fwereade_: Sure they will. Which one is it? I've not yet scanned them all. [08:54] TheMue, sorry, got called away; it's https://codereview.appspot.com/6059047/ [08:59] fwereade_: So, read it, thx. You also prefer "treaten" insted of treated"? *smile* [09:00] TheMue, it's clearly superior ;p [09:00] TheMue, but the tyranny of the majority wins [09:00] fwereade_: Hehe, so maybe I should keep it. [09:01] fwereade_: The name "loop" for the goroutines is a result of the general discussion about watches here. In my own code I typically call it backend() [09:03] fwereade_: The idea of mooving all watches into an own file sounds good. I already have a watch_test.go [09:03] fwereade_: unit.go is already quite big. [09:05] fwereade_: What I don't get is the "force upgrade". Could you please explain? [09:05] Morning all [09:06] TheMue, heh, I wasn't closely involved, but: [09:06] (heya niemeyer) [09:06] TheMue, a requirement that units be upgradeable in a non-started state came up from somewhere [09:06] TheMue, you can now pass --force to juju upgrade-charm [09:07] TheMue, and the UA interprets that as "upgrade regardless of unit state" [09:07] niemeyer: moin [09:07] TheMue, you'd have to take a look at the code to see exactly how it's done [09:08] TheMue, lp:934350 [09:08] fwereade_: ok, thx for the hint [09:13] The point of ugprades in non-started state comes from the fact upgrades are often used to fix the problem [09:15] niemeyer: Good to know. Regarding the state I've yet only tested for the existence of the node for it in ZK. [09:16] niemeyer: Could you take a look at http://paste.ubuntu.com/936632/. I fould a compromise between your and mine approach to keep the watcher test code compact. [09:16] TheMue: The approach I suggested isn't really mine.. it's a very common way to structure tests [09:17] TheMue: http://code.google.com/p/go-wiki/wiki/TableDrivenTests [09:18] TheMue: IMO, this is still cleaner [09:19] TheMue: I'm not as religious as other people are.. some actually go overboard and attempt to use this scheme even when it's not really adequate for the test at hand, or could be simplified if different [09:20] TheMue: Your case, though, is a pretty legitimate table [09:20] niemeyer: OK, so I'll adopt them, even if I don't even find them more readable. [09:20] TheMue: Let me try to explain why it feels more readable then.. [09:20] TheMue: You actually iterate over a list of possibilities with some logic [09:21] TheMue: Instead of having the list of possibilities as a first class entity and having the logic iterating over that list, the test is inverted [09:22] niemeyer: Yes, here a table is ok, but later I also have to test for timeouts and closed channels. And the code needed here is due to the structure almost a duplication of the test code above. This is IMHO needless and cluttering. [09:22] TheMue: It has a function with logic displaced, and then manually iterates over the list of possibilities [09:22] TheMue: The table may be richer [09:23] niemeyer: Using an anonymous struct, yes. [09:23] TheMue: If you're iterating over a list of possibilities, it's still easier to read through if it is organized as a sequence of facts than if everything is mixed up with a stream of code [09:23] TheMue: It's a bit like having a higher level language to describe the test [09:24] niemeyer: How about the call of watcher.Stop() inbetween? [09:24] TheMue: Again, I'm not religious about it, and in many cases this makes things worse.. this looks like a good chance to use it, though [09:24] TheMue: Maybe that call should be out of the iteration logic? [09:26] niemeyer: That's what I meant above. There's iteration, then only one err assert, then again a test for timeout. The way I've tried it reads simply top-down (ok, you have to know the three arguments of the assertChange()). [09:27] TheMue: The fact you have a couple of lines afterwards isn't justification to not do an iteration as an actual iteration [09:28] TheMue: Note in my suggestion that the code within the loop is also simpler [09:28] TheMue: and so is the test after it [09:29] TheMue: The logic in the paste instead shoves a true/false pair that is unchanged over the whole iteration, and makes the loop more complex [09:29] niemeyer: Yes, the iteration for the (just) 4 values is fine, but later I need two receives with timeout. Now a one-liner, then 4 to 5. [09:30] TheMue: Those are handled in the suggestion I sent you [09:30] TheMue: With something as trivial as [09:30] select { [09:30] case <-time.After(N): [09:30] t.Fatalf("boom") [09:30] } [09:30] TheMue: That's much better than assert(false) [09:31] niemeyer: The new approach uses Fatal to show the timeout better. And still it's more code (and that twice) [09:32] TheMue: It uses fatal in a remote place that is disconnected from the actual logic because we've forced a loop into a sequence of equivalent function calls [09:32] niemeyer: But let my try the table approach and then we'll see which one looks cleaner. [09:32] TheMue: Thanks [09:33] TheMue: As a general comment, more code isn't necessarily bad.. there are other factors that have to be taken into account [09:34] niemeyer: Yes, readbility. And from my perspective the approach using a little helper is more readble. [09:34] TheMue: Ok.. I disagree, but I respect that [09:35] TheMue: Perhaps one of the reasons I feel different is that you have a comment there playing the role of code that should be readalbe [09:35] // Changes() has to be closed. [09:35] assertChange(false, nil, false) [09:35] select { [09:35] case ...: [09:35] c.Fatalf("Changes channel not closed") [09:35] } [09:36] .. [09:36] You need the comment because assertChange(false, nil, false), by itself, says nothing [09:37] niemeyer: Hehe, yes, got me. In Smalltalk I wrote "self assertChange: … isClosed: true hasTimout: false." [09:37] niemeyer: Here my former lang is more expressive. [09:39] TheMue: Indeed.. one might do the same with Python. The question is whether that'd be good or not. I the specific case we're debating, I still prefer dropping one layer because the helper function is unnecessary. [09:40] niemeyer: Maybe it's unnecessary, yes. But many unnecessary things are useful. *smile* [09:41] In other words, isClosed and hasTimeout are both layers that one must look elsewhere to understand. [09:41] They don't pull their weight. [09:45] early lunch, bbs [09:52] niemeyer: Here is the new one: http://paste.ubuntu.com/936664/ [09:56] TheMue: This comment can go away: [09:56] // Receive the four changes create, content change, [09:56] // delete and create again. [09:56] TheMue: It's explicit now.. [09:56] TheMue: This one too, for the same reason: [09:56] // No more changes, expect a timout. [09:57] TheMue: Same about the last one.. [09:59] niemeyer: So one sentence: Just remove all comments in the method. [09:59] TheMue: Sorry, I was reading through it, but yes.. this is the case.. no comments that duplicate the code please [10:02] niemeyer: I would at least for the two last selects prefer to explain what they are concentrating on. E.g. why it makes sense to have an empty case <-timeAfter(). A kind of c.Assert(ImHappyBecauseImCalled). [10:02] TheMue: It makes sense to have an empty time.After because we expect it to happen [10:03] TheMue: We don't need comments for that [10:04] niemeyer: OK, so et's hope that a new maintainer in several month will get it as easily too. [10:04] TheMue: If he doesn't get that easily, he shouldn't be working on juju [10:04] (or she) [10:04] niemeyer: *lol* [10:04] niemeyer: OK, good argument. [10:04] TheMue: one advantage of structuring this as a table is that it makes it easy to have different time outs for the expected-timeout vs not-expected-timeout case. [10:04] TheMue: As a counterexample, this is the kind of comment that makes sense to have in the test: [10:04] TheMue: http://paste.ubuntu.com/936679/ [10:05] TheMue: when you don't expect a timeout, it can be much longer because it doesn't interfere with the normal progress of the test [10:05] TheMue: Comments are not about what the code is doing.. they are about what we can't see [10:05] TheMue: the final timeout should be shorter. [10:05] wrtp++ [10:07] wrtp: The timeout could be an argument of a helper too. [10:07] ROTFL [10:07] I rest my case.. [10:07] lol too [10:12] So at least one success. ;) [10:13] Lunchtime ... [10:13] TheMue: Enjoy [10:53] niemeyer: so, https://codereview.appspot.com/6059044 is in [10:56] TheMue: LGTM, thanks! [10:58] TheMue: Fine, thx. Will submit and merge it to go on with next one. It's the ConfigWatcher in state.go. What do you think about Williams idea of moving all watchers of the state package to a file named watchers.go? [11:03] TheMue: Sounds fine, but please do it in a separate CL, or we'll lose all comments now [11:04] niemeyer: OK, will move the after all are in after an LGTM. [11:04] s/the/them/ [11:08] niemeyer: got all the changes prepped for moving to config based, and with mongodb auth in place - may be a short service interruption while the changes are applied (will let you know as and when) [11:08] mthaddon: Sounds great [11:08] mthaddon: Thanks [11:09] np [11:38] niemeyer: ok, config updates being applied now - may be service interruption for a bit, will let you know when done [11:41] mthaddon: Cheers [11:47] niemeyer: the appserver processes are running, but not responding on port 8080 to our nagios checks [11:48] niemeyer: here's some code diving before ocean diving -> https://codereview.appspot.com/6056047 [11:48] niemeyer: ah, I think this is because I've set api-addr: localhost:8080 [11:48] niemeyer: to listen on all interfaces, what should I set that to? 0.0.0.0:8080 ? [11:49] mthaddon: Yeah, that'll work [11:49] k, thx [11:52] TheMue: done :) [11:54] niemeyer: Hmm, no notification here. Looks good so that I can move on? [11:55] TheMue: Please check it out [11:55] TheMue: I forgot to submit somehow [11:55] TheMue: It LGTM, but there's something to change [11:56] niemeyer, I think I need some guidance re response on https://codereview.appspot.com/5756054/ [11:57] fwereade_: Sounds good.. I'll have some breakfast and we can discuss it [11:57] fwereade_: Just replied to the thread in the ML again, btw [11:57] niemeyer, swet, tyvm [11:57] mthaddon: All good there? Can I get some breakfast? :) [11:57] niemeyer: OK, thx. Will change that one point and then submit it. [11:57] niemeyer: yep, all looking good thx [11:57] niemeyer: go eat! [11:58] mthaddon: Cheers! Will run tests later [12:23] niemeyer: you need to tag the tomb package (or remote all the existing tags) [12:23] s/remote/remove/ [12:30] wrtp: Hmm, I believe the tags there are right [12:30] niemeyer: i don't see a go1 tag [12:30] niemeyer: i did go get -u and everything broke [12:30] wrtp: There isn't one.. that shouldn't be a problem, I believe [12:31] niemeyer: try it [12:31] niemeyer: it went back to release 12, i believe [12:32] [niemeyer@gopher ~]% GOPATH=$PWD/gopath go get launchpad.net/tomb [12:32] [niemeyer@gopher ~]% [12:32] get -u [12:32] [niemeyer@gopher ~]% mkdir gopath [12:32] and then try to build juju/go [12:34] niemeyer: BTW i'm seeing a store test failure: http://paste.ubuntu.com/936825/ [12:35] Will have to increase the timing then.. there's a race there due to the background coutning [12:35] counting [12:35] wrtp: Can you please try removing your local tomb package? [12:36] wrtp: The tags are right [12:37] niemeyer: that worked. maybe the go pull process didn't update the tags properly. [12:37] gotta go to lunch [12:38] wrtp: That's what I imagined [12:39] mthaddon: Oh, ouch.. forgot a detail [12:40] mthaddon: Can you please update the nagios check with stats=0? [12:49] mthaddon: Btw, did you really intend to hit the store about twice a second with the nagios check? [12:52] fwereade_: ping [12:52] niemeyer, pong [12:52] fwereade_: Heya [12:52] fwereade_: So.. [12:52] fwereade_: Quick experiment.. [12:52] fwereade_: Don't look at the code for a moment [12:52] niemeyer, ok :) [12:52] fwereade_: What has to happen for something to actually be logged? [12:53] fwereade_: In terms of the command setup [12:53] niemeyer, someone to call ctx.InitLogging? [12:53] fwereade_: More.. [12:54] niemeyer, the flags to have been set such that InitLogging does something? [12:54] fwereade_: More.. [12:55] niemeyer, someone else to call log.Printf/Debugf? [12:55] fwereade_: That too. [12:55] fwereade_: The log commands have to be provided, the command has to call InitLog.. Printf has to be called.. SetsLog has to be true.. [12:56] niemeyer, ok, but the breakage occurs at the point where someone calls InitLog [12:56] fwereade_: In your latest comment, you demonstrate a long chain of recommendations [12:56] niemeyer, that's the point at which the host process's logging setup is potentially borked [12:56] fwereade_: but all of them are actually going towards the same goal.. this is too simple a problem for the maze being put in place [12:57] fwereade_: I'm personally lost in that maze already.. I don't know why we have InitLog anymore, for example [12:57] fwereade_: I'm sure there's a good reason.. it's just left my consciousness [12:58] fwereade_: I'd appreciate if we could simplify what's there, rather than introducing additional flags that put logging behind yet another yes/no flag [12:59] niemeyer, it's a little bit lost in the fog of time for me too, but I seem to recall feeling that the code was confirming it was a good idea when the pile of imports in supercommand dropped significantly [13:00] niemeyer, I'm all about the simplicity, even if I don't always zero in on it perfectly without help ;) [13:01] fwereade_: Well, we're all guilty of displeasing the simplicity gods frequently :) [13:02] niemeyer, it does seem that logging and subcommand selection are essentially orthogonal problems that are only implemented in the same place because it has hitherto been convenient [13:02] niemeyer, and we're now in a situation where the one does not imply the other, and that therefore they should be broken up somehow [13:03] niemeyer, a whole new type feels a bit heavyweight; a flag and a bunch of ifs (I agree) feel a bit ugly [13:04] fwereade_: I think a new type that handles the part of the problem of a supercommand preparing a subcommand to might actually simplify things [13:04] fwereade_: and make that logic more readalbe [13:05] niemeyer, awesome; then I'll try that direction [13:05] fwereade_: It feels clever, and "saves a type" (?), but at the same time this isn't the first "Wait, what?" moment we reach with that logic [13:05] niemeyer, I was worried you might see it as wanton code raviolation [13:05] fwereade_: It may be time to at least check how that'd look like [13:06] fwereade_: I suspect it will be simple by itself, and will also make the supercommand implementation more obvious [13:06] niemeyer, yeah, I think so too [13:06] fwereade_: We basically need just a wrapper that handle half of what the supercommand does today, if I understand it well [13:06] handles [13:06] niemeyer: that's haproxy hitting it to confirm the service is up or not - is there a different URL we can use for that? [13:06] niemeyer, exactly; thanks for the guidance :) [13:07] mthaddon: Cool.. I'm not too concerned about the traffic by itself [13:07] * fwereade_ is rather proud of "raviolation" [13:07] mthaddon: At least not yet.. it's good that something is exercising a bit of load there [13:07] niemeyer: so in terms of updating the nagios check - you want stats=0 used how? [13:07] mthaddon: I just find unfortunate that we don't *actually* had 8000+ people interested in Jenkins in the last hour! ;-) [13:08] heh [13:08] mthaddon: This is an http query parameter [13:08] niemeyer: and still for the /charm/oneiric/jenkins URL? [13:08] mthaddon: stats=0 will disable the stats collection [13:09] mthaddon: Yeah [13:09] niemeyer: and we should update haproxy to use the same? [13:09] mthaddon: Yeah, please [13:09] 8.3k and counting, literally [13:09] k, thx [13:10] mthaddon: Then, let's clean up stats to avoid the bogus spike please [13:13] ok, I'll let you know when that's done and you can talk me through cleaning up the stats [13:20] TheMue: Any other branches for review? [13:21] niemeyer: I'm doing the one with the NeedsUpgradeWatcher. [13:22] niemeyer: You mentioned to refactor the test for the ConfigWatcher too. Here I have to think a bit. The ConfigNode has no public fields, only access methods. [13:23] niemeyer: So I'm thinking how I best can test it via tables. [13:25] TheMue: Thanks, no worries.. just wanted to make sure I wasn't missing something [13:25] niemeyer: Not yet, I'll notify you. [13:26] TheMue: i just got a test failure: http://paste.ubuntu.com/936882/ [13:27] wrtp: Oh, strange, here it has been ok (from inside the editor and from commandline just before submit). [13:28] TheMue: i'll try pulling and testing again [13:29] TheMue: i tried again (without pulling) and it succeeded. [13:29] wrtp: Uuuh, even more strange. [13:30] TheMue: it looks to me like that code is racy [13:30] oh, maybe not [13:30] wrtp: In method I use no goroutine [13:30] i wonder what value got received. [13:30] TheMue: WatchConfig starts a goroutine, no? [13:31] wrtp: Yes. I meant inside the test to start changing values later. [13:31] TheMue: i was thinking that we don't know exactly when the channel is closed. but it doesn't (shouldn't?) matter in fact. [13:33] TheMue: slightly concerning, but it seems to pass consistently now. [13:33] wrtp: Needs to get warm. *smile* [13:34] wrtp: I'm currently in that test file and refactor the tests. So please wait a moment if possible, than I can check it too. [13:36] wrtp: For tests you could also take and print the received change. There shouldn't been any one after the error. So I think the value will be nil, but the channel is still open. [13:36] TheMue: yes, the test could print the received value [13:36] wrtp: The 'illegal content error' to be more special. [13:40] wrtp: Otherwise with the race condition it's hard to say "Hey, we know the channel will close soon, but not exactly when.". *sigh* [13:40] TheMue: i don't think there's a race condition [13:40] TheMue: if a value is received, there's an error [13:41] TheMue: (because the watcher has sent a value when it should've) [13:44] wrtp: I've got an idea, but I'm not yet sure. I'll take a look during test refactoring. [13:59] mthaddon: Just run a quick test with the store, btw.. [13:59] mthaddon: Can you please mail me logs when you have a moment? [14:00] niemeyer: which logs are you interested in? there's very little in charmd.log [14:01] mthaddon: That was the one.. if it has little, that's good [14:02] niemeyer: nothing in there since March [14:02] mthaddon: Super [14:04] mthaddon: Hmm.. can you do me a quick favor? [14:05] mthaddon: How many processors do we have in the frontend servers? [14:05] niemeyer: each one is 8 cores [14:05] mthaddon: What else run in them? [14:05] niemeyer: they both run the full stack, apache/haproxy/appserver/mongodb [14:06] mthaddon: Super [14:06] mthaddon: Can you please export an env var [14:06] mthaddon: For charmd, specifically [14:06] mthaddon: export GOMAXPROCS=4 [14:09] niemeyer: like this? https://pastebin.canonical.com/64567/ [14:10] mthaddon: Yeah, thanks [14:10] mthaddon: I mean, I think.. [14:11] mthaddon: Not versed in the syntax there [14:11] sure, I'll test it first, just checking that looks right in the upstarts script really [14:11] mthaddon: Yeah [14:12] mthaddon: This will allow the runtime to spread the goroutine across more cores [14:12] goroutines [14:13] k [14:24] mthaddon: Any luck there? Just want to run another test before we reset the data [14:25] niemeyer: just filtering through our puppet infrastructure atm - will let you know when it's been applied (currently pending a review from another member of IS) [14:25] mthaddon: Wow, cool [14:26] but it "dry-ran" clean, so shouldn't be too long [14:55] mthaddon: I'm heading out for lunch.. I'd appreciate if we could deploy those changes and then do the cleanup today still so that we can start collecting stats for real [14:56] niemeyer: sure [15:26] niemeyer: https://codereview.appspot.com/6050053/ is in. [15:38] just gonna reboot. [15:55] niemeyer: changes all applied - let me know what needs to be done to clean up the stats [16:03] niemeyer: you said "the ssh identity is configurable" but it's not configurable from environments.yaml. should it be? [16:09] gn all, might be back a bit later [16:20] mthaddon: Can start with: use juju; db.stat.counters.count() [16:20] fwereade_: Night! [16:23] niemeyer: 288 [16:23] mthaddon: Nice [16:23] Well below the 40k+ actual counts made [16:24] mthaddon: So we just have to clean that up [16:24] db.stat.counters.remove() [16:24] ok, done [16:24] mthaddon: Thanks! [16:24] that's it? [16:24] mthaddon: We're good then [16:24] mthaddon: Yeah [16:24] sweet, that was easy [16:24] mthaddon: Wait [16:25] mthaddon: Almost.. there's something wrong.. [16:25] mthaddon: Something is still poking at the URL without the stats=0 setting [16:25] hmm, I wonder if haproxy strips the query string... [16:26] mthaddon: It souldn't, at least [16:26] mthaddon: 200+ since reset already [16:26] must be haproxy then - I'll take a look - may need to quote the URL or something [16:28] mthaddon: Is the 0 after = a zero? [16:28] mthaddon: I know it's obvious, sorry, but just to be sure [16:28] yeah, is definitely a zero [16:30] so, off for today, bank appointment. have a nice evening. [16:30] niemeyer: how are you checking the stats, just so I can confirm when I try a few possible fixes? [16:30] TheMue: Cheers [16:30] mthaddon: Sent [16:30] thx [16:31] mthaddon: Meanwhile, I'm checking to see if there's a bug in the code.. I had tests, but maybe the tests are broken [16:31] niemeyer: you've seen, next proposal is in and tomorrow I'll continue. [16:31] k [16:31] TheMue: Sounds great, thanks [16:32] niemeyer: np, and have fun at diving [16:33] TheMue: Cheers man.. will try to provide you some good feedback so you can move on tomorrow [16:33] niemeyer: thx, bye [16:33] hmm, according to http://code.google.com/p/haproxy-docs/wiki/httpchk, query strings are permitted, and they don't say anything about needing to quote them [16:35] mthaddon: I believe there's a bug in the code.. the test is probably not good enough [16:35] ah, okay [16:35] mthaddon: Will fix it and ping you [16:35] mthaddon: (even if you're not around) [16:36] niemeyer: cool cool - will pick up the ping in the morning if it's past my EOD and can do a rollout whenever [16:36] mthaddon: Sounds great.. please repeat the remove after the deployment [16:37] yep yep [16:37] mthaddon: So we can get meaningful numbers from then on already [16:37] * mthaddon nods [16:37] mthaddon: Thanks for your help today [16:37] np [16:38] Hah.. found the test bug.. the URL being pinged is wrong, which means we don't need stats=0 to not have counters.. :( [16:39] * niemeyer fixes it [17:14] [LOG] 38.27388 JUJU state: ssh error (will retry: true): ssh: Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-17-virtual i686) [17:14] ha ha [17:15] rog: :-) [17:17] niemeyer: i can't see how the python version stops the ssh session from going into interactive mode [17:18] rog: That's what one of the flags do [17:18] niemeyer: i'm using all the same flags, i think [17:20] niemeyer: for the record: "ssh" "-T" "-o" "StrictHostKeyChecking no" "-o" "PasswordAuthentication no" "-L" "localhost:37722:localhost:2181" "-p" "22" "ubuntu@ec2-174-129-123-190.compute-1.amazonaws.com" [17:20] rog: -N [17:21] niemeyer: ah, thanks. the python version doesn't seem to use that flag though. strange. [17:22] rog: Maybe it's doing the unthinkable.. :-) [17:22] niemeyer: maybe it's ignoring stdout. [17:23] rog: Right, maybe it *is* sitting idle with a shell.. which would be quite funny [17:23] niemeyer: that would probably explaining it. [17:23] niemeyer: yes, i think that's what must be happening. lol. [17:27] niemeyer: funny, adding that option triggered a "Permissions 0664 for '/home/rog/src/go/src/launchpad.net/juju/go/state/sshtest/id_rsa' are too open." error. ssh *is* baroque. [17:28] rog: Woah [17:28] rog: Vhat geev [17:29] * rog hopes that bzr respects permissions. [17:29] rog: Uh oh.. it doesn't [17:29] rog: +x only [17:29] guess i'll add a chmod then [17:29] rog: Yeah, the charm tests do something like that already [17:29] rog: on init(), IIRC [17:31] rog: Would you mind to have a quick look at this: https://codereview.appspot.com/6072046 [17:32] sure. [17:32] niemeyer: BTW all tests just passed, using the ssh connect. [17:32] rog: Woohay [17:32] which is highly satisfactory! [17:32] rog: I can imagine [17:32] I'm excited myself [17:33] This is a critical roadblock [17:33] rog: Btw, I'll reduce the 5 back to 2.. that wasn't the issue [17:34] niemeyer: ECONTEXT [17:34] rog: The CL [17:34] oh yeah, i see [17:34] rog: I've changed the timing there to attempt to cause an error, but I'll put that back [17:34] :3 lol [17:34] rog: Seriously.. [17:35] sorry i missed that! [17:35] rog: The best compilers wouldn't catch that.. [17:35] rog: Sorry that I wrote it! [17:35] :-) [17:35] that's why we have reviewers. i failed! [17:36] rog: What's funny is that this is *exactly* a case where there was a non-obvious bug on the other side [17:36] niemeyer: i think the basic problem was that the test was testing for the *absence* of something. [17:36] A pathetic test when we most need it [17:36] niemeyer: (which it still does AFAICS) [17:36] rog: True [17:36] rog: yeah, but it works now [17:36] lol [17:36] until something changes [17:37] rog: Well.. there's nothing we can do about *that* :-) [17:38] niemeyer: the test should really try an action that does something, test that the something succeeded, *and* check that no stats were taken [17:38] rog: That's exactly what it does right now [17:38] rog: Well, it's not testing that it succeeded [17:38] rog: That's a good idea [17:38] niemeyer: yeah [17:39] niemeyer: how come the GET of /charms/xxx was succeeding, BTW? [17:39] rog: It wasn't.. [17:40] oh yeah, that was just the NewRequest error check [17:40] so... that's my feedback [17:40] rog: Cheers.. already on the way [17:42] rog: Updated [17:45] niemeyer: LGTM [17:45] rog: Cheers [17:45] rog: Was a good hint, thanks [17:45] niemeyer: pleasure [17:53] mthaddon: Code is good to go on 122.. [17:53] Taking a short break === rog is now known as Guest27265 === Guest27265 is now known as rogpeppe [18:00] niemeyer: https://codereview.appspot.com/6074044/ [18:52] rogpeppe: Cool, I'm sitting down on them now [18:52] niemeyer: lovely, thanks [18:53] rogpeppe: re. the "just in case of what", that was to close the channel [18:54] rogpeppe: errorc [18:57] niemeyer: why would closing it be a good thing? [18:59] rogpeppe: You're right, looking at how it's used there isn't a good reason indeed [18:59] rogpeppe: Should -N be added to the list of args? [19:00] it is... i thought! [19:00] oh, of course, that's in the next CL [19:00] can we leave it until the next CL? [19:00] it doesn't affect the tests [19:00] niemeyer: ^ [19:01] rogpeppe: Yeah, that's fine [19:20] rogpeppe: Going through the testing right now, btw [19:20] niemeyer: cool. [19:32] rogpeppe: Btw, looking at the issue people are having with the stock ssh package, our decision seems to be paying off [19:32] niemeyer: yeah. [19:32] niemeyer: plus, i hope we can move away from ssh entirely before long [19:33] rogpeppe: Indeed [19:33] niemeyer: i'll be very happy when i delete this code :-) [19:33] rogpeppe: Me too :-) [19:35] rogpeppe: At bed last night I had that crazy thought that replacing zookeeper with mongodb might not be so hard and may make certain things easier on the transition to an HTTPS API [19:35] niemeyer: does mongodb do non-polling notifications? [19:35] rogpeppe: Yeah, there's a little used trick for that, with capped collections [19:36] rogpeppe: One can tail a capped collection, basically [19:36] niemeyer: have you got a reference? [19:36] i'm not at all familiar with mongodb [19:36] rogpeppe: http://godoc.labix.org/mgo#Query.Tail [19:36] I hope that exists.. /me clicks :) [19:36] Yeah, it does [19:37] hmm, sounds a bit like a hack to me [19:37] rogpeppe: Wow, that was quick :) [19:37] niemeyer: seems like we'd be building functionality on top of a feature that really wasn't designed to be used that way [19:38] rogpeppe: I don't know what you mean.. tail was used exactly for that use case [19:38] s/used/created/ [19:38] "Capped collections are not shard-able". is that a problem? [19:39] niemeyer: perhaps it doesn't mean what i think it means [19:39] rogpeppe: No.. we'd definitely not use shards [19:39] rogpeppe: shard != replica set [19:39] niemeyer: ah [19:40] niemeyer: so you could have capped collection with a size of 1 and it would be roughly equivalent to a zk watch? [19:40] rogpeppe: We'd probably use something like 4 or 8k collections, and watch on specific patterns [19:41] rogpeppe: I *think* I can recreate exactly the same API we have in the zookeeper package on top of mongodb with some ease [19:41] niemeyer: but... what would we gain? [19:41] rogpeppe: Quite a few things [19:41] - Non-RAM storage.. means we can drop the use of S3 for several things [19:42] niemeyer: that would indeed be nice. [19:42] - SSL and authentication [19:42] - Automatic join and departure of cluster members (zk doesn't do that!) [19:42] niemeyer: that would be irrelevant if we moved to an HTTPS API [19:43] rogpeppe: Not entirely.. we can move to HTTPS step by step.. [19:43] rogpeppe: We could switch over to mongodb first, without an HTTPS api, just by reimplemneting the zk package [19:43] rogpeppe: This is an easy switch over.. no changes to the state package [19:44] niemeyer: so you think you can create a "mongo-zk" package that looks almost exactly like the zk package? [19:44] rogpeppe: Right.. at least almost-exactly-enough to our purposes [19:44] niemeyer: that would be quite a cool thing [19:44] rogpeppe: So very low risk [19:45] Another one: [19:45] - Look ma', no Java! [19:45] :) [19:45] niemeyer: now yer talkin'! [19:45] niemeyer: is your mongodb client pure go? [19:45] rogpeppe: Yeah [19:45] niemeyer: another benefit then: no cgo! [19:46] Yep! [19:46] Although that hasn't been much of an issue in practice [19:46] (Java is) [19:46] niemeyer: so... you'd use a collection for each zk node? [19:47] rogpeppe: No.. a single collection to the whole FS structure, indexed on path [19:47] niemeyer: ah, the mgo query defines the watch [19:47] rogpeppe: Right! [19:48] rogpeppe: and the watched collection would be a separate one [19:48] rogpeppe: fs.data and fs.events for example [19:49] niemeyer: right [19:49] niemeyer: how do you choose a good cap size? [19:49] niemeyer: for fs.events, that is [19:50] rogpeppe: In practice, given what watches means for us today, pretty much anything non-absurdly small would do [19:50] rogpeppe: They must be large enough for the application to "breath" and still not miss events [19:52] niemeyer: yeah, because historical events don't matter of course. [19:52] rogpeppe: We could pick something like 1M for example, and be pretty sure to never worry except perhaps on absurdly large cases [19:52] rogpeppe: Right [19:57] rogpeppe: Funny enough, we might actually avoid the polling we're introducing in the presence package with this mechanism [19:57] rogpeppe: Since the watching is less ephemeral than with zk proper [19:58] niemeyer: i don't *think* so [19:58] niemeyer: because we want to know when a client goes away [19:58] niemeyer: so we *want* some kind of ephemerality [19:58] niemeyer: which is what the presence package gives us [19:59] rogpeppe: Yeah, indeed.. we still need to update the note [19:59] nod [19:59] node [19:59] niemeyer: but at least we *have* moved away from dependence on ephemeral nodes [19:59] niemeyer: which is something you couldn't easily emulate, i think. [20:00] niemeyer: so your event collection is just a set of (path, operation) tuples, right? [20:00] niemeyer: and Tail is guaranteed to return events in order? [20:01] rogpeppe: Yeah [20:01] (to both) [20:02] rogpeppe: capped collections + tail was implemented to support the operation log of MongoDB itself [20:02] rogpeppe: Replication is based on that [20:02] rogpeppe: So we can be pretty comfortable that they'll pay attention to the feature working well [20:02] niemeyer: of course, another plus point - we can use mongodb to implement juju logging [20:03] rogpeppe: True [20:03] rogpeppe: Not only juju, actually.. we might stream log for all the machines [20:03] niemeyer: definitely. i've been wanting to do that. [20:04] Although, we should be careful with that [20:04] niemeyer: indeed - we could swamp the network [20:04] niemeyer: but for debugging purposes, i think it's crucial [20:04] rogpeppe: I wouldn't say the network, but the mongodb master for sure [20:04] niemeyer: ah [20:06] i'd be sorely tempted to streamline the "zk" API when redoing it [20:08] rogpeppe: Well, that's trivial to do after we're done migrating [20:08] rogpeppe: I mean, I'd be happy to have it improved too [20:09] rogpeppe: But would try to mimic it precisely first [20:09] rogpeppe: So we can play with the idea in a risk less fashion until we're certain we want to flip [20:09] rogpeppe: After flipping, we have no great attachment to it [20:09] rogpeppe: Review delivered! [20:10] niemeyer: yeah, you're right [20:10] niemeyer: brilliant, thanks! [20:14] niemeyer: i expected more nasties there. thanks a lot. will fix tests in the morning. time to go now. [20:14] niemeyer: i expect to see a working version of mongo-zk by then :-) [20:15] rogpeppe: Haha :) [20:16] rogpeppe: Given the problem, I quite like the approach [20:17] rogpeppe: I'd prefer to not have any of it, as you would [20:17] rogpeppe: Let's just try to make that stuff solid meanwhile, and as clean as possible [20:17] rogpeppe: In that sense, your branch feels like a win. Thanks.