/srv/irclogs.ubuntu.com/2012/02/24/#juju-dev.txt

TheMue	re	08:25
niemeyer	Good morning all!	09:12
TheMue	niemeyer: hiya	09:18
fwereade__	heya niemeyer, you're on early :)	09:30
niemeyer	fwereade__: Yeah, a bit :)	09:30
* rog finally makes his network connection work again		09:46
niemeyer	rog: Welcome back	09:47
rog	niemeyer: hi!	09:48
rog	niemeyer: you're up earlier than usual...	09:49
TheMue	rog: hiya	09:49
rog	TheMue: morning!	09:49
niemeyer	rog: Yeah, a bit	09:50
rog	niemeyer: any chance you could have another look at those CLs? i'm hoping to get something pushed this week.	09:50
niemeyer	rog: Not right now	09:52
rog	niemeyer: k	09:52
fwereade__	niemeyer, btw, there was a conversation the other day I think you missed about gozk panicing in C when a Conn is closed (and something else is busy in C code)	09:55
niemeyer	fwereade__: Ugh, that's not good	09:55
fwereade__	niemeyer, would fixing this just be a matter of a RWMutex on handle accesses, or is it more subtle?	09:55
niemeyer	fwereade__: Well.. I'd need some more information about the crash	09:55
fwereade__	niemeyer, I've seen it in a couple of cases, but the common feature is that some goroutine is always somewhere like: launchpad.net/gozk/zookeeper._C2func_zoo_wexists(0x36edf80, 0x36ef380)	09:57
niemeyer	fwereade__: Ok, do you have a full traceback?	09:57
fwereade__	niemeyer, here's the one I just saw	09:58
rog	niemeyer: i had a brief look at this - it just happens when zk derefs the nil handle, because handle access isn't mutexed, as fwereade__ suggestes	09:58
fwereade__	niemeyer, http://paste.ubuntu.com/855178/	09:58
rog	suggests	09:58
fwereade__	niemeyer, the set line shouldn't be the problem, because that conn is not closed	09:59
niemeyer	fwereade__: Is this looping while you're calling Close on the zk Conn?	10:01
fwereade__	niemeyer, yes	10:01
niemeyer	fwereade__: That's going back to the point we talked about	10:02
niemeyer	fwereade__: We shouldn't do that	10:02
niemeyer	fwereade__: We shouldn't ever close a connection behind the back of logic that we're managing ourselves	10:02
fwereade__	niemeyer, ok, maybe I'm missing something	10:03
niemeyer	fwereade__: That's true for tests, and also for real logic	10:03
niemeyer	fwereade__: We can even implement locking to prevent the nil pointer reference at some point, but that's not the most critical issue	10:04
niemeyer	fwereade__: Ok, but do you recall that conversation?	10:04
fwereade__	niemeyer, ok, in that case I guess the tests I was trying to write are not actually relevant, because I'm really trying to test deal-with-session-event handling that doesn't exist yet	10:04
niemeyer	Feb 22 12:09:11 <niemeyer> fwereade_: We should never, ever, ever, leave background logic unattended while we're messing around with the system state in disruptive ways	10:05
niemeyer	fwereade__: Hmm, can you expand on that?	10:05
fwereade__	niemeyer, yes, I do; the reason I saw this one just now is because all I thought I had active on that conn was a watch	10:05
fwereade__	niemeyer, I was trying to verify that the watch channel got closed when the underlying ZK connection got yanked out from underneath	10:06
niemeyer	fwereade__: Ok, that's an interesting case.. maybe we have something to fix then	10:06
niemeyer	fwereade__: Can you please paste waitFor?	10:06
fwereade__	niemeyer, http://paste.ubuntu.com/855189/	10:07
niemeyer	fwereade__: Sorry, I thought that was where the loop was.. TestDisconnectAliveWatch then, I guess	10:09
niemeyer	Ah, this is the guy blowing up: launchpad.net/gozk/zookeeper._C2func_zoo_set2(0x0, 0x36ef360)	10:10
fwereade__	niemeyer, yep	10:10
niemeyer	fwereade__: Please paste the test and (*Pinger).run	10:10
fwereade__	niemeyer, before I paste this: I know the stuff in the goroutine comment is wrong :p	10:10
fwereade__	niemeyer,	10:10
fwereade__	http://paste.ubuntu.com/855193/	10:10
niemeyer	fwereade__: Cool :)	10:11
fwereade__	niemeyer, http://paste.ubuntu.com/855195/	10:11
fwereade__	niemeyer, wait, you're right, it is set and not exists, bugger, I'm at least partially on crack (in a way that I was not hitherto aware of)	10:13
niemeyer	fwereade__: You know it's wrong in which way?	10:13
fwereade__	niemeyer, "this should be equivalent"	10:13
fwereade__	niemeyer, ie, what we should actually be doing is not just closing the conn until we've dealt with everything	10:16
fwereade__	niemeyer, but if it is indeed the set that's blowing up, anmd it looks like it is, then I'm completely confused because that's using a different conn to the one that gets closed	10:17
niemeyer	fwereade__: Not really	10:17
niemeyer	fwereade__: Oh, wait.. yes	10:17
fwereade__	niemeyer, so I would appear to be crackful in an entirely fascinating and original way :/	10:18
niemeyer	fwereade__: But!	10:18
niemeyer	fwereade__: Please paste Close	10:18
niemeyer	fwereade__: I bet you have a race	10:18
fwereade__	niemeyer, oh balls, I try to hit the target node again	10:18
fwereade__	niemeyer, (to give clients as long a time as possible before they see a timeout)	10:19
fwereade__	niemeyer, (naively trusting that if the connection is borked it'll just error and I can ignore it)	10:19
niemeyer	fwereade__: That's fine	10:19
niemeyer	fwereade__: But a race is not fine.. please paste Close	10:20
TheMue	Interesting, http://cocode.io/beta/ will provide collaborative coding. It's written in Go knows Go as a highlighted lang. Sadly yet only GitHub, no Bazaar (or others).	10:21
niemeyer	<niemeyer> fwereade_: That's fine	10:22
niemeyer	<niemeyer> fwereade_: But a race is not fine.. please paste Close	10:22
niemeyer	fwereade_, fwereade__: Are you having wifi issues too?	10:23
niemeyer	fwereade_, fwereade__, fwereade: Are you having wifi issues too? :-)	10:23
fwereade	niemeyer, seems so :p	10:23
fwereade	niemeyer, http://paste.ubuntu.com/855206/	10:24
fwereade	niemeyer, if you didn't see above	10:24
niemeyer	fwereade_, fwereade__, fwereade: I always knew you were not a single person..	10:24
niemeyer	fwereade: Yeah, you have a race indeed	10:24
niemeyer	fwereade: Can you please have a look at this post: http://blog.labix.org/2011/10/09/death-of-goroutines-under-control	10:25
fwereade	niemeyer, I'm a globally distributed collection of outsourced drones, but don't tell anyone	10:25
niemeyer	fwereade: I'll highlight a few things afterwards, if they're not obvious	10:25
fwereade	niemeyer, cool, thanks	10:25
* fwereade reads		10:25
rog	niemeyer: where's the race?	10:27
fwereade	niemeyer, I remember reading that when you wrote it, and totally forgot about it until now	10:28
fwereade	niemeyer, it seems clear, but I'd appreciate the highlighting all the same	10:28
rog	oh yeah, in TestDisconnectAliveWatch. i was looking in the pinger methods	10:31
fwereade_	grar	10:36
niemeyer	fwereade_: Are you here? :)	10:37
fwereade_	niemeyer, er, probably	10:38
niemeyer	fwereade_: Ok :)	10:38
niemeyer	fwereade_: So, quicklY! :-)	10:38
niemeyer	fwereade_: Observe the way Stop is implemented in the blog post	10:38
niemeyer	fwereade_: It sends a fatal error, and waits	10:39
TheMue	niemeyer: next round of https://codereview.appspot.com/5671055/	10:39
niemeyer	TheMue: Thanks, I'll have a look at that next to rog's	10:39
TheMue	niemeyer: yep, thx	10:39
niemeyer	fwereade_: You're sending a stop signal, and ignoring the fact that, naturally, stopping isn't instantaneous	10:40
fwereade_	niemeyer, yeah; I had a "closed" channel I was waiting on originally, but it was pointed out that since closing was unbuffered I wasn't getting any benefit from waiting on closed after sending to closing	10:40
niemeyer	fwereade_: Uh.. this is bogus	10:41
niemeyer	fwereade_: Oh, or maybe not.. let me see again	10:41
rog	niemeyer: it looked ok to me :-)	10:41
fwereade_	niemeyer, I'm not saying there weren't subtle bugs in what I was doing, or in what I am doing ;)	10:41
rog	niemeyer: i hope it's not bogus otherwise my understanding of channels is fundamentally flawed...	10:42
niemeyer	rog: No, you're right.. as long it's unbuffered, that's fine	10:43
rog	phew	10:43
niemeyer	fwereade_: The race is in the test itself	10:44
rog	yup	10:44
niemeyer	fwereade_: No, sorry.. crack again..	10:44
niemeyer	altConn is in a different end	10:45
niemeyer	Hmm	10:45
rog	niemeyer: isn't is a race that waitFor(c, altConn, path) is called concurrently with altConn.Close() ?	10:46
rog	(not that i've looked at waitFor)	10:46
niemeyer	rog: No.. the pinger is blowing up with a nil handle	10:47
niemeyer	Despite anything else, there's a race here in that the handle is being cleaned up underneath its feet	10:48
rog	niemeyer: that still looks like a race to me	10:48
niemeyer	rog: :-)	10:48
rog	niemeyer: even if it's not the thing that's blowing up	10:48
niemeyer	rog: If the logic in the test is sane, the Close won't happen before the kill	10:49
niemeyer	rog: if..	10:50
rog	niemeyer: oh yes, i hadn't seen connect - i was assuming that the session event was the connection event, not the teardown event.	10:51
rog	but presumably connect waits for the first session event	10:51
fwereade_	rog, yes	10:51
niemeyer	rog: That's what I'm assuming too	10:51
rog	fwereade_: how reproducible is this?	10:56
fwereade_	rog, not very :(	10:56
fwereade_	rog, it'll happen eventually	10:56
rog	fwereade_, niemeyer: AliveW is busy on altConn at the same time that altConn is closed, right?	11:00
rog	so that looks like a race to me	11:00
fwereade_	rog, yes; so that Close could happen at any time, and that's definitely a problem	11:01
rog	i.e. the server gets killed, the goroutine receives the session event and closes altConn, at the same time AliveW happens to do a Change	11:01
fwereade_	rog, there are similar tests that really are only watching when the conn gets closed, and they're fine; yes	11:02
niemeyer	rog: If that's the case, then the connection is being Closed before kill()	11:02
rog	niemeyer: i don't think so	11:02
rog	niemeyer: the kill happens, then the close happens, then AliveW does something	11:03
niemeyer	rog: AliveW is run before kill..	11:03
rog	niemeyer: AliveW has a goroutine	11:03
fwereade_	niemeyer, I guess it's not impossible that there's another session event confusing me	11:03
niemeyer	rog: OH, does it?	11:03
niemeyer	Why?	11:03
niemeyer	To transform the event, I guess	11:03
rog	niemeyer: yup	11:03
niemeyer	But it should be listening only	11:03
niemeyer	On a channel.. that's not supposed to blow up	11:03
niemeyer	fwereade_: Can you please paste the whole file, or just push the branch?	11:04
rog	yeah	11:04
fwereade_	niemeyer, sure	11:04
fwereade_	niemeyer, ~fwereade/juju/go-presence-nodes	11:04
fwereade_	niemeyer, when we're watching for aliveness we can't just dumbly watch a channel	11:05
fwereade_	niemeyer, we need to keep rewatching when we get changes before timeouts	11:05
niemeyer	fwereade_: Ho ho ho	11:05
fwereade_	niemeyer, technically, above, when we're watching for deadness	11:05
fwereade_	niemeyer, have I misunderstood something fundamental?	11:06
niemeyer	fwereade_: I'm just curious about the rewatching.. that may explain it	11:06
fwereade_	niemeyer, (also we may need to rewatch when waiting for liveness too, it's a bit of an edge case, but...)	11:06
niemeyer	fwereade_: Yeah, there's a race there..	11:07
niemeyer	fwereade_: You're firing goroutines that use zk in a way that closing the connection will ignore completely	11:07
fwereade_	niemeyer, huh, I thought that closing the conn would close all the watches, and I'd see that	11:08
rog	niemeyer: i don't think there's a way to do it without either having a way to Close an AliveW or by making zk.Close concurrent-safe	11:08
rog	niemeyer: personally, i think i'd opt for the latter	11:08
rog	niemeyer: (after all, all the other routines are ok to call concurrently)	11:08
rog	s/routines/zk methods/	11:09
niemeyer	rog: Yes, there may be something to change in gozk, but there's something to be handled in presence itself that I'm trying to understand	11:10
rog	niemeyer: presence seems ok to me, but i'm probably missing something	11:11
niemeyer	fwereade_: Why do we need awaitDead and awaitAlive to loop?	11:11
fwereade_	niemeyer, awaitDead to refresh the data watch until I get a timeout without the data watch having fired	11:12
niemeyer	fwereade_: Ah, right.. it's the opposite.. it fires when it doesn't get a watch	11:12
niemeyer	Hmm	11:12
fwereade_	niemeyer, awaitAlive to handle the case where a known-unresponsive node is deleted -- it's still dead but not in the same way and IMO I shouldn't really alert the client to say "yeah, nothing changed"	11:12
niemeyer	Well, looks like we need our Watcher type back..	11:14
rog	niemeyer: if zk Close was concurrent-safe, everything would be fine	11:15
niemeyer	So that we can stop this logic in a sane way	11:15
rog	niemeyer: just like closing a net.Conn	11:15
rog	niemeyer: then the logic just stops as a matter of course	11:15
niemeyer	rog: What do you mean by concurrency safe?	11:16
rog	niemeyer: i mean that you could Close a zk conn concurrently with executing some other operation on the conn	11:16
fwereade_	niemeyer, it feels like one of the advantages of gozk is that the watches can just get auto-closed, and it would be really nice if we could follow the same convention here	11:16
fwereade_	rog, nitpick, technically it's all the other methods that aren't concurrent-safe ;)	11:16
niemeyer	rog: That's not enough	11:17
rog	fwereade_: they're concurrent-safe with each other...	11:17
rog	niemeyer: no?	11:17
niemeyer	fwereade_: Exactly	11:17
niemeyer	rog: What fwereade_ said	11:17
rog	niemeyer: i don't understand. wouldn't a mutex around the handle access be acceptable?	11:17
fwereade_	niemeyer, it still seems to me that a RWMutex on handle around C calls would be enough; what am I missing?	11:18
rog	niemeyer: (every method would have to respect it, of course)	11:18
niemeyer	rog: This is something else, not what you originally said	11:18
rog	niemeyer: that's what is necessary to make it ok to call Close concurrently, AFAICS	11:18
rog	niemeyer: which is what i was suggesting	11:18
niemeyer	rog: You're still missing the point..	11:18
fwereade_	niemeyer, so am I I think	11:19
niemeyer	rog: It's not just about concurrency	11:19
* rog usually does		11:19
niemeyer	rog: Close(); Set(foo).. BOOM!	11:19
rog	niemeyer: no.	11:19
rog	niemeyer: not if Set checks to see if handle is nil and then returns an error	11:19
niemeyer	rog: Dude.. please make up your mind	11:19
fwereade_	niemeyer, wouldn't we just grab a read lock, check for nil, and error back cleanly?	11:20
rog	fwereade_: exactly	11:20
niemeyer	fwereade_: YES!	11:20
niemeyer	fwereade_: We'd just do that.. that's not what was being said so far!	11:20
fwereade_	niemeyer, ha, I thought it was	11:20
niemeyer	<niemeyer> rog: What do you mean by concurrency safe?	11:20
niemeyer	<rog> niemeyer: i mean that you could Close a zk conn concurrently with executing some other operation on the conn	11:20
rog	niemeyer: i just said "make Close concurrent-safe". that doesn't imply that Close is the only method that needs to change.	11:20
niemeyer	rog: This is not about concurrency! Serial operations blow up!	11:21
rog	niemeyer: well, that's a different (although related) issue.	11:21
fwereade_	niemeyer, Close might block for a while (or *possibly for ever if enough is going on...) but that seems preferable to unrecoverable panics in C if if a hamfisted amateur like myself is trying to use the library	11:21
niemeyer	rog: Yes, it's the issue we're seeing	11:21
rog	niemeyer: i think we're seeing an issue because of concurrency, no?	11:22
niemeyer	<niemeyer> rog: This is not about concurrency! Serial operations blow up!	11:22
rog	niemeyer: is that the case in this example? i thought we were doing things in two separate goroutines	11:22
niemeyer	<niemeyer> rog: This is not about concurrency! Serial operations blow up!	11:23
niemeyer	rog: You can inline.. Close(); Set(); BOM!	11:23
fwereade_	rog, niemeyer: sorry, lamb chops are hot on the table, back shortly :)	11:23
rog	niemeyer: that's not what's happening in this issue though, right?	11:24
niemeyer	rog: Yes, it is	11:24
rog	niemeyer: in the same goroutine?	11:24
niemeyer	rog: No.. the fact there are two goroutines is completely irrelevant	11:24
rog	niemeyer: to my mind, it's not. if we were in a single goroutine it would be easy to avoid doing an operation after Close.	11:25
rog	niemeyer: it's the concurrency issue that makes it harder	11:25
niemeyer	rog: Sorry, but a "concurrency issue" is something else than what you have in mind	11:26
rog	niemeyer: ok, fair enough	11:26
rog	niemeyer: so...	11:26
niemeyer	rog: You can put a lock across every single function in gozk, and it will still blow up	11:26
rog	niemeyer: only if they don't check that handle is not nil before calling into C, no?	11:26
niemeyer	rog: Yes, and that's not a concurrency issue.. this is about using a connection after it's been closed.	11:27
niemeyer	rog: and make that not panic	11:27
rog	niemeyer: agreed. but just doing that won't fix our problem either.	11:27
rog	niemeyer: because then there would be a race.	11:27
niemeyer	rog: Sure, because you can't trust on the test for the handle to be valid without a lock	11:29
rog	niemeyer: exactly	11:29
niemeyer	fwereade_: So..	11:37
rog	niemeyer: i think a RWMutex would probably do the job just fine	11:37
rog	on zk.Conn, that is	11:37
niemeyer	fwereade_: You're right, we must definitely fix gozk itself so that we protect from that kind of issue	11:38
niemeyer	fwereade_: I'm just still wondering if we should change the interface for that	11:38
niemeyer	fwereade_: I guess it's fine to keep it running it in background, and simply error out, given that this is purely a read operation without any side effects	11:38
rog	niemeyer: that was my thought.	11:40
niemeyer	fwereade_: Btw, I'd prefer to not have the (s state) as a parameter to the await functions	11:40
niemeyer	fwereade_: There's actually a bug in that regard in the current logic	11:41
niemeyer	fwereade_: No, there isn't, sorry	11:42
niemeyer	fwereade_: You're passing the parameter as a copy, which is safe	11:42
niemeyer	fwereade_: Still, I don't see why the parameter is necessary	11:42
rog	niemeyer: it saves passing the path and timeout parameters as separate arguments	11:43
rog	niemeyer: but i'm with you	11:44
niemeyer	rog: Yeah, I think the organization there needs some tweaking for clarification	11:47
rog	niemeyer: i wondered if things might be clearer if there was an internal watcher type which held the params that are being passed around.	11:51
niemeyer	rog: I was wondering about that too, but this is makes it seem like there wouldn't be much benefit compared to what is there now:	11:52
niemeyer	s, zkWatch, err = newStateW(conn, s.path)	11:52
rog	niemeyer: s, err := w.getStateW() ?	11:52
niemeyer	rog: The only bit that is actually constant is the path	11:52
rog	yeah	11:53
niemeyer	rog: Right, w is the path..	11:53
niemeyer	rog: Which renders fwereade_ design elegant	11:53
rog	it was the other methods i was really thinking about (waitAlive and waitDead)	11:53
niemeyer	rog: Right.. as methods, they would have to live on a type that is simply the path	11:54
niemeyer	rog: Because timeout and aliveness changes	11:54
rog	niemeyer: that's fine. they can still be fields in the waiter type. it will change when they change.	11:55
rog	niemeyer: (it's not used concurrently)	11:55
niemeyer	rog: There's no point in having them as fields.. it's only used in that one function	11:55
rog	maybe "node" might be a better name for the type	11:55
niemeyer	rog: Yeah, path :)	11:56
niemeyer	rog: type node string, where node is the path	11:56
niemeyer	rog: That won't improve much what fwereade_ has there now, though. The design feels sound on that level	11:57
* rog is thinking about it		11:57
niemeyer	rog: Well.. I guess we could have the connection on it too	11:57
niemeyer	Which would make it more interesting	11:58
rog	niemeyer: yeah	11:58
niemeyer	type node string	11:58
rog	niemeyer: and the watch channel to send to	11:58
niemeyer	func (n node) readState() (alive, timeout, error)	11:58
niemeyer	func (n node) readStateW() (alive, timeout, watch, error)	11:58
rog	func (n node) get() error	11:58
rog	sorry	11:58
rog	func (n *node) get() error	11:58
niemeyer	func (n node) awaitAlive	11:58
niemeyer	func (n node) awaitDead	11:58
rog	func (n *node) getW() error	11:58
niemeyer	rog: What's get?	11:59
niemeyer	rog: get error? :)	11:59
rog	niemeyer: gets the current state of the node. stores it in n.	11:59
rog	niemeyer: getW also gets a wait channel and stores that in n too.	11:59
niemeyer	rog: That's a bad name then.. update() maybe	11:59
rog	sure	11:59
niemeyer	and updateW()	12:01
niemeyer	rog: Yeah, that should improve things indeed, +1	12:01
rog	cool	12:02
* fwereade_ reads back...		12:02
niemeyer	rog, fwereade_: So.. I'd just like to take an eagles view on the whole issue to close it down, if that's ok	12:02
niemeyer	We have one bug, and one design approach we're agreeing that breaks a prior rule	12:02
* rog feels those sharp eyes on him from far far above.		12:02
niemeyer	:)	12:02
niemeyer	The bug is, as we discussed at length, the fact closing and using a connection is a blow up in gozk for sure. We must fix that.	12:03
niemeyer	The design decision is a more subtle one	12:03
niemeyer	The prior goal I was personally attempting to ensure is that we never have background logic under our control running in parallel with other stuff that will break down the concurrently executing logic	12:04
niemeyer	Under that design approach, we'd need a Stop() method on the watcher	12:04
niemeyer	So that we can stop it.. that would avoid the gozk bug too	12:05
niemeyer	But, we're agreeing to not do that, and instead fix gozk, and establish a new condition:	12:05
niemeyer	If the background logic is entirely innocuous, it's fine to allow it to die a cold death	12:05
rog	sounds good to me. as long as the logic will die, and not be left around as garbage	12:07
fwereade_	niemeyer, that sounds sensible to me personally, but then I would say that	12:07
niemeyer	AliveW is one example of that.. by itself, it's doing nothing. If it blows up in the background, there are zero side effects, _as long as the logic that is using the watch does nothing_!!	12:07
fwereade_	niemeyer, this feels consistent with every other ZK watch	12:07
niemeyer	It's _NOT_ ok to have someone waiting on the watch resulting from AliveW, and doing further actions like changing the FS or whatever else, despite the fact the connection has been closed	12:08
rog	and the logic using the watch will just see a close on the channel, which means an error, so hopefully this kind of logic can cascade	12:08
niemeyer	fwereade_: Agreed	12:08
fwereade_	niemeyer, excellent	12:08
rog	fwereade_: agreed	12:08
niemeyer	rog: No, that's exactly what we don't want	12:08
niemeyer	rog: We'd have to reevaluate the use of the watch, if it's also in background	12:09
rog	niemeyer: no? i'm imagining some other innocuous logic layered on top of AliveW	12:09
niemeyer	rog: For it to be alright, it'd have to be innocuous too	12:09
rog	niemeyer: yes, definitely	12:09
niemeyer	rog: Cool, we're in agreement then	12:09
rog	yup	12:09
niemeyer	So, those are the relevant decisions	12:10
niemeyer	Glad we're in agreement, phew :)	12:10
niemeyer	fwereade_: In addition to that, we were nitpicking over the interface around the state type	12:10
niemeyer	fwereade_: Which is not terribly important, but that might be interesting to sort out if you still have the energy	12:11
fwereade_	niemeyer, I saw that and I'm not strongly invested in what I currently have; the fields seemed to go well together, and to stop the function signatures breaking lines, and so I decided it was good :)	12:11
niemeyer	fwereade_: rog was suggesting a node type, which feels nice to clarify the state handling	12:11
niemeyer	fwereade_: Something like this is the status quo of the discussion:	12:12
niemeyer	type node struct { conn, path, timeout, alive }	12:12
niemeyer	func (node) update() error	12:12
niemeyer	func (node) updateW() (watch, error)	12:12
rog	*node	12:13
niemeyer	Sorry, yes	12:13
niemeyer	func (*node) waitAlive()	12:13
niemeyer	func (*node) waitDead()	12:13
niemeyer	(the last two with the necessary return types and parameters)	12:13
fwereade_	niemeyer, LGTM	12:13
fwereade_	thanks guys, productive discussion :-)	12:13
niemeyer	fwereade_: waitAlive and waitDead would be implemented on top of the former two	12:14
niemeyer	fwereade_: Likewise!	12:14
rog	i keep on wondering if waitAlive and waitDead might not be better named as deadWait and liveWait respectively	12:14
rog	i.e. name them after the state they start in rather than the state they're waiting for.	12:14
rog	dunno	12:14
niemeyer	rog: -1.. these are nouns	12:14
niemeyer	we need verbs	12:14
rog	yeah	12:14
rog	i want to call waitDead waitDeath	12:15
rog	but can't think of an appropriate name for the other one	12:15
fwereade_	rog, I considered similar but "waitBirth" wasn't right	12:15
fwereade_	rog, indeed	12:15
niemeyer	waitAlive and waitDead feels great to me	12:15
rog	yeah	12:15
niemeyer	And on that node, I'll do some reviews..	12:16
rog	that's fine. just thought i'd mention it in case others has similar thoughts.	12:16
fwereade_	niemeyer, would you like me to take a look at gozk as well?	12:16
niemeyer	Actually, let me finish answering emails first	12:16
fwereade_	niemeyer, I don't want to overload you :)	12:16
rog	i could do gozk	12:16
niemeyer	fwereade_: Yes, please	12:16
niemeyer	or rog.. whatever works	12:16
rog	or fwereade_ sure	12:16
rog	i'm a bit ahead of myself anyway currently	12:17
* fwereade_ bows out gracefully		12:17
rog	fwereade_: how about you refactor presence (again!) and i'll do zk.	12:18
fwereade_	rog, give me a shout if it becomes a hassle in any way and I can grab it	12:18
fwereade_	rog, but that sounds good to me	12:18
* rog thinks is shouldn't be too much problem.		12:18
rog	s/is/it/	12:18
andrewsmedina	niemeyer: hi	12:27
niemeyer	andrewsmedina: Hey	12:27
rog	fwereade_, niemeyer: ok, zk changes made. now for the harder bit... the testing.	12:35
fwereade_	rog, good luck :)	12:36
* rog thinks of a devious way of forcing a zk request to take a long time.		12:38
andrewsmedina	niemeyer: in the last weekly version Go isn't work with two packages in same project	12:44
andrewsmedina	andrewsmedina: in some of your projects you put a example.go which is a different package.	12:46
fwereade_	rog, just a thought; I have a couple of `switch event.Type`s without defaults; if I get totally unexpected events, it would probably be sensible to `close(watch)`, right?	13:08
rog	fwereade_: you could probably panic	13:08
rog	fwereade_: it's part of the contract that the correct event types should be returned	13:09
rog	fwereade_: if they're not, then something's very broken	13:09
rog	the contract with zk, that is	13:09
fwereade_	rog, in that case I think explicitly panicking may be excessively defensive	13:09
rog	fwereade_: i'd prefer to see a panic rather than a silent failure in that case	13:10
fwereade_	rog, ok, just `panic(event)` then?	13:10
rog	fwereade_: panic(fmt.Errorf("unexpected event %v", event))	13:10
rog	or something like that	13:10
fwereade_	rog, ok then :)	13:11
rog	it's nice to see the thing that caused the panic	13:11
fwereade_	rog, sounds good	13:12
fwereade_	rog, hm, I guess I can lose the forward() stuff once we have a Close-safe gozk	13:16
rog	fwereade_: yeah. that'll be good.	13:16
rog	fwereade_: i could push a version that you could use until i've done the tests, if you like.	13:16
fwereade_	rog, a good thing but also sort of a shame, that stuff made me very happy :)	13:17
fwereade_	rog, I guess that would be useful	13:17
rog	fwereade_: it's still there for when you need it again...	13:17
rog	fwereade_: in fact i think i might be using that technique in the zk tests... (my "devious way")	13:17
fwereade_	rog, indeed, the pleasure of deleting code still outweighs the pleasure I took in the code to begin with	13:18
fwereade_	rog, nice :)	13:18
rog	fwereade_: i'm gonna have a forwarder and stop all incoming traffic. then i know that the reply can't come back quickly...	13:18
rog	fwereade_: in fact i might steal your code directly, since it's still there...	13:20
fwereade_	rog, go for it :)	13:20
TheMue	So, solved a merging prob, 76 tests passed	13:24
rog	TheMue: nice	13:25
TheMue	rog: yeah, it goes on, piece by piece	13:26
rog	TheMue: i bet you're glad you speeded up the tests now :-)	13:26
TheMue	rog: definitely	13:35
TheMue	rog: when zk is cold (first start) about 11 sec, then about 6.3 sec.	13:36
rog	TheMue: how long if you run just one test?	13:37
TheMue	not yet tested	13:37
fwereade_	rog, TheMue, IME live-zookeeper tests have a 5s overhead	13:39
fwereade_	(when hot)	13:39
rog	so about 1.3 seconds to run the tests. that's not bad	13:39
rog	TheMue: try go gotest -gocheck.run patternMatchingOneTest :-)	13:40
TheMue	rog: with a lot i/o	13:40
rog	oh, i think it might be gocheck.f actually	13:40
fwereade_	rog, it is	13:40
rog	fwereade_: i'm going to change that soon - it should be the same as the equivalent test option	13:41
rog	fwereade_: i just haven't pushed the change yet	13:41
rog	well, the merge request	13:41
fwereade_	rog, fine with me; I'll discover it, swear, check the options and be happy :)	13:41
TheMue	rog: a single test is 5.11 vc 6.36 for all tests	13:45
rog	ain't java wonderful?	13:45
TheMue	fwereade_: your guess with 5 sec seems good	13:45
TheMue	rog: haha	13:45
fwereade_	niemeyer, should ExpectFailure (gocheck) possibly output something in test runs? as it is, its "don't forget this test exists" value seems a touch limited	13:52
niemeyer	fwereade_: The purpose is precisely to not forget that there are broken tests without having a failure that we'll learn to ignore	13:54
fwereade_	niemeyer, agree that ignored-by-convention failures are poisonous; I just don't see how it stops us forgetting any better than renaming the methods to DontTest* would	13:56
niemeyer	fwereade_: It mentions that there are expected failures at the end of the run, doesn't it?	13:57
fwereade_	niemeyer, perhaps I'm Doing It Wrong, but not afaics	13:57
fwereade_	william@diz:~/code/go/src/launchpad.net/juju/go$ go test launchpad.net/juju/go/state/presence	13:57
fwereade_	ok launchpad.net/juju/go/state/presence5.890s	13:57
niemeyer	Hmm	13:57
niemeyer	fwereade_: Run it without the package name	13:58
niemeyer	fwereade_: Within the directory	13:58
niemeyer	fwereade_: Or with -v	13:58
niemeyer	fwereade_: go test is eating the output	13:58
fwereade_	niemeyer, no dice either way	13:58
fwereade_	niemeyer, ah, somehow run it without using "go test"?	13:58
niemeyer	No	13:59
niemeyer	fwereade_: There's something wrng indeed	13:59
fwereade_	niemeyer, it's not critical, shall I just open a bug?	14:01
fwereade_	niemeyer, (srry, I feel bad distracting you all the time while the store's still in progress)	14:02
niemeyer	fwereade_: No worries really	14:02
niemeyer	fwereade_: I haven't even managed to get to that yet	14:02
fwereade_	niemeyer, heh, that's a large part of why I feel bad about it ;)	14:03
niemeyer	fwereade_: Yes, please file a bug.. it's supposed to print it	14:03
niemeyer	Hey cliff-hm	14:25
niemeyer	cliff-hm: How're things going in the RedHat side of the world?	14:25
cliff-hm	Pretty OK thanks. :)	14:27
cliff-hm	Just trying to understand and play with Juju a bit to see how it fits itself together.	14:28
niemeyer	cliff-hm: Nice, please let us know what you think	14:28
cliff-hm	(I prefer not to hide away) :)	14:29
cliff-hm	Sure	14:31
cliff-hm	I may try to do that contest, not sure yet.	14:32
cliff-hm	(In my spare, non-company time)	14:32
niemeyer	cliff-hm: That'd be great	14:38
niemeyer	cliff-hm: When can we expect juju to be integrated into RedHat's products? :)	14:38
niemeyer	rog: Can you please sort out this branch so we can take it out of the review list: https://code.launchpad.net/~rogpeppe/juju/fix-tutorial-with-expose-2/+merge/75379	14:43
rog	niemeyer: jeeze, i thought i did that months ago :-)	14:43
niemeyer	rog: and this one too: https://codereview.appspot.com/5674051/	14:44
rog	niemeyer: TheMue wanted to do something different there, so i was thinking of abandoning it	14:45
niemeyer	rog: That's a fine solution, but we need to reject it so that that it goes out of the review list	14:45
rog	niemeyer: ok. i'll reject it.	14:46
niemeyer	rog: Thanks	14:46
cliff-hm	niemeyer, I don't know ;) I'm doing this on personal time ... out of my own interests.	14:46
niemeyer	cliff-hm: Ah, understood. I thought it was related to you being a managed of the Spacewalk/Satellite/RHN/etc team	14:48
niemeyer	cliff-hm: Please drop us a note if you have any suggestions then	14:49
cliff-hm	My interests stem from that background, yes. But I'm not under company orders to spy and look at what others do :) My day job is too busy for those tasks.	14:50
cliff-hm	And yeap. I will try to give feedback. Though likely to have more questions :)	14:53
cliff-hm	(been reading so far, hoping tonight and weekend to stay hands-on)	14:53
niemeyer	cliff-hm: Sweet	14:54
niemeyer	cliff-hm: I have good friends at RH, btw..	14:54
rog	niemeyer: both rejected - the earlier was submitted in some other form sometime.	14:54
niemeyer	cliff-hm: Probably not very close to you, though	14:55
cliff-hm	we have a lot of employees now. I know a bunch of people in my time here, but there is a lot more people that I do not know, than know.	14:58
niemeyer	cliff-hm: Can imagine.. most of the people I know are around kernel development, I believe	14:58
niemeyer	Rietveld cross-changes diffing is such a huuuge aid	15:02
niemeyer	cliff-hm: Do you work close to the Openshift guys?	15:02
niemeyer	Suddenly rietveld is crawling	15:09
niemeyer	Which makes this a good time to have lunch	15:09
niemeyer	rog: I did manage to LGTM one of hte pending branches, though	15:09
rog	niemeyer: magic, thanks	15:09
niemeyer	Will finish pending reviews when I'm back, and then will step out to do store work	15:10
cliff-hm	I know several, but not close with them. Closer to the Aoelous and Katello, pulp project guys.	15:13
rog	niemeyer, fwereade_: https://codereview.appspot.com/5694068/	15:18
rog	probably the most significant one yet.	15:18
rog	fwereade_: if you wanna have a look, probably the best place to start is in util.go, which has the basic robustness primitive.	15:19
rog	fwereade__: just checking: did you see my CL above?	16:16
rog	fwereade__: or is your network playing up?	16:16
rog	hmm, can anyone see this?	16:17
niemeyer	It's funny that every time Mark starts a rant in canonical-tech, there's a bunch of people asking to subscribe to the list	16:29
rog	niemeyer: i hadn't looked at that list recently	16:41
fwereade__	rog, sorry, I keep not seeing notifications	16:49
rog	fwereade__: that's fine. this is what i said:	16:50
fwereade__	rog, I did, and thank you for the advice to start looking in util, would have been a little mystifying otherwise ;)	16:50
rog	15:18] <rog> niemeyer, fwereade_: https://codereview.appspot.com/5694068/	16:50
rog	[15:18] <rog> probably the most significant one yet.	16:50
rog	[15:19] <rog> fwereade_: if you wanna have a look, probably the best place to start is in util.go, which has the basic robustness primitive.	16:50
rog	fwereade__: np	16:50
niemeyer	rog: Yeah, I'm on reviews	16:51
rog	niemeyer: brill!	16:51
niemeyer	Man.. why is codereview so slow	16:54
niemeyer	rog: "there's deliberately only one securityGroup instance for each security group."	16:54
niemeyer	rog: Where's that enforced?	16:54
rog	niemeyer: there's only one occurrence of "&securityGroup" in the file, which is in newSecurityGroup, which is only called when group() has returned nil	16:57
rog	niemeyer: actually, i think the newSecurityGroup call in NewInstances (even though it conforms to that rule) might be ananachronism	16:58
rog	an anachronism	16:58
niemeyer	rog: There are two different calls to newSecurityGroup	16:58
rog	niemeyer: they're both called only when group() has returned nil	16:59
rog	niemeyer: i.e. they never create a group with the same contents twice	16:59
rog	niemeyer: but see above too	16:59
rog	i think the occurrence in NewInstances might be from before i implemented CreateSecurityGroup	17:00
niemeyer	rog: Still, this is a problem.. the current design is error prone, and there's a design not documented anywhere and scattered across the code	17:00
rog	niemeyer: if i deleted it from NewInstances (which i think i probably should) then it would not be scattered.	17:01
niemeyer	rog: It would still not be documented, and it would still be error prone	17:01
niemeyer	rog: All it takes is a single line: &securityGroup{}.. BOOM, bug.	17:01
rog	niemeyer: so you don't like comparing pointers?	17:02
rog	niemeyer: or are you asking for a comment?	17:02
niemeyer	rog: I like comparing pointers.. I don't like error prone code that will break behind our back in hard to notice ways	17:02
niemeyer	rog: I'm asking to make the code not error prone	17:02
niemeyer	rog: I don't want to remind everybody that touches that file that they can't create security groups	17:02
rog	niemeyer: if i put a comment next to the securityGroup type saying "only create this through newSecurityGroup", would that be ok?	17:03
rog	niemeyer: (and have newSecurityGroup do the check)	17:03
niemeyer	rog: The comment + the latter sounds good	17:03
rog	niemeyer: cool, will do.	17:04
niemeyer	rog: This will prevent people from making silly but reasonable mistakes, I hope	17:04
niemeyer	rog: Thanks, and sorry for the trouble	17:04
rog	niemeyer: np. it was worth it to find the bug in NewInstances	17:04
rog	well, not precisely a bug	17:05
niemeyer	rog: Please also add a short note in the permKey struct, next to the pointer definition	17:09
fwereade__	EOD, and I'm exhausted :). happy weekends all!	17:11
niemeyer	rog: Nice comments, btw, thanks	17:11
niemeyer	fwereade__: have a good time man :)	17:11
rog	fwereade__: happy weekend to you to	17:12
rog	o	17:12
rog	niemeyer: something like this?	17:15
rog	// permKey represents permission for a given security	17:15
rog	// group or IP address (but not both) to access a given range of	17:15
rog	// ports. Equality of permKeys is used in the implementation of	17:15
rog	// permission sets, relying on the uniqueness of securityGroup	17:15
rog	// instances.	17:15
niemeyer	rog: Sounds great, thanks	17:17
rog	niemeyer: cool.	17:28
rog	with those changes, does it LGTY?	17:29
rog	niemeyer: here's the other comment, BTW:	17:29
rog	// securityGroup holds a simulated ec2 security group.	17:29
rog	// Instances of securityGroup should only be created through	17:29
rog	// Server.newSecurityGroup to ensure that groups can be	17:29
rog	// compared by pointer value.	17:29
rog	niemeyer: PTAL	17:35
niemeyer	rog: // It returns nil if a security group with the given name already exists.	17:43
niemeyer	rog: Feels like a trap	17:43
rog	niemeyer: really?	17:43
rog	niemeyer: do you want it to return an error?	17:44
rog	niemeyer: i considered that, but i haven't got an error type yet	17:44
rog	niemeyer: and the caller needs to call fatalf with an appropriate error code	17:44
rog	niemeyer: i could just delete the newSecurityGroup function	17:45
niemeyer	rog: Yeah, an error sounds good.. or make it panic internally and delegate the responsibility of checking to the call site	17:45
rog	niemeyer: what about my last suggestion?	17:46
rog	niemeyer: just inline it inside CreateSecurityGroup	17:46
niemeyer	rog: The abstraction is fine.. just make it panic internally if the group already exists	17:47
niemeyer	rog: We just want to make the error visible rather than a hidden issue	17:47
rog	niemeyer: it won't be hidden for long... anything that tries to use the returned group will panic.	17:47
rog	niemeyer: but ok	17:47
niemeyer	rog: Not really. Returning a nil group means people could get away with setting a permKey.group to it, for instance.	17:48
rog	given that it's only called in one place, i don't really see why the creation needs its own function.	17:49
niemeyer	rog: Sure, inline then..	17:49
rog	something in me slightly balks at doing two linear searches of the group list... (i know that's a ridiculous concern, too!)	17:49
niemeyer	rog: Keep a comment about the problem, please	17:50
rog	niemeyer: will do	17:50
rog	niemeyer: done.	17:54
rog	weird	17:56
rog	my irc client appears to have logged me in twice	17:57
rog	ah! that must've been carmen opening my laptop	17:57
rog	niemeyer, TheMue: i'm off now. have a good weekend.	17:59
niemeyer	rog: LGTM.. thanks	17:59
TheMue	rohave a good weekend too	17:59
niemeyer	rog: Have a good weekend	17:59
rog	niemeyer: thanks for the reviews.	17:59
rog	niemeyer: always nice to finish with a submit. thanks.	18:00
niemeyer	rog: Indeed! :-)	18:00
niemeyer	rog: and np	18:00

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!