/srv/irclogs.ubuntu.com/2012/02/24/#juju-dev.txt

TheMuere08:25
niemeyerGood morning all!09:12
TheMueniemeyer: hiya09:18
fwereade__heya niemeyer, you're on early :)09:30
niemeyerfwereade__: Yeah, a bit :)09:30
* rog finally makes his network connection work again09:46
niemeyerrog: Welcome back09:47
rogniemeyer: hi!09:48
rogniemeyer: you're up earlier than usual...09:49
TheMuerog: hiya09:49
rogTheMue: morning!09:49
niemeyerrog: Yeah, a bit09:50
rogniemeyer: any chance you could have another look at those CLs? i'm hoping to get something pushed this week.09:50
niemeyerrog: Not right now09:52
rogniemeyer: k09:52
fwereade__niemeyer, btw, there was a conversation the other day I think you missed about gozk panicing in C when a Conn is closed (and something else is busy in C code)09:55
niemeyerfwereade__: Ugh, that's not good09:55
fwereade__niemeyer, would fixing this just be a matter of a RWMutex on handle accesses, or is it more subtle?09:55
niemeyerfwereade__: Well.. I'd need some more information about the crash09:55
fwereade__niemeyer, I've seen it in a couple of cases, but the common feature is that some goroutine is always somewhere like: launchpad.net/gozk/zookeeper._C2func_zoo_wexists(0x36edf80, 0x36ef380)09:57
niemeyerfwereade__: Ok, do you have a full traceback?09:57
fwereade__niemeyer, here's the one I just saw09:58
rogniemeyer: i had a brief look at this - it just happens when zk derefs the nil handle, because handle access isn't mutexed, as fwereade__ suggestes09:58
fwereade__niemeyer, http://paste.ubuntu.com/855178/09:58
rogsuggests09:58
fwereade__niemeyer, the set line shouldn't be the problem, because that conn is not closed09:59
niemeyerfwereade__: Is this looping while you're calling Close on the zk Conn?10:01
fwereade__niemeyer, yes10:01
niemeyerfwereade__: That's going back to the point we talked about10:02
niemeyerfwereade__: We shouldn't do that10:02
niemeyerfwereade__: We shouldn't ever close a connection behind the back of logic that we're managing ourselves10:02
fwereade__niemeyer, ok, maybe I'm missing something10:03
niemeyerfwereade__: That's true for tests, and also for real logic10:03
niemeyerfwereade__: We can even implement locking to prevent the nil pointer reference at some point, but that's not the most critical issue10:04
niemeyerfwereade__: Ok, but do you recall that conversation?10:04
fwereade__niemeyer, ok, in that case I guess the tests I was trying to write are not actually relevant, because I'm really trying to test deal-with-session-event handling that doesn't exist yet10:04
niemeyerFeb 22 12:09:11 <niemeyer>      fwereade_: We should never, ever, ever, leave background logic unattended while we're messing around with the  system state in disruptive ways10:05
niemeyerfwereade__: Hmm, can you expand on that?10:05
fwereade__niemeyer, yes, I do; the reason I saw this one just now is because all I *thought* I had active on that conn was a watch10:05
fwereade__niemeyer, I was trying to verify that the watch channel got closed when the underlying ZK connection got yanked out from underneath10:06
niemeyerfwereade__: Ok, that's an interesting case.. maybe we have something to fix then10:06
niemeyerfwereade__: Can you please paste waitFor?10:06
fwereade__niemeyer, http://paste.ubuntu.com/855189/10:07
niemeyerfwereade__: Sorry, I thought that was where the loop was.. TestDisconnectAliveWatch then, I guess10:09
niemeyerAh, this is the guy blowing up: launchpad.net/gozk/zookeeper._C2func_zoo_set2(0x0, 0x36ef360)10:10
fwereade__niemeyer, yep10:10
niemeyerfwereade__: Please paste the test and (*Pinger).run10:10
fwereade__niemeyer, before I paste this: I know the stuff in the goroutine comment is wrong :p10:10
fwereade__niemeyer,10:10
fwereade__http://paste.ubuntu.com/855193/10:10
niemeyerfwereade__: Cool :)10:11
fwereade__niemeyer, http://paste.ubuntu.com/855195/10:11
fwereade__niemeyer, wait, you're right, it is set and not exists, bugger, I'm at least partially on crack (in a way that I was not hitherto aware of)10:13
niemeyerfwereade__: You know it's wrong in which way?10:13
fwereade__niemeyer, "this should be equivalent"10:13
fwereade__niemeyer, ie, what we should actually be doing is *not* just closing the conn until we've dealt with everything10:16
fwereade__niemeyer, but if it is indeed the set that's blowing up, anmd it looks like it is, then I'm completely confused because that's using a different conn to the one that gets closed10:17
niemeyerfwereade__: Not really10:17
niemeyerfwereade__: Oh, wait.. yes10:17
fwereade__niemeyer, so I would appear to be crackful in an entirely fascinating and original way :/10:18
niemeyerfwereade__: But!10:18
niemeyerfwereade__: Please paste Close10:18
niemeyerfwereade__: I bet you have a race10:18
fwereade__niemeyer, oh balls, I try to hit the target node again10:18
fwereade__niemeyer, (to give clients as long a time as possible before they see a timeout)10:19
fwereade__niemeyer, (naively trusting that if the connection is borked it'll just error and I can ignore it)10:19
niemeyerfwereade__: That's fine10:19
niemeyerfwereade__: But a race is not fine.. please paste Close10:20
TheMueInteresting, http://cocode.io/beta/ will provide collaborative coding. It's written in Go knows Go as a highlighted lang. Sadly yet only GitHub, no Bazaar (or others).10:21
niemeyer<niemeyer> fwereade_: That's fine10:22
niemeyer<niemeyer> fwereade_: But a race is not fine.. please paste Close10:22
niemeyerfwereade_, fwereade__: Are you having wifi issues too?10:23
niemeyerfwereade_, fwereade__, fwereade: Are you having wifi issues too? :-)10:23
fwereadeniemeyer, seems so :p10:23
fwereadeniemeyer, http://paste.ubuntu.com/855206/10:24
fwereadeniemeyer, if you didn't see above10:24
niemeyerfwereade_, fwereade__, fwereade: I always knew you were not a single person..10:24
niemeyerfwereade: Yeah, you have a race indeed10:24
niemeyerfwereade: Can you please have a look at this post: http://blog.labix.org/2011/10/09/death-of-goroutines-under-control10:25
fwereadeniemeyer, I'm a globally distributed collection of outsourced drones, but don't tell anyone10:25
niemeyerfwereade: I'll highlight a few things afterwards, if they're not obvious10:25
fwereadeniemeyer, cool, thanks10:25
* fwereade reads10:25
rogniemeyer: where's the race?10:27
fwereadeniemeyer, I remember reading that when you wrote it, and totally forgot about it until now10:28
fwereadeniemeyer, it seems clear, but I'd appreciate the highlighting all the same10:28
rogoh yeah, in TestDisconnectAliveWatch. i was looking in the pinger methods10:31
fwereade_grar10:36
niemeyerfwereade_: Are you here? :)10:37
fwereade_niemeyer, er, probably10:38
niemeyerfwereade_: Ok :)10:38
niemeyerfwereade_: So, quicklY! :-)10:38
niemeyerfwereade_: Observe the way Stop is implemented in the blog post10:38
niemeyerfwereade_: It sends a fatal error, and *waits*10:39
TheMueniemeyer: next round of https://codereview.appspot.com/5671055/10:39
niemeyerTheMue: Thanks, I'll have a look at that next to rog's10:39
TheMueniemeyer: yep, thx10:39
niemeyerfwereade_: You're sending a stop signal, and ignoring the fact that, naturally, stopping isn't instantaneous10:40
fwereade_niemeyer, yeah; I had a "closed" channel I was waiting on originally, but it was pointed out that since closing was unbuffered I wasn't getting any benefit from waiting on closed after sending to closing10:40
niemeyerfwereade_: Uh.. this is bogus10:41
niemeyerfwereade_: Oh, or maybe not.. let me see again10:41
rogniemeyer: it looked ok to me :-)10:41
fwereade_niemeyer, I'm not saying there weren't subtle bugs in what I was doing, or in what I am doing ;)10:41
rogniemeyer: i *hope* it's not bogus otherwise my understanding of channels is fundamentally flawed...10:42
niemeyerrog: No, you're right.. as long it's unbuffered, that's fine10:43
rogphew10:43
niemeyerfwereade_: The race is in the test itself10:44
rogyup10:44
niemeyerfwereade_: No, sorry.. crack again..10:44
niemeyeraltConn is in a different end10:45
niemeyerHmm10:45
rogniemeyer: isn't is a race that waitFor(c, altConn, path) is called concurrently with altConn.Close() ?10:46
rog(not that i've looked at waitFor)10:46
niemeyerrog: No.. the pinger is blowing up with a nil handle10:47
niemeyerDespite anything else, there's a race here in that the handle is being cleaned up underneath its feet10:48
rogniemeyer: that still looks like a race to me10:48
niemeyerrog: :-)10:48
rogniemeyer: even if it's not the thing that's blowing up10:48
niemeyerrog: If the logic in the test is sane, the Close won't happen before the kill10:49
niemeyerrog: if..10:50
rogniemeyer: oh yes, i hadn't seen connect - i was assuming that the session event was the connection event, not the teardown event.10:51
rogbut presumably connect waits for the first session event10:51
fwereade_rog, yes10:51
niemeyerrog: That's what I'm assuming too10:51
rogfwereade_: how reproducible is this?10:56
fwereade_rog, not very :(10:56
fwereade_rog, it'll happen eventually10:56
rogfwereade_, niemeyer: AliveW is busy on altConn at the same time that altConn is closed, right?11:00
rogso that looks like a race to me11:00
fwereade_rog, yes; so that Close could happen at any time, and that's definitely a problem11:01
rogi.e. the server gets killed, the goroutine receives the session event and closes altConn, at the same time AliveW happens to do a Change11:01
fwereade_rog, there are similar tests that really are *only* watching when the conn gets closed, and they're fine; yes11:02
niemeyerrog: If that's the case, then the connection is being Closed before kill()11:02
rogniemeyer: i don't think so11:02
rogniemeyer: the kill happens, then the close happens, then AliveW does something11:03
niemeyerrog: AliveW is run before kill..11:03
rogniemeyer: AliveW has a goroutine11:03
fwereade_niemeyer, I guess it's not impossible that there's another session event confusing me11:03
niemeyerrog: OH, does it?11:03
niemeyerWhy?11:03
niemeyerTo transform the event, I guess11:03
rogniemeyer: yup11:03
niemeyerBut it should be listening only11:03
niemeyerOn a channel.. that's not supposed to blow up11:03
niemeyerfwereade_: Can you please paste the whole file, or just push the branch?11:04
rogyeah11:04
fwereade_niemeyer, sure11:04
fwereade_niemeyer, ~fwereade/juju/go-presence-nodes11:04
fwereade_niemeyer, when we're watching for aliveness we can't just dumbly watch a channel11:05
fwereade_niemeyer, we need to keep rewatching when we get changes before timeouts11:05
niemeyerfwereade_: Ho ho ho11:05
fwereade_niemeyer, technically, above, when we're watching for *deadness*11:05
fwereade_niemeyer, have I misunderstood something fundamental?11:06
niemeyerfwereade_: I'm just curious about the rewatching.. that may explain it11:06
fwereade_niemeyer, (also we may need to rewatch when waiting for liveness too, it's a bit of an edge case, but...)11:06
niemeyerfwereade_: Yeah, there's a race there..11:07
niemeyerfwereade_: You're firing goroutines that use zk in a way that closing the connection will ignore completely11:07
fwereade_niemeyer, huh, I thought that closing the conn would close all the watches, and I'd see that11:08
rogniemeyer: i don't think there's a way to do it without either having a way to Close an AliveW or by making zk.Close concurrent-safe11:08
rogniemeyer: personally, i think i'd opt for the latter11:08
rogniemeyer: (after all, all the other routines are ok to call concurrently)11:08
rogs/routines/zk methods/11:09
niemeyerrog: Yes, there may be something to change in gozk, but there's something to be handled in presence itself that I'm trying to understand11:10
rogniemeyer: presence seems ok to me, but i'm probably missing something11:11
niemeyerfwereade_: Why do we need awaitDead and awaitAlive to loop?11:11
fwereade_niemeyer, awaitDead to refresh the data watch until I get a timeout without the data watch having fired11:12
niemeyerfwereade_: Ah, right.. it's the opposite.. it fires when it doesn't get a watch11:12
niemeyerHmm11:12
fwereade_niemeyer, awaitAlive to handle the case where a known-unresponsive node is deleted -- it's still dead but not in the same way and IMO I shouldn't really alert the client to say "yeah, nothing changed"11:12
niemeyerWell, looks like we need our Watcher type back..11:14
rogniemeyer: if zk Close was concurrent-safe, everything would be fine11:15
niemeyerSo that we can stop this logic in a sane way11:15
rogniemeyer: just like closing a net.Conn11:15
rogniemeyer: then the logic just stops as a matter of course11:15
niemeyerrog: What do you mean by concurrency safe?11:16
rogniemeyer: i mean that you could Close a zk conn concurrently with executing some other operation on the conn11:16
fwereade_niemeyer, it feels like one of the advantages of gozk is that the watches can just get auto-closed, and it would be really nice if we could follow the same convention here11:16
fwereade_rog, nitpick, technically it's all the other methods that aren't concurrent-safe ;)11:16
niemeyerrog: That's not enough11:17
rogfwereade_: they're concurrent-safe with each other...11:17
rogniemeyer: no?11:17
niemeyerfwereade_: Exactly11:17
niemeyerrog: What fwereade_ said11:17
rogniemeyer: i don't understand. wouldn't a mutex around the handle access be acceptable?11:17
fwereade_niemeyer, it still seems to me that a RWMutex on handle around C calls would be enough; what am I missing?11:18
rogniemeyer: (every method would have to respect it, of course)11:18
niemeyerrog: This is something else, not what you originally said11:18
rogniemeyer: that's what is necessary to make it ok to call Close concurrently, AFAICS11:18
rogniemeyer: which is what i was suggesting11:18
niemeyerrog: You're still missing the point..11:18
fwereade_niemeyer, so am I I think11:19
niemeyerrog: It's not just about concurrency11:19
* rog usually does11:19
niemeyerrog: Close(); Set(foo).. BOOM!11:19
rogniemeyer: no.11:19
rogniemeyer: not if Set checks to see if handle is nil and then returns an error11:19
niemeyerrog: Dude.. please make up your mind11:19
fwereade_niemeyer, wouldn't we just grab a read lock, check for nil, and error back cleanly?11:20
rogfwereade_: exactly11:20
niemeyerfwereade_: YES!11:20
niemeyerfwereade_: We'd just do that.. that's not what was being said so far!11:20
fwereade_niemeyer, ha, I thought it was11:20
niemeyer<niemeyer> rog: What do you mean by concurrency safe?11:20
niemeyer<rog> niemeyer: i mean that you could Close a zk conn concurrently with executing some other operation on the conn11:20
rogniemeyer: i just said "make Close concurrent-safe". that doesn't imply that Close is the only method that needs to change.11:20
niemeyerrog: This is not about concurrency! Serial operations blow up!11:21
rogniemeyer: well, that's a different (although related) issue.11:21
fwereade_niemeyer, Close might block for a while (or *possibly for ever if enough is going on...) but that seems preferable to unrecoverable panics in C if if a hamfisted amateur like myself is trying to use the library11:21
niemeyerrog: Yes, it's the issue we're seeing11:21
rogniemeyer: i think we're seeing an issue because of concurrency, no?11:22
niemeyer<niemeyer> rog: This is not about concurrency! Serial operations blow up!11:22
rogniemeyer: is that the case in this example? i thought we were doing things in two separate goroutines11:22
niemeyer<niemeyer> rog: This is not about concurrency! Serial operations blow up!11:23
niemeyerrog: You can inline.. Close(); Set(); BOM!11:23
fwereade_rog, niemeyer: sorry, lamb chops are hot on the table, back shortly :)11:23
rogniemeyer: that's not what's happening in this issue though, right?11:24
niemeyerrog: Yes, it is11:24
rogniemeyer: in the same goroutine?11:24
niemeyerrog: No.. the fact there are two goroutines is completely irrelevant11:24
rogniemeyer: to my mind, it's not. if we were in a single goroutine it would be easy to avoid doing an operation after Close.11:25
rogniemeyer: it's the concurrency issue that makes it harder11:25
niemeyerrog: Sorry, but a "concurrency issue" is something else than what you have in mind11:26
rogniemeyer: ok, fair enough11:26
rogniemeyer: so...11:26
niemeyerrog: You can put a lock across every single function in gozk, and it will still blow up11:26
rogniemeyer: only if they don't check that handle is not nil before calling into C, no?11:26
niemeyerrog: Yes, and that's not a concurrency issue.. this is about using a connection after it's been closed.11:27
niemeyerrog: and make that not panic11:27
rogniemeyer: agreed. but just doing that won't fix our problem either.11:27
rogniemeyer: because then there would be a race.11:27
niemeyerrog: Sure, because you can't trust on the test for the handle to be valid without a lock11:29
rogniemeyer: exactly11:29
niemeyerfwereade_: So..11:37
rogniemeyer: i think a RWMutex would probably do the job just fine11:37
rogon zk.Conn, that is11:37
niemeyerfwereade_: You're right, we must definitely fix gozk itself so that we protect from that kind of issue11:38
niemeyerfwereade_: I'm just still wondering if we should change the interface for that11:38
niemeyerfwereade_: I guess it's fine to keep it running it in background, and simply error out, given that this is purely a read operation without any side effects11:38
rogniemeyer: that was my thought.11:40
niemeyerfwereade_: Btw, I'd prefer to not have the  (s state) as a parameter to the await functions11:40
niemeyerfwereade_: There's actually a bug in that regard in the current logic11:41
niemeyerfwereade_: No, there isn't, sorry11:42
niemeyerfwereade_: You're passing the parameter as a copy, which is safe11:42
niemeyerfwereade_: Still, I don't see why the parameter is necessary11:42
rogniemeyer: it saves passing the path and timeout parameters as separate arguments11:43
rogniemeyer: but i'm with you11:44
niemeyerrog: Yeah, I think the organization there needs some tweaking for clarification11:47
rogniemeyer: i wondered if things might be clearer if there was an internal watcher type which held the params that are being passed around.11:51
niemeyerrog: I was wondering about that too, but this is makes it seem like there wouldn't be much benefit compared to what is there now:11:52
niemeyer                                s, zkWatch, err = newStateW(conn, s.path)11:52
rogniemeyer: s, err := w.getStateW() ?11:52
niemeyerrog: The only bit that is actually constant is the path11:52
rogyeah11:53
niemeyerrog: Right, w is the path..11:53
niemeyerrog: Which renders fwereade_ design elegant11:53
rogit was the other methods i was really thinking about (waitAlive and waitDead)11:53
niemeyerrog: Right.. as methods, they would have to live on a type that is simply the path11:54
niemeyerrog: Because timeout and aliveness changes11:54
rogniemeyer: that's fine. they can still be fields in the waiter type. it will change when they change.11:55
rogniemeyer: (it's not used concurrently)11:55
niemeyerrog: There's no point in having them as fields.. it's only used in that one function11:55
rogmaybe "node" might be a better name for the type11:55
niemeyerrog: Yeah, path :)11:56
niemeyerrog: type node string, where node is the path11:56
niemeyerrog: That won't improve much what fwereade_ has there now, though. The design feels sound on that level11:57
* rog is thinking about it11:57
niemeyerrog: Well.. I guess we could have the connection on it too11:57
niemeyerWhich would make it more interesting11:58
rogniemeyer: yeah11:58
niemeyertype node string11:58
rogniemeyer: and the watch channel to send to11:58
niemeyerfunc (n node) readState() (alive, timeout, error)11:58
niemeyerfunc (n node) readStateW() (alive, timeout, watch, error)11:58
rogfunc (n node) get() error11:58
rogsorry11:58
rogfunc (n *node) get() error11:58
niemeyerfunc (n node) awaitAlive11:58
niemeyerfunc (n node) awaitDead11:58
rogfunc (n *node) getW() error11:58
niemeyerrog: What's get?11:59
niemeyerrog: get error? :)11:59
rogniemeyer: gets the current state of the node. stores it in n.11:59
rogniemeyer: getW also gets a wait channel and stores that in n too.11:59
niemeyerrog: That's a bad name then.. update() maybe11:59
rogsure11:59
niemeyerand updateW()12:01
niemeyerrog: Yeah, that should improve things indeed, +112:01
rogcool12:02
* fwereade_ reads back...12:02
niemeyerrog, fwereade_: So.. I'd just like to take an eagles view on the whole issue to close it down, if that's ok12:02
niemeyerWe have one bug, and one design approach we're agreeing that breaks a prior rule12:02
* rog feels those sharp eyes on him from far far above.12:02
niemeyer:)12:02
niemeyerThe bug is, as we discussed at length, the fact closing and using a connection is a blow up in gozk for sure. We must fix that.12:03
niemeyerThe design decision is a more subtle one12:03
niemeyerThe prior goal I was personally attempting to ensure is that we never have background logic under our control running in parallel with other stuff that will break down the concurrently executing logic12:04
niemeyerUnder that design approach, we'd need a Stop() method on the watcher12:04
niemeyerSo that we can stop it.. that would avoid the gozk bug too12:05
niemeyerBut, we're agreeing to not do that, and instead fix gozk, and establish a new condition:12:05
niemeyerIf the background logic is entirely innocuous, it's fine to allow it to die a cold death12:05
rogsounds good to me. as long as the logic *will* die, and not be left around as garbage12:07
fwereade_niemeyer, that sounds sensible to me personally, but then I would say that12:07
niemeyerAliveW is one example of that.. by itself, it's doing nothing. If it blows up in the background, there are zero side effects, _as long as the logic that is using the watch does nothing_!!12:07
fwereade_niemeyer, this feels consistent with every other ZK watch12:07
niemeyerIt's _NOT_ ok to have someone waiting on the watch resulting from AliveW, and doing further actions like changing the FS or whatever else, despite the fact the connection has been closed12:08
rogand the logic using the watch will just see a close on the channel, which means an error, so hopefully this kind of logic can cascade12:08
niemeyerfwereade_: Agreed12:08
fwereade_niemeyer, excellent12:08
rogfwereade_: agreed12:08
niemeyerrog: No, that's exactly what we don't want12:08
niemeyerrog: We'd have to reevaluate the use of the watch, if it's also in background12:09
rogniemeyer: no? i'm imagining some other innocuous logic layered on top of AliveW12:09
niemeyerrog: For it to be alright, it'd have to be innocuous too12:09
rogniemeyer: yes, definitely12:09
niemeyerrog: Cool, we're in agreement then12:09
rogyup12:09
niemeyerSo, those are the relevant decisions12:10
niemeyerGlad we're in agreement, phew :)12:10
niemeyerfwereade_: In addition to that, we were nitpicking over the interface around the state type12:10
niemeyerfwereade_: Which is not terribly important, but that might be interesting to sort out if you still have the energy12:11
fwereade_niemeyer, I saw that and I'm not strongly invested in what I currently have; the fields seemed to go well together, and to stop the function signatures breaking lines, and so I decided it was good :)12:11
niemeyerfwereade_: rog was suggesting a node type, which feels nice to clarify the state handling12:11
niemeyerfwereade_: Something like this is the status quo of the discussion:12:12
niemeyertype node struct { conn, path, timeout, alive }12:12
niemeyerfunc (node) update() error12:12
niemeyerfunc (node) updateW() (watch, error)12:12
rog*node12:13
niemeyerSorry, yes12:13
niemeyerfunc (*node) waitAlive()12:13
niemeyerfunc (*node) waitDead()12:13
niemeyer(the last two with the necessary return types and parameters)12:13
fwereade_niemeyer, LGTM12:13
fwereade_thanks guys, productive discussion :-)12:13
niemeyerfwereade_: waitAlive and waitDead would be implemented on top of the former two12:14
niemeyerfwereade_: Likewise!12:14
rogi keep on wondering if waitAlive and waitDead might not be better named as deadWait and liveWait respectively12:14
rogi.e. name them after the state they start in rather than the state they're waiting for.12:14
rogdunno12:14
niemeyerrog: -1.. these are nouns12:14
niemeyerwe need verbs12:14
rogyeah12:14
rogi want to call waitDead waitDeath12:15
rogbut can't think of an appropriate name for the other one12:15
fwereade_rog, I considered similar but "waitBirth" wasn't right12:15
fwereade_rog, indeed12:15
niemeyerwaitAlive and waitDead feels great to me12:15
rogyeah12:15
niemeyerAnd on that node, I'll do some reviews..12:16
rogthat's fine. just thought i'd mention it in case others has similar thoughts.12:16
fwereade_niemeyer, would you like me to take a look at gozk as well?12:16
niemeyerActually, let me finish answering emails first12:16
fwereade_niemeyer, I don't want to overload you :)12:16
rogi could do gozk12:16
niemeyerfwereade_: Yes, please12:16
niemeyeror rog.. whatever works12:16
rogor fwereade_ sure12:16
rogi'm a bit ahead of myself anyway currently12:17
* fwereade_ bows out gracefully12:17
rogfwereade_: how about you refactor presence (again!) and i'll do zk.12:18
fwereade_rog, give me a shout if it becomes a hassle in any way and I can grab it12:18
fwereade_rog, but that sounds good to me12:18
* rog thinks is shouldn't be too much problem.12:18
rogs/is/it/12:18
andrewsmedinaniemeyer: hi12:27
niemeyerandrewsmedina: Hey12:27
rogfwereade_, niemeyer: ok, zk changes made. now for the harder bit... the testing.12:35
fwereade_rog, good luck :)12:36
* rog thinks of a devious way of forcing a zk request to take a long time.12:38
andrewsmedinaniemeyer: in the last weekly version Go isn't  work with two packages in same project12:44
andrewsmedinaandrewsmedina: in some of your projects you put a example.go which is a different package.12:46
fwereade_rog, just a thought; I have a couple of `switch event.Type`s without defaults; if I get totally unexpected events, it would probably be sensible to `close(watch)`, right?13:08
rogfwereade_: you could probably panic13:08
rogfwereade_: it's part of the contract that the correct event types should be returned13:09
rogfwereade_: if they're not, then something's very broken13:09
rogthe contract with zk, that is13:09
fwereade_rog, in that case I think explicitly panicking may be excessively defensive13:09
rogfwereade_: i'd prefer to see a panic rather than a silent failure in that case13:10
fwereade_rog, ok, just `panic(event)` then?13:10
rogfwereade_: panic(fmt.Errorf("unexpected event %v", event))13:10
rogor something like that13:10
fwereade_rog, ok then :)13:11
rogit's nice to see the thing that caused the panic13:11
fwereade_rog, sounds good13:12
fwereade_rog, hm, I guess I can lose the forward() stuff once we have a Close-safe gozk13:16
rogfwereade_: yeah. that'll be good.13:16
rogfwereade_: i could push a version that you could use until i've done the tests, if you like.13:16
fwereade_rog, a good thing but also sort of a shame, that stuff made me very happy :)13:17
fwereade_rog, I guess that would be useful13:17
rogfwereade_: it's still there for when you need it again...13:17
rogfwereade_: in fact i think i might be using that technique in the zk tests... (my "devious way")13:17
fwereade_rog, indeed, the pleasure of deleting code still outweighs the pleasure I took in the code to begin with13:18
fwereade_rog, nice :)13:18
rogfwereade_: i'm gonna have a forwarder and stop all incoming traffic. then i know that the reply can't come back quickly...13:18
rogfwereade_: in fact i might steal your code directly, since it's still there...13:20
fwereade_rog, go for it :)13:20
TheMueSo, solved a merging prob, 76 tests passed13:24
rogTheMue: nice13:25
TheMuerog: yeah, it goes on, piece by piece13:26
rogTheMue: i bet you're glad you speeded up the tests now :-)13:26
TheMuerog: definitely13:35
TheMuerog: when zk is cold (first start) about 11 sec, then about 6.3 sec.13:36
rogTheMue: how long if you run just one test?13:37
TheMuenot yet tested13:37
fwereade_rog, TheMue, IME live-zookeeper tests have a 5s overhead13:39
fwereade_(when hot)13:39
rogso about 1.3 seconds to run the tests. that's not bad13:39
rogTheMue: try go gotest -gocheck.run patternMatchingOneTest :-)13:40
TheMuerog: with a lot i/o13:40
rogoh, i think it might be gocheck.f actually13:40
fwereade_rog, it is13:40
rogfwereade_: i'm going to change that soon - it should be the same as the equivalent test option13:41
rogfwereade_: i just haven't pushed the change yet13:41
rogwell, the merge request13:41
fwereade_rog, fine with me; I'll discover it, swear, check the options and be happy :)13:41
TheMuerog: a single test is 5.11 vc 6.36 for all tests13:45
rogain't java wonderful?13:45
TheMuefwereade_: your guess with 5 sec seems good13:45
TheMuerog: haha13:45
fwereade_niemeyer, should ExpectFailure (gocheck) possibly output something in test runs? as it is, its "don't forget this test exists" value seems a touch limited13:52
niemeyerfwereade_: The purpose is precisely to not forget that there are broken tests without having a failure that we'll learn to ignore13:54
fwereade_niemeyer, agree that ignored-by-convention failures are poisonous; I just don't see how it stops us forgetting any better than renaming the methods to DontTest* would13:56
niemeyerfwereade_: It mentions that there are expected failures at the end of the run, doesn't it?13:57
fwereade_niemeyer, perhaps I'm Doing It Wrong, but not afaics13:57
fwereade_william@diz:~/code/go/src/launchpad.net/juju/go$ go test launchpad.net/juju/go/state/presence13:57
fwereade_ok  launchpad.net/juju/go/state/presence5.890s13:57
niemeyerHmm13:57
niemeyerfwereade_: Run it without the package name13:58
niemeyerfwereade_: Within the directory13:58
niemeyerfwereade_: Or with -v13:58
niemeyerfwereade_: go test is eating the output13:58
fwereade_niemeyer, no dice either way13:58
fwereade_niemeyer, ah, somehow run it without using "go test"?13:58
niemeyerNo13:59
niemeyerfwereade_: There's something wrng indeed13:59
fwereade_niemeyer, it's not critical, shall I just open a bug?14:01
fwereade_niemeyer, (srry, I feel bad distracting you all the time while the store's still in progress)14:02
niemeyerfwereade_: No worries really14:02
niemeyerfwereade_: I haven't even managed to get to that yet14:02
fwereade_niemeyer, heh, that's a large part of why I feel bad about it ;)14:03
niemeyerfwereade_: Yes, please file a bug.. it's supposed to print it14:03
niemeyerHey cliff-hm14:25
niemeyercliff-hm: How're things going in the RedHat side of the world?14:25
cliff-hmPretty OK thanks. :)14:27
cliff-hmJust trying to understand and play with Juju a bit to see how it fits itself together.14:28
niemeyercliff-hm: Nice, please let us know what you think14:28
cliff-hm(I prefer not to hide away) :)14:29
cliff-hmSure14:31
cliff-hmI may try to do that contest, not sure yet.14:32
cliff-hm(In my spare, non-company time)14:32
niemeyercliff-hm: That'd be great14:38
niemeyercliff-hm: When can we expect juju to be integrated into RedHat's products? :)14:38
niemeyerrog: Can you please sort out this branch so we can take it out of the review list: https://code.launchpad.net/~rogpeppe/juju/fix-tutorial-with-expose-2/+merge/7537914:43
rogniemeyer: jeeze, i thought i did that months ago :-)14:43
niemeyerrog: and this one too: https://codereview.appspot.com/5674051/14:44
rogniemeyer: TheMue wanted to do something different there, so i was thinking of abandoning it14:45
niemeyerrog: That's a fine solution, but we need to reject it so that that it goes out of the review list14:45
rogniemeyer: ok. i'll reject it.14:46
niemeyerrog: Thanks14:46
cliff-hmniemeyer, I don't know ;) I'm doing this on personal time ... out of my own interests.14:46
niemeyercliff-hm: Ah, understood. I thought it was related to you being a managed of the Spacewalk/Satellite/RHN/etc team14:48
niemeyercliff-hm: Please drop us a note if you have any suggestions then14:49
cliff-hmMy interests stem from that background, yes. But I'm not under company orders to spy and look at what others do :) My day job is too busy for those tasks.14:50
cliff-hmAnd yeap. I will try to give feedback. Though likely to have more questions :)14:53
cliff-hm(been reading so far, hoping tonight and weekend to stay hands-on)14:53
niemeyercliff-hm: Sweet14:54
niemeyercliff-hm: I have good friends at RH, btw..14:54
rogniemeyer: both rejected - the earlier was submitted in some other form sometime.14:54
niemeyercliff-hm: Probably not very close to you, though14:55
cliff-hmwe have a lot of employees now. I know a bunch of people in my time here, but there is a lot more people that I do not know, than know.14:58
niemeyercliff-hm: Can imagine.. most of the people I know are around kernel development, I believe14:58
niemeyerRietveld cross-changes diffing is such a huuuge aid15:02
niemeyercliff-hm: Do you work close to the Openshift guys?15:02
niemeyerSuddenly rietveld is crawling15:09
niemeyerWhich makes this a good time to have lunch15:09
niemeyerrog: I did manage to LGTM one of hte pending branches, though15:09
rogniemeyer: magic, thanks15:09
niemeyerWill finish pending reviews when I'm back, and then will step out to do store work15:10
cliff-hmI know several, but not close with them. Closer to the Aoelous and Katello, pulp project guys.15:13
rogniemeyer, fwereade_: https://codereview.appspot.com/5694068/15:18
rogprobably the most significant one yet.15:18
rogfwereade_: if you wanna have a look, probably the best place to start is in util.go, which has the basic robustness primitive.15:19
rogfwereade__: just checking: did you see my CL above?16:16
rogfwereade__: or is your network playing up?16:16
roghmm, can anyone see this?16:17
niemeyerIt's funny that every time Mark starts a rant in canonical-tech, there's a bunch of people asking to subscribe to the list16:29
rogniemeyer: i hadn't looked at that list recently16:41
fwereade__rog, sorry, I keep not seeing notifications16:49
rogfwereade__: that's fine. this is what i said:16:50
fwereade__rog, I did, and thank you for the advice to start looking in util, would have been a little mystifying otherwise ;)16:50
rog15:18] <rog> niemeyer, fwereade_: https://codereview.appspot.com/5694068/16:50
rog[15:18] <rog> probably the most significant one yet.16:50
rog[15:19] <rog> fwereade_: if you wanna have a look, probably the best place to start is in util.go, which has the basic robustness primitive.16:50
rogfwereade__: np16:50
niemeyerrog: Yeah, I'm on reviews16:51
rogniemeyer: brill!16:51
niemeyerMan.. why is codereview *so slow*16:54
niemeyerrog: "there's deliberately only one securityGroup instance for each security group."16:54
niemeyerrog: Where's that enforced?16:54
rogniemeyer: there's only one occurrence of "&securityGroup" in the file, which is in newSecurityGroup, which is only called when group() has returned nil16:57
rogniemeyer: actually, i think the newSecurityGroup call in NewInstances (even though it conforms to that rule) might be ananachronism16:58
rogan anachronism16:58
niemeyerrog: There are two different calls to newSecurityGroup16:58
rogniemeyer: they're both called only when group() has returned nil16:59
rogniemeyer: i.e. they never create a group with the same contents twice16:59
rogniemeyer: but see above too16:59
rogi think the occurrence in NewInstances might be from before i implemented CreateSecurityGroup17:00
niemeyerrog: Still, this is a problem.. the current design is error prone, and there's a design not documented anywhere and scattered across the code17:00
rogniemeyer: if i deleted it from NewInstances (which i think i probably should) then it would not be scattered.17:01
niemeyerrog: It would still not be documented, and it would still be error prone17:01
niemeyerrog: All it takes is a single line: &securityGroup{}.. BOOM, bug.17:01
rogniemeyer: so you don't like comparing pointers?17:02
rogniemeyer: or are you asking for a comment?17:02
niemeyerrog: I like comparing pointers.. I don't like error prone code that will break behind our back in hard to notice ways17:02
niemeyerrog: I'm asking to make the code not error prone17:02
niemeyerrog: I don't want to remind everybody that touches that file that they can't create security groups17:02
rogniemeyer: if i put a comment next to the securityGroup type saying "only create this through newSecurityGroup", would that be ok?17:03
rogniemeyer: (and have newSecurityGroup do the check)17:03
niemeyerrog: The comment + the latter sounds good17:03
rogniemeyer: cool, will do.17:04
niemeyerrog: This will prevent people from making silly but reasonable mistakes, I hope17:04
niemeyerrog: Thanks, and sorry for the trouble17:04
rogniemeyer: np. it was worth it to find the bug in NewInstances17:04
rogwell, not precisely a bug17:05
niemeyerrog: Please also add a short note in the permKey struct, next to the pointer definition17:09
fwereade__EOD, and I'm exhausted :). happy weekends all!17:11
niemeyerrog: Nice comments, btw, thanks17:11
niemeyerfwereade__: have a good time man :)17:11
rogfwereade__: happy weekend to you to17:12
rogo17:12
rogniemeyer: something like this?17:15
rog// permKey represents permission for a given security17:15
rog// group or IP address (but not both) to access a given range of17:15
rog// ports. Equality of permKeys is used in the implementation of17:15
rog// permission sets, relying on the uniqueness of securityGroup17:15
rog// instances.17:15
niemeyerrog: Sounds great, thanks17:17
rogniemeyer: cool.17:28
rogwith those changes, does it LGTY?17:29
rogniemeyer: here's the other comment, BTW:17:29
rog// securityGroup holds a simulated ec2 security group.17:29
rog// Instances of securityGroup should only be created through17:29
rog// Server.newSecurityGroup to ensure that groups can be17:29
rog// compared by pointer value.17:29
rogniemeyer: PTAL17:35
niemeyerrog: // It returns nil if a security group with the given name already exists.17:43
niemeyerrog: Feels like a trap17:43
rogniemeyer: really?17:43
rogniemeyer: do you want it to return an error?17:44
rogniemeyer: i considered that, but i haven't got an error type yet17:44
rogniemeyer: and the caller needs to call fatalf with an appropriate error code17:44
rogniemeyer: i could just delete the newSecurityGroup function17:45
niemeyerrog: Yeah, an error sounds good.. or make it panic internally and delegate the responsibility of checking to the call site17:45
rogniemeyer: what about my last suggestion?17:46
rogniemeyer: just inline it inside CreateSecurityGroup17:46
niemeyerrog: The abstraction is fine.. just make it panic internally if the group already exists17:47
niemeyerrog: We just want to make the error visible rather than a hidden issue17:47
rogniemeyer: it won't be hidden for long... anything that tries to use the returned group will panic.17:47
rogniemeyer: but ok17:47
niemeyerrog: Not really. Returning a nil group means people could get away with setting a permKey.group to it, for instance.17:48
roggiven that it's only called in one place, i don't really see why the creation needs its own function.17:49
niemeyerrog: Sure, inline then..17:49
rogsomething in me slightly balks at doing two linear searches of the group list... (i know that's a ridiculous concern, too!)17:49
niemeyerrog: Keep a comment about the problem, please17:50
rogniemeyer: will do17:50
rogniemeyer: done.17:54
rogweird17:56
rogmy irc client appears to have logged me in twice17:57
rogah! that must've been carmen opening my laptop17:57
rogniemeyer, TheMue: i'm off now. have a good weekend.17:59
niemeyerrog: LGTM.. thanks17:59
TheMuerohave a good weekend too17:59
niemeyerrog: Have a good weekend17:59
rogniemeyer: thanks for the reviews.17:59
rogniemeyer: always nice to finish with a submit. thanks.18:00
niemeyerrog: Indeed! :-)18:00
niemeyerrog: and np18:00

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!