[00:01] <wallyworld_> thumper: i forwarded an email
[00:01] <wallyworld_> but you will need a previous one where the creds were sent
[00:01] <wallyworld_> i think you should have received both of those emails
[00:01] <thumper> probably
[00:02] <thumper> probably ignored them too
[00:02] <wallyworld_> lol
[00:02] <wallyworld_> so, i just source the creds and can then ssh in to that ip address
[00:02] <wallyworld_> then you can bzr pull the new rev as needed for whatever source tree
[00:03] <wallyworld_> the bot uses a cron job to wake up every minute
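[Editor's note: the once-a-minute wakeup described above would typically be a crontab entry along these lines. The script and log paths are hypothetical; the chat doesn't name them.]

```shell
# Hypothetical crontab entry for the tarmac landing bot:
# wake up every minute and try to merge approved branches.
* * * * * /home/tarmac/bin/run-tarmac >> /home/tarmac/logs/cron.log 2>&1
```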
[00:15] <davecheney> is the bot still alive
[00:15] <davecheney> has canonistack eaten that machine ?
[00:19] <thumper> davecheney: the bot is still there
[00:19] <thumper> it failed to merge my branch
[00:19] <thumper> because I use an updated crypto
[00:21] <thumper> lunchbreak means going to the supermarket with kids - always fun...
[01:01] <axw> hey thumper, wallyworld_ - welcome back
[01:02] <wallyworld_> hi. wish i could say it's good to be back :-)
[01:02] <wallyworld_> you were back last week?
[01:02] <axw> yup
[01:02] <wallyworld_> would have been quiet
[01:02] <axw> yeah it was
[01:03] <axw> fortunately my stuff took long enough that it didn't need reviews yet :)
[01:03] <wallyworld_> :-)
[01:04] <wallyworld_> soooo hot here today. may need to jump in the pool at some point to cool off. we had record breaking temperatures over the weekend. 44 degrees
[01:04] <axw> ouch
[01:04] <wallyworld_> almost as bad today
[01:04] <axw> gonna be 39 here
[01:04] <axw> I thought that was bad
[01:04] <thumper> it is 18°C here
[01:04] <wallyworld_> we get it worse cause the hot air blows from west to east
[01:05] <wallyworld_> wish it were 18 here
[01:05] <axw> I could do 18
[01:10] <axw> thumper: are you waiting for another review on the ssh stuff?
[01:10] <thumper> axw: nah, I think it is all good
[01:10] <axw> I intend to use your GenerateKey function in an MP of mine
[01:10] <thumper> just need to hack the bot
[01:10] <axw> cool
[01:11] <axw> bot's broken?
[01:13] <thumper> just dependencies
[01:16] <bigjools> wallyworld_: only record breaking temps if you believe the shite in the paper
[01:16] <wallyworld_> bigjools: you saying the paper makes up the recorded temps?
[01:17] <bigjools> it misquotes
[01:17]  * bigjools finds a reference
[01:17] <wallyworld_> you need to take off your tin foil hat
[01:17] <wallyworld_> it has cooked your brain with the heat
[01:17] <bigjools> not wearing one, just papers like to sell papers so they dramatise
[01:17] <bigjools> talking about clamitous weather is known to sell more
[01:18] <bigjools> calamitous even
[01:18] <wallyworld_> well the temp on sat was about 4, right?
[01:18] <wallyworld_> 44
[01:18] <bigjools> was 46 here
[01:18] <wallyworld_> so what's the issue?
[01:18] <wallyworld_> that's what the paper reported
[01:19] <bigjools> "record breaking"
[01:19] <wallyworld_> well it was
[01:19] <bigjools> nope
[01:19] <wallyworld_> when was it hotter
[01:19] <bigjools> ...
[01:19] <bigjools> when did measurements start?
[01:19] <wallyworld_> several years ago it was 41 or 42
[01:19] <wallyworld_> 1800s
[01:20] <wallyworld_> don't tell me you are going to be a pedant and say it may have been hotter 600000000000 years ago
[01:21] <bigjools> sigh
[01:24] <thumper> axw: I've logged into the bot and updated go.crypto
[01:24] <thumper> axw: attempting to land again
[01:30] <axw> thumper: thanks
[01:30] <axw> thumper: how come dependencies.tsv didn't do it?
[01:31] <axw> or did you not update that file?
[01:31] <thumper> axw: apparently the bot doesn't update based on that yet
[01:31] <thumper> so I'm told
[01:31] <axw> oh
[01:31] <thumper> I assumed it did
[01:31] <axw> just the LP recipe I guess
[01:37] <thumper> wallyworld_: where is the tarmac log on that machine?
[01:38] <wallyworld_> thumper: ~tarmac/logs i think
[01:39] <thumper> wow, doesn't log much useful
[01:39] <thumper> can see it is merging my code though
[01:39] <thumper> and done
[01:39] <wallyworld_> \o/
[04:40]  * thumper is done for the day
[04:40] <thumper> see y'all tomorrow
[08:15] <rogpeppe> mornin' all
[08:19] <TheMue> heya and happy new year
[08:35] <jam> hi rogpeppe and TheMue
[08:35] <rogpeppe> jam, TheMue: hiya
[08:36] <rogpeppe> and happy new year to you too
[10:02] <mgz> morning!
[10:04] <jam> morning mgz, just finishing up my 1:1 with rogpeppe, will chat with you in a sec
[10:04] <mgz> jam, sure
[10:12] <jam> mgz: I'm ready
[10:13] <mgz> jam: mumble?
[10:13] <jam> certainly, I'm there
[10:13] <mgz> almost
[10:13] <jam> well, I had been there, maybe it gets angry if you sit in an empty channel
[10:43] <jam> see you guys in the standup in a sec, just grabbing a coffee
[11:17] <jam> mgz: any idea how to recursively delete s3 buckets? The web interface requires you to empty buckets before you can delete them, but we have 50 "test" buckets that are left over
[11:25] <mgz> euca2ools should have something
[11:27] <mgz> hm, or maybe not, maybe you need the s3 command
[11:28] <jam> mgz: there is s3nukem http://robertlathanh.com/2010/07/s3nukem-delete-large-amazon-s3-buckets/
[11:29] <mgz> funny name
[11:29] <jam> mgz: adapted from s3nuke, but in a nice way :)
[11:32] <rogpeppe> natefinch: ha! it's another transient error - i retried and it succeeds after 2 seconds of returning that error...
[11:33] <rogpeppe> jam: s3cmd supports recursive removal
[11:34] <rogpeppe> jam: i seem to remember writing something to do concurrent recursive removal, but maybe i just piped the output of s3cmd into my concurrent xargs program
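[Editor's note: the s3cmd invocation rogpeppe refers to looks roughly like the following. The bucket name is a placeholder, and flags should be checked against your s3cmd version.]

```shell
# Recursively delete a bucket's contents, then remove the bucket itself.
# --force allows the non-interactive recursive delete.
s3cmd del --recursive --force s3://my-test-bucket
s3cmd rb s3://my-test-bucket
```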
[13:59] <rogpeppe> jam: should juju 1.17.0 status work correctly against a 1.16.3 environment?
[14:04] <mgz> rogpeppe: it should fall back to reading the db
[14:05] <rogpeppe> mgz: it doesn't appear to be working
[14:05] <mgz> rogpeppe: so, if the db is compatible (we make sure it is...) it should be okay?
[14:05] <mgz> rogpeppe: can you tell if it's trying the api and failing... then?
[14:05] <rogpeppe> mgz: i just got a report that showed this status: http://hastebin.com/hudipomama.txt
[14:06] <mgz> that pastebin is much better for my silly detached irc thing
[14:06] <mgz> shame it needs js...
[14:06] <mgz> well, that's pretty fascinating
[14:07] <rogpeppe> mgz: yeah, that's what i thought
[14:07] <mgz> I assume you don't really have lots of machines? I wonder what all the ds are
[14:07] <rogpeppe> mgz: that's not my status, it's from a real installation
[14:07] <rogpeppe> mgz: they were using juju-devel ppa, not juju stable
[14:07] <mgz> ah, so maybe the machines are semi-accurate
[14:08] <rogpeppe> mgz: i think so
[14:08] <mgz> but the state connection apparently didn't work at all
[14:08] <mgz> just the get machines from the ec2 api bit
[14:09] <mgz> when I hack-tested this, when we added the fallback, this all worked
[14:09] <rogpeppe> hmm
[14:09] <mgz> but of course, that's not what landed, and dimiter may well have not tested that, I know I didn't
[14:11] <mgz> rogpeppe: can you get them to file a bug?
[14:11] <rogpeppe> mgz: will do
[14:13] <rogpeppe> mgz: just to check: there should be no probs upgrading from 1.16.3 to 1.16.5, right?
[14:15] <mgz> I have not heard of any, and certainly that's what our general minor version promise is
[14:15] <mgz> the bump past 1.16.3 is rougher than normal though
[14:15] <mgz> we added various things that wouldn't normally make a minor release, but not anything that should mess with upgrades really
[14:20] <sinzui> looks like dependencies.tsv is invalid again http://162.213.35.54:8080/job/prepare-new-version/745/console
[14:23] <mgz> sinzui: it does indeed
[14:23] <mgz> I'll land a fix
[14:24] <mgz> blame is tim, r2182
[16:42] <rogpeppe> natefinch: i'm never seeing the instances get out of StartupState. looks like getStatus is returning the state in "state", not "myState".
[16:42] <rogpeppe> natefinch: it'd be good to have a test for that in the replicaset package
[16:44] <rogpeppe> natefinch: ah, it's actually as documented
[16:53] <natefinch> rogpeppe: which is documented?
[16:54] <rogpeppe> natefinch: the fact that the state is returned in "state", not "myState"
[16:54] <natefinch> rogpeppe: weird, wonder where I got "myState" from
[16:54] <rogpeppe> natefinch: i've fixed that, but run straight into another problem - that the two secondaries never move out of "UNKNOWN" state
[16:54] <rogpeppe> natefinch: you didn't read far enough down the page :-)
[16:55] <natefinch> rogpeppe: those documentation pages are pretty long at times ;)
[16:55] <rogpeppe> natefinch: myState is the field used for the state of the server you're talking to
[16:55] <rogpeppe> natefinch: it seems my secondaries are never managing to connect to the primary
[16:55] <natefinch> rogpeppe: hmm weird
[16:55] <rogpeppe> natefinch: it might be something to do with the localhost issue, i suppose
[16:56] <rogpeppe> natefinch: whatever that issue might be...
[16:56] <natefinch> rogpeppe: arg, possibly.  sigh.
[16:57] <natefinch> rogpeppe: I should have tested the mystate vs. state thing, but I had been intentionally trying not to test mongo itself, assuming that if I gave it commands it accepts, it would do the right thing.
[16:58] <rogpeppe> natefinch: it did :-)
[16:58] <rogpeppe> natefinch: unfortunately StartupState == 0
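[Editor's note: the mix-up above is easy to make. In replSetGetStatus output, `myState` is the top-level state of the server you queried, while each entry in `members` carries its own `state` field. And since STARTUP is state 0 (Go's zero value), decoding the wrong field silently looks like "still starting up". A minimal sketch of the shape, with struct and helper names invented for illustration; with mgo this document would be fetched via `session.Run("replSetGetStatus", &status)` and decoded through bson tags, with json tags standing in here.]

```go
package main

import "fmt"

// Replica-set member states as reported by replSetGetStatus.
// STARTUP is 0, which is also Go's zero value -- hence the
// confusion when the wrong field decodes to nothing.
const (
	StartupState   = 0
	PrimaryState   = 1
	SecondaryState = 2
)

// Member mirrors one entry of the "members" array: the per-member
// state lives in "state", not "myState".
type Member struct {
	Name  string `json:"name"`
	State int    `json:"state"`
}

// Status mirrors the top level of a replSetGetStatus reply.
// "myState" is the state of the server that answered the query.
type Status struct {
	MyState int      `json:"myState"`
	Members []Member `json:"members"`
}

// allHealthy reports whether every member is PRIMARY or SECONDARY.
func allHealthy(s Status) bool {
	if len(s.Members) == 0 {
		return false
	}
	for _, m := range s.Members {
		if m.State != PrimaryState && m.State != SecondaryState {
			return false
		}
	}
	return true
}

func main() {
	s := Status{
		MyState: PrimaryState,
		Members: []Member{
			{Name: "a:37017", State: PrimaryState},
			{Name: "b:37018", State: SecondaryState},
			{Name: "c:37019", State: StartupState},
		},
	}
	fmt.Println(allHealthy(s)) // prints false: third member still in STARTUP
}
```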
[17:00] <mramm2> rogpeppe: mgz: you guys around?
[17:00] <rogpeppe> mramm2: i am
[17:01] <mramm2> sabdfl is looking to setup a sprint at bluefin end of january
[17:01] <mramm2> we are going to have 5 MAAS clusters in a box ready for testing then
[17:02] <mramm2> and will be developing a set of demos that the sales engineers can use for deploying openstack
[17:02] <mramm2> and then solutions on top of openstack using those maas clusters
[17:02] <mramm2> and sabdfl would like juju core representation
[17:03] <rogpeppe> mramm2: bluefin?
[17:03] <rogpeppe> mramm: bluefin?
[17:03] <mgz> mramm: I'm here too
[17:03] <mramm> 12:01 mramm has left IRC (Ping timeout: 240 seconds)
[17:03] <mramm> 12:01 mramm2: we are going to have 5 MAAS clusters in a box ready for testing then
[17:03] <mramm> 12:02 mramm2: and will be developing a set of demos that the sales engineers can use for deploying openstack
[17:03] <mramm> 12:02 mramm2: and then solutions on top of openstack using those maas clusters
[17:03] <mramm> 12:02 mramm2: and sabdfl would like juju core representation
[17:03] <mramm> yep
[17:03] <mramm> bluefin
[17:04] <rogpeppe> mramm: sorry, what's bluefin?
[17:04] <mgz> representation at bluefin, I assume
[17:04] <mgz> rather than in a hangout or something
[17:04] <mramm> yep
[17:04] <rogpeppe> bluefin's a company?
[17:04] <sinzui> rogpeppe, our head office
[17:04] <mramm> it's the location of our offices
[17:04] <mgz> rogpeppe has first refusal, given I've been down recently, but it's an easy trip for me
[17:04] <rogpeppe> ah!
[17:05] <mgz> rogpeppe: worth going if you've not been before
[17:05] <mramm> rogpeppe is my preferred choice too
[17:05] <rogpeppe> i'd be happy to do it
[17:05] <mramm> unless you both want to go ;)
[17:05] <rogpeppe> but mgz has lots more MAAS experience than me
[17:05] <mramm> rogpeppe: I'll add you to the list
[17:06] <mramm> rogpeppe: that is true
[17:06] <rogpeppe> (i have never used MAAS)
[17:06] <mgz> nah, I have some specific maas knowledge that's probably not relevant here
[17:06] <mramm> but we should have a MAAS team member there too
[17:07] <rogpeppe> my openstack-fu is similarly limited
[17:07] <rogpeppe> but i'm fine in juju-core-land :-)
[17:08] <rogpeppe> mramm: do you know the actual dates?
[17:14] <rogpeppe> i'd quite like the opportunity to spend a bit more time around openstack stuff, actually
[17:21] <rogpeppe> natefinch: found the problem with the UNKNOWN thing - it takes 10s before the secondaries connect
[17:21] <rogpeppe> natefinch: my problem now is that the secondaries never seem to get out of STARTUP2STATE
[17:21] <natefinch> rogpeppe: ahh hmm ok.
[17:22] <natefinch> rogpeppe: we're not exactly speeding up the tests here ;)
[17:22] <rogpeppe> natefinch: indeed not
[17:26] <rogpeppe> natefinch: ah, found the problem, i think
[17:26] <rogpeppe> 2014/01/06 17:26:03 mongod:57236: Mon Jan  6 17:26:03.817 [conn4] key file must be used to log in with internal user
[17:28] <natefinch> rogpeppe: hmm... I didn't think you needed any authentication if you're connecting from localhost
[17:28] <rogpeppe> natefinch: i think it must be different for peers
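[Editor's note: the log line above matches mongod's rule that when auth is enabled, replica-set members must authenticate to each other with a shared key file; the localhost exemption covers clients, not peers. A rough sketch of the standard recipe; the paths and replica-set name are placeholders.]

```shell
# Generate a shared key and start each replica-set member with it.
# Every member of the set must use the same key file, mode 600.
openssl rand -base64 741 > /var/lib/juju/shared-secret
chmod 600 /var/lib/juju/shared-secret
mongod --replSet juju --auth --keyFile /var/lib/juju/shared-secret \
    --port 37017 --dbpath /var/lib/juju/db
```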
[17:36] <rogpeppe> natefinch: bingo!
[17:36] <natefinch> rogpeppe: what did you find?
[17:37] <rogpeppe> natefinch: that was the reason, and now that i've done that, i've finally seen a  [PRIMARY SECONDARY SECONDARY] status...
[17:37] <natefinch> well good
[17:39] <rogpeppe> natefinch: it took about 30s from starting to wait for sync, to fully synced (that's with an empty db), and about 42 seconds from no-servers-running
[17:39] <natefinch> rogpeppe: ug
[17:39] <rogpeppe> natefinch: this is not the fastest sync in the world :-)
[17:40] <natefinch> rogpeppe: 30s to sync no data is pretty bad ;)
[17:40] <natefinch> rogpeppe: on localhost no less
[17:40] <rogpeppe> natefinch: for the record, here's the log of what was going on during that time: http://paste.ubuntu.com/6704366/
[17:43] <natefinch> rogpeppe: wonder if this has anything to do with it: "noprealloc may hurt performance in many applications"  honestly, I wouldn't think anything could hurt performance that badly with empty DBs on localhost
[17:44] <rogpeppe> natefinch: i agree - i don't think noprealloc could harm anything here
[17:44] <rogpeppe> natefinch: no data on an SSD
[17:53] <natefinch> rogpeppe: does it matter if it's an SSD if no data is there?   (there's a profound question ;)
[17:54] <rogpeppe> natefinch: well, it might be doing *something* in that 30 seconds....
[17:54] <rogpeppe> natefinch: i'm thinking that the 30s might be due to the 10s heartbeat
[17:54] <rogpeppe> natefinch: 3 heartbeats for the status to propagate around the system
[18:20] <rogpeppe> niemeyer: happy new year!
[18:21] <rogpeppe> niemeyer: i'd like to confirm something about mgo, if you have a moment at some point
[18:22] <niemeyer> rogpeppe: Heya
[18:22] <niemeyer> rogpeppe: Thanks, and happy new year :)
[18:22] <niemeyer> rogpeppe: Shoot
[18:23] <rogpeppe> niemeyer: if i've performed an operation on a Session, is it possible that the session changes to use a different server at some point in the future (without doing anything explicit, such as SetMode) ?
[18:23] <rogpeppe> niemeyer: from my current experiments, it seems like that doesn't happen, but i'd like to be sure
[18:23] <niemeyer> rogpeppe: Depends on the current mode of the session.. if on Strong mode, no
[18:24] <niemeyer> (Strong is the default, FWIW)
[18:25] <rogpeppe> niemeyer: so if it's in Monotonic, it may switch without any explicit intervention?
[18:26] <niemeyer> rogpeppe: If it's Monotonic, it will switch deterministically on writes
[18:26] <rogpeppe> niemeyer: currently, i seem to get an EOF error even in Monotonic mode, but perhaps that might resolve itself if I keep trying
[18:26] <niemeyer> rogpeppe: From a slave to the master
[18:26] <niemeyer> rogpeppe: (that's the whole point of Monotonic)
[18:26] <rogpeppe> niemeyer: ah, but not if a primary goes down and is replaced by another one?
[18:27] <niemeyer> rogpeppe: If the primary that went down was the single server alive, then surely it'll break as suggested
[18:27] <niemeyer> rogpeppe: After it fails, it will not switch
[18:27] <rogpeppe> niemeyer: in this case there were two secondaries
[18:27] <niemeyer> rogpeppe: Or even reconnected without an ack from the code
[18:27] <niemeyer> reconnect
[18:27] <rogpeppe> niemeyer: but that's good to know
[18:28] <niemeyer> rogpeppe: In that case the behavior depends on whether the session was already hooked to the master
[18:28] <rogpeppe> niemeyer: my current plan is to make the single-instance workers (provisioner, firewaller, etc) move with the mongo master
[18:28] <niemeyer> rogpeppe: If the session was hooked to the mater, and the connection drops due to a server shutdown, you'll get an EOF
[18:28] <rogpeppe> niemeyer: that's the behaviour I want to rely on
[18:29] <niemeyer> rogpeppe: That EOF will not go away until: a) THe session is Closed, discarded, and re-created b) The session is Refresh'ed
[18:29] <niemeyer> rogpeppe: You want to rely on the fact the error doesn't go away?
[18:29] <rogpeppe> niemeyer: if that happens, we'll redial and run the workers if we are on the same machine as the mongo primary
[18:29] <niemeyer> rogpeppe:  You don't have to redial.. just Refresh the session at the control point
[18:30] <rogpeppe> niemeyer: i want to rely on the fact that if the primary gets reelected, that we're guaranteed to get an error so that we know for sure that we've not got two "single-instance" workers running at the same time
[18:31] <rogpeppe> niemeyer: that's useful to know, thanks
[18:31] <niemeyer> rogpeppe: You'll not get an error if a primary gets re-elected and the session was not hooked to it
[18:31] <rogpeppe> niemeyer: for those workers' connection to the State, i'd use a direct connection in strong mode
[18:31] <niemeyer> rogpeppe: E.g. a Monotonic session that got no writes will be hooked to a slave, assuming one is available
[18:32] <niemeyer> rogpeppe: That session has no reason to error out if the primary shifts
[18:32] <rogpeppe> niemeyer: it will get an error though, right?
[18:33] <niemeyer> rogpeppe: Hmm.. these two last sentences contradict each other.. :)
[18:33] <rogpeppe> niemeyer: so it'll be difficult to avoid (say) an API call failing because the primary shifts
[18:33] <niemeyer> rogpeppe: It will NOT error out if the primary shifts, because it was connected to a slave.. it doesn't care about the fact the primary went through trouble
[18:33] <rogpeppe> niemeyer: ah, but if it happens to be connected to the primary, then it will error out, right?
[18:33] <niemeyer> rogpeppe: It really depends on how you're structuring the code.. let me give an example
[18:34] <rogpeppe> niemeyer: so i guess it's down to chance
[18:34] <niemeyer> rogpeppe: Let's say you have an http server that creates a new session to handle each request
[18:34] <niemeyer> rogpeppe: Now, the primary just broke down, but there were no requests being served
[18:34] <niemeyer> rogpeppe: Then, the primary gets re-elected..
[18:35] <niemeyer> rogpeppe: The driver picks up the new master, resyncs, and keeps things up
[18:35] <niemeyer> rogpeppe: Finally, the http server gets a new request to handle
[18:35] <niemeyer> rogpeppe: and runs Copy on a master session, creating a new session to handle this specific request
[18:35] <niemeyer> rogpeppe: This new request will not error out..
[18:36] <niemeyer> rogpeppe: As it was a fresh session to a working server
[18:36] <rogpeppe> niemeyer: ok
[18:36] <niemeyer> rogpeppe: It's not down to chance, any more than a TCP connection breaking due to a temporary outage on the packet path is down to chance
[18:36] <rogpeppe> niemeyer: (that doesn't apply easily to our case, because we've got very long-lived http requests)
[18:37] <niemeyer> rogpeppe: If there is a long-living session that is active (was used, not refreshed), the error will surface
[18:37] <niemeyer> rogpeppe: It doesn't pretend that things are sane when they're not
[18:38] <rogpeppe> niemeyer: ok
[18:38] <niemeyer> rogpeppe: No need for the direct mode for that, btw
[18:40] <rogpeppe> niemeyer: apart from the fact that Refresh presumably uses more up-to-date info for the server addresses, is there a significant difference between using Dial and using Refresh?
[18:41] <niemeyer> rogpeppe: Oh yeah
[18:41] <niemeyer> rogpeppe: Dial creates an entire new cluster
[18:41] <niemeyer> rogpeppe: Syncing up with every server of the replica set to figure their state and sanity
[18:41] <niemeyer> rogpeppe: Refresh is very lightweight
[18:41] <rogpeppe> niemeyer: ok, that's good to know
[18:42] <niemeyer> rogpeppe: It just cleans the session state and moves sockets back to the pool, assuming they're in a sane state
[18:42] <rogpeppe> niemeyer: can I rely on the fact that err==io.EOF implies that the server has gone down and a Refresh could clean it up?
[18:42] <niemeyer> rogpeppe: No.. EOF means socket-level EOF
[18:42] <niemeyer> rogpeppe: Whatever the reason why that happened
[18:43] <rogpeppe> niemeyer: ah, ok. is there a decent way of telling that the server has gone away and i need to refresh?
[18:43] <niemeyer> rogpeppe: But you can always refresh on errors
[18:43] <niemeyer> rogpeppe: and once/if the server works again, the new session will behave properly again
[18:43] <niemeyer> rogpeppe: IOW, no reason to reconnect
[18:43] <niemeyer> s/reconnect/redial/
[18:44] <niemeyer> rogpeppe: In fact, you can always refresh at the control point
[18:44] <niemeyer> rogpeppe: With errors or not
[18:44] <rogpeppe> niemeyer: after every API call?
[18:45] <niemeyer> rogpeppe: No.. by control point I mean the place where the code falls back to on every iteration
[18:45] <niemeyer> rogpeppe: For example, if there was a loop, it might be refreshed on every iteration of the loop to get rid of already acknowledged errors
[18:45] <niemeyer> rogpeppe: If it's an http request, it might refresh before every Accept (although that doesn't quite fit, since there would be multiple requests on any realistic server)
[18:46] <niemeyer> rogpeppe: Rarely in an application it would be okay to assume errors didn't happen, mid-logic
[18:46] <rogpeppe> niemeyer: i'm not quite sure how that would map to our code structure
[18:47] <rogpeppe> niemeyer: the main loop is in the rpc package, reading requests from the websocket and invoking methods
[18:48] <niemeyer> rogpeppe: What happens if an error happens in the middle of a request from the client being handled?
[18:48] <rogpeppe> niemeyer: the error gets returned to the client
[18:48] <niemeyer> rogpeppe: Okay, and I assume each client has its own session to the server?
[18:49] <rogpeppe> niemeyer: yes
[18:49] <niemeyer> rogpeppe: Okay, so you might Refresh before starting to handle a new RPC, for example
[18:49] <rogpeppe> niemeyer: it's a bit of a bad spot for us currently - we don't ever redial or refresh the connection for API requests
[18:49] <rogpeppe> niemeyer: ok, so Refresh really is that (<0.1ms) lightweight?
[18:50] <niemeyer> rogpeppe: Yeah
[18:50] <niemeyer> rogpeppe: It's literally clean the session and put socket back in the pool
[18:50] <niemeyer> rogpeppe: mem-ops only
[18:51] <rogpeppe> niemeyer: cool
[18:51] <rogpeppe> niemeyer: so it doesn't matter if lots of ops call it concurrently whenever they like
[18:51] <niemeyer> rogpeppe: Yep, no problem
[18:52] <niemeyer> rogpeppe: Note that if you have multiple goroutines using a single session in a way that they might potentially call Refresh concurrently, that's not quite okay
[18:52] <rogpeppe> niemeyer: oh
[18:52] <niemeyer> rogpeppe: As they might be cleaning up errors from each other that ought to be acted upon
[18:52] <niemeyer> rogpeppe: That's why I asked whether each client has its own session
[18:53] <rogpeppe> niemeyer: (that's definitely the case for us, as *state.State holds a single *mgo.Session)
[18:53] <rogpeppe> niemeyer: ah, i misunderstood session there
[18:53] <rogpeppe> niemeyer: even if each client had its own session, it still wouldn't be ok, as one client can have many concurrent calls
[18:54] <niemeyer> rogpeppe: It's fine to use sessions concurrently, but if there's a fatal error in a session that is being concurrently used, the correct behavior is to let all goroutines see such error, as whatever they were doing got interrupted
[18:54] <rogpeppe> niemeyer: i'm not sure i quite see how one Refresh might "clean up" an error from another
[18:55] <rogpeppe> niemeyer: won't whatever they were doing be locked to a particular socket (during that operation), and therefore draw the correct error anyway?
[18:55] <niemeyer> rogpeppe: If such a concurrently used session gets Refresh calls while other goroutines are using it, such a fatal error may be incorrectly swept under the carpet
[18:56] <niemeyer> rogpeppe: I don't get your previous points
[18:56] <niemeyer> rogpeppe: If a single session is used concurrently, it's a single session
[18:56] <niemeyer> rogpeppe: If that single session sees a fatal error in its underlying socket, it's the same socket (assuming Strong or Monotonic modes)
[18:56] <niemeyer> rogpeppe: for all code using it
[18:58] <rogpeppe> niemeyer: don't operations do Session.acquireSocket before doing something?
[18:59] <niemeyer> rogpeppe: That in itself is irrelevant
[18:59] <niemeyer> rogpeppe: A session gets a socket associated with it
[18:59] <niemeyer> rogpeppe: Assuming Strong or Monotonic mode
[19:00] <niemeyer> rogpeppe: There's no magic.. if such a session gets an error while it has an associated socket, and it's being concurrently used, all the concurrent users should observe the fault
[19:00] <rogpeppe> niemeyer: could you expand on why we want that to be the case?
[19:02] <niemeyer> rogpeppe: Sure
[19:02] <niemeyer> rogpeppe: Imagine the perspective of any of the concurrent users of the session
[19:02] <niemeyer> rogpeppe: When a session is re-established, it won't necessarily be to the same server
[19:03] <rogpeppe> niemeyer: ah, so for Monotonic mode, we might suddenly skip events in the log?
[19:03] <niemeyer> rogpeppe: and depending on the mode, it may even walk history backwards (master socket becoming slave socket)
[19:03] <rogpeppe> niemeyer: that doesn't apply in Strong mode, presumably?
[19:04] <rogpeppe> niemeyer: ah, i guess it can
[19:04] <rogpeppe> niemeyer: unless Wmode==Majority
[19:04] <niemeyer> rogpeppe: That's irrelevant
[19:05] <niemeyer> rogpeppe: This just means a writer won't consider the write successful until it reaches a majority
[19:05] <niemeyer> rogpeppe: This point in time can easily happen without a particular slave reader observing the data
[19:06] <rogpeppe> niemeyer: is it possible that one of the ones that wasn't part of that majority is elected primary?
[19:06] <rogpeppe> niemeyer: (i thought the election process took into account up-to-dateness)
[19:06] <niemeyer> rogpeppe: Again, that's a separate problem
[19:06] <niemeyer> rogpeppe: We're talking about someone that is actively *reading from a slave*
[19:06] <niemeyer> rogpeppe: It doesn't matter who the master is
[19:06] <rogpeppe> niemeyer: i thought that couldn't happen in Strong mode
 rogpeppe: and depending on the mode, it may even walk history backwards (master socket becoming slave socket)
[19:07] <niemeyer> rogpeppe: I did say "depending on the mod"
[19:07] <niemeyer> mode
[19:07] <rogpeppe> [19:03:56] <rogpeppe> niemeyer: that doesn't apply in Strong mode, presumably?
[19:07] <niemeyer> rogpeppe: Ah, sorry, okay
[19:07] <niemeyer> rogpeppe: It can *still* happen
[19:08] <niemeyer> rogpeppe: If a failure happens before the majority was reached
[19:08] <rogpeppe> niemeyer: ah
[19:09] <niemeyer> rogpeppe: Of course, we're getting into very unlikely events
[19:09] <rogpeppe> niemeyer: indeed, but it's nice to have some idea of these things
[19:09] <niemeyer> rogpeppe: As the re-election will still attempt to elect the most up-to-date server, despite majority concerns
[19:11] <rogpeppe> niemeyer: i wonder if a reasonable approach might be to Copy the session for each RPC call
[19:11] <niemeyer> rogpeppe: That sounds fine
[19:11] <niemeyer> rogpeppe: Easier to reason about too
[19:11] <rogpeppe> niemeyer: presumably there's no need to Refresh such a copy?
[19:11] <niemeyer> rogpeppe: No, as it's being Closed after the RPC is handled
[19:12] <niemeyer> rogpeppe: The session is buried, and the (good) resources go back to the pool
[19:14] <rogpeppe> niemeyer: ok, that's useful, thanks
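[Editor's note: the copy-per-RPC plan agreed above maps onto mgo's Session.Copy/Close pair: each call gets its own session copied from a long-lived root, so an error in one call never poisons another, and Close returns healthy sockets to the shared pool. This sketch uses a stand-in type instead of *mgo.Session so it runs without a server; the pool counter exists only to make the checkout/return visible.]

```go
package main

import "fmt"

// rpcSession stands in for *mgo.Session; Copy and Close play the
// same roles they do in mgo. The shared counter tracks how many
// sessions are currently checked out, purely for illustration.
type rpcSession struct {
	pool *int
}

func (s *rpcSession) Copy() *rpcSession { *s.pool++; return &rpcSession{pool: s.pool} }
func (s *rpcSession) Close()            { *s.pool-- }

// handleCall gives each RPC its own copied session and guarantees
// the copy is closed (resources returned) when the call finishes.
func handleCall(root *rpcSession, call func(*rpcSession) error) error {
	s := root.Copy() // fresh session: healthy socket, clean error state
	defer s.Close()  // buries the session; good sockets go back to the pool
	return call(s)
}

func main() {
	n := 0
	root := &rpcSession{pool: &n}
	handleCall(root, func(s *rpcSession) error {
		fmt.Println("during call, checked out:", n) // prints 1
		return nil
	})
	fmt.Println("after call, checked out:", n) // prints 0
}
```

As niemeyer notes, a copy that is Closed after the call needs no Refresh of its own.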
[19:15] <niemeyer> rogpeppe: My pleasure
[19:15] <niemeyer> rogpeppe: I'll head off to a coffee break
[19:15] <rogpeppe> niemeyer: i shall head to supper
[19:15] <niemeyer> Still on holiday this week, btw
[19:15] <niemeyer> Should send a note
[19:16] <rogpeppe> niemeyer: ah, well, thanks a lot for the chat!
[19:16] <niemeyer> rogpeppe: Glad to chat
[19:28] <rogpeppe> niemeyer: FYI I tried Refreshing the session after an error, but I can't get it to work - it seems to try connecting to the old primary and never tries the old secondaries
[19:29] <rogpeppe> niemeyer: (when i was redialling with all the peer addresses, it worked)
[21:31] <thumper> o/