[00:16] <thumper> o/ wallyworld_
[00:16] <wallyworld_> hello
[01:00] <bigjools> halp: juju bootstrap says "environment already bootstrapped", juju status repeats: "ERROR TLS handshake failed: EOF" ad infinitum.
[01:01] <bigjools> I see no running machines in my env with other tools
[01:01] <bigjools> 1.16.5-saucy-amd64
[01:05] <thumper> ?!
[01:05] <thumper> bigjools: run destroy environment
[01:05] <bigjools> thumper: no :)
[01:06] <thumper> no?
[01:06] <bigjools> I want to keep it - it's doing this on two envs
[01:06] <bigjools> I can SSH into bootstrap node
[01:06] <bigjools> the other env which has none I could destroy
[01:06] <thumper> maas?
[01:06] <bigjools> canonistack
[01:08] <bigjools> so sorry let me be clearer.  The one env is genuinely empty so I've destroy-env'd it now.  The other is in use but juju status can't talk to it.
[01:09] <thumper> bigjools: if you ssh into the bootstrap node of the machine that is running
[01:09] <thumper> can you see if the machine agent is running?
[01:09] <thumper> axw: hey there, where was that method that I need to implement for the local provider to get the addresses working properly?
[01:10] <axw> thumper: containers/lxc/instance.go
[01:10] <thumper> kk
[01:10] <axw> method Addresses
[01:10] <thumper> ta
[01:10] <axw> np
[01:12] <axw> thumper: I'm glad you moved RemoteResponse, because I was going to have to do it otherwise. It was causing a circular import from utils/ssh->cmd->environs/config->util/ssh
[01:12] <thumper> :)
[01:12] <axw> which is why I'm going to have to revert my change of using JujuHomePath
[01:12] <thumper> should be landing now
[01:12] <axw> cool
[01:12] <thumper> had one intermittent failure landing so far
[01:13] <bigjools> thumper: will check shortly, on a call
[01:13] <axw> they've been quite frequent lately :(
[01:13] <thumper> bigjools: kk
[01:13] <thumper> axw: yes they have
[01:13] <thumper> axw: seems like a race condition somewhere
[01:13] <thumper> axw: any idea how to track it down?
[01:13] <axw> I'm sure there are multiple
[01:14] <axw> -race may help, will likely take some days of sifting I'd say
[01:18] <thumper> axw: the address updater only runs on a machine that has the job ManageEnviron
[01:18] <thumper> axw: this isn't sufficient
[01:18] <thumper> for containers
[01:18] <axw> oh :(
[01:18] <axw> maybe we should change that?
[01:18] <thumper> we need to have something running on every machine
[01:18] <axw> that's what I thought it did
[01:18] <thumper> nope
[01:19] <axw> thumper: ah, it assumes that the addresses are observable externally
[01:19] <thumper> axw: also it is a state worker
[01:19] <axw> hrm
[01:19] <thumper> so not over the api
[01:20] <sinzui> axw, did you see this behaviour yesterday: https://bugs.launchpad.net/juju-core/+bug/1269120
[01:20] <_mup_> Bug #1269120: win client bootstrap fails because it uses private ip <bootstrap> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1269120>
[01:21] <axw> sinzui: I did not
[01:21] <axw> it should be checking them all
[01:21] <sinzui> axw, then we can say the good news is the basic test I put together for CI is valuable :)
[01:21] <thumper> check versions
[01:22] <axw> sinzui: where's the rest of the log?
[01:22] <axw> is there an error?
[01:23] <sinzui> I don't have any more. The test terminates the machine before getting the logs. Well I cannot because of the IP issue. I can get more information tomorrow when I add the test to CI
[01:24] <axw> sinzui: I mean the juju bootstrap stdout/stderr - is that all there was?
[01:24] <sinzui> axw, actually, i can get the debug output in 30 minutes

[01:24] <sinzui> axw, that was all there was from --show-log
[01:24] <axw> mk
[01:25] <axw> sinzui: does your windows box have openssh on it?
[01:25] <axw> sinzui: because it's actually expected not to work at the moment :)
[01:25] <axw> I haven't submitted my fix yet
[01:25] <axw> I'm surprised that there's no error there though
[01:26] <sinzui> axw. as a matter of fact, it does have openssh installed for the benefit of CI. The behaviour is the same from powershell though
[01:27] <axw> sinzui: ok. I think it might be a good idea to exclude openssh from %PATH% for the tests, for a more standard user setup. may be worth having both, I suppose
[01:28] <sinzui> axw, I know little about Windows... thank you for the recommendation
[01:28] <thumper> wallyworld_: here is part two of the work you reviewed this morning -  https://codereview.appspot.com/52470044
[01:28] <sinzui> axw, I learned today that I need to restart sshd each time I change the rules of user envs :)
[01:28] <wallyworld_> ok, looking real soon
[01:28] <axw> sinzui: nps. some people may have openssh/cygwin installed, but it's typical for people to use PuTTY on Windows
[01:29]  * sinzui has 19.5 years of  linux development experience, but only 2 on windows
[01:29] <axw> I was unfortunate to work across something like 5 OSes in my last job :)
[01:30] <axw> the worst of all worlds
[01:32] <axw> sinzui: I think it's probably not worth investigating further until my changes land
[01:32] <axw> theoretically there should be no change if openssh is there
[01:32] <axw> but theoretically it should work just as it does on Linux if it is there
[01:35] <sinzui> axw, how long will juju wait during a bootstrap before realising that it has failed?
[01:36] <axw> sinzui: 10 mins
[01:40] <thumper> waigani_: around?
[01:41] <waigani_> thumper: hello :)
[01:50] <sinzui> wallyworld_, did r2201 change the behaviour? CI can no longer create simplestreams, nor can we publish new releases. This is the command that did not generate files: http://pastebin.ubuntu.com/6753792/
[01:51] <wallyworld_> looking
[01:52] <wallyworld_> sinzui: it was not meant to change any behaviour. if it did i suck. i'll look at the command locally and see what's happening
[01:52] <wallyworld_> it could be there's a code path there that doesn't generate metadata like it should
[01:52] <wallyworld_> sync-tools needs tools tarballs and metadata now
[01:53] <wallyworld_> since when it falls back to streams.canonical.com, it needs to use simplestreams to get the tools
[01:53] <wallyworld_> hence it needs to use the same method for local tools as well
[01:54] <wallyworld_> so if the source directory is missing metadata, you could generate it using juju metadata generate-tools
[01:55] <wallyworld_> iow, just having a directory with tools tarballs is not sufficient, you also need simplestreams metadata
[01:56] <wallyworld_> i can look at building that logic into the sync tools command if the source is local
[01:56] <wallyworld_> does that rambling make sense?
[01:58] <wallyworld_> as a short term fix, try $JUJU_EXEC metadata generate-tools -d $SOURCE
[02:00] <bigjools> thumper: sigh.  I *was* sshed into the bootstrap node and my session froze.  Now, all the canonistack instances have vanished.  FFS.
[02:00] <thumper> bigjools: :(
[02:01] <bigjools> I now have to figure out how to redeploy my whole setup
[02:10] <sinzui> wallyworld_, Since this is the process that makes data for streams.canonical.com, I think we need to land an immediate fix to call metadata generate-tools. Do we not need to call sync-tools for 1.17.1 now?
[02:12] <wallyworld_> sinzui: all you need to do to get metadata ready for streams.canonical.com is to use the generate metadata command. the sync-tools is more intended for folks to grab tools from streams.c.c so they can upload to their own cloud
[02:13] <wallyworld_> so if you have tools tarballs, juju metadata generate-tools will produce the json ready to upload
[02:13] <sinzui> wallyworld_, but we also need to create the directory structure (tools/releases/*.tgz)? or does metadata do that too
[02:14] <sinzui> wallyworld_, we don't use sync-tools to deliver to the clouds. we are using it to put the tools and metadata in a directory that we sync to various clouds
[02:14] <wallyworld_> generate metadata assumes the tools are in a <dir>/tools/releases and will put the metadata in <dir>/tools/streams
[02:14] <wallyworld_> ah i see
[02:14] <wallyworld_> but sync-tools needs the tarballs
[02:15] <wallyworld_> so put the tarballs in a tools/releases dir and run generate-metadata and then you will have a dir structure ready to upload
[02:16] <wallyworld_> since you will end up with <dir>/tools/releases and <dir>/tools/streams
[02:16] <sinzui> wallyworld_, we make the tarballs and place them in a temp dir. We can make the dir structure, cp them to the releases dir, then run metadata to make the json... then carry on with signing.
[02:17] <wallyworld_> that sounds ok. sorry about the change in behaviour, i didn't realise you guys were using sync-tools like that
[02:17] <wallyworld_> sadly i had to change it to fix the other XML error issue
[02:18] <wallyworld_> since the code used to rely on being able to list the file contents of a url
[02:18] <wallyworld_> it worked for s3 buckets but not for an arbitrary http url
[02:18] <sinzui> wallyworld_, I am glad to stop calling sync-tools. This is the script that is called after we make the package and before we publish to all the CPCs http://bazaar.launchpad.net/~juju-qa/juju-core/ci-cd-scripts2/view/head:/assemble-public-tools.bash
[02:18] <sinzui> I think I can get this sorted out quickly
[02:19] <wallyworld_> yeah, generate_streams() will be a lot more logical if it can just call a command to generate streams :-)
[08:58] <rogpeppe> fwereade: well done for finding the missing Close. Not sure how we all missed that for so long.
[08:58] <fwereade> rogpeppe, cheers
[08:58] <rogpeppe> fwereade: what i don't understand though, is why it only failed *some* of the time
[08:59] <fwereade> rogpeppe, I agree that's not clear -- the fact that removing the SetAdminMongoPassword helped is interesting though
[09:00] <fwereade> rogpeppe, hazmat had a patch that removed that, that apparently worked for him
[09:01] <rogpeppe> fwereade: yeah, we worked that out together
[09:01] <rogpeppe> fwereade: completely missing the missing Close :-)
[09:01] <fwereade> rogpeppe, yeah, those things can hide sometimes
[09:24] <rogpeppe> fwereade: totally trivial review? https://codereview.appspot.com/52460046
[09:24] <rogpeppe> or anyone else?
[09:24] <fwereade> rogpeppe, LGTM
[09:25] <rogpeppe> fwereade: ta
[09:27] <rogpeppe> fwereade: please merge https://code.launchpad.net/~fwereade/juju-core/fix-unclosed-conn-test/+merge/201723 - i wanna use it!
[09:44] <jam> fwereade: given it is a Close method, should we be using "gc.Check(err, gc.IsNil)" ?
[09:44] <jam> that way we can continue cleaning up even if one of the many Close calls fails?
[09:45] <jam> Assert will stop there, and fail to close the rest of the resources
[09:45] <fwereade> jam, other defers will still run, won't they?
[09:46] <jam> fwereade: defer I think will run? I'm not really sure. But you did change a "conn.Close(); conn.Close(); conn.Close()" section. I guess that is the same object, so it doesn't matter
[09:46] <jam> and all the rest appear to be in defer
[09:46] <fwereade> jam, well it's purportedly testing that multi closes work
[09:46] <jam> good enough, then
[10:37] <dimitern> rogpeppe, jam, re https://codereview.appspot.com/52050043 I mentioned in the description that updating api server addresses after connecting seems out of scope for this CL
[10:37] <dimitern> rogpeppe, jam, it gets us more than we had before - cached API endpoints, which speed up the CLI, which is already a big win IMO
[10:38] <dimitern> rogpeppe, jam, but the actual updating can come as a follow-up, can't it?
[10:38] <rogpeppe> dimitern: yes, i agree, as i said in my review ISTR
[10:38] <jam> dimitern: so I don't think the actual updating is going to look like what you've written, is my concern, which means redoing it
[10:38] <jam> I like the from-config stuff, as that is not likely to change a lot
[10:39] <rogpeppe> dimitern: the other thing that we should do is make sure that bootstrap saves the cached address
[10:39] <dimitern> jam, i'm changing the CL now to accommodate rogpeppe's parallels.Try logic and will repropose shortly
[10:39] <jam> dimitern: your structure requires us to have the updated addresses before we return from api.Open, but that seems unfortunate to delay waiting for another round trip
[10:39] <dimitern> jam, i'm not really following you there - why before api.Open?
[10:40] <rogpeppe> jam: i think this is at least an improvement
[10:40] <jam> dimitern: because at the end of api.Open you call SetAPIEndpoints immediately
[10:40] <rogpeppe> jam: as it caches the address we get from Environ.StateInfo
[10:40] <dimitern> rogpeppe, yes exactly
[10:40] <jam> dimitern: my suggestion is just not to do it from the api-from-environ case and only the api-from-config case (or whatever the exact names are)
[10:41] <jam> because the from-environ isn't giving us anything, so just pass nil to be clear that we don't have any new information
[10:41] <dimitern> jam, anyway i'd like you to take a look after i propose again, i'm testing live with EC2 now
[10:41] <dimitern> jam, api-from-environ is the same as api-from-config
[10:42] <jam> dimitern: my point is, there are 2 code paths, one returns the stuff it just read, the other goes to the Environ and pulls out info from state info and looks it up
[10:42] <jam> the latter should be cached
[10:42] <jam> the former already is
[10:42] <dimitern> jam, got you
[10:44] <dimitern> rogpeppe, problem is, with the new code I can't seem to be able to distinguish between "info connection failed, but config succeeded" and "both failed"
[10:45] <jam> dimitern: if you can't actually connect, I don't think we should cache, should we?
[10:48] <jam> mgz: poke for standup
[10:54] <mgz> jam: there seems to be no one there...
[10:54] <natefinch> mgz: may need to pop out and back in
[10:54] <natefinch> mgz: I had similar problem at first
[10:55] <mgz> well, this is annoying
[11:37] <dimitern> rogpeppe, jam, I'd appreciate a second look at https://codereview.appspot.com/52050043/
[11:37] <rogpeppe> dimitern: will do
[12:43] <rogpeppe> dimitern: i still don't see any new tests
[12:45] <dimitern> rogpeppe, I need your help for that I think
[12:45] <rogpeppe> dimitern: ok
[12:46] <dimitern> rogpeppe, it's tested live, but I have trouble figuring out how to set up the tests for the new functionality
[12:46] <dimitern> rogpeppe, generally, we need to test that cached info gets used first, that failing to connect with it falls back to using the environ, and that it finally updates the cache
[12:47] <rogpeppe> dimitern: i *think* there's already code that checks that the cached info is used first
[12:47] <rogpeppe> dimitern: (test code, that is)
[12:48] <dimitern> rogpeppe, so what tests do you think we need to add?
[12:48] <rogpeppe> dimitern: i think that the only new test needed is to test that the cache is updated
[12:49] <dimitern> rogpeppe, ok, i'll look into it and prepare something, and paste it to you
[12:49] <rogpeppe> dimitern: thanks
[13:14] <TheMue> fwereade: next round of debug log is in
[13:17] <TheMue> adeuring: seen your comments on rietveld, but no changes. didn't you use lbox propose?
[13:18] <adeuring> TheMue: argh. forgot it... done now.
[13:19] <TheMue> adeuring: great, thx, will take a look
[13:19] <adeuring> thanks
[13:31] <dimitern> rogpeppe, ok, i'll look into it and prepare something, and paste it to you
[13:31] <dimitern> rogpeppe, oops sorry
[13:32] <dimitern> rogpeppe, almost done btw
[13:32] <rogpeppe> dimitern: cool
[13:40] <dimitern> rogpeppe, http://paste.ubuntu.com/6756196/ there it is - TestWithInfoOnly is updated to check the cache is not changed
[13:41] <rogpeppe> dimitern: do you actually mean TestWithConfigAndNoInfo ?
[13:42] <rogpeppe> dimitern: TestWithoutInfoAndConfigUpdatesCache sounds like there's no info *or* config
[13:42] <dimitern> rogpeppe, yeah, I'll rename it, thanks (was wondering how to phrase it)
[13:43] <natefinch> rogpeppe: Does this test pass for you?  localLiveSuite.TestStartInstanceWithDefaultSecurityGroup    It fails 100% of the time for me.
[13:44] <natefinch> rogpeppe: under provider/openstack
[13:45] <rogpeppe> natefinch: yeah, it passes for me
[13:45] <rogpeppe> natefinch: have you done godeps -u ?
[13:45] <natefinch> rogpeppe: not recently. I bet that's the problem
[13:45] <rogpeppe> natefinch: yeah
[13:45] <rogpeppe> natefinch: you'll have to 'go get -u' the packages it complains about
[13:46] <rogpeppe> natefinch: (i should really make it work a bit better when the required deps aren't available locally)
[13:47] <rogpeppe> dimitern: i'm not sure i see how the first test is making sure that the cache hasn't been updated
[13:49] <dimitern> rogpeppe, should I check the modified time of the jenv file instead?
[13:49] <rogpeppe> dimitern: owd
[13:49] <rogpeppe> dimitern: (mistype)
[13:49] <natefinch> rogpeppe: godeps: cannot update "/home/nate/code/src/launchpad.net/gomaasapi": bzr: ERROR: branch has no revision ian.booth@canonical.com-20131017011445-m1hmr0ap14osd7li
[13:49] <natefinch> bzr update --revision only works for a revision in the branch history
[13:50] <rogpeppe> natefinch: as i said, you'll need to run go get -iu
[13:50] <rogpeppe> -u
[13:50] <rogpeppe> natefinch: i.e. go get -u launchpad.net/gomaasapi/...
[13:50] <rogpeppe> natefinch: unfortunately godeps only prints a single repo that's failed, so you'll probably need to do that several times
[13:50] <rogpeppe> natefinch: for each repo that's out of date
[13:53] <rogpeppe> dimitern: i wouldn't check the mtime
[13:53] <rogpeppe> dimitern: configstore.Storage is an interface, so you can intercept the Write method.
[13:54] <dimitern> rogpeppe, ah, good point - and a chance for me to use PatchValue
[13:55] <rogpeppe> dimitern: no need to use PatchValue i think
[13:55] <dimitern> rogpeppe, how then?
[13:55] <dimitern> rogpeppe, and why not?
[13:56] <rogpeppe> dimitern: you can just pass your custom store interface value into newAPIFromName
[13:56] <rogpeppe> dimitern: (that's why it exists seperately from newAPIClient
[13:56] <dimitern> rogpeppe, i'll try
[13:57] <rogpeppe> dimitern: and NewAPIClientFromName)
[13:57] <dimitern> rogpeppe, although the PatchValue approach seems cleaner
[13:57] <rogpeppe> dimitern: what value would you patch?
[13:57] <dimitern> rogpeppe, store.Write?
[13:57]  * rogpeppe thinks that patching values is something to be avoided if possible
[13:57] <dimitern> rogpeppe, or it only works for globals
[13:58] <rogpeppe> dimitern: it only works for globals
[13:58] <rogpeppe> dimitern: well, it only works for *values*
[13:58] <rogpeppe> dimitern: you can't patch methods
[14:08] <dimitern> rogpeppe, http://paste.ubuntu.com/6756292/ better?
[14:19] <dimitern> rogpeppe1, updated the CL with your last review, reproposing now
[14:19] <rogpeppe1> dimitern: ta
[14:24] <dimitern> rogpeppe1, https://codereview.appspot.com/52050043/ - does it look ok to land now?
[14:25] <rogpeppe1> dimitern: looking
[14:26] <dimitern> mgz, ping
[14:27] <dimitern> mgz, should we have a talk about networking, so I can be brought up to speed?
[14:27] <dimitern> mgz, perhaps with fwereade as well?
[14:28] <natefinch> I love it when I make a guess and it turns out to be right.  I had somehow munged my iptables in such a way as to prevent me from being able to print... resetting iptables fixed the problem.
[14:28] <mgz> dimitern: SURE
[14:28] <mgz> er, caps
[14:28] <dimitern> :) sounds like you're too eager?
[14:29] <fwereade> dimitern, mgz, ok, sgtm, I have half an hour
[14:29] <dimitern> fwereade, so now then? i'll send a link
[14:29] <natefinch> also rogpeppe1: thanks, updating stuff fixed my test failures.. I actually got them to pass on the first try. Amazing.
[14:30] <rogpeppe1> natefinch: yay!
[14:30] <dimitern> mgz, fwereade: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.3tn7jebub5jn5mhuh5sf8acd70
[14:42] <rogpeppe1> dimitern: reviewed
[14:42] <dimitern> rogpeppe1, ta
[16:01] <natefinch> man I hate it when foo --help bar doesn't return help about bar
[16:03] <rogpeppe1> natefinch: ha yes
[16:04] <rogpeppe1> natefinch: s3cmd being one example
[16:04] <natefinch> mongod --replset has an optional seed list that you can append.... but I can't find what the format of the seed list is supposed to be
[16:09] <rogpeppe1> lunch
[16:30] <rogpeppe1> fwereade: we're not planning to lose default-series entirely, are we?
[16:31] <fwereade> rogpeppe1, I was hoping we could eventually tbh
[16:32] <rogpeppe1> fwereade: if we do, then what should EnsureAvailability use when it starts new machines?
[16:33] <fwereade> rogpeppe1, I think it uses something similar-but-different? default-series as controller of charm series should definitely not be depended upon long-term
[16:33] <rogpeppe1> fwereade: i guess we just have series as an argument to EnsureAvailability
[16:34] <fwereade> rogpeppe1, state-server-series perhaps? seems probably smart to deploy mongo across the same OSs where possible...
[16:34] <rogpeppe1> fwereade: i'm not sure
[16:35] <rogpeppe1> fwereade: i'm not sure we want to state that people *must* do that
[16:35] <natefinch> fwereade, rogpeppe1:  seems like defaulting to latest LTS is probably a sane default.... do most people even really care what OS their servers are running?
[16:35] <fwereade> rogpeppe1, natefinch: I think that would certainly default to latest-lts
[16:36] <fwereade> rogpeppe1, natefinch: but I can imagine reasonable use cases -- certain charms require a different version, and you want to deploy them densely, so you want all your machines to be... unctuous, or whatever we may call it
[16:36] <rogpeppe1> fwereade: yeah, i was thinking that too
[16:38] <natefinch> rogpeppe1, fwereade: yes, but I would hope most charms run well on latest LTS
[16:38] <rogpeppe1> natefinch: i doubt it
[16:38] <fwereade> natefinch, the particular case we've seen is needing a newer kernel version
[16:38] <rogpeppe1> natefinch: i suspect most charms will be on precise for a long time
[16:39] <fwereade> rogpeppe1, not so sure, that's being actively worked on
[16:39] <rogpeppe1> fwereade: i'll believe it when i see it :-)
[16:41] <natefinch> rogpeppe1, fwereade: that's still one of the things that surprises me about ubuntu (and linux in general) - that stuff which worked on the OS 2 years ago is assumed to be broken on the latest version.
[16:41] <natefinch> rogpeppe1, fwereade: not just assumed, but often is
[16:42] <rogpeppe1> natefinch: i agree, but that's just something we have to work with
[16:43] <rogpeppe1> natefinch: everybody assumes everything is utterly unportable
[16:43] <rogpeppe1> natefinch: once upon a time, you could actually do things portably across unixes, let alone linuxes
[16:43] <TheMue> natefinch: os/2? ah, i loved it. and scripting with rexx, even with ui (used watcom). editor has been spf/2 (i came from the mainframe at that time)
[16:43] <natefinch> rogpeppe1: boggles my mind... coming from Windows where stuff written for XP 13 years ago still works on Windows 8
[16:54] <natefinch> rogpeppe1: btw, that extra info from replicaset code finally landed
[16:54] <rogpeppe1> natefinch: <o/
[16:54] <rogpeppe1> natefinch: \o/ even :-)
[16:55] <natefinch> rogpeppe1: only took two tries to pass the tests this time
[16:58] <natefinch> rogpeppe1: have some time to talk about EnsureMongoServer, now that I can actually get back to that?
[16:58] <rogpeppe1> natefinch: sure
[16:59] <natefinch> rogpeppe1: so it is just a matter of rewriting the upstart job as appropriate?
[17:00] <rogpeppe1> natefinch: yeah, and checking whether the upstart job is running already or not
[17:00] <natefinch> rogpeppe1: don't we need to rewrite it even if one is running?  Thinking of upgrade and/or when the list of servers changes
[17:01] <rogpeppe1> natefinch: i hope not. i don't want to have the list of servers inside the upstart file.
[17:02] <natefinch> rogpeppe1: ahh, ok, I misunderstood some of the text.  Yeah, I think it's best not to have the list in the upstart file (and should be unnecessary)
[17:02] <rogpeppe1> natefinch: i think the upstart job should probably just run a shell script that gets the server list from somewhere, and upgrades could upgrade that.
[17:03] <natefinch> rogpeppe1: so, we already have upstart.MongoUpstartService ... is there anything else to do but just update that with --replSet juju?
[17:04] <natefinch> rogpeppe1: I don't think we even really need the list of servers to start mongo
[17:04] <rogpeppe1> natefinch: no?
[17:04] <rogpeppe1> natefinch: how does it find out about its peers?
[17:05] <natefinch> rogpeppe1: when you add it to the member list on the primary, magic happens, and it joins the group.  You don't have to directly tell the secondary about the rest of the servers (I think likely the primary pings it to let it know there's a replset in existence)
[17:06] <rogpeppe1> natefinch: ah, of course!
[17:06] <rogpeppe1> natefinch: because all servers connect directly to each other
[17:06] <natefinch> rogpeppe1: right
[17:07] <rogpeppe1> natefinch: in which case, i think you're right
[17:07] <natefinch> rogpeppe1: well, cool.
[17:11] <natefinch> rogpeppe1: we do still have to fix the upstart script on upgrade, though
[17:11] <rogpeppe1> natefinch: yeah, the first time
[17:11] <rogpeppe1> natefinch: (and of course if we want to change the mongo args, but that's another matter)
[17:17] <natefinch> rogpeppe1: yeah I meant changing the args (to add --replSet juju).   Figured it's better just to always update the script when we update juju
[17:17] <rogpeppe1> natefinch: seems reasonable
[17:18] <rogpeppe1> natefinch: but i don't think we always want to restart the service, do we?
[17:20] <natefinch> rogpeppe1: don't we restart the service by definition while upgrading?
[17:21] <rogpeppe1> natefinch: i'm not sure. currently we don't restart any service. perhaps that's reasonable to do though.
[17:21] <rogpeppe1> natefinch: (there are two services involved here, right?
[17:21] <rogpeppe1> )
[17:22] <natefinch> rogpeppe1: right, yeah, I was thinking about it incorrectly.
[17:30] <natefinch> rogpeppe1: so, I'm not sure where or when we'd call the code to recreate the upstart script
[17:31] <rogpeppe1> natefinch: in EnsureMongoServer?
[17:33] <natefinch> rogpeppe1: well, yes.  I thought I might need to actually call that function from somewhere, though
[17:33] <rogpeppe1> natefinch: yes, that function will be called from jujud
[17:33] <rogpeppe1> natefinch: inside the machine agent logic
[17:33] <rogpeppe1> natefinch: when the machine agent finds that it has a ManageState job
[17:33] <natefinch> rogpeppe1: So you're saying you'll have the code to call it?
[17:34] <rogpeppe1> natefinch: yeah - one of us will write it. EnsureMongoService is a primitive we'll use
[17:44] <natefinch> lunchtime for me
[17:54] <hazmat> rogpeppe1, got a bug report against deployer in #juju .. http://paste.ubuntu.com/6757161/ .. its an error message from the watcher impl that the watcher is stopped
[17:55] <hazmat> we're seeing it in a few different contexts, i'm just curious if this is normal behavior
[17:56] <rogpeppe1> it probably means that the state watcher has been stopped :-)
[17:56] <rogpeppe1> hazmat: can you reproduce it?
[17:56] <hazmat> rogpeppe1, but why would the watcher be stopped outside of the client requesting it?
[17:57] <rogpeppe1> hazmat: the watcher should only be stopped if the state is closed
[17:57] <rogpeppe1> hazmat: i'd like to see a transcript of the API messages
[17:57] <rogpeppe1> hazmat: a copy of machine-0.log would be really useful
[17:58] <hazmat> rogpeppe1, ack, asking
[18:04] <rogpeppe1> hazmat: i tell a lie. it can happen if either the watcher or the underlying state was closed
[18:04] <hazmat> rogpeppe1, i've got the api server log.. do you have a chinstrap account?
[18:04] <rogpeppe1> hazmat: i think so
[18:05] <hazmat> rogpeppe1, its in  ~kapil/machine-0.log
[18:06] <hazmat> rogpeppe1, yeah.. i'm thinking its client error, i don't recall the gui folks have ever complained about it, but i've seen a few reports against deployer
[18:11] <rogpeppe1> hazmat: hmm, interesting.
[18:23] <hazmat> rogpeppe1, anything of note there? it looks like stop is being called, but there's a lot of line noise.
[18:23] <rogpeppe1> hazmat: i can't see Stop being called (by that client anyway)
[18:27] <rogpeppe1> hazmat: i think the only interaction that client had with the API server is in the messages in ~rog/select.log on chinstrap
[18:28] <rogpeppe1> hazmat: i can't currently see a way that it could be happening
[18:29] <hazmat> rogpeppe1, hmm.. perhaps some isolation issue around multiple allwatchers?
[18:29] <rogpeppe1> hazmat: that's what i'm looking for, but it looks pretty tight to me
[18:30] <rogpeppe1> hazmat: it would help if there weren't two distinct errors that "state watcher was stopped" represents
[18:30] <rogpeppe1> hazmat: (there's a TODO in the code to change one of them)
[18:43] <rogpeppe1> hazmat: i don't see how it can happen, but there are a few places where better logging could help us. i'll fix that up so the next time it happens we'll have a bit more useful info.
[18:44] <rogpeppe1> hazmat: i think it might not be coincidence that client [1] goes away at a similar time to client [1A] getting the "state watcher is stopped" message
[18:44] <rogpeppe1> hazmat: but we don't log clients leaving, so i can't be sure
[18:45] <hazmat> rogpeppe1, well when the watch error happens its going to kill the process which stops a separate control connection
[18:45] <hazmat> rogpeppe1, fwiw this is the bug tracking https://bugs.launchpad.net/juju-core/+bug/1269519
[18:46] <_mup_> Bug #1269519: Error on allwatcher api <juju-core:New> <juju-deployer:New> <https://launchpad.net/bugs/1269519>
[18:46] <rogpeppe1> hazmat: ah, this is a python client which will be using a separate connection for each operation, yeah
[18:46] <hazmat> rogpeppe1, no..
[18:46] <hazmat> rogpeppe1, multiple operations on one connection, watches on separate connections
[18:47] <hazmat> er.. optionally watches on separate connections
[18:47] <rogpeppe1> hazmat: that's what i meant to say :-)
[18:47] <hazmat> :-)
[18:47] <rogpeppe1> hazmat: but multiple connections for a single client, anyway
[18:47] <hazmat> yup
[19:40] <thumper> morning
[19:44] <natefinch> morning thumper
[19:44] <thumper> morning natefinch
[19:47] <natefinch> thumper: btw, problem I had last night was out of date dependencies.  Man, wish there was a better way to keep that from happening.
[19:47] <thumper> natefinch: yeah...
[19:47] <thumper> natefinch: also, I noticed that godep doesn't fetch the remote branches
[19:47] <thumper> it assumes that the revisions are there
[19:47] <thumper> and just sets the working tree revision
[19:47] <natefinch> thumper: roger was feeling bad about that this morning
[19:48] <natefinch> thumper: in practice, what we really just need is a cron job to update those branches to head once a day
[19:48] <thumper> I think it should make the branches that are there actually have a tip of what we depend on
[19:49] <natefinch> thumper: we could just add a "juju" tag and update the tag as appropriate... then the aforementioned cron job could keep the local in sync with the tag
[19:50] <natefinch> same idea, basically
[19:50] <thumper> I don't think it is that hard..
[19:50] <thumper> and we could have a simple make target that does the godep call
[19:50] <thumper> make dep-update
[19:50] <thumper> or something
[19:51] <thumper> and include the ability for a quick check
[19:51] <thumper> don't fetch, just check that the tip of each dependency matches the file
[19:51] <thumper> that should be super fast
[19:51] <thumper> and could be part of the default make targets
[19:52] <thumper> I know some people aren't fans of makefiles
[19:52] <thumper> but they are handy
[19:54]  * thumper takes bug 1269363
[19:54] <_mup_> Bug #1269363: local environment broken with root perms <local-provider> <ssh> <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1269363>
[20:28] <natefinch> thumper: got a sec?
[20:29] <thumper> natefinch: sure
[20:30] <natefinch> thumper: I need to find a place to rewrite the mongo upstart script, so we can add --replSet juju to the command line that we run, and then restart mongo
[20:31] <natefinch> thumper: roger had said we should do it in the machine agent somewhere, which is fine... except that I'm not sure it has access to the right information to write the upstart script
[20:32] <thumper> yeah...
[20:32] <thumper> I've been thinking about that
[20:32] <natefinch> upstart.MongoUpstartService() takes the mongo data directory and port
[20:32] <thumper> as part of the upgrade stuff
[20:32] <natefinch> cloudinit gets those from the MachineConfig, but I don't see a way for the machine agent to get to that info
[20:33] <thumper> hmm...
[20:33] <thumper> not sure...
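For context, the rewrite natefinch describes amounts to regenerating an upstart job so mongod is started with `--replSet juju`. A rough sketch of the target state, where the file path, data directory, and port are assumptions (the real script is generated from the MachineConfig data he can't yet reach):

```
# /etc/init/juju-db.conf -- illustrative sketch only
description "juju state database"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec /usr/bin/mongod --dbpath /var/lib/juju/db --port 37017 --replSet juju
```

After rewriting the job, the agent would also need to bounce the service (e.g. `restart juju-db`) for the new flag to take effect, which is the "and then restart mongo" part of the plan.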
[20:34] <natefinch> seems like half of software development is just figuring out how to get information from here to over there
[20:34] <thumper> for sure
[20:35] <thumper> I'm in that situation right now too
[20:35] <thumper> I know the problem, know what causes it,
[20:35] <thumper> just fixing it right...
[20:35] <natefinch> yep
[20:35] <thumper> that's the hard bit
[20:37] <natefinch> it would help if I was more familiar with the way all the code in this area interacts.  I guess now is the time to start figuring that out :)
[20:37] <thumper> :)
[20:51] <natefinch> well that's confusing..... there's an environs/cloudinit.go and an environs/cloudinit/cloudinit.go
[21:05] <thumper> natefinch: but wait, there's more...
[21:05] <thumper> there is cloudinit/cloudinit.go
[21:07] <natefinch> wow, that is.... something else
[21:09] <thumper> yes
[21:09] <thumper> yes it is
[21:09] <thumper> naming shit is hard
[21:10] <natefinch> that's true
[21:32] <natefinch> time for the old "I don't know where to put it, so just pick some place and let it shake out in the reviews"
[21:35] <thumper> :)
[21:36] <natefinch> I hate it when something as stupid as "append 'db' to the end of the path" turns into a whole pain in the ass of "well, now I need a central place to keep this logic"
[21:36] <natefinch> which of course is like 80% of actual programming
[21:49] <rogpeppe2> natefinch: why wouldn't the machine agent have the right info to rewrite the upstart script?
[21:49] <rogpeppe2> thumper, natefinch: a review of this would be appreciated: https://codereview.appspot.com/52850043
[21:49]  * thumper nods...
[21:51] <natefinch> rogpeppe2: two things, one is that the mongo directory is "db" under the machine's data directory, but that code was only in MachineConfig.addMongoToBoot
[21:51] <natefinch> rogpeppe2: the other thing is the mongo port
[21:52] <rogpeppe> natefinch: i don't understand the first
[21:52] <rogpeppe> natefinch: you can get the mongo port from state.EnvironConfig
[21:53] <natefinch> rogpeppe: the first is just that there was a piece of code hidden away in cloudinit that needed to be put somewhere accessible to the rest of the world
[21:53] <rogpeppe> natefinch: definitely. i want to move it out of cloudinit entirely
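The "append 'db' to the path" logic natefinch wants a central home for is tiny; the pain is purely in where it lives. A minimal sketch of pulling it out of cloudinit into a shared helper — the function name here is hypothetical, not juju-core's actual API:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// MongoDBDir returns the mongo database directory under the agent's
// data directory. (Hypothetical helper; in the code being discussed,
// this logic lived inline in MachineConfig.addMongoToBoot.)
func MongoDBDir(dataDir string) string {
	return filepath.Join(dataDir, "db")
}

func main() {
	fmt.Println(MongoDBDir("/var/lib/juju")) // prints "/var/lib/juju/db"
}
```

Once a helper like this exists in a neutral package, both cloudinit and the machine agent can call it, which is exactly the "central place to keep this logic" complaint above.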
[21:55] <rogpeppe> natefinch: i'm hoping that jujud init can start the mongo server itself rather than it being done in cloudinit
[22:01] <natefinch> rogpeppe: not really sure how to get EnvironConfig from machineagent either....
[22:01] <rogpeppe> natefinch: it might require a new API call. let me check.
[22:04] <natefinch> sorry, gotta run, realized it's EOD for me.  email me if you figure it out, rogpeppe, otherwise I'm sure I can figure it out... just didn't know if there was an obvious place where that info was that I wasn't seeing.
[22:04] <rogpeppe> natefinch: it's available in the provisioner API, which is available to the machine agent, but i think it should be added to the machiner
[22:04] <rogpeppe> oh, too late
[22:07] <hazmat> rogpeppe, getting more reports of that same issue re stop watcher, in terms of helping to debug it..
[22:07] <hazmat> just turn up the log level and hand over more logs?
[22:08] <rogpeppe> hazmat: i've just proposed a CL that might help slightly in trying to narrow down the issue: https://codereview.appspot.com/52850043
[22:08] <hazmat> rogpeppe, cool
[22:10]  * thumper goes to check out some office space in town
[22:27] <rogpeppe> axw: ping