/srv/irclogs.ubuntu.com/2014/01/15/#juju-dev.txt

thumpero/ wallyworld_00:16
wallyworld_hello00:16
bigjoolshalp: juju bootstrap says "environment already bootstrapped", juju status repeats: "ERROR TLS handshake failed: EOF" ad infinitum.01:00
bigjoolsI see no running machines in my env with other tools01:01
bigjools1.16.5-saucy-amd6401:01
thumper?!01:05
thumperbigjools: run destroy environment01:05
bigjoolsthumper: no :)01:05
thumperno?01:06
bigjoolsI want to keep it - it's doing this on two envs01:06
bigjoolsI can SSH into bootstrap node01:06
bigjoolsthe other env which has none I could destroy01:06
thumpermaas?01:06
bigjoolscanonistack01:06
bigjoolsso sorry let me be clearer.  The one env is genuinely empty so I've destroy-env'd it now.  The other is in use but juju status can't talk to it.01:08
thumperbigjools: if you ssh into the bootstrap node of the machine that is running01:09
thumpercan you see if the machine agent is running?01:09
thumperaxw: hey there, where was that method that I need to implement for the local provider to get the addresses working properly?01:09
axwthumper: containers/lxc/instance.go01:10
thumperkk01:10
axwmethod Addresses01:10
thumperta01:10
axwnp01:10
axwthumper: I'm glad you moved RemoteResponse, because I was going to have to do it otherwise. It was causing a circular import from utils/ssh->cmd->environs/config->utils/ssh01:12
thumper:)01:12
axwwhich is why I'm going to have to revert my change of using JujuHomePath01:12
thumpershould be landing now01:12
axwcool01:12
thumperhad one intermittent failure landing so far01:12
bigjoolsthumper: will check shortly, on a call01:13
axwthey've been quite frequent lately :(01:13
thumperbigjools: kk01:13
thumperaxw: yes they have01:13
thumperaxw: seems like a race condition somewhere01:13
thumperaxw: any idea how to track it down?01:13
axwI'm sure there are multiple01:13
axw-race may help, will likely take some days of sifting I'd say01:14
thumperaxw: the address updater only runs on a machine that has the job ManageEnviron01:18
thumperaxw: this isn't sufficient01:18
thumperfor containers01:18
axwoh :(01:18
axwmaybe we should change that?01:18
thumperwe need to have something running on every machine01:18
axwthat's what I thought it did01:18
thumpernope01:18
axwthumper: ah, it assumes that the addresses are observable externally01:19
thumperaxw: also it is a state worker01:19
axwhrm01:19
thumperso not over the api01:19
sinzuiaxw, did you see this behaviour yesterday: https://bugs.launchpad.net/juju-core/+bug/126912001:20
_mup_Bug #1269120: win client bootstrap fails because it uses private ip <bootstrap> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1269120>01:20
axwsinzui: I did not01:21
axwit should be checking them all01:21
sinzuiaxw, then we can say the good news is the basic test I put together for CI is valuable :)01:21
thumpercheck versions01:21
axwsinzui: where's the rest of the log?01:22
axwis there an error?01:22
sinzuiI don't have any more. The test terminates the machine before getting the logs. Well I cannot because of the IP issue. I can get more information tomorrow when I add the test to CI01:23
axwsinzui: I mean the juju bootstrap stdout/stderr - is that all there was?01:24
sinzuiaxw, actually, i can get the debug output in 30 minutes01:24
sinzuiaxw, that was all there was from --show-log01:24
axwmk01:24
axwsinzui: does your windows box have openssh on it?01:25
axwsinzui: because it's actually expected not to work at the moment :)01:25
axwI haven't submitted my fix yet01:25
axwI'm surprised that there's no error there though01:25
sinzuiaxw. as a matter of fact, it does have openssh installed for the benefit of CI. The behaviour is the same from powershell though01:26
axwsinzui: ok. I think it might be a good idea to exclude openssh from %PATH% for the tests, for a more standard user setup. may be worth having both, I suppose01:27
sinzuiaxw. I know little about windows...thank you for the recommendation01:28
thumperwallyworld_: here is part two of the work you reviewed this morning -  https://codereview.appspot.com/5247004401:28
sinzuiaxw, I learned today that I need to restart sshd each time I change the rules of user envs :)01:28
wallyworld_ok, looking real soon01:28
axwsinzui: nps. some people may have openssh/cygwin installed, but it's typical for people to use PuTTY on Windows01:28
* sinzui has 19.5 years of linux development experience, but only 2 on windows01:29
axwI was unfortunate to work across something like 5 OSes in my last job :)01:29
axwthe worst of all worlds01:30
axwsinzui: I think it's probably not worth investigating further until my changes land01:32
axwtheoretically there should be no change if openssh is there01:32
axwbut theoretically it should work just as it does on Linux if it is there01:32
sinzuiaxw, how long will juju wait during a bootstrap before realising that it has failed?01:35
axwsinzui: 10 mins01:36
thumperwaigani_: around?01:40
waigani_thumper: hello :)01:41
sinzuiwallyworld_, did r2201 change behaviour? CI can no longer create simple streams, nor can we publish new releases. This is the command that did not generate files: http://pastebin.ubuntu.com/6753792/01:50
wallyworld_looking01:51
wallyworld_sinzui: it was not meant to change any behaviour. if it did i suck. i'll look at the command locally and see what's happening01:52
wallyworld_it could be there's a code path there that doesn't generate metadata like it should01:52
wallyworld_sync-tools needs tools tarballs and metadata now01:52
wallyworld_since when it falls back to streams.canonical.com, it needs to use simplestreams to get the tools01:53
wallyworld_hence it needs to use the same method for local tools as well01:53
wallyworld_so if the source directory is missing metadata, you could generate it using juju metadata generate-tools01:54
wallyworld_iow, just having a directory with tools tarballs is not sufficient, you also need simplestreams metadata01:55
wallyworld_i can look at building that logic into the sync tools command if the source is local01:56
wallyworld_does that rambling make sense?01:56
wallyworld_as a short term fix, try $JUJU_EXEC metadata generate-tools -d $SOURCE01:58
bigjoolsthumper: sigh.  I *was* sshed into the bootstrap node and my session froze.  Now, all the canonistack instances have vanished.  FFS.02:00
thumperbigjools: :(02:00
bigjoolsI now have to figure out how to redeploy my whole setup02:01
sinzuiwallyworld_, Since this is the process that makes data for streams.canonical.com. I think we need to land an immediate fix to calling metadata generate-tools. Do we not need to call sync-tool for 1.17.1 now?02:10
wallyworld_sinzui: all you need to do to get metadata ready for streams.canonical.com is to use the generate metadata command. the sync-tools is more intended for folks to grab tools from streams.c.c so they can upload to their own cloud02:12
wallyworld_so if you have tools tarballs, juju metadata generate-tools will produce the json ready to upload02:13
sinzuiwallyworld_, but we also need to create the directory structure too? (tools/releases/*.tgz) or does meta-data do that too02:13
sinzuiwallyworld_, we don't use sync-tools to deliver to the clouds. we are using it to put the tools and metadata in a directory that we sync to various clouds02:14
wallyworld_generate metadata assumes the tools are in a <dir>/tools/releases and will put the metadata in <dir>/tools/streams02:14
wallyworld_ah i see02:14
wallyworld_but sync-tools needs the tarballs02:14
wallyworld_so put the tarballs in a tools/releases dir and run generate-metadata and then you will have a dir structure ready to upload02:15
wallyworld_since you will end up with <dir>/tools/releases and <dir>/tools/streams02:16
sinzuiwallyworld_, we make the tarballs and place them in a temp dir. We can make the dir structure. cp them to the releases dir, then run metadata to make the json...then carry on with signing.02:16
wallyworld_that sounds ok. sorry about the change in behaviour, i didn't realise you guys were using sync-tools like that02:17
wallyworld_sadly i had to change it to fix the other XML error issue02:17
wallyworld_since the code used to rely on being able to list the file contents of a url02:18
wallyworld_it worked for s3 buckets but not for an arbitrary http url02:18
sinzuiwallyworld_, I am glad to stop calling sync-tools. This is the script that is called after we make the package and before we publish to all the CPCs http://bazaar.launchpad.net/~juju-qa/juju-core/ci-cd-scripts2/view/head:/assemble-public-tools.bash02:18
sinzuiI think I can get this sorted out quickly02:18
wallyworld_yeah, generate_streams() will be a lot more logical if it can just call a command to generate streams :-)02:19
rogpeppefwereade: well done for finding the missing Close. Not sure how we all missed that for so long.08:58
fwereaderogpeppe, cheers08:58
rogpeppefwereade: what i don't understand though, is why it only failed *some* of the time08:58
fwereaderogpeppe, I agree that's not clear -- the fact that removing the SetAdminMongoPassword helped is interesting though08:59
fwereaderogpeppe, hazmat had a patch that removed that, that apparently worked for him09:00
rogpeppefwereade: yeah, we worked that out together09:01
rogpeppefwereade: completely missing the missing Close :-)09:01
fwereaderogpeppe, yeah, those things can hide sometimes09:01
rogpeppefwereade: totally trivial review? https://codereview.appspot.com/5246004609:24
rogpeppeor anyone else?09:24
fwereaderogpeppe, LGTM09:24
rogpeppefwereade: ta09:25
rogpeppefwereade: please merge https://code.launchpad.net/~fwereade/juju-core/fix-unclosed-conn-test/+merge/201723 - i wanna use it!09:27
jamfwereade: given it is a Close method, should we be using "gc.Check(err, gc.IsNil)" ?09:44
jamthat way we can continue cleaning up even if one of the many Close calls fails?09:44
jamAssert will stop there, and fail to close the rest of the resources09:45
fwereadejam, other defers will still run, won't they?09:45
jamfwereade: defer I think will run? I'm not really sure. But you did change a "conn.Close(); conn.Close(); conn.Close()" section. I guess that is the same object, so it doesn't matter09:46
jamand all the rest appear to be in defer09:46
fwereadejam, well it's purportedly testing that multi closes work09:46
jamgood enough, then09:46
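On the question of whether other defers still run after a failed Assert: the standard testing package documents FailNow as stopping the test with runtime.Goexit, and Goexit runs all deferred calls before the goroutine ends. Assuming gocheck's Assert aborts the test the same way (an assumption, not checked against the gocheck source here), the behaviour can be sketched in plain Go. closedAfterAbort is a hypothetical name used only for this illustration.

```go
package main

import (
	"runtime"
	"sync"
)

// closedAfterAbort runs a goroutine that defers two "Close" calls and then
// aborts with runtime.Goexit, standing in for a failed assertion.
// It returns the names of the deferred calls that ran, in the order they ran.
func closedAfterAbort() []string {
	var (
		closed []string
		wg     sync.WaitGroup
	)
	record := func(name string) { closed = append(closed, name) }

	wg.Add(1)
	go func() {
		defer wg.Done()
		defer record("conn")
		defer record("state")
		// Abort this goroutine; deferred calls still run, last one first.
		runtime.Goexit()
		record("never reached")
	}()
	wg.Wait()
	return closed
}
```

Both deferred calls run, most recently deferred first, so under that assumption Close calls placed in defers would not be skipped. Sequential Close calls written after a failing Assert in the same function body would be skipped, which is the case where Check helps.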
dimiternrogpeppe, jam, re https://codereview.appspot.com/52050043 I mentioned in the description that updating api server addresses after connecting seems out of scope for this CL10:37
dimiternrogpeppe, jam, it gets us more than we had before - cached API endpoints, which speed up the CLI, which is already a big win IMO10:37
dimiternrogpeppe, jam, but the actual updating can come as a follow-up, can't it?10:38
rogpeppedimitern: yes, i agree, as i said in my review ISTR10:38
jamdimitern: so I don't think the actual updating is going to look like what you've written, is my concern, which means redoing it10:38
jamI like the from-config stuff, as that is not likely to change a lot10:38
rogpeppedimitern: the other thing that we should do is make sure that bootstrap saves the cached address10:39
dimiternjam, i'm changing the CL now to accommodate rogpeppe's parallels.Try logic and will repropose shortly10:39
jamdimitern: your structure requires us to have the updated addresses before we return from api.Open, but that seems unfortunate to delay waiting for another round trip10:39
dimiternjam, i'm not really following you there - why before api.Open?10:39
rogpeppejam: i think this is at least an improvement10:40
jamdimitern: because at the end of api.Open you call SetAPIEndpoints immediately10:40
rogpeppejam: as it caches the address we get from Environ.StateInfo10:40
dimiternrogpeppe, yes exactly10:40
jamdimitern: my suggestion is just not to do it from the api-from-environ case and only the api-from-config case (or whatever the exact names are)10:40
jambecause the from-environ isn't giving us anything, so just pass nil to be clear that we don't have any new information10:41
dimiternjam, anyway i'd like you to take a look after i propose again, i'm testing live with EC2 now10:41
dimiternjam, api-from-environ is the same as api-from-config10:41
jamdimitern: my point is, there are 2 code paths, one returns the stuff it just read, the other goes to the Environ and pulls out info from state info and looks it up10:42
jamthe latter should be cached10:42
jamthe former already is10:42
dimiternjam, got you10:42
dimiternrogpeppe, problem is, with the new code I can't seem to be able to distinguish between "info connection failed, but config succeeded" and "both failed"10:44
jamdimitern: if you can't actually connect, I don't think we should cache, should we?10:45
jammgz: poke for standup10:48
mgzjam: there seems to be no one there...10:54
natefinchmgz: may need to pop out and back in10:54
natefinchmgz: I had similar problem at first10:54
mgzwell, this is annoying10:55
dimiternrogpeppe, jam, I'd appreciate a second look at https://codereview.appspot.com/52050043/11:37
rogpeppedimitern: will do11:37
rogpeppedimitern: i still don't see any new tests12:43
dimiternrogpeppe, I need your help for that I think12:45
rogpeppedimitern: ok12:45
dimiternrogpeppe, it's tested live, but I have trouble figuring out how to set up the tests for the new functionality12:46
dimiternrogpeppe, generally, we need to test that cached info gets used first and failing to connect with it falls back to using the environ, and finally updates the cache12:46
rogpeppedimitern: i *think* there's already code that checks that the cached info is used first12:47
rogpeppedimitern: (test code, that is)12:47
dimiternrogpeppe, so what tests do you think we need to add?12:48
rogpeppedimitern: i think that the only new test needed is to test that the cache is updated12:48
dimiternrogpeppe, ok, i'll look into it and prepare something, and paste it to you12:49
rogpeppedimitern: thanks12:49
TheMuefwereade: next round of debug log is in13:14
TheMueadeuring: seen your comments on rietveld, but no changes. didn't you use lbox propose?13:17
adeuringTheMue: argh. forgot it... done now.13:18
TheMueadeuring: great, thx, will take a look13:19
adeuringthanks13:19
dimiternrogpeppe, ok, i'll look into it and prepare something, and paste it to you13:31
dimiternrogpeppe, oops sorry13:31
dimiternrogpeppe, almost done btw13:32
rogpeppedimitern: cool13:32
dimiternrogpeppe, http://paste.ubuntu.com/6756196/ there it is - TestWithInfoOnly is updated to check the cache is not changed13:40
rogpeppedimitern: do you actually mean TestWithConfigAndNoInfo ?13:41
rogpeppedimitern: TestWithoutInfoAndConfigUpdatesCache sounds like there's no info *or* config13:42
dimiternrogpeppe, yeah, I'll rename it, thanks (was wondering how to phrase it)13:42
natefinchrogpeppe: Does this test pass for you?  localLiveSuite.TestStartInstanceWithDefaultSecurityGroup    It fails 100% of the time for me.13:43
natefinchrogpeppe: under provider/openstack13:44
rogpeppenatefinch: yeah, it passes for me13:45
rogpeppenatefinch: have you done godeps -u ?13:45
natefinchrogpeppe: not recently. I bet that's the problem13:45
rogpeppenatefinch: yeah13:45
rogpeppenatefinch: you'll have to 'go get -u' the packages it complains about13:45
rogpeppenatefinch: (i should really make it work a bit better when the required deps aren't available locally)13:46
rogpeppedimitern: i'm not sure i see how the first test is making sure that the cache hasn't been updated13:47
dimiternrogpeppe, should I check the modified time of the jenv file instead?13:49
rogpeppedimitern: owd13:49
rogpeppedimitern: (mistype)13:49
natefinchrogpeppe: godeps: cannot update "/home/nate/code/src/launchpad.net/gomaasapi": bzr: ERROR: branch has no revision ian.booth@canonical.com-20131017011445-m1hmr0ap14osd7li13:49
natefinchbzr update --revision only works for a revision in the branch history13:49
rogpeppenatefinch: as i said, you'll need to run go get -iu13:50
rogpeppe-u13:50
rogpeppenatefinch: i.e. go get -u launchpad.net/gomaasapi/...13:50
rogpeppenatefinch: unfortunately godeps only prints a single repo that's failed, so you'll probably need to do that several times13:50
rogpeppenatefinch: for each repo that's out of date13:50
rogpeppedimitern: i wouldn't check the mtime13:53
rogpeppedimitern: configstore.Storage is an interface, so you can intercept the Write method.13:53
dimiternrogpeppe, ah, good point - and a chance for me to use PatchValue13:54
rogpeppedimitern: no need to use PatchValue i think13:55
dimiternrogpeppe, how then?13:55
dimiternrogpeppe, and why not?13:55
rogpeppedimitern: you can just pass your custom store interface value into newAPIFromName13:56
rogpeppedimitern: (that's why it exists separately from newAPIClient13:56
dimiternrogpeppe, i'll try13:56
rogpeppedimitern: and NewAPIClientFromName)13:57
dimiternrogpeppe, although the PatchValue approach seems cleaner13:57
rogpeppedimitern: what value would you patch?13:57
dimiternrogpeppe, store.Write?13:57
* rogpeppe thinks that patching values is something to be avoided if possible13:57
dimiternrogpeppe, or it only works for globals13:57
rogpeppedimitern: it only works for globals13:58
rogpeppedimitern: well, it only works for *values*13:58
rogpeppedimitern: you can't patch methods13:58
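The point that methods cannot be patched but an interface value can be wrapped might be illustrated as below. The store interface here is a deliberately cut-down, hypothetical stand-in and does not reproduce the real configstore.Storage method set; memStore, countingStore and saveEndpoint are likewise invented for the sketch.

```go
package main

// store is a hypothetical, simplified storage interface.
type store interface {
	Write(key, value string) error
}

// memStore is a trivial in-memory implementation of store.
type memStore map[string]string

func (m memStore) Write(key, value string) error {
	m[key] = value
	return nil
}

// countingStore wraps another store and counts calls to Write.
// Embedding the interface forwards any other methods unchanged.
type countingStore struct {
	store
	writes int
}

func (c *countingStore) Write(key, value string) error {
	c.writes++
	return c.store.Write(key, value)
}

// saveEndpoint is a hypothetical piece of code under test that takes the
// store as a parameter, so a test can hand it a wrapped one.
func saveEndpoint(s store, addr string) error {
	return s.Write("api-endpoint", addr)
}
```

Because the function under test accepts the interface as an argument, a test can pass in the wrapper and inspect writes afterwards, with no global state to patch and restore.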
dimiternrogpeppe, http://paste.ubuntu.com/6756292/ better?14:08
dimiternrogpeppe1, updated the CL with your last review, reproposing now14:19
rogpeppe1dimitern: ta14:19
dimiternrogpeppe1, https://codereview.appspot.com/52050043/ - does it look ok to land now?14:24
rogpeppe1dimitern: looking14:25
dimiternmgz, ping14:26
dimiternmgz, should we have a talk about networking, so I can be brought up to speed?14:27
dimiternmgz, perhaps with fwereade as well?14:27
natefinchI love it when I make a guess and it turns out to be right.  I had somehow munged my iptables in such a way as to prevent me from being able to print... resetting iptables fixed the problem.14:28
mgzdimitern: SURE14:28
mgzer, caps14:28
dimitern:) sounds like you're too eager?14:28
fwereadedimitern, mgz, ok, sgtm, I have half an hour14:29
dimiternfwereade, so now then? i'll send a link14:29
natefinchalso rogpeppe1: thanks, updating stuff fixed my test failures.. I actually got them to pass on the first try. Amazing.14:29
rogpeppe1natefinch: yay!14:30
dimiternmgz, fwereade: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.3tn7jebub5jn5mhuh5sf8acd7014:30
rogpeppe1dimitern: reviewed14:42
dimiternrogpeppe1, ta14:42
natefinchman I hate it when foo --help bar doesn't return help about bar16:01
rogpeppe1natefinch: ha yes16:03
rogpeppe1natefinch: s3cmd being one example16:04
natefinchmongod --replset has an optional seed list that you can append.... but I can't find what the format of the seed list is supposed to be16:04
rogpeppe1lunch16:09
rogpeppe1fwereade: we're not planning to lose default-series entirely, are we?16:30
fwereaderogpeppe1, I was hoping we could eventually tbh16:31
rogpeppe1fwereade: if we do, then what should EnsureAvailability use when it starts new machines?16:32
fwereaderogpeppe1, I think it uses something similar-but-different? default-series as controller of charm series should definitely not be depended upon long-term16:33
rogpeppe1fwereade: i guess we just have series as an argument to EnsureAvailability16:33
fwereaderogpeppe1, state-server-series perhaps? seems probably smart to deploy mongo across the same OSs where possible...16:34
rogpeppe1fwereade: i'm not sure16:34
rogpeppe1fwereade: i'm not sure we want to state that people *must* do that16:35
natefinchfwereade, rogpeppe1:  seems like defaulting to latest LTS is probably a sane default.... do most people even really care what OS their servers are running?16:35
fwereaderogpeppe1, natefinch: I think that would certainly default to latest-lts16:35
fwereaderogpeppe1, natefinch: but I can imagine reasonable use cases -- certain charms require a different version, and you want to deploy them densely, so you want all your machines to be... unctuous, or whatever we may call it16:36
rogpeppe1fwereade: yeah, i was thinking that too16:36
natefinchrogpeppe1, fwereade: yes, but I would hope most charms run well on latest LTS16:38
rogpeppe1natefinch: i doubt it16:38
fwereadenatefinch, the particular case we've seen is needing a newer kernel version16:38
rogpeppe1natefinch: i suspect most charms will be on precise for a long time16:38
fwereaderogpeppe1, not so sure, that's being actively worked on16:39
rogpeppe1fwereade: i'll believe it when i see it :-)16:39
natefinchrogpeppe1, fwereade: that's still one of the things that surprises me about ubuntu (and linux in general) - that stuff which worked on the OS 2 years ago is assumed to be broken on the latest version.16:41
natefinchrogpeppe1, fwereade: not just assumed, but often is16:41
rogpeppe1natefinch: i agree, but that's just something we have to work with16:42
rogpeppe1natefinch: everybody assumes everything is utterly unportable16:43
rogpeppe1natefinch: once upon a time, you could actually do things portably across unixes, let alone linuxes16:43
TheMuenatefinch: os/2? ah, i loved it. and scripting with rexx, even with ui (used watcom). editor has been spf/2 (i came from the mainframe at that time)16:43
natefinchrogpeppe1: boggles my mind... coming from Windows where stuff written for XP 13 years ago still works on Windows 816:43
natefinchrogpeppe1: btw, that extra info from replicaset code finally landed16:54
rogpeppe1natefinch: <o/16:54
rogpeppe1natefinch: \o/ even :-)16:54
natefinchrogpeppe1: only took two tries to pass the tests this time16:55
natefinchrogpeppe1: have some time to talk about EnsureMongoServer, now that I can actually get back to that?16:58
rogpeppe1natefinch: sure16:58
natefinchrogpeppe1: so it is just a matter of rewriting the upstart job as appropriate?16:59
rogpeppe1natefinch: yeah, and checking whether the upstart job is running already or not17:00
natefinchrogpeppe1: don't we need to rewrite it even if one is running?  Thinking of upgrade and/or when the list of servers changes17:00
rogpeppe1natefinch: i hope not. i don't want to have the list of servers inside the upstart file.17:01
natefinchrogpeppe1: ahh, ok, I misunderstood some of the text.  Yeah, I think it's best not to have the list in the upstart file (and should be unnecessary)17:02
rogpeppe1natefinch: i think the upstart job should probably just run a shell script that gets the server list from somewhere, and upgrades could upgrade that.17:02
natefinchrogpeppe1: so, we already have upstart.MongoUpstartService ... is there anything else to do but just update that with --replSet juju?17:03
natefinchrogpeppe1: I don't think we even really need the list of servers to start mongo17:04
rogpeppe1natefinch: no?17:04
rogpeppe1natefinch: how does it find out about its peers?17:04
natefinchrogpeppe1: when you add it to the member list on the primary, magic happens, and it joins the group.  You don't have to directly tell the secondary about the rest of the servers (I think likely the primary pings it to let it know there's a replset in existence)17:05
rogpeppe1natefinch: ah, of course!17:06
rogpeppe1natefinch: because all servers connect directly to each other17:06
natefinchrogpeppe1: right17:06
rogpeppe1natefinch: in which case, i think you're right17:07
natefinchrogpeppe1: well, cool.17:07
natefinchrogpeppe1: we do still have to fix the upstart script on upgrade, though17:11
rogpeppe1natefinch: yeah, the first time17:11
rogpeppe1natefinch: (and of course if we want to change the mongo args, but that's another matter)17:11
natefinchrogpeppe1: yeah I meant changing the args (to add --replSet juju).   Figured it's better just to always update the script when we update juju17:17
rogpeppe1natefinch: seems reasonable17:17
rogpeppe1natefinch: but i don't think we always want to restart the service, do we?17:18
natefinchrogpeppe1: don't we restart the service by definition while upgrading?17:20
rogpeppe1natefinch: i'm not sure. currently we don't restart any service. perhaps that's reasonable to do though.17:21
rogpeppe1natefinch: (there are two services involved here, right?17:21
rogpeppe1)17:21
natefinchrogpeppe1: right, yeah, I was thinking about it incorrectly.17:22
natefinchrogpeppe1: so, I'm not sure where or when we'd call the code to recreate the upstart script17:30
rogpeppe1natefinch: in EnsureMongoServer?17:31
natefinchrogpeppe1: well, yes.  I thought I might need to actually call that function from somewhere, though17:33
rogpeppe1natefinch: yes, that function will be called from jujud17:33
rogpeppe1natefinch: inside the machine agent logic17:33
rogpeppe1natefinch: when the machine agent finds that it has a ManageState job17:33
natefinchrogpeppe1: So you're saying you'll have the code to call it?17:33
rogpeppe1natefinch: yeah - one of us will write it. EnsureMongoService is a primitive we'll use17:34
natefinchlunchtime for me17:44
=== natefinch is now known as natefinch-lunch
hazmatrogpeppe1, got a bug report against deployer in #juju .. http://paste.ubuntu.com/6757161/ .. its an error message from the watcher impl that the watcher is stopped17:54
hazmatwe're seeing it in a few different contexts, i'm just curious if this is normal behavior17:55
rogpeppe1it probably means that the state watcher has been stopped :-)17:56
rogpeppe1hazmat: can you reproduce it?17:56
hazmatrogpeppe1, but why would the watcher be stopped outside of the client requesting it?17:56
rogpeppe1hazmat: the watcher should only be stopped if the state is closed17:57
rogpeppe1hazmat: i'd like to see a transcript of the API messages17:57
rogpeppe1hazmat: a copy of machine-0.log would be really useful17:57
=== bjf is now known as bjf[afk]
hazmatrogpeppe1, ack, asking17:58
rogpeppe1hazmat: i tell a lie. it can happen if either the watcher or the underlying state was closed18:04
hazmatrogpeppe1, i've got the api server log.. do you have a chinstrap account?18:04
rogpeppe1hazmat: i think so18:04
hazmatrogpeppe1, its in  ~kapil/machine-0.log18:05
hazmatrogpeppe1, yeah.. i'm thinking its client error, i don't recall the gui folks have ever complained about it, but i've seen a few reports against deployer18:06
rogpeppe1hazmat: hmm, interesting.18:11
hazmatrogpeppe1, anything of note there? it looks like stop is being called, but there's a lot of line noise.18:23
rogpeppe1hazmat: i can't see Stop being called (by that client anyway)18:23
rogpeppe1hazmat: i think the only interaction that client had with the API server is in the messages in ~rog/select.log on chinstrap18:27
rogpeppe1hazmat: i can't currently see a way that it could be happening18:28
hazmatrogpeppe1, hmm.. perhaps some isolation issue around multiple allwatchers?18:29
rogpeppe1hazmat: that's what i'm looking for, but it looks pretty tight to me18:29
rogpeppe1hazmat: it would help if there weren't two distinct errors that "state watcher was stopped" represents18:30
rogpeppe1hazmat: (there's a TODO in the code to change one of them)18:30
rogpeppe1hazmat: i don't see how it can happen, but there are a few places where better logging could help us. i'll fix that up so the next time it happens we'll have a bit more useful info.18:43
rogpeppe1hazmat: i think it might not be coincidence that client [1] goes away at a similar time to client [1A] getting the "state watcher is stopped" message18:44
rogpeppe1hazmat: but we don't log clients leaving, so i can't be sure18:44
hazmatrogpeppe1, well when the watch error happens its going to kill the process which stops a separate control connection18:45
hazmatrogpeppe1, fwiw this is the bug tracking https://bugs.launchpad.net/juju-core/+bug/126951918:45
_mup_Bug #1269519: Error on allwatcher api <juju-core:New> <juju-deployer:New> <https://launchpad.net/bugs/1269519>18:46
rogpeppe1hazmat: ah, this is a python client which will be using a separate connection for each operation, yeah18:46
hazmatrogpeppe1, no..18:46
hazmatrogpeppe1, multiple operations on one connection, watches on separate connections18:46
hazmater.. optionally watches on separate connections18:47
rogpeppe1hazmat: that's what i meant to say :-)18:47
hazmat:-)18:47
rogpeppe1hazmat: but multiple connections for a single client, anyway18:47
hazmatyup18:47
=== natefinch-lunch is now known as natefinch
thumpermorning19:40
natefinchmorning thumper19:44
thumpermorning natefinch19:44
natefinchthumper: btw, problem I had last night was out of date dependencies.  Man, wish there was a better way to keep that from happening.19:47
thumpernatefinch: yeah...19:47
thumpernatefinch: also, I noticed that godep doesn't fetch the remote branches19:47
thumperit assumes that the revisions are there19:47
thumperand just sets the working tree revision19:47
natefinchthumper: roger was feeling bad about that this morning19:47
natefinchthumper: in practice, what we really just need is a cron job to update those branches to head once a day19:48
thumperI think it should make the branches that are there actually have a tip of what we depend on19:48
natefinchthumper: we could just add a "juju" tag and update the tag as appropriate... then the aforementioned cron job could keep the local in sync with the tag19:49
natefinchsame idea, basically19:50
thumperI don't think it is that hard..19:50
thumperand we could have a simple make target that does the godep call19:50
thumpermake dep-update19:50
thumperor something19:50
thumperand include the ability for a quick check19:51
thumperdon't fetch, just check that the tip of each dependency matches the file19:51
thumperthat should be super fast19:51
thumperand could be part of the default make targets19:51
thumperI know some people aren't fans of makefiles19:52
thumperbut they are handy19:52
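The quick check thumper describes (no fetching, just compare the tip of each dependency with what the dependencies file pins) could be sketched as a pure comparison. How the two maps are populated is left out, since that depends on the dependencies file format and the VCS in use; staleDeps is a hypothetical name, and both maps are assumed to go from import path to revision id.

```go
package main

import (
	"fmt"
	"sort"
)

// staleDeps compares the revision pinned for each dependency with the
// revision currently checked out locally. It returns one message per
// mismatch, sorted, and an empty result when everything matches.
func staleDeps(pinned, current map[string]string) []string {
	var stale []string
	for path, want := range pinned {
		got, ok := current[path]
		switch {
		case !ok:
			stale = append(stale, fmt.Sprintf("%s: not checked out", path))
		case got != want:
			stale = append(stale, fmt.Sprintf("%s: have %s, want %s", path, got, want))
		}
	}
	sort.Strings(stale)
	return stale
}
```

Since nothing is fetched, a check like this only has to read local branch tips, which is why it would be cheap enough to hang off a default make target.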
* thumper takes bug 126936319:54
_mup_Bug #1269363: local environment broken with root perms <local-provider> <ssh> <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1269363>19:54
natefinchthumper: got a sec?20:28
thumpernatefinch: sure20:29
natefinchthumper: I need to find a place to rewrite the mongo upstart script, so we can add --replSet juju to the command line that we run, and then restart mongo20:30
natefinchthumper: roger had said we should do it in the machine agent somewhere, which is fine... except that I'm not sure it has access to the right information to write the upstart script20:31
thumperyeah...20:32
thumperI've been thinking about that20:32
natefinchupstart.MongoUpstartService() takes the mongo data directory and port20:32
thumperas part of the upgrade stuff20:32
natefinchcloudinit gets those from the MachineConfig, but I don't see a way for the machine agent to get to that info20:32
thumperhmm...20:33
thumpernot sure...20:33
natefinchseems like half of software development is just figuring out how to get information from here to over there20:34
thumperfor sure20:34
thumperI'm in that situation right now too20:35
thumperI know the problem, know what causes it,20:35
thumperjust fixing it right...20:35
natefinchyep20:35
thumperthat's the hard bit20:35
natefinchit would help if I was more familiar with the way all the code in this area interacts.  I guess now is the time to start figuring that out :)20:37
thumper:)20:37
natefinchwell that's confusing..... there's an environs/cloudinit.go and an environs/cloudinit/cloudinit.go20:51
thumpernatefinch: but wait, there's more...21:05
thumperthere is cloudinit/cloudinit.go21:05
natefinchwow, that is.... something else21:07
thumperyes21:09
thumperyes it is21:09
thumpernaming shit is hard21:09
natefinchthat's true21:10
natefinchtime for the old "I don't know where to put it, so just pick some place and let it shake out in the reviews"21:32
thumper:)21:35
natefinchI hate it when something as stupid as "append 'db' to the end of the path" turns into a whole pain in the ass of "well, now I need a central place to keep this logic"21:36
natefinchwhich of course is like 80% of actual programming21:36
rogpeppe2natefinch: why wouldn't the machine agent have the right info to rewrite the upstart script?21:49
rogpeppe2thumper, natefinch: a review of this would be appreciated: https://codereview.appspot.com/5285004321:49
* thumper nods...21:49
natefinchrogpeppe2: two things, one is that the mongo directory is "db" under the machine's data directory, but that code was only in MachineConfig.addMongoToBoot21:51
natefinchrogpeppe2: the other thing is the mongo port21:51
=== rogpeppe2 is now known as rogpeppe
rogpeppenatefinch: i don't understand the first21:52
rogpeppenatefinch: you can get the mongo port from state.EnvironConfig21:52
natefinchrogpeppe: the first is just that there was a piece of code hidden away in cloudinit that needed to be put somewhere accessible to the rest of the world21:53
rogpeppenatefinch: definitely. i want to move it out of cloudinit entirely21:53
rogpeppenatefinch: i'm hoping that jujud init can start the mongo server itself rather than it being done in cloudinit21:55
natefinchrogpeppe: not really sure how to get EnvironConfig from machineagent either....22:01
rogpeppenatefinch: it might require a new API call. let me check.22:01
natefinchsorry, gotta run, realized it's EOD for me.  email me if you figure it out, rogpeppe, otherwise I'm sure I can figure it out... just didn't know if there was an obvious place where that info was that I wasn't seeing.22:04
rogpeppenatefinch: it's available in the provisioner API, which is available to the machine agent, but i think it should be added to the machiner22:04
rogpeppeoh, too late22:04
hazmatrogpeppe, getting more reports of that same issue re stop watcher, in terms of helping to debug it..22:07
hazmatjust turn up the log level and hand over more logs?22:07
rogpeppehazmat: i've just proposed a CL that might help slightly in trying to narrow down the issue: https://codereview.appspot.com/5285004322:08
hazmatrogpeppe, cool22:08
* thumper goes to check out some office space in town22:10
rogpeppeaxw: ping22:27
