/srv/irclogs.ubuntu.com/2014/04/15/#juju-dev.txt

hazmatsmoser, interesting.. coreos guys rewrote cloudinit in go..00:03
smoseri hadn't seen that.00:04
hazmatsmoser, its very limited subset and assumes coreos /systemd https://github.com/coreos/coreos-cloudinit00:06
hazmatits a bit much for them to call it  cloudinit... its almost zero feature set overlap00:07
perrito666did anyone see fwereade after this am? (and when I say AM I mean GMT-3 AM)00:16
davecheneyperrito666: its unusual to see him online at this time00:18
perrito666davecheney: I know, he just said that he was taking a plane and returning later and then I got disconnected00:19
davecheneyperrito666: ok, you probably know more than i then00:26
perrito666heh tx davecheney00:27
hazmathmm.. odd /bin/sh: 1: exec: /var/lib/juju/tools/unit-mysql-0/jujud: not found00:38
sinzuihazmat, looks like the last message in juju-ci-machine-0's log. Jujud just disappeared 2 weeks ago. Since that machine is the gateway into the ppc testing, we left it where it was00:41
sinzuithumper, I can hangout now00:42
hazmatsinzui, its odd its there.. the issue is deployer/simple.go00:42
hazmatit removes the symlink on failure, but afaics that method never failed, the last line is install the upstart job, and the job is present on disk.00:43
thumpersinzui: just munching00:44
thumperwith you shortly00:44
* sinzui watches ci00:44
hazmatsinzui, ie its resolvable with sudo ln -s /var/lib/juju/tools/1.18.1-precise-amd64/  /var/lib/juju/tools/unit-owncloud-000:44
hazmathmm.. its as though the removeOnErr was firing00:45
hazmateven on success00:45
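[editor's note: a minimal Go sketch of the cleanup pattern being discussed — a deferred removeOnErr(&err, ...) that deletes the symlink when the named error is non-nil. Names and paths are illustrative, not the actual deployer/simple.go source; the hazard hazmat suspects is the deferred cleanup firing even though every visible step succeeded, e.g. if err is ever shadowed below the defer.]

    package deployer

    import "os"

    // removeOnErr removes path if the caller is returning a non-nil error.
    func removeOnErr(err *error, path string) {
        if *err != nil {
            os.RemoveAll(path)
        }
    }

    // deployUnit sketches the install sequence: link the tools, then install
    // the upstart job. If a later step assigns to a shadowed "err :=", the
    // outer err the deferred func inspects no longer reflects reality.
    func deployUnit(toolsDir, unitDir string) (err error) {
        if err = os.Symlink(toolsDir, unitDir); err != nil {
            return err
        }
        defer removeOnErr(&err, unitDir)
        // ... write the upstart job; on any error the symlink is removed ...
        return nil
    }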
* sinzui nods00:47
thumpersinzui: https://plus.google.com/hangouts/_/76cpik697jvk5a93b3md4vcuc8?hl=en00:49
sinzuiwallyworld, jam: looks like all the upgrade tests are indeed fixed. I disabled the local-upgrade test for thumper. I will retest when I have the time or when the next rev lands00:50
wallyworld\o/00:50
thumpersinzui: do local upgrade and local deploy run on the same machine?00:50
thumpersinzui: can't hear you00:50
wallyworldsinzui: so if thumper actually pulls his finger out, we could release 1.19.0 real soon now?00:50
hazmatdeployer worker is a bit strange .. does it use a tombstone to communicate back to the runner?00:53
hazmatthumper, when you have a moment i'd like to chat as well..00:56
thumperhazmat: ack00:56
wallyworldhazmat: the deployer worker is similar to most others, it is created by machine agent but wrapping it inside a worker.NewSimpleWorker00:59
hazmatwallyworld, ah. thanks01:00
wallyworldnp. that worker stuff still confuses me each time i have to re-read the code01:01
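[editor's note: a rough sketch of the wrapping wallyworld describes. The worker.NewSimpleWorker signature — a func(stop <-chan struct{}) error wrapped into a Worker — is assumed from the juju-core worker package of this era; treat the import path and names as illustrative.]

    package agent

    import "launchpad.net/juju-core/worker"

    // newDeployerWorker shows the shape: the machine agent hands a plain
    // function to NewSimpleWorker, which supplies the Kill/Wait plumbing.
    func newDeployerWorker() worker.Worker {
        return worker.NewSimpleWorker(func(stop <-chan struct{}) error {
            // ... watch for units to deploy or recall until asked to stop ...
            <-stop
            return nil
        })
    }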
hazmatthe pattern is a bit different01:04
hazmattrying to figure out why i'd get 2014-04-15 00:00:42 INFO juju runner.go:262 worker: start "1-container-watcher"  .. when there are no containers.. basically my manual provider + lxc seems a bit busted with 1.1801:04
hazmatalso trying to figure out if on a simpleworker erroring, if the runner will just ignore it and move on.01:04
hazmatwith no log01:04
hazmatthe nutshell being deploy workloads gets that jujud not found01:05
* hazmat instruments01:05
thumperhazmat: whazzup?01:08
hazmatthumper, trying to debug 1.18 with lxc + manual01:11
hazmatthumper, mostly in the backlog01:11
sinzuiWow.01:11
sinzuiabentley replaced the mysql + wordpress charms with dummy charms that instrument and report what juju is up to. They have taken 2-4 minutes off of all the tests01:13
sinzuiAzure deploy in under 20 minutes01:13
sinzuiAWS is almost as fast as HP Cloud01:14
davecheneysinzui: \o/01:16
waiganiwallyworld: should I patch envtools.BundleTools in a test suite e.g. coretesting? Or should I copy the mocked function to each package that is failing and patch there?01:17
waiganiwallyworld: it's just there seem to be a lot of tests that are all affected/fixed by this patch01:18
wallyworlduse s.PatchValue01:18
waiganiwallyworld: yep I am01:18
waiganibut should I do it in a more generic suite?01:18
wallyworldso if the failures are clustered in a particular suite, you can use that in SetUpTest01:18
wallyworldnot sure it's worth doing a fixture for a one liner01:19
waiganiwallyworld: that is what I'm doing now, but aready I've done that in about 4 packages, with more to go01:19
waiganiwallyworld: oh okay, you mean just patch in each individual test?01:19
wallyworldpossibly, depends on where the failures are01:20
waiganiokay, I'll do it the verbose way and we can cleanup in review if needed01:20
wallyworldbut if the failures are in a manageable number of suites, doing the patch in SetUpTest makes sense01:20
waiganiokay01:21
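[editor's note: the SetUpTest patching wallyworld suggests, sketched with a hypothetical suite; ToolsSuite and mockBundleTools are invented names, and envtools stands for the environs/tools package of this era. s.PatchValue registers a cleanup, so the original is restored in TearDownTest.]

    func (s *ToolsSuite) SetUpTest(c *gc.C) {
        s.BaseSuite.SetUpTest(c)
        // mockBundleTools is a hypothetical stand-in matching the signature
        // of envtools.BundleTools; the one-liner runs once per test instead
        // of being repeated in every failing test body.
        s.PatchValue(&envtools.BundleTools, mockBundleTools)
    }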
thumperwhat the actual fuck!01:28
sinzuiwallyworld, CI hates the unit-tests on precise. Have you seen these tests fail consistently in pairs before? http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/617/console01:33
sinzui^ The last three runs on different precise instances have the same failure01:34
thumpersinzui: I have some binaries copying to the machine01:34
wallyworldsinzui: i haven't seen those. and one of them, TestOpenStateWithoutAdmin, is the test added in the branch i landed for john to make upgrades work01:35
sinzuithank you thumper.01:35
wallyworldso it seems there's a mongo/precise issue01:35
wallyworldthumper: were you running some tests in a precise vm?01:36
thumperwallyworld: I have a real life precise machine01:37
thumperwallyworld: that it works fine01:37
thumperon01:37
thumperI've hooked up loggo to the mgo internals logging01:37
thumperso we can get internal mongo logging out of the bootstrap command01:37
thumperuploading some binaries now01:37
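[editor's note: a plausible sketch of the hookup thumper describes — mgo.SetLogger accepts anything with an Output(calldepth int, s string) error method, so a tiny adapter can forward mgo's internal logging into loggo. The logger name "juju.mgo" is invented.]

    package main

    import (
        "labix.org/v2/mgo"
        "launchpad.net/loggo"
    )

    var mgoLogger = loggo.GetLogger("juju.mgo")

    type mgoLogAdapter struct{}

    // Output satisfies the interface mgo.SetLogger expects.
    func (mgoLogAdapter) Output(calldepth int, s string) error {
        mgoLogger.Debugf("%s", s)
        return nil
    }

    func main() {
        mgo.SetLogger(mgoLogAdapter{})
        mgo.SetDebug(true) // emit mgo's internal traces through loggo
        // ... bootstrap / dial as usual ...
    }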
wallyworldhmm. so what's different on jenkins then to cause the tests to fail01:37
thumpernot sure01:37
thumpersame version of mongo01:38
thumpermy desktop is i38601:38
thumperci is amd6401:38
thumperthat is all I can come up with so far01:38
wallyworldif that is the cause then we're doomed01:38
thumper:-)01:38
thumperFSVO doomed01:38
wallyworldyeah :-)01:38
thumperthe error is that something inside mgo is explicitly closing the socket01:39
thumperwhen we ask to set up the replica set01:39
wallyworldthumper: so, one thing it could be - HA added an admin db01:39
thumperhence the desire for more logging01:39
thumperwallyworld: my binaries work locally01:39
thumperand copying up01:39
thumperif that is the case01:39
thumperand my binaries work01:39
wallyworldand the recently added test which i reference above tests that we can ignore unauth access to that db01:39
thumperit could be that01:39
* thumper nods01:40
wallyworldand that test fails01:40
thumperstill copying that file01:40
* thumper waits...01:40
* wallyworld waits too....01:40
thumperand here I was wanting to sleep01:40
thumpernot feeling too flash01:40
wallyworld:-(01:40
hazmatthumper, sinzui fwiw.  my issue was user error around series. i have trusty containers but had registered them as precise, machine agent deployed fine, unit agents didn't like it though. unsupported usage mode.01:41
thumperhaha01:41
hazmatthumper, conceivably the same happens when you dist-upgrade a machine01:41
sinzuithumper, wallyworld: the machines that run the unit tests are amd64 m1.larges for precise and trusty. We see 95% of users deploy to amd6401:41
thumperhmm...01:41
thumpersinzui: right...01:42
sinzuiwe saw numbers that showed a very small number were i386, we assume those are clients, not services01:42
* thumper nods01:42
thumperwallyworld: can I get you to try the aws reproduction?01:42
thumperwallyworld: are you busy with anything else?01:42
wallyworldi am but i can01:43
wallyworldwhat's up with aws?01:43
thumperjust trying to replicate the issues that we are seeing on CI with the local provider not bootstrapping01:43
thumperit works on trusty for me01:44
thumperand precise/i38601:44
thumperbut we should check real precise amd6401:44
wallyworldok, so you want to spin up an aws precise amd64 and try there01:44
thumperright01:45
wallyworldokey dokey01:45
thumperinstall juju / juju-local01:45
wallyworldyarp01:45
thumperprobably need to copy local 1.19 binaries01:45
thumperto avoid building on aws01:45
wallyworldright01:45
thumperugh...01:51
thumperman I'm confused01:51
thumperwallyworld: sinzui: using my extra logging http://paste.ubuntu.com/7253010/01:57
thumperso not a recent fix issue01:57
wallyworldthumper: we should just disable the replica set stuff01:58
wallyworldit has broken so much01:58
thumperperhaps worth doing for the local provider at least01:58
thumperwe are never going to want HA on local01:58
thumperit makes no sense01:58
sinzuiclosed explicitly? That's like the computer says no01:59
thumpersinzui: ack01:59
* thumper has a call now02:01
sinzuiaxw, Is there any more I should say about azure availability sets? https://docs.google.com/a/canonical.com/document/d/1BXYrLC78H3H9Cv4e_4XMcZ3mAkTcp6nx4v1wdN650jw/edit02:06
axwsinzui: otp02:12
wallyworldthumper: sinzui: i'm going to test this patch to disable the mongo replicaset setup for local provider https://pastebin.canonical.com/108522/02:18
wallyworldthis should revert local bootstrap to be closer to how it was prior to HA stuff being added02:19
wallyworldand hence it should remove the error in thumper's log above hopefully02:19
axwsinzui: can I have permissions to add comments?02:20
thumpersinzui: this line is a bit suspect 2014-04-15 02:20:44 DEBUG mgo server.go:297 Ping for 127.0.0.1:37019 is 15000 ms02:21
thumpersinzui: locally I have 0ms02:21
sinzuisorry axw I gave all canonical write access as I intended02:22
axwsinzui: ta02:22
* sinzui looks in /etc02:22
axwsinzui: availability-sets-enabled=true by default; I'll update the notes02:23
thumperwallyworld: that patch is wrong02:27
wallyworldi know02:27
wallyworldfound that out02:27
wallyworlddoing it differently02:27
thumperwallyworld: jujud/bootstrap.go line 165, return there if local02:28
wallyworldyep02:28
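[editor's note: illustrative only — the shape of the quick fix thumper points at in jujud/bootstrap.go: return before the replica-set setup when the provider is local. The names below are guesses, not the landed diff.]

    package bootstrap

    // maybeSetUpReplicaSet skips HA/replica-set initiation entirely for the
    // local provider, reverting it to pre-HA bootstrap behaviour.
    func maybeSetUpReplicaSet(providerType string) error {
        if providerType == "local" {
            return nil // local provider: no replica set, no HA
        }
        // ... initiate the replica set for all other providers ...
        return nil
    }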
axwsinzui: I updated the azure section, would you mind reading over it to see if it makes sense to you?02:37
sinzuiThank you axw. Looks great02:43
thumpersinzui: wallyworld, axw: bootstrap failure with debug mgo logs: http://paste.ubuntu.com/7253155/02:44
thumpersinzui: I don't know enough to be able to interpret the errors02:44
thumpersinzui: perhaps we need gustavo for it02:44
sinzuithanks for playing thumper02:44
wallyworldsinzui: can you re-enable local provider tests in CI? i will do a branch to try and fix it and then when landed CI can tell us if it works02:45
thumpersinzui: I'm done with the machine now02:45
sinzuiI will re-enable the tests02:45
wallyworldthanks02:46
wallyworldlet's see if the next branch i land works02:46
sinzuithumper, wallyworld . I think you had decided to disable HA on local...and how would I do HA with local...Does that other machine get proper access to my local machine that probably has died with me at the keyboard02:47
thumpersinzui: you wouldn't do HA with the local provider02:47
thumper:)02:47
wallyworldsinzui: we are trying to set up replicaset and other stuff which is just failing with local and for 1.19 t least, i can't see why we would want that02:48
sinzui:)02:48
wallyworldso to get 1.19 out, we can disable and think about it later02:48
sinzuiwallyworld, really, I don't think we ever need to offer HA for local provider.02:49
wallyworldmaybe for testing02:49
wallyworldbut i agree with you02:49
wallyworldi was being cautious in case others were attached to the idea02:49
wallyworldaxw: this should make local provider happy again on trunk: https://codereview.appspot.com/8783004403:45
axwwallyworld: was afk, looking now03:57
wallyworldta04:01
axwwallyworld: reviewed04:05
wallyworldta04:05
wallyworldaxw: everyone hates that we use local provider checks in jujud04:05
wallyworldbeen a todo for a while to fix04:06
axwyeah, I kind of wish we didn't have to disable replicasets at all though04:06
axwI know they're not needed, but if they just worked it would be nice to not have a separate code path04:06
wallyworldaxw: yeah. we could for 1.19.1, but we need 1.19 out the door and HA still isn't quite ready anyway04:07
wallyworldit is indeed a bandaid. nate added another last week also04:08
axwwallyworld: yep, understood04:09
wallyworldmakes me sad too though04:09
sinzuiwallyworld, Your hack solved local. The last probable issue is the broken unit tests for precise. I reported bug 130783605:12
_mup_Bug #1307836: Ci unititests fail on precise <ci> <precise> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1307836>05:12
wallyworldsinzui: yeah, i just saw that but didn't think you'd be awake05:13
sinzuiI don't want to be awake05:13
wallyworldi didn't realise we still had the precise issue :-(05:14
wallyworldi'll look at the logs05:14
wallyworldhopefully we'll have some good news when you wake up05:14
sinzuiwallyworld, azure-upgrade hasn't passed yet. It may not because azure is unwell this hour. We don't need to worry about a failure for azure. I can ask for a retest when the cloud is better05:22
wallyworldrighto05:23
* sinzui finds pillow05:23
wallyworldgood night05:23
davecheney  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND05:32
davecheney 7718 ubuntu    20   0 2513408 1.564g  25152 S  45.2 19.6   2:41.51 juju.test05:32
davecheneymemory usage for Go tests is out of control05:32
wallyworldjam1: you online?05:47
jam1morning wallyworld05:47
jam1I am05:47
wallyworldg'day05:47
wallyworldjam1: so with you branch, and one i did, CI is happy for upgrades05:47
wallyworldbut05:47
wallyworlda couple of tests fail under precise05:47
wallyworldthere's the one you added for your branch, plus TestInitializeStateFailsSecondTime05:48
jam1wallyworld: links to failing tests ?05:48
wallyworldthe error says that a connection to mongo is unauth05:48
wallyworldhttp://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/621/consoleFull05:48
jam1wallyworld: and are you able to see the local provider fail with replica set stuff, because neither Tim or I could reproduce it.05:48
wallyworldyeah, i saw it05:49
wallyworldand fixed05:49
wallyworldi had to disable HA for local provider05:49
jam1and while we don't have to have replica set local, I'd prefer consistency and the ability to test out HA locally if we could05:49
wallyworldsure05:49
wallyworldbut to get 1.19 out the door i went for a quick fix05:49
wallyworldwhich we can revisit in 1.19.105:49
wallyworldcurtis was ok with that05:50
jam1wallyworld: so I certainly had a WTF about why I was able to create a machine in "admin" but not able to delete it without logging in as the admin I just created.05:50
jam1wallyworld: so it seems like some versions of Mongo don't have that security hole05:50
jam1but I can't figure out how to log in as an actual admin, but I can try digging into the TestInitialize stuff a bit more for my test.05:50
wallyworldso we are using a different mongo on precise vs trusty?05:51
jam1wallyworld: 2.4.6 vs 2.4.905:51
wallyworldok, i didn't realise that05:51
jam1Trusty is the one that lets you do WTF stuff.05:51
wallyworld:-(05:51
wallyworldthere are 2 failing tests05:51
wallyworldmaybe more, i seem to recall previous logs showing more05:51
wallyworldbut the latest run had 2 failures only05:52
wallyworldthe other one was TestInitializeStateFailsSecondTime05:52
wallyworldjam1: i gotta run to an appointment soon, but will check back when i return. if we can get this sorted, we can at least release 1.19.0 asap and deal with the workarounds for 1.19.105:53
jam1wallyworld: is your code landed?05:54
wallyworldyep05:54
jam1k05:54
wallyworldhappy to revert it if we can find a fix05:54
jam1I'll pick it up05:54
wallyworldthanks, i can look also but only found out about precise tests just before and sadly i gotta duck out05:55
jam1hmm... LP failing to load for me right now05:56
davecheneywallyworld: CI is running an ancient version of mongo05:56
davecheneythat won't help05:56
jam1davecheney: sinzui: I would think we should run mongo 2.4.6 which is the one you get from the cloud-archive:tools06:02
davecheneyjam1: agreed06:04
jam1davecheney: are they running 2.2.4 from the PPA?06:07
davecheneyjam1: good point, 2.0 was all that shipped in precise06:09
jam1I'm just trying to find a way to reproduce, and I thought there was a 2.4.0 out there for a while, but I can't find it06:09
jam1and it isn't clear *what* version they are running.06:10
davecheneyjam1: Get:40 http://ppa.launchpad.net/juju/stable/ubuntu/ precise/main mongodb-clients amd64 1:2.2.4-0ubuntu1~ubuntu12.04.1~juju1 [20.1 MB]06:10
davecheneyGet:41 http://ppa.launchpad.net/juju/stable/ubuntu/ precise/main mongodb-server amd64 1:2.2.4-0ubuntu1~ubuntu12.04.1~juju1 [5,135 kB]06:10
davecheneythis is our fault06:10
davecheneyremember that old ppa06:10
jam1yep, thanks for pointing me to it06:10
jam1well, I can at least test with it.06:10
davecheneyso, that isn't the cloud archive06:10
davecheney:emoji concerned face06:10
jam1At one point we probably wanted to maintain compat with 2.2.4, but I'm not *as* concerned with it anymore.06:10
davecheney2.2.4 never shipped in any main archive06:12
davecheneyi don't think we have a duty of compatibility06:12
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/1307289/comments/106:12
davecheneyif anyone cares06:12
davecheneybtw, go test ./cmd/juju{,d}06:12
davecheneytakes an age because the test setup is constantly recompiling the tools06:12
davecheneywhy are the cmd/juju tests calling out to bzr ?06:27
davecheney FAIL: publish_test.go:75: PublishSuite.SetUpTest06:33
davecheneypublish_test.go:86:06:33
davecheney    c.Assert(err, gc.IsNil)06:33
davecheney... value *errors.errorString = &errors.errorString{s:"error running \"bzr init\": exec: \"bzr\": executable file not found in $PATH"} ("error running \"bzr init\": exec:06:33
davecheney \"bzr\": executable file not found in $PATH")06:33
davecheneywhat is this shit ?06:33
rogpeppemornin' all06:44
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/130786506:47
davecheneythis seems like an obvious failure06:47
davecheneywhy does it only happen sporadically ?06:47
rogpeppedavecheney: that's been the case for over a year (tests running bzr)06:47
davecheneyrogpeppe: fair enough06:48
rogpeppedavecheney: i agree, that does seem odd06:48
jam1rogpeppe: do we have thoughts on how we would have a Provider work that didn't have storage? I know we don't particularly prefer the HTTP Storage stuff that we have.06:50
rogpeppejam1: we'd need to provide something to the provider that enabled it to fetch tools from the mongo-based storage06:51
jam1rogpeppe: so we'd have to do away with "provider-state" file as well, right?06:51
rogpeppejam1: other than that, i don't think providers rely much on storage, do they?06:51
jam1rogpeppe: we use it for charms06:51
rogpeppejam1: so... provider-state is *supposed* to be an implementation detail of a given provider06:52
jam1sure06:52
jam1it is in the "common code" path, but you wouldn't have to use it/could make that part optional06:52
rogpeppejam1: we don't really rely on it much these days06:53
jam1rogpeppe: we'd want bootstrap to cache the API creds and then we rely on it very little06:53
jam1you'd lose the fallback path06:53
rogpeppejam1: yeah, and we don't want to lose that entirely06:54
rogpeppejam1: for a provider-state replacement, i'd like to see the fallback path factored out of the providers entirely06:54
jam1well, it only works because there is a "known location" we can look in that is reasonably reliable. If a cloud doesn't provide its own storage, then any other location is just guesswork06:54
jam1anyway, switching machines now06:55
rogpeppejam1: ok06:55
rogpeppeaxw: looking at http://paste.ubuntu.com/7252280/, in the first status machines 3 and 4 are up AFAICS.06:56
rogpeppeaxw: and that's the status that i am presuming that ensure-availability was acting on06:57
axwrogpeppe: in the first one, yes, but how do you know when they went down?06:57
axwrogpeppe: my point was it could have changed since you did "juju status"06:58
rogpeppeaxw: there was a very short time between the first status and calling ensure-availability. i don't see any particular reason for it to have gone down in that time period, although of course i can't be absolutely sure06:58
axwright, that's why I asked about the log. I'm really only guessing06:58
rogpeppeaxw: luckily i still have all the machines up, so i can check the log06:59
axwrogpeppe: I see no reason why the agent would have gone down after calling ensure-availability either06:59
axwcool06:59
rogpeppeaxw: it would necessarily go down after calling ensure-availability, because mongo reconfigures itself and agents get thrown out07:00
axwrogpeppe: for *all* machines? not just the shunned ones?07:01
rogpeppeaxw: yeah07:01
rogpeppeaxw: we could really do with some logging in ensure-availability to give us some insight into why it's making the decisions it is07:01
axwyeah, fair enough07:02
=== vladk|offline is now known as vladk
rogpeppeaxw: here's the relevant log: http://paste.ubuntu.com/7252375/07:16
rogpeppeaxw: the relevant EnsureAvailability call is the second one, i think07:17
rogpeppeaxw: it's surprising that the connection goes down so quickly after that call07:17
axwrogpeppe: wrong pastebin?07:17
rogpeppeaxw: ha, yes: http://paste.ubuntu.com/7253848/07:18
axwrogpeppe: machine-3's API workers have dialled to machine-0's API server ...07:38
axwrogpeppe: not saying that's the cause, but it's strange I think07:38
rogpeppeaxw: that's not ideal, but it's understandable07:39
rogpeppeaxw: one change i want to make is to make every environ manager machine dial the API server only on its own machine07:39
axwyep07:39
jamaxw: rogpeppe: right, we originally only wrote "localhost" into the agent.conf. I think the bug is that the connection caching logic is overwriting that ?07:42
rogpeppejam: yeah - each agent watches the api addresses and caches them07:42
jamrogpeppe: I thought when we spec'd the work we were going to explicitly skip overwriting when the agents were "localhost"07:43
rogpeppejam: but also, the first API address received by a new agent is not going to be localhost07:43
jamrogpeppe: well, the thing that monitors it could just do if self.IsMaster() => localhost07:43
rogpeppejam: i don't remember that explicitly07:44
jamor not run the address poller if IsMaster07:44
jamsorry07:44
jamIsManager07:44
jamnot Master07:44
rogpeppejam: i don't think it's IsMaster - i think it's is-environ-manager07:44
rogpeppejam: right07:44
rogpeppejam: i've been thinking about whether to run the address poller if we're an environ manager07:45
rogpeppes/poller/watcher/07:45
rogpeppejam: my general feeling is that it is probably worth it anyway07:45
rogpeppejam: because machines can lose their environment manager status07:46
rogpeppejam: even though we don't fully support that yet07:46
jamrogpeppe: won't they get bounced under that circumstance?07:47
jamanyway, we can either simplify it by what we write in agent.conf, or we could detect that we are IsManager and if so force localhost at api.Open time.07:47
rogpeppejam: they'll get bounced, but if they do we want them to know where the other API hosts are07:48
rogpeppejam: i was thinking of going for your latter option above07:48
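[editor's note: a sketch of "the latter option" — forcing localhost at api.Open time when the agent is an environment manager. All names here are hypothetical; the real change would live wherever API addresses are chosen.]

    package api

    import "fmt"

    // preferLocalhost puts the machine's own API server first when the agent
    // is an environ manager, so managers always dial localhost but still keep
    // the cached peer addresses as a fallback.
    func preferLocalhost(isManager bool, cached []string, apiPort int) []string {
        if !isManager {
            return cached
        }
        local := fmt.Sprintf("localhost:%d", apiPort)
        return append([]string{local}, cached...)
    }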
axwrogpeppe: I can't really see much from the logs, I'm afraid. there is one interesting thing: "dialled mongo successfully" just after FullStatus and before EnsureAvailability07:51
rogpeppeaxw: i couldn't glean much from them either07:51
rogpeppeaxw: i'm just doing a branch that adds some logging to EnsureAvailability07:51
rogpeppeaxw: then i'll try the live tests again to see if i can see what's going on07:52
axwrogpeppe: any idea why agent-state shows up as "down" just after I bootstrap? should FullStatus be forcing a resynchronisation of state?07:55
rogpeppeaxw: i think it's because the presence data hasn't caught up07:55
axwrogpeppe: oh. I wonder if that's it? FullStatus may be reporting wrong agent state in your test too07:55
rogpeppeaxw: we should definitely look into that07:55
rogpeppeaxw: i think that FullStatus probably sees the same agent state that the ensure availability function is seeing07:56
axwrogpeppe: yeah, true07:56
axwrogpeppe: https://codereview.appspot.com/8803004308:09
rogpeppeaxw: nice one! looking.08:09
axwjam: I've reverted your change from last night that eats admin login errors; this CL adds machine-0 to the admin db if it isn't there already08:10
jamaxw: any chance that we could get the port from mongo rather than passing it in?08:11
axwrogpeppe: this is just the bare minimum, will follow up with maybeInitiateMongoServer, etc.08:11
axwjam: can do, but it requires parsing and I thought it may as well get passed in since it's already known to the caller08:11
jamaxw: well we can have mongo start on port "0" and dynamically allocate, rather than our current ugly hack of allocating a port, and then closing it and hoping we don't race.08:12
axwjam: I assume you are referring to the EnsureAdminUserParams.Port field08:12
jamaxw: if it is clumsy to parse, then we can pass it in.08:12
axwoh I see what you mean08:12
axwumm. dunno. I will take a look08:13
jamwe *can* just start on port 37017, but that means other goroutines will also think that mongo is up, and for noauth stuff, we really want as little as possible to connect to it.08:13
jamaxw: I always get thrown off by "upstart.NewService" because *no we don't want to create a new upstart service*08:14
jambut that is just "create a new memory representation of an upstart service"08:14
axwjam: heh yeah, it is a misleading name08:15
jamaxw: I'm not sure why upstart specifically throws me off.08:15
jamas I certainly know the pattern.08:15
jamaxw: can "defer cmd.Process.Kill()" do bad things if the process has already died ?08:16
jamaxw: is it possible to do EnsureAdminUser as an upgrade step rather than doing it on all boots?08:16
axwjam: if the pid got reused very quickly, yes I think so08:17
jamaxw: I'm not particularly worried about PID reuse that fast08:17
axwjam: not really feasible as an upgrade step, as they require an API connection08:17
jamI'm more wondering about a panic because the PID didn't exist08:17
axwthen there's all sorts of horrible interactions with workers dying and restarting all the others, etc.08:17
axwjam: I'm pretty certain it's safe, but I'll double check08:19
wallyworldjam: hi, any update on the precise tests failures?08:19
axwjam: late Kill does not cause a panic08:22
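[editor's note: a quick standalone check of the behaviour axw confirms — Kill on an already-finished process returns an error rather than panicking (the PID-reuse caveat aside).]

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("true")
        if err := cmd.Start(); err != nil {
            panic(err)
        }
        cmd.Wait() // let the process exit
        err := cmd.Process.Kill()
        fmt.Println("late kill:", err) // "os: process already finished", no panic
    }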
jamwallyworld: they pass with mongo 2.4.6 from cloud-archive:tools, they fail with 2.2.4 from ppa:juju/stable08:23
jamon all machines that matter we use cloud-archive:tools08:23
jamwallyworld: so CI should be using that one08:23
wallyworldgreat, so we can look to release 1.1908:23
jamwallyworld: and axw has a patch that replaces my change anyway.08:23
jamwallyworld: the replicaset failure isn't one that I could reproduce...08:23
jamsince it is flaky08:24
wallyworldhmmm. i hate those08:24
wallyworldCI could reproduce it08:24
jamwallyworld: it is *possible* we just need to wait longer, but I hate those as well :)08:24
axwjam: this is what happens if you try to use "--port 0" in mongod: http://paste.ubuntu.com/7254007/08:24
jamaxw: bleh.... ok08:25
jamI don't think we want to use the "default mongo port of 27017" so we might as well use our own since we know we just stopped the machine08:25
jamstopped the service08:25
rogpeppeaxw: reviewed08:33
axwthanks08:34
rogpeppejam: using info.StatePort seems right to me (at least in production).08:37
jamrogpeppe: for "bring this up in single user mode so we can poke at secrets and then restart it" I'd prefer it was more hidden than that, but I can live with StatePort being good-enough.08:38
rogpeppejam: if there's someone sitting on localhost waiting for the fraction of a second during which we add the admin user, i think the user is probably not going to be happy anyway08:39
rogpeppejam: note that the vulnerability is *only* to processes running on the local machine08:40
rogpeppejam: and if there are untrusted processes running on the bootstrap machine, they're in trouble anyway08:40
jamrogpeppe: I'm actually more worried about the other goroutines in the existing process waking up, connecting, thinking to do work, and then getting shut down again.08:41
jamrogpeppe: more from a cleanliness than a "omg we broke security" perspective08:42
rogpeppejam: what goroutines would those be?08:42
jamrogpeppe: so this is more about "lets not force ourselves to think critically about everything we are doing and be extra careful that we never run something we thought we weren't". Vs "just don't expose something we don't want exposed so we can trust nothing can be connected to it."08:43
rogpeppejam: AFAIK there are only two goroutines that connect to the state - the StateWorker (which we're in, and which hasn't started anything yet) and the upgrader (which requires an API connection, which we can't have yet because the StateWorker hasn't come up yet).08:44
rogpeppejam: even if we *are* allowed to connect to the mongo, i don't think we can do anything nasty accidentally08:45
rogpeppejam: well, i suppose we could if we were malicious08:46
axwrogpeppe: I tested by upgrading from 1.18.1. that's good enough right?08:47
rogpeppeaxw: i think so, yeah08:47
waiganiwa09:26
waiganiwallyworld: branch is up: https://codereview.appspot.com/87130045 :)09:26
waiganilbox didn't update the description on codereview, but did on lp??09:27
waiganianyway, bedtime for me.09:27
waiganinight all09:27
natefinchmorning all09:44
jammorning natefinch09:47
rogpeppeaxw: https://codereview.appspot.com/8808004309:48
rogpeppeaxw: a bit of a refactoring of EnsureAvailability - hope you approve09:48
wallyworldwallyworld: ok09:48
axwrogpeppe: cooking dinner, will take a look a bit later09:49
rogpeppejam, natefinch, mgz: review of above would be appreciated09:52
natefinchrogpeppe: sure09:54
rogpeppenatefinch: have you pushed your latest revision of 041-moremongo ?10:00
rogpeppenatefinch: (i want to merge it with trunk, but i don't want us to duplicate that work, as wallyworld's recent changes produce fairly nasty conflicts)10:01
wallyworldrogpeppe: if you can fix local provider, feel free to revert my work10:01
wallyworldi only landed it to get 1.19 out the door10:01
wallyworldand local provider + HA (mongo replicasets) = fail :-(10:02
rogpeppewallyworld: it seemed to work ok for me actually10:02
wallyworldnot for me or CI sadly10:02
natefinchrogpeppe: it's pushed now10:03
rogpeppewallyworld: how did it fail?10:03
wallyworldCI has been broken for days10:03
wallyworldmongo didn't start10:03
rogpeppepwd10:03
wallyworldhence machine agent didn't come up10:03
rogpeppewallyworld: what was the error from mongo?10:03
jamespagesinzui, I think I just got an ack to use 1.16.6 via SRU to support the MRE for juju-core10:04
wallyworldum, can't recall exactly, it will be in the CI logs10:04
jamespagesinzui, I'll push forwards getting it into proposed this week10:04
wallyworldmy local dir is now blown away10:04
rogpeppewallyworld: np, just interested10:04
wallyworldsorry, i should have taken better notes10:05
wallyworldrogpeppe: i think that there wasn't much in the mongo logs from memory, tim had to enable extra logging10:05
wallyworldhe was debugging why stuff fails on precise10:05
wallyworldbut we know now that's due to 2.4.6 vs 2.4.910:06
natefinchrogpeppe: are there tests for that EnsureAvailability code?10:11
rogpeppenatefinch: yes10:11
natefinchrogpeppe:  cool10:11
rogpeppenatefinch: the semantics are unaffected, so the tests remain the same10:11
natefinchrogpeppe:  awesome, that's what I figured.10:12
axwrogpeppe: reviewed. thanks, it's a little clearer now10:19
rogpeppeaxw: thanks a lot10:22
jamwallyworld: rogpeppe: The error I saw in CI was when Initiate went to do a replicaSet operation, it would get an Explicitly Closed message.10:29
jamNote, though, that CI has been testing with mongo 2.2.4 for quite some time.10:29
jam(and still is today, AFAIK, though I'm trying to push to get them to upgrade)10:30
rogpeppejam: interestin10:30
rogpeppeg10:30
jamrogpeppe: https://bugs.launchpad.net/juju-core/+bug/130621210:30
_mup_Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>10:30
wallyworldyes, i do recall that was one of the errors10:30
jam2014-04-10 04:57:43 INFO juju.replicaset replicaset.go:36 Initiating replicaset with config replicaset.Config{Name:"juju", Version:1, Members:[]replicaset.Member{replicaset.Member{Id:1, Address:"10.0.3.1:37019", Arbiter:(*bool)(nil), BuildIndexes:(*bool)(nil), Hidden:(*bool)(nil), Priority:(*float64)(nil), Tags:map[string]string(nil), SlaveDelay:(*time.Duration)(nil), Votes:(*int)(nil)}}} 2014-04-10 04:58:18 ERROR juju.cmd supercommand.go:299 cannot initiat10:30
jamrogpeppe: natefinch: I wrote this patch https://code.launchpad.net/~jameinel/juju-core/log-mongo-version/+merge/215656 to help us debug that sort of thing if anyone wants to review it10:31
wallyworldalthough i'm running 2.4.9 locally and still has issues10:31
wallyworldhad10:31
jamwallyworld: interesting, as neither myself nor tim were able to reproduce it10:31
jamand I tried 2.4.9 on Trusty and 2.4.6 on Precise10:31
jamlocal bootstrap always just worked10:32
rogpeppenatefinch: i've merged trunk now - you can pull from lp:~rogpeppe/juju-core/natefinch-041-moremongo10:32
wallyworldall i know is that it didn't work, and then i disabled --replSet from the upstart script and it worked10:32
jamthough... hmmm. I did run into godeps issues once, so it is possible juju bootstrap wasn't actually the trunk I thought it was.10:32
wallyworldand that also then fixed CI10:32
natefinchjam: I think I've seen the explicitly closed bug once or twice.10:33
jamnatefinch: CI has apparently been seeing it reliably for 4+ days10:33
jamwallyworld: CI passed local-deploy in r 2628 http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/local-deploy/10:34
jamand now even with axw's 2629 patch10:34
natefinchjam: google brings up a 2012 convo with gustavo about it where the culprit seemed to be load on mongo, but not definitively.  We should mention it to him10:35
jamnatefinch: given this is during bootstrap, there should be 0 load on mongo10:35
wallyworldjam: 2628 was my patch to make it work10:35
jamwallyworld: certainly, just mentioning CI saw it and was happy again10:36
wallyworlddon't worry, i was watching it :-)10:36
jamnatefinch: so I just checked the previous 6 attempts, and all of them failed with replica set: Closed explicitly.10:36
natefinchrogpeppe: thanks10:37
jamnatefinch: note that 2.2.4 failed other tests using TestInitiate10:37
natefinchjam: the important part was: we should talk to Gustavo10:37
natefinch(where we probably means me :)10:37
jamwith being unable to handle admin logins.10:37
natefinchjam: interesting10:37
natefinchjam: any chance we can abandon 2.2.4?10:38
* natefinch loves dropping support for things10:38
jamnatefinch: hopefully. It shouldn't be used in the field. It is only in our ppa, which means only Quantal gets it.10:38
jamhttp://docs.mongodb.org/v2.4/reference/method/db.addUser/10:38
jamsays it was changed in 2.410:38
jamand is superseded in 2.6 by createUser10:39
jamnatefinch: and the mgo docs say we should be using UpsertUser: http://godoc.org/labix.org/v2/mgo#Database.UpsertUser10:39
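[editor's note: a minimal example of the UpsertUser call jam links to, against the labix.org/v2/mgo API; the address, credentials and role are placeholders.]

    package main

    import "labix.org/v2/mgo"

    func main() {
        session, err := mgo.Dial("localhost:37017")
        if err != nil {
            panic(err)
        }
        defer session.Close()
        // UpsertUser creates the user, or updates it if it already exists,
        // sidestepping addUser's behaviour changes across 2.2/2.4/2.6.
        admin := session.DB("admin")
        if err := admin.UpsertUser(&mgo.User{
            Username: "machine-0",
            Password: "example-password",
            Roles:    []mgo.Role{mgo.RoleDBAdmin},
        }); err != nil {
            panic(err)
        }
    }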
mgzwe drop quantal support... tomorrow10:40
jamnatefinch: seems like mongo's security model is unstable over 2.2/2.4/2.6 which doesn't bode very well for us managing compatibility10:40
mgzno, end of the week10:40
jammgz: well it wouldn't be hard to just put 2.4.6 into ppa:juju/stable10:40
jamregardless of Q10:40
natefinchjam: that seems wise10:41
jammgz: that would also "fix" CI, because they seem to install it from the PPA as well10:41
mgzwell, we'd have upgrade questions like that10:41
mgzbut yeah10:41
jamjamespage: ^^ is it possible to get 2.4.6 into ppa:juju/stable ?10:41
natefinchrogpeppe: if you want to work on that moremongo branch, I can try to get that localhost stateinfo branch in a testable state.10:43
rogpeppenatefinch: ok10:43
rogpeppenatefinch: what more needs to be done in the moremongo branch?10:43
jamespagejam: context?10:43
jamjamespage: CI and Quantal users will install MongoDB from ppa:juju/stable, but it is currently 2.2.4 which is "really old" now.10:44
jamSo if we could just grab the one in cloud-archive:tools (2.4.6) it would make our lives more consistent.10:44
jamI believe that is the version in Saucy, and Trusty has 2.4.910:44
jamespagejam: I've pushed a no-change backport of 2.4.6 for 12.04 and12.10 into https://launchpad.net/~james-page/+archive/juju-stable-testing10:54
jamespagejust to see if it works10:54
jamespageI have a suspicion that its not a no-change backport10:54
jamjamespage: we only really need it for P for the CI guys10:54
jamsince Q is going EOL10:55
jamjamespage: we can potentially just point them at cloud-archive:tools if it is a problem11:06
jamespagejam: that might be better11:28
jamespagejam: that way they will get the best mongodb that has been released with ubuntu11:28
jamjamespage: well, we need them to be testing against the version that we'll be installing more than "just the best", but given that we install from there ourselves it seems to fit.11:30
jamThere is a question about U11:30
jamgiven that it won't be in cloud-tools11:30
jamso we may have to do another PPA trick11:30
jamnatefinch: the recent failure of rogpeppe's branch in TestAddRemoveSet is interesting. It seems to be spinning on: attempting Set got error: replSetReconfig command must be sent to the current replica set primary.11:32
jamcontext: https://code.launchpad.net/~rogpeppe/juju-core/548-destroy-environment-fix/+merge/21569711:33
jamespagejam: what about U?11:34
jamjamespage: in the version after Trusty, how do we install the "best for U". For Q we had to use the ppa:juju/stable because P was the only thing in cloud-archive:tools11:35
jamwhich then got out of date11:35
jamWe didn't have to for S because the "best" was the same thing as in cloud-archive:tools11:35
jamespagejam: the best for U will be in U11:35
jamjamespage: well, it wasn't in Q11:36
jamand when V comes out, it may no longer be the best for U, right?11:36
jamespagejam: that's probably because 18 months ago this was all foobar11:36
jamespagego juju did not exist in any meaningful way11:36
jamjamespage: sure. I can just see that 2.6 is released upstream, and we may encounter another "when do we get 2.6 in Ubuntu" where the threshold is at an inconvenient point11:37
jamespagejam: you must maintain 2.4 compat as that's whats in 14.0411:37
rogpeppejam, natefinch, mgz: how about this? http://paste.ubuntu.com/7254781/11:42
mgzrogpeppe: seems reasonable11:43
mgzI prefer interpreted values to raw dumps of fields in status11:43
mgzas it's the funny mix between for-machines markup and textual output for the user11:44
natefinchrogpeppe: when does a machine get into n, n?11:44
rogpeppenatefinch: when it's deactivated by the peergrouper worker11:45
natefinchbut why would that happen?11:45
rogpeppenatefinch: ok, here's an example:11:45
rogpeppenatefinch: we have an active server (wantvote, hasvote)11:46
rogpeppenatefinch: it dies11:46
rogpeppenatefinch: we run ensure-availability11:46
rogpeppenatefinch: which sees that the machine is inactive, and marks it as !wantsvote11:46
rogpeppenatefinch: the peergrouper worker sees that the machine no longer wants the vote, and removes its vote11:47
rogpeppenatefinch: and sets its hasvote status to n11:47
rogpeppenatefinch: so our machine now has status (!wantsvote, !hasvote)11:47
rogpeppenatefinch: if we then run ensureavailability again, that machine is now a candidate for having its environ-manager status removed11:48
rogpeppenatefinch: alternatively, the machine might come back up again11:48
natefinchI see, so hasvote is actual replicaset status, and wants vote is what we want the replicaset status to be11:48
rogpeppenatefinch: yes11:48
natefinchsorry gotta run, forgot it's tuesday11:49
jammgz: so the branch up for review (which is approved) actually has the errors as a prereq11:49
mgzjam: yeah, I was sure there was something like that11:50
rogpeppenatefinch: i've dealt with a bunch more conflicts merging trunk and pushed the result: ~rogpeppe/juju-core/natefinch-041-moremongo11:54
rogpeppenatefinch: ping12:20
=== wesleymason is now known as wes_
=== wes_ is now known as wesleymason
* rogpeppe goes for lunch12:32
natefinchrogpeppe: sorry, just got back12:42
axwrogpeppe natefinch: I'll continue looking at HA upgrade - upstart rewriting and MaybeInitiateMongoServer in the machine agent. Let me know if there's anything else I should look at12:54
natefinchaxw: that seems like a good thing to do for now.  the rewriting should work as-is, once we remove the line that bypasses it12:55
axwnatefinch: it doesn't quite, because the replset needs to be initiated too12:56
axwnatefinch: and that's slightly complicated because that requires the internal addresses from the environment12:56
natefinchaxw: you should be able to get the addresses off the instance and pass it into SelectPeerAddress, and get the right one.  That's what jujud/bootstrap.go does.  Should work in the agent, too, I'd think12:58
axwnatefinch: yep, the only problem is getting the Environ. the bootstrap agent gets a complete environ config handed to it; the machine agent needs to go to state12:59
axwnatefinch: anyway, I will continue on with that. if you think of something else I can look at next, maybe just send me an email13:00
natefinchaxw: will do, and thanks13:00
axwnps13:00
rogpeppenatefinch: that's ok13:29
rogpeppenatefinch: how's localstateinfo coming along?13:30
rogpeppemgz, jam, natefinch: trivial (two line) code review anyone? fixes a sporadic test failure. https://codereview.appspot.com/8813004413:35
natefinchrogpeppe: haven't gotten far this morning.  My wife should be back any minute to take the baby off my hands, which will make things go faster13:35
rogpeppenatefinch: k13:35
jamrogpeppe: shouldn't there be an associated test sort of change ?13:36
natefinchrogpeppe: how does that change fix the test failure?13:36
rogpeppejam: the reason for the fix is a test failure13:37
rogpeppejam: i can add another test, i guess13:37
natefinchideally a test that otherwise always fails :)13:37
jamrogpeppe: so this is that sometimes during teardown we would hit this and then not restart because it was the wrong type ?13:37
rogpeppejam: the test failure was this: http://paste.ubuntu.com/7255340/13:38
rogpeppejam: i'm actually not quite sure why it is sporadic13:39
natefinchI see, we always expect it to be errterminateagent, but we were masking that along with other failures13:40
rogpeppenatefinch: yes13:40
natefinchrogpeppe: how does the defer interact with locally scoped err variables inside if statements etc?13:41
natefinchmaybe that's the problem?  It's modifying the outside err, but we're returning a different one13:41
rogpeppenatefinch: the return value is assigned to before returning13:41
natefinchahgh right13:41
rogpeppenatefinch: from http://golang.org/ref/spec#Return_statements: "A "return" statement that specifies results sets the result parameters before any deferred functions are executed."13:42
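[editor's note: the spec behaviour rogpeppe quotes, in miniature — the deferred func runs after the result parameter is set, so it sees and can rewrite the returned error. This is exactly how a blanket "add context" defer can mask a sentinel error like ErrTerminateAgent.]

    package main

    import (
        "errors"
        "fmt"
    )

    func do() (err error) {
        defer func() {
            if err != nil {
                err = fmt.Errorf("some context: %v", err) // replaces the original value
            }
        }()
        return errors.New("terminate agent")
    }

    func main() {
        fmt.Println(do()) // some context: terminate agent
    }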
jamrogpeppe: so it looks like you only run into it if you get ErrTerminate before init actually finishes13:44
rogpeppejam: i'm not sure why the unit isn't always dead for this test on entry to Uniter.init13:46
rogpeppejam: tbh i don't want to take up the rest of my afternoon grokking the uniter tests - i'll leave this alone until i have some time.13:46
rogpeppejam: (i agree that it indicates a lacking test in this area)13:47
jamrogpeppe: so LGTM for the change, though it does raise the question that if we wrapped errors without dropping context it might have worked as  well :)13:52
rogpeppejam: yeah, i know13:52
rogpeppejam: but i'd much prefer it if we have a wrap function that explicitly declares the errors that can pass back13:53
rogpeppejam: then we can actually see what classifiable errors the function might be returning13:53
rogpeppejam: there are 9 possible returned error branches in that function - it's much easier to modify the function if you know which of those might be relied on for specific errors13:56
natefinchrogpeppe: that would be pretty useful in a defer statement, since it would then be right next to the function definition, as well.13:56
rogpeppenatefinch: perhaps13:56
rogpeppenatefinch: tbh i'm not keen on ErrorContextf in general13:56
rogpeppenatefinch: it just adds context that the caller already knows13:57
natefinchrogpeppe: yes, I wouldn't have it change the message, just filter the types.  I don't want to have to troll through the code in a function to figure out what errors it can return13:57
rogpeppenatefinch: the doc comment should state what errors it can return13:58
rogpeppenatefinch: and i'd put a return errgo.Wrap(err) on each error return13:59
rogpeppenatefinch: (errgo.Wrap(err, errgo.Is(worker.ErrTerminateAgent) for paths where we care about the specific error)14:00
rogpeppenatefinch: i know what you mean about having the filter near the top of the function though14:01
natefinchrogpeppe: btw for want/hasvote, what about : non-member, pending-removal, pending-add, member?  I feel like inactive and active sound too ephemeral, like it could change at any minute, when in fact, it's likely to be a very stable state.   But maybe I'm over thinking it.14:17
jamnatefinch: fwiw I like your terms better14:18
rogpeppenatefinch: those terms aren't actually right, unfortunately.14:18
rogpeppenatefinch: there's no way of currently telling if a machine's mongo is a member of the replica set14:19
rogpeppenatefinch: even if a machine has WantVote=false, HasVote=false, it may still be a member14:20
rogpeppenatefinch: basically, every state server machine will be a member unless it's down14:21
rogpeppenatefinch: how about "activated" and "deactivated" instead of "active" and "inactive" ?14:22
natefinchrogpeppe: isn't the intended purpose of those with y/y that they're in the replicaset?  I guess if it doesn't reflect the replicaset, what does it reflect?14:24
rogpeppenatefinch: the intended purpose of those with y/y is that they are *voting* members of the replica set14:24
natefinchI see14:25
rogpeppenatefinch: we can have any number of non-voting members14:25
rogpeppenatefinch: (and that's important)14:25
natefinchmember-status: non-voting, pending-unvote, pending-vote, voting?    I know unvote is not a word, but pending-non-voting is too long and confusing.14:29
sinzuijamespage, I sent a reply about 1.16.4.14:31
jamespagesinzui, so the backup/restore bits are not actually in the 1.16 branch?14:32
sinzuijamespage, no backup14:33
jamespagehmm14:33
natefinchrogpeppe: check my last msg14:33
sinzuirestore aka update-bootstrap worked for customers who had the bash script14:33
sinzuijamespage, by not installing juju-update-bootstrap, I think we can show that no new code was introduced to the system14:35
rogpeppenatefinch: "not voting", "adding vote", "removing vote", "voting" ?14:36
perrito666jamespage: sinzui I assigned myself https://bugs.launchpad.net/juju-core/+bug/1305780?comments=all just fyi14:36
_mup_Bug #1305780: juju-backup command fails against trusty bootstrap node <backup-restore> <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1305780>14:36
natefinchrogpeppe: sure, that's good14:36
rogpeppenatefinch: although i'm not entirely happy with the vote/voting difference14:37
rogpeppenatefinch: how about: "no vote", "adding vote", "removing vote", "has vote" ?14:37
sinzuijam, wallyworld: the precise unit tests are now running with mongo from ctools. I have a set of failures. They are different from before. CI is automatically retesting. I am hopeful14:37
natefinchrogpeppe: "voting, pending removal" "not voting, pending add"?  That makes it a little more clear that even though the machine is not going to have the vote in a little bit, it actually still does right now14:38
natefinch(and vice versa)14:38
rogpeppenatefinch: i think that it's reasonable to assume that if something says "removing x" that x is currently there to be removed14:39
rogpeppenatefinch: likewise for adding14:39
natefinchrogpeppe: fair enough14:39
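[editor's note: the labels the two settle on, as a tiny mapping; the field names are made up for the sketch — WantsVote is the intended state, HasVote what the peergrouper has actually applied.]

    package main

    import "fmt"

    func voteStatus(wantsVote, hasVote bool) string {
        switch {
        case wantsVote && hasVote:
            return "has vote"
        case wantsVote:
            return "adding vote" // vote requested, peergrouper yet to grant it
        case hasVote:
            return "removing vote" // vote being withdrawn by the peergrouper
        default:
            return "no vote"
        }
    }

    func main() {
        fmt.Println(voteStatus(true, false)) // adding vote
    }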
rogpeppeha, i've just discovered that if you call any of the Patch functions in SetUpSuite, the teardown functions never get called.14:43
mgzrogpeppe: heh, yeah, another reason teardown is generally dangerous14:44
hazmatjam, thanks for the scale testing reports14:44
rogpeppemgz: i think we should change CleanUpSuite so it just works if you do a suite-level patch14:44
natefinchrogpeppe: whoa, really?  I assumed they'd do the right thing14:44
rogpeppenatefinch: uh uh14:44
natefinchrogpeppe, mgz: yes, definitely.   Totally unintuitive otherwise14:45
rogpeppei won't do it right now, but i'll raise a bug14:45
rogpeppewhere are we supposed to raise bugs for github.com/juju/testing ?14:46
rogpeppeon the github page, or on juju-core?14:46
natefinchlast I heard we were keeping bugs on launchpad14:49
natefinch(not my idea)14:51
rogpeppenatefinch: done. https://bugs.launchpad.net/juju-core/+bug/130810114:52
_mup_Bug #1308101: juju/testing: suite-level Patch never gets restored <juju-core:New> <https://launchpad.net/bugs/1308101>14:52
natefinchrogpeppe: I very well may have made that mistake myself recently.14:53
rogpeppenatefinch: that was what caused me to investigate14:53
rogpeppenatefinch: i knew it was an error, but i thought it would get torn down at the end of the first test14:53
rogpeppenatefinch: i wondered how it was working at all14:54
natefinchrogpeppe: yeah, of the two likely behaviors, never getting torn down is definitely the worse of the two14:54
natefinchrogpeppe: but also the one least likely to be obvious14:54
rogpeppenatefinch: yup14:55
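[editor's note: the pitfall behind the bug rogpeppe goes on to file, sketched as a fragment; the suite, variable and value are hypothetical. At the time, a PatchValue made in SetUpSuite was never restored because the suite-level cleanup stack was not torn down — patching per-test in SetUpTest is the safe form.]

    type MySuite struct {
        testing.CleanupSuite // github.com/juju/testing of this era
    }

    func (s *MySuite) SetUpSuite(c *gc.C) {
        s.CleanupSuite.SetUpSuite(c)
        s.PatchValue(&somePkg.SomeVar, fakeValue) // BAD: never restored
    }

    func (s *MySuite) SetUpTest(c *gc.C) {
        s.CleanupSuite.SetUpTest(c)
        s.PatchValue(&somePkg.SomeVar, fakeValue) // OK: restored in TearDownTest
    }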
alexisbjam, sinzui any news on the bootstrap issue? https://bugs.launchpad.net/juju-core/+bug/130621214:55
_mup_Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>14:55
sinzuialexisb, thumper and wallyworld landed a hack to remove HA from local to make tests pass...14:56
sinzuialexisb, I think devs hope to fix the real bug...14:56
natefinchalexisb, sinzui:  jam looked into it some, and it may be due to an old version of mongo (2.2.x) that we don't really need to support anyway... cloud archive has 2.4.6 I believe, which may solve the problem14:56
sinzuinatefinch, already updated the test14:56
alexisbsinzui, natefinch: can we add those updates to the bug?14:57
sinzuinatefinch, This is the current run with mongo from ctools: http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/633/console14:57
alexisbsinzui, any other critical bugs blocking the 1.19.0 release?14:58
natefinchsinzui: that looks good14:58
sinzuialexisb, I don't think of the HA removal from local as a hack. It is insane to attempt HA for a local env. I may close the bug instead of deferring it to the next release14:59
natefinchsinzui: but that's with wallyworld's hack, right?  We'll need to remove that hack at some point, and it would be good not to use an old version of mongo anyway.14:59
sinzuinatefinch, there is a previous run with failures, but different failures than before. CI chose to retest assuming the usual intermittent failure14:59
alexisbsinzui, understood15:00
natefinchsinzui: the devs discussed it this morning.  Having HA on local may not actually give "HA", but it can be useful to show how juju works with HA, like you can kill a container and watch juju recover, re-ensure and watch a new one spin up, etc15:00
sinzuinatefinch, wallyworld's hack was about not try to do HA setup for local.15:00
natefinchsinzui: it's basically just like the rest of local.... it's not actually *useful* for much other than demos and getting your feet wet.... but it's really useful for that.15:01
sinzuinatefinch, alexisb unit tests are all pass15:01
sinzui\0/15:01
sinzuinatefinch, alexisb azure is ill today and the azure tests failed. I am asking for a retest. the current revision will probably be blessed for release today15:02
alexisbsinzui, awesome!15:02
natefinchsinzui: what version of mongo is that running?15:02
* natefinch doesn't know what ctools means15:03
* sinzui reads the test log15:04
natefinchsinzui: ahh, I see, I missed it somehow, looks like 2.4.615:05
sinzuinatefinch,  1:2.4.6-0ubuntu5~ctools115:05
natefinchsinzui: happy15:05
jamespagesinzui, I'm not averse to introducing a new feature - the plugins are well isolated but afaict it's not complete in the codebase15:06
jamespagesinzui, if that is the case then I agree not shipping the update-bootstrap plugin does make sense15:06
jamespageotherwise afaict I have no real way of providing backup/restore to 1.16 users right?15:06
sinzuijamespage, They aren't complete since we know that a script is needed to get tar and mongodump to do the right thing15:07
jamespagesinzui, OK - I'll drop it then15:08
natefinchsinzui: is that the version of mongo we were running before wallyworld's hack?  I'd like to know if the version of mongo is the deciding factor15:11
sinzuinatefinch, for the unittests, that version of mongo is the fix15:11
sinzuinatefinch, for the local deploy, wallyworld's hack was the fix15:11
sinzuiand jam's fix for upgrades fixed all upgrades15:12
natefinchok15:12
natefinchoh right, it was the version of mongo for upgrades that was changing how we add/remove users.15:13
sinzuinatefinch, CI has mongo from ctools though. all I did for the test harness was ensure that precise hosts add the same PPA as CI itself15:18
natefinchsinzui: ok, I thought someone had mentioned this morning that CI adds the juju/stable PPA for mongo, but I may have misunderstood or they may have been wrong15:21
sinzuinatefinch, when I added the juju stable ppa, I ensured the ctools archive was added and then manually installed it before we run make install-dependencies15:22
natefinchsinzui: I believe you know what your tools are running better than some random juju dev :)15:23
natefinch(and my memory thereof)15:24
sinzuiAzure is very ill.15:26
sinzuiThe best I can do is manually retest hoping that I catch azure whe it is better15:26
natefinchpoor azure15:28
rogpeppenatefinch: ping15:31
natefinchrogpeppe: yo15:32
rogpeppenatefinch: i've just pushed lp:~rogpeppe/juju-core/natefinch-041-moremongo/15:32
rogpeppenatefinch: all tests pass15:32
rogpeppenatefinch: could you pull it and re-propose -wip, please?15:32
natefinchrogpeppe: sure15:32
rogpeppenatefinch: i guess i could make a new CL, but it seems nicer to use the current one15:33
natefinchrogpeppe: yeah15:33
natefinchrogpeppe: wiping15:34
natefinchrogpeppe: that should be wip-ing15:34
natefinchrogpeppe: done15:35
rogpeppenatefinch: ta15:35
rogpeppenatefinch: i've pushed again. could you pull and then do a proper propose, please?15:52
rogpeppenatefinch: and then i think we can land it15:53
natefinchrogpeppe: sure15:53
natefinchrogpeppe: one sec, running tests on the other branch15:53
jam1natefinch: sinzui: so I don't know if changing mongo would have made CI happy without wallyworld's hack. wallyworld was the only one who has reproduced the replicaset Closed failure, and he did so on trusty running 2.4.9, so it seems like it *could* still be necessary.15:56
jam1natefinch: the concern is that the code is actually not different in Local, so if it is failing there it *should* be failing elsewhere15:56
jam1and maybe we just aren't seeing it yet15:56
natefinchjam1: yep, also a good reason not to have local be different15:56
jam1natefinch: I believe my stance on local HA, it doesn't provide actual HA, it is good for demos, I like common codebase, but I'm willing to slip it if we have to.15:57
sinzuijam1 I think you are forgetting the unit test failed days before upgrade failed and days before local deploy failed.15:57
natefinchjam1: yeah.  perfect is the enemy of good15:57
mgzso, I fixed the test suite hang... but still don't understand why it actually did that.15:57
sinzuiI went to sleep with unit tests and azure failing (the latter is azure, not code)15:58
rogpeppei cannot get this darn branch to pass tests in the bot: https://code.launchpad.net/~rogpeppe/juju-core/548-destroy-environment-fix/+merge/21569715:58
rogpeppeit's failed on replicaset.TestAddRemoveSet three times in a row now, and the changes it makes cannot have anything to do with that15:58
rogpeppelet's try just one more time15:58
sinzuijam, I changed the db used in http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/ #632 I got a pass in #633. The errors were different between dbs15:59
jam1sinzui: so TestInitialize failing was the mongo version problem. The fact that adding --replicaSet to the mongo startup line caused local to always fail with Closed is surprising, but might be a race/load issue that local triggers that others don't.15:59
jam1natefinch: a thought, we know it takes a little while for replicaset to recover15:59
jam1IIRC, mongo defaults to having a 15s timeout15:59
jam1natefinch: is it possible that Load pushes local over the 15s timeout?16:00
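If jam1's recalled 15-second window is right, an election that outlasts it under load would surface to clients as closed sockets. A sketch of where the relevant knobs sit in the mgo driver; the helper name and durations are made up for illustration, not juju's real dial path:

    import (
        "time"

        "gopkg.in/mgo.v2"
    )

    // dialPatiently is a hypothetical helper showing mgo's timeout knobs.
    func dialPatiently(addr string) (*mgo.Session, error) {
        session, err := mgo.DialWithTimeout(addr, 15*time.Second)
        if err != nil {
            return nil, err
        }
        // SetSyncTimeout bounds how long operations wait for a reachable
        // primary; under heavy local load a longer window may be needed.
        session.SetSyncTimeout(30 * time.Second)
        session.SetSocketTimeout(30 * time.Second)
        return session, nil
    }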
rogpeppenatefinch: could you push your 043-localstateinfo branch please, so I can try to get the final branch ready for proposal?16:01
sinzuijam1 I changed the db for unit tests because it didn't match the db used by local/CI; the one from ctools is the only one I will trust now for precise16:01
jam1sinzui: right, the only place we actually use juju:ppa/stable is for Quantal and Raring16:02
natefinchjam: yes, possible. Mongo can be sporadically really slow16:02
jam1I forgot about R16:02
natefinchrogpeppe: reproposed moremongo16:02
jam1but I don't think 2.4 landed until Saucy16:02
rogpeppenatefinch: thanks16:02
rogpeppenatefinch: i'll approve it16:02
sinzuijam1, the makefile disagrees with your statement16:03
mgzR is no longer supported16:03
sinzuijam1 the makefile doesn't know about ctools16:03
jam1sinzui: so what I mean is, when you run "juju bootstrap" and Q or R is the target, we add the juju ppa16:03
jam1sinzui: ctools doesn't have Q and R builds16:03
jam1only P16:03
natefinchrogpeppe:  pushed16:03
rogpeppenatefinch: ta16:03
jam1sinzui: but as mgz points out, Q is almost dead, and R is dead, so we can punt16:04
rogpeppenatefinch: how's it going, BTW?16:04
sinzuijam1, good, because I don't test with r (obsolete) and q (obsolete in 3 days)16:04
jam1sinzui: otherwise the better fix is to get 2.4.6 into the ppa16:04
sinzuijam1, +116:04
sinzuijam1, I was not aware the versions were different until this morning16:04
jam1sinzui: I wasn't that aware either16:05
jam1sinzui: I just saw the failing and davecheney pointed out CI was using an "old" version, which I tracked down16:05
sinzuiI saw it too, but I haven't gotten enough sleep to see how the old version was selected16:06
jam1sinzui: it is nice to see so much blue on http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/16:06
sinzuijam1. I see CI has started the next revision while I was trying to get lucky with azure.16:07
natefinchrogpeppe: I think most tests pass on 043-localstateinfo now.  saw some failures from state, but they looked sporadic, haven't checked them out yet16:07
rogpeppenatefinch: cool. "most" ?16:08
mgzrogpeppe: can I request a re-look at https://codereview.appspot.com/8754004316:08
rogpeppemgz: looking16:08
mgzrogpeppe: I don't like that that fixed the hang... pretty sure it means the test is relying on actually dialing 0.1.2.3 and that failing16:09
natefinchrogpeppe: I didn't let the worker tests finish because I was impatient and they were taking forever, so possible there are failures there too16:09
natefinchrogpeppe:  state just passed for me16:09
rogpeppenatefinch: cool16:09
sinzuijam1, everyone. This is a first: lp:juju-core r2630 passed 7 minutes after CI cursed the rev, because I was forcing the retest of azure.16:10
rogpeppemgz: the code in AddStateServerMachine is kinda kooky16:11
sinzuijam1. I will start preparation for the release while the current rev is tested. I will use the new rev if it gets a natural blessing16:11
mgzAddStateServerMachine should probably just be removed16:11
rogpeppemgz: probably.16:11
mgzit's not a very useful or used helper16:11
mgzand its doc comment is wonky16:12
mgzI probably shouldn't have touched it, as stuff passes without poking there16:12
mgzbut it came up in my grep16:12
natefinchrogpeppe:  worker tests pass except for a peergrouper test - workerJujuConnSuite.TestStartStop got cannot get replica set status: cannot get replica set status: not running with --replSet16:13
rogpeppeit should probably be changed to SetStateServerAddresses16:13
rogpeppemgz: as that's the reason it was originally put there16:13
mgzright, something like that16:13
rogpeppemgz: but let's just land it and think about that stuff later16:14
mgzI'll try it on the bot16:14
rogpeppemgz: bot is chewing on it...16:15
natefinchrogpeppe: peergrouper failure was sporadic too, somehow.  All tests pass on that branch.16:19
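The "not running with --replSet" error natefinch hit is mongod's complaint when a replica-set command reaches a server started without a set name. A rough sketch of what a test fixture has to do before the peergrouper can be exercised; the paths, port, and flags here are illustrative, not juju's actual harness:

    import (
        "log"
        "os/exec"
    )

    // startReplSetMongod launches a mongod that accepts replSet* commands
    // (2.4-era flags shown); an illustrative fixture only.
    func startReplSetMongod(dbDir string) *exec.Cmd {
        cmd := exec.Command("mongod",
            "--replSet", "juju", // without a set name, replSet* commands fail as above
            "--dbpath", dbDir,
            "--port", "37017",
            "--smallfiles",
            "--noprealloc",
        )
        if err := cmd.Start(); err != nil {
            log.Fatalf("start mongod: %v", err)
        }
        // The set still needs a one-time replSetInitiate before
        // replSetGetStatus reports any members.
        return cmd
    }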
sinzuialexisb, I will release in a few hours. CI is testing a new rev that I expect to pass without intervention. I can release the previous rev, which passed with extra retests16:25
alexisbsinzui, awesome, thank you very much!16:25
mattywfolks, has anyone seen this error before when trying to deploy a local charm? juju resumer.go:68 worker/resumer: cannot resume transactions: not okForStorage16:26
rogpeppenatefinch: any chance we might get it landed today?16:26
rogpeppenatefinch: i'm needing to stop earlier today16:27
alexisbgo rogpeppe go! :)16:27
rogpeppealexisb: :-)16:27
* rogpeppe has spent at least 50% of today dealing with merge conflicts16:28
natefinchrogpeppe: yes16:31
rogpeppenatefinch: cool16:33
rogpeppenatefinch: BTW the bot is running 041-moremongo tests right now... fingers crossed16:38
rogpeppenatefinch: i've just realised that it would be quite a bit nicer to have a func stateInfoFromServingInfo(info params.StateServingInfo) *state.Info, and just delete agent.Config.StateInfo16:45
natefinchrogpeppe: yeah16:45
rogpeppenatefinch: n'er mind, we'll plough on, i think.16:46
natefinchrogpeppe: that was my thinking :)16:46
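The helper rogpeppe floats never gets written in this exchange, but its shape is easy to sketch. The struct fields below are pared-down stand-ins for params.StateServingInfo and state.Info, not the real juju definitions, and the zero-port guard anticipates the failure mode that turns up later in the integration branch:

    import "fmt"

    // Stand-in types: the real params.StateServingInfo and state.Info
    // carry certificates, API ports, tags, and more.
    type stateServingInfo struct {
        StatePort int
    }

    type stateInfo struct {
        Addrs []string
    }

    // stateInfoFromServingInfo derives mongo dial info from serving info,
    // the step that would let agent.Config.StateInfo be deleted outright.
    func stateInfoFromServingInfo(host string, info stateServingInfo) (*stateInfo, error) {
        if info.StatePort == 0 {
            // A zero port means the serving info was never populated.
            return nil, fmt.Errorf("state serving info has no state port")
        }
        return &stateInfo{
            Addrs: []string{fmt.Sprintf("%s:%d", host, info.StatePort)},
        }, nil
    }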
natefinchawww, I think my old company finally cancelled my MSDN subscription16:47
rogpeppenatefinch: i'm needing to stop in 20 mins or so. any chance of that branch being proposed before then?16:56
natefinchrogpeppe: https://codereview.appspot.com/8820004316:56
rogpeppenatefinch: marvellous :-)16:56
natefinch:)16:57
rogpeppenatefinch: reviewed17:05
natefinchrogpeppe: btw, I had to rename params to attrParams because params is a package that needed to get used in the same function17:06
rogpeppenatefinch: i know17:06
natefinchrogpeppe: oh, I misunderstood the comment, ok17:07
rogpeppenatefinch: i just suggested standard capitalisation17:07
natefinchrogpeppe: yep, cool17:07
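The rename natefinch describes is the standard dodge when a local identifier would shadow an imported package: inside the function the name refers to the variable, and the package becomes unreachable. The same collision shown with a stdlib package, purely for illustration:

    package main

    import (
        "fmt"
        "strings"
    )

    func shout(words []string) string {
        // Were this variable named "strings", the strings package below
        // would be shadowed and the calls would not compile; renaming it
        // (as natefinch did with params -> attrParams) keeps both usable.
        strs := words
        return strings.ToUpper(strings.Join(strs, " "))
    }

    func main() {
        fmt.Println(shout([]string{"hi", "there"}))
    }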
natefinchrogpeppe: why is test set password not correct anymore?  It still does that, I think?17:09
rogpeppenatefinch: oh, i probably missed it17:09
rogpeppenatefinch: you're right, i did17:11
natefinchrogpeppe: cool17:11
rogpeppenatefinch: 41-moremongo is merged...17:26
natefinchrogpeppe: awesome17:27
natefinch43-localstateinfo should be being merged now17:28
natefinchrogpeppe: what's left?17:28
rogpeppenatefinch: it's actually retrying mgz's apiaddresses_use_hostport17:28
rogpeppenatefinch: i'm trying to get tests passing on my final integration branch17:29
natefinchrogpeppe: nice17:29
rogpeppenatefinch: currently failing because of the StateInfo changes (somehow we have a StateServingInfo with a 0 StatePort)17:29
rogpeppenatefinch: would you be able to take it over from me for the rest of the day17:30
rogpeppe?17:30
natefinchrogpeppe: yeah definitely17:30
rogpeppenatefinch: it needs a test that peergrouper is called (i'm already mocking out peergrouper.New)17:30
natefinchrogpeppe: what's the branch name?17:31
rogpeppenatefinch: i haven't pushed it yet, one mo17:32
rogpeppenatefinch: bzr push --remember lp:~rogpeppe/juju-core/540-enable-HA17:33
rogpeppenatefinch: there are some debugging relics in there that need to be removed too17:34
rogpeppenatefinch: in particular, revno 2355 (cmd/jujud: print voting and jobs status of machines) needs to be reverted and proposed separately as discussed in the standup17:34
natefinchrogpeppe: ok17:35
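Backing a single revision out of a bzr branch, as rogpeppe asks for revno 2355, is a reverse merge: roughly bzr merge -r 2355..2354 . followed by a commit, after which the reverted change can be re-proposed on its own branch.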
=== vladk is now known as vladk|offline
rogpeppenatefinch: i'd prioritise the other branches though17:39
rogpeppenatefinch: i have to go now17:39
rogpeppeg'night all17:39
natefinchrogpeppe: g'night17:39
sinzuinatefinch, Do you have a moment to review https://codereview.appspot.com/8817004518:05
natefinchsinzui: done18:06
sinzuithank you natefinch18:06
BradCrittendensinzui: would you have a moment for a google hangout?18:34
=== BradCrittenden is now known as bac
sinzuibac: yes18:35
bacsinzui: cool.  let me set one up and invite you in a couple of minutes18:35
bacsinzui: https://plus.google.com/hangouts/_/canonical.com/daily-standup18:39
=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
davecheneygood morning worker ants23:09
perrito666davecheney: my window says otherwise23:16
davecheneyperrito666: one of us is wrong23:21
davecheneyi'll roshambo you for it23:21
* perrito666 turns a very strong light on outside and says good morning to davecheney 23:21
davecheneyperrito666: it helps with the jet lag23:22
perrito666davecheney: I traveled under 20km today, I don't have that much jetlag :p23:22
