[00:00] <bradm> we've got a charm that seems to be stuck on installing, but the juju logs say the start hook ran, any ideas on how we can dig into it?
[00:00] <rick_h_> thumper: wallyworld ^ this is related to current IS issues
[00:00] <wallyworld> ok, thanks for update
[00:01] <menn0> perrito666: that does indeed seem to be a serious problem
[00:02] <menn0> perrito666: file a bug and point thumper at it. one of us will fix it soon.
[00:02]  * thumper keeps his head down
[00:02] <perrito666> thumper: candy?
[00:02]  * thumper mutters under his breath something about deadlines and too much work
[00:03]  * thumper ignores the candy
[00:03] <menn0> perrito666: we'll have to add an upgrade step to fix existing records
[00:07] <menn0> bradm: if you haven't already can you look at the logs on the unit's machine itself? (/var/log/juju/unit-FOO.log)
[00:09] <bradm> menn0: all good now, it was just taking a long time to realise it was up
[00:10] <menn0> bradm: ok, good to hear
[00:10] <bradm> well, all good might be a stretch, but it's all onto ceph now
[00:10] <bradm> it's making good progress
[00:14] <menn0> perrito666: I can certainly see the statuses env-uuid problem in a local env here
[00:14] <perrito666> menn0: adding the bug with detail
[00:14] <rick_h_> NOTICE: jujucharms.com and the charmstore are back up. The storage in IS is working to rebalance/sync and might time out or be slow for a bit longer.
[00:14] <rick_h_> menn0: ^
[00:15] <menn0> rick_h_: sweet
[00:15] <menn0> perrito666: I wonder how status lookups are even working at all
[00:17] <perrito666> menn0: me too
[00:17] <perrito666> menn0: thumper https://bugs.launchpad.net/juju-core/+bug/1474606
[00:17] <mup> Bug #1474606: entities status is loosing env-uuid upon setting status. <juju-core:New> <https://launchpad.net/bugs/1474606>
[00:19] <perrito666> menn0: enjoy
[00:19] <rick_h_> hah, he says as he takes his candy back
[00:20] <perrito666> rick_h_: well I left a very nice report in exchange
[00:20] <perrito666> menn0: this happens for services, units, agents, machines and every other thing that has a status
[00:22] <menn0> perrito666: I see you've already got a fix for it too
[00:22] <menn0> perrito666: although I might try and fix this in the multi-env txn layer
[00:22] <menn0> too
[00:23] <perrito666> menn0: I do, but I was not sure if I cover all aspects of this issue
[00:23] <perrito666> I just fixed my patch of land
[00:24] <menn0> perrito666: understood
[00:26] <davecheney> what the heck is this test doing ?
[00:26] <davecheney> FAIL: kvm-broker_test.go:241: kvmBrokerSuite.TestStartInstancePopulatesNetworkInfo
[00:26] <davecheney> [LOG] 0:00.001 DEBUG juju.testing setting feature flags: address-allocation
[00:26] <davecheney> kvm-broker_test.go:251: instanceConfig := s.instanceConfig(c, "42")
[00:26] <davecheney> /home/ubuntu/src/github.com/juju/juju/container/testing/common.go:90: c.Assert(err, jc.ErrorIsNil)
[00:26] <davecheney> ... value *os.PathError = &os.PathError{Op:"mkdir", Path:"/var/lib/lxc", Err:0xd} ("mkdir /var/lib/lxc: permission denied")
[00:26] <davecheney> of course this is going to fail
[00:26] <davecheney> mortals don't have permission to write to that dir
[00:33] <menn0> wallyworld: just confirmed... rsyslog is screwed both before and after the upgrade
[00:34] <menn0> :(
[00:34] <menn0> writing up a ticket now
[00:34] <wallyworld> oh dear
[00:35] <menn0> in different ways though so you know, at least that's interesting
[00:35] <wallyworld> even better
[00:35] <menn0> I think the issue in 1.24.0 has already been fixed in a later 1.24
[00:36] <wallyworld> the rsyslog issue? so do we need to amend an upgrade step?
[00:37] <menn0> not sure yet
[00:37] <menn0> before the upgrade rsyslogd is continually being restarted by juju
[00:37] <menn0> every 30s or so
[00:37] <wallyworld> sounds like a bug we may have fixed yeah
[00:37] <menn0> the only thing in juju's logs is the rsyslog worker saying "reloading rsyslog config"
[00:38] <menn0> after the upgrade that stops
[00:38] <menn0> but most/all of the units can't connect
[00:38] <menn0> due to cert verification
[00:39] <axw> thumper: "I did say that the way to fix this properly was to not use the state method to load the charms."  -- what else would you use to enumerate charms, if not state?
[00:39] <menn0> which seems like the thing that's been attempted to be fixed several times now
[00:39] <menn0> anyway, i'll write up the ticket
[00:39] <menn0> wallyworld: ^^
[00:39] <mup> Bug #1474606 opened: entities status is losing env-uuid upon setting status. <juju-core:New for menno.smits> <juju-core 1.24:New for menno.smits> <https://launchpad.net/bugs/1474606>
[00:39] <mup> Bug #1474607 opened: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1474607>
[00:39] <thumper> axw: making a request of the collection and have it return the raw data as dicts
[00:40] <thumper> axw: that way you aren't expecting a particular structure
[00:40] <thumper> axw: we have had to write many upgrade steps in this way
[00:40] <thumper> and I believe it is better
[00:40] <thumper> because you just ask for what you need, and change what you must
[00:40] <thumper> and don't worry about the structure of the doc as much
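The approach thumper describes (read raw documents as maps, touch only the fields you need) can be sketched in Go roughly as follows. This is a minimal in-memory illustration, not juju's upgrade code: the `rawDoc` type, the `addEnvUUID` helper, and the sample documents are all hypothetical, and a real upgrade step would read and write the documents through the database driver.

```go
package main

import "fmt"

// rawDoc stands in for a document read from a collection as a plain map
// (hypothetical; no particular struct layout is assumed).
type rawDoc map[string]interface{}

// addEnvUUID sets "env-uuid" on every document that lacks it and reports how
// many documents were changed. All other fields are left exactly as found.
func addEnvUUID(docs []rawDoc, envUUID string) int {
	changed := 0
	for _, doc := range docs {
		if _, ok := doc["env-uuid"]; ok {
			continue
		}
		doc["env-uuid"] = envUUID
		changed++
	}
	return changed
}

func main() {
	docs := []rawDoc{
		{"_id": "cs:trusty/mysql-1", "url": "cs:trusty/mysql-1"},
		{"_id": "cs:trusty/wordpress-2", "env-uuid": "uuid-1", "some-newer-field": 42},
	}
	n := addEnvUUID(docs, "uuid-1")
	fmt.Println("documents updated:", n)
}
```

Because the step never decodes into a struct, a field added or renamed in a later release does not change what this step reads or writes.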
[00:41] <wallyworld> menn0: sorry was distracted by a review. i recall vaguely the cert issue was fixed. rsyslog i think uses a different cert to state server connections
[00:42] <menn0> wallyworld: do you know when it was fixed given that I'm seeing this with an upgrade to 1.24.2?
[00:42] <wallyworld> menn0: not offhand - ericsnow might know
[00:45] <wallyworld> thumper: i am going to fix it like you wanted in 1.22 - do we really need to do the work in 1.24 if the step runs after env uuid has been added?
[00:45] <wallyworld> if yes, i can fix
[00:47] <axw> wallyworld: is there a test we can add that would have highlighted the issue?
[00:47] <axw> wallyworld: and that would highlight future issues
[00:48] <wallyworld> axw: for this case yes - a CI test that adds charms to the 1.20 env prior to upgrade and then checks that they are imported after
[00:49] <wallyworld> that's on my todo list to follow up on
[00:50] <axw> wallyworld: presumably we could write a unit test that adds a charm entry to state, and wipes out its env-uuid field to exercise the bug? but the CI test would be better, since it'll catch future bugs too
[00:50] <wallyworld> yeah, that was my thinking
[00:52] <wallyworld> menn0: and for further joy, bug 1469077 is back again, i'll need to point william to it
[00:52] <mup> Bug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Confirmed> <https://launchpad.net/bugs/1469077>
[00:52] <menn0> wallyworld: yeah I saw that
[00:52] <menn0> wallyworld: so unawesome
[00:53] <wallyworld> i know right :-(
[00:59] <menn0> here's the rsyslog ticket: bug 1474614
[00:59] <mup> Bug #1474614: rsyslog connections fail with certificate verification errors after upgrade to 1.24.2 <regression> <juju-core:New> <juju-core 1.24:New> <https://launchpad.net/bugs/1474614>
[01:00] <menn0> wallyworld: ^^
[01:00] <wallyworld> looking
[01:01] <wallyworld> thanks for filing, a nice bug report
[01:03] <wallyworld> menn0: axw will pick up that rsyslog bug
[01:05] <wallyworld> axw: goose pr lgtm
[01:06] <axw> wallyworld: thanks
[01:09] <mup> Bug #1474291 opened: juju called unexpected config-change hooks after read tcp 127.0.0.1:37017: i/o timeout <hooks> <openstack> <sts> <uosci> <juju-core:New> <ceilometer (Juju Charms Collection):New> <https://launchpad.net/bugs/1474291>
[01:09] <mup> Bug #1474614 opened: rsyslog connections fail with certificate verification errors after upgrade to 1.24.2 <regression> <juju-core:New> <juju-core 1.24:New> <https://launchpad.net/bugs/1474614>
[01:19] <wallyworld> axw: tagging pr reviewed, but there is a question
[01:20] <axw> wallyworld: ok, thanks, will look in a sec
[01:50] <thumper> wallyworld: if we don't, it is just a time bomb for the next time things change
[01:50] <thumper> it is making a problem for future us
[01:51] <wallyworld> thumper: looking at the code - there's *lots* of current upgrade steps that use the docs directly, not maps. the only ones that use maps are the ones to insert env uuid
[01:53] <thumper> well... we are just making problems for ourselves IMO
[01:54] <wallyworld> thumper: i think your team wrote a lot of them
[01:55] <thumper> you are probably not wrong
[01:55] <wallyworld> i guess the difference is that the docs are used with the raw collection
[01:55] <thumper> I'm telling you the result of accumulated wisdom
[01:55] <wallyworld> so rawCollection.Find(&somedoc)
[01:56] <wallyworld> one would hope that CI tests would evolve to better catch upgrade issues
[03:46] <lazyPower> thumper: sorry was deep in a support scenario.
[03:47] <lazyPower> thumper: sounds good mate, submit it earlier is way better than later, as the queue takes a couple days to sift down to newly submitted stuff
[03:47] <thumper> lazyPower: no worries
[03:47] <thumper> I'm not going to block my deployment on it getting reviewed :)
[03:47] <lazyPower> we're averaging ~ 5 days on initial touch, still trying to get that number down, but its way better than the 13 days in history.
[03:47] <lazyPower> as you shouldn't be :)
[03:47] <lazyPower> namespaces!!!!
[03:47]  * lazyPower toots the namespace horn
[03:48] <lazyPower> cheers
[03:50] <thumper> namespaces?
[04:05] <menn0> thumper, waigani, wallyworld: see email for findings related to bug 1474195
[04:05] <mup> Bug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>
[04:05] <menn0> thumper, waigani, wallyworld: looks like will's theory was right
[04:16] <waigani> menn0: red box of death?
[04:17] <menn0> waigani: yep... see the note on the field
[04:17] <waigani> ah yep, just saw note
[04:17] <menn0> :)
[04:18] <waigani> nice work
[04:18] <menn0> waigani: when you added the auto env life assertion to the txn layer did you remove the ones that already existed elsewhere, or did they not exist anywhere before JES?
[04:18] <menn0> I guess it wasn't really necessary when there was just one env
[04:19] <waigani> menn0: yeah, it's going back a bit now, but I don't remember there being any - which as you point out makes sense.
[04:19] <menn0> cool
[04:20] <menn0> waigani: i'm trying to figure out the right places to check
[04:20] <menn0> adding a machine certainly
[04:20] <waigani> menn0: you mean where we really need to assert for a live environ?
[04:20] <menn0> waigani: yep
[04:21] <waigani> menn0: as a starting point, didn't will say whenever we add a service, unit, relation or machine?
[04:22] <mup> Bug #1454468 changed: nodes deployed successfully by maas but juju status remains pending with juju 1.23.2 and services stuck in allocating <deploy> <oil> <juju-core:Expired> <https://launchpad.net/bugs/1454468>
[04:25] <menn0> waigani: I wonder if we can reduce that set to just service and machine
[04:25] <waigani> so what's the worst case? we add a unit/relation to a dying environment...
[04:26] <menn0> waigani: I'm wondering if we can just add an environment cleanup to kill them
[04:26] <menn0> waigani: in fact, the current cleanupServicesForDyingEnvironment might already do it
[04:29]  * thumper is going to lie down
[04:29]  * thumper is not 100%
[04:29] <waigani> menn0: so that sets the existing services to dying, but expects that no new services can be added to a dying environment
[04:30] <menn0> waigani: standup hangout? (as per PM)
[04:30] <waigani> menn0: yep
[05:07] <wallyworld> menn0: should be any time we allocate something that costs
[05:07] <wallyworld> eg machine, storage etc
[05:09] <menn0> wallyworld: yep, that's what i'm looking at now... anything that results in a physical change certainly needs the env life assert
[05:09] <menn0> (as physical as a virtual machine is anyway)
[05:09] <wallyworld> well, physical change that costs $$$
[05:10] <wallyworld> don't really care about containers
[05:10] <menn0> wallyworld: i've got a pretty clear picture now of what I want to do. i'm going to catch will later on tonight to confirm
[05:10] <wallyworld> but machines yes
[05:10] <wallyworld> ok
[05:10] <menn0> wallyworld: that's a good point, maybe we don't check for containers
[05:11] <wallyworld> menn0: yeah, so for stuff that doesn't cost, we just have a cleanup job after env is killed
[05:11] <menn0> wallyworld: yep, and we already have most of that it turns out
[05:11] <wallyworld> i can't see why we'd check more than is necessary
[05:11] <wallyworld> and before JES, we didn't check
[05:12] <wallyworld> so TBH i'm not sure why we started checking with JES
[05:13] <wallyworld> i guess JES has greater chance of concurrent access
[05:13] <menn0> yep and b/c before when you issued destroy-environment everything died at that point including the API server
[05:14] <menn0> so you had very little opportunity to add a new machine or whatever once the env was dying
[05:14] <wallyworld> waigani: so is that +2 a ship it? btw - that func needs to be exported because it is in state package
[05:14] <wallyworld> and called by upgrades package
[05:14] <menn0> now for hosted envs the API server stays up so there's a much greater chance of env changing operations as the env is dying
[05:14] <wallyworld> fair point
[05:15] <menn0> anyway, stopping now since i'm going to be back on later
[07:46] <dimitern> dooferlad, morning
[07:46] <dooferlad> dimitern: hi
[07:46] <dimitern> dooferlad, I thought we dealt with the kvm-inaccessable-after-reboot issue in 1.24 as well ? see bug 1474508
[07:46] <mup> Bug #1474508: Rebooting the virtual machines breaks Juju networking <juju-core:New> <https://launchpad.net/bugs/1474508>
[07:47] <dooferlad> dimitern: I thought so too.
[07:48] <dimitern> dooferlad, maybe the fix is in master only?
[07:48] <dooferlad> dimitern: will need to take a look and see if I missed landing it
[07:48] <dimitern> dooferlad, cheers
[07:53] <dooferlad> dimitern: darn, wasn't backported.
[07:53] <dooferlad> dimitern: will be trivial to do.
[07:56] <waigani> wallyworld: sorry, just saw your message - this for moving charm tests to state? yes, +2 shipit.
[07:56] <wallyworld> ta
[07:59] <waigani> wallyworld: ah, I thought I clicked shipit - done now. The pattern of needing exported state funcs for upgrade steps is something fwereade is keen to change - possibly just exporting one upgrade step from state which then calls the other unexported steps. But for now we just need to try to make it clear that while the func is exported, no-one except the upgrades package should be using it.
[08:00] <wallyworld> waigani: np. and this was for a 1.22 release, so old code
[08:02] <waigani> wallyworld: yeah true
[08:06] <dimitern> dooferlad, awesome! will you do the dance then please? - card, bug, etc.
[08:06] <dimitern> :)
[08:06] <dooferlad> sure
[08:06] <dimitern> ta!
[08:15] <menn0> fwereade: ping?
[08:45] <dooferlad> dimitern: http://reviews.vapour.ws/r/2163/ for a quick +2
[08:50] <dimitern> dooferlad, ship it! :)
[08:59] <jam1> fwereade: dimitern: food just arrived so I'm going to miss the standup. But I'm working on breaking down the Uncommitted state stuff into development items (I'd like to chat directly with fwereade later if you have time before our cycle review)
[09:00] <dooferlad> jam, fwereade, TheMue, dimitern: stand up!
[09:01] <dimitern> dooferlad, omw
[09:01] <TheMue> omw
[09:01] <dimitern> jam, thanks for the heads up
[10:24] <rogpeppe> can anyone tell me something about plans relevant to the EnvironmentsCacheFile feature?
[10:26] <rogpeppe> jam, fwereade: ^
[10:28] <jam> rogpeppe: I don't particularly know it by that name, but it looks like something thumper would have been doing to support multiple environments
[10:28] <mup> Bug #1474788 opened: ec2: provisioning machines sometimes fails with "tagging instance: The instance ID <ID> does not exist" <ec2-provider> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1474788>
[10:29] <jam> just by reading its description from https://github.com/juju/juju/blob/master/feature/flags.go#L29
[10:29] <rogpeppe> jam: i'm just wondering what our future plans are. is the plan to do away with .jenv files entirely?
[10:30] <jam> rogpeppe: thats how I read the description in there. I haven't heard of that before, nor had read that particular detail in the JES stuff. But it does read that way.
[10:30] <rogpeppe> jam: surely we have some roadmap plans somewhere?
[10:30] <rogpeppe> cherylj: do you know about this, by any chance?
[10:31] <jam> rogpeppe: so I've got docs for JES CLI, JES Logging, MESS Work Items and one more. The last two are "Historical" and it might be described in there, but they are roughly before I started tracking all the proposals directly.
[10:37] <dooferlad> dimitern: this is what I have for the spaces API stuff: https://github.com/juju/juju/compare/net-cli...dooferlad:net-cli-apiserver-spaces?expand=1
[10:39] <dooferlad> dimitern: would be good to have a chat about if that is shaping up in the way you imagined. I am not sure I like having the stub network stuff in apiserver/testing. I think having its own package is nicer.
[10:39] <dooferlad> dimitern: what do you think?
[10:40] <dimitern> dooferlad, looking
[10:43] <dimitern> dooferlad, I like the refactoring around moving the shared stubs in apiserver/testing
[10:43] <dimitern> dooferlad, haven't looked at every line, but so far it looks solid
[10:44] <dimitern> dooferlad, please, s/ast/apiservertesting/ (or whichever alias for that path is more common)
[10:44] <dooferlad> do you have an opinion about if fake_spaces_subnets.go should be in its own package so we can just import and use it rather than having to call InitStubNetwork?
[10:45] <dimitern> dooferlad, also InitStubNetwork() could be defined as a method on a fixture struct, which can be embedded into the suites that need it and call it in SetUpSuite, rather than init()
[10:46] <dimitern> dooferlad, have a look at LiveTests (or was it Tests ?) for example
[10:47] <dooferlad> dimitern: sure, github.com/juju/juju/environs/jujutest/livetests.go right?
[10:47] <dimitern> dooferlad, I have a lingering feeling the shared stubs are not goroutine safe (when used outside apiserver/testing) - make sure you run with -race
[10:47] <dimitern> dooferlad, that's the one yeah
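One common way to make a shared test stub safe for concurrent use is to guard its state with a mutex. The sketch below assumes a hypothetical `callRecorder` stub; it is not the stub in apiserver/testing, whose fields are not shown in this conversation.

```go
package main

import (
	"fmt"
	"sync"
)

// callRecorder is a hypothetical shared test stub that records method calls.
type callRecorder struct {
	mu    sync.Mutex
	calls []string
}

// record appends a call name while holding the lock, so concurrent callers
// cannot race on the slice.
func (r *callRecorder) record(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = append(r.calls, name)
}

// count returns the number of recorded calls.
func (r *callRecorder) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.calls)
}

// recordConcurrently records n calls from n goroutines and waits for them.
func recordConcurrently(r *callRecorder, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.record(fmt.Sprintf("call-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	r := &callRecorder{}
	recordConcurrently(r, 100)
	fmt.Println("calls recorded:", r.count())
}
```

Removing the lock from `record` and running the program with `go run -race` is a quick way to see the kind of report the race detector produces.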
[10:49] <dooferlad> dimitern: great, thanks for the pointers.
[11:20] <wallyworld> fwereade: bug 1469077 has come up again on 1.24.2, so i removed the incomplete status
[11:20] <mup> Bug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469077>
[11:23] <fwereade> wallyworld, grar. axw, do you have context on this? ^^
[11:23] <axw> fwereade: nope
[11:24] <wallyworld> fwereade: i have no context on the cause or fix sadly, but i see some info has been attached to the bug
[11:25] <jam> wallyworld: fwereade: I could see a case where contending on the txn-queue field and having it grow large enough that we can't handle all the txns listed before a new one comes in
[11:26] <jam> and then it grows every 30s until there are so many entries that it is larger than we're allowed to make a document (or in this case larger than the size of a capped collection?)
[11:26] <wallyworld> sounds plausible
[11:26] <fwereade> jam, yeah -- I just thought that *someone* had addressed the writes that caused that
[11:26] <fwereade> jam, I just forget who
[11:26] <fwereade> jam, perhaps I hallucinated it
[11:26] <jam> fwereade: we handled that for addresses by fixing the addresser
[11:26] <wallyworld> ah mr *someone* :-)
[11:26] <jam> I don't know of a fix for the leadership stuff
[11:27] <fwereade> jam, ok, bugger, I thought that was part of the stuff axw had done but evidently not
[11:27] <jam> m-enn-o had done some work to clean out transactions that are thought of as already applied (to handle our other assertion-only TXNs don't get cleaned out)
[11:27] <jam> but I would think that's a different issue.
[11:27] <fwereade> jam, yeah
[11:27] <fwereade> jam, ok, let's chalk it up to a hallucination then ;p
[11:28] <fwereade> jam, oh wait
[11:28] <fwereade> jam, I thought it was the remove/insert behaviour that led to growing txn queues, and mr.someone had made a fix to the lease persistor that stopped it doing that?
[11:29] <fwereade> wallyworld:
[11:29] <fwereade> 	// TODO(wallyworld) - this logic is a stop-gap until a proper refactoring is done
[11:29] <fwereade> 	// We'll be especially paranoid here - to avoid potentially overwriting lease info
[11:29] <fwereade> 	// from another client, if the txn fails to apply, we'll abort instead of retrying.
[11:30] <fwereade> wallyworld, originally it was remove/insert every time, which was causing unbounded queue growth
[11:30] <wallyworld> hmmm, let me look up that code
[11:30] <fwereade> wallyworld, state/lease.go
[11:31] <fwereade> wallyworld, maybe it wasn't backported..?
[11:31] <wallyworld> i don't recall that todo at all, yet it has my name on it :-)
[11:32] <fwereade> haha
[11:32] <fwereade> I know the feeling
[11:32] <wallyworld> fwereade: i just checked the code, 1.24 is the same
[11:35] <jam> wallyworld: I do remember fwereade reviewing a patch you submitted so that we changed how leases are requested so that it wouldn't be a "delete current one, create a new one" sort of operation.
[11:35] <jam> fwereade: as far as that goes, *if* we ever get to a point where we have an invalid TXN in the queue (one that we cannot clear)
[11:36] <jam> then we'll overflow the txn-queue eventually
[11:36] <jam> because creation of a *new* txn adds a value to the field
[11:36] <jam> and then when we go to evaluate the txn, we see the bad txn and die, and now we have yet-another txn in the queue
[11:37] <jam> so the "document too big" could just be a symptom of "invalid TXN in queue"
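The failure mode jam describes can be illustrated with a toy queue in Go. This is only a model of the reasoning, with a hypothetical `txn` type and `enqueueAndFlush` helper; it does not reproduce how mgo/txn stores or applies transactions.

```go
package main

import "fmt"

// txn is a hypothetical stand-in for a queued transaction.
type txn struct {
	id    int
	valid bool
}

// enqueueAndFlush appends a new transaction and then applies the queue in
// order. Processing stops at the first invalid entry, so that entry and
// everything queued behind it stay in the queue.
func enqueueAndFlush(queue []txn, next txn) []txn {
	queue = append(queue, next)
	for len(queue) > 0 && queue[0].valid {
		queue = queue[1:]
	}
	return queue
}

func main() {
	// One transaction that can never be cleared sits at the head of the queue.
	queue := []txn{{id: 0, valid: false}}
	for i := 1; i <= 5; i++ {
		queue = enqueueAndFlush(queue, txn{id: i, valid: true})
		fmt.Printf("after txn %d: queue length %d\n", i, len(queue))
	}
}
```

With a healthy queue the length drops back to zero after every call; with one stuck entry it grows by one per new transaction, which is how a "document too big" error could be a symptom rather than the root cause.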
[11:44] <wallyworld> jam: i vaguely recall that too, i'll have to go digging
[12:01] <jam> fwereade: iteration planning meeting?
[12:04] <fwereade> jam, there
[12:04] <jam> hm. I don't see you in the one I'm in
[12:08] <wwitzel3> katco: ping
[12:11] <mup> Bug #1474508 changed: Rebooting the virtual machines breaks Juju networking <juju-core:Fix Released by dooferlad> <juju-core 1.24:In Progress by dooferlad> <https://launchpad.net/bugs/1474508>
[12:17] <perrito666> morning all
[12:40] <wwitzel3> perrito666: o/
[13:07] <wwitzel3> ericsnow: ping
[13:24] <wwitzel3> so cold and alone
[13:24] <natefinch> wwitzel3: lol
[13:24] <wwitzel3> natefinch: these things tend to happen when working on rsyslog stuff ;)
[13:28] <natefinch> wwitzel3: ahh yeah, totally
[13:33] <dooferlad> dimitern, TheMue: please be opinionated at http://reviews.vapour.ws/r/2169/
[13:33] <TheMue> dooferlad: ok
[13:34] <TheMue> dooferlad: too many files, cannot be good *lol*
[13:57] <dimitern> dimitern, kiijubg\
[13:57] <dimitern> wtf?!
[13:57] <dimitern> dooferlad, looking :)
[14:14] <mup> Bug #1468815 opened: Upgrade fails moving syslog config files "invalid argument" <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Fix Released by ericsnowcurrently> <https://launchpad.net/bugs/1468815>
[14:32] <cherylj> rogpeppe: I didn't do the work to enable the cache file, but I might be able to answer specific questions you may have about it.
[14:33] <dimitern> dooferlad, reviewed
[14:44] <dooferlad> dimitern: thanks - exactly what I needed.
[14:44] <dimitern> dooferlad, cool :)
[15:02] <mup> Bug #1474885 opened: juju deploy fails with ERROR EOF <juju-core:New> <https://launchpad.net/bugs/1474885>
[15:02] <mup> Bug #1474892 opened: User friendly error message for system destroy could be improved <juju-core:New for cherylj> <https://launchpad.net/bugs/1474892>
[15:52]  * fwereade is stopping, has a review up: http://reviews.vapour.ws/r/2172/
[16:02] <katco> ericsnow: meeting
[16:24] <alexisb> katco, would you mind filling gsamfira in on the details of how we use feature branches?
[16:24] <alexisb> I pointed him to the wiki but he has some questions I am not qualified to answer
[16:25] <katco> alexisb: sure thing
[16:26] <katco> gsamfira: lmk what questions you have
[16:49] <mattyw> fwereade, I'll be proposing small uniter changes later on, don't need reviewing yet but I'll ping you about them tomorrow, wanted to let you know before then just to let you know that part of it might be controversial but I think I have a good justification
[16:51] <davecheney> kvm-broker_test.go:201: kvm0 := s.startInstance(c, "1/kvm/0")
[16:51] <davecheney> /home/ubuntu/src/github.com/juju/juju/container/testing/common.go:90: c.Assert(err, jc.ErrorIsNil)
[16:51] <davecheney> ... value *os.PathError = &os.PathError{Op:"mkdir", Path:"/var/lib/lxc", Err:0xd} ("mkdir /var/lib/lxc: permission denied")
[16:52] <davecheney> I am seeing this error constantly on a fresh ubuntu machine
[16:52] <davecheney> it seems pretty fatal
[16:52] <davecheney> has anyone else ever seen this
[16:52] <davecheney> i'm sure it's because lxc packages are not installed, so /var/lib/lxc is not present
[16:52] <davecheney> but this seems like a pretty serious isolation failure
[16:59] <natefinch> davecheney: I haven't seen it, but I have lxc installed
[17:00] <davecheney> this is on a fresh install
[17:00] <davecheney> tests fail because this directory is
[17:00] <davecheney> 1. not present
[17:00] <davecheney> 2. will not be present, because /var/lib is owned by root
[17:01] <natefinch> davecheney: certainly, it's an isolation problem.  I wonder if there aren't a lot more similar problems in those tests, if lxc is not installed.
[17:02] <davecheney> i'm too scared to look
[17:02] <davecheney> also, how is that test supposed to pass on windows ?
[17:03]  * davecheney logs a bug and moves on
[17:03] <natefinch> davecheney: I presume all the tests are marked as skipped on windows
[17:11] <davecheney> do we have voting windows CI tests ?
[17:11] <natefinch> ericsnow, wwitzel3: review me? http://reviews.vapour.ws/r/2174/
[17:12] <natefinch> sinzui: what davecheney said ^
[17:12] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1474946
[17:12] <mup> Bug #1474946: worker/provisioner: tests are poorly isolated <juju-core:New> <https://launchpad.net/bugs/1474946>
[17:12] <natefinch> davecheney: I know they run and passed at one time, but I don't know if they're voting or not.  I believe so, since I have gotten windows bugs
[17:12] <natefinch> from CI failures
[17:14] <sinzui> natefinch: windows tests do vote. they have passed in the past, but not in the last week. I am told many tests are skipped
[17:14] <natefinch> wwitzel3: btw, I already forward ported the first bug in your bug task: https://bugs.launchpad.net/juju-core/+bug/1370896
[17:14] <mup> Bug #1370896: juju has conf files in /var/log/juju on instances <canonical-bootstack> <logging> <rsyslog> <juju-core:Fix Committed by natefinch> <juju-core 1.24:Fix Released by natefinch> <https://launchpad.net/bugs/1370896>
[17:15] <sinzui> juju-ci-tools has a similar problem when we run its own suite on OS X. I created /var/lib/lxc on the machine to get a pass
[17:15] <davecheney> that's terrible
[17:15] <wwitzel3> natefinch: yeah, saw that, I discovered that the problem still exists in juju-1.24 master so I'm working on a fix now, before porting the other PRs
[17:15] <davecheney> why doesn't it fail for the landing bot ?
[17:17] <sinzui> davecheney: windows test suite is run by ci, not the merge bot, and since the tests take about 2 hours to get a pass, do you really want to slow down merges? mgz suggested that the test suite be made reliable so that we could get the run down to 40 minutes per merge
[17:17] <davecheney> sinzui: i like forcing the issue
[17:18] <sinzui> ;)
[17:20] <mup> Bug #1474946 opened: worker/provisioner: tests are poorly isolated <juju-core:New> <https://launchpad.net/bugs/1474946>
[18:11] <sinzui> perrito666: juju-ci-tools has the first part of my testing arg change. I am going to do another round too, but it won't be merged until tomorrow.
[18:21] <perrito666> sinzui: tx for the heads up
[18:52] <katco> ericsnow: sorry, got caught up in meetings. reviewing your prs now
[19:29] <wwitzel3> afk picking my car up from the shop
[21:31] <menn0> perrito666: ping?
[21:32] <perrito666> menn0: pong?
[21:32] <perrito666> good morning
[21:32] <menn0> perrito666: good evening
[21:32] <menn0> perrito666: regarding the problem you found yesterday
[21:33] <perrito666> yes?
[21:33] <menn0> perrito666: thumper reminded me that using $set to overwrite a doc is a no-no with mgo/txn
[21:33] <menn0> perrito666: because it blows away the mgo/txn fields (txn-queue, txn-revno etc)
[21:33] <perrito666> menn0: oh, expand
[21:34]  * menn0 goes to find the mailing list post about this
[21:35] <menn0> perrito666: it was SO. the last paragraph here: http://stackoverflow.com/a/24458293/195383
[21:36] <menn0> perrito666: we should probably change the status update code to do a more conventional update
[21:36] <menn0> perrito666: and add some protection to stop people doing this again in the future.
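The concern raised here can be modelled with plain maps in Go: writing a whole new document drops any field the new value does not carry, while a field-by-field update leaves bookkeeping fields alone. This is an in-memory model of the problem as described in the conversation, not a reproduction of MongoDB or mgo/txn behaviour; the `doc` type and both helpers are hypothetical.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for a stored document.
type doc map[string]interface{}

// replaceDoc models overwriting a stored document: only "_id" survives, and
// every other field not present in update is lost.
func replaceDoc(stored, update doc) doc {
	out := doc{"_id": stored["_id"]}
	for k, v := range update {
		out[k] = v
	}
	return out
}

// setFields models a conventional update: only the named fields change and
// everything else in the stored document is kept.
func setFields(stored, update doc) doc {
	out := doc{}
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range update {
		out[k] = v
	}
	return out
}

func main() {
	stored := doc{
		"_id":       "u#mysql/0",
		"env-uuid":  "uuid-1",
		"txn-queue": []string{"token-1"},
		"status":    "pending",
	}
	update := doc{"status": "started"}

	fmt.Println("replaced:", replaceDoc(stored, update))
	fmt.Println("updated: ", setFields(stored, update))
}
```

In this model the replaced document no longer has `env-uuid` or `txn-queue`, which matches the two symptoms discussed: status documents losing their env-uuid, and transaction bookkeeping being wiped.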
[21:39] <perrito666> Definitely
[21:40] <menn0> perrito666: can you handle the first part (changing the status update code)
[21:40] <menn0> ?
[21:40] <menn0> perrito666: I'll handle the second part (preventing these kinds of updates)
[21:40] <perrito666> I wonder if that is not the cause of some of the eff ups of txns, this has been there for who knows how long
[21:40] <perrito666> I'll fix update
[21:41] <menn0> perrito666: it could well be
[21:44] <perrito666> I am just making a quick grocery shop and I'll send a patch upon returning
[21:46]  * perrito666 is surprised of how slow can the 10 > items line be
[21:47] <menn0> perrito666: no problems.. it can wait until tomorrow
[21:47] <menn0> perrito666, thumper: statusesC isn't the only place where we replace docs using $set
[21:47] <thumper> menn0: how many other places?
[21:48] <menn0> perrito666, thumper: not many: stateServingInfoC, constraintsC, settings
[21:48] <menn0> kinda important ones though!
[21:48] <thumper> :)
[21:49] <thumper> settings change often IIRC
[21:49] <thumper> moving a service around a gui updates settings doesn't it?
[21:49] <menn0> thumper: no that's annotations
[21:49] <thumper> ah
[21:49] <thumper> good
[21:49] <menn0> thumper: settings is all the relation and env settings
[21:49] <thumper> but equally important bits
[21:50] <thumper> relation settings is the core communication channel between services right?
[21:50] <menn0> thumper: esp b/c they all get watched for changes
[21:50] <perrito666> Aghh this line (all this conversation happened in the market line)
[21:50] <menn0> thumper: and also bad b/c constraints and settings are multi-env so really should have the env-uuid set
[21:51]  * thumper nods
[21:51] <thumper> fark!!!
[21:51]  * menn0 extends bug 1474606
[21:51] <mup> Bug #1474606: entities status is losing env-uuid upon setting status. <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>
[21:52] <perrito666> Menn0 do we need some sort of repair steps?
[21:53] <menn0> perrito666: we will need to implement DB migrations to fix the env-uuid fields
[21:59]  * menn0 is having doubts and does a quick check to ensure that $set with a struct really replaces the whole doc
[22:11] <perrito666> menn0: can we implement db migrations to run even when not having min version change?
[22:11] <perrito666> by min I mean maj.min.micro
[22:11] <menn0> thumper, perrito666: no it does what we thought, so all made
[22:12] <menn0> urgh, so all bad
[22:12] <thumper> menn0: also... there are a bunch of weird relation bugs that I have a feeling are caused by this
[22:12] <menn0> thumper: could be
[22:12] <thumper> where an openstack deployment is made and some relations don't get the settings
[22:13] <thumper> especially if the relation config is more complicated
[22:13] <thumper> which is likely to be with some openstack charms
[22:13] <perrito666> basically anything being updated is left out of an env
[22:13] <perrito666> and most likely breaks a transaction
[22:13]  * perrito666 makes a t-shirt that says "every time you $set a doc a txn dies"
[22:14] <menn0> perrito666: all upgrade steps for the current major version are run whenever upgrading to any version within that major version so if we add upgrade steps they will get run
[22:14] <perrito666> excellent, I was in doubt there
[22:14]  * perrito666 feels ignored by the bot
[22:17] <menn0> bug 1474606 updated
[22:17] <mup> Bug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>
[22:19] <perrito666> menn0: I'll propose a fix for status right away
[22:20] <perrito666> I take it the migration step will be rather generic and just be called with all affected collections once all is fixed?
[22:26] <perrito666> menn0: btw, thanks for putting all that effort into this, I completely overlooked the txn issue.
[22:27] <menn0> perrito666: it was thumper who remembered this, not me.
[22:27] <perrito666> aww, I don't want to thank thumper, he did not take my candy
[22:37] <thumper> haha
[22:37] <thumper> perrito666: no if you were offering nice steak or wine... that would be a different proposition
[22:37] <thumper> s/no/now/
[22:38] <marcoceppi> jw4: you still around?
[22:38] <thumper> perrito666: I have a gut feeling that the replacement of the docs in the settings collection is the source of a collection of weird unreproducible relation errors
[22:38] <thumper> hey marcoceppi
[22:39] <thumper> marcoceppi: quick question for you
[22:39] <marcoceppi> hey thumper o/
[22:39] <marcoceppi> thumper: shoot
[22:39] <thumper> marcoceppi: if you are deploying a large bundle, how often are there strange relation config issues?
[22:40] <marcoceppi> I wouldn't know, I've only done openstack bundles
[22:40] <marcoceppi> that's the biggest I've gotten
[22:43] <perrito666> thumper: I think you might not like steak how it's done here :p but if you were ever to visit I might cook you a decent local meat dish with wine
[22:44] <thumper> :)
[22:44] <jw4> marcoceppi: yep, sorry missed your ping
[22:48] <alexisb> thumper, I am available when ever you would like to chat
[22:49] <marcoceppi> jw4: does action-fail immediately exit after it's called?
[22:49] <marcoceppi> as in, kill the action?
[22:49] <jw4> marcoceppi: I don't think so
[22:49] <marcoceppi> or do I still need to exit
[22:49] <marcoceppi> kk
[22:49] <marcoceppi> thanks
[22:49] <jw4> yw :)
[22:53] <jw4> marcoceppi: just confirmed - it only sets the status of the action but doesn't terminate execution
[23:00] <perrito666> thumper: menn0 what I am wondering, and you might be too, is how in the universe are these things working even though they lack env-uuid
[23:00] <perrito666> sounds like we have another bug somewhere
[23:00] <perrito666> at least in state
[23:01] <perrito666> http://reviews.vapour.ws/r/2178/ <-- fix for update status
[23:02] <menn0> perrito666: yes, I was wondering the same thing
[23:02] <menn0> perrito666: there might be a bug in the multi-env txn stuff
[23:03] <perrito666> mm, isn't (or wasn't) the env also encoded in the id?
[23:03] <perrito666> I just noticed the breakage once I needed to use something with an int _id
[23:05] <menn0> perrito666: ship it
[23:05] <menn0> perrito666: the env uuid is prefixed on to the front of the _id
[23:05] <menn0> perrito666: it needs to be a string
[23:05] <menn0> perrito666: where do you have int _ids?
[23:06] <perrito666> menn0: status-history works differently
[23:06] <perrito666> its a simple pile
[23:07] <menn0> perrito666: so it has int _ids?
[23:08] <perrito666> menn0: yes, sequential
[23:08] <perrito666> also doesnt use txn
[23:08] <perrito666> all by hand
[23:09] <menn0> ok, well if it's not using the txn system then it doesn't matter what you do
[23:09] <perrito666> menn0: yup its a different beast
[23:10] <perrito666> menn0: btw, I think that, at least for status, what is happening is that, since the ids of the entities and statuses are the same, it is returning the statuses correctly anyway (and the envuuid aware txn might be letting blank envuuid pass, which it shouldnt)
[23:12] <menn0> perrito666: yeah it's not supposed to
[23:12] <perrito666> menn0: its just a theory
[23:12] <perrito666> but behavior seems to suggest that this is happening
[23:12] <menn0> perrito666: I'm dealing with another critical bug at the moment, then I'll get to this one
[23:13] <perrito666> life is fun, isn't it?
[23:13] <perrito666> if it makes you feel better, you are one day closer to the weekend than I am
[23:46] <thumper> perrito666: it is working because we don't use the env-uuid value unless we are cleaning up documents
[23:46] <thumper> perrito666: all the queries use the _id field
[23:46] <thumper> which is the same
[23:46] <thumper> and has the env-uuid prefixed