/srv/irclogs.ubuntu.com/2015/07/15/#juju-dev.txt

bradmwe've got a charm that seems to be stuck on installing, but the juju logs say the start hook ran, any ideas on how we can dig into it?00:00
rick_h_thumper: wallyworld ^ this is related to current IS issues00:00
wallyworldok, thanks for update00:00
menn0perrito666: that does indeed seem to be a serious problem00:01
menn0perrito666: file a bug and point thumper at it. one of us will fix it soon.00:02
* thumper keeps his head down00:02
perrito666thumper: candy?00:02
* thumper mutters under his breath something about deadlines and too much work00:02
* thumper ignores the candy00:03
menn0perrito666: we'll have to add an upgrade step to fix existing records00:03
menn0bradm: if you haven't already can you look at the logs on the unit's machine itself? (/var/log/juju/unit-FOO.log)00:07
bradmmenn0: all good now, it was just taking a long time to realise it was up00:09
menn0bradm: ok, good to hear00:10
bradmwell, all good might be a stretch, but it's all onto ceph now00:10
bradmit's making good progress00:10
menn0perrito666: I can certainly see the statuses env-uuid problem in a local env here00:14
perrito666menn0: adding the bug with detail00:14
rick_h_NOTICE: jujucharms.com and the charmstore are back up. The storage in IS is working to rebalance/sync and might time out or be slow for a bit longer.00:14
rick_h_menn0: ^00:14
menn0rick_h_: sweet00:15
menn0perrito666: I wonder how status lookups are even working at all00:15
perrito666menn0: me too00:17
perrito666menn0: thumper https://bugs.launchpad.net/juju-core/+bug/147460600:17
mupBug #1474606: entities status is loosing env-uuid upon setting status. <juju-core:New> <https://launchpad.net/bugs/1474606>00:17
perrito666menn0: enjoy00:19
rick_h_hah, he says as he takes his candy back00:19
perrito666rick_h_: well I left a very nice report in exchange00:20
perrito666menn0: this happens for services, units, agents, machines and every other thing that has a status00:20
menn0perrito666: I see you've already got a fix for it too00:22
menn0perrito666: although I might try and fix this in the multi-env txn layer00:22
menn0too00:22
perrito666menn0: I do, but I was not sure if I cover all aspects of this issue00:23
perrito666I just fixed my patch of land00:23
menn0perrito666: understood00:24
davecheneywhat the heck is this test doing ?00:26
davecheneyFAIL: kvm-broker_test.go:241: kvmBrokerSuite.TestStartInstancePopulatesNetworkInfo00:26
davecheney[LOG] 0:00.001 DEBUG juju.testing setting feature flags: address-allocation00:26
davecheneykvm-broker_test.go:251: instanceConfig := s.instanceConfig(c, "42")00:26
davecheney/home/ubuntu/src/github.com/juju/juju/container/testing/common.go:90: c.Assert(err, jc.ErrorIsNil)00:26
davecheney... value *os.PathError = &os.PathError{Op:"mkdir", Path:"/var/lib/lxc", Err:0xd} ("mkdir /var/lib/lxc: permission denied")00:26
davecheneyof course this is going to fail00:26
davecheneymortals don't have permission to write to that dir00:26
menn0wallyworld: just confirmed... rsyslog is screwed both before and after the upgrade00:33
menn0:(00:34
menn0writing up a ticket now00:34
wallyworldoh dear00:34
menn0in different ways though so you know, at least that's interesting00:35
wallyworldeven better00:35
menn0I think the issue in 1.24.0 has already been fixed in a later 1.2400:35
wallyworldthe rsyslog issue? so do we need to amend an upgrade step?00:36
menn0not sure yet00:37
menn0before the upgrade rsyslogd is continually being restarted by juju00:37
menn0every 30s or so00:37
wallyworldsounds like a bug we may have fixed yeah00:37
menn0the only thing in juju's logs is the rsyslog worker saying "reloading rsyslog config"00:37
menn0after the upgrade that stops00:38
menn0but most/all of the units can't connect00:38
menn0due to cert verification00:38
axwthumper: "I did say that the way to fix this properly was to not use the state method to load the charms."  -- what else would you use to enumerate charms, if not state?00:39
menn0which seems like the thing that's been attempted to be fixed several times now00:39
menn0anyway, i'll write up the ticket00:39
menn0wallyworld: ^^00:39
mupBug #1474606 opened: entities status is losing env-uuid upon setting status. <juju-core:New for menno.smits> <juju-core 1.24:New for menno.smits> <https://launchpad.net/bugs/1474606>00:39
mupBug #1474607 opened: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1474607>00:39
thumperaxw: making a request of the collection and have it return the raw data as dicts00:39
thumperaxw: that way you aren't expecting a particular structure00:40
thumperaxw: we have had to write many upgrade steps in this way00:40
thumperand I believe it is better00:40
thumperbecause you just ask for what you need, and change what you must00:40
thumperand don't worry about the structure of the doc as much00:40
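The approach thumper describes can be sketched with plain Go maps standing in for raw documents. This is only an illustration under stated assumptions: `rawDoc`, `addEnvUUID` and `apply` are hypothetical names, and no real collection or mgo call is involved.

```go
package main

import "fmt"

// rawDoc is a document as returned from the collection: a plain map,
// with no assumption about which fields the current structs declare.
type rawDoc map[string]interface{}

// addEnvUUID is a hypothetical upgrade step. It returns only the fields
// that must change, or nil when the document already has an env-uuid.
func addEnvUUID(doc rawDoc, envUUID string) rawDoc {
	if v, ok := doc["env-uuid"].(string); ok && v != "" {
		return nil
	}
	return rawDoc{"env-uuid": envUUID}
}

// apply merges the update into the document and leaves every other
// field, known or unknown, untouched.
func apply(doc, update rawDoc) {
	for k, v := range update {
		doc[k] = v
	}
}

func main() {
	charm := rawDoc{"_id": "local:trusty/mysql-1", "some-old-field": 42}
	if update := addEnvUUID(charm, "env-1234"); update != nil {
		apply(charm, update)
	}
	fmt.Println(charm["env-uuid"], charm["some-old-field"])
}
```

Because the step never decodes into a struct, a field that a later release renames or drops cannot break it.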
wallyworldmenn0: sorry was distracted by a review. i recall vaguely the cert issue was fixed. rsyslog i think uses a different cert to state server connections00:41
menn0wallyworld: do you know when it was fixed given that I'm seeing this with an upgrade to 1.24.2?00:42
wallyworldmenn0: not offhand - ericsnow might know00:42
wallyworldthumper: i am going to fix it like you wanted in 1.22 - do we really need to do the work in 1.24 if the step runs after env uuid has been added?00:45
wallyworldif yes, i can fix00:45
axwwallyworld: is there a test we can add that would have highlighted the issue?00:47
axwwallyworld: and that would highlight future issues00:47
wallyworldaxw: for this case yes - a CI test that adds charms to the 1.20 env prior to upgrade and then checks that they are imported after00:48
wallyworldthat's on my todo list to follow up on00:49
axwwallyworld: presumably we could write a unit test that adds a charm entry to state, and wipes out its env-uuid field to exercise the bug? but the CI test would be better, since it'll catch future bugs too00:50
wallyworldyeah, that was my thinking00:50
wallyworldmenn0: and for further joy, bug 1469077 is back again, i'll need to point william to it00:52
mupBug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Confirmed> <https://launchpad.net/bugs/1469077>00:52
menn0wallyworld: yeah I saw that00:52
menn0wallyworld: so unawesome00:52
wallyworldi know right :-(00:53
menn0here's the rsyslog ticket: bug 147461400:59
mupBug #1474614: rsyslog connections fail with certificate verification errors after upgrade to 1.24.2 <regression> <juju-core:New> <juju-core 1.24:New> <https://launchpad.net/bugs/1474614>00:59
menn0wallyworld: ^^01:00
wallyworldlooking01:00
wallyworldthanks for filing, a nice bug report01:01
wallyworldmenn0: axw will pick up that rsyslog bug01:03
wallyworldaxw: goose pr lgtm01:05
axwwallyworld: thanks01:06
mupBug #1474291 opened: juju called unexpected config-change hooks after read tcp 127.0.0.1:37017: i/o timeout <hooks> <openstack> <sts> <uosci> <juju-core:New> <ceilometer (Juju Charms Collection):New> <https://launchpad.net/bugs/1474291>01:09
mupBug #1474614 opened: rsyslog connections fail with certificate verification errors after upgrade to 1.24.2 <regression> <juju-core:New> <juju-core 1.24:New> <https://launchpad.net/bugs/1474614>01:09
wallyworldaxw: tagging pr reviewed, but there is a question01:19
axwwallyworld: ok, thanks, will look in a sec01:20
thumperwallyworld: if we don't, it is just a time bomb for the next time things change01:50
thumperit is making a problem for future us01:50
wallyworldthumper: looking at the code - there's *lots* of current upgrade steps that use the docs directly, not maps. the only ones that use maps are the ones to insert env uuid01:51
thumperwell... we are just making problems for ourselves IMO01:53
wallyworldi think, thumper, your team wrote a lot of them01:54
thumperyou are probably not wrong01:55
wallyworldi guess the difference is that the doc are used with the raw collection01:55
thumperI'm telling you the result of accumulated wisdom01:55
wallyworldso rawCollection.Find(&somedoc)01:55
wallyworldone would hope that CI tests would evolve to better catch upgrade issues01:56
lazyPowerthumper: sorry was deep in a support scenario.03:46
lazyPowerthumper: sounds good mate, submitting it earlier is way better than later, as the queue takes a couple days to sift down to newly submitted stuff03:47
thumperlazyPower: no worries03:47
thumperI'm not going to block my deployment on it getting reviewed :)03:47
lazyPowerwe're averaging ~ 5 days on initial touch, still trying to get that number down, but it's way better than the 13 days in history.03:47
lazyPoweras you shouldn't be :)03:47
lazyPowernamespaces!!!!03:47
* lazyPower toots the namespace horn03:47
lazyPowercheers03:48
thumpernamespaces?03:50
menn0thumper, waigani, wallyworld: see email for findings related to bug 147419504:05
mupBug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>04:05
menn0thumper, waigani, wallyworld: looks like will's theory was right04:05
waiganimenn0: red box of death?04:16
menn0waigani: yep... see the note on the field04:17
waiganiah yep, just saw note04:17
menn0:)04:17
waiganinice work04:18
menn0waigani: when you added the auto env life assertion to the txn layer did you remove the ones that already existed elsewhere, or did they not exist anywhere before JES?04:18
menn0I guess it wasn't really necessary when there was just one env04:18
waiganimenn0: yeah, it's going back a bit now, but I don't remember there being any - which as you point out makes sense.04:19
menn0cool04:19
menn0waigani: i'm trying to figure out the right places to check04:20
menn0adding a machine certainly04:20
waiganimenn0: you mean where we really need to assert for a live environ?04:20
menn0waigani: yep04:20
waiganimenn0: as a starting point, didn't will say whenever we add a service, unit, relation or machine?04:21
mupBug #1454468 changed: nodes deployed successfully by maas but juju status remains pending with juju 1.23.2 and services stuck in allocating <deploy> <oil> <juju-core:Expired> <https://launchpad.net/bugs/1454468>04:22
menn0waigani: I wonder if we can reduce that set to just service and machine04:25
waiganiso what's the worst case? we add a unit/relation to a dying environment...04:25
menn0waigani: I'm wondering if we can just add an environment cleanup to kill them04:26
menn0waigani: in fact, the current cleanupServicesForDyingEnvironment might already do it04:26
* thumper is going to lie down04:29
* thumper is not 100%04:29
waiganimenn0: so that sets the existing services to dying, but expects that no new services can be added to a dying environment04:29
menn0waigani: standup hangout? (as per PM)04:30
waiganimenn0: yep04:30
wallyworldmenn0: should be any time we allocate something that costs05:07
wallyworldeg machine, storage etc05:07
menn0wallyworld: yep, that's what i'm looking at now... anything that results in a physical change certainly needs the env life assert05:09
menn0(as physical as a virtual machine is anyway)05:09
wallyworldwell, physical change that costs $$$05:09
wallyworlddon't really care about containers05:10
menn0wallyworld: i've got a pretty clear picture now of what I want to do. i'm going to catch will later on tonight to confirm05:10
wallyworldbut machines yes05:10
wallyworldok05:10
menn0wallyworld: that's a good point, maybe we don't check for containers05:10
wallyworldmenn0: yeah, so for stuff that doesn't cost, we just have a cleanup job after env is killed05:11
menn0wallyworld: yep, and we already have most of that it turns out05:11
wallyworldi can't see why we'd check more than is necessary05:11
wallyworldand before JES, we didn't check05:11
wallyworldso TBH i'm not sure why we started checking with JES05:12
wallyworldi guess JES has greater chance of concurrent access05:13
menn0yep and b/c before when you issued destroy-environment everything died at that point including the API server05:13
menn0so you had very little opportunity to add a new machine or whatever once the env was dying05:14
wallyworldwaigani: so is that +2 a ship it? btw - that func needs to be exported because it is in state package05:14
wallyworldand called by upgrades package05:14
menn0now for hosted envs the API server stays up so there's a much greater chance of env changing operations as the env is dying05:14
wallyworldfair point05:14
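The policy being discussed (assert the environment is alive before allocating anything that costs money, and rely on a cleanup job for cheap things such as containers) can be sketched as a toy model. All types and method names here are hypothetical; in juju the assertion would live in a transaction so it is checked atomically, which this single-threaded sketch does not attempt.

```go
package main

import (
	"errors"
	"fmt"
)

type life int

const (
	alive life = iota
	dying
)

var errEnvNotAlive = errors.New("environment is no longer alive")

// env is a toy stand-in for a hosted environment.
type env struct {
	life       life
	machines   []string
	containers []string
}

// addMachine allocates something that costs money, so it refuses to
// proceed unless the environment is still alive.
func (e *env) addMachine(id string) error {
	if e.life != alive {
		return errEnvNotAlive
	}
	e.machines = append(e.machines, id)
	return nil
}

// addContainer is cheap, so it carries no life assertion.
func (e *env) addContainer(id string) {
	e.containers = append(e.containers, id)
}

// destroy marks the environment as dying; the API server stays up.
func (e *env) destroy() { e.life = dying }

// cleanup removes whatever cheap resources were added, including any
// that slipped in after the environment started dying.
func (e *env) cleanup() { e.containers = nil }

func main() {
	e := &env{}
	fmt.Println(e.addMachine("0"))
	e.destroy()
	e.addContainer("0/lxc/0")
	fmt.Println(e.addMachine("1"))
	e.cleanup()
	fmt.Println(len(e.machines), len(e.containers))
}
```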
menn0anyway, stopping now since i'm going to be back on later05:15
=== menn0 is now known as menn0-afk
=== _stowa_ is now known as _stowa
dimiterndooferlad, morning07:46
dooferladdimitern: hi07:46
dimiterndooferlad, I thought we dealt with the kvm-inaccessible-after-reboot issue in 1.24 as well ? see bug 147450807:46
mupBug #1474508: Rebooting the virtual machines breaks Juju networking <juju-core:New> <https://launchpad.net/bugs/1474508>07:46
dooferladdimitern: I thought so too.07:47
dimiterndooferlad, maybe the fix is in master only?07:48
dooferladdimitern: will need to take a look and see if I missed landing it07:48
dimiterndooferlad, cheers07:48
dooferladdimitern: darn, wasn't backported.07:53
dooferladdimitern: will be trivial to do.07:53
waiganiwallyworld: sorry, just saw your message - this for moving charm tests to state? yes, +2 shipit.07:56
wallyworldta07:56
waiganiwallyworld: ah, I thought I clicked shipit - done now. The pattern of needing exported state funcs for upgrade steps is something fwereade is keen to change - possibly just exporting one upgrade step from state which then calls the other unexported steps. But for now we just need to try to make it clear that while the func is exported, no-one except the upgrades package should be using it.07:59
wallyworldwaigani: np. and this was for a 1.22 release, so old code08:00
waiganiwallyworld: yeah true08:02
dimiterndooferlad, awesome! will you do the dance then please? - card, bug, etc.08:06
dimitern:)08:06
dooferladsure08:06
dimiternta!08:06
=== menn0-afk is now known as menn0
menn0fwereade: ping?08:15
dooferladdimitern: http://reviews.vapour.ws/r/2163/ for a quick +208:45
dimiterndooferlad, ship it! :)08:50
jam1fwereade: dimitern: food just arrived so I'm going to miss the standup. But I'm working on breaking down the Uncommitted state stuff into development items (I'd like to chat directly with fwereade later if you have time before our cycle review)08:59
=== jam1 is now known as jam
dooferladjam, fwereade, TheMue, dimitern: stand up!09:00
dimiterndooferlad, omw09:01
TheMueomw09:01
dimiternjam, thanks for the heads up09:01
rogpeppecan anyone tell me something about plans relevant to the EnvironmentsCacheFile feature?10:24
rogpeppejam, fwereade: ^10:26
jamrogpeppe: I don't particularly know it by that name, but it looks like something thumper would have been doing to support multiple environments10:28
mupBug #1474788 opened: ec2: provisioning machines sometimes fails with "tagging instance: The instance ID <ID> does not exist" <ec2-provider> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1474788>10:28
jamjust by reading its description from https://github.com/juju/juju/blob/master/feature/flags.go#L2910:29
rogpeppejam: i'm just wondering what our future plans are. is the plan to do away with .jenv files entirely?10:29
jamrogpeppe: that's how I read the description in there. I haven't heard of that before, nor had I read that particular detail in the JES stuff. But it does read that way.10:30
rogpeppejam: surely we have some roadmap plans somewhere?10:30
rogpeppecherylj: do you know about this, by any chance?10:30
jamrogpeppe: so I've got docs for JES CLI, JES Logging, MESS Work Items and one more. The last two are "Historical" and it might be described in there, but they are roughly before I started tracking all the proposals directly.10:31
dooferladdimitern: this is what I have for the spaces API stuff: https://github.com/juju/juju/compare/net-cli...dooferlad:net-cli-apiserver-spaces?expand=110:37
dooferladdimitern: would be good to have a chat about if that is shaping up in the way you imagined. I am not sure I like having the stub network stuff in apiserver/testing. I think having its own package is nicer.10:39
dooferladdimitern: what do you think?10:39
dimiterndooferlad, looking10:40
dimiterndooferlad, I like the refactoring around moving the shared stubs in apiserver/testing10:43
dimiterndooferlad, haven't looked at every line, but so far it looks solid10:43
dimiterndooferlad, please, s/ast/apiservertesting/ (or whichever alias for that path is more common)10:44
dooferladdo you have an opinion about if fake_spaces_subnets.go should be in its own package so we can just import and use it rather than having to call InitStubNetwork?10:44
dimiterndooferlad, also InitStubNetwork() could be defined as a method on a fixture struct, which can be embedded into the suites that need it and call it in SetUpSuite, rather than init()10:45
dimiterndooferlad, have a look at LiveTests (or was it Tests ?) for example10:46
dooferladdimitern: sure, github.com/juju/juju/environs/jujutest/livetests.go right?10:47
dimiterndooferlad, I have a lingering feeling the shared stubs are not goroutine safe (when used outside apiserver/testing) - make sure you run with -race10:47
dimiterndooferlad, that's the one yeah10:47
dooferladdimitern: great, thanks for the pointers.10:49
wallyworldfwereade: bug 1469077 has come up again on 1.24.2, so i removed the incomplete status11:20
mupBug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469077>11:20
fwereadewallyworld, grar. axw, do you have context on this? ^^11:23
axwfwereade: nope11:23
wallyworldfwereade: i have no context on the cause or fix sadly, but i see some info has been attached to the bug11:24
jamwallyworld: fwereade: I could see a case where contending on the txn-queue field and having it grow large enough that we can't handle all the txns listed before a new one comes in11:25
jamand then it grows every 30s until there are so many entries that it is larger than we're allowed to make a document (or in this case larger than the size of a capped collection?)11:26
wallyworldsounds plausible11:26
fwereadejam, yeah -- I just thought that *someone* had addressed the writes that caused that11:26
fwereadejam, I just forget who11:26
fwereadejam, perhaps I hallucinated it11:26
jamfwereade: we handled that for addresses by fixing the addresser11:26
wallyworldah mr *someone* :-)11:26
jamI don't know of a fix for the leadership stuff11:26
fwereadejam, ok, bugger, I thought that was part of the stuff axw had done but evidently not11:27
jamm-enn-o had done some work to clean out transactions that are thought of as already applied (to handle the case where our assertion-only TXNs don't get cleaned out)11:27
jambut I would think that's a different issue.11:27
fwereadejam, yeah11:27
fwereadejam, ok, let's chalk it up to a hallucination then ;p11:27
fwereadejam, oh wait11:28
fwereadejam, I thought it was the remove/insert behaviour that led to growing txn queues, and mr.someone had made a fix to the lease persistor that stopped it doing that?11:28
fwereadewallyworld:11:29
fwereade// TODO(wallyworld) - this logic is a stop-gap until a proper refactoring is done11:29
fwereade// We'll be especially paranoid here - to avoid potentially overwriting lease info11:29
fwereade// from another client, if the txn fails to apply, we'll abort instead of retrying.11:29
fwereadewallyworld, originally it was remove/insert every time, which was causing unbounded queue growth11:30
wallyworldhmmm, let me look up that code11:30
fwereadewallyworld, state/lease.go11:30
fwereadewallyworld, maybe it wasn't backported..?11:31
wallyworldi don't recall that todo at all, yet it has my name on it :-)11:31
fwereadehaha11:32
fwereadeI know the feeling11:32
wallyworldfwereade: i just checked the code, 1.24 is the same11:32
jamwallyworld: I do remember fwereade reviewing a patch you submitted so that we changed how leases are requested so that it wouldn't be a "delete current one, create a new one" sort of operation.11:35
jamfwereade: as far as that goes, *if* we ever get to a point where we have an invalid TXN in the queue (one that we cannot clear)11:35
jamthen we'll overflow the txn-queue eventually11:36
jambecause creation of a *new* txn adds a value to the field11:36
jamand then when we go to evaluate the txn, we see the bad txn and die, and now we have yet-another txn in the queue11:36
jamso the "document too big" could just be a symptom of "invalid TXN in queue"11:37
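jam's theory can be made concrete with a small simulation. It is a sketch under stated assumptions, not a model of mgo/txn internals: `doc`, `runTxn` and `attemptsUntilFull` are hypothetical, and the size limit is counted in queue entries rather than bytes.

```go
package main

import "fmt"

// doc models only the part of a document that matters here: its txn-queue.
type doc struct {
	queue []string
}

// runTxn enqueues a token for a new transaction, then tries to flush the
// queue in order. If the head of the queue cannot be applied, flushing
// stops and everything behind it stays queued.
func (d *doc) runTxn(token string, canApply func(string) bool) {
	d.queue = append(d.queue, token)
	for len(d.queue) > 0 && canApply(d.queue[0]) {
		d.queue = d.queue[1:]
	}
}

// attemptsUntilFull runs up to maxAttempts transactions and returns how
// many it took for the queue to exceed limit entries, or -1 if it never did.
func attemptsUntilFull(d *doc, limit, maxAttempts int, canApply func(string) bool) int {
	for i := 1; i <= maxAttempts; i++ {
		d.runTxn(fmt.Sprintf("txn-%d", i), canApply)
		if len(d.queue) > limit {
			return i
		}
	}
	return -1
}

func main() {
	notBad := func(token string) bool { return token != "bad" }

	stuck := &doc{queue: []string{"bad"}}
	fmt.Println("stuck queue overflows after:", attemptsUntilFull(stuck, 10, 100, notBad))

	healthy := &doc{}
	fmt.Println("healthy queue overflows after:", attemptsUntilFull(healthy, 10, 100, notBad))
}
```

One uncleared entry at the head is enough: every later transaction only adds to the queue, so "document too big" shows up long after the real fault.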
wallyworldjam: i vaguely recall that too, i'll have to go digging11:44
jamfwereade: iteration planning meeting?12:01
fwereadejam, there12:04
jamhm. I don't see you in the one I'm in12:04
wwitzel3katco: ping12:08
mupBug #1474508 changed: Rebooting the virtual machines breaks Juju networking <juju-core:Fix Released by dooferlad> <juju-core 1.24:In Progress by dooferlad> <https://launchpad.net/bugs/1474508>12:11
perrito666morning all12:17
wwitzel3perrito666: o/12:40
wwitzel3ericsnow: ping13:07
wwitzel3so cold and alone13:24
natefinchwwitzel3: lol13:24
wwitzel3natefinch: these things tend to happen when working on rsyslog stuff ;)13:24
natefinchwwitzel3: ahh yeah, totally13:28
dooferladdimitern, TheMue: please be opinionated at http://reviews.vapour.ws/r/2169/13:33
TheMuedooferlad: ok13:33
TheMuedooferlad: too many files, cannot be good *lol*13:34
dimiterndimitern, kiijubg\13:57
dimiternwtf?!13:57
dimiterndooferlad, looking :)13:57
mupBug #1468815 opened: Upgrade fails moving syslog config files "invalid argument" <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Fix Released by ericsnowcurrently> <https://launchpad.net/bugs/1468815>14:14
cheryljrogpeppe: I didn't do the work to enable the cache file, but I might be able to answer specific questions you may have about it.14:32
dimiterndooferlad, reviewed14:33
dooferladdimitern: thanks - exactly what I needed.14:44
dimiterndooferlad, cool :)14:44
mupBug #1474885 opened: juju deploy fails with ERROR EOF <juju-core:New> <https://launchpad.net/bugs/1474885>15:02
mupBug #1474892 opened: User friendly error message for system destroy could be improved <juju-core:New for cherylj> <https://launchpad.net/bugs/1474892>15:02
* fwereade is stopping, has a review up: http://reviews.vapour.ws/r/2172/15:52
katcoericsnow: meeting16:02
alexisbkatco, would you mind filling gsamfira in on the details of how we use feature branches?16:24
alexisbI pointed him to the wiki but he has some questions I am not qualified to answer16:24
katcoalexisb: sure thing16:25
katcogsamfira: lmk what questions you have16:26
mattywfwereade, I'll be proposing small uniter changes later on. they don't need reviewing yet, but I'll ping you about them tomorrow. just wanted to let you know beforehand that part of it might be controversial, but I think I have a good justification16:49
davecheneykvm-broker_test.go:201: kvm0 := s.startInstance(c, "1/kvm/0")16:51
davecheney/home/ubuntu/src/github.com/juju/juju/container/testing/common.go:90: c.Assert(err, jc.ErrorIsNil)16:51
davecheney... value *os.PathError = &os.PathError{Op:"mkdir", Path:"/var/lib/lxc", Err:0xd} ("mkdir /var/lib/lxc: permission denied")16:51
davecheneyI am seeing this error constantly on a fresh ubuntu machine16:52
davecheneyit seems pretty fatal16:52
davecheneyhas anyone else ever seen this16:52
davecheneyi'm sure it's because lxc packages are not installed, so /var/lib/lxc is not present16:52
davecheneybut this seems like a pretty serious isolation failure16:52
natefinchdavecheney: I haven't seen it, but I have lxc installed16:59
davecheneythis is on a fresh install17:00
davecheneytests fail because this directory is17:00
davecheney1. not present17:00
davecheney2. will not be present, because /var/lib is owned by root17:00
natefinchdavecheney: certainly, it's an isolation problem.  I wonder if there aren't a lot more similar problems in those tests, if lxc is not installed.17:01
davecheneyi'm too scared to look17:02
davecheneyalso, how is that test supposed to pass on windows ?17:02
* davecheney logs a bug and moves on17:03
natefinchdavecheney: I presume all the tests are marked as skipped on windows17:03
davecheneydo we have voting windows CI tests ?17:11
natefinchericsnow, wwitzel3: review me? http://reviews.vapour.ws/r/2174/17:11
natefinchsinzui: what davecheney said ^17:12
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/147494617:12
mupBug #1474946: worker/provisioner: tests are poorly isolated <juju-core:New> <https://launchpad.net/bugs/1474946>17:12
natefinchdavecheney: I know they run and passed at one time, but I don't know if they're voting or not.  I believe so, since I have gotten windows bugs17:12
natefinchfrom CI failures17:12
sinzuinatefinch windows tests do vote. they have passed, but not in a week. I am told many tests are skipped17:14
natefinchwwitzel3: btw, I already forward ported the first bug in your bug task: https://bugs.launchpad.net/juju-core/+bug/137089617:14
mupBug #1370896: juju has conf files in /var/log/juju on instances <canonical-bootstack> <logging> <rsyslog> <juju-core:Fix Committed by natefinch> <juju-core 1.24:Fix Released by natefinch> <https://launchpad.net/bugs/1370896>17:14
sinzuijuju-ci-tools has a similar problem when we run its own suite on OS X. I created /var/lib/lxc on the machine to get a pass17:15
davecheneythat's terrible17:15
wwitzel3natefinch: yeah, saw that, I discovered that the problem still exists in juju-1.24 master so I'm working on a fix now, before porting the other PRs17:15
davecheneywhy doesn't it fail for the landing bot ?17:15
sinzuidavecheney: windows test suite is run by ci, not the merge bot, and since the tests take about 2 hours to get a pass, do you really want to slow down merges? mgz suggested that the test suite be made reliable so that we could get the run down to 40 minutes per merge17:17
davecheneysinzui: i like forcing the issue17:17
sinzui;)17:18
mupBug #1474946 opened: worker/provisioner: tests are poorly isolated <juju-core:New> <https://launchpad.net/bugs/1474946>17:20
=== natefinch is now known as natefinch-afk
sinzuiperrito666: juju-ci-tools has the first part of my testing arg change. I am going to do another round too, but it won't be merged until tomorrow.18:11
perrito666sinzui: tx for the heads up18:21
katcoericsnow: sorry, got caught up in meetings. reviewing your prs now18:52
wwitzel3afk picking my car up from the shop19:29
=== natefinch-afk is now known as natefinch
menn0perrito666: ping?21:31
perrito666menn0: pong?21:32
perrito666good morning21:32
menn0perrito666: good evening21:32
menn0perrito666: regarding the problem you found yesterday21:32
perrito666yes?21:33
menn0perrito666: thumper reminded me that using $set to overwrite a doc is a no-no with mgo/txn21:33
menn0perrito666: because it blows away the mgo/txn fields (txn-queue, txn-revno etc)21:33
perrito666menn0: oh, expand21:33
* menn0 goes to find the mailing list post about this21:34
menn0perrito666: it was SO. the last paragraph here: http://stackoverflow.com/a/24458293/19538321:35
menn0perrito666: we should probably change the status update code to do a more conventional update21:36
menn0perrito666: and add some protection to stop people doing this again in the future.21:36
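The difference between overwriting a document and a conventional field-level update can be sketched with plain Go maps. This deliberately does not model how mgo or mgo/txn implement updates; it only illustrates the effect described in the conversation, where bookkeeping fields such as txn-queue and txn-revno, and the env-uuid, go missing. `replaceDoc` and `setFields` are hypothetical helpers.

```go
package main

import "fmt"

type document map[string]interface{}

// replaceDoc models an update that overwrites the stored document with
// whatever the caller's struct happened to contain. Only _id survives.
func replaceDoc(stored, replacement document) document {
	out := document{"_id": stored["_id"]}
	for k, v := range replacement {
		out[k] = v
	}
	return out
}

// setFields models a conventional update that touches only the named
// fields and leaves everything else in place.
func setFields(stored, fields document) document {
	out := document{}
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range fields {
		out[k] = v
	}
	return out
}

func main() {
	stored := document{
		"_id":       "env-1234:u#mysql/0",
		"env-uuid":  "env-1234",
		"status":    "pending",
		"txn-revno": 3,
		"txn-queue": []string{"token-a"},
	}
	newStatus := document{"status": "started", "statusinfo": ""}

	replaced := replaceDoc(stored, newStatus)
	updated := setFields(stored, newStatus)

	_, replacedHasEnv := replaced["env-uuid"]
	_, updatedHasEnv := updated["env-uuid"]
	fmt.Println("env-uuid after replace:", replacedHasEnv)
	fmt.Println("env-uuid after field update:", updatedHasEnv)
}
```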
perrito666Definitely21:39
menn0perrito666: can you handle the first part (changing the status update code)21:40
menn0?21:40
menn0perrito666: I'll handle the second part (preventing these kinds of updates)21:40
perrito666I wonder if that is not the cause of some of the eff ups of txns, this has been there for who knows how long21:40
perrito666I'll fix update21:40
menn0perrito666: it could well be21:41
perrito666I am just making a quick grocery shop and I'll send a patch upon returning21:44
* perrito666 is surprised at how slow the 10 > items line can be21:46
menn0perrito666: no problems.. it can wait until tomorrow21:47
menn0perrito666, thumper: statusesC isn't the only place where we replace docs using $set21:47
thumpermenn0: how many other places?21:47
menn0perrito666, thumper: not many: stateServingInfoC, constraintsC, settings21:48
menn0kinda important ones though!21:48
thumper:)21:48
thumpersettings change often IIRC21:49
thumpermoving a service around a gui updates settings doesn't it?21:49
menn0thumper: no that's annotations21:49
thumperah21:49
thumpergood21:49
menn0thumper: settings is all the relation and env settings21:49
thumperbut equally important bits21:49
thumperrelation settings is the core communication channel between services right?21:50
menn0thumper: esp b/c they all get watched for changes21:50
perrito666Aghh this line (all this conversation happened in the market line)21:50
menn0thumper: and also bad b/c constraints and settings are multi-env so really should have the env-uuid set21:50
* thumper nods21:51
thumperfark!!!21:51
* menn0 extends bug 147460621:51
mupBug #1474606: entities status is losing env-uuid upon setting status. <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>21:51
perrito666Menn0 do we need some sort of repair steps?21:52
menn0perrito666: we will need to implement DB migrations to fix the env-uuid fields21:53
* menn0 is having doubts and does a quick check to ensure that $set with a struct really replaces the whole doc21:59
perrito666menn0: can we implement db migrations to run even when not having min version change?22:11
perrito666by min I mean maj.min.micro22:11
menn0thumper, perrito666: no it does what we thought, so all made22:11
menn0urgh, so all bad22:12
thumpermenn0: also... there are a bunch of weird relation bugs that I have a feeling are caused by this22:12
menn0thumper: could be22:12
thumperwhere an openstack deployment is made and some relations don't get the settings22:12
thumperespecially if the relation config is more complicated22:13
thumperwhich is likely to be with some openstack charms22:13
perrito666basically anything being updated is left out of an env22:13
perrito666and most likely breaks a transaction22:13
* perrito666 makes a t-shirt that says "every time you $set a doc a txn dies"22:13
menn0perrito666: all upgrade steps for the current major version are run whenever upgrading to any version within that major version so if we add upgrade steps they will get run22:14
perrito666excellent, I was in doubt there22:14
* perrito666 feels ignored by the bot22:14
menn0bug 1474606 updated22:17
mupBug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>22:17
perrito666menn0: I'll propose a fix for status right away22:19
perrito666I take it the migration step will be rather generic and just be called with all affected collections once all is fixed?22:20
perrito666menn0: btw, thanks for putting all that effort into this, I completely overlooked the txn issue.22:26
menn0perrito666: it was thumper who remembered this, not me.22:27
perrito666aww, I don't want to thank thumper, he did not take my candy22:27
thumperhaha22:37
thumperperrito666: no if you were offering nice steak or wine... that would be a different proposition22:37
thumpers/no/now/22:37
marcoceppijw4: you still around?22:38
thumperperrito666: I have a gut feeling that the replacement of the docs in the settings collection is the source of a collection of weird unreproducible relation errors22:38
thumperhey marcoceppi22:38
thumpermarcoceppi: quick question for you22:39
marcoceppihey thumper o/22:39
marcoceppithumper: shoot22:39
thumpermarcoceppi: if you are deploying a large bundle, how often are there strange relation config issues?22:39
marcoceppiI wouldn't know, I've only done openstack bundles22:40
marcoceppithat's the biggest I've gotten22:40
perrito666thumper: I think you might not like steak how it's done here :p but if you were ever to visit I might cook you a decent local meat dish with wine22:43
thumper:)22:44
jw4marcoceppi: yep, sorry missed your ping22:44
alexisbthumper, I am available whenever you would like to chat22:48
marcoceppijw4: does action-fail immediately exit after it's called?22:49
marcoceppias in, kill the action?22:49
jw4marcoceppi: I don't think so22:49
marcoceppior do I still need to exit22:49
marcoceppikk22:49
marcoceppithanks22:49
jw4yw :)22:49
jw4marcoceppi: just confirmed - it only sets the status of the action but doesn't terminate execution22:53
perrito666thumper: menn0 what I am wondering, and you might be too, is how in the universe are these things working even though they lack env-uuid23:00
perrito666sounds like we have another bug somewhere23:00
perrito666at least in state23:00
perrito666http://reviews.vapour.ws/r/2178/ <-- fix for update status23:01
menn0perrito666: yes, I was wondering the same thing23:02
menn0perrito666: there might be a bug in the multi-env txn stuff23:02
perrito666mm, isnt (or wasnt) the env also encoded in the id?23:03
perrito666I just noticed the breakage once I needed to use something with an int _id23:03
menn0perrito666: ship it23:05
menn0perrito666: the env uuid is prefixed on to the front of the _id23:05
menn0perrito666: it needs to be a string23:05
menn0perrito666: where do you have int _ids?23:05
perrito666menn0: status-history works differently23:06
perrito666its a simple pile23:06
menn0perrito666: so it has int _ids?23:07
perrito666menn0: yes, sequential23:08
perrito666also doesnt use txn23:08
perrito666all by hand23:08
menn0ok, well if it's not using the txn system then it doesn't matter what you do23:09
perrito666menn0: yup its a different beast23:09
perrito666menn0: btw, I think that, at least for status, what is happening is that, since the ids of the entities and statuses are the same, it is returning the statuses correctly anyway (and the envuuid aware txn might be letting blank envuuid pass, which it shouldnt)23:10
menn0perrito666: yeah it's not supposed to23:12
perrito666menn0: its just a theory23:12
perrito666but behavior seems to suggest that this is happening23:12
menn0perrito666: i'm dealing with another critical bug at the moment, then I'll get to this one23:12
perrito666life is fun, isn't it?23:13
perrito666if it makes you feel better, you are one day closer to the weekend than I am23:13
thumperperrito666: it is working because we don't use the env-uuid value unless we are cleaning up documents23:46
thumperperrito666: all the queries use the _id field23:46
thumperwhich is the same23:46
thumperand has the env-uuid prefixed23:46
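thumper's explanation can be sketched as follows. The ":" separator and the helper names `docID` and `splitDocID` are assumptions made for illustration; the conversation only says that the env uuid is prefixed onto the front of the _id and that queries go through _id.

```go
package main

import (
	"fmt"
	"strings"
)

// docID builds a global _id by prefixing the environment UUID onto the
// local ID. The ":" separator is an assumption.
func docID(envUUID, localID string) string {
	return envUUID + ":" + localID
}

// splitDocID recovers the two parts. It splits on the first ":" only,
// so local IDs that themselves contain ":" are preserved.
func splitDocID(id string) (envUUID, localID string, ok bool) {
	i := strings.Index(id, ":")
	if i < 0 {
		return "", "", false
	}
	return id[:i], id[i+1:], true
}

func main() {
	// A toy collection keyed by _id. The stored document has lost its
	// env-uuid field, yet the lookup by _id still finds it.
	statuses := map[string]map[string]string{
		docID("env-1234", "u#mysql/0"): {"status": "started"},
	}

	doc, found := statuses[docID("env-1234", "u#mysql/0")]
	fmt.Println(found, doc["status"], doc["env-uuid"] == "")

	env, local, _ := splitDocID("env-1234:local:trusty/mysql-1")
	fmt.Println(env, local)
}
```

This is why the missing field went unnoticed: only code that filters on env-uuid directly, such as document cleanup, would see the difference.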
