/srv/irclogs.ubuntu.com/2015/07/16/#juju-dev.txt

davecheneythumper: i've been staring at this one all afternoon, https://bugs.launchpad.net/juju-core/+bug/147505600:00
mupBug #1475056: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>00:00
davecheneyit happens super reliably for me00:00
thumperdavecheney: with you shortly, writing big email00:00
davecheneythumper: that's ok00:00
davecheneyno action needed00:00
davecheneyjust letting you know 'cos I missed standup00:01
thumperdavecheney: ok00:06
davecheneyi'm a bit worried00:08
davecheneyi cannot see anything in the logic that the test actually guarantees00:08
davecheneyie, it's adding then removing a relation00:08
davecheneyand hoping that happens fast enough that no events are generated00:08
davecheneythis is, at best, a coincidence00:08
thumperhaha00:12
thumperthat's terrible00:12
perrito666davecheney: uff, that relies on the uniter being busy in a different path on the loop :|00:12
perrito666or not yet in it00:13
mupBug #1475056 opened: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>00:27
wallyworldmenn0: is there a chance bug 1469077 is caused by the mgo/txn issue you are working on?00:32
mupBug #1469077: Leadership claims, document larger than capped size <landscape> <leadership> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1469077>00:32
wallyworldit has been raised again as an issue00:33
wallyworldfor a 1.24 deployment00:33
menn0wallyworld: it's possible, not sure of the likelihood00:33
menn0wallyworld: do you know which collection has the out of control txn-queue fields?00:33
wallyworldnot yet00:34
menn0wallyworld: also note that I'm not working on that one yet00:34
menn0wallyworld: i'm currently dealing with bug 147419500:34
mupBug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>00:34
menn0wallyworld: that's going well00:34
wallyworldyay00:34
wallyworldcannot resume transactions: document is larger than capped size 1326012 > 104857600:34
wallyworldis the only error i can see so far00:34
wallyworlddoesn't say what collection00:35
menn0wallyworld: yeah, you need to look at the DB to see where the problem is00:35
wallyworldok00:35
wallyworldhappens writing the lease token, so could be leadership related00:35
* perrito666 yells at bot00:36
davecheneythumper: which makes me wonder00:39
davecheneyshould I just delete the test ?00:39
davecheneythere cannot be code relying on this behavior00:40
davecheney'cos00:40
davecheneywell00:40
davecheneythe behaviour only exists in tests00:40
davecheneyin real life00:40
davecheneythere is no way this timing could exist00:40
menn0wallyworld: didn't we fix that problem already ... when we saw this before it was lease/leadership related too00:41
thumperdavecheney: what is it testing exactly?00:42
wallyworldmenn0: a fix was made to error if any concurrent change was made to leadership document. not sure though how the previous implementation or the current one would impact txn queue00:43
wallyworlds/leadership/lease00:43
menn0ok00:43
menn0I have no idea what's going on then00:44
wallyworldby error, i mean return with error rather than trying again00:44
wallyworldexit txn loop early00:44
wallyworldif anything, that should have helped the situation00:44
wallyworldso the bug was marked as incomplete00:44
wallyworldbut was recently reported as the issue still occurs :-(00:45
davecheneytest 0: Nothing happens if a unit departs before its joined is run00:46
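The behaviour that test describes can be pinned down without relying on timing if the queue is driven synchronously. The following is a minimal sketch, assuming a hypothetical, simplified hookQueue type (not the juju worker/uniter/relation implementation): a depart that arrives before the joined hook has run simply cancels the pending hook, so the outcome does not depend on how fast the two changes happen.

```go
package main

import "fmt"

// hookQueue is a hypothetical, simplified stand-in for a relation hook
// queue. It is not the juju implementation.
type hookQueue struct {
	pending map[string]string // unit name -> hook kind waiting to run
	order   []string          // units in the order they were queued
}

func newHookQueue() *hookQueue {
	return &hookQueue{pending: make(map[string]string)}
}

// enqueue records a hook for the unit, remembering first-seen order.
func (q *hookQueue) enqueue(unit, kind string) {
	if _, ok := q.pending[unit]; !ok {
		q.order = append(q.order, unit)
	}
	q.pending[unit] = kind
}

// Join queues a relation-joined hook for the unit.
func (q *hookQueue) Join(unit string) {
	q.enqueue(unit, "relation-joined")
}

// Depart cancels a joined hook that has not run yet; otherwise it
// queues a relation-departed hook.
func (q *hookQueue) Depart(unit string) {
	if q.pending[unit] == "relation-joined" {
		delete(q.pending, unit)
		return
	}
	q.enqueue(unit, "relation-departed")
}

// Drain returns the hooks that would run, in order, and empties the queue.
func (q *hookQueue) Drain() []string {
	var hooks []string
	for _, unit := range q.order {
		if kind, ok := q.pending[unit]; ok {
			hooks = append(hooks, kind+" "+unit)
			delete(q.pending, unit)
		}
	}
	q.order = nil
	return hooks
}

func main() {
	q := newHookQueue()
	q.Join("wordpress/0")
	q.Depart("wordpress/0") // departs before its joined hook is run
	fmt.Println(len(q.Drain()))
}
```

Because Join and Depart are plain method calls here, a test can assert "no hooks" deterministically instead of hoping the two changes land before any event is generated.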
menn0wallyworld: I think we need to point jam and fwereade at this one00:49
wallyworldyeah00:49
wallyworldi'll ping them later00:49
axwwallyworld: would you PTAL at http://reviews.vapour.ws/r/2154/ ?01:05
wallyworldsure01:05
wallyworldaxw: looks ok, just a quibble01:09
axwwallyworld: ta01:09
menn0wallyworld, axw: is it important for there to be an assertion that the env is alive around createStorageOps?01:16
wallyworldyes01:16
menn0wallyworld, axw: I ask b/c it gets called as part of unit creation,  and we're trying to avoid that assertion when units are created01:16
wallyworldbecause storage costs $$01:16
axwwallyworld: yes, for persistent storage anyway. we don't want to destroy an environment while there's persistent storage around01:16
wallyworldwell some01:16
axwerr01:16
axwmenn0: :)01:16
wallyworldyes, just persistent01:16
wallyworldaxw: blonde moment - how could line 28 in this pastebin result in a nil pointer given that "ch" is used just above01:17
wallyworldhttp://pastebin.ubuntu.com/11885503/01:17
* axw looking01:17
menn0wallyworld: so yes if there's persistent storage involved?01:18
wallyworldmenn0: yeah01:18
axwwallyworld: ch.URL() dereferences the charmDoc.URL field01:18
menn0wallyworld: well that sucks b/c we can't fully remove this bottleneck then01:18
axwwallyworld: so if it's nil...01:18
wallyworldmenn0: we want to avoid provisioning machines / volumes etc that could cost the user01:18
menn0wallyworld: yeah I understand01:18
menn0wallyworld: what actually provisions the storage?01:18
menn0wallyworld: maybe we can block it there01:19
wallyworldaxw: sure, so why isn't the line number where the charm doc is then?01:19
wallyworldinside URL()01:19
axwwallyworld: show me the panic?01:19
wallyworldmenn0: there's a storage provisioner01:19
wallyworldsimilar to machine provisioner01:20
wallyworldaxw: http://data.vapour.ws/juju-ci/products/version-2882/aws-upgrade-trusty-amd64/build-2233/machine-0.log.gz01:20
axwwallyworld: I feel like I'm missing something, that panic points to the MigrateCharmStorage function01:21
wallyworldaxw: yeah, it's in 1.2201:21
axwand not the state code01:22
wallyworldi had to move it01:22
axwI see01:22
wallyworldbecause we needed to use the raw collection01:22
axwwallyworld: not entirely sure, possibly inlining?01:23
wallyworldyeah could be, weird though01:23
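A minimal sketch of the situation being discussed, assuming hypothetical charmDoc and Charm types (not the juju state code): the receiver "ch" is non-nil and usable, yet an accessor that dereferences a pointer field inside the document still panics when that field was never populated. The safeURL helper is also hypothetical and only turns the panic into an error so the sketch runs to completion.

```go
package main

import "fmt"

// charmDoc is a hypothetical document type. URL stays nil when the
// stored data does not populate it.
type charmDoc struct {
	URL *string
}

// Charm is a hypothetical wrapper around the document.
type Charm struct {
	doc charmDoc
}

// URL returns a copy of the stored URL, so it dereferences doc.URL.
// It panics if doc.URL is nil, even though the receiver is valid.
func (c *Charm) URL() string {
	return *c.doc.URL
}

// safeURL converts the nil-pointer panic into an error.
func safeURL(c *Charm) (url string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("charm URL unavailable: %v", r)
		}
	}()
	return c.URL(), nil
}

func main() {
	ch := &Charm{} // non-nil charm whose document has no URL
	_, err := safeURL(ch)
	fmt.Println(err != nil)
}
```

Whether the reported line number points at the call site or inside the accessor depends on how the compiler attributes the frame; the chat only guesses at inlining, so treat that part as unconfirmed.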
wallyworldhere's the new code https://github.com/juju/juju/blob/1.22/state/upgrades.go#L96401:24
wallyworldi'll do some digging01:24
axwyeah I found it, thanks01:24
wallyworldmenn0: did you find it?01:24
menn0wallyworld: yep I found the storage provisioner01:25
menn0wallyworld: it'll be a bit of work to add watching of env life in there01:25
wallyworldmenn0: it calls into state methods - may be able to modify one of those01:26
menn0wallyworld: I'll go for adding the assertion only for persistent storage01:26
wallyworldaxw: ^^^ so if there's a EBS volume involved, that is bound to the machine, the above approach will be ok i think?01:27
wallyworldmaybe it should assert the storage binding instead01:28
wallyworldor i mean do the assert if binding = env01:29
wallyworldbut wait, this is 1.2401:29
wallyworldso will be different01:29
axwyes I think that'll work. machines will prevent env death, so machine-bound storage will be fine01:29
wallyworldso do that for 1.2501:30
axwmenn0: why would you add env life watching?01:30
menn0axw: it's automatically added everywhere by the multi-env txn layer01:30
menn0axw: but that's created a massive perf bottleneck01:30
menn0so that's being ripped out01:30
menn0in favour of selectively adding it in a few key places01:30
menn0storage is one of those places01:31
axwmenn0: understood, but why does that mean adding a watcher?01:31
menn0b/c we don't want someone to be able to add storage to an env just as it's dying01:31
axwmenn0: the way things work atm with storage, we use cleanups to trigger death of storage when the bound-to entity dies01:32
axwmenn0: so you destroy a machine with storage, then a cleanup is queued that destroys the attached storage01:32
menn0axw: but if storage is added as the env is dying and that txn takes a while to run the cleanup could miss it01:33
wallyworldaxw: that cleanup is only in master though from memory01:33
menn0axw: but I guess the machine or unit will be dead so the txn will probably still fail01:33
axwmenn0: the env can't die while there's still machines right?01:34
axwhrm01:34
* axw ponders01:34
wallyworldwe need a 1.24 solution too01:34
anastasiamacclear01:35
anastasiamacoops01:35
axwmenn0: if storage is added, then its life will be set to Dying by the cleanup regardless of whether it's been provisioned01:37
menn0axw: yes it can01:37
menn0axw: the first thing that happens is the env is set to Dying and then machines and everything else get killed off01:37
axwbut you're saying that the txn that adds the storage may happen after the cleanup...01:38
perrito666ah wonderful a test that only breaks when run non isolated....01:38
menn0axw: there's a slim chance that it could01:38
menn0axw: right now that's not possible because we have an automatically added env life assertion on almost all txns01:39
menn0axw: but that's going away01:40
menn0axw: seems like adding an extra check in the storage provisioner before it does anything might be sensible?01:40
axwmenn0: sorry I mistyped before: the env can't be *removed* until there's no machines? i.e. it can go to Dying, but can't be Removed until the dependents  are gone?01:40
axwhrmph still doesn't really help01:41
menn0axw: yes that's right01:41
axwmenn0: we're going to have this problem with the machine provisioner too right?01:41
menn0no because the machine addition ops now include an explicit env life assertion01:42
menn0(but only for top level machines, not containers)01:42
axwmenn0: so why can't we do that in storage? they're no more plentiful than machine addition ops01:43
menn0we don't want that for units though because units are often added in huge bulk (this is where users are seeing the current bottleneck)01:43
menn0axw: b/c storage ops get added as part of unit addition01:43
axwmenn0: I think we could do it for machine storage (volumes, filesystems), but not storage instances01:43
menn0my unfamiliarity with storage is probably not helping here :)01:43
axwso do machines, except if you're using --to01:43
menn0axw: b/c I'm slow can you please summarise :)01:44
axwmenn0: if a charm requires storage, then adding a unit will add a "storage instance". that will cause the creation of either a volume or filesystem when the unit is assigned to a machine01:45
axwmenn0: a volume can be e.g. a loop device, or an EBS volume01:46
menn0ok01:46
axwmenn0: actually we never create storage without an accompanying machine, so if the machine is prevented due to env being Dying, then we're fine01:47
axwmenn0: the storage provisioner won't create a volume or filesystem until the due-to-be-attached machine is provisioned01:47
menn0axw: ok that sounds promising then01:47
menn0axw: I think you were hinting at this before, but what about when a unit is added to an already provisioned machine01:48
axwmenn0: so I think we can drop the env life checks in storage01:48
axwah yeah01:48
axw:|01:49
menn0axw: I guess the machine will be dying or about to die if the env is going down01:49
menn0axw: and that should clean up the storage?01:49
axwmenn0: it will... but only if the storage is bound to the machine. there's a concept of lifecycle binding, where storage is bound to either a unit/service, a machine, or the environment01:50
menn0axw: also, won't the storage provisioner itself die if the env goes to dying01:50
axwmenn0: currently we're fine because we always bind to either the unit, service or machine01:50
axwmenn0: there was an intention of binding storage to env initially if marked persistent though01:51
axwmenn0: I hope the worker would continue to run until the env is removed, not just Dying01:52
* menn0 checks 01:52
axwmenn0: otherwise the provisioner won't clean up any remaining things01:53
menn0axw: so it looks we're ok because there isn't a storage provisioner per env01:55
menn0axw: it's not run under the envWorkerManager01:55
axwmenn0: ok, cool01:56
menn0axw: the worker is up until the machine agent dies01:56
menn0the storage provisioner worker I mean01:56
* axw nods01:57
menn0axw: ok so it looks like we don't need env life assertions for the state stuff in storage then01:57
axwmenn0: so... I think we're ok unless/until we allow storage to be created that is bound to an env01:57
axwmenn0: currently not the case, so we're fine atm01:58
menn0axw: we can do the assert only for the case where storage is bound to the env01:58
axwyep, that should be fine01:58
menn0axw: which will be a fairly low frequency event I imagine so not a performance issue01:58
axwyes I think so01:59
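The conclusion reached above (assert that the environment is alive only when storage is bound to the environment) can be sketched roughly as follows. Everything here is an assumption made for illustration: the op type is a local stand-in for a transaction operation, and sketchStorageOps, the binding constants, and the collection names are hypothetical, not juju's state code.

```go
package main

import "fmt"

// op is a hypothetical, simplified transaction operation.
type op struct {
	Collection string
	ID         string
	Assert     string
}

// binding describes which entity's lifecycle the storage follows.
type binding int

const (
	boundToUnit binding = iota
	boundToMachine
	boundToEnv
)

// sketchStorageOps builds the ops for adding a storage instance. The
// env-alive assertion is only added for env-bound storage, so bulk unit
// additions with unit- or machine-bound storage do not all contend on
// the environment document.
func sketchStorageOps(envUUID, storageID string, b binding) []op {
	ops := []op{{
		Collection: "storageinstances",
		ID:         storageID,
		Assert:     "document does not exist",
	}}
	if b == boundToEnv {
		ops = append(ops, op{
			Collection: "environments",
			ID:         envUUID,
			Assert:     "life == alive",
		})
	}
	return ops
}

func main() {
	fmt.Println(len(sketchStorageOps("env-1", "data/0", boundToMachine)))
	fmt.Println(len(sketchStorageOps("env-1", "data/1", boundToEnv)))
}
```

The design choice follows the chat: machine-bound storage is already protected because machine addition carries its own env life assertion, so only the rarer env-bound case pays for the extra assert.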
menn0axw: thanks for your help01:59
axwmenn0: nps, thank you for fixing. sounds messy :)02:00
menn0axw: it is02:00
wallyworldaxw: found out why charm url is nil - serialisation changed between 1.20 and 1.22. which also means charm migration is broken in general and we didn't notice because migration function was never called02:00
axwwallyworld: :(02:01
wallyworldfixing now :-)02:01
thumpermenn0: based on axw's points above, we should at least get together to talk about environment destruction02:12
thumpermenn0, axw: because I feel that we have some bad interactions02:12
thumperand I'd like to check02:12
menn0thumper: sure.02:47
menn0thumper: now?02:47
thumpernot just now, Rachel is arriving home shortly and I'll be stopping for coffee02:47
thumperbut perhaps in 30-40 minutes?02:47
menn0thumper: sure just let me know02:48
menn0thumper: with axw too?02:48
thumperaxw: have you got some time?03:16
thumperwaigani: how goes environment destroy?03:36
waiganithumper: merging cli command to jes-cli branch now.03:37
waiganithumper: and writing Will an email to review environ.Destroy branch03:37
thumperkk03:37
thumpercoolio03:37
waiganithumper: Will usually starts around 8, so I'll check in with him this evening and hopefully finish off / land tonight.03:39
thumpercool03:40
thumperwallyworld: any idea if master is capable of being blessed at the moment? or are there known failures?03:40
wallyworldthumper: not sure, i'd have to look at build logs03:41
wallyworldi don't know of any failures03:41
thumperthere was a windows issue at some stage03:41
thumperhas that all been fixed now?03:42
waiganithere's an open critical bug on 1.2503:42
waigani#146881503:42
mupBug #1468815: Upgrade fails moving syslog config files "invalid argument" <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Fix Released by ericsnowcurrently> <https://launchpad.net/bugs/1468815>03:42
* thumper sighs03:46
thumperwhy has it not been forward ported?03:46
menn0thumper, wallyworld: I have a likely fix to bug 1474195 ready... although I need to talk env destruction with thumper04:03
mupBug #1474195: juju 1.24 memory leakage <cpec> <deployer> <performance> <regression> <juju-core:Triaged> <juju-core 1.24:In Progress by menno.smits> <https://launchpad.net/bugs/1474195>04:04
wallyworldgreat04:04
thumperI'm waiting for axw before we talk destruction04:04
thumpermenn0: I can look at the fix if you like04:04
menn0thumper: pushing now04:05
menn0thumper: https://github.com/juju/juju/pull/280104:11
* thumper looks04:11
thumpermenn0: for the machine insertion04:16
thumpermenn0: does that method also do the containers04:16
thumper?04:16
thumperor is there a different one to add containers04:16
thumperas I thought we were going to skip the alive assertion for containers04:17
menn0a different one does containers04:17
thumperkk04:17
menn0see the docstring at the top of the method I added the assert to04:17
menn0wallyworld, thumper: any tips of debugging an lxc container that is stuck in "pending"?04:18
menn0I can't ssh to it04:18
menn0and lxc-console gives me nothing04:18
wallyworldmenn0: the logs are available locally04:18
thumpermenn0: look here: /var/lib/juju/containers/...04:18
wallyworld/var/lib/lxc/blah/root04:18
wallyworldthen look at cloud init logs04:19
thumperand also where wallyworld said04:19
menn0wallyworld: thanks, i'll look there04:19
thumperthe cloud init logs are in the /var/lib/juju/containers dir04:19
thumpermenn0: shipit04:19
menn0thumper: what about your concerns?04:20
thumperthis branch doesn't touch the concerns I have04:20
menn0ok great04:20
thumperany bad thing we are doing, we are already doing04:20
thumperwhich is why I think we need to talk to axw about environment destruction of hosted environments04:21
thumperbecause we are going "bullet to the head" on all the machines, then removing all the docs04:21
thumperwhat impact is this going to have for any attached storage04:21
menn0thumper: I want to do some manual performance comparisons and if it looks like things are faster then I'll merge04:21
menn0thumper, wallyworld: this appears to be why that container didn't start: http://paste.ubuntu.com/11886010/04:24
menn0any clues?04:24
thumperI'm guessing this line: WARN     lxc_start - start.c:signal_handler:307 - invalid pid for SIGCHLD04:24
thumperNFI why though04:24
* menn0 is googling04:25
wallyworldmenn0: yeah, NFI sorry04:29
menn0this looks like the bug (a race) but it was fixed in lxc 1.0.0-alpha2: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/116852604:30
mupBug #1168526: race condition causing lxc to not detect container init process exit <bot-stop-nagging> <linux (Ubuntu):Confirmed> <lxc (Ubuntu):Fix Released> <https://launchpad.net/bugs/1168526>04:30
thumpermenn0: what version of lxc do you have?04:37
menn0thumper: 1.1.2 (stock vivid)04:38
menn0(as far as I know)04:38
thumperso not fixed released then...04:38
menn0thumper: ?04:38
thumpermenn0: try #lxcontainers04:38
menn0thumper: I will04:39
thumpermenn0: because it is happening to you...04:39
menn0out of 10 containers 1 failed04:39
menn0but this happened earlier today and yesterday as well04:39
thumperyeah, but we create a lot of containers04:39
thumperwallyworld: master curse seems to be : bad record MAC, mongo not coming up, and intermittent failure collecting metrics in the uniter suite04:48
wallyworldsigh04:48
wallyworldthose would all be intermittent right04:49
thumperyup04:49
wallyworldi'll look at the logs when i can04:50
* thumper heading off until meeting later tonight05:16
wallyworldaxw: could you look at http://reviews.vapour.ws/r/2181/ when you get a chance? it looks larger than it is because i reverted the move done previously06:43
axwwallyworld: ok06:51
wallyworldty06:52
axwwallyworld: LGTM06:55
wallyworldty06:56
mupBug #1475163 opened: when the uniter fails to run an operation due to an error, the agent state is not set to "failed" <juju-core:Triaged by wallyworld> <juju-core 1.24:In Progress by wallyworld> <https://launchpad.net/bugs/1475163>07:13
wallyworldaxw: and one more sorry http://reviews.vapour.ws/r/2184/07:22
axwwallyworld: reviewed07:42
axwwallyworld: machine provisioning and hook errors are a bit different: they're coming from the IaaS provider and the hook execution respectively. Maybe I misunderstood, but it sounded like these errors might include, say, errors talking to the API server08:02
wallyworldaxw: yeah, could be those. i think your idea not to include is good08:03
wallyworldfixing patching will be more work, but i have soccer now so will do later08:03
axwwallyworld: ok. I have to go out soon anyway, so will check later08:04
jamfwereade: dimitern: standup ?09:03
mupBug #1475212 opened: Environment destroy can miss manual machines and persistent volumes <juju-core:New> <https://launchpad.net/bugs/1475212>09:11
jamfwereade: so I'm supposed to be in a call now, but he's not arrived yet. So on the concept of Token being reusable...10:01
dimiterndooferlad, TheMue, fwereade, jam, sorry guys for missing standup - I had to renew my car insurance in the morning, but it took more time than expected :/10:01
jamdimitern: no worries10:02
dooferladdimitern: jam just said my standard response, so ^^10:02
fwereadejam, listening10:02
dimiternI discovered yesterday, after wasting almost a full day, that running go test with both -race and -cover (or -coverprofile=) *itself* leads to races!10:03
TheMuedooferlad: hehe, maybe the number of calls will get negative when passing a black hole. but you're right, a bool flag would be enough10:04
dooferladdimitern: well, that sucks10:04
dimiternsupposedly fixed in go 1.3+, can be worked around by adding also -covermode=atomic (which is the default behavior in 1.3+)10:04
dooferladTheMue: I was more thinking about uint10:04
dooferladTheMue: but yes, types matter and sometimes we live with inappropriate choices10:05
perrito666Morning10:05
dimiterni'll send this to juju-dev as well, just in case I can save somebody else from the same experience10:05
jamfwereade: he showed, sorry. I did want to overview of how I felt tokens should work.10:06
fwereadejam, just braindump whenever you get the chance :)10:07
dimiternTheMue, hey10:37
TheMuedimitern: heya10:38
dimiternTheMue, didn't we discuss using bulk client-side api calls for the addresser?10:38
dimiternTheMue, like RemoveIPAddresses taking params.Entities and returning params.ErrorResults, error, rather than forcing the worker to remove them one by one?10:38
TheMuedimitern: have to take look in my notes10:39
TheMuedimitern: we talked about where the "work" has to be done when I suggested that e'thing could be done via one call on server-side10:40
dimiternTheMue, I don't insist on doing it now (just the addresser using api instead of state is already a big improvement, esp. around the entity watcher), but it seems to me it will be slightly better10:41
mupBug #1455628 changed: TestPingTimeout fails <ci> <intermittent-failure> <lxc> <test-failure> <unit-tests> <vivid> <juju-core:Triaged> <https://launchpad.net/bugs/1455628>10:41
mupBug #1456726 opened: UniterCollectMetrics fails <ci> <tech-debt> <juju-core:Triaged> <juju-core 1.22:Triaged> <https://launchpad.net/bugs/1456726>10:41
TheMuedimitern: when I asked why we need an API usable only for the worker and providing its calls10:41
dimiternTheMue, that's what *all* our apis are doing anyway :)10:42
anastasiamacdimitern: tyvm for being adventurous and running tests with 2 flags not one :D10:42
dimiternTheMue, however I see your point - we should (re)use better defined api interfaces across multiple workers/etc.10:43
TheMuedimitern: and I oriented at your instancepoller, which is acting on one machine each too10:43
dimiternanastasiamac, I'm even using -check.v :D10:43
anastasiamacdimitern: \o/10:43
TheMuedimitern: that's why I implemented the IPAddress(Proxy) as type10:43
TheMuedimitern: but n.p., I simply can change it, one tiny missed gofmt dislikes my try to merge, hehe10:44
dimiternTheMue, yes, as it was easiest to do - gradual improvement, over using state directly, but from design perspective we can do better for such workers that makes more sense to batch multiple ops in a single api call10:45
TheMueand I thought I ran my pre-commit check *grmfplx*10:45
dooferladTheMue: your git client doesn't auto-run the pre-commit hook?10:46
dimiternTheMue, so I suggest you go ahead and still land this (if you can perhaps add a TODO somewhere in the code we can improve the behavior by using bulk calls)10:46
TheMuedooferlad: different environment here, as you know. script integration didn't work, so I integrated it into my jdt (juju development tool)10:47
TheMuedimitern: ok, will do so10:47
dimiternTheMue, cheers10:48
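The bulk-call shape suggested above can be sketched like this. The Entities and ErrorResults types below are local stand-ins written to resemble the params types named in the chat; the remove callback and fakeRemove are hypothetical. The point of the sketch is only the contract: one call, one result slot per entity, and a per-entity failure does not abort the rest of the batch.

```go
package main

import "fmt"

// Local stand-ins for the bulk argument and result types.
type Entity struct{ Tag string }
type Entities struct{ Entities []Entity }
type Error struct{ Message string }
type ErrorResult struct{ Error *Error }
type ErrorResults struct{ Results []ErrorResult }

// RemoveIPAddresses removes every address named in args in one call.
// Results are positional: Results[i] corresponds to args.Entities[i],
// with a nil Error meaning success.
func RemoveIPAddresses(args Entities, remove func(tag string) error) (ErrorResults, error) {
	results := ErrorResults{Results: make([]ErrorResult, len(args.Entities))}
	for i, e := range args.Entities {
		if err := remove(e.Tag); err != nil {
			results.Results[i].Error = &Error{Message: err.Error()}
		}
	}
	return results, nil
}

// fakeRemove is a hypothetical backend that fails for one known tag.
func fakeRemove(tag string) error {
	if tag == "ipaddress-missing" {
		return fmt.Errorf("%s not found", tag)
	}
	return nil
}

func main() {
	args := Entities{Entities: []Entity{{Tag: "ipaddress-a"}, {Tag: "ipaddress-missing"}}}
	results, err := RemoveIPAddresses(args, fakeRemove)
	fmt.Println(err == nil, results.Results[0].Error == nil, results.Results[1].Error != nil)
}
```

The outer error is reserved for failures of the call as a whole, which lets the worker send one request per batch and still see which individual addresses failed.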
dooferladTheMue: Clearly you need to switch clients :p10:48
TheMuedooferlad: it's not the client, it's more complex. will show you when having our next meeting.10:49
dooferladdimitern: is the logic behind addSubnetsCache just to speed things up? Isn't state fast enough and the canonical source of information?10:51
dimiterndooferlad, the main reason for its existence is to improve the case when multiple subnets are added in the same API call10:53
dimiterndooferlad, so I guess it might be actually moot if we don't allow users to add multiple subnets with the CLI (unless we add an "import these subnets definitions as a batch" thing, which was discussed at some point)10:55
dooferladdimitern: if we have the ability at some point to dump the output of juju status to a file, then load that back, then yes we will benefit.10:56
=== psivaa is now known as psivaa-afk
dimiterndooferlad, ewww.. yeah, I got your point :) but we'll have state deltas before that happens most likely10:57
dooferladdimitern: I mostly don't like caches because if somebody does something unexpected to what they are caching you can have "fun" finding bugs. In this case though, I was looking at it in terms of what I needed to do for space create.10:59
dimitern(just imagined having to parse a moving target like the status yaml output)10:59
dooferladdimitern: which seems to be, not caching.10:59
dimiterndooferlad, for space create I don't think you need to do it the same way11:00
dooferladdimitern: +111:00
dimiterndooferlad, I've realized addSubnetsCache now looks totally over-engineered to me :/11:00
dooferladdimitern: well, I am sure it was fun engineering, so I am not worrying!11:01
dimiterndooferlad, you bet :)11:02
alexisbfwereade, jam leads call11:05
mattywTheMue, not tried lfe yet, but it's on my list of things to try11:20
TheMuemattyw: it has a nice approach for lisplers, but it never will get a larger community *sigh*11:21
TheMuedooferlad: btw, just found why my pre-commit failed. only one missing line11:22
mattywTheMue, I'd love to have sessions at sprints where we can just hack on stuff11:22
mattywTheMue, maybe we should make the time this sprint11:22
TheMuemattyw: definitely would raise the experience with different approaches, avoiding to get routine-blinded11:24
perrito666morning all12:14
thumpercrap, perrito666 is back, time to go12:14
mupBug #1475271 opened: Intermittent test failure UniterSuite.TestUniterCollectMetrics <intermittent-failure> <test-failure> <juju-core:Triaged by cmars> <https://launchpad.net/bugs/1475271>12:17
perrito666I see thumper does the same I do to figure EOD12:20
perrito666has anyone noticed we are getting curses for no space left on device? mgz sinzui ?12:28
mgzyeah, I see the vivid build failing12:29
mgzwe have tests running in the current still though, so I was not in a rush to retest12:35
jamfwereade: ok. pie in the sky how tokens work feels like you would get the token at Auth checks, and then apply that token to each process you do. I feel like token failures are the sort of thing that wouldn't need to be retried if we knew they were the cause of the failure.12:57
fwereadejam, right12:58
jamFor example, if I was leader, and I said X, then I failed to be the leader for a while, then I was leader *again*, my original X should actually be invalid.12:58
fwereadejam, we could implement it like that but I'm not sure I think it's good12:58
fwereadejam, tokens have to be reusable anyway12:59
fwereadejam, other ops will cause ErrAborted12:59
fwereadejam, next time through the buildTxn func we need to check again12:59
jamfwereade: everything causes ErrAborted right? So we can't distinguish the why12:59
fwereadejam, but we have to distinguish why12:59
fwereadejam, hence the form of Runner.Run()12:59
fwereadejam, refusing to check again once a token's failed might be an interesting optimisation13:00
fwereadejam, but not relevant for my purposes because I'll be returning the error as soon as I get one13:01
jamfwereade: so I don't quite see how Token.Read() isn't reusable.13:01
sinzuiperrito666: I just woke up and yes I am disappointed. The machine only had to live for 4 more days13:01
fwereadejam, it's a single snapshot of past state13:01
fwereadejam, unless it's able to get fresh state and return an error, it will push everything into ErrExcessiveContention13:02
fwereadejam, by returning the same (failing) txn ops13:02
fwereadejam, always corresponding to the reality-check that's now several cycles in the past13:02
jamfwereade: so is the use case that my leadership cert expired and I renewed it?13:02
fwereadejam, it is to catch the situation when the leadership lease expires and is removed while some other component is running a txn that depends on it13:03
fwereadejam, that other component (should!) have the looping form, in which it starts off using recent state from db or memory, interrogates that state for reasons to fail, then packages it up as asserts and sends it on to execute13:05
fwereadejam, the txn fails13:05
fwereadejam, what went wrong?13:05
fwereadejam, we need to read current leadership state to be able to pin it on that13:05
fwereadejam, sane?13:05
jamfwereade: so I agree that we want to be able to read the current state at some point, but I worry that we'll read the current state and apply it as the new "its ok to do this as long as this holds true"13:06
jamfwereade: so you want *a* token that says "the person who is making this request is the current leader"13:06
fwereadejam, no13:07
fwereadejam, I want a token that will, on request, tell me whether a unit is leader13:07
fwereadejam, existence of a token implies nothing13:07
fwereadejam, Check()ing a token implies that the fact the token is attesting to was recently true13:08
fwereadejam, passing an out ptr into check gives you a very specific tool that allows you to check whether it still holds true in the future13:08
jamfwereade: so your Token interface only has Read()13:08
fwereadejam, sorry, I renamed it Check13:08
fwereadejam, otherwise the same13:08
fwereadejam, and those still-hold-in-the-future things are critically important; but yes, I don't know how best to encourage people to use mgo/txn correctly :(13:10
jamfwereade: so I think you're saying that Auth wants to return a Checker (and possibly calls it one time), but that the Checker is part of the inner loop13:10
fwereadejam, yeah13:10
fwereadejam, the initial call is technically redundant, am undecided, leaning towards not having it13:11
fwereadejam, most/all the actual use of the Token will be inside state13:11
jamfwereade: from an Auth func it is nice to fail early13:11
jamSetStatus failing immediately with "you're not the leader" rather than waiting until it goes to update the DB with an actual change?13:12
fwereadejam, agreed, there are forces pushing both ways :)13:12
fwereadejam, it won't try to run a txn...13:12
fwereadejam, I contend that constructing a txn is much cheaper than running one13:12
jamfwereade: I certainly agree that stuff in memory vs once you've written it to the DB13:13
fwereadejam, so what it will do is one up-to-date leadership check, and then hand over the ops representing it13:13
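The looping form described above can be sketched as follows. This is an illustration under stated assumptions, not the juju or mgo/txn API: Token, leaderToken, op, run, and the two error values are all hypothetical local definitions. Each attempt re-checks the token against current state, so an aborted transaction caused by lost leadership is reported as such instead of being retried until the loop gives up.

```go
package main

import (
	"errors"
	"fmt"
)

// op is a hypothetical, simplified transaction operation.
type op struct{ Assert string }

// Token attests to a fact. Check reports whether the fact holds right
// now and, when out is non-nil, fills it with ops asserting that the
// fact still holds when the transaction is applied.
type Token interface {
	Check(out *[]op) error
}

var (
	errAborted             = errors.New("transaction aborted")
	errExcessiveContention = errors.New("state changing too quickly")
)

// leaderToken is a hypothetical token for "unit is the current leader".
type leaderToken struct {
	unit          string
	currentLeader func() string // reads fresh state on every call
}

func (t leaderToken) Check(out *[]op) error {
	if t.currentLeader() != t.unit {
		return fmt.Errorf("%s is not leader", t.unit)
	}
	if out != nil {
		*out = []op{{Assert: "lease holder == " + t.unit}}
	}
	return nil
}

// run builds the ops afresh on each attempt and only retries on abort.
func run(token Token, apply func([]op) error) error {
	for attempt := 0; attempt < 3; attempt++ {
		var ops []op
		if err := token.Check(&ops); err != nil {
			return err // the token is the reason; no point retrying
		}
		if err := apply(ops); err != errAborted {
			return err // nil on success, or an unrelated failure
		}
	}
	return errExcessiveContention
}

func main() {
	leader := "mysql/0"
	tok := leaderToken{unit: "mysql/0", currentLeader: func() string { return leader }}
	err := run(tok, func([]op) error {
		leader = "mysql/1" // leadership changes while the txn runs
		return errAborted
	})
	fmt.Println(err)
}
```

A token that only held a single snapshot would keep returning the same failing ops, which is the path to the excessive-contention error that the chat wants to avoid.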
jamit seems a little funny to have something like GetAuth not actually have checked your auth on the assumption that once you've actually processed the request you'll have finally checked they're allowed.13:13
fwereadejam, point taken, but I think it follows from the mgo/txn dependency13:15
fwereadejam, technically, any auth that isn't checked *at txn time* is leaky13:16
fwereadejam, when working in state we just have to ...embrace the madness, and use the techniques that are reliable in this context :)13:17
fwereadeperrito666, http://reviews.vapour.ws/r/2185/ ?13:23
fwereadeperrito666, and whatever the other branch is13:23
fwereadeperrito666, does statusDoc have txn-revno or txn-queue fields?13:23
perrito666fwereade: aren't those added by txn?13:25
fwereadeperrito666, yes13:25
fwereadeperrito666, unless you have those fields specified in your doc, [$set, doc] is fine13:26
perrito666fwereade: sorry I got distracted by watching a singer called ladybeard... oddly hypnotizing13:26
fwereadeheh13:26
fwereadegood name :)13:26
perrito666bearded man in japanese 5yo girl costume singing metal version of jpop songs, amazing13:27
perrito666fwereade: this attacks the immediate issue with envuuid for this particular collection while a better fix is being worked for envuuid auto adding on Updates13:28
fwereadeperrito666, what makes you believe it changes anything?13:29
fwereadeperrito666, you have inserted a comment that is a straight-up lie13:29
perrito666oh?13:29
fwereadeperrito666, https://bugs.launchpad.net/juju-core/+bug/1474606/comments/113:30
mupBug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>13:30
perrito666it is a partial lie, if I insert that doc as is it wipes envuuid13:30
fwereadeperrito666, ok, so you're saving a doc with an empty env-uuid field13:31
fwereadeperrito666, why do you not know the env-uuid?13:32
fwereadeperrito666, ohhh, right13:32
perrito666fwereade: I might need to change the var name so the comment is not confusing13:32
perrito666do not insert That doc13:32
perrito666:)13:32
fwereadeperrito666, this just makes me more adamant that it's the leavy multiEnv stuff that is the problem13:32
fwereades/leavy/leaky13:32
fwereadeperrito666, ok, so13:33
fwereadeperrito666, that comment is certainly not accurate re txn13:33
fwereadeperrito666, and re env-uuid13:35
fwereadeperrito666, can we not just drop the dependency on the env-uuid field and take them off all the doc structs?13:35
perrito666fwereade: I honestly do not know, I wouldn't think so13:38
fwereadeperrito666, well, we definitely can13:38
fwereadeperrito666, it's more "should we"?13:39
fwereadeperrito666, and the more I think the more I think "yes of course we should, it would take a day at the outside"13:40
fwereadeperrito666, counterpoint?13:40
fwereadeperrito666, which might mean 3 days in practice13:41
fwereadeperrito666, but how much dev time have these sorts of issues cost us already?13:41
* perrito666 sits like a rubber ducl13:44
perrito666duck13:44
fwereadeperrito666, haha13:45
fwereadeperrito666, so looking through state for EnvUUID it really doesn't seem like it's even used most of the time13:45
fwereadeperrito666, it exists only for the convenience of the multi-env layer13:46
fwereadeperrito666, but it also breaks the multi-env layer because you have to pay attention to that field all the time13:46
fwereadeperrito666, so13:47
fwereadeperrito666, if the multi-env layer just converted *everything* into bson.D *before* rewriting13:48
fwereadeperrito666, no more need for the fields13:48
fwereadeperrito666, right?13:48
fwereadeperrito666, there may be a couple of relevant fields we should keep13:49
fwereadeperrito666, but they're very much the minority13:49
fwereadeperrito666, quack. quack quack?13:49
fwereadeperrito666, and then we'd be able to insert docs that weren't pointers13:51
fwereadeperrito666, and we wouldn't have that scary surprising leakage out to the original docs either13:51
* perrito666 re-reads13:52
fwereadeperrito666, (and my lease stuff would Just Work without having to know it's in a multi-env collection, too)13:52
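
A rough sketch of the idea floated above, with loud assumptions: it uses encoding/json and a plain map as a stand-in for bson.D (which is ordered and which a map does not reproduce), and statusDoc and withEnvUUID are hypothetical names, not the multi-env layer's API. It shows a doc being converted to a generic form first, with env-uuid added only to the converted copy.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusDoc is a hypothetical document type with no env-uuid field.
type statusDoc struct {
	Status string `json:"status"`
	Info   string `json:"statusinfo"`
}

// withEnvUUID converts any document into a generic map and adds the
// env-uuid key there, leaving the caller's value untouched.
func withEnvUUID(doc interface{}, envUUID string) (map[string]interface{}, error) {
	raw, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	out := make(map[string]interface{})
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	out["env-uuid"] = envUUID
	return out, nil
}

func main() {
	doc := statusDoc{Status: "started"} // passed as a value, not a pointer
	out, err := withEnvUUID(doc, "uuid-1234")
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(out["env-uuid"], out["status"])
	fmt.Printf("%+v\n", doc) // the original doc is unchanged
}
```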
perrito666fwereade: ok a couple of things13:52
perrito6661st are you sure no one is working in anything whatsoever heavily dependent on this?13:52
perrito6662nd, even though I believe in the empirical proof you showed me, on the original discussion about txn a link arose http://stackoverflow.com/questions/24455478/simulating-an-upsert-with-mgo-txn/24458293#24458293 which has gustavo saying it shouldn't13:54
fwereadeperrito666, that's my reading of it; I see 21 uses of .EnvUUID in state, and most of them are irrelevant13:54
perrito666I was rather wondering about work in process13:55
fwereadeperrito666, in that link, where does gustavo suggest you shouldn't $set a struct?13:58
wwitzel3axw: I'll handle the forward porting of that issue13:59
wwitzel3axw: well, the patch to master that is14:00
perrito666fwereade: the final paragraph seems to be implying it14:00
fwereadeperrito666, (1) "you can set every field in a value by offering the value itself to $set"14:02
fwereadeperrito666, (2) "If you replace the whole document with some custom content, these fields will go away"14:03
fwereadeperrito666, they are talking about different situations14:03
perrito666fwereade: I see14:03
perrito666that might have caused the misunderstanding14:03
fwereadeperrito666, yeah, it could be clearer14:05
fwereadeperrito666, in particular it *is* dangerous to do a $set with any of our doc types that include a txn-revno14:06
fwereadeperrito666, so we do need to keep an eye out for that14:06
fwereadeperrito666, but that's more a matter of watching the doc definitions, and only allowing TxnRevno when it's *really* necessary, and commenting it clearly14:07
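
The two situations distinguished above can be sketched with plain maps. This is a toy model only, assuming nothing about mgo internals: set and replace are hypothetical helpers that mimic a $set update and a whole-document replacement.

```go
package main

import "fmt"

type doc map[string]interface{}

// set mimics {$set: fields}: named fields are overwritten, all others kept.
func set(stored, fields doc) doc {
	out := doc{}
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range fields {
		out[k] = v
	}
	return out
}

// replace mimics swapping in a whole new document: anything not present in
// the replacement, including bookkeeping fields, is lost.
func replace(stored, replacement doc) doc {
	out := doc{}
	for k, v := range replacement {
		out[k] = v
	}
	return out
}

func main() {
	stored := doc{"status": "pending", "txn-revno": 3, "txn-queue": []string{"t1"}}
	update := doc{"status": "started"}

	fmt.Println(set(stored, update))     // txn-revno and txn-queue survive
	fmt.Println(replace(stored, update)) // only status remains

	// A doc type that itself carries txn-revno makes $set dangerous: the
	// stale value is written over the live bookkeeping field.
	stale := doc{"status": "started", "txn-revno": 0}
	fmt.Println(set(stored, stale)["txn-revno"])
}
```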
=== natefinch is now known as natefinch-afk
fwereadekatco, do you have any time to review http://reviews.vapour.ws/r/2186/ ?14:34
katcofwereade: today is my meeting day :(14:34
fwereadekatco, ah bother, not to worry14:35
jamfwereade: I've been reading through https://pubsubhubbub.googlecode.com/git/pubsubhubbub-core-0.4.html and it doesn't feel like a great fit, as when you subscribe to a topic you pass an HTTP callback URL. We could do that internally but it does feel a bit odd. Certainly I don't really expect to have general routing back to a client outside of the current connection.14:41
katcojam: get in touch with https://github.com/go-kit/kit. they are actively soliciting feedback on features like this14:43
fwereadejam, agreed14:44
katcojam: doh, nm: in the "Non-goals" Supporting messaging patterns other than RPC (in the initial release) — pub/sub, CQRS, etc14:44
jam:)14:44
jamfwiw, I rather like https://github.com/grpc/grpc14:44
jambut it feels like we're rewriting our communication infrastructure a bit too much at that point.14:44
davecheneyjam i agree14:44
jamthere is https://godoc.org/google.golang.org/cloud/pubsub which is less about the HTTP aspects14:45
jamthough IIRC it is strictly a client for Google's cloud pub/sub and not a server implementation.14:45
fwereadejam, btw, 2172 has been superseded by reviews.vapour.ws/r/2186/ which has new-style Token14:50
fwereadejam, so, yeah, doesn't sound like very rich pickings14:51
fwereadeperrito666, LGTM14:55
mupBug #1475341 opened: juju set always includes value when warning that already set <juju-core:New> <https://launchpad.net/bugs/1475341>14:56
perrito666fwereade: it sounds like a more sincere comment :)14:57
* perrito666 is tempted of lunching a happy meal just to get a new minion toy14:57
fwereadeperrito666, can I hit you up for a review on reviews.vapour.ws/r/2186/ please?14:58
* perrito666 looks14:58
fwereadeperrito666, it's just a rework of the leadership interfaces such that my stuff and katco's has matching interfaces (well, at least they both implement Claimer)14:59
fwereadeperrito666, cheers14:59
* perrito666 sees the length of the review and realizes hit was quite literal :p14:59
alexisbdavecheney, what part of the world are you in right now?15:02
* perrito666 tries to acquire a second monitor of the same model as the one he has and notices the price is the exact double of what he paid less than a year ago :p inflationary countries are fun15:03
katcowwitzel3: 1:115:04
davecheneyalexisb: san fran15:04
davecheneydamnit, i missed the opportunity to say i was omnipresent15:05
alexisbheh15:06
* perrito666 looks over his shoulder just to make sure davecheney isnt15:06
davecheneyi'm watching, always watching15:06
fwereadeperrito666, it's almost all renames15:06
perrito666fwereade: ?15:06
perrito666ah the review15:07
fwereadeperrito666, the big review15:07
fwereadeperrito666, probably start with leadership/interface.go15:07
perrito666I would kill for threaded conversations on irc15:07
fwereadeperrito666, sorry, I should have said that in the blurb15:07
perrito666fwereade: for starters I would like the pr description to say more why than what15:14
perrito666by reading the code I can assert that you did exactly what that list of changes say, but I am not sure Ill be able to say what is the end result of it.15:15
fwereadeperrito666, heh, good point15:22
sinzuiperrito666: is bug 1474606 fix committed in 1.24?15:29
mupBug #1474606: Document replacements using $set are problematic <juju-core:Triaged by menno.smits> <juju-core 1.24:Triaged by menno.smits> <https://launchpad.net/bugs/1474606>15:29
perrito666sinzui: no, just a partial for 1.24 and master15:32
sinzuithank you perrito66615:32
perrito666sinzui: that is why I did not change anything on it15:32
TheMueso /me says goodbye, daughter has graduation ball today *proud-daddy-mode*16:16
dooferladTheMue: congratulations to you both!16:17
perrito666TheMue: congrats man :) have fun16:17
wwitzel3ericsnow: ping16:20
ericsnowwwitzel3: hey16:20
wwitzel3ericsnow: hey, is there anything you think you can break off of what you are doing or should I look in to destroy?16:20
ericsnowwwitzel3: halfway through this yak :/16:21
ericsnowwwitzel3: so maybe you had better16:21
ericsnowwwitzel3: it will depend on my state patch16:21
bdxhello everyone16:46
bdxcore: anyone familiar with this error showing up in the cloud-init-output.log on bootstrap node?16:47
bdxcore: 2015-07-16 16:43:47 ERROR juju.cmd supercommand.go:430 relative path in ExecStart ($MULTI_NODE/usr/lib/juju/bin/mongod) not valid16:47
bdxthen 2015-07-16 16:43:47 ERROR juju.cmd supercommand.go:430 failed to bootstrap environment: subprocess encountered error code 116:48
bdxand bootstrapping fails after16:48
bdxgrr16:48
perrito666ericsnow: ping?17:01
ericsnowperrito666: hi17:01
perrito666hi :D17:01
perrito666hey, are you still the reviewboardmonger?17:02
ericsnowperrito666: depends on what you need :)17:02
perrito666I was wondering if I could see the logs for rb, I find the javascript for comment/response textbox failing too often and the browser console says its an api call failing to respond17:03
perrito666also the js might need to be uncompressed17:03
perrito666some paths fail with a syntax error17:04
ericsnowperrito666: its the reviewboard service in the juju-ci4 env17:05
ericsnowperrito666: I can take a look but not quite yet17:05
perrito666no hurry just had the issue while we are in the same TZ so didnt want to let it pass17:05
mupBug #1475386 opened: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>17:12
=== natefinch-afk is now known as natefinch
rick_h_NOTICE: jujucharms.com is having a webui outage due to a failed redis. Charm deploys should work as normal and the API is available.17:14
mupBug #1475386 changed: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>17:21
natefinchbdx: I think the problem is that $MULTI_NODE is not getting expanded17:21
natefinchbdx: or not set or set weirdly17:22
natefinchbdx: kind of a terrible error message, sorry about that17:23
mupBug #1475386 opened: unit not dying after failed hook + destroy-service <juju-core:New> <https://launchpad.net/bugs/1475386>17:24
=== psivaa-afk is now known as psivaa
rick_h_NOTICE: jujucharms.com webui is back up17:37
davechen1yrick_h_: \o/17:48
natefinchsinzui: I'm trying to reproduce https://bugs.launchpad.net/juju-core/+bug/1471657   but when I try to get juju's code on stilson-07  I get this error:17:51
natefinchfatal: unable to access 'https://code.googlesource.com/google-api-go-client/': Received HTTP code 403 from proxy after CONNECT17:51
natefinchseems like it must be a proxy/firewall issue?17:51
mupBug #1471657: linker error in procsPersistenceSuite unit test on ppc64 <ci> <ppc64el> <test-failure> <unit-tests> <juju-core:Triaged> <juju-core feature-proc-mgmt:Triaged> <https://launchpad.net/bugs/1471657>17:51
sinzuinatefinch: those machines are on a private network. They cannot access google or aws, or hp or joyent. They can access canonistack. I think you need to move to another machine17:53
natefinchsinzui: I'll take whatever PPC machine is available, I just knew how to connect to those.  Is there a different PPC machine I can use that has connection to the public internet?17:54
sinzuinatefinch: those are the only ones, and they have special access . all others are more restricted17:54
davecheneynatefinch, yes, you'll have to raise an RT to get that firewall exception17:55
davecheneyor you could just scp in the code from your machine17:55
davecheneythat's what I do17:55
sinzuiyep, I do that all the time17:56
natefinchdavecheney: yeah, that was going to be my next thought - scp.  I just figured, since so much of the rest of it worked, the fact that one random url didn't work seemed like more of a bug than intentional17:56
natefinchdavecheney: or maybe none of it worked and that's just the first leaf package to try to download.  I didn't actually check17:57
davecheneyjust part of life behind the firewall17:57
davecheneythis is a new dep for google gae17:57
natefinchdavecheney:  I see17:57
natefinchdavecheney: are you still in the US?  I presume you're not awake back home at this time of night17:58
sinzuinatefinch: I had a day last week spent taring, scping, untarring, go testing :( this situation is also true for our one machine that can run maas17:58
davecheneysadness17:59
sinzuidavecheney: natefinch There is a plan to add ppc64el to canonistack. That might fix this situation18:00
katcowwitzel3: how's that doc coming?18:19
wwitzel3katco: good, I think we have a couple ideas18:20
katcowwitzel3: mind if i tal?18:20
wwitzel3katco: shared the doc with you, which is pasted irc logs in to, I haven't distilled anything yet, so haven't given any structure to the doc18:23
katcowwitzel3: hrm. worried that this might be too complicated for a demo18:26
wwitzel3katco: ok18:29
katcowwitzel3: to give you some kind of idea. wallyworld's storage demo was bringing up postgres with external storage and then showing the contents of the external storage (i think)18:29
katcowwitzel3: cool idea would be cool, but i don't want it to be anything so elaborate i mess it up and don't know enough about the charms to fix it18:30
katcoericsnow: did you create a bug to track the OVA images card?18:32
ericsnowkatco: #146838318:33
katcoericsnow: ty... and is there an email i can piggy-back off to email ben?18:33
ericsnowkatco: not really18:33
katcoericsnow: the remaining wpm cards are created?18:36
ericsnowkatco: not yet18:37
katcoericsnow: wwitzel3: we need to be ready to go over the demo and how to get there by tomorrow18:39
ericsnowkatco: k18:39
wwitzel3katco: ok, in that case, updated the doc18:48
katcowwitzel3: simple, love it :p18:48
katcowwitzel3: not that i'm not *very* interested in what whit et. al. are working on (i.e. real-world use-cases)18:49
katcowwitzel3: but for demo, just need proof that it works18:49
katcowwitzel3: it would be cool to have a 2nd demo in case i'm feeling ambitious, if they have something ready to go18:50
katconatefinch: 1:119:05
natefinchkatco: oops, sorry, coming19:08
katcoericsnow: can you take a look at requirements section here: https://docs.google.com/document/d/1etgWYADQHVSY_yT5rd-_DqPXBNUIWYBj-z8-Cpxc2-U/edit#heading=h.u3tics2c141k19:19
katcoericsnow: and update with what else needs to be done?19:19
ericsnowkatco: sure19:19
katcowwitzel3: also, do we need a mysql component there as well to prove they can talk to each other?19:20
katcowwitzel3: whoop nm looks like that's there isn't it19:21
cmarsnatefinch, can you please take a look at http://reviews.vapour.ws/r/2188/ ?19:54
cmarsnatefinch, it's passing on hyperv19:55
cmarsand linux of course ;)19:55
natefinchcmars: np20:03
cmarsnatefinch, ty20:03
natefinchcmars: gah... whoever wrote ReplaceFile did it backwards :/20:03
natefinchcmars: Go standard is foo(dest, src)20:04
natefinchto mimic a = b20:04
cmarsnatefinch, i noticed that20:04
natefinchcmars: well, there's no fixing it now, I guess.20:04
cmarsnatefinch, that'd be a heavy lift20:04
cmarsnatefinch, os.Rename is kind of the same way though, http://golang.org/pkg/os/#Rename20:05
natefinchcmars: huh, weird, yeah20:05
natefinchcmars: probably written before they settled on the other scheme.  Oh well. Better to be consistent.20:06
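
For reference, a small sketch of the destination-first convention mentioned above, using only standard library calls. destFirst is a hypothetical helper written for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// destFirst copies src twice, once with the builtin copy and once with
// io.Copy. Both take the destination as the first argument, like dst = src.
func destFirst(src string) (string, string, error) {
	dst := make([]byte, len(src))
	copy(dst, src) // copy(dst, src)

	var buf bytes.Buffer
	if _, err := io.Copy(&buf, strings.NewReader(src)); err != nil { // io.Copy(dst, src)
		return "", "", err
	}
	return string(dst), buf.String(), nil
}

func main() {
	a, b, err := destFirst("hello")
	fmt.Println(a, b, err)

	// os.Rename(oldpath, newpath) is the exception noted above: it takes
	// the source first.
}
```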
cmarsnatefinch, i should return proper os.LinkErrors.. i'll fix that20:10
natefinchcmars:  reviewed20:10
cmarsnatefinch, thanks!20:10
natefinchcmars: welcome.  Anything to avoid working on this ppc bug ;)20:11
mupBug #1475425 opened: There's no way to query the provider's instance type by constaint <juju-core:New> <https://launchpad.net/bugs/1475425>20:42
davecheneythumper: sorry i'm on another call20:53
perrito666wallyworld: you are a bit frozen21:06
* perrito666 hums let it go to wallyworld 21:07
mupBug #1475056 changed: worker/uniter/relation: HookQueueSuite.TestAliveHookQueue failure <juju-core:New> <https://launchpad.net/bugs/1475056>21:12
perrito666wallyworld: time to get a new modem?21:17
wallyworldperrito666: maybe, trying to join again now21:17
wallyworldperrito666: except now chrome hates me21:18
katcocherylj: still there?21:49
davecheneythumper: sorry i missed the standup21:55
davecheneywas on another call21:55
davecheneywrt the arm issue21:55
davecheneyis there a maas install that I can use to reproduce it21:56
davecheneywallyworld: you were trying to get access to the system ?21:56
davecheneydid you succeed ?21:56
wallyworlddavecheney: i didn't succeed, but maybe that's just me. there's access instructions in the bug21:57
thumperdavecheney: if you aren't able to get access through the instructions in the bug, try bugging the hyperscale team, Andrew Cloke or Sean21:58
alexisbdavecheney, Sean specifically said he would provide any access needed21:59
alexisbso we should hold them to that21:59
davecheneyare we talking about the same bug ?21:59
davecheneythere is nothing in the issue21:59
davecheneyhttps://bugs.launchpad.net/juju-core/+bug/141551721:59
mupBug #1415517: juju bootstrap on armhf/keystone hangs <armhf> <bootstrap> <hs-armhf> <juju-core:Confirmed> <https://launchpad.net/bugs/1415517>21:59
alexisbdavecheney, that is the one i am thinking of21:59
davecheneyare the instructions like hidden or something ?22:00
thumpercmars: what are the two return values of utils.MoveFile ?22:02
thumpercmars: or more specifically, why are you checking the ok value if err != nil?22:03
thumpercmars: isn't it more idiomatic go to not expect any other value to have meaning if err is not nil?22:03
thumpercmars: nm, went and read the source22:09
wallyworlddavecheney: damn connection problems today, not sure if you saw last messages22:12
davecheneynope22:16
davecheneyi kept saying "I'm not sure what access details you are seeing in that issue -- i cannot see them "22:17
wallyworld[08:09:58] <wallyworld> davecheney: the issue is that state server jujud process dies on arm22:17
wallyworld[08:10:16] <wallyworld> they can run workloads, but not state servers22:17
davecheneyok22:17
wallyworldthe jujud process just disappears22:17
davecheneydmesg ?22:17
wallyworldi've asked for stuff like that22:17
wallyworldi think they want us to ssh in22:18
davecheneyok22:18
wallyworldand see for ourselves22:18
wallyworldthere's a whole maas cluster22:18
davecheneyok22:18
wallyworldyou need to use the vpn22:18
davecheneyfuk22:18
davecheneythat won't work from where I am22:18
wallyworldi can get http access to maas, but maas rejects my ssh attempts22:19
wallyworldand i know nothing about arm22:19
davecheneythis is linux22:19
davecheneythis is user space22:19
davecheneyit won't be arm specific22:19
wallyworldtrue, also not my specialty :-(22:20
wallyworldlow level system stuff22:20
davecheneyi'm not sure what the next step is22:20
davecheneyx wants us to do y22:21
davecheneywe've tried y22:21
davecheneyit didn't work22:21
davecheneyhow can we break the stalemate22:21
wallyworlddidn't work for me. i've asked them to attach any post mortem and relevant info to bug22:21
davecheney+122:21
davecheneyi22:21
davecheneym subscribed to the bug22:21
wallyworldi may need to poke them again22:22
mupBug #1466087 changed: kvmBrokerSuite TestAllInstances fails <ci> <test-failure> <juju-core:Incomplete> <juju-core devices-api-maas:Triaged> <https://launchpad.net/bugs/1466087>22:27
mupBug #1474291 changed: juju called unexpected config-change hooks after read tcp 127.0.0.1:37017: i/o timeout <hooks> <openstack> <sts> <uosci> <juju-core:Invalid> <ceilometer (Juju Charms Collection):New> <https://launchpad.net/bugs/1474291>22:52
mupBug #1475386 changed: unit not dying after failed hook + destroy-service <destroy-service> <juju-core:New> <https://launchpad.net/bugs/1475386>22:52
thumperfark...22:52
thumperdavecheney: still here?22:52
davecheneythumper: ack22:52
* thumper is looking at bug 147494622:52
mupBug #1474946: kvmBrokerSuite worker/provisioner: tests are poorly isolated <blocker> <ci> <regression> <test-failure> <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1474946>22:52
thumperI moved my /var/lib/lxc dir out of the way22:52
davecheneyit's a shitstorm22:52
thumperand confirmed that my user can't create a dir there22:52
thumperbut when I run the tests, they pass22:52
davecheneyyou have lxc installed22:53
thumperWT actual F22:53
thumperyes22:53
thumperbut the dir /var/lib/lxc doesn't exist22:53
davecheneymkdir -p will always pass if the directory exists22:53
thumperbecause I moved it22:53
davecheneywhat is the ownership of /var/lib ?22:53
thumperdoesn't allow my user to create dirs22:53
davecheneypossibly installing lxc changes gropu ownershipts22:53
davecheneypossibly installing lxc changes gropu ownerships22:53
thumperthat was the first thing I tested22:53
davecheneyputs you in wheel22:53
* thumper digs more22:54
thumperFFS22:57
thumperthis test is bullshit22:57
thumperit is a kvm test22:57
thumperthat checks the lxc dir for networking setup22:57
cmarsthumper, thanks for the review. i described the return bool here: https://github.com/juju/utils/blob/master/file_unix.go#L3023:02
cmarsthumper, did you want a comment in juju as well describing the use of it?23:02
davecheneyda fuq23:03
thumperjust in that use of it, yes23:03
cmarsthumper, ok, np23:03
thumpercode should be obviously correct when you read it23:03
thumperdavecheney: also, my version passes because for some reason, the lxc data dir is /home/tim/.local/share23:03
thumpermore modern lxc I guess23:03
* davecheney reaches for emoji23:04
davecheneypossibly, i'm on 14.04.223:04
thumperoh fuck23:06
* thumper head desks23:06
* thumper head desks23:06
* thumper head desks23:06
davecheneyalways a good sign ...23:06
* thumper head desks23:06
thumperin order to be a good citizen...23:06
thumperwe do this:23:06
thumperLxcContainerDir  = golxc.GetDefaultLXCContainerDir()23:07
thumperwhich does this:23:07
thumperrun("lxc-config", nil, "lxc.lxcpath")23:07
thumperfor root, it is probably the right thing23:07
thumperfor a user with modern lxc23:07
thumperit isn't23:07
* thumper thinks23:07
thumperugh23:07
thumpersince the local provider jujud runs as root23:07
thumperI think we are ok23:07
davecheneylxc-config won't exist if lxc isn't installed23:07
thumperbut this is why the test passes23:08
thumperack23:08
thumperif there is an error23:08
thumperit returns /var/lib/lxc23:08
thumperwhich then doesn't exist23:08
thumperhowever23:08
davecheneyif lxc-config ... fails, we fall back to /var/lib/lxc ?23:08
thumperthe bigger problem23:08
thumperis that the test is bullshit23:08
davecheneyderp-tastic!23:08
thumperwe shouldn't be adding network config in lxc dir for kvm tests23:08
* thumper renames the function so it is obviously wrong23:10
thumperand removes it23:10
davecheneyphase 1. delete test23:10
davecheneyphase 2. ??23:10
davecheneyphase 3. build is green23:10
thumperphase 1: rename function to include LXC23:11
thumperphase 2: make it so the local dir can't be created23:11
thumperphase 3: run all tests23:11
thumperphase 4: remove lxc function from kvm test23:12
thumperphase 5: ensure no other failures23:12
thumperphase 6: send email to network folks to see what  should be there23:12
thumperphase 7: profit23:12
davecheney7 steps ?23:12
davecheneythat's too enterprise23:12
thumperhttp://reviews.vapour.ws/r/2190/diff/#23:22
menn0_wallyworld: the env life assert PR has now failed twice due to test timeouts in cmd/jujud/agent23:26
menn0_wallyworld: makes me think it's an effect of the change23:26
menn0_wallyworld: but of course it always works on my machine23:26
thumpermenn0_: did you want me to try here?23:27
menn0_thumper: if you have the time, yes please23:27
thumpermenn0_: if you want to review said branch above23:27
menn0_thumper, wallyworld: not sure if it's related but the race detector finds 11 races in that package23:28
thumperreally?23:28
thumpermenn0_: what did you do?23:28
menn0_davecheney: have you been backporting your data race fixes to 1.24?23:28
menn0_thumper: I haven't changed a thing in that package23:29
thumpermenn0_: no, he hasn't AFAIK23:29
davecheneyno23:29
davecheneyi have not23:29
menn0_thumper: that could be why the races are still there then23:29
thumper:)23:29
davecheneyyup23:29
menn0_thumper: this PR only touches state23:29
thumperthis is on 1.24 is it?23:29
menn0_thumper: yep23:29
menn0_the races might be nothing to do with the test hangs23:29
menn0_but it could be to do with txns being decoupled, changing the timings of things23:30
thumperwe should back port the apiserver wait group change23:30
thumperbecause that could be it23:30
menn0_thumper: I already did that I think23:30
* menn0_ checks23:30
menn0_thumper: yep that's there23:31
=== menn0_ is now known as menn0
thumperhmm23:31
wallyworldmenn0: sorry was in meeting, but normally if agent tests timeout more than once there's an issue23:32
menn0wallyworld: it's been 2 different tests in that pkg that have gotten stuck but they're both upgrade related23:35
menn0wallyworld: i'm going to have peek at them in case something obvious jumps out23:36
perrito666wallyworld: mm, that is the test that failed merging the patch yesterday, I think that it can only be reproduced by running the whole suite, I have been able to do it only once and no more so I could not get to it and thought it was one of the long standing flaky tests23:41
wallyworldcould be a flakey test but i think work has been done recently to fix a lot of the agent related tests23:43
perrito666wallyworld: mm, could definitely be something in the change that fixes the issue with status, but that would mean that the test is waiting for the wrong assumption23:43
perrito666wallyworld: did you ever re-merge the code for agent status?23:44
wallyworldperrito666: which code?23:44
perrito666wallyworld: updateAgentStatus23:46
axwwallyworld: on master, AFAIK, bootstrap will put image metadata directly into gridfs without using swift23:47
wallyworldperrito666: that code as missing would only have failed to report a failed state23:47
wallyworldthe fix is merging now23:47
wallyworldthe refactoring reported non error status elsewhere23:47
axwwallyworld: it's just that we weren't searching it (the fix for that landed already I think?)23:48
perrito666wallyworld: it is odd that fixing the code would break the test :(23:48
wallyworldperrito666: the code hasn't merged yet23:48
wallyworldaxw: i didn't think it did put it into gridfs, or i don't recall if it did23:49
axwwallyworld: I'll find the code. I'm 99% sure it does23:49
wallyworldaxw: the search issue - that was cloud storage not being searched23:50
wallyworldie swift23:50
wallyworldi didn't think 1.24 and master were different in that respect23:50
axwwallyworld: oh... we're meant to be looking in gridfs as well23:50
wallyworldaxw: i didn't realise at all the simplestreams data has been added to gridfs23:51
axwwallyworld: https://github.com/juju/juju/blob/master/cmd/jujud/bootstrap.go#L23723:51
menn0thumper: so these tests are hanging because the machine agent Stop call is not returning23:51
axwwallyworld: we're writing the image metadata into "state storage", which is gridfs23:52
thumperheh23:52
menn0thumper: but it's not the apiserver... I can see that does stop23:52
thumperyeah...23:52
thumperoh?23:52
thumperinteresting23:52
thumperwhich one is it?23:52
menn0thumper: still digging through the logs to figure out which workers are not stopping23:52
* menn0 is very grateful to thumper for adding the extra logging in the runner23:52
wallyworldaxw: i see, i had thought that the store used was EnvironStorage23:52
thumperaxw: need to talk to you about environment destruction23:53
axwthumper: mkay23:53
axwwallyworld: hm, so looking back over anastasiamac's change, I don't think that's actually what we should be doing. we're meant to be looking in gridfs, and the individual providers can add additional search paths if they want to (e.g. look in keystone)23:55
wallyworldaxw: i'd prefer not to have a simplestreams blob23:55
wallyworldstructured data is much better23:55
axwwallyworld: I understand, and that's being fixed, but atm we're talking about *where* the blob is23:55
wallyworldsimplestreams should not be in env23:55
axwin provider storage vs. gridfs23:56
axwwe should not be perpetuating provider storage23:56
wallyworldagreed, and we're not23:56
wallyworldi didn't realise we weren't writing to provider storage23:56
axwwallyworld: the latest change reintroduces searching metadata in provider storage...23:56
thumperaxw: when do you have some time?23:57
wallyworldbecause i thought we were writing metadata there based on the information i had23:57
axwthumper: can chat now23:57
menn0thumper: looks like it might be the certupdater23:58
thumperhaha23:58
menn0thumper: it's blocked on a channel send23:58
thumperbwa haha23:58
thumpernaked send?23:58
thumperoh...23:58
thumperI remember that...23:58
thumperit is buffered23:58
thumperwith one value23:58
thumperbut sends twice23:58
menn0wasn't that fixed?23:58
axwthumper: where aboots?23:58
thumperI thought so...23:58
thumperperhaps not23:59
* menn0 keeps digging23:59
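
A minimal sketch of the failure mode recalled above, not the certupdater code: a channel buffered for one value accepts the first send and, with no receiver, would block on a second plain send. trySend is a hypothetical helper that uses select with a default case so the example terminates instead of hanging.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
func trySend(ch chan<- struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan struct{}, 1) // room for exactly one pending value

	fmt.Println(trySend(ch)) // true: the buffer was empty
	fmt.Println(trySend(ch)) // false: buffer full, a plain send would block here
}
```

A worker that does the second send unconditionally never returns from Stop unless something drains the channel, which would match the hang seen in the logs.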
thumperaxw: https://plus.google.com/hangouts/_/canonical.com/env-destruction23:59
* menn0 loves tracebacks + decent logs23:59
