/srv/irclogs.ubuntu.com/2015/08/19/#juju-dev.txt

sinzuithumper: yes, I will jigger it now00:20
mwhudson_davechen1y: https://launchpad.net/ubuntu/+source/golang/2:1.5~rc1-0ubuntu1 \o/00:44
sinzuithumper: http://reports.vapour.ws/releases/2985/job/joyent-deploy-jes-trusty-amd64/attempt/117 used --debug00:49
davechen1ymwhudson_: /me pops cork00:51
sinzuithumper: menn0: joyent's networking nonsense is often a cause of not downloading agents from the state-server. That is the main reason for the retries on joyent jobs. Even when agents are started, the units fail. I think you can see this in the 117 ^ attempt.01:01
thumpersinzui: the unit agent failure is almost certainly a problem with JES and the new debug log writer01:46
thumpermenn0: ^^01:46
thumpersinzui: the failing to download tools in the machine agents cloud init is something else01:46
thumpermenn0: I'm guessing the new manifold bollocks for the log sender is a problem01:47
thumpermenn0: the people defining the dependencies almost certainly got it wrong01:47
sinzuithumper: yeah, joyent has 72.* and 165.* addresses. We often see the private nets cannot see each other when joyent gives us differing public addresses01:47
thumperah...01:48
thumpersinzui: what is our solution there?01:48
menn0thumper: sorry, playing catch up. where can I see the unit agent failure?01:49
sinzuithumper: retry, or deploy lots of machines to hold the 72.* addresses, release the 165.* addresses01:49
thumpermenn0: http://data.vapour.ws/juju-ci/products/version-2985/joyent-deploy-jes-trusty-amd64/build-117/unit-dummy-source-0.log01:49
sinzuithumper: we tried prefer-ipv6, but we still see agent downloads using IPv401:49
menn0sinzui, thumper: i made a joyent specific routing fix a long time ago which allowed the various private nets they allocate to talk to each other01:49
thumpersinzui: hmm...01:49
menn0sinzui, thumper: is that not working any more?01:50
* thumper shrugs01:50
sinzuimenn0: I don't think it is that reliable, but maybe the firewall issue I describe in another bug is the root cause. I haven't escalated the other issue because I am still gathering evidence01:50
menn0sinzui: or maybe joyent has changed the way their network works01:51
sinzuimenn0: they changed firewall rules recently :)01:51
menn0sinzui: ok right01:52
menn0thumper: should I take a look at this panic?01:52
thumpermenn0: if you have the bandwidth, yes please01:53
menn0thumper: well there's also this bootstrap issue and the State pool work. tell me my priorities and I'll work to that :)01:54
thumperugh01:54
thumpermenn0: I'll poke the panic, you continue with bootstrap issue01:54
menn0thumper: k01:56
* thumper sighs02:20
thumperat least the panic is entirely reproducible with the local provider02:20
* thumper goes to make coffee02:20
menn0thumper: that's good to know. should be easy enough to track down then.02:40
thumperwell, you'd think that02:40
menn0:)02:40
* thumper pokes more02:40
menn0thumper: here's the bootstrap fix: http://reviews.vapour.ws/r/2407/02:41
thumpermenn0: the connection is null in sendLogRecord02:42
* menn0 pulls up current master02:42
thumperbah humbug02:42
* thumper takes a breath02:44
thumpermenn0: if you have time02:44
thumperwould love a hangout to talk this through as I read the code02:44
menn0thumper: sure02:44
* menn0 closes the office door02:44
thumpermenn0: I'm just testing, but I *think* that the one line error fix will make things work04:05
thumperjust not as cleanly as we'd like04:05
menn0thumper: b/c it retries04:05
thumperack04:05
menn0thumper: and at some point the api info will be correct....04:05
menn0nasty04:05
thumperack04:06
thumperyeah04:06
thumperI'd still like to represent the dependencies correctly, but this will get us over the hump04:06
* thumper is deploying now04:06
* menn0 nods04:06
menn0it would definitely be nice to do it right04:06
thumperyep, that works04:07
huwshimithumper: Hey. I've got a branch ready to land for jujulib. Is there a process for landing branches yet?04:20
thumpernope, but I can land it for you04:21
thumperif it is reviewed04:21
huwshimithumper: It's this one. Has a couple of plus ones. https://github.com/markramm/jujulib/pull/304:22
* thumper takes a look at the actual diff too04:23
thumperhuwshimi: merged04:27
thumperhuwshimi: thanks, looks real good04:27
huwshimithumper: Brilliant, thanks!04:27
thumpernow that I've dealt with the bug I was looking at, I'm going to fix my review comments that didn't get addressed before mramm2 merged my branch04:27
thumperhuwshimi: hmm...04:31
thumperhuwshimi: found a bug in the makefile :)04:31
huwshimithumper: oh...04:31
thumpercalling make check when there is no .env setup fails04:31
* thumper will fix04:31
thumperhuwshimi: "rm .file" fails if .file doesn't exist04:34
thumper"rm -f .file" does not fail if it doesn't exist04:34
huwshimithumper: Ah, nicely spotted. Thanks04:35
thumpernow I get a lot of lint04:35
* thumper sighs04:35
huwshimithumper: Yes, yes you do.04:35
huwshimithumper: and an 'assert False' :)04:36
thumperhuwshimi: that was from mramm04:37
* thumper passes the buck04:37
huwshimihaha04:37
* thumper is done05:20
thumperlaters05:20
dimiternmorning08:44
dimiternfwereade, hey08:47
fwereadedimitern, o/08:48
dimiternfwereade, when you have a moment, I'd like you to review this http://reviews.vapour.ws/r/2406/ please08:48
dimiternfwereade, it should do everything we discussed about setting/getting constraints08:48
fwereadedimitern, cool, I will try to get to that, I see menn0 has covered a few things already?08:49
dimiternfwereade, yeah, and I really appreciate that, but since it changes core functions a second look will be nice :)08:50
MmikeHi, lads. What is juju using for leader election, which algo/protocol?09:19
urulamajust to verify: can openstack provider deal with user tokens from Keystone instead of usernames and passwords? In case it does, how is refreshing done and where are they stored?09:54
dimiterndooferlad, hey, so my constraints branch is landing now10:46
dimiterndooferlad, after it lands it's a good time to sync up net-cli with master10:46
mupBug #1486553 opened: i/o timeout errors can cause non-atomic service deploys <juju-core:New> <https://launchpad.net/bugs/1486553>13:05
mattywfwereade, http://reviews.vapour.ws/r/2415/13:27
jogarret6204hi all,  I have an upgrade stuck, causing many errors such as: ..."blocked because upgrade in progress".  Any ideas how to unstick it?13:32
frankbanhi ocr, could you please take a look at https://github.com/juju/juju/pull/3035 ? (this is a MP against a feature branch for the GUI embedded story). thank you!13:35
natefinchjogarret6204: upgrading what version to what version?13:35
frankbankatco: ^^^13:35
jogarret6204natefinch:i think i am 1.24.3 now...  not sure how to tell what it is targeting..13:37
jogarret6204agent-version: 1.24.3.113:38
natefinchjogarret6204: likely going to 1.24.5, if you started the upgrade today or yesterday.13:39
jogarret6204any way to kick it along?  seeing other issues now13:39
jogarret6204message: agent is lost, sorry! See 'juju status-history all-in-one/34'13:39
jogarret6204but can't check that, that command returns: ERROR upgrade in progress - Juju functionality is limited13:40
natefinchjogarret6204: is it just one machine or a bunch of machines?13:40
jogarret6204bunch.  I have maas on baremetal, it uses a juju VM on same box for state server.  then about 10 of these machines have the issue right now13:42
jogarret6204I opened a juju-deployer bug - this may actually be the cause of that13:43
jogarret6204http://bit.ly/1KvMybk13:44
natefinchjogarret6204: interesting.  I don't really know the deployer code, so can't comment on what it does or does not do.  But certainly the unit number should not be used as the count of units.13:48
natefinchfrankban: katco is going to be in late this morning, btw13:49
frankbannatefinch: ok thanks, there is no rush13:50
jogarret6204natefinch:  i'm thinking that it may not be a bug if I get this upgrade issue fixed.13:51
wwitzel3ericsnow, natefinch: can we delay standup maybe 10 minutes? I lost track of time and have bacon on the stove13:56
wwitzel3well, actually, in the oven, but same thing13:57
natefinchwwitzel3: I can wait for bacon13:58
ericsnowfwereade, cmars: how do I "uninstall" a worker from an engine?  (Runner has StopWorker...)14:09
fwereadeericsnow, you can't, yet; was waiting for a direct need to do so14:09
fwereadeericsnow, what's your use case?14:10
ericsnowfwereade: I have per-workload-process workers with a definite lifetime14:11
ericsnowfwereade: when Juju stops tracking such a workload process then the worker must be stopped and forgotten14:11
ericsnowfwereade: for now I am resorting to using runners but would rather use dependency engine14:12
fwereadeericsnow, that sounds sane for the short term14:12
ericsnowfwereade: yeah, I figured we'd sort it out later :)14:12
fwereadeericsnow, sorry, I wasn't expecting to need it until I got relatively deeply stuck into the machine agent14:13
ericsnowfwereade: no worries :)14:13
fwereadeericsnow, (out of interest, what are the dependencies of your process workers?)14:13
fwereadeericsnow, (and their responsibilities?)14:14
ericsnowfwereade: deps - mostly API client; responsibilities - e.g. periodically update status from the underlying technology (e.g. docker)14:16
fwereadeericsnow, are the workers expected to, e.g., restart the processes if they fail?14:17
fwereadeericsnow, or are they just observers?14:17
ericsnowfwereade: not yet14:17
ericsnowfwereade: for now just observing14:18
ericsnowfwereade: later potentially starting and stopping them14:18
fwereadeericsnow, cool, thanks14:18
ericsnowfwereade: np14:18
fwereadeericsnow, a thought re responsibilities, not sure if it's good14:18
fwereadeericsnow, how hard would it be for one such worker to know when it was finished, itself, and return something like ErrUninstallMePlease?14:19
ericsnowfwereade: I'll think about it (OTP)14:20
fwereadeericsnow, I'm not sure that even covers all the machine-agent use cases tbh, it might just be a bad idea, but let me know if you think of anything14:20
ericsnowfwereade: will do14:21
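
The idea fwereade floats above (a worker that knows when it is finished and returns something like ErrUninstallMePlease) can be sketched as follows. This is a minimal illustration under stated assumptions, not juju's dependency engine: `Engine`, `Worker`, `StartFunc`, `RunOnce`, `processWatcher`, and the `ErrUninstall` sentinel are all hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUninstall is a hypothetical sentinel a worker returns when it has
// finished for good and should be forgotten rather than restarted.
var ErrUninstall = errors.New("worker finished; uninstall it")

// Worker is a simplified stand-in for a long-running job.
type Worker interface {
	// Wait blocks until the worker stops and returns its exit error.
	Wait() error
}

// StartFunc creates a fresh worker instance.
type StartFunc func() (Worker, error)

// Engine is a toy registry of named workers (an assumption for this
// sketch; the real engine also tracks dependencies between workers).
type Engine struct {
	starts map[string]StartFunc
}

func NewEngine() *Engine {
	return &Engine{starts: make(map[string]StartFunc)}
}

// Install registers a start function under a name.
func (e *Engine) Install(name string, start StartFunc) {
	e.starts[name] = start
}

// Installed reports whether a worker is still registered.
func (e *Engine) Installed(name string) bool {
	_, ok := e.starts[name]
	return ok
}

// RunOnce starts the named worker and waits for it to exit. It reports
// whether the worker is still installed (and so would be restarted).
func (e *Engine) RunOnce(name string) (bool, error) {
	start, ok := e.starts[name]
	if !ok {
		return false, fmt.Errorf("worker %q is not installed", name)
	}
	w, err := start()
	if err != nil {
		return true, err
	}
	err = w.Wait()
	if errors.Is(err, ErrUninstall) {
		// The worker asked to be removed: forget it instead of restarting.
		delete(e.starts, name)
		return false, nil
	}
	return true, err
}

// processWatcher observes one workload process (hypothetical).
type processWatcher struct {
	tracked bool
}

func (p *processWatcher) Wait() error {
	if !p.tracked {
		// Juju no longer tracks the process, so the watcher is done.
		return ErrUninstall
	}
	return errors.New("lost contact with the process (simulated failure)")
}

func main() {
	e := NewEngine()
	tracked := true
	e.Install("proc-watcher", func() (Worker, error) {
		return &processWatcher{tracked: tracked}, nil
	})

	keep, err := e.RunOnce("proc-watcher")
	fmt.Println("still installed:", keep, "error:", err)

	tracked = false // the workload process stops being tracked
	keep, err = e.RunOnce("proc-watcher")
	fmt.Println("still installed:", keep, "error:", err)
}
```

The design choice here is that the worker, not the caller, decides when its lifetime is over; as fwereade notes, that may not cover every machine-agent use case.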
mupBug #1486297 opened: Action doesn't correctly translate unit name into tag if hyphen present <juju-core:New> <python-jujuclient:New> <https://launchpad.net/bugs/1486297>14:41
mupBug #1486297 changed: Action doesn't correctly translate unit name into tag if hyphen present <juju-core:New> <python-jujuclient:New> <https://launchpad.net/bugs/1486297>14:44
mupBug #1486297 opened: Action doesn't correctly translate unit name into tag if hyphen present <juju-core:New> <python-jujuclient:New> <https://launchpad.net/bugs/1486297>14:56
marcoceppialexisb: health checks, are those on the roadmap?14:59
alexisbheh marcoceppi we were just chatting about that14:59
* marcoceppi mind melds15:00
mupBug #1486297 changed: Action doesn't correctly translate unit name into tag if hyphen present <juju-core:New> <python-jujuclient:New> <https://launchpad.net/bugs/1486297>15:24
fwereadedimitern, I'm feeling dense, would you explain: I think we need some way to distinguish between the "fallback to env spaces constraint" and "explicitly clear spaces constraint" cases15:31
fwereadedimitern, do we not need it; or do we do it but I just don't see it?15:31
dimiternfwereade, I'll try at least :)15:33
marcoceppiHow do I tell juju what agent to bootstrap with?15:33
marcoceppiI have 1.24.4 installed but it bootstraps 1.24.5; I'm trying to validate a regression15:33
dimiternfwereade, so explicitly empty values always override matching fallbacks (i.e. "mem=" overrides "" and "mem=4G"), but only when doing resolution from deployment to provisioning constraints15:33
dimiternfwereade, that's what now happens after my changes15:34
dimitern(soon to be available in master as well, when we merge net-cli)15:34
wwitzel3ericsnow, natefinch: https://github.com/juju/charm/pull/14315:35
ericsnowwwitzel3: ship-it15:36
natefinchofficially used loggo's per-package logging adjustments for the first time in 2 years: juju set-env logging-config="juju.worker.leadership=WARNING"15:36
natefinchfwereade: any chance we were going to de-spam the leadership logging sometime soon?15:38
fwereadenatefinch, huh, I thought we had15:38
fwereadenatefinch, I thought it was mostly at trace level15:38
fwereadenatefinch, hmm, maybe we didn't do tracker?15:38
natefinchfwereade: yeah, the log lines I see are all from tracker15:39
fwereadenatefinch, damn, sorry15:40
natefinchfwereade: not the end of the world.  Fixable via loggo (as long as you don't need to see anything under warning from leadership)15:41
fwereadedimitern, ah, ok, and if we had "spaces=foo" and replaced it with "mem=4G" we'd get the fallback spaces; but "mem=4G spaces=" would ensure no spaces constraints? or have I completely confused myself?15:42
dimiternfwereade, exactly right15:43
fwereadedimitern, cool15:43
fwereadedimitern, ok, then, in state -- how do we store the distinction between those two cases?15:43
fwereadedimitern, we seem to have lost the pointers in that struct15:43
dimiternfwereade, FWIW resolution was broken in a few places, e.g. adding a machine does resolution, SetConstraints on it before deployment doesn't and takes whatever you give it15:44
fwereadedimitern, ahh, machine constraints weren't including env fallbacks?15:44
dimiternfwereade, when set, but when added it worked as expected15:44
dimiternfwereade, I *think* it's only important to store non-empty values (after doing resolution)15:45
fwereadedimitern, sorry, lost again, how can you set constraints on a machine?15:45
dimiternfwereade, before provisioning15:45
dimiternfwereade, m.SetConstraints()15:45
fwereadedimitern, can users do that?15:45
fwereadedimitern, (other than when adding?)15:45
fwereadedimitern, I think those values should just be coming from the resolved env+service constraints for the unit whose addition triggered machine addition15:46
dimiternfwereade, I don't think so15:47
dimiternfwereade, but it's still a bug15:47
fwereadedimitern, but either way, when we're storing service constraints in state we shouldn't resolve them15:47
dimiternfwereade, I agree, and I made it so it's definitely like this in both cases15:47
dimiternfwereade, we're not15:48
dimiternfwereade, we only resolve unit constraints when asked15:48
fwereadedimitern, how can we do that correctly if we're throwing away the distinction between "fallback" and "clear" in the service constraints we store in mongo?15:49
dimiternfwereade, let me look at the code15:50
fwereadedimitern, np, sorry it's taken me so long to start looking15:51
dimiternfwereade, ok, that's a good catch sir15:52
fwereadedimitern, yay, my brain still works :)15:52
dimiternfwereade, so we should store them when empty, at least for the services15:52
fwereadedimitern, I think so, yeah15:52
dimiternfwereade, good, it should be easy to fix15:52
fwereadedimitern, and probably across the board, even if the distinction is academic once resolved15:52
fwereadedimitern, cool15:52
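
The distinction discussed above, between "fall back to the env spaces constraint" and "explicitly clear the spaces constraint", can be sketched with pointer fields. This is a rough illustration only: the `Constraints` struct, `Resolve`, and the helper functions are hypothetical simplifications, not juju's actual constraints type or storage code. The assumption is that a nil field means "not specified, use the fallback" while a non-nil pointer to an empty value means "explicitly cleared".

```go
package main

import "fmt"

// Constraints is a simplified, hypothetical constraints value.
// nil field: not specified, so the environment fallback applies.
// non-nil pointer to an empty value: explicitly cleared.
type Constraints struct {
	Mem    *string
	Spaces *[]string
}

// strPtr and listPtr build pointer values for the example.
func strPtr(s string) *string { return &s }

func listPtr(items ...string) *[]string {
	l := append([]string{}, items...)
	return &l
}

// Resolve fills in any unspecified service constraint from the
// environment constraints. Explicitly empty values are kept, so they
// override the fallback.
func Resolve(service, env Constraints) Constraints {
	out := service
	if out.Mem == nil {
		out.Mem = env.Mem
	}
	if out.Spaces == nil {
		out.Spaces = env.Spaces
	}
	return out
}

// describe renders a spaces constraint for display.
func describe(spaces *[]string) string {
	switch {
	case spaces == nil:
		return "unset"
	case len(*spaces) == 0:
		return "cleared"
	default:
		return fmt.Sprint(*spaces)
	}
}

func main() {
	env := Constraints{Spaces: listPtr("foo")}

	// "mem=4G": spaces not mentioned, so the env fallback applies.
	a := Resolve(Constraints{Mem: strPtr("4G")}, env)
	fmt.Println("mem=4G         ->", describe(a.Spaces))

	// "mem=4G spaces=": spaces explicitly cleared, no fallback.
	b := Resolve(Constraints{Mem: strPtr("4G"), Spaces: listPtr()}, env)
	fmt.Println("mem=4G spaces= ->", describe(b.Spaces))
}
```

If the stored form dropped the pointers (or omitted empty values), both cases would look identical after a round trip, which is the problem fwereade points out.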
TheMuedimitern: just before riding home by bike, http://reviews.vapour.ws/r/2419/ contains the changes we talked about yesterday. hints regarding the testing of the finishedWorker are welcome. I'll take a look when I'm at home16:08
wwitzel3cmars: ping16:15
mupBug #1486640 opened: Typos in help  <juju-core:New> <https://launchpad.net/bugs/1486640>16:24
dimiternTheMue, sure, will have a look in a bit16:30
jogarret6204just checking back in and seeing a team call here.. am I in wrong group for "general noob help"?  Sorry if so.  where is general help?16:46
lazyPowerjogarret6204: general noob help is in #juju :)16:47
lazyPower#juju-dev is primarily for those hacking on juju core16:47
lazyPoweryou're more than welcome to hang in both places though, we welcome all feedback16:47
jogarret6204don't want to slow you down here, so I'll hit the other.  thanks LazyPower and natefinch.16:51
natefinchjog: sorry, yeah, #juju is more the general help channel :)16:59
natefinchjog: sorry, obv not meant for you17:04
sinzuikatco: dimitern : can either of you arrange fix for bug 1486675. I think the test needs more smarts, juju is fine17:28
mupBug #1486675: supportedSeriesWindowsSuite.TestSupportedSeries fails <blocker> <ci> <regression> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1486675>17:28
* perrito666 is connected through his phone and loving his isp17:28
perrito666well creating a simple stream over 1/2 3g really makes me save on heating, the phone is enough for the whole office17:29
natefinchheh17:30
=== tvansteenburgh1 is now known as tvansteenburgh
natefinchwow, this is a dumb error: ERROR environment destruction failed: destroying environment: container "nate-local-machine-1" is not yet created17:31
perrito666well, that was actually possible17:32
perrito666I think I recall davechen1y or thumper talking about a test that tried to reproduce that by being a race condition17:32
mupBug #1486166 changed: JES deploy fails <ci> <jes> <regression> <juju-core:Fix Released by thumper> <https://launchpad.net/bugs/1486166>17:33
mupBug #1486675 opened: supportedSeriesWindowsSuite.TestSupportedSeries fails <blocker> <ci> <regression> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1486675>17:33
katcosinzui: dimitern's team is on bug-squad, so i'll leave it to him17:34
natefinchperrito666: my point was rather, if I am destroying the environment, I don't care that there's a container that isn't created yet, that's just one less thing to tear down.17:38
dimiternTheMue, you have a review17:39
dimiternsinzui, katco, sure, let me have a look17:40
dimiternsinzui, this looks like a fallout of a recent change I saw17:41
sinzuidimitern: yes17:41
dimiternsinzui, this one most likely https://github.com/juju/juju/pull/298117:43
dimiternbogdanteleaga, are you around?17:43
sinzuiyep, that is what I saw17:43
bogdanteleagadimitern, looking into it17:44
dimiternsinzui, katco, so in this case, what: is a revert in order?17:44
dimiternbogdanteleaga, thanks!17:44
sinzuidimitern: a revert will unblock. otherwise race to land a fix17:44
bogdanteleagadimitern, sinzui https://github.com/juju/juju/pull/304017:47
bogdanteleagadimitern, sinzui I g2g now, merge it once it gets reviewed please17:48
sinzuithank you bogdanteleaga17:48
dimiternbogdanteleaga, awesome, ta!17:54
dimiternsinzui, setting it to merge17:56
sinzuiyou rock dimitern17:56
* dimitern wishes all bugs were like this :)17:58
natefinchOMG.... just realized what the problem I've been fighting with all day is..... printing a value out with %v was causing a panic from inside fmt somewhere17:59
perrito666aghh gce is telling me I am not authenticated... only for one particular operation wth18:00
=== natefinch is now known as natefinch-afk
dimiternsinzui, btw I'm not sure if you're monitoring feature branches for trends in failures like for the main branches, but if there is some data about "net-cli", which we're planning to merge tomorrow, will be awesome18:06
sinzuidimitern: you cannot merge it because it has never passed http://reports.vapour.ws/releases#net-cli18:07
sinzuidimitern: merge tip, When CI blesses it, we can merge it into master18:07
dimiternsinzui, hmm that's useful to know18:07
dimiternsinzui, we just did that today18:07
sinzuidimitern: I shall try to force ci to retest net-cli.18:08
dimiternsinzui, it's currently only 4 commits behind18:08
* sinzui is trying to retest 1.24 today as well18:08
dimiternsinzui, great, thanks! it will be nice to have some early feedback18:08
sinzuidimitern: maybe these issues are fixed already https://bugs.launchpad.net/juju-core/net-cli18:09
dimiternsinzui, I hope so, however the windows one is a bit worrying18:10
dimiternsinzui, or you mean because it's gone from master?18:10
sinzuidimitern: I hope the issue was really in master, and your merge fixed it18:13
dimiternsinzui, I'll give it a try now as I'm changing that list command18:15
sinzuialexisb: I think we have enough evidence to say Joyent's firewall changes did hurt Juju, and that deleting them when Juju destroys the environment fixes the issue: I want bug 1485781 fixed in 1.25 and 1.24 (maybe 1.22 if we ever plan a release)18:16
mupBug #1485781: Juju is unreliable on Joyent <joyent-provider> <reliability> <repeatability> <juju-core:Triaged> <juju-core 1.22:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1485781>18:16
katcoericsnow: natefinch-afk: wwitzel3: sorry i missed the stand-up this morning. anything i can help with?18:38
ericsnowkatco: review http://reviews.vapour.ws/r/2405/ ?18:39
ericsnowkatco: (and don't sweat missing standup :)18:39
katcoericsnow: tal18:39
alexisbsinzui, thank you for getting the info on joyent and opening the bug19:20
alexisbthat is good stuff19:20
mupBug #1486712 opened: Race on uniter-hook-execution, prevents to resolve unit. <sts> <juju-core:New> <https://launchpad.net/bugs/1486712>19:24
katcoericsnow: reviewed19:53
ericsnowkatco: thanks!19:53
=== natefinch-afk is now known as natefinch
mupBug #1474885 changed: juju deploy fails with ERROR EOF <local-provider> <precise> <juju-core:Fix Released> <https://launchpad.net/bugs/1474885>21:37
mupBug #1486749 opened: juju backups create should fail earlier for hosted environments <juju-core:In Progress by cherylj> <https://launchpad.net/bugs/1486749>21:52
mupBug #1486749 changed: juju backups create should fail earlier for hosted environments <juju-core:In Progress by cherylj> <https://launchpad.net/bugs/1486749>21:58
mupBug #1486749 opened: juju backups create should fail earlier for hosted environments <juju-core:In Progress by cherylj> <https://launchpad.net/bugs/1486749>22:07
davecheneyalexisb: ping22:36
menn0waigani: would you mind taking a look at http://reviews.vapour.ws/r/2425/ pls?23:31
perrito666ericsnow: still here?23:51
ericsnowperrito666: barely23:51
* perrito666 sees ericsnow fading23:51
perrito666ericsnow: to use gce with the fields instead of the json file, is it enough to just copy the values of the json?23:52
ericsnowperrito666: pretty much23:53
perrito666ericsnow: I believe the issue I told you about earlier might be because storage provisioner tries to do stuff in a machine and finds itself lacking these creds23:53
ericsnowperrito666: the PK might not copy-and-paste quite right so you have to watch that23:53
ericsnowperrito666: does it do more than make calls on the provider?23:54
perrito666ericsnow: calls that require auth23:54
ericsnowperrito666: the provider has all the auth it needs23:55
perrito666the provider?23:55
ericsnowperrito666: provider/gce/...23:56
perrito666ericsnow: well I am getting auth errors from one of the machines23:56
perrito666so :) something is wrong23:56
ericsnow:)23:56
ericsnowperrito666: you're adding new methods to the gceConnection interface (in environ.go), right?23:57
perrito666ericsnow: yep23:57
ericsnowperrito666: then I would definitely not expect auth issues :/23:58
perrito666well it is only happening with one machine i think23:58
perrito666that I added with add-machine23:58
ericsnowperrito666: do you have to enable some extra permissions in the GCE developer console?23:58
perrito666so I am looking it up23:58
ericsnow(manually)23:58
perrito666ericsnow: where is that stored on the server?23:59
ericsnowperrito666: where is what stored?23:59
perrito666ericsnow: the oauth token23:59
