[00:20] thumper: yes, I will jigger it now
[00:44] davechen1y: https://launchpad.net/ubuntu/+source/golang/2:1.5~rc1-0ubuntu1 \o/
[00:49] thumper: http://reports.vapour.ws/releases/2985/job/joyent-deploy-jes-trusty-amd64/attempt/117 used --debug
[00:51] mwhudson_: /me pops cork
[01:01] thumper: menn0: joyent's networking nonsense is often a cause of not downloading agents from the state-server. That is the main reason for the retries on joyent jobs. Even when agents are started, the units fail. I think you can see this in the 117 ^ attempt.
[01:46] sinzui: the unit agent failure is almost certainly a problem with JES and the new debug log writer
[01:46] menn0: ^^
[01:46] sinzui: the failing to download tools in the machine agents cloud init is something else
[01:47] menn0: I'm guessing the new manifold bollocks for the log sender is a problem
[01:47] menn0: the people defining the dependencies almost certainly got it wrong
[01:47] thumper: yeah, joyent has 72.* and 165.* addresses. We often see the private nets cannot see each other when joyent gives us differing public addresses
[01:48] ah...
[01:48] sinzui: what is our solution there?
[01:49] thumper: sorry, playing catch up. where can I see the unit agent failure?
[01:49] thumper: retry, or deploy lots of machines to hold the 72.* addresses, release the 165.* addresses
[01:49] menn0: http://data.vapour.ws/juju-ci/products/version-2985/joyent-deploy-jes-trusty-amd64/build-117/unit-dummy-source-0.log
[01:49] thumper: we tried prefer-ipv6, but we still see agent downloads using ipv4
[01:49] sinzui, thumper: i made a joyent specific routing fix a long time ago which allowed the various private nets they allocate to talk to each other
[01:49] sinzui: hmm...
[01:50] sinzui, thumper: is that not working any more?
[01:50] * thumper shrugs
[01:50] menn0: I don't think it is that reliable, but maybe the firewall issue i describe in another bug is the root cause.
I haven't escalated the other issue because I am still gathering evidence
[01:51] sinzui: or maybe joyent has changed the way their network works
[01:51] menn0: they changed firewall rules recently :)
[01:52] sinzui: ok right
[01:52] thumper: should I take a look at this panic?
[01:53] menn0: if you have the bandwidth, yes please
[01:54] thumper: well there's also this bootstrap issue and the State pool work. tell me my priorities and I'll work to that :)
[01:54] ugh
[01:54] menn0: I'll poke the panic, you continue with bootstrap issue
[01:56] thumper: k
[02:20] * thumper sighs
[02:20] at least the panic is entirely reproducible with the local provider
[02:20] * thumper goes to make coffee
[02:40] thumper: that's good to know. should be easy enough to track down then.
[02:40] well, you'd think that
[02:40] :)
[02:40] * thumper pokes more
[02:41] thumper: here's the bootstrap fix: http://reviews.vapour.ws/r/2407/
[02:42] menn0: the connection is null in sendLogRecord
[02:42] * menn0 pulls up current master
[02:42] bah humbug
[02:44] * thumper takes a breath
[02:44] menn0: if you have time
[02:44] would love a hangout to talk this through as I read the code
[02:44] thumper: sure
[02:44] * menn0 closes the office door
[04:05] menn0: I'm just testing, but I *think* that the one line error fix will make things work
[04:05] just not as cleanly as we'd like
[04:05] thumper: b/c it retries
[04:05] ack
[04:05] thumper: and at some point the api info will be correct....
[04:05] nasty
[04:06] ack
[04:06] yeah
[04:06] I'd still like to represent the dependencies correctly, but this will get us over the hump
[04:06] * thumper is deploying now
[04:06] * menn0 nods
[04:06] it would definitely be nice to do it right
[04:07] yep, that works
[04:20] thumper: Hey. I've got a branch ready to land for jujulib. Is there a process for landing branches yet?
[04:21] nope, but I can land it for you
[04:21] if it is reviewed
[04:22] thumper: It's this one. Has a couple of plus ones.
https://github.com/markramm/jujulib/pull/3
[04:23] * thumper takes a look at the actual diff too
[04:27] huwshimi: merged
[04:27] huwshimi: thanks, looks real good
[04:27] thumper: Brilliant, thanks!
[04:27] now that I've dealt with the bug I was looking at, I'm going to fix my review comments that didn't get addressed before mramm2 merged my branch
[04:31] huwshimi: hmm...
[04:31] huwshimi: found a bug in the makefile :)
[04:31] thumper: oh...
[04:31] calling make check when there is no .env setup fails
[04:31] * thumper will fix
[04:34] huwshimi: "rm .file" fails if .file doesn't exist
[04:34] "rm -f .file" does not fail if it doesn't exist
[04:35] thumper: Ah, nicely spotted. Thanks
[04:35] now I get a lot of lint
[04:35] * thumper sighs
[04:35] thumper: Yes, yes you do.
[04:36] thumper: and an 'assert False' :)
[04:37] huwshimi: that was from mramm
[04:37] * thumper passes the buck
[04:37] haha
[05:20] * thumper is done
[05:20] laters
[08:44] morning
[08:47] fwereade, hey
[08:48] dimitern, o/
[08:48] fwereade, when you have a moment, I'd like you to review this http://reviews.vapour.ws/r/2406/ please
[08:48] fwereade, it should do everything we discussed about setting/getting constraints
[08:49] dimitern, cool, I will try to get to that, I see menn0 has covered a few things already?
[08:50] fwereade, yeah, and I really appreciate that, but since it changes core functions a second look will be nice :)
[09:19] Hi, lads. What is juju using for leader election, which algo/protocol?
[09:54] just to verify: can openstack provider deal with user tokens from Keystone instead of usernames and passwords? In case it does, how is refreshing done and where are they stored?
[10:46] dooferlad, hey, so my constraints branch is landing now
[10:46] dooferlad, after it lands it's a good time to sync up net-cli with master
[13:05] Bug #1486553 opened: i/o timeout errors can cause non-atomic service deploys
[13:27] fwereade, http://reviews.vapour.ws/r/2415/
[13:32] hi all, I have an upgrade stuck, causing many errors such as: ..."blocked because upgrade in progress". Any ideas how to unstick it?
[13:35] hi ocr, could you please take a look at https://github.com/juju/juju/pull/3035 ? (this is a MP against a feature branch for the GUI embedded story). thank you!
[13:35] jogarret6204: upgrading what version to what version?
[13:35] katco: ^^^
[13:37] natefinch: i think i am 1.24.3 now... not sure how to tell what it is targeting..
[13:38] agent-version: 1.24.3.1
[13:39] jogarret6204: likely going to 1.24.5, if you started the upgrade today or yesterday.
[13:39] any way to kick it along? seeing other issues now
[13:39] message: agent is lost, sorry! See 'juju status-history all-in-one/34'
[13:40] but can't check that, that command returns: ERROR upgrade in progress - Juju functionality is limited
[13:40] jogarret6204: is it just one machine or a bunch of machines?
[13:42] bunch. I have maas on baremetal, it uses a juju VM on same box for state server. then about 10 of these machines in issue right now
[13:43] I opened a juju-deployer bug - this may actually be the cause of that
[13:44] http://bit.ly/1KvMybk
[13:48] jogarret6204: interesting. I don't really know the deployer code, so can't comment on what it does or does not do. But certainly the unit number should not be used as the count of units.
[13:49] frankban: katco is going to be in late this morning, btw
[13:50] natefinch: ok thanks, there is no rush
[13:51] natefinch: i'm thinking that it may not be a bug if I get this upgrade issue fixed.
[13:56] ericsnow, natefinch: can we delay standup maybe 10 minutes?
I lost track of time and have bacon on the stove
[13:57] well, actually, in the oven, but same thing
[13:58] wwitzel3: I can wait for bacon
[14:09] fwereade, cmars: how do I "uninstall" a worker from an engine? (Runner has StopWorker...)
[14:09] ericsnow, you can't, yet; was waiting for a direct need to do so
[14:10] ericsnow, what's your use case?
[14:11] fwereade: I have per-workload-process workers with a definite lifetime
[14:11] fwereade: when Juju stops tracking such a workload process then the worker must be stopped and forgotten
[14:12] fwereade: for now I am resorting to using runners but would rather use dependency engine
[14:12] ericsnow, that sounds sane for the short term
[14:12] fwereade: yeah, I figured we'd sort it out later :)
[14:13] ericsnow, sorry, I wasn't expecting to need it until I got relatively deeply stuck into the machine agent
[14:13] fwereade: no worries :)
[14:13] ericsnow, (out of interest, what are the dependencies of your process workers?)
[14:14] ericsnow, (and their responsibilities?)
[14:16] fwereade: deps - mostly API client; responsibilities - e.g. periodically update status from the underlying technology (e.g. docker)
[14:17] ericsnow, are the workers expected to, e.g., restart the processes if they fail?
[14:17] ericsnow, or are they just observers?
[14:17] fwereade: not yet
[14:18] fwereade: for now just observing
[14:18] fwereade: later potentially starting and stopping them
[14:18] ericsnow, cool, thanks
[14:18] fwereade: np
[14:18] ericsnow, a thought re responsibilities, not sure if it's good
[14:19] ericsnow, how hard would it be for one such worker to know when it was finished, itself, and return something like ErrUninstallMePlease?
[14:20] fwereade: I'll think about it (OTP)
[14:20] ericsnow, I'm not sure that even covers all the machine-agent use cases tbh, it might just be a bad idea, but let me know if you think of anything
[14:21] fwereade: will do
[14:41] Bug #1486297 opened: Action doesn't correctly translate unit name into tag if hyphen present
[14:44] Bug #1486297 changed: Action doesn't correctly translate unit name into tag if hyphen present
[14:56] Bug #1486297 opened: Action doesn't correctly translate unit name into tag if hyphen present
[14:59] alexisb: health checks, are those on the roadmap?
[14:59] heh marcoceppi we were just chatting about that
[15:00] * marcoceppi mind melds
[15:24] Bug #1486297 changed: Action doesn't correctly translate unit name into tag if hyphen present
[15:31] dimitern, I'm feeling dense, would you explain: I think we need some way to distinguish between the "fallback to env spaces constraint" and "explicitly clear spaces constraint" cases
[15:31] dimitern, do we not need it; or do we do it but I just don't see it?
[15:33] fwereade, I'll try at least :)
[15:33] How do I tell juju what agent to bootstrap with?
[15:33] I have 1.24.4 installed but it bootstraps 1.24.5 trying to validate a regression
[15:33] fwereade, so explicitly empty values always override matching fallbacks (i.e. "mem=" overrides "" and "mem=4G"), but only when doing resolution from deployment to provisioning constraints
[15:34] fwereade, that's what now happens after my changes
[15:34] (soon to be available in master as well, when we merge net-cli)
[15:35] ericsnow, natefinch: https://github.com/juju/charm/pull/143
[15:36] wwitzel3: ship-it
[15:36] officially used loggo's per-package logging adjustments for the first time in 2 years: juju set-env logging-config="juju.worker.leadership=WARNING"
[15:38] fwereade: any chance we were going to de-spam the leadership logging sometime soon?
[15:38] natefinch, huh, I thought we had
[15:38] natefinch, I thought it was mostly at trace level
[15:38] natefinch, hmm, maybe we didn't do tracker?
[15:39] fwereade: yeah, the log lines I see are all from tracker
[15:40] natefinch, damn, sorry
[15:41] fwereade: not the end of the world. Fixable via loggo (as long as you don't need to see anything under warning from leadership)
[15:42] dimitern, ah, ok, and if we had "spaces=foo" and replaced it with "mem=4G" we'd get the fallback spaces; but "mem=4G spaces=" would ensure no spaces constraints? or have I completely confused myself?
[15:43] fwereade, exactly right
[15:43] dimitern, cool
[15:43] dimitern, ok, then, in state -- how do we store the distinction between those two cases?
[15:43] dimitern, we seem to have lost the pointers in that struct
[15:44] fwereade, FWIW resolution was broken in a few places, e.g. adding a machine does resolution, SetConstraints on it before deployment doesn't and takes whatever you give it
[15:44] dimitern, ahh, machine constraints weren't including env fallbacks?
[15:44] fwereade, when set, but when added it worked as expected
[15:45] fwereade, I *think* it's only important to store non-empty values (after doing resolution)
[15:45] dimitern, sorry, lost again, how can you set constraints on a machine?
[15:45] fwereade, before provisioning
[15:45] fwereade, m.SetConstraints()
[15:45] dimitern, can users do that?
[15:45] dimitern, (other than when adding?)
[15:46] dimitern, I think those values should just be coming from the resolved env+service constraints for the unit whose addition triggered machine addition
[15:47] fwereade, I don't think so
[15:47] fwereade, but it's still a bug
[15:47] dimitern, but either way, when we're storing service constraints in state we shouldn't resolve them
[15:47] fwereade, I agree, and I made it so it's definitely like this in both cases
[15:48] fwereade, we're not
[15:48] fwereade, we only resolve unit constraints when asked
[15:49] dimitern, how can we do that correctly if we're throwing away the distinction between "fallback" and "clear" in the service constraints we store in mongo?
[15:50] fwereade, let me look at the code
[15:51] dimitern, np, sorry it's taken me so long to start looking
[15:52] fwereade, ok, that's a good catch sir
[15:52] dimitern, yay, my brain still works :)
[15:52] fwereade, so we should store them when empty, at least for the services
[15:52] dimitern, I think so, yeah
[15:52] fwereade, good, it should be easy to fix
[15:52] dimitern, and probably across the board, even if the distinction is academic once resolved
[15:52] dimitern, cool
[16:08] dimitern: just before riding home by bike, http://reviews.vapour.ws/r/2419/ contains the changes we talked about yesterday. hints regarding the testing of the finishedWorker are welcome. I'll take a look when I'm at home
[16:15] cmars: ping
[16:24] Bug #1486640 opened: Typos in help
[16:30] TheMue, sure, will have a look in a bit
[16:46] just checking back in and seeing a team call here.. am I in wrong group for "general noob help"? Sorry if so. where is general help?
[16:47] jogarret6204: general noob help is in #juju :)
[16:47] #juju-dev is primarily for those hacking on juju core
[16:47] you're more than welcome to hang in both places though, we welcome all feedback
[16:51] don't want to slow you down here, so I'll hit the other. thanks LazyPower and natefinch.
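The "fallback" versus "explicitly clear" distinction from the constraints discussion (15:31 to 15:52) is the kind of thing pointer fields can encode, as the remark about losing "the pointers in that struct" hints. The sketch below is only an assumption-laden illustration: `Value`, `Resolve` and the helper functions are hypothetical names, not juju's actual constraints package. A nil pointer means "unset, use the environment fallback", while a non-nil pointer to an empty value means "explicitly cleared".

```go
package main

import "fmt"

// Value is a hypothetical, cut-down constraints value. A nil field is
// unset; a non-nil field was given explicitly, even if it is empty.
type Value struct {
	Mem    *uint64   // megabytes
	Spaces *[]string // space names
}

// Resolve returns the service constraints, filling only the unset (nil)
// fields from the environment constraints.
func Resolve(env, svc Value) Value {
	out := svc
	if out.Mem == nil {
		out.Mem = env.Mem
	}
	if out.Spaces == nil {
		out.Spaces = env.Spaces
	}
	return out
}

// mem and spaces build pointer values for the literals below.
func mem(mb uint64) *uint64          { return &mb }
func spaces(names ...string) *[]string { return &names }

func main() {
	env := Value{Spaces: spaces("foo")} // environment: spaces=foo

	// Service "mem=4G": spaces is unset, so the env fallback applies.
	fallback := Resolve(env, Value{Mem: mem(4096)})

	// Service "mem=4G spaces=": spaces is explicitly empty, so no fallback.
	cleared := Resolve(env, Value{Mem: mem(4096), Spaces: spaces()})

	fmt.Println("fallback spaces:", *fallback.Spaces)
	fmt.Println("cleared spaces count:", len(*cleared.Spaces))
}
```

For the distinction to survive storage, the stored form has to keep explicitly empty fields rather than dropping them, which matches the conclusion at 15:52 that empty values should be stored, at least for services.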
[16:59] jog: sorry, yeah, #juju is more the general help channel :)
[17:04] jog: sorry, obv not meant for you
[17:28] katco: dimitern : can either of you arrange a fix for bug 1486675. I think the test needs more smarts, juju is fine
[17:28] Bug #1486675: supportedSeriesWindowsSuite.TestSupportedSeries fails
[17:28] * perrito666 is connected through his phone and loving his isp
[17:29] well creating a simple stream over 1/2 3g really makes me save on heating, the phone is enough for the whole office
[17:30] heh
=== tvansteenburgh1 is now known as tvansteenburgh
[17:31] wow, this is a dumb error: ERROR environment destruction failed: destroying environment: container "nate-local-machine-1" is not yet created
[17:32] well, that was actually possible
[17:32] I think I recall davechen1y or thumper talking about a test that tried to reproduce that by being a race condition
[17:33] Bug #1486166 changed: JES deploy fails
[17:33] Bug #1486675 opened: supportedSeriesWindowsSuite.TestSupportedSeries fails
[17:34] sinzui: dimitern's team is on bug-squad, so i'll leave it to him
[17:38] perrito666: my point was rather, if I am destroying the environment, I don't care that there's a container that isn't created yet, that's just one less thing to tear down.
[17:39] TheMue, you have a review
[17:40] sinzui, katco, sure, let me have a look
[17:41] sinzui, this looks like a fallout of a recent change I saw
[17:41] dimitern: yes
[17:43] sinzui, this one most likely https://github.com/juju/juju/pull/2981
[17:43] bogdanteleaga, are you around?
[17:43] yep, that is what I saw
[17:44] dimitern, looking into it
[17:44] sinzui, katco, so in this case what - is a revert in order?
[17:44] bogdanteleaga, thanks!
[17:44] dimitern: a revert will unblock. otherwise race to land a fix
[17:47] dimitern, sinzui https://github.com/juju/juju/pull/3040
[17:48] dimitern, sinzui I g2g now, merge it once it gets reviewed please
[17:48] thank you bogdanteleaga
[17:54] bogdanteleaga, awesome, ta!
[17:56] sinzui, setting it to merge
[17:56] you rock dimitern
[17:58] * dimitern wishes all bugs were like this :)
[17:59] OMG.... just realized what the problem I've been fighting with all day is..... printing a value out with %v was causing a panic from inside fmt somewhere
[18:00] aghh gce is telling me I am not authenticated... only for one particular operation wth
=== natefinch is now known as natefinch-afk
[18:06] sinzui, btw I'm not sure if you're monitoring feature branches for trends in failures like for the main branches, but if there is some data about "net-cli", which we're planning to merge tomorrow, that will be awesome
[18:07] dimitern: you cannot merge it because it has never passed http://reports.vapour.ws/releases#net-cli
[18:07] dimitern: merge tip. When CI blesses it, we can merge it into master
[18:07] sinzui, hmm that's useful to know
[18:07] sinzui, we just did that today
[18:08] dimitern: I shall try to force ci to retest net-cli.
[18:08] sinzui, it's currently only 4 commits behind
[18:08] * sinzui is trying to retest 1.24 today as well
[18:08] sinzui, great, thanks! it will be nice to have some early feedback
[18:09] dimitern: maybe these issues are fixed already https://bugs.launchpad.net/juju-core/net-cli
[18:10] sinzui, I hope so, however the windows one is a bit worrying
[18:10] sinzui, or you mean because it's gone from master?
[18:13] dimitern: I hope the issue was really in master, and your merge fixed it
[18:15] sinzui, I'll give it a try now as I'm changing that list command
[18:16] alexisb: I think we have enough evidence to say Joyent's firewall changes did hurt Juju, and that deleting them when Juju destroys the environment fixes the issue: I want bug 1485781 fixed in 1.25 and 1.24 (maybe 1.22 if we ever plan a release)
[18:16] Bug #1485781: Juju is unreliable on Joyent
[18:38] ericsnow: natefinch-afk: wwitzel3: sorry i missed the stand-up this morning. anything i can help with?
[18:39] katco: review http://reviews.vapour.ws/r/2405/ ?
[18:39] katco: (and don't sweat missing standup :)
[18:39] ericsnow: tal
[19:20] sinzui, thank you for getting the info on joyent and opening the bug
[19:20] that is good stuff
[19:24] Bug #1486712 opened: Race on uniter-hook-execution, prevents to resolve unit.
[19:53] ericsnow: reviewed
[19:53] katco: thanks!
=== natefinch-afk is now known as natefinch
[21:37] Bug #1474885 changed: juju deploy fails with ERROR EOF
[21:52] Bug #1486749 opened: juju backups create should fail earlier for hosted environments
[21:58] Bug #1486749 changed: juju backups create should fail earlier for hosted environments
[22:07] Bug #1486749 opened: juju backups create should fail earlier for hosted environments
[22:36] alexisb: ping
[23:31] waigani: would you mind taking a look at http://reviews.vapour.ws/r/2425/ pls?
[23:51] ericsnow: still here?
[23:51] perrito666: barely
[23:51] * perrito666 sees ericsnow fading
[23:52] ericsnow: to use gce with the fields instead of the json file, is it enough to just copy the values of the json?
[23:53] perrito666: pretty much
[23:53] ericsnow: I believe the issue I told you about earlier might be because storage provisioner tries to do stuff in a machine and finds itself lacking these creds
[23:53] perrito666: the PK might not copy-and-paste quite right so you have to watch that
[23:54] perrito666: does it do more than make calls on the provider?
[23:54] ericsnow: calls that require auth
[23:55] perrito666: the provider has all the auth it needs
[23:55] the provider?
[23:56] perrito666: provider/gce/...
[23:56] ericsnow: well I am getting auth errors from one of the machines
[23:56] so :) something is wrong
[23:56] :)
[23:57] perrito666: you're adding new methods to the gceConnection interface (in environ.go), right?
[23:57] ericsnow: yep
[23:58] perrito666: then I would definitely not expect auth issues :/
[23:58] well it is only happening with one machine i think
[23:58] that I added with add-machine
[23:58] perrito666: do you have to enable some extra permissions in the GCE developer console?
[23:58] so I am looking it up
[23:58] (manually)
[23:59] ericsnow: where is that stored on the server?
[23:59] perrito666: where is what stored?
[23:59] ericsnow: the oauth token