[00:00] sinzui, so in the openstack provider attempt, juju fires up a vivid instance, and i can see in the nova console log that it's booted and ready with whatever userdata was passed to it. but there is a key mismatch.
[00:00] sinzui, with the maas deploy, i get the same symptom (key issue), and it's much harder to get console output.
[00:01] er rather, maybe not a key mismatch, but definitely a key issue.
[00:01] beisner, I don't think key issues are series issues
[00:01] sinzui, all those woes go away when I set default-series: utopic or trusty in the environments.yaml.
[00:03] beisner, I think you found a bug :)
[00:04] sinzui, ok, what can i do/collect to raise a helpful/meaningful bug on this?
[00:04] ie. is there a way to get more verbosity from juju bootstrap?
[00:05] beisner, the cloud-init-output log and maybe a machine-0 log if it gets that far
[00:05] sinzui, unit 0 never comes alive according to juju
[00:05] and juju debug-log is no help at that stage
[00:06] beisner, I often ssh into the machine the moment I see the ip is open and tail the /var/log/cloud-init-output.log
[00:07] sinzui, ah cool. i'll dive in a bit more. appreciate the guidance.
[00:07] ericsnow: just realized I forgot to hit publish on that review, you have a review from me now.
[00:07] wwitzel3: thanks
[00:09] ericsnow: mostly minor, I found it all easy to follow, no comments about the service stuff since it is pretty generic boilerplate-y and we talked about it before
[00:09] wwitzel3: cool
[00:20] thumper: you around?
[00:28] sinzui: so we should be able to get 1.22 out now right? since the precise upgrade issue is only 1.23?
[00:32] wallyworld, no, because we never got a pass for 1.22
[00:32] sinzui: assuming ci becomes happy
[00:32] wallyworld, 1.22-beta4 has NEVER passed CI
[00:32] :-(
[00:32] wallyworld, if it does pass, then I release
[00:33] let's hope this one works
[00:41] thumper: ping
[00:44] * thumper is here now
[00:47] thumper: can i grab 5 when you are free?
[00:48] wallyworld: sure, I'll be done chugging lunch in about 5 minutes
[00:48] np
[00:48] it's a banana protein smoothie
[00:48] mmmmm
[00:48] 5min for smoothie?
[00:49] thy*
[00:49] thumper: shall we meet in the 1:1 hangout ?
[00:49] davecheney: yep, how about in 11 minutes?
[00:49] thumper: go talk to wallyworld first
[00:49] ta
[00:49] i'll see you in the hangout in whenever
[00:53] wallyworld: our 1:1?
[00:53] yup
[00:58] sinzui: i just looked at the dashboard and the latest 1.23 run had local-upgrade-precise-amd64 passing
[00:58] have you upped the timeout already?
[01:13] wallyworld, I did, that was my proof
[01:14] sinzui: logs are needed to help see where the time is going
[01:15] i can't see any linked to the dashboard
[01:15] wallyworld, I can give you the failures, but the passes might be more informative since I also gave the services more time to collect
[01:16] sinzui: could you attach both to the bug for me?
[01:16] wallyworld, I will see what I can do. I don't want to make them public if they contain credentials
[01:16] sinzui: make the bug private?
[01:18] wallyworld, I cannot because that will hide the critical blockers
[01:18] sinzui: oh, maybe send privately?
[01:18] wallyworld, this wouldn't be awkward if the credentials for reports.vapour.ws had not also failed this weekend
=== kadams54-away is now known as kadams54
[01:25] wallyworld, did you see smoser's ruling on apt-get dist-upgrade
[01:25] sinzui: one sec, just finishing standup
[01:43] sinzui: read scott's bug comments, we should be ok - we don't use proposed pocket with precise
[01:43] that i know of
[01:49] menn0, thumper is this doc up to date??:
[01:49] https://docs.google.com/a/canonical.com/document/d/1jsuoTbXZbj3wtoXCpc5MGFVvwuofYx3hLw2eNoXEmE0/edit#heading=h.aby6yid7wq2d
[01:49] * menn0 looks
[01:50] alexisb: yes, except that now we have a working proof of concept of logging to MongoDB and have run scaling tests (see your email)
[01:50] wallyworld, we never use proposed
[01:51] alexisb: I have been working on turning the POC into production code but whether it gets merged is somewhat dependent on whether the performance hit is deemed ok
[01:51] menn0, yep understood, I just need a way to capture the work for logging that can be shared with those that are interested
[01:51] wallyworld, but juju does do apt-get upgrade, which didn't give us an updated cloud-utils. apt-get install then did a remove
[01:52] alexisb: ok. let me know if you need to know more.
[01:52] ie answer the question for "why is logging for JES important and requires work"
[01:52] nope I think that gets me what i need
[01:52] thanks
[01:52] sinzui: so maybe the fix to put cloud-utils and cloud-image-utils on the one line is not required anymore
[01:53] wallyworld, it is required because we haven't changed anything else to ensure removals cannot happen
[01:53] ok
[01:53] wallyworld, sinzui: you guys seem to be discussing the same part of the code that I just filed a bug about. (bug 1424892)
[01:53] Bug #1424892: rsyslog-gnutls is not installed when enable-os-refresh-update is false
[01:54] menn0: nah, different
[01:54] menn0, yes
[01:54] this is the deb packaging issue which affected cloud-utils
[01:54] * thumper sees that menn0 has answered all of alexisb's questions
[01:54] coffee time then...
[01:54] menn0: your issue is the behaviour of the flags to disable apt
[01:55] menn0, apt-get update finds the new packages (in the cloud-* example) but apt-get upgrade will not install them because upgrade is not permitted to install new deps!
[01:55] menn0, but apt-get dist-upgrade can install new deps
[01:55] menn0: when upgrading, do you recall if the state server rejects connections until all nodes in the env are deemed to have upgraded?
[01:56] wallyworld: not quite ... let me quickly look at the code
[01:59] wallyworld: a state server will accept API connections once it itself has upgraded
[01:59] wallyworld: but state servers always upgrade first, before other nodes
[01:59] menn0: ok, ta. i'm looking into the CI blocker where precise upgrades time out
[01:59] there's a bucket load of mongo connection failures that go on and on
[02:01] but state server probably not at fault if it accepts connections after it has finished
[02:01] wallyworld: do you have some logs handy?
[02:02] menn0: i'll forward an email. sinzui had to increase the timeout from 10 mins to 20 to make CI local precise upgrades pass. but just precise
[02:02] wallyworld: that is certainly odd.
[02:02] indeed
[02:02] precise runs a slightly different mgo version
[02:02] that's all i can think of off hand
[02:03] wallyworld, hp's swift publish failed. I am getting out the hammers
[02:03] what's that about hp?
[02:04] wallyworld, timeouts uploading
[02:04] oh joy
[02:05] wallyworld, I switched the job to not rebuild, just try to publish what the previous job made
[02:05] ok
=== kadams54 is now known as kadams54-away
[02:14] wallyworld, sinzui - back for a bit, raised a bug on that vivid thing. should be readily reproducible but holler if there are any ?s. https://bugs.launchpad.net/juju-core/+bug/1424900
[02:14] Bug #1424900: Bootstrapping Vivid: ERROR failed to bootstrap environment, Permission denied (publickey), ci-info: no authorized ssh keys fingerprints found for user ubuntu
[02:14] thank you
[02:19] sinzui: menn0: with those logs, i see the first 4 minutes spinning up the state server and machines 1,2, then the state server upgrade completes in about a minute, then we see a tonne of connection terminated errors lasting several minutes, so it seems there's an issue with the worker nodes upgrading
[02:19] wallyworld: yep, i'm looking at those logs now
[02:19] wallyworld: the machine-0 logs are all perfectly normal
[02:20] wallyworld: but the machine-1 logs indicate that the agent never restarted into the new version
[02:20] menn0: in the machine 1 log, i see a 3 minute gap fetching tools
[02:20] wallyworld: it sees the need to upgrade but never seems to reboot
[02:21] menn0: the test may have timed out before machine 1 could restart
[02:21] take off the 3 minutes to fetch tools
[02:21] menn0, it's about timing. I was on the machine when machine 1 was shut down because of a timeout. it was upgrading. so I hacked the job on the precise slave to give it 20 minutes to see a pass.
[02:21] and it probably would have been ok
[02:22] but 20 mins is just crazy
[02:22] menn0, all machines normally upgrade in less than 60 seconds. so 18 minutes is scary
[02:22] sinzui: this is what i see as the issue
[02:22] 2015-02-23 18:36:07 INFO juju.worker.upgrader upgrader.go:201 fetching tools from "https://10.0.0.191:17070/environment/329350b1-edf2-4d62-8156-7338a12d3808/tools/1.23-alpha1.1-precise-amd64"
[02:22] 2015-02-23 18:39:52 INFO juju.utils http.go:66 hostname SSL verification disabled
[02:22] that's an almost 4 minute gap
[02:22] looking at the timestamps, the machine was going VERY slowly
[02:23] 30s between seeing the need to upgrade and then /starting/ to download the tools
[02:23] menn0, yep. I rebooted the machine too. It is fast when I use it
[02:24] sinzui, wallyworld: the agent is still starting up when it sees the need to upgrade. the timings between the various workers starting up are rather wide, like the system was crawling.
[02:25] yes
[02:25] it just seems machine 1 is very, very slow
[02:25] menn0, it wasn't/isn't
[02:27] sinzui: it might not be, but it just looks that way based on the logs
[02:28] menn0, I also cleared /var/cache/lxc/cloud-precise
[02:28] i'm not sure what to do now to diagnose further - if it really is just precise, that makes it very hard to reason about
[02:28] we are still waiting for the same job to run with 1.22 to compare
[02:29] i guess we see how 1.22 comes out
[02:29] wallyworld: we could spin up a precise instance on ec2
[02:30] we could, i might do that
[02:33] wallyworld, sinzui: as an example, you can see the difference the "slow down" makes when you look at the deployer worker in the logs
[02:34] wallyworld, sinzui: before the API disconnection (due to the state server upgrading itself), the deployer worker starts 1s after its parent worker (api-post-upgrade)
[02:35] wallyworld, sinzui: at the bottom of the logs it starts 5.5 mins after its parent worker
[02:35] wow
[02:35] wallyworld, sinzui: that's pretty strange
[02:36] that is indeed the difference we feel watching it
[02:37] menn0, the upgrade job just ran with 1.22.
[02:37] hmmm. maybe something is thrashing the disk? or eating the cpu?
[02:37] wallyworld: that's what i'm thinking. and that thing could even be something in Juju itself.
[02:37] 2015-02-24 02:35:45 INFO juju.cmd.juju upgradejuju.go:214 started upgrade to 1.22-beta4.1
[02:37] and status shows it complete at
[02:37] 2015-02-24 02:36:09
[02:37] sinzui: what instance spec is being used for the precise tests?
[02:38] sinzui: that's the kind of timing I would have expected
[02:38] but is the above for the state server? or a machine?
[02:38] ah
[02:39] the cmd
[02:39] menn0, the precise slave has 8G ram with 4 vcpus
[02:40] menn0, at this moment it has lots of resources free, but it was busy last hour building and testing
[02:40] menn0, there was nothing for us to clean up when we started investigating. the machine was fast for us
[02:41] sinzui: the precise slave is dedicated to running the juju test in question?
[02:41] wallyworld, it is
[02:43] sinzui: i'm looking at jenkins. are the latest successes because of the extended timeout?
[02:43] menn0, the 2 master ones are
[02:44] sinzui: kk
[02:44] menn0, the 1.22 passed normally
[02:46] sinzui: 18 mins is just nuts
[02:46] menn0, sure, but not for maas. I wondered if recent changes need new deps and extra work for precise
[02:47] wallyworld: are you spinning up a precise instance? i'd like to poke around as the upgrade happens
[02:47] menn0: just resetting my source tree
[02:47] sinzui: perhaps, but I can't imagine what would cause this
[02:55] sinzui: one thing that could be helpful is if the env had debug logging turned on.
[02:56] sinzui: it starts off in debug but switches to info for the root logger
[02:56] 2015-02-23 18:32:05 DEBUG juju.worker.logger logger.go:45 reconfiguring logging from "=DEBUG" to "=INFO;unit=DEBUG"
[02:56] menn0, I can do that now that ci is locked down
[02:56] menn0, I can switch it now, then we wait for 1.23 to test
[02:57] wallyworld: meant to say... all those API disconnect messages in the machine-0 logs are just due to the repeated "juju status" polling that the test script does. that's fairly normal.
[02:57] menn0, debug is in place
[02:57] sinzui: awesome
[02:58] menn0: i have a bootstrapped precise instance - did you want me to add your ssh public key?
[02:58] wallyworld: please
[02:59] * thumper tries for focus for an hour
[02:59] if it is urgent, text me
[02:59] * thumper doesn't expect anything urgent
[03:00] menn0: 54.82.35.127 i'll start a precise worker machine
[03:02] menn0: i'm a tool - i bootstrapped 1.23 not 1.21
[03:02] ffs
[03:02] it's so hard to get good help these days...
[03:03] forgot to type /usr/bin/juju
[03:03] sigh
[03:14] wallyworld: has that host gone away? I can't connect to it now.
[03:15] menn0: yeah, almost done starting a 1.21 host, sorry
[03:15] it's been slow to come up
[03:17] menn0: machine 0 is 54.158.193.40
[03:17] machine 1 is 54.204.193.53
[03:19] wallyworld: I thought you were just going to test with the local provider on a single precise machine
[03:19] menn0: i was curious to see how precise in general went
[03:20] but we can do both
[03:20] wallyworld: is my key there?
[03:20] yep
[03:20] do you want to use machine 0 as host for a local env
[03:21] mongo would need to be stopped
[03:21] i should just fire up a new machine
[03:21] wallyworld: I was using the wrong key... i have a personal one and a canonical one. fixed now
[03:23] wallyworld: how about I fire up the machine for the local test
[03:23] ok
[03:29] menn0: upgrade on aws precise was fast, so it has to be a resource contention issue
[03:29] wallyworld: ok. a useful data point. i'm just setting up this other instance now.
[03:29] sinzui: SSD or magnetic storage?
[03:31] menn0, I think the latter. the machine is in HP
[03:31] sinzui: cool. i'll go with that.
[03:41] wallyworld: it's 54.190.88.226. installing 1.21 now.
[03:41] ok, does it have my ssh key imported?
[03:42] wallyworld, all the maas deploy jobs are still failing.
[03:42] hmmm
[03:43] i wonder what's different compared with clouds
[03:43] the same scripts should work on both
[03:43] is there a url with logs?
[03:46] wallyworld, no, we cannot get logs for maas because they have unresolvable dns
[03:46] oh joy, this will be fun to solve
[03:46] wallyworld, I am attempting to get into the maas to find the names for the units, then use virt-viewer to connect directly to the console
[03:47] I don't have creds for maas 1.7 and 1.8 though
[04:28] wallyworld: well i'm seeing a problem with the upgrade on precise but it doesn't look the same as what happened in the logs from sinzui
[04:29] wallyworld: machine-0 fails to upgrade because of:
[04:29] "set AvailZone in instanceData" failed: failed verification of local provider prerequisites:
[04:29] cloud-image-utils must be installed to enable the local provider:
[04:29] sudo apt-get install cloud-image-utils
[04:29] wallyworld: shouldn't the upgrade step handle that itself?
[04:30] that's expected
[04:30] that package has to be on the host machine
[04:30] as do one or two others
[04:30] hmmm, maybe
[04:30] but in general
[04:30] the local provider checks for needed packages and prints those messages if they are not there
[04:30] I installed juju-local so shouldn't this already have been taken care of
[04:31] ?
[04:31] wallyworld: maybe it got uninstalled by another operation?
[04:31] cloud-image-utils isn't a prereq of juju-local, maybe it should be?
[04:32] wallyworld: i'll work around this for now so that I can get to the actual problem that CI is seeing
[04:32] i think in trusty cloud-utils includes cloud-image-utils
[04:32] i can't wait for precise to go away, but still 2 years left
[04:33] wallyworld, I cannot copy this crap gui's cloud-init-output
[04:33] sigh
[04:34] is there an obvious error?
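For context on the prerequisite failure menn0 hit above ("cloud-image-utils must be installed to enable the local provider"): the local provider checks for required host packages and prints an install hint when one is missing. A minimal Go sketch of that kind of check - illustrative only, this is not juju's actual code, and the package list and dpkg-query approach are assumptions:

    package main

    import (
    	"fmt"
    	"os/exec"
    )

    // requiredPackages guesses at the host packages the local
    // provider needs before it can manage containers.
    var requiredPackages = []string{"cloud-image-utils", "cloud-utils"}

    // verifyPrerequisites returns an error naming the first missing
    // package, along with the apt-get command that would install it.
    func verifyPrerequisites() error {
    	for _, pkg := range requiredPackages {
    		// dpkg-query exits non-zero when the package is not installed.
    		if err := exec.Command("dpkg-query", "-W", pkg).Run(); err != nil {
    			return fmt.Errorf("%s must be installed to enable the local provider:\n    sudo apt-get install %s", pkg, pkg)
    		}
    	}
    	return nil
    }

    func main() {
    	if err := verifyPrerequisites(); err != nil {
    		fmt.Println(err)
    	}
    }

A check like this only inspects the host at startup, which is consistent with menn0's observation: installing juju-local once doesn't protect you if a later apt operation removes a prerequisite.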
[04:35] wallyworld, I can say it is the same error as before where the apt-line is garbled
[04:35] i'm trying to spin up maas locally but the bootstrap node refuses to transition from Deploying
[04:35] that doesn't make sense as there should be no quotes there now
[04:35] and why just maas and not AWS or HP etc
[04:37] and it died as I almost got the log
[04:37] wallyworld, as per the bug
[04:37] util.py[WARNING]: Failed to install packages: ['bridge-utils', 'curl cpu-checker bridge-utils rsyslog-gnutls cloud-utils cloud-image-utils']
[04:38] sigh
[04:38] * sinzui double checks commit
[04:38] seems bridge-utils is added twice which is messing it up
[04:38] wallyworld, we are testing your commit
[04:39] yes on aws
[04:39] works perfectly
[04:39] wallyworld, apt doesn't care about duplicates on the command line
[04:39] sure, but adding it twice seems to mess up juju's rendering of the cmd line
[04:40] sinzui: can i get access to your maas?
[04:40] if i can't reproduce, i can't fix
[04:40] wallyworld, sure, but it is so poorly documented I cannot offer much
[04:41] maybe pm or email the ip of the controller and auth key
[04:48] wallyworld, sinzui: i haven't been able to repro this precise upgrade timeout issue yet but I still have a few ideas
[05:01] sinzui: ah, i see where bridge-utils is added twice - it's hard coded in the maas provider
[05:02] that's why it is failing on maas
[05:02] wallyworld, really?
[05:03] wallyworld, jog hacked the local machine to capture logs from machine-0. This isn't too helpful since the failures are machines 1 and 3
[05:03] sinzui: yes, on machine 0 we run the scripts ourselves, not cloud-init
[05:07] sinzui, wallyworld: ok, i'm stumped on how to reproduce this precise issue
[05:07] sinzui: it may be that the quickest thing to do is to go back to installing one package at a time now that cloud-image has been moved into the correct repo
[05:07] sinzui, wallyworld: every upgrade works quickly for me
[05:08] wallyworld, yes, with the caveat that we must also install cloud-utils to force the upgrade
[05:08] sinzui: yep, i'll install cloud-utils along with the other ones we expect
[05:09] wallyworld, or we do dist-upgrade (but I think that is risky for today)
[05:09] yeah, i'll keep it simple
[05:09] menn0: i think juju might be ruled out - something must be thrashing the machine though
[05:10] wallyworld: either that or i'm missing some aspect of the test setup
[05:10] i'm about to EOD
[05:11] ok, thanks for looking
[05:15] sinzui: do i need to do apt-get install --target-release precise-updates/cloud-tools cloud-utils ?
[05:16] or can i leave off the --target-release bit
[05:16] i think i need it right? for precise?
[05:25] sinzui, wallyworld: of course I just realised that I was testing with a 1.23 that was a little behind the times and didn't include some of the recent commits that might be contributing to the issue
[05:25] * menn0 facepalms
[05:25] sinzui, wallyworld: it might be worth repeating what i've done...
[05:25] hard to get good help :-)
[05:25] touche
[05:25] i gotta fix this other one first :-(
[05:26] but I really need to EOD or my wife is going to get pissed
[05:29] np, leave it to us
[05:50] hi wallyworld
[05:53] wallyworld, I was just looking at the MaaS test results for 1.22 revision 79e5ea8a and still see the failure mentioned in bug 1424695.
[05:53] Bug #1424695: maas cloud-init cannot download agent from state-server
[05:54] wallyworld, it looks like you thought that commit should have fixed that bug?
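The util.py warning above shows the failure mode exactly: six package names joined into one space-separated string, which cloud-init then hands to apt as a single (nonexistent) package. A stripped-down Go illustration of the bug and of the per-package fix dimitern describes later in the log - the Config type here is a stand-in for juju's cloud-init config, not the real one:

    package main

    import (
    	"fmt"
    	"strings"
    )

    // Config mimics a cloud-init config: a list of package names that
    // gets rendered into the "packages:" section of the user-data.
    type Config struct{ packages []string }

    func (c *Config) AddPackage(name string) { c.packages = append(c.packages, name) }

    func main() {
    	pkgs := []string{"curl", "cpu-checker", "bridge-utils",
    		"rsyslog-gnutls", "cloud-utils", "cloud-image-utils"}

    	// Buggy: one AddPackage call with a space-joined string produces
    	// a single bogus entry, so apt is asked for a package literally
    	// named "curl cpu-checker bridge-utils ...".
    	bad := &Config{}
    	bad.AddPackage(strings.Join(pkgs, " "))
    	fmt.Printf("bad:  %q\n", bad.packages)

    	// Fixed: one AddPackage call per package keeps each name a
    	// separate list entry in the rendered user-data.
    	good := &Config{}
    	for _, p := range pkgs {
    		good.AddPackage(p)
    	}
    	fmt.Printf("good: %q\n", good.packages)
    }

The duplicated bridge-utils (hard-coded in the maas provider, as wallyworld finds above) is what tripped the joined-string path on maas while the other clouds stayed green.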
[05:58] wallyworld, looks like maybe you dropped for a bit, did you see my comment above?
[05:58] jog: no
[05:58] wallyworld, I was just looking at the MaaS test results for 1.22 revision 79e5ea8a and still see the failure mentioned in bug 1424695.
[05:58] Bug #1424695: maas cloud-init cannot download agent from state-server
[05:59] looks like you expected that commit to resolve that quoted package string issue?
[05:59] jog: yes, sadly maas fails where the other clouds are ok, so i've marked the bug as in progress again. i don't have a working maas to test with
[06:00] wallyworld, you have access to finfolk.internal ?
[06:00] no, i tried and couldn't get in
[06:16] * jog wonders what happened to the core vmaas setup on gremlin.internal that was set up during the Brussels sprint.
[06:22] jog, well, we were supposed to use it for ipv6 work on maas and juju, but we didn't need to
[06:22] jog, IIRC the maas guys used it for qa stuff (or was it finfolk?)
[06:23] dimitern, finfolk is used by juju-qa but I have an extra env setup for debugging for anyone on core that needs it
[06:24] jog, right, that's good to know then :)
[06:42] dimitern: hi, i'm a little concerned that the vet warnings can be suppressed. shouldn't we be fixing the warnings? those provisioner ones are annoying :-)
[06:43] wallyworld, they should've been fixed yesterday by jw4
[06:43] wallyworld, but maybe it bounced due to the blocker
[06:44] wallyworld, nope - it did bounce - https://github.com/juju/juju/pull/1654
[06:45] wallyworld, and these appear for go1.4 only, which we don't officially support yet ;)
[06:45] wallyworld, go vet in 1.4 is dumber than 1.2 - "%q" is reported for a type with String() method
[06:56] wallyworld, I'm not sure how much you've seen of my comments
[06:56] dimitern: my stupid connection to freenode keeps dropping, so none sorry :-(
[06:57] wallyworld, I thought so :)
[06:57] wallyworld, basically go vet 1.4 is dumber than go vet 1.2 (which is still the official go version we're using) - "%q" for a type with String() is reported
[06:58] wallyworld, and there's this PR which bounced due to the blocker - https://github.com/juju/juju/pull/1654 which fixes the warnings
[06:58] wtf, go vet got dumber
[06:58] what are they thinking
[06:58] i'm on go 1.3.x
[06:58] they were *not* thinking :)
[06:59] dimitern: with the blockers - we tried and couldn't reproduce the precise upgrade one today
[06:59] we contend it's a machine thrashing issue on the test vm
[06:59] wallyworld, the slowdown or the quoted packages?
[07:00] slowdown
[07:00] with the packages one, i changed the behaviour but maas still failed (only maas, not aws etc)
[07:01] so i'm looking to revert the apt behaviour to be as per 1.21 since as of today or yesterday the cloud archive has been updated
[07:01] wallyworld, hmm.. maas took longer than usual to upgrade according to the bug reports
[07:01] and we don't need to install cloud-utils and cloud-image-utils together
[07:01] oh, we were working on the assumption of local taking too long, that's what curtis told us
[07:02] one thing occurred to me - for precise, are we making sure we add the cloud-tools pocket?
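dimitern's go vet complaint above is easy to reproduce. Something along these lines is what go vet 1.4 reportedly flagged, even though fmt happily formats a %q verb for any type with a String() method (hedged: the exact vet behaviour differs between 1.2, 1.4.1 and 1.4.2, as jw4 notes later in the log):

    package main

    import "fmt"

    // Status has a String() method, so it satisfies fmt.Stringer.
    type Status int

    func (s Status) String() string { return fmt.Sprintf("status-%d", int(s)) }

    func main() {
    	// fmt is fine with %q for a Stringer - it quotes the String()
    	// output - but the go vet shipped with 1.4 reported this kind of
    	// call as a wrong-type printf argument, the false positive
    	// discussed in the log.
    	var s Status = 3
    	fmt.Printf("%q\n", s) // prints "status-3"
    }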
[07:03] dimitern: yeah, that's added
[07:03] there's the MaybeAddCloudToolsWhatever in cloudinit that gets called for a given series - but IIRC it was put in an if block with enableAptUpdate
[07:03] dimitern: and i just confirmed, doing apt-get install cloud-utils by itself breaks
[07:03] but apt-get install cloud-utils cloud-image-utils works
[07:03] but setting that up breaks on maas
[07:03] wallyworld, due to cloud-init getting removed?
[07:03] yeah
[07:04] can you confirm cloud-image-utils comes from the cloud-tools pocket on precise?
[07:04] it has to be fixed there
[07:05] dimitern: actually, it may be that we are only adding the cloud-tools pocket when bootstrapping
[07:05] that would explain things
[07:05] wallyworld, that's wrong if we do
[07:05] wallyworld, I'll have a look at what triggers it
[07:09] dimitern: i have a branch which i was going to propose on gh to revert the apt behaviour to be more like 1.21, but with cloud-utils added (it needs to be installed to get the right version). bootstrap is fine, but adding a new machine fails. it may be due to the cloud-tools pocket not being added except at bootstrap. but i have to go out to a Foo Fighters concert, will be back later. here's the branch: https://github.com/wallyworld/juju/tree/revert-apt-install-method
[07:09] wallyworld, sure, I'll look into it
[07:09] the above branch reverts the behaviour of multiple packages on the one apt-get line
[07:09] wallyworld, enjoy the concert ;)
[07:09] ty
[07:10] dimitern: if you run up a machine, apt-cache policy cloud-utils should show 0.27
[07:10] on bootstrap and a worker node
[07:10] and cloud-image-utils should be there too
[07:10] wallyworld, ok, will check both
[07:10] i'll check back later in a few hours
[07:11] tyvm
[07:11] np
[08:58] is gwacl's primary project site still launchpad.net/gwacl?
[09:10] dimitern: did you see william today?
[09:10] jam, not yet
[09:20] wallyworld: did you want to discuss ensure-ha --to ?
[10:32] natefinch: /wave
[10:32] jam: howdy
[10:32] jam: gimme 2 minutes to go get my coffee?
[10:33] k
[10:40] natefinch: I don't hear you
[11:20] morning
[11:59] natefinch_: still up?
[11:59] perrito666: I am here... got up for an early meeting
[12:00] perrito666: probably will have to go soon as the kids are stirring
[12:01] natefinch_: I cannot help but notice you are OCR today and I have a very small change here http://reviews.vapour.ws/r/995/
[12:03] perrito666: ship it!
[12:04] tx
[12:04] I should have known that the trick was to find you half asleep
[12:05] 3 blocking bugs? oh what have I done to deserve this
[12:05] doh
[12:10] * dimitern nailed the cause of the blocker
[12:11] * dimitern hates cloud-init more than before now
[12:22] fwereade: greetings
[12:22] jam, heyhey
[12:23] perrito666: doesn't removing restore break compat with older servers?
[12:27] fwereade: so I commented on your last request. I was wondering if we would still want the server to give the official time.
[12:28] jam, so the server would always return what you asked for?
[12:29] jam, I'm open to being convinced, but I can't see what we'd use it for today
[12:29] jam, and if we need it in the future it's a new api version anyway surely?
[12:31] fwereade: the client could request an amount, the server may upgrade that to a longer amount and reply with the actual amount
[12:31] fwereade: it has better future compat if we change the minimum timeout - clients that were asking for something short still work
[12:35] jam, surely even if we're just changing that behaviour we'd be adding a version anyway?
[12:36] fwereade: then why worry about it at all? I thought the point of client requests was to allow flexing it
[12:36] I don't think we'd *have* to change the version just because the boundary ranges change, if we make it clear
[12:36] that said, we don't need to spend hours on this
[12:37] jam, because 30s was a magic number embedded deep inside the server that I suspect is more than usually vulnerable to changes without proper foresight
[12:38] jam, it's the client that needs 30s, and it will still be the client who needs more or less time in future, I think?
[12:38] jam, boundary values perhaps, so long as we're making them looser, I suppose, but that's still a bit risky for my tastes
[12:39] jam: mm, you have a good point. I think I can make it use both ways according to the server version
[12:41] jam, "may I be leader for the next X seconds [yes|no]" STM to be simple and not simplistic -- and, hmm, does not actually preclude creating a longer lease internally -- it just means that the client couldn't take advantage of that knowledge to space out its requests more
[12:42] fwereade: so I don't think any client is going to make hard time constraints. They might be able to say "it will take approx X" but I doubt anyone can guarantee that they won't take longer than that
[12:42] jam, sorry, what won't take longer?
[12:43] jam, time to renew the lease?
[12:43] fwereade: whatever thing they are running that they are expecting to hold the lease
[12:43] jam, hence putting it in the client's control, and having them request a 60s lease and renew it in 30s, independent of what else is happening
[12:44] jam, based on user feedback what is desired/required is a guarantee that a successful is-leader call gives you 30s grace in which you're sure you're the leader
[12:45] jam, worst-case you request *just* before a failing renew, and you *still* get 30s of grace from that success
[12:45] fwereade: as in we won't nominate a new leader for 30s after the last one expired ?
[12:46] jam, no? we won't nominate a new one until immediately after the last one expires
[12:46] jam, that's why I ask for 60s and guarantee 30
[12:46] jam, and refresh every 30
[12:47] jam, even when you fail your nth request, your n-1th one is still good, leaving you time to react
[12:47] jam, while you are still leader and nobody else is stepping on your toes
[12:48] fwereade: what is triggering this polling ?
[12:48] if it's juju calling a leader hook, can't we easily be blocked waiting for some other hook to finish?
[12:49] is it *while* running a given hook we expect them to split off a thread/process to call back into us to ensure that they're still active?
[12:49] jam, the leadership tracker will be running anyway, independently of anything else -- the hook tool asks the context which asks the tracker if it can be leader
[12:50] jam, nobody seems to want that
[12:50] jam, best-effort leader-deposed execution once we're outside the reported grace period seems sufficient
[12:51] fwereade: it *feels* to me like someone would write "ok, I need to reconfigure my workload, ask for leader for X seconds, start reconfiguring, oh reconfiguring took too long, but I'm stuck in that process"
[12:51] who/what would actually refresh the leadership request
[12:52] jam, the refreshing is continuous anyway while you're leader, whether you're running a hook or not
[12:52] fwereade: k, then I don't see any need to set any time, Juju refreshes at an interval of X
[12:54] jam, I'm not so certain that we'll never need to tweak that...
[12:54] fwereade: you just said that if we need to tweak it, it would be a version bump (IMO)
[12:54] my point was, if we want to make it variable, make it easy to be variable and still compatible
[12:54] jam, I thought that was what I was doing :)
[12:54] or just make it fixed and we bump the API version when we need to change it
[12:55] fwereade: you made it variable from one side, but not the other
[12:56] fwereade: if this is being confusing or feeling like I'm antagonizing you, I'm not trying to, I'm happy with a JFDI here.
[12:56] but it seemed odd to say that only one side would get to change without an API version bump
[12:56] when it isn't hard to make it graceful either way
[12:57] jam, I know you're not, but I must admit I am a little confused so there's probably something worth figuring out
[12:57] fwereade: I think my gut reaction is "why is this an error, and why can't we just request 0s and get told what the lease time is"
[12:57] jam, my contention is that it is hard to make it graceful if we allow the server any more than a yes/no :)
[12:58] asking for too long I could accept, asking for too short seems like "nope, just take this longer one"
[12:58] fwereade: but I was thinking it was being exposed to the charm itself
[12:58] and that charmers were going to have to say "I'm going to run this op, I think it will take 3 minutes, give me a 3min lease"
[12:58] which led to the other problems
[12:58] jam, if I ask to be leader for 60s and the server makes me leader for 300s, I'm still definitely leader for 60s
[12:58] (who can actually refresh, etc)
[12:59] jam, as far as I'm aware that was never on the table
[13:00] jam, nobody asked for it, and it adds complexity and temptation to ask for infuriatingly long lease times that then lead to poor experiences when those units fail
[13:00] fwereade: that and missed guesstimates that then lead to a need to have a separate process that is refreshing. I think the point is Juju is doing all the refreshing (which actually has its own problem of Juju being up and happy, but the charm code stuck in an infinite loop)
[13:01] fwereade: but that's probably still a decent place to be.
[13:23] dimitern: looks like you found a fix for the cloud-utils issue. i didn't think to set apt-get update = true in cloud-init for precise. how does that interact with the apt disable-update settings? did you use GetPreparePackages() to selectively include target-series for precise?
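The lease scheme fwereade converges on above - the client asks "may I be leader for the next X seconds", the server answers only yes or no, and juju itself renews continuously (request 60s, refresh every 30s, so a successful claim always leaves at least 30s of grace) - might look roughly like this on the client side. A sketch under those assumptions only: claimLeadership and its signature are invented for illustration and are not juju's actual API (here a stub that deposes the unit on the fourth attempt, so the demo terminates):

    package main

    import (
    	"fmt"
    	"time"
    )

    const (
    	leaseDuration = 60 * time.Second // what we ask the server for
    	renewInterval = 30 * time.Second // refresh well inside the lease
    )

    var attempts int

    // claimLeadership stands in for the API call: "may I be leader for
    // the next d seconds? [yes|no]". The server answers yes or no only -
    // it never hands back an adjusted duration.
    func claimLeadership(unit string, d time.Duration) bool {
    	attempts++
    	return attempts < 4
    }

    // trackLeadership renews continuously, whether or not a hook is
    // running. Because each successful claim covers 60s and renewal
    // happens at 30s, even a failed nth renewal leaves ~30s of grace
    // from the (n-1)th success in which to run leader-deposed.
    func trackLeadership(unit string) {
    	for claimLeadership(unit, leaseDuration) {
    		fmt.Printf("%s: leader for the next %s; renewing in %s\n",
    			unit, leaseDuration, renewInterval)
    		time.Sleep(renewInterval)
    	}
    	fmt.Printf("%s: claim refused; up to %s of grace remains from the last success\n",
    		unit, leaseDuration-renewInterval)
    }

    func main() {
    	trackLeadership("wordpress/0")
    }

The yes/no answer keeps the protocol graceful in both directions, which is fwereade's point: the server can lengthen the lease internally without the client ever needing to know, and no API version bump hinges on the duration.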
=== Murali_ is now known as Murali
[13:27] wallyworld, it overrides the apt-get update disable flag
[13:27] wallyworld, it has to, otherwise it won't work
[13:27] dimitern: ah ok, and we can't add the cloud-tools repo without updating i assume
[13:27] wallyworld, I'm using your approach with GetPreparePackages from utils/apt, but rather than joining all with " ", I'm calling AddPackage for each one
[13:28] dimitern: yeah, that's exactly what i did in my latest branch i pushed before i left
[13:28] wallyworld, yes - if update is off, no apt-sources or packages are installed
[13:28] wallyworld, there's a quirk though due to cloud-init
[13:28] dimitern: i discovered you had to do the packages one by one otherwise it would be sad
[13:29] Foo Fighters were farking awesome btw, bloody excellent concert
[13:29] wallyworld, sweet! :)
[13:29] my ears are ringing :-)
[13:30] wallyworld, btw - this is the meat of my patch http://paste.ubuntu.com/10389087/ (apart from a similar if (in the beginning) in the templateUserData func in the lxc-broker)
[13:30] looking
[13:31] dimitern: yeah, that matches my understanding also
[13:32] dimitern: i can review once you propose
[13:32] wallyworld, sure, I'll propose soon, but I wanted to test on a precise host just to be sure
[13:32] np
[14:01] dimitern: hangout?
[14:02] dimitern: MAAS+Juju Network interlock
[14:02] dooferlad, uh, yeah - omw, thanks!
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1424695 1424777
=== kadams54-away is now known as kadams54
[14:20] gsamfira: are you around? I've got a fix for windows testing stuff I'd like you to look at
[14:50] mgz, hey
[14:50] mgz, you'd probably guess what I'll ask about :)
[14:53] dimitern: I hope to have an update, various things are borked
[14:53] dimitern: who are the ocrs today?
[14:53] mgz, natefinch_ and anastasiamac according to the calendar
[14:53] mgz, ok, thanks
[14:54] hm, I wonder if I'm in the middle of those two
[14:54] natefinch_: can I have a review plz? https://github.com/juju/testing/pull/52
[15:05] mgz: looking
=== natefinch_ is now known as natefinch
[15:06] dimitern, howdy, sorry I was late for our 1x1 but I am on the hangout now whenever you are ready
[15:06] alexisb, oh, omw
[15:08] mgz: this is a lot more complex than it seems to need to be. If we know we don't need a whitelist on anything other than windows, why not just put a if runtime.GOOS != "windows" { return } at the top, and let the rest of the function be windows only?
[15:09] look at the context above, there's also a JUJU_MONGOD variable that's whitelisted everywhere
[15:10] (and potential for more I guess)
[15:10] mgz: oops, missed the append of the testingVariables, sorry
[15:10] I could make it shorter by just checking that, but it's written in such a way as it wanted to be extensible
[15:17] yeah... it would probably be a lot easier if I weren't looking at it in a diff
[15:34] dimitern: ping
[15:35] katco, pong
[15:35] dimitern: it looks like v2 S3 signing is slightly different than standard v2 signing
[15:35] dimitern: amazon states: "Amazon S3 now supports the latest Signature Version 4. This latest signature version is supported in all regions and any new regions after January 30, 2014 will support only Signature Version 4."
[15:36] natefinch: 1-on-1?
[15:36] dimitern: are you ok with shifting s3 to only using v4? porting the special v2 signing stuff into our standard v2 signing method would be a bit of a pain
[15:36] katco, let me assimilate this for a moment :)
[15:36] dimitern: not a problem, please take your time :)
[15:38] katco, so for v2 we can do that later I guess (drop the special signing and leave the post-jan-2014 one only)
[15:38] katco, for v3 we need to make it work as needed, because we'll be switching to v3 pretty soon
[15:38] dimitern: you're talking goose versions now?
[15:39] katco, no - about goamz
[15:39] dimitern: ack sorry... wires crossed. that's what i meant :)
[15:39] katco, ok, sgtm then
[15:39] katco, what's your immediate plan?
[15:39] dimitern: so some clarification
[15:39] dimitern: we want v3 of goamz in juju by this friday for the feature freeze
[15:40] dimitern: this is to support efforts in the china region
[15:40] dimitern: so goamz v3 would drop s3 signing v2 in favor of standard v4, and we would put that into juju by friday
[15:40] dimitern: are we on the same page?
[15:40] katco, sgtm, however I have one request
[15:41] dimitern: sure thing
[15:41] katco, I've already ported what was sensible to port from v1 to v2
[15:41] katco, would you be so kind as to do the same for v2 to v3?
[15:41] dimitern: with some guidance, sure. are the change-sets fairly self-contained?
[15:41] katco, it shouldn't be a lot anyway, and I can help
[15:42] katco, I think so
[15:42] dimitern: yeah sure thing then. let me get the live tests working and then i'll work on that
[15:42] katco, cheers!
[15:42] dimitern: same to you! tyvm o/
[15:43] dimitern: i would like to buy you a beer in nuremberg :)
[15:44] katco, \o/ I'll hold you to this though :P
[15:44] dimitern: it will be my pleasure! :D
[15:44] ;)
[15:45] i have a stein that my mom got me... i'm wondering if i should bring it
[15:45] dimitern, wallyworld ; fwiw it seems that go vet in 1.4.2 is saner than 1.4.1
[15:46] dimitern, wallyworld but in any case the PR that *did* land allows you to set the environment variable IGNORE_VET_WARNINGS="some non-zero string" which will cause the pre-push hook to report but not fail on go vet warnings
[15:47] jw4, that's fine - I'd prefer the flexibility of being able to ignore it, if needed
[15:48] dimitern: yep
[15:51] ericsnow: yeah, we can pop into moonstone
[15:51] natefinch: k
[15:51] if I wanted to change the hostname during juju kvm creation do i need to modify the cloud-init config for that or is there an easier way
[15:52] fwereade: still working on upgrade steps, but I wanted to get this wip in front of you sooner rather than later: http://reviews.vapour.ws/r/997/
[15:56] sinzui, I've marked bug 1424695 as a duplicate of bug 1424777 as it's caused by the same issue
[15:56] Bug #1424695: maas cloud-init cannot download agent from state-server
[15:56] Bug #1424777: local-provider precise failed to upgrade
[15:57] dimitern, hurray of sorts
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1424695
[15:57] sinzui, indeed :)
[16:03] it's like this I believe
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1424777
[16:04] dimitern: is there a way to customize the hostname during a kvm CreateMachine?
[16:05] stokachu, I don't know for sure, but I doubt it
[16:05] stokachu, we're not setting anything specific in cloud-init for hostname
[16:06] dimitern: is that file generated dynamically?
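On the goamz signing thread above: the constraint katco quotes - any region launched after 2014-01-30, cn-north-1 included, accepts only Signature Version 4 - means the client has to pick a signer per region. A hedged Go sketch of just that selection logic (signV2/signV4 are placeholders, the real signing algorithms are far more involved, and this is not goamz's actual structure):

    package main

    import (
    	"fmt"
    	"net/http"
    )

    // v4OnlyRegions lists regions that reject Signature Version 2.
    // Per AWS, anything launched after 2014-01-30 is v4-only;
    // cn-north-1 is the case driving the goamz work in the log.
    var v4OnlyRegions = map[string]bool{
    	"cn-north-1":   true,
    	"eu-central-1": true,
    }

    func signV2(req *http.Request) { req.Header.Set("Authorization", "AWS placeholder-v2-signature") }

    func signV4(req *http.Request) { req.Header.Set("Authorization", "AWS4-HMAC-SHA256 placeholder-v4-signature") }

    // sign picks the signature version the target region will accept.
    func sign(req *http.Request, region string) {
    	if v4OnlyRegions[region] {
    		signV4(req)
    		return
    	}
    	signV2(req) // legacy regions still accept v2
    }

    func main() {
    	req, err := http.NewRequest("GET", "https://s3.cn-north-1.amazonaws.com.cn/bucket/key", nil)
    	if err != nil {
    		panic(err)
    	}
    	sign(req, "cn-north-1")
    	fmt.Println(req.Header.Get("Authorization"))
    }

Dropping the special S3 v2 signing entirely in goamz v3, as katco proposes, removes the per-region branch for S3 altogether: everything goes through v4.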
[16:06] the cloud-init file
[16:06] stokachu, well, yes
[16:06] stokachu, before provisioning a machine
[16:06] problem is all KVM creations have a hostname of 'ubuntu'
[16:07] stokachu, that comes from the ubuntu-cloud image IIRC
[16:07] yea i was hoping there was a way to change that prior to provisioning
[16:08] stokachu, that's the first time I've heard someone wanting this btw - it's good you've filed a bug for it
[16:09] it only affects the local provider; trying to deploy ceph units, it fails b/c all 3 units have the same hostname
[16:09] mgz: gave you a review, just one minor tweak requested, but otherwise LGTM
[16:11] dimitern: https://bugs.launchpad.net/juju-core/+bug/1326091
[16:11] Bug #1326091: deploying into kvm with local provider, hostnames are not unique
[16:11] stokachu, cheers, if it's a blocker for you guys, I'd suggest pinging alexisb
[16:13] yes stokachu can you please send me mail
[16:14] natefinch: thanks! I'll adjust and land
[16:15] alexisb: will do thanks
[16:15] dimitern: thank you too
[16:20] stokachu, no worries
[16:20] wallyworld, if you're still here - http://reviews.vapour.ws/r/998/
[16:20] ^ fixes the blocker bug
[16:20] natefinch, axw, others? ^^
[16:23] hi dimitern, updated http://reviews.vapour.ws/r/974/, does it still look good to land?
[16:24] dimitern: looking
[16:24] cmars, hey! yes indeed - I've checked it already yesterday
[16:24] natefinch, thanks
[16:24] dimitern, awesome, thanks for the review!
[16:24] cmars, np
[16:25] brb
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[16:38] dimitern: is it possible to have --target-release series package1 package2 package3? And if so, will that still do the right thing with this code (i.e., they'll be joined by a space as the package name)?
[16:38] natefinch, no it's not
[16:39] natefinch, as far as I could understand from the apt-get docs
[16:40] natefinch, and even if it was, it still won't work due to the way cloud-init 0.6.3 passes them to apt-get
[16:40] natefinch, e.g. "--target-release rel pkg1 pkg2 pkg3"
[16:41] natefinch, well, actually - sorry, it *is* possible
[16:41] natefinch, apt-get accepts that, but not cloud-init
[16:42] dimitern: gah, what a pain in the ass.
[16:42] natefinch, in reality, --target-release is just a hint to the apt policy engine to decide which candidates to prefer
[16:43] natefinch, oh tell me about it :) I've been on it since 8 am
[16:47] dimitern: I wonder if we should be checking to make sure that they don't do --target-release series pkg1 pkg2 ... but maybe that's being too smart?
[16:48] natefinch, I have a couple of panics in place for that reason
[16:49] natefinch, all tests pass - both make check and the live ones I've described
[16:49] natefinch, I suppose you could argue if 99% of the existing tests pass, it's a bad thing and we need better ones
[16:50] natefinch, but I'd rather land this and unblock everyone in a few hours, then I'm happy to improve the tests as a follow-up
[16:51] natefinch, I'm porting the same fix to trunk and fully intend to retry the same live tests before proposing it
=== kadams54 is now known as kadams54-away
[16:51] * dimitern is sick of blockers already - let's not add to that :)
[16:52] katco, still around?
[16:53] katco, is this ready to land https://github.com/go-amz/amz/pull/27/ ? it looks so to me - if it is, i'll go ahead and merge it
[16:55] dimitern: no not quite yet
[16:56] katco, ok, np
[16:56] dimitern: almost there
[16:56] * dimitern steps out for a while - will be back soon
[16:58] dimitern: gave you a ship it
=== kadams54-away is now known as kadams54
[18:11] natefinch, can you edit the HA doc now?
[18:15] natefinch, thanks!
[18:32] man, I got review 999?
[18:32] one off, one off
[18:32] mgz: quick, think of something else to PR
[18:32] review please: reviews.vapour.ws/r/999/
[18:39] natefinch: ^
[19:02] is landing still blocked?
[19:03] ok, nvm
[19:08] alexisb: still view only
[19:08] mgz: ship it
[19:09] natefinch, ack, I don't have permission to give you write access, we will have to bug wallyworld
[19:11] alexisb: yeah, figured. No problem. I meant to ask for write access from him last night and forgot.
[19:19] cmars, it is, I'm still testing the port of the fix for trunk, but 1.22 should get unblocked at least (not that it matters I guess)
[19:20] dimitern, got it, thanks for fixing!
[19:20] cmars, np
[19:21] natefinch, FYI, there's the tech-debt bug 1425245 I filed for your suggestion
[19:21] Bug #1425245: improve cloud-init tests after the fix for bug #1424777 unblocks CI
[19:21] dimitern: thanks :)
[19:23] :)
[19:50] axw: did llgoi ever get announced?
[19:54] * hazmat switches to pm
[20:01] thumper: do you have a link to the doc about CLI 2.0?
[20:01] (asking for a friend)
[20:01] natefinch: it isn't written yet :)
[20:01] thumper: heh
[20:01] thumper: thought it was already being worked on
[20:02] (ish)
=== kadams54 is now known as kadams54-away
[20:27] natefinch, PTAL http://reviews.vapour.ws/r/1001/ - fix for the blocker for trunk
[20:45] thumper, ^^
[20:46] dimitern: ack
[20:46] thumper, I'm waay past EOD, so I'm going - please add $$fixes-1424777$$ if it's ok
[20:46] dimitern: do you have the link for the 1.22 version?
[20:47] dimitern: and thanks for the fix
[20:47] thumper, https://github.com/juju/juju/pull/1670/
[20:47] ta
[20:47] thumper, np
[21:39] every time I unplug my external monitor my computer hangs... how sad
[21:52] perrito666: tell Trevinho in #ubuntu-desktop
[21:52] perrito666: tell him I sent you :-)
[21:52] perrito666: although I'm not sure he'd be on right now as he lives in Italy
[21:52] perrito666: but he is known to work weird hours
[21:52] thumper: why does that make me think that I will get something thrown at my head
[21:52] :-)
[21:53] he's a good guy, and one of the current maintainers of unity 7
[21:53] thumper: anyway I am using vivid, so that is most likely my fault
[21:53] perrito666: assuming you're using unity
[21:53] thumper: it is unity, version is a mystery, whatever is shipped with vivid
[22:25] why am I still getting "Build failed: Does not match ['fixes-1424777']"
[22:30] cmars: can I be a smartass?
[22:30] :p
[22:31] cmars: jokes aside, that bug must be marked as critical and not fix committed?
[22:32] perrito666, it's fix committed though
[22:32] https://bugs.launchpad.net/juju-core/+bug/1424777
[22:32] Bug #1424777: local-provider precise failed to upgrade
[22:33] cmars: it has to be "fix released" before it's unblocked
[22:33] marcoceppi made this recently: http://juju.fail/status.json
[22:33] cmars: we need to make sure the ci test that it is supposed to fix is actually fixed
[22:33] sinzui: ping
[22:33] sinzui: can we see if dimiter's patch has fixed the precise upgrade?
[22:35] jw4, marcoceppi that's pretty neat
[22:36] cmars: I believe it actually uses the same mechanism the CI bot uses
[22:36] cmars: yeah, I was just about to put a small webpage in front of it, but status.json will always be available for consumption
[22:36] jw4: it does, about to push the code up which generates this page
[22:36] marcoceppi: cool
[22:40] thumper, it does, but aws is ill, preventing an official blessing, but we are switching everything we can off of aws
[22:41] sinzui: cool
[23:32] wallyworld: you around yet?
[23:32] yup
[23:32] wallyworld: got time for a quick hangout? have a series of s3 questions
[23:32] sure
[23:32] wallyworld: sweet... 1:1?
[23:32] yup
[23:34] ahaa I believe it's chrome that goes crazy when x is resized