[00:00] <jw4> thumper: do you have some direction for me regarding feature flags?  We'd like to begin landing the Actions CLI behind a feature toggle until it's complete.
[00:01] <jw4> thumper: I'm planning on using an envar say DEV_ACTIONS_CLI or something
[00:02] <hazmat> wallyworld_, that would be great
[00:02] <wallyworld_> np, done
[00:02] <wallyworld_> or will be real soon :-)
[00:15] <wallyworld_> hazmat: are some of those instance types only available in certain regions?
[00:16] <wallyworld_> cause Juju is complaining there's missing cost data in regions for G1 etc
[00:16] <hazmat> wallyworld_, bummer
[00:16] <hazmat> wallyworld_, bummer.. needs better unit tests..
[00:17] <hazmat> wallyworld_, the G series are not publicly available, i put in the definitions for them.. but only setup mappings for the g5s for a  partner request
[00:17] <hazmat> their in a public beta atm, but pricing is not known
[00:17] <hazmat> but a partner was interested in using them
[00:17] <wallyworld_> what default would be sensible you think?
[00:18] <wallyworld_> or would a default even be worthwhile
[00:18] <wallyworld_> if i put in a high figure, that would mre or ess require explicit instance type selection by the user
[00:19] <wallyworld_> that probably could be ok i think
[00:19] <hazmat> wallyworld_, sounds reasonable
[00:19] <wallyworld_> ok, will do thanks
[00:19] <hazmat> wallyworld_, g series are mostly monster machines.
[00:19] <hazmat> so i imagine the price is.. well if you have to ask ;-)
[00:19] <wallyworld_> lol
[00:35] <thumper> jw4: oh hai
[00:35] <thumper> jw4: the feature flag branch needs tweaking and can land shortly after
[00:35] <thumper> didn't realise anyone else was waiting on it
[00:37] <davecheney> mwhudson: 1.5 discussion will start in the next 48 hours
[00:37] <davecheney> probably sooner
[00:38] <mwhudson> davecheney: \o/
[00:38] <mwhudson> in other news, i've found yet another gccgo bug
[00:43] <jw4> thumper: no worries - we just started talking about it too
[00:43] <wallyworld_> axw: i had to make a fix to gwacl to ensure all instance types had costs. I fixed the tests also. these changes are in rolesizes.go and rolesizes_test.go. However, I got sick of the non-standard formatting cause my editor gofmts on save, so the branch has had vanilla gofmt run over it so the diff is large. all tests run, the real changes are in the files i just mentioned. are you able to take a look?
[00:44] <wallyworld_> https://code.launchpad.net/~wallyworld/gwacl/ensure-all-roles-have-costs/+merge/243346
[00:44] <wallyworld_> ignore my crappy branch name
[00:44] <jw4> thumper: I will be interested to see it when the branch is ready to land
[00:44] <thumper> jw4: ack
[00:54] <davecheney> mwhudson: sorry, there is no prize for gccgo bugs
[00:58] <axw> wallyworld_: ok
[00:59] <wallyworld_> ty and sorry :-)
[00:59] <wallyworld_> hopefully you can ignore the white space
[01:00] <axw> wallyworld_: if you haven't already, can you please either delete the format target in Makefile, or change it to "go fmt"?
[01:02] <mwhudson> davecheney: heh heh
[01:02] <wallyworld_> axw: yeah did that :-)
[01:05] <axw> wallyworld_ hazmat: do we even care about actual costs? I think we only care about relative costs... I wonder if we could just do "1, 2, 3, ..." in order of role size
[01:05] <wallyworld_> axw: i made the costs arbitary and large
[01:05] <wallyworld_> and in order of role size
[01:05] <axw> wallyworld_: I guess I should just actually look at the MP ;)
[01:06] <wallyworld_> axw: that way, G1 etc are not selected by default, and require an explicit instance type constraint
[01:06] <axw> right. goodo
[01:22] <perrito666> I try to run my tests with -gocheck.f="myfilter" but whenever I use -gocheck.something the only run test is dependenciesTest.TestDependenciesTsvFormat
[01:22] <perrito666> anybody hit tat before?
[01:24] <davecheney> perrito666: are you still stuck on that dependency problem ?
[01:25] <perrito666> davecheney: nope, I just tried to run one test alone of a suite
[01:26] <perrito666> davecheney: tx for your help the other day, I realized too late you where not working that day, apologies
[01:28] <mwhudson> davecheney: ah no, it's probably just a go tool bug
[01:28] <mwhudson> davecheney: even tinier non-prize
[01:31] <davecheney> perrito666: meh, if i'm not working, i'll ignore you
[01:31] <davecheney> you don't need to appologise
[01:33] <perrito666> :p I am a very nice person, why would you ignore me?
[01:33]  * perrito666 is offended
[01:34] <mwhudson> yes
[01:37] <mwhudson> davecheney: do you remember enough of the pain to say if this makes sense? http://paste.ubuntu.com/9338212/
[01:40] <davecheney> mwhudson: i remember that link order was super fragile
[01:41] <davecheney> we occasionally see builds blow up on power64 when link order is wrong
[01:41] <davecheney> i can't say that what you have proposed is wrong
[01:41] <mwhudson> davecheney: yeah, that's the bug i'm poking at
[01:41] <davecheney> but it does make sense that that is the cause
[01:42] <mwhudson> davecheney: the problem seems to be that i added code to not pass the target of two actions for the same a.p
[01:42] <mwhudson> but there are two different a.p's for the package being tested when you run go test
[01:42] <mwhudson> (which makes sense, maybe, now that i think about it: one has the internal _test.go files in it)
[01:46] <wallyworld_> katco: i asked for tests, let me know if you have any questions
[02:03] <davecheney> mwhudson: juju uses 'external' tests a lot more than the stb lib
[02:03] <davecheney> and we're also leaning hevily on the export_test.go escape hatch to get access to internal functions during tests
[02:04] <davecheney> i'm sure this has agrevated the issue
[02:55] <jw4> thanks thumper - I see your PR
[03:08] <thumper> hmm... is juju/cmd hooked up to review board?
[03:08] <thumper> https://github.com/juju/cmd/pull/10
[03:17] <thumper> menn0: you might like that one :-) ^^^^
[03:22] <menn0> thumper: looking...
[03:23] <thumper> menn0: obviously the supercommand code wasn't complicated enough...
[03:24] <thumper> menn0: I've just noticed the I never got around to updating the doc strings for the new functions
[03:29] <menn0> thumper: i'll ignore that then :)
[03:29] <menn0> thumper: reviews for other repos don't go to RB then?
[03:30] <thumper> menn0: some do, some don't
[03:30] <thumper> I don't know which do
[03:31] <menn0> thumper: well this hasn't so I'll do the review on GH
[03:37] <menn0> thumper: so I should ignore the incorrect docstring for RegisterSuperAlias?
[03:38] <thumper> yes
[03:43] <ericsnow> axw: patch to add AZ to instanceData: http://reviews.vapour.ws/r/557/
[03:43] <ericsnow> axw: let me know if that's what you had in mind
[03:44] <ericsnow> axw: with that my unit-get patch will get a lot simpler :)
[03:46] <axw> ericsnow: yep, pretty much. I don't think Instance needs to get a new AvailabilityZone method; we should just return it from StartInstance
[03:47] <axw> I guess it doesn't hurt to have it there though
[03:48] <axw> ericsnow: oh right, it's needed for upgrades. never mind me
[03:49] <axw> ah hm, not really if it's a ZonedEnviron ... anyway.
[04:10] <menn0> thedac: review done
[04:10] <menn0> thedac: sorry, that was supposed to be for thumper
[04:38] <davecheney> mwhudson: the patch attached to this issue appears to be a patch, if it's not a patch, then remove the patch tag from this issue ...
[04:38] <davecheney> thanks, i guess
[05:19] <huwshimi> Hi, I was just wanting to try a few things with 1.21 and wondering which ppa to use?
[05:30] <axw> huwshimi: I think you want https://launchpad.net/~juju/+archive/ubuntu/devel
[05:31] <huwshimi> axw: A brilliant thanks, I'll take a look
[05:32] <huwshimi> *Ah
[05:48] <dimitern> morning
[07:35] <axw> fwereade_: any chance you could take a look at https://github.com/juju/charm/pull/77 some time soon? I expect we can iterate on this for changes due to mandatory constraints
[07:36] <fwereade_> axw, sorry, queued it up
[07:36] <axw> fwereade_: thanks
[09:06] <dimitern_> fwereade_, wallyworld_, TheMue, hey guys, any of you willing to review a fix for bug 1395908 ? http://reviews.vapour.ws/r/558/
[09:06] <mup> Bug #1395908: LXC containers in pending state due to juju-br0 misconfiguration <lxc> <oil> <juju-core:In Progress by dimitern> <juju-core 1.20:In Progress by dimitern> <juju-core 1.21:In Progress by dimitern> <https://launchpad.net/bugs/1395908>
[09:07] <wallyworld_> dimitern_: i've been called to dinner but can look afterwards if not done by then
[09:07] <dimitern_> wallyworld_, no worries
[09:08] <fwereade_> dimitern_, LGTM, very clean, ty
[09:09] <mattyw> morning all
[09:09] <dimitern_> fwereade_, thanks!
[09:10] <dimitern_> morning matty
[09:10] <mattyw> dimitern_, morning
[09:11] <mattyw> fwereade_, morning, would you have a few minutes to take a look at http://reviews.vapour.ws/r/553/
[09:46] <dimitern_> fwereade_, when you have 5m, I have a backport for the same bug fix - http://reviews.vapour.ws/r/559/ (for 1.21) - no other changes
[09:48] <fwereade_> dimitern_, LGTM again :)
[09:50] <dimitern_> fwereade_, cheers :)
[09:52] <voidspace> Nobody has posted to juju or canonical-tech about rocket...
[09:54] <fwereade_> voidspace, it was on cloud though
[09:54]  * fwereade_ popping out, bbs
[09:59] <voidspace> fwereade_: ah, I may not be on that list then...
[10:01] <voidspace> jam1: dimitern_: stdup?
[10:02] <dimitern_> voidspace, omw, sorry
[10:50] <fwereade_> menn0, ping
[10:51] <menn0> fwereade_: hi
[10:51] <fwereade_> menn0, remind/reassure me: did we manage to close off the paths whereby non-state server agents can end up connecting to state servers of an earlier version than themselves?
[10:53] <menn0> fwereade_: you mean while an upgrade is in progress?
[10:53] <fwereade_> menn0, yeah
[10:53] <menn0> fwereade_: yes, that was already done before I touched the upgrade code
[10:53] <menn0> fwereade_: state servers don't advertise the new version until they themselves have upgrades
[10:53] <menn0> upgraded
[10:53] <fwereade_> menn0, I'm just looking at the bit I wrote in the uniter for spinning on CodeNotImplemented from a particular call while waiting for the state server to be upgraded
[10:53] <fwereade_> menn0, and suspecting I can drop it
[10:53] <fwereade_> menn0, cool
[10:54] <fwereade_> menn0, thanks
[10:55] <menn0> fwereade_: and given that an upgrade can't start unless all the hosts in the env are on the current or next version there's really no way a non-state server can get ahead of the state servers
[11:11] <natefinch> fwereade_: do you mind if I tweak this a little?  I'm not really a fan of the cast here, since it can panic: http://bazaar.launchpad.net/~fwereade/juju-core/provider-skeleton/view/head:/provider/skeleton/config.go#L125
[11:12] <wallyworld_> natefinch: here's a fix for the critical bug on master http://reviews.vapour.ws/r/560/
[11:12] <natefinch> fwereade_: I'd much rather do it once in validateConfig and just store the values in a normal old struct
[11:12] <natefinch> wallyworld_: looking
[11:12] <wallyworld_> ty
[11:13] <fwereade_> natefinch, so, in *theory* it won't panic because it should have been validated on creation
[11:14] <fwereade_> natefinch, but *hell yes* put things in structs if you can
[11:14] <natefinch> fwereade_: kk will do.  I know it shouldn't panic, but it gives me the heebie jeebies
[11:21] <natefinch> man, %q was just the best idea ever
[11:32] <mattyw> hi folks, I'm consistently getting errors on FAIL: status_test.go:2702: StatusSuite.TestFilterSubordinateButNotParent. Anyone else? It seems to have been happening for a while for me so it must just be me
[11:33] <wallyworld_> jam1: do you know the history surrounding the change in api endpoints as per bug 1397376? it's claimed the change to only return a single address for api-endpoints is a regression, and it sure seems that way. also the use of dns name rather than ip address
[11:33] <mup> Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <landscape> <maas-provider> <juju-core:New> <juju-core 1.21:New> <https://launchpad.net/bugs/1397376>
[11:34] <wallyworld_> mattyw: katco should be able to help with that
[11:38] <natefinch> wallyworld_: lgtm
[11:38] <wallyworld_> natefinch: tyvm
[11:55] <mattyw> wallyworld_, ack, thanks very much
[11:55] <wallyworld_> np, she should be on soon
[12:01] <dimitern_> wallyworld_, your fix landed, but the bot still refuses new PRs
[12:01] <dimitern_> __JFDI__ doesn't seem to work either :/
[12:01] <wallyworld_> dimitern_: i think it takes a while for the fix to go through CI
[12:02] <dimitern_> wallyworld_, ah, my mistake - it did work with __JFDI__
[12:02] <wallyworld_> great
[12:18] <mattyw> fwereade_, ping?
[12:19] <fwereade_> mattyw, pong, in meeting, responding slowly
[12:27] <dimitern_> fwereade_, last backport - for 1.20; same bug, no changes - will you approve it? http://reviews.vapour.ws/r/561/
[12:54] <dimitern_> voidspace, can you have a look instead?  http://reviews.vapour.ws/r/561/
[12:56]  * dimitern is afk for a while
[13:02] <voidspace> dimitern: sure
[13:03] <voidspace> dimitern: LGTM
[13:09] <dimitern> voidspace, thanks!
[13:51] <fwereade_> can anyone think of a reason not to make "machineenvironmentworker.MachineEnvironmentWorker" into something a bit more readable like "proxy.Worker"?
[13:54] <perrito666> is anyone very very savvy about github.com/juju/cmd?
[14:04] <dimitern> fwereade_, +100
[14:04] <dimitern> perrito666, rogpeppe or thumper I think
[14:05] <perrito666> dimitern: tx, I figured out, testing subcommands was confusing me a bitç
[14:05] <rogpeppe> fwereade_: why "proxy"?
[14:06] <rogpeppe> fwereade_: i'd certainly use less stuttering
[14:07] <jw4> wallyworld_: how can I know definitively when CI is accepting new builds?  CI must be using some check that I can run myself before $$merge$$?
[14:08] <perrito666> fwereade_: +1 to have names that can be pronounced without taking a breath at the point
[14:09] <fwereade_> rogpeppe, because it's all about setting proxy-related env vars, and writing out files describing what proxies to use ;)
[14:09] <dimitern> jw4, I use this bookmark https://bugs.launchpad.net/juju-core/+bugs?field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL
[14:09] <fwereade_> rogpeppe, but maybe I missed some detail?
[14:10] <jw4> dimitern: that's basically what I use too, but CI is still rejecting builds even though that reports zero bugs
[14:11] <dimitern> jw4, it was blocked earlier, but not anymore - a PR of mine landed less than an hour ago
[14:11] <rogpeppe> fwereade_: is there any reason to export the worker type at all from that package?
[14:11] <fwereade_> rogpeppe, fair point
[14:11] <jw4> dimitern: hmm; mine was rejected after yours was accepted
[14:11] <rogpeppe> fwereade_: i'd suggest proxyworker as the package identifier
[14:11] <rogpeppe> fwereade_: and define proxyworker.New as the creator of the worker
[14:12] <fwereade_> rogpeppe, "worker/proxyworker" is a little bit yucky, but yeah, that sounds sane
[14:12] <rogpeppe> fwereade_: and use "worker" as the internal type
[14:12] <rogpeppe> fwereade_: i think it's fairly idiomatic - e.g. http/httputil
[14:13] <fwereade_> rogpeppe, yeah
[14:14] <jw4> wallyworld_, dimitern yeah - again : Does not match ['fixes-1396981']
[14:14] <jw4> CI must be using a different query - I just want to know what that is so I don't have to keep guessing when CI gets unblocked
[14:26] <dimitern> jw4, they must've changed it
[14:27] <dimitern> mgz_, sinzui, any clue why the bot is still blocked on bug 1396981 ?
[14:27] <mup> Bug #1396981: Upgrade fails because product json is renamed <ci> <regression> <upgrade-juju> <juju-core:Fix Committed by wallyworld> <https://launchpad.net/bugs/1396981>
[14:27] <jw4> dimitern: oh well - I've waited almost a week to land this anyway - I can wait longer :)
[14:28] <sinzui> dimitern, 1. the bug is not Fix released, 2. we need to remove the rule for frozen index.json.
[14:28] <sinzui> dimitern, I think master will be open in about 30 minutes
[14:29] <sinzui> jam can you get someone to look into Bug #1398406? the gwacl change broke azure deploys for CI
[14:29] <mup> Bug #1398406: Azure provider attempts to deploy with unsupported "D1" RoleSize <azure-provider> <bootstrap> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398406>
[14:30] <dimitern> sinzui, great, thanks!
[14:30] <dimitern> sinzui, jam is not here today, I can have a look perhaps
[14:31] <sinzui> thank you dimitern
[14:32] <dimitern> sinzui, btw, lest I forget - will there be a  1.20.14 release? I have just fixed an issue with maas there - bug 1395908
[14:32] <mup> Bug #1395908: LXC containers in pending state due to juju-br0 misconfiguration <lxc> <oil> <juju-core:Fix Committed by dimitern> <juju-core 1.20:Fix Committed by dimitern> <juju-core 1.21:Fix Committed by dimitern> <https://launchpad.net/bugs/1395908>
[14:33] <sinzui> dimitern, we need to do one for the gwacl change...but the change is broke tests
[14:33] <dimitern> sinzui, the gwacl one I'm looking at now?
[14:33] <sinzui> dimitern, I was just looking at that bug and think about how I might verify the change works with the QA maas
[14:34] <sinzui> dimitern, yes. bug 1398406 is fallout from bug 1389422
[14:34] <mup> Bug #1398406: Azure provider attempts to deploy with unsupported "D1" RoleSize <azure-provider> <bootstrap> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398406>
[14:34] <mup> Bug #1389422: azure instance-types and regions missing <azure-provider> <constraints> <Go Windows Azure Client Library:Fix Committed by hazmat> <juju-core:In Progress by wallyworld> <juju-core 1.20:Triaged> <juju-core 1.21:Fix Committed by wallyworld> <https://launchpad.net/bugs/1389422>
[14:34] <dimitern> sinzui, well, if it's a virtual maas it can be tested by tweaking one or a few kvm domain configs (vms) to have disabled NICs
[14:35] <sinzui> thank you dimitern,
[14:35] <dimitern> sinzui, that's how I tested it locally, I can give you more details should you need them
[14:36] <sinzui> :)
[14:40] <dimitern> sinzui, with the fix juju will log "node "XXX" skipping disabled network interface "XYZ" for disabled devices; luckily by default only the primary interface is enabled (as discovered by lshw) so you should see these right after a node is acquired from maas (e.g. during bootstrap --debug)
[14:42] <dimitern> sinzui, and after a few "skipping disabled network interface.." log lines (depending on how many NICs are there) there will be a final "node "<maas-uuid>" primary network interface is "XYZ" (e.g. "eth2", which will appear in /etc/network/interfaces under the juju-br0 "bridge-ports" setting)
[14:55] <jw4> dimitern: (sinzui ) this updated query takes into account Fix Committed and not Fix Released... : https://api.launchpad.net/devel/juju-core?ws.op=searchTasks&status%3Alist=Triaged&status%3Alist=In+Progress&status%3Alist=Fix+Committed&importance%3Alist=Critical&tags%3Alist=regression&tags%3Alist=ci&tags_combinator=All
[14:56] <jw4> sinzui: that should be the right query for me to check status before merging right?
[14:57] <dimitern> jw4, do you have a bugs.launchpad.net link with the same filters?
[14:57] <perrito666> natefinch: ericsnow can we jump into the standup since we are in it?
[14:57] <jw4> dimitern, no but I think I can whip it up...
[14:57] <dimitern> jw4, cheers!
[14:58] <dimitern> sinzui, is bug 1398406 only for 1.21 or also for trunk and 1.20 ?
[14:58] <mup> Bug #1398406: Azure provider attempts to deploy with unsupported "D1" RoleSize <azure-provider> <bootstrap> <ci> <regression> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1398406>
[14:58] <sinzui> dimitern, the Lp only claims the merge was in 1.21
[14:58]  * sinzui checks log
[14:59] <perrito666> natefinch: ?
[14:59] <sinzui> dimitern, I don't see the dep change in master. just 1.21
[14:59] <dimitern> sinzui, right; so after some analysis I think the issue is MS renamed "D1" to "Standard_D1" etc.
[15:00] <dimitern> sinzui, I'll propose a fix, can it be tested in the same CI scenario easily?
[15:00] <sinzui> yes
[15:04] <sinzui> dimitern, this is the command that failed. the constraint is important, the env can be any asure: juju --show-log bootstrap -e azure-deploy-precise-amd64 --constraints mem=2G
[15:05] <dimitern> sinzui, right, I'll try it locally as well
[15:05] <dimitern> thanks
[15:06] <sinzui> jw4, that call is right. I use
[15:06] <sinzui>  juju-ci-tools/check_blockers.py master 1234
[15:06] <sinzui> where the number is random
[15:06] <jw4> sinzui: perfect
[15:08] <jw4> dimitern: check_blockers.py is the best route, but I think the query would look like: https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=&orderby=-importance&search=Search&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.
[15:08] <jw4> importance%3Alist=CRITICAL&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.tag=ci+regression+&field.tags_combinator=ALL&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.
[15:08] <jw4> has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on
[15:08] <mgz_> jw4: it's possible to get a shorter version of that url :)
[15:09] <sinzui> dimitern, jw4 abentley: bug 1398406 is wrongly on master (and the wrong milestone), I am going to target it to the 1.21 branch and beta4 to get master unblocked
[15:09] <mup> Bug #1398406: Azure provider attempts to deploy with unsupported "D1" RoleSize <azure-provider> <bootstrap> <ci> <regression> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1398406>
[15:09] <jw4> mgz_: lol - you mean with a shortner or by removing all the empty fields ?
[15:09] <dimitern> jw4, thanks!
[15:09] <jw4> mgz_: I keep trimming the empty fields by hand, but the search tool puts them back in whenever I tweak the params
[15:09] <dimitern> sinzui, cool, thank you
[15:10] <mgz_> jw4: :)
[15:12] <jw4> mgz_: (dimitern) how about : https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL
[15:12] <jw4> tiny bit shorter
[15:12] <mgz_> that looks reasonable
[15:12] <dimitern> thanks again jw4
[15:13] <jw4> dimitern: :)
[15:20] <natefinch> ericsnow, perrito666: sorry, chrome crashed again, and I can't get back in.  I think we mostly were done.  I'll be back in like 5 minutes.
[15:20] <perrito666> natefinch: we noticed
[15:20] <natefinch> btw, is it possible to downgrade back to trusty?  Because this is just ridiculous
[15:20] <ericsnow> natefinch: ping me when you are ready for our 1-on-1
[15:20] <natefinch> ericsnow: will do]
[15:27] <perrito666> natefinch: mm, I am not sure, it should be it requires some crafting by hand and some things might break
[15:28] <perrito666> there are not that many changes in utopic
[15:28] <perrito666> natefinch: you could downgrade the browser
[15:28] <natefinch> I don't even know that it changed the browser.  I'm running Chrome, not Chromium
[15:30] <perrito666> natefinch: chances are that the browser updated and it might also be broken in trusty, chrome comes from a ppa iirc
[15:34] <alexisb> hey there team, I see katco has pickup one critical bug do we have an assignee for this one:
[15:34] <alexisb> https://bugs.launchpad.net/juju-core/+bug/1397376
[15:34] <mup> Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <landscape> <maas-provider> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1397376>
[15:41] <natefinch> alexisb: looking
[15:53] <natefinch> sinzui: about that api-endpoints bug.... it sounds like the scripts were relying on behavior that was not guaranteed, but simply happened to be somewhat reliable in their particular environments.  I'm with Tim in that I'm not sure this is a regression.
[15:54] <sinzui> natefinch, They are relying on behaviour that worked most of the time, now doesn't work most of the time
[15:55] <sinzui> natefinch, I think we need to know if anyone has built tools on the unguaranteed behaviour. If deployer assumes it, we have a problem
[15:57] <sinzui> hazmat, juju 1.21 api-endpoints is more likely to return a dns address instead of an ip address. Some people assume it only returned an IP address. does deployer make such assumptions?
[16:02] <natefinch> sinzui: isn't deployer on the server?  api-endpoints is a client-only command,  I don't think deployer could call it if it wanted to.  Also, again, deployer is on the server, so it shouldn't need to get the api endpoint, right?
[16:02] <sinzui> natefinch, no, deployer drives your client
[16:02] <natefinch> or am I crazy about how deployer works
[16:02]  * natefinch is crazy, ok. :)
[16:02] <natefinch> I've never used the deployer, sorry :)
[16:02] <sinzui> well supplants your client with an bulk api clinet
[16:03] <sinzui> natefinch,  a deployer files/bundle is a summary of an end state of an env. deployer/quickstart walks juju through the steps to but the env into the state
[16:27] <voidspace> dimitern: ping
[16:28] <dimitern> voidspace, pong
[16:28] <voidspace> dimitern: I'm stuck on a failing test
[16:28] <voidspace> dimitern: this is for IPAddress.SetState
[16:29] <voidspace> dimitern: the Assert seems to prevent State being changed at all
[16:29] <voidspace> dimitern: this is the code https://github.com/voidspace/juju/compare/state-ipaddresses
[16:29] <voidspace> dimitern: this is the error message: http://pastebin.ubuntu.com/9346641/
[16:29] <dimitern> voidspace, looking
[16:30] <voidspace> dimitern: the failing test is TestIPAddressSetState
[16:30] <voidspace> dimitern: it fails on line 1306 - where we attempt to change the state (actually calling SetState)
[16:30] <voidspace> dimitern: I've checked other Asserts in our code and can't see the difference
[16:31] <voidspace> dimitern: the Assert itself is line 96 of the diff (first block - the new ipaddresses.go file)
[16:32] <dimitern> voidspace, unknownOrSame?
[16:32] <voidspace> dimitern: yes
[16:32] <voidspace> dimitern: to change the state, the existing state must be AddressStateUnknown, or the same state as the one we're setting
[16:32] <voidspace> so the state must be unknownOrSame
[16:33] <voidspace> which was your name I believe... yesterday :-)
[16:33] <dimitern> voidspace, yeah
[16:33] <dimitern> voidspace, hmm.. weird..
[16:34] <voidspace> dimitern: the test creates a new address, checks that the state is AddressStateUnknown and then attempts to set the state to AddressStateAllocated
[16:34] <voidspace> dimitern: and that fails with "transaction aborted"
[16:34] <dimitern> voidspace, how about if you use getCollection + defer like in other places, rather than using C: ipaddressesC
[16:34] <voidspace> dimitern: ok, I'll try that
[16:34] <dimitern> voidspace, ah, another thing I just noticed
[16:35] <voidspace> dimitern: hmm... other places uses st.runTransaction with C
[16:35] <dimitern> voidspace, state is omitempty, which means it can be null in addition to any valid state
[16:35] <dimitern> voidspace, it shouldn't be omitempty
[16:35] <voidspace> dimitern: ok
[16:35] <voidspace> dimitern: that might be it
[16:35] <voidspace> StateUnknown is ""
[16:35] <dimitern> voidspace, we have an "unknown" .. yeah :)
[16:37] <voidspace> dimitern: getting rid of omitempty worked!
[16:37] <voidspace> dimitern: thanks :-)
[16:38] <dimitern> voidspace, sweet!
[16:38] <voidspace> I knew there was a reason we kept you around ;-)
[16:38] <dimitern> :D
[16:38] <dimitern> indeed
[16:38] <voidspace> :-)
[16:43] <dimitern> ok, eod for me
[16:43] <dimitern> g'night all
[16:43] <voidspace> dimitern: g'ngight
[16:43] <dimitern> voidspace, o/
[16:52] <mattyw_> ericsnow, ping?
[16:52] <ericsnow> mattyw_: hey
[16:52] <mattyw_> ericsnow, hey hey, what's the current process for proposing a branch with a prereq?
[16:53] <ericsnow> mattyw_: Do it like normal and then use "rbt post -u --parent ..." afterward
[16:54] <mattyw_> ericsnow, perfect thanks very much
[16:54] <ericsnow> mattyw_: np :)
[16:59] <mattyw_> cmars, http://reviews.vapour.ws/r/563/
[18:13] <voidspace> g'night all
[18:33] <jw4> on call reviewer today available to PTAL?  Very small change - fully tested with juju-core (small fix to juju-core to follow this update) http://reviews.vapour.ws/r/565/
[18:34] <natefinch> jw4: what have we learned?
 never use regular expressions </chorus>
[18:36] <katco> natefinch: that is a rather firm stance to take
[18:36] <natefinch> katco: perhaps only slightly too firm.  Never use regular expressions longer than 10 characters?
[18:37] <katco> natefinch: i can at least consider that statement :)
[18:37] <katco> natefinch: very useful: http://www.colm.net/open-source/ragel/
[18:38] <jw4> natefinch: lol!
[18:38] <natefinch> jw4: haha, did we not even have a positive testcase for that?
[18:38] <natefinch> wait wait wait
[18:38] <jw4> natefinch: blush
[18:39] <natefinch> why is IsValidUUIDString not just
[18:39] <jw4> katco: interesting link
[18:39] <natefinch> _, err := utils.UUIDFromString(mystring)
[18:39] <katco> jw4: it's a very neat project
[18:39] <natefinch> return err == nil
[18:40] <jw4> natefinch: that's actually how this was caught
[18:40] <jw4> the check was passing but the UUIDFromString was panicing
[18:40] <natefinch> in fact... why do we even have an IsValidUUIDString?
[18:41] <jw4> natefinch: no answer except that we just started using that method too
[18:41] <jw4> natefinch: we could switch to the _,err approach and remove the bool version
[18:41] <natefinch> jw4: please do... there's no sense having two implementations which can get out of sync
[18:42] <jw4> natefinch: I'll have a juju core PR to follow this, and now a names package follow up too
[18:43] <natefinch> jw4: cool, sorry to give you more work, but it's definitely the better fix
[18:43] <jw4> natefinch: no problem - I'm semi blocked on CI anyway
[18:46] <perrito666> I am all for regexes as long as you have a test battery that covers every possible case for them :p
[18:47]  * perrito666 gets a friend to bring an en_US kb from ny and jumps of happiness like mario with a new coin
[18:51] <jw4> natefinch: the regex is useful because it's an easy way to verify that the 15th char is a '4' and the 20th char is one of '8','9','a','b'
[18:51] <jw4> natefinch: we  could do that check manually but the regex actually seems simpler
[18:54] <jw4> natefinch: utilis.UUIDFromString uses the regex to verify those special characters, and then the rest is just simple hex decoding
[18:54] <natefinch> jw4: you think that 73 character regex is simpler than  s[14] == '4' && switch s[19] { case  '8','9','a','b':    default: error } ?
[18:54] <jw4> natefinch: ;-p
[18:54] <jw4> natefinch: okay
[18:55] <natefinch> jw4: also, what's up with the 4 and 89ab thing anyway?
[18:55] <jw4> natefinch: something to do with valid UUID per a spec somewhere
[18:55] <jw4> natefinch: fwereade_ gave me a link but I forget it
[18:55] <jw4> natefinch: the 4 is a version number
[18:56] <perrito666> mm uuid validator, I seem to recall we have code that checks that
[18:56] <natefinch> ahh yeah, I see some comments about version 4
[18:56] <Makyo> sinzui, ping
[18:56] <jw4> natefinch: btw, we also have to validate length if we remove the regex.  I'll update the PR with those changes
[18:56] <sinzui> hi Makyo
[18:56] <jw4> perrito666: other than in the utils package?
[18:57] <perrito666> jw4: nope, same ugly regexp
[18:57] <Makyo> sinzui, was told to get in touch with you re: juju beta PPA
[18:57] <natefinch> jw4: is there a reason we're rolling our own and not using something like https://godoc.org/code.google.com/p/go-uuid/uuid
[18:57] <perrito666> I recall having a fake uuid breaking and trying to decypher that regex to figure out wtf until I just read the rfc
[18:58] <sinzui> Makyo, https://launchpad.net/~juju/+archive/ubuntu/devel ?
[18:58] <jw4> perrito666: fwereade_ pointed me to that package - we can ask him about go-uuid
[18:58] <Makyo> sinzui, ah, thanks, we didn't know the location.
[19:07] <perrito666> jw4: I know natefinch will not agree with me, but I would definitely go for something more in the side of http://pastebin.ubuntu.com/9348308/
[19:07] <perrito666> if I actually had to use that regex
[19:07] <perrito666> and properly name the blocks
[19:07] <perrito666> aaaand of course a nice doc explaining it a bit more
[19:07] <jw4> perrito666: I'm okay with that - if natefinch agrees I will go that route
[19:12] <natefinch> jw4: until such a time as we can use something more complete, that's fine... except you don't need sprintf
[19:12] <jw4> natefinch: +1
[19:16] <perrito666> duh, the sprintf was from the first implementation I wrote
[19:17] <perrito666> but I found it clearer by just concatenating the strings
[19:17] <jw4> perrito666: yeah
[19:17] <jw4> perrito666: natefinch: also - I'm guessing we want to allow upper case letters too
[19:17] <perrito666> jw4: It would be wiser to de-capitalize the uuid before matching
[19:19] <jw4> perrito666: kk
[19:19] <perrito666> can I trigger the compilation of a _windows file on linux??
[19:20]  * perrito666 knows the answer and starts a windows+
[19:20] <jw4> GO_ARCH=windows?
[19:21] <natefinch> jw4: so, yes, but I think you have to have the go tools built to run that way
[19:22] <jw4> natefinch: I see.  FWIW, I'm coming back to a non-regex approach again! :)
[19:22] <natefinch> jw4: http://dave.cheney.net/2013/07/09/an-introduction-to-cross-compilation-with-go-1-1
[19:23] <jw4> natefinch: cool.
[19:24] <natefinch> jw4: once the tools are built `GOOS=windows go build` will work
[19:27] <perrito666> if you build them, they will compile...
[19:43] <jw4> natefinch, perrito666 : http://reviews.vapour.ws/r/565/
[19:53] <perrito666> jw4: I lgtmd but you need davecheney to bless it
[19:53] <perrito666> jw4: I proposed the change :p I was not going to be against it
[20:24] <perrito666> we have tests failing because my windows is in spanish :s
[21:02] <perrito666> is reviewboard on for utils?
[21:04] <perrito666> it is, and it changes the description, how nice
[21:04] <perrito666> http://reviews.vapour.ws/r/567/
[21:32] <perrito666> ok EOD for me, cheers everyone
[21:32]  * menn0 is sick of CI being blocked
[21:33] <menn0> is anyone looking at bug 1398473?
[21:33] <mup> Bug #1398473: backup output format changed <backup-restore> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398473>
[21:36] <perrito666> menn0: lemme check
[21:36] <menn0> perrito666: either the test needs to change or the output filename
[21:37] <menn0> perrito666: the other blocker is also backup related
[21:37] <menn0> perrito666: bug 1398448
[21:37] <mup> Bug #1398448: ERROR while creating backup archive: while bundling state-critical files: write to tar file failed: open /var/log/juju/all-machines.log: no such file or directory <backup-restore> <ci> <ha> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398448>
[21:37] <menn0> perrito666: seems more serious
[21:37] <perrito666> ericsnow: did you merge the deprecation of backup before restore is merged?
[21:38] <ericsnow> perrito666: yes
[21:38] <perrito666> ericsnow: old restore will not restore the new backups
[21:38] <ericsnow> perrito666: why not?
[21:38] <perrito666> ericsnow: are you dumping with ooplog?
[21:38] <ericsnow> perrito666: the oplog doesn't matter
[21:39] <perrito666> ericsnow: ?
[21:40] <ericsnow> perrito666: the dumped oplog will just get ignored, giving exactly the same result as if dumped without the oplog
[21:40] <perrito666> ericsnow: except that loosing everything that happened between the beginning and end of juju backups create
[21:40] <perrito666> old backup would stop the state server while backing up
[21:41] <ericsnow> perrito666: I see
[21:41] <perrito666> yes, I know, its a PITA
[21:41] <perrito666> :)
[21:41] <ericsnow> perrito666: however, restore will land in 1.22 so it won't matter, right?
[21:42] <ericsnow> perrito666: is the backup archive (with oplog) breaking tests?
[21:42] <ericsnow> perrito666: that bug is related to the filename, not the archive content
[21:42] <perrito666> ericsnow: no that is just a side effect I noticed while looking the other issue
[21:42] <ericsnow> perrito666: k
[21:42] <perrito666> ericsnow: I believe a change in the test is in order, unless there is some strict rule on the file name
[21:43] <ericsnow> perrito666: looks like the test has some hard-coded regex for the filename that no longer works
[21:43] <perrito666> ericsnow: it has
[21:43] <perrito666> it expects it to have the old name
[21:44] <perrito666> sinzui: is this considered a break on backwards compatibility?
[21:44] <menn0> perrito666, ericsnow: if users of the old backup system have scripts which expect the old filename the change will break them
[21:44] <ericsnow> menn0: fair enough
[21:44] <menn0> perrito666, ericsnow: not sure how worried we should be about that though
[21:45] <perrito666> ericsnow: I presume you changed the script to call backups create, right?
[21:45] <menn0> wallyworld_: ^^^
[21:45] <ericsnow> perrito666: right
[21:45] <perrito666> ericsnow: you could add a mv at the end
[21:45] <perrito666> but, apart from that
[21:45] <perrito666> there is an underlying problem there
[21:46] <ericsnow> perrito666: I was thinking the same thing, though I don't like it :P
[21:46] <perrito666> ericsnow: although that script should be nuked before 1.22
[21:47] <ericsnow> perrito666: we have to keep the script around due to backward-compatibility
[21:47] <menn0> ericsnow: so what's the difference between the old and new style filenames?
[21:47] <sinzui> perrito666, menn0, ericsnow : I don't think we promised a naming convention for the backup file. The promise is a new file.
[21:47] <perrito666> it will produce a different format than previously and an unknowing user might try to restore one of these into an old juju (because of reasons)
[21:47] <ericsnow> menn0: the new filename has a .tar.gz suffix instead of .tgz, and now includes the env UUID in the filename
[21:48] <menn0> perrito666: that's a good point. given that the format has changed a new file name style is probably a good thing
[21:48] <ericsnow> menn0: actually, the env UUID isn't included in the filename
[21:48] <ericsnow> menn0: so just the suffix changed
[21:48] <menn0> ericsnow: ok
[21:49] <menn0> ericsnow: I agree that tar.gz is marginally better :)
[21:49] <perrito666> so this bug is on sinzui's side
[21:49] <menn0> so if the filename is staying the same let's change the test to expect the new filename
[21:49] <perrito666> although its our fault
[21:49] <perrito666> for not communicating properly the change
[21:50] <menn0> perrito666: we have access to the CI tests and can fix them too...
[21:50] <ericsnow> menn0: I'm all for just fixing the regex in the CI test :)
[21:51] <perrito666> menn0: we do, that is a particularly convoluted tests though :) because it tests an ugly plugin of juju
[21:51] <ericsnow> FWIW, I don't think there's a whole lot we can do about someone trying to restore a 1.22 backup using 1.20 or 1.21
[21:51] <perrito666> I am not sure for the case of backup, but it also might be expecting certain output from stdout
[21:51] <menn0> sinzui: remind me what the process for changing CI tests is
[21:52] <perrito666> ericsnow: we could print a hughe warning ni the script, that is it
[21:52] <sinzui> perrito666, abentley I agree that "juju-backup-20141202-141348.tar.gz" is still clearly a backup. We will loosen the regex
[21:52] <ericsnow> Doing so should still work fine, with the (very?) remote chance of introducing DB inconsistencies
[21:53] <perrito666> ericsnow: well, there will not be inconsistencies in CI, just in practice
[21:53] <abentley> sinzui: It is a regression, though.
[21:53] <sinzui> abentley, yep, I have targeted the bug and started a fix
[21:53] <abentley> sinzui: If it breaks my code, it can just as easily break someone else's.
[21:54] <ericsnow> perrito666: and probably very very rarely
[21:54] <menn0> ericsnow, perrito666: now what about the other issue relating to all-machines.log?
[21:54] <perrito666> ericsnow: I agree, in the current status of users for juju, although it is technical debt
[21:54] <perrito666> menn0: checking
[21:54] <menn0> ericsnow, perrito666: bug 1398448
[21:54] <mup> Bug #1398448: ERROR while creating backup archive: while bundling state-critical files: write to tar file failed: open /var/log/juju/all-machines.log: no such file or directory <backup-restore> <ci> <ha> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398448>
[21:55] <sinzui> abentley, there is no promise of a specific naming convention. The new name is very similar and We...specifically me was probably too specific.
[21:55]  * ericsnow looks
[21:56] <perrito666> menn0: that is odd, I have personally done that process a lot of times
[21:57] <ericsnow> abentley: shouldn't /var/log/juju/all-machines.log always be there when the CI tests run?
[21:57] <perrito666> ericsnow: albeit, we might just let it pass if a log file is not there
[21:57] <ericsnow> perrito666: we already do that for a couple other files
[21:58] <ericsnow> perrito666: but I'd expect all-machines.log to be there
[21:58] <perrito666> ericsnow: I think logs are not that critical as config and db
[21:59] <perrito666> ericsnow: although yes, it should be there
[21:59] <abentley> sinzui: I think that it is a regression to change even behaviour that was not promised.  Users will depend on anything they can get their hands on.
[21:59] <abentley> sinzui: But it's your call.
[21:59] <perrito666> abentley: I dont think this qualifies as a change in behavior
[21:59] <sinzui> abentley, no one told me a naming convention. I picked a test that satisfied myself
[21:59] <abentley> perrito666: Then why is our test broken?
[21:59] <perrito666> we make a backup and download a file in the same format, the name changes but that is an accident
[22:01] <perrito666> abentley: you made a reasonable assumption on the file name, yet it is a bit extremist to call it a regression
[22:01] <perrito666> the compression is the same, the format is the same, it is compatible
[22:02] <perrito666> abentley: ftr, we suck for not spec-ing that and I do apologize for it
[22:02] <perrito666> I am working on a spec
[22:02] <abentley> perrito666: Yes, and the same could be said if you had changed the commandline arguments.
[22:02] <abentley> perrito666: But just like this change, changing the commandline arguments would break scripts.
[22:03] <jw4> perrito666: thanks
[22:03] <abentley> perrito666: I'm not upset that you changed it, I'm just concerned to make sure we don't break our customers.
[22:04] <perrito666> abentley: I am trying to rubber duck with you, I am not yet decided on if we break or not our customers
[22:05] <abentley> perrito666: You've already said the regex makes a "reasonable assumption" about the filename format.  Do you think it is unlikely that any of our customers made the same assumption?
[22:06] <perrito666> abentley: yes, I was re-reading myself
[22:08] <wallyworld_> menn0: hi, did you need me?
[22:08] <perrito666> abentley: I concede, it is a reasonable assumption and we might need to maintain it.
[22:08] <abentley> perrito666: Okay, well sinzui has overruled me, so I won't waste everyone's time discussing it further.  EOD for me.
[22:09] <perrito666> oh you can override abentley :p that is a cool feature
[22:09] <perrito666> have a nice EOD
[22:09] <perrito666> ericsnow: sinzui well its your call guys
[22:09] <perrito666> I am EOD too
[22:10] <Ice-x> does anyone know where I can buy dvd movies with a low price on the internet ?
[22:10] <ericsnow> perrito666: goodnight
[22:10] <menn0> perrito666: ciao
[22:15] <jw4> davecheney: ptal http://reviews.vapour.ws/r/565/
[22:15] <jw4> davecheney: perrito666 said I should get your review on that.
[22:20] <thumper> Ice-x: wrong channel
[22:28] <mwhudson> thumper, davecheney, alexisb: https://groups.google.com/d/msg/golang-dev/2ZUi792oztM/UCA2V7Ul3nkJ
[22:30] <davecheney> jw4: /me looks
[22:31] <jw4> davecheney: ta
[22:33] <jw4> thanks davecheney :)  Comments duly noted
[22:34] <jw4> davecheney: actually - my preference would be to use https://code.google.com/p/go-uuid/ and not even mess with it ourselves - the only benefit is that our version is presumably lighterweight
[22:35] <davecheney> jw4: if someone else has alredy written a package to to dhtat
[22:35] <davecheney> use it
[22:35] <davecheney> juju clearly has no policy against external depenencies
[22:36] <jw4> davecheney: I think I floated that idea and fwereade_ or someone pointed me to the utils package instead (sorry if I'm falsely accusing you fwereade_ )
[22:37] <davecheney> jw4: meh
[22:37] <davecheney> what's better than having one of something ?
[22:37] <davecheney> having more than one of something!!
[22:37] <jw4> davecheney: lol
[22:37] <davecheney> code reuse involves less heroism
[22:42] <ericsnow> abentley: so about all-machines.log in CI, why would it be missing?
[22:43] <menn0> ericsnow: it could be a logging regression
[22:44] <ericsnow> menn0: perhaps; do we have any other CI tests that rely on all-machines.log
[22:44] <ericsnow> ?
[22:44] <menn0> ericsnow: where the log file isn't being created on some state servers?
[22:44] <menn0> ericsnow: just a guess
[22:44] <ericsnow> menn0: yeah
[22:44] <ericsnow> menn0: could be a permissions issue too
[22:45] <menn0> ericsnow: i'm just spinning up an HA env on ec2 now to check
[22:46] <ericsnow> menn0: cool
[22:47]  * ericsnow remembers abentley is EOD already
[22:50] <katco> wallyworld_: can someone stamp http://reviews.vapour.ws/r/569/
[22:50] <katco> oops, that's to anyone, not just wallyworld_
[22:50] <wallyworld_> katco: sure, looking
[22:50] <katco> wallyworld_: ty sir
[22:53] <wallyworld_> katco: land away
[22:53] <katco> wallyworld_: sweet. what the heck is the fixes syntax (or where can i find it)?
[22:53] <wallyworld_> $$fixes12345$$
[22:53] <katco> ty
[22:53] <wallyworld_> oops
[22:53] <wallyworld_> $$fixes-12345$$
[22:54] <wallyworld_> katco: ^^^
[22:54] <katco> it's too late! the damage is done!!
[22:54] <katco> oh, the horror!
[22:54] <wallyworld_> sorry
[22:54] <wallyworld_> it will be rejected
[22:54] <katco> i'm just kidding around :)
[22:54] <wallyworld_> i know
[22:55] <menn0> ericsnow: perms on all-machines.log could be the problem. on a new state server the logs dir looks like:
[22:55] <menn0> -rw------- 1 syslog adm    99219 Dec  2 22:54 all-machines.log
[22:55] <menn0> -rw------- 1 syslog syslog   883 Dec  2 22:52 ca-cert.pem
[22:55] <menn0> -rw------- 1 syslog syslog   589 Dec  2 22:52 logrotate.conf
[22:55] <menn0> -rwx------ 1 syslog syslog    83 Dec  2 22:52 logrotate.run*
[22:55] <menn0> -rw------- 1 syslog syslog 95478 Dec  2 22:54 machine-0.log
[22:55] <menn0> -rw------- 1 syslog syslog   895 Dec  2 22:52 rsyslog-cert.pem
[22:55] <menn0> -rw------- 1 syslog syslog   891 Dec  2 22:52 rsyslog-key.pem
[22:55] <katco> it will be interesting to see if this completely fixes andreas's issue
[22:56] <ericsnow> menn0: I recall thumper saying something about log permissions recently
[22:56] <ericsnow> menn0: which is what made me think of it
[22:57] <menn0> ericsnow: but the backup runs as root right?
[22:58] <ericsnow> menn0: I expect the issue of permissions partly depends on the user running the CI tests
[22:59] <ericsnow> menn0: I don't know how they are set up
[22:59] <ericsnow> menn0: I'm less convinced it's a permissions issue (the error message is "no such file or directory")
[23:00] <menn0> ericsnow: I'm pretty sure these tests run against a separate ec2 env basically from the end user's persepective
[23:01] <menn0> ericsnow: i've just got that ec2 env up
[23:02] <menn0> ericsnow: there's no all-machines.log on the 2 new state servers created by ensure-availability
[23:02] <menn0> ericsnow: so that could be the problem
[23:02] <ericsnow> menn0: ah, yep, and from that perspective it's the API server process that is running the backup
[23:03] <menn0> ericsnow: ok the all-machines.log is there now on the new machines
[23:03] <menn0> ericsnow: it just takes a while to appear
[23:03] <menn0> ericsnow: that's probably the issue
[23:03] <ericsnow> menn0: so if one of them runs the backup we'd get the failure we're seeing
[23:03] <menn0> ericsnow: which state server runs the backup? the master or the API server that handled the request?
[23:03] <ericsnow> menn0: so a race condition of sorts?
[23:04] <ericsnow> menn0: the one that handled the request
[23:04] <menn0> ericsnow: yeah, probably a race condition
[23:04] <menn0> ericsnow: i've looked at the test and it waits for all machines to be "started" after ensure-availability
[23:05] <menn0> ericsnow: but when I looked all machines were in the started state
[23:05] <menn0> ericsnow: but the all-machines.log wasn't there yet
[23:08] <ericsnow> menn0: but it showed up after a little while, right?
[23:08] <menn0> ericsnow: yep
[23:08] <ericsnow> menn0: so in practice this isn't likely to bite users
[23:08] <menn0> ericsnow: looking at the logs for the rsyslog worker, there might be at least a 20s gap
[23:09] <menn0> ericsnow: it would suck though if something wasn't quite right with rsyslog meaning that backup couldn't be taken
[23:09] <ericsnow> menn0: agreed
[23:10] <menn0> ericsnow: maybe the backup system should be tolerant of missing log files
[23:10] <ericsnow> menn0: we already have precedent
[23:11] <ericsnow> menn0: what would be the importance of the logs on the restored host
[23:11] <ericsnow> menn0: do we have code that relies on all-machines.log?
[23:12] <ericsnow> menn0: did the /var/log/juju directory exist even when the log file was missing?
[23:15] <menn0> ericsnow: yes /var/log/juju was there with the local machine-N.log and the key files for logging
[23:15] <menn0> ericsnow: I don't think anything in Juju relies on all-machines.log except the debug-log feature
[23:15] <menn0> ericsnow: I don't think it's at all essential for the backup
[23:15] <ericsnow> menn0: okay, good
[23:16] <menn0> wallyworld_: do you agree that it isn't essential that all-machines.log is included in a backup?
[23:16] <wallyworld_> menn0: do we include other logs?
[23:17] <wallyworld_> i can see that it would be useful, but not strictly necessary to get a system up and running again
[23:17] <ericsnow> menn0: k
[23:18] <ericsnow> menn0: I'm working up a patch to allow for missing log files
[23:18] <ericsnow> wallyworld_, menn0: we also back up machine-0.log
[23:19] <katco> wallyworld_: (when you get a chance) regarding your comment about tests for apiserver/leadership regarding permissions: can i just stub out an authorizer that returns a failure and ensure the code behaves correctly?
[23:19] <wallyworld_> so in that case why not include all-machines.log
[23:19] <wallyworld_> katco: i think so - from memory, the other similar tests use a fakeAuthorizer, but i'd need to check to be sure
[23:19] <ericsnow> wallyworld_: all-machines.log does get backed up
[23:20] <katco> wallyworld_: cool, ty.
[23:20] <wallyworld_> ericsnow: ok
[23:20] <ericsnow> wallyworld_: the problem is where it didn't exist yet in the HA case
[23:20] <wallyworld_> doesn't it always exist?
[23:21] <ericsnow> wallyworld_: apparently it doesn't show up immediately on the extra hosts under HA
[23:21] <ericsnow> wallyworld_: menn0 saw something like a 20s gap
[23:21] <ericsnow> wallyworld_: and that was enough for CI to fail
[23:22] <wallyworld_> oh, i see
[23:22] <wallyworld_> so why doesn't backup use the primary state server?
[23:26] <ericsnow> wallyworld_: we use whichever one cmd.Command.NewAPIRoot gives us
[23:27] <ericsnow> wallyworld_: you're suggesting we force that to machine 0?
[23:27] <wallyworld_> ericsnow: the state server itself that receives the request could redirect to the primary (not necessarily machine 0). i'm just thinking out loud
[23:28] <ericsnow> wallyworld_: yeah, menn0 hinted at the same thing
[23:28] <menn0> wallyworld_: that seems pretty fragile
[23:28] <wallyworld_> why fragile?
[23:28] <menn0> what happens if the master changes
[23:28] <menn0> if state servers have come and go
[23:29] <menn0> what is machine-0 anyway?
[23:29] <wallyworld_> the cluster knows which is master
[23:29] <wallyworld_> surely?
[23:29] <wallyworld_> i didn't say machine 0
[23:29] <wallyworld_> i said NOT necessarily machine 0
[23:29] <menn0> the cluster does know who master is
[23:29] <wallyworld_> right
[23:29] <wallyworld_> so if the machine that receives the request is not master, it redirects to master
[23:29] <menn0> it seems like a lot of extra complexity just to get a non-essential log file
[23:30] <menn0> which will be there most of the time anyway
[23:30] <menn0> it's only not there for the first 20s ish after a new state server comes up
[23:30] <wallyworld_> trouble is, people may want it (I can't make that call), and the backup needs to be consistent
[23:30] <wallyworld_> it can't sometimes include all-machines and sometimes not
[23:31] <menn0> I guess the oldest state server (which may or may not be the master) will have the most complete all-machines.log
[23:31] <wallyworld_> i'm not saying we must include it etc - just offering an argument of why we should
[23:32] <ericsnow> wallyworld_: agreed that the contents of a backup archive should be consistent
[23:32] <wallyworld_> there's an argument we could leave it out
[23:32] <menn0> wallyworld_: the problem is, I've seen the mongo master change even when the master doesn't go away
[23:33] <wallyworld_> menn0: ericsnow: so, i think we need to ask stakeholders - do they want all-machines, and can we exclude it
[23:33] <wallyworld_> hopefully they'll say they don't want it - they just want a backup so that a system can be restored if needed
[23:34] <wallyworld_> for now, to remove the regression, we can ignore all-machines if it doesn't exist
[23:34] <wallyworld_> or just exclude always
[23:34] <ericsnow> wallyworld_: why was it included in the first place (along with machine-0.log)?
[23:35] <wallyworld_> don't know for sure, i think everything in logs was just included
[23:35] <menn0> wallyworld_: I don't think we're talking about excluding all-machines.log are we
[23:35] <menn0> wallyworld_: just not aborting the backup if isn't there
[23:36] <wallyworld_> menn0: well, we could exclude it so that we always have a consistent backup
[23:36] <menn0> wallyworld_: i.e. it'll get included if it's there but otherwise there will just be a warning or something but the backup continues
[23:36] <menn0> wallyworld_: ok fair enough
[23:36] <wallyworld_> i'd rather not have it there sometimes and not others
[23:36] <wallyworld_> that's IMHO
[23:36] <menn0> wallyworld_: fair enough
[23:37] <ericsnow> menn0: I'd be nervous about a backup archive having it only some of the time
[23:37] <menn0> wallyworld_: if we do decide it should be there I think it still shouldn't cause the backup to abort if it's not there
[23:37] <wallyworld_> agreed
[23:37] <menn0> wallyworld_: not being able to backup if rsyslogd isn't working would be kinda shite
[23:38] <wallyworld_> yep, i hope stakeholders just say we can exclude it
[23:38] <menn0> wallyworld_: so how about ericsnow makes an immediate change to make the log optional so we can get CI unblocked
[23:38] <wallyworld_> menn0: yep, that was my suggestion above
[23:38] <menn0> wallyworld_: and then once we've heard more from stakeholders there might be a further change
[23:38] <menn0> wallyworld_: ok. i missed that.
[23:38] <wallyworld_> and ericsnow needs to ask nate to engage stakeholers to see what we need to do
[23:39] <wallyworld_> we need to explain to them the implications etc so they understand the issues
[23:41] <menn0> ericsnow: i've just noticed there's another cause for this CI test failure too
[23:41] <menn0> ericsnow: ERROR unable to get DB names: EOF
[23:41] <menn0> ericsnow: that's from a different run. http://juju-ci.vapour.ws:8080/job/functional-ha-backup-restore/1165/console
[23:41] <ericsnow> wallyworld_: will do
[23:41] <menn0> ericsnow: looks unrelated to the all-machines.log issue
[23:41] <wallyworld_> ty
[23:42] <ericsnow> menn0: I'll take a look in a sec
[23:46] <menn0> ericsnow: ok, i'll leave you to it
[23:46] <ericsnow> menn0: k
[23:48] <menn0> sinzui, wallyworld_: do you think we can the all-machines.log backup CI blocker to no longer be a regression. it's really more of a race condition that's only likely to happen in the test.
[23:48] <wallyworld_> menn0: i do but it's not my ultimate call
[23:55] <ericsnow> wallyworld_, menn0: http://reviews.vapour.ws/r/572/
[23:56] <wallyworld_> ericsnow: why machine0Log?
[23:56] <wallyworld_> it won't be always machine0 surely?
[23:59] <ericsnow> wallyworld_: right