[00:06] thumper: dave around today?
[00:22] wallyworld: yeah
[00:22] thumper: i'm hoping he can look at bug 1335328
[00:23] * thumper looks at _mup_
[00:23] it's a follow up from work done last week to fix a windows issue
[00:23] he did the fix, but curtis says might still be an issue there of some sort
[00:24] hmm... I don't think that davecheney has a test setup to be able to run the tests
[00:24] do we have a machine he can use?
[00:24] for windows that is
[00:25] not sure, he seemed to be able to do the original fix, thought he may be able to follow up
[00:26] but i can also poke the cloud base guys via nate's squad
[00:29] wallyworld: all I did was looked at where "unknown" was being set as the series
[00:29] and tried to fix that
[00:30] davecheney: ok, np. the cloud base guys should fix it since they introduced it
[00:31] wallyworld: ok,
[00:31] osversion.go is all fucked up
[00:31] bunch of logic in there that is only called from one operating system
[00:32] yeah :-( i'm pissed it didn't even compile at first
[00:32] com.Commands = `(gwmi Win32_OperatingSystem).Name.Split('|')[0]`
[00:32] out, _ := exec.RunCommands(com)
[00:32] if out.Code != 0 {
[00:32] return "unknown"
[00:32] }
[00:32] :-(
[00:32] ^ so if gwmi is not installed, and they don't check the error
[00:32] then give up and say it's unknown
[00:32] nate was going to follow up last week
[00:32] not sure of the status and he is away this week
[00:33] not sure how the fark it got past review
[00:33] wallyworld: thumper so, what do you want me to do ?
[00:33] debug it via trial and error ?
[00:33] ^ i'm fine with this
[00:34] davecheney: i'll follow up with nate's team
[00:34] wallyworld: that blocks me for today
[00:34] if you are blocked and can fix that would be great :-)
[00:34] understood
[00:34] what is gwmi?
[00:34] i won't get to talk to anyone east coast us till much later this evening
[00:34] nfi
[00:35] is that powershell?
[00:35] y
[00:35] why are they ignoring the error?
which I'm assuming is in the _
[00:35] hulk smash
[00:36] * thumper shakes his head
[00:50] davecheney: do you just need a windows key?
[00:50] davecheney: I know alexisb was asking about getting some people on the MSDN side of things last week
[00:50] davecheney: but I can get you a key if you need one
[00:55] rick_h_: thanks, let's see what happens today
[00:55] davecheney: k
[01:14] waigani: https://github.com/juju/juju/pull/182#issuecomment-47480573 -- merge failed, please don't set bug Fix Committed until it has
[01:14] otherwise we may lose track of it
[01:15] axw: hiya, i don't think your fix for replicaset startup (using Direct: true) has landed in 1.20 yet?
[01:15] wallyworld: I backported something... can't remember which something. I'll check
[01:15] axw: right, sorry about that
[01:15] wallyworld: nope, not in 1.20. I'll backport now
[01:16] axw: ta, i was looking at CI and local provider upgrade still intermittent
[01:16] and then i checked the commits
[01:17] wallyworld: did you see the azure bug I was working on Friday evening? I had put it against 1.20, cos it's a pretty bad first impression of the azure provider
[01:17] it's not new though
[01:17] AFAICT anyway
[01:18] axw: yup, and i agree it should be worked on regardless of any misgivings in the email to the dev list :-)
[01:19] wallyworld: what email?
[01:20] axw: ah, just went to team leads, it said that "This issue is more than 6 months old. I am not inclined to divert efforts to fix regressions."
from curtis
[01:21] but i think if we can get a fix in place and it is low risk we should, since we will be highlighting all these other azure improvements
[01:21] and so people can be expected to try out azure provider
[01:21] and we don't want them to hit this bug
[01:21] ok
[01:21] well, that's imo anyway
[01:22] wallyworld: so, one possible (kinda big, but I think low risk) way to fix it is to disable apt-get upgrade
[01:22] it also makes bootstrap significantly faster, at the expense of not having up-to-date packages
[01:23] hmm, that would then be inconsistent with other providers and i do think we'd want latest packages
[01:23] wallyworld: I meant across the board
[01:24] it's a bit arbitrary what we upgrade to, so I just wonder about the value
[01:24] as opposed to just taking what the server team have released
[01:24] hmmm. i can see the point
[01:25] but we can't decide such things
[01:25] there would be pros and cons
[01:25] wallyworld: thumper
[01:25] if we can make the current solution work that would be the preference for 1.20
[01:26] i'm going to change version/osVersion() to return a string, error
[01:26] then all the places it returns an error
[01:26] i will wrap it in a mustOsVersion function
[01:26] wallyworld: sure, I will keep looking to see what's the deal. also, I'm OCR today so gotta take time out from fixing things
[01:26] it is clear that juju cannot operate if it doesn't know the series
[01:26] so we cannot continue at that point
[01:26] axw: SURE, NP
[01:26] bah
[01:26] sorry capslock fail
[01:26] :)
[01:26] davecheney: how is that really changing anything?
[01:27] I guess it is more idiomatic
[01:27] thumper: we'll know where the unknown is coming from
[01:27] rather than just being a string "unknown"
[01:27] * thumper nods
[01:27] fair enough
[01:27] that propagates through the app until it finally hits something that tries to use it and fucks out
[01:27] is it possible to catch the errors instead of panicking?
[01:27] wallyworld: not in the places osVersion is called
[01:27] sorry, not in all the places
[01:28] ok
[01:28] so you'd have two forms
[01:28] osVersion => string, error
[01:28] mustOsVersion => string or panic
[01:29] not being familiar with the code, hard to say, but 1 way is always preferable, but if not better to panic than let an impossible value propagate
[01:29] wallyworld: -EDOUBLENEGATIVE
[01:30] did i leave out a comma
[01:30] wallyworld: it wasn't clear
[01:30] are you for panic'ing or not
[01:31] prefer 1 way of doing things, but if that's not possible, then i guess a panic is ok since it prevents impossible value being propagated
[01:31] func macOSXSeriesFromKernelVersion(getKernelVersion kernelVersionFunc) string {
[01:31] majorVersion, err := kernelToMajor(getKernelVersion)
[01:31] look at this !>?>!
[01:31] if err != nil {
[01:31] logger.Infof("unable to determine OS version: %v", err)
[01:31] return "unknown"
[01:31] sigh
[01:31] how the fark did this get past review
[01:31] so, unknown is being used as a sentinel for error
[01:31] but it happens to be the same type as the valid value
[01:31] so fits through the hole and leaks
[01:37] func (*CurrentSuite) TestCurrentSeries(c *gc.C) {
[01:37] s := version.Current.Series
[01:37] if s == "unknown" {
[01:37] s = "n/a"
[01:37] }
[01:37] more evidence that "unknown" is used as a sentinel for error
[01:41] menn0: before I land, can you please see my reply on https://github.com/juju/juju/pull/188
[01:41] menn0: if you've got a suggestion on how to improve that, I'm happy to incorporate
[01:41] axw: I was just looking at it now
[01:42] axw: give me 2 mins
[01:42] sure
[01:42] ta
[01:43] axw: I don't quite understand why it's more difficult to do the mongo setup/upgrade work from Run() instead of inside the state worker
[01:44] menn0: the state worker is started indirectly
[01:44] sure the mongo setup work can be done just before the first StartWorker call in Run()
[01:44] surely even
[01:44]
there's a "state starter" worker, which starts the state worker when it knows it's a state server
[01:45] axw: ok, right. it's getting clearer to me now
[01:45] menn0: it could, but it'll involve checking for state server info, if we don't have it connecting to the API
[01:46] axw: so a jujud can theoretically become a state server any time?
[01:46] theoretically
[01:46] ok I get it then.
[01:46] there was talk about upgrading non-state servers to state servers
[01:46] we haven't done that yet
[01:46] what you've done makes more sense now.
[01:46] okey dokey
[01:46] * menn0 checks the PR one more time
[01:47] axw: you land the work on both 1.20 and trunk, right
[01:47] wallyworld: yes I will do
[01:47] great, once all the 1.20 fixes in the area of upgrades are landed, i'll see if i can run the upgrade job on CI by hand
[01:48] to get a feel for if it will be happy
[01:49] axw, wallyworld: the PR looks ok to me now that I understand the way the state worker is started
[01:49] \o/
[01:49] menn0: cheers
[01:50] axw, wallyworld: I have been manually doing what the test that was failing in CI was doing (well with a little script)
[01:50] axw, wallyworld: I'm happy to test it again once it lands.
[01:50] waigani: are you still working on https://github.com/juju/juju/pull/66 ? would you please put "WIP" in the title if it's not ready for review?
[01:50] failing test even ...
[01:50] menn0: thanks
[01:50] and it's been failing for you too?
[01:51] wallyworld: sporadically yes (it's a race after all)
[01:52] yeah. cool. just checking it *was* failing at least sometimes
[01:52] davecheney: https://github.com/juju/juju/pull/156 <- is this ready to land?
[01:53] wallyworld: I was never able to test the exact scenario because my machine-0 is trusty but thumper has given me a great tip about how to test with the local provider on canonistack (which gives you all precise machines)
[01:53] ok
[01:53] yes, i believe so
[01:54] tim and william have resolved their differences
[01:54] it's waiting for the 1.20 trunk to stabilise before I merge more difficult to backport changes
[01:54] ok
[01:54] menn0: did you log into a canonistack instance and run a local provider env from there? or was there some other magic involved?
[01:55] axw: done. also added comment explaining that I'm waiting for the identity stuff to be sorted.
[01:55] waigani: thanks
[01:55] wallyworld: no I have been testing locally so far but will test with a local provider on a canonistack instance once this PR lands
[01:55] ok
[01:55] (that was thumper's trick)
[02:12] waigani: you have failing tests in your branch if you didn't realise
[02:14] axw: on PR 182? I've already fixed it
[02:16] waigani: not according to the bot
[02:16] http://juju-ci.vapour.ws:8080/job/github-merge-juju/289/consoleFull
[02:16] there's a bunch of the usual mongo failures after your ones in cmd/juju
[02:18] axw: okay, on it
[02:32] Anyone else seen this test failure in state? export_test.go:58:
[02:32] c.Assert(err, gc.IsNil)
[02:32] ... value *errors.errorString = &errors.errorString{s:"cannot create log collection: local error: bad record MAC"} ("cannot create log collection: local error: bad record MAC")
[02:33] s.ConnSuite.SetUpTest(c) fails
[02:33] mac relates to the tls handshake between mongodb and juju
[02:33] seems intermittent
[02:33] race condition?
[02:34] I've heard in the past that there's a problem with the mongo 2.4 TLS support, and that's fixed in 2.6
[02:34] we have seen that error occasionally, I think it used to be worse
[02:36] wallyworld: https://github.com/juju/juju/pull/194
[02:36] axw: ok
[02:37] the hell
[02:37] odessa(~/src/github.com/juju/juju/version) % go test
[02:37] # github.com/juju/juju/version
[02:37] import cycle not allowed in test
[02:37] package github.com/juju/juju/version (test) imports github.com/juju/juju/testing imports github.com/juju/juju/environs/config imports github.com/juju/juju/version
[02:37] FAIL github.com/juju/juju/version [setup failed]
[02:37] tests for this package don't even pass on darwin at the moment
[02:40] thumper: looking
[02:41] +1
[03:00] wallyworld: does the merge command work there, or manual
[03:00] works
[03:37] uh, what the f
[03:37] juju/version tests don't pass at all on osx
[03:37] thumper: OMG
[03:37] i know how this happened
[03:37] someone started to use version.Current in cmd/juju
[03:38] so that meant version.Current.Series had to make sense on any _client_ platform
[03:49] https://github.com/juju/juju/pull/195
[03:54] reg := regexp.MustCompile("^" + key)
[03:54] match := reg.MatchString(series)
[03:54] the award for the most egregious use of regex
[04:00] davecheney: wha???
[04:00] that makes no sense
[04:00] why?
[04:00] strings.HasPrefix(series, key) anyone ?
[04:10] davecheney: key doesn't contain pattern characters?
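The question about pattern characters is the crux of whether the two approaches are interchangeable. A small sketch, assuming nothing about the real juju code beyond the two lines quoted above, shows where an anchored regexp and strings.HasPrefix agree and where they differ. The helper names and the sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchPrefixRegexp mirrors the quoted approach: compile "^"+key and match.
// The key is treated as a pattern, so metacharacters change its meaning.
func matchPrefixRegexp(series, key string) bool {
	return regexp.MustCompile("^" + key).MatchString(series)
}

// matchPrefixQuoted escapes the key first, so it is matched literally.
func matchPrefixQuoted(series, key string) bool {
	return regexp.MustCompile("^" + regexp.QuoteMeta(key)).MatchString(series)
}

// matchPrefix is the plain string check suggested in the channel.
func matchPrefix(series, key string) bool {
	return strings.HasPrefix(series, key)
}

func main() {
	// Hypothetical OS name string; the key has no metacharacters.
	name := "Microsoft Windows Server 2012 R2 Datacenter"
	key := "Microsoft Windows Server 2012"
	fmt.Println(matchPrefixRegexp(name, key), matchPrefix(name, key)) // true true

	// A key containing "." behaves differently under the unescaped regexp.
	fmt.Println(matchPrefixRegexp("abc", "a.c")) // true
	fmt.Println(matchPrefixQuoted("abc", "a.c")) // false
	fmt.Println(matchPrefix("abc", "a.c"))       // false
}
```

For keys that are plain literals, HasPrefix gives the same answer without compiling a pattern on every call; if a regexp is kept for some reason, QuoteMeta makes the literal intent explicit.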
[04:11] axw: no idea
[04:11] can't run the tests
[04:11] well, i substituted osversion_windows for osversion_linux and the tests pass
[04:12] I mean, HasPrefix is only equivalent if key doesn't have any meta-chars
[04:15] var windowsVersions = map[string]string{ "Microsoft Hyper-V Server 2012 R2": "win2012hvr2", "Microsoft Hyper-V Server 2012": "win2012hv", "Microsoft Windows Server 2012 R2": "win2012r2", "Microsoft Windows Server 2012": "win2012", "Windows Storage Server 2012 R2": "win2012r2", "Windows Storage Server 2012": "win2012",
[04:15] they aren't regexes
[04:15] the original author just used the wrong hammer
[04:15] okey dokey
[04:36] * thumper afk taking kids to ice skating
[04:36] bbl
=== vladk|offline is now known as vladk
[05:21] axw: jeez, not having much luck with session closed errors :-(
[05:21] indeed
[05:21] driving me crazy
[05:22] we gotta fix those, maybe we can spike on it next week
[05:22] yup, sounds like a plan
[05:29] or this week if we get time after 1.20 goes out
[05:45] wallyworld: axw how can I go from a commit, to the review that supports that commit ?
[05:46] ie, given a hash, how can I find the PR for that hash
[05:46] hmmm, not sure off hand
[05:46] wallyworld: you're probably not going to like the answer
[05:46] which is?
[05:46] you can't
[05:46] wtf
[05:47] wallyworld: https://github.com/juju/juju/commit/f1e95e6507a30fab8a31508f46bfd70753ef452a#diff-0d404a754ae93c99bdaa41896be9ce3e
[05:47] why is github so popular if it is missing so many key features
[05:47] i cannot find the PR for this commit
[05:47] wallyworld: unit testing isn't popular in PHP
[05:47] they don't miss what they don't know
[05:47] sigh
[05:48] wallyworld: can we make the bot insert the link to the PR in the commit message
[05:48] we can
[05:48] hmm, maybe I just need to find the "merge commit"
[05:48] davecheney: yeah, follow the parents up
[05:48] https://github.com/juju/juju/commits/master/version/osversion_windows.go
[05:48] then the PR for the merge
[05:49] or.. hmm
[05:49] that didn't work
[05:49] nup
[05:50] i cannot find who reviewed this file
[05:50] https://github.com/juju/juju/commits/master/version/osversion_windows.go
[05:50] davecheney: https://github.com/juju/juju/pull/95
[05:50] I think
[05:51] so, when github does the merge you get
[05:51] https://github.com/juju/errors/commit/6b882ebdb3eb178615c864192a2c1b4502ed86c4
[05:51] when our bot does it
[05:51] we get no record
[05:52] oh the irony
[05:52] https://github.com/juju/juju/pull/95#discussion_r14192953
[05:52] there's plenty of those merge commits in the log
[05:52] from the bot
[05:52] nate spotted the problem 5 days ago
[05:52] but the OP never came back to followup
=== vladk is now known as vladk|offline
[06:46] morning dimitern
[06:47] morning jam
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
[07:52] morning all
[07:52] hey voidspace
[07:53] voidspace, we should have a quick chat to bring you up-to-speed with the current networking / ipv6 work
[07:53] dimitern: yes, let me get coffee etc first and I'll ping you
[07:54] voidspace, sure, no rush
[07:54] dimitern: thanks :-)
[07:56] morning
[07:56] jam: ping
[08:14] dimitern: ok, hangout?
[08:14] voidspace, just a sec
[08:14] dimitern: no problem
[08:19] jam: no need to pong back anymore, already got it
[08:46] wallyworld: finally found out the secret sauce to get azure to bootstrap with upgrades
[08:47] wallyworld: I measured the time for a bootstrap with/without upgrade, apt-get upgrade added 6 minutes
[08:47] 6 minutes
[08:47] I'll propose my fix and mail the list about making upgrade optional and off by default
[08:47] wow
[08:47] also want to test apt-get using eatmydata
[08:47] that may make it more reasonable
[08:47] ok
[08:48] i wonder if we can deploy an apt cache to azure
[08:48] or mirror
[08:50] wallyworld: I believe there is a mirror already
[08:50] cloud-init configures apt mirrors
[08:50] 6 minutes... doing what mostly?
[08:51] mgz: that's timed from before "apt-get upgrade" to after
[08:51] so... whatever apt-get upgrade is doing
[08:51] I really don't think not upgrading is an option..
[08:51] why?
[08:51] we have security fixes for a reason
[08:53] mgz: but we don't continue upgrading after it's provisioned, so it just seems so arbitrary to do it at that point
[08:53] mgz: if people want to keep it secure, then there should be something doing upgrades on a regular basis
[08:53] yeah, and we don't reboot for kernel either
[08:53] not just once and then that's it
[08:54] anyway, I'll test with eatmydata, maybe it won't be so bad
[08:54] axw: mgz: can we start the standup now?
[09:05] axw: standup?
[09:09] wallyworld: sorry brt
[09:39] jam: around?
just getting to looking at mongodb/juju-mongodb -> 2.6.x
=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
[09:56] wallyworld: the azure virtual network thing may have been coincidental
[09:57] it's not happening now
[09:57] ah
[09:59] maybe see how it goes over the next day or so
[10:02] yep
[10:03] could someone please review https://github.com/juju/juju/pull/196
[10:04] resolves a 1.20 issue
[10:05] axw, reviewed
[10:05] thanks dimitern
[10:05] * axw comes back later to handhold the bot
[10:08] wallyworld: I might just change it back anyway, so we at least don't regress
[10:08] rightio
=== vladk|offline is now known as vladk
[10:37] dimitern: ping
[10:37] dimitern: the "Container Addressability in EC2" section of the "Juju Networking Support Changelog and Roadmap" doc
[10:38] dimitern: says: Found a working procedure to spin up LXC containers, allocate a private IP from the host using EC2 API,
[10:38] dimitern: why are we using LXC on EC2?
[10:38] voidspace, we need addressable containers everywhere basically, EC2 is the first step
[10:39] dimitern: why do we need containers?
[10:39] dimitern: I mean, isn't EC2 essentially already a container
[10:39] voidspace, for higher density deployments
[10:39] dimitern: I understand the need for addressability
[10:39] dimitern: heh, containers within containers
[10:39] voidspace, no, EC2 instances are more like KVM machines than LXC containers
[10:40] dimitern: so they are containers...
[10:40] dimitern: ok, and containers on EC2 is primarily a networking issue?
[10:40] dimitern: I'm wondering why it's bundled with the networking story
[10:40] seems like a separate issue - unless the *primary* problem of "nested containers" is addressability
[10:41] voidspace, it is a networking issue - we can't get cloud-local ips for containers without additional work
[10:41] dimitern: ok, cool
[10:41] thanks
[10:41] so we use the host api to get a cloud local address for the contained container
[10:41] and providers need to support this
[10:42] voidspace, yes, if the provider supports addressable containers, it needs to implement AllocateAddress, so we can get an extra private ip for an instance, which we later assign to the container on that instance
[10:43] cool
=== gsamfira1 is now known as gsamfira
[10:51] dimitern, quick check: when we're adding the implicitly pre-existing networks to the model, will we be assuming that services require juju-public (if it exists) along with juju-private (which they have to have to communicate with the state servers)?
[10:51] voidspace: hey, welcome back
[10:53] fwereade, right, juju-public and juju-private will be created automatically post-bootstrap, and all instances will be implicitly on them, regardless what other networks might be specified
[10:53] perrito666: hey, hi
[10:53] perrito666: it was a nice relaxing time away - so I'm actually happy to be back
[10:53] perrito666: mostly just because I'm happy...
[10:54] perrito666: how were things?
[10:54] dimitern, juju-private is definitely required for all machines/services
[10:54] dimitern, juju-public, if it exists, should probably be required for state servers and services -- at least by default
[10:54] fwereade, yes, but it won't be part of the requested networks lists, it will just be assumed it is
[10:55] dimitern, hmm, that feels special-casey
[10:55] fwereade, it is pretty special :)
[10:55] dimitern, not sure it's special enough
[10:55] dimitern, juju-private, yes
[10:55] fwereade, yeah
[10:56] dimitern, juju-public (1) might not even exist and (2) might not be wanted for a number of services
[10:56] dimitern, even for the state server, potentially
[10:56] jam: chrome just crashed - sorry!
[10:56] fwereade, but since that's only important at the time expose was called, it should be fine
[10:57] dimitern, I think if it does exist juju-public should be the default for new machines/services, but I suspect that if we're being explicit about required networks we should not automatically tack on juju-public
[10:57] fwereade, why not?
[10:57] dimitern, ^juju-public seems like a very reasonable thing to ask for (eg) a super-secret db server
[10:58] dimitern, and if juju-public is always assumed that's an immediate contradiction
[10:58] morning, brb, updates want me to restart
[10:59] fwereade, if juju-public is available at all, we can use it when needed (during expose), for any service, right?
[10:59] fwereade, it won't get configured specifically
[11:00] jam: lost connection!
[11:00] fwereade, until we need it
[11:00] jam: but no, that ticket was dropped
[11:00] jam: it should be deleted
[11:00] sorry
[11:00] dimitern, I'm worried that being *unable* to say "don't allow this service on the public network" is a problem
[11:00] voidspace: deleted
[11:00] thanks
[11:00] jam1: I have no in progress tasks
[11:00] welcome back voidspace
[11:01] dimitern, and given that that network may or may not exist I think we should avoid special-casing it too much
[11:01] wwitzel3: hey, hi
[11:01] fwereade, let's think about this a bit
[11:01] dimitern, making it a default *if not otherwise specified*, but requiring juju-public explicitly if you also ask for other networks, feels more like what we need
[11:01] dimitern, go for it
[11:01] fwereade, in ec2 any machine is on the public network, and you can't restrict this with the default vpc setup
[11:02] dimitern, I'm more thinking about maasy environments
[11:02] dimitern, if the provider has no machines not on the public network, then provisioning will fail, and hopefully we can clearly explain why
[11:03] fwereade, we can still say deploy --constraints networks=^juju-public
[11:04] dimitern, how can that work if juju-public is always implicitly a required network? everything with ^juju-public will fail
[11:06] fwereade: rogpeppe call time?
[11:06] does anyone know where --version is being registered?
[11:06] dimitern, if it's a required network *by default*, that can either be explicitly unspecified with `--network=` or (maybe better?) handled by the cli client such that ^juju-public doesn't send that, we might do better
[11:06] rick_h_: ah yes
[11:06] rick_h_, balls sorry
[11:06] rick_h_, omw
[11:07] fwereade, implicitly in the sense they're not part of instance selection criteria explicitly, unless explicitly specified
[11:08] fwereade, i.e.
trying --constraints networks=^juju-private will fail on the juju side, but networks=juju-private,^juju-public will be ok, and passed to the provider
[11:09] dimitern, I thought --networks didn't take ^
[11:09] fwereade, i'm talking about constraints, since that's the way now to exclude networks
[11:28] is there some pattern when operations on state are retried (with state.run(...)) and when they are not?
[11:48] fwereade: andrew reviewed this and i fixed issues, you may want to take a final look https://github.com/juju/juju/pull/185 cool if you don't have time
[11:49] wallyworld, I will try, for what that's worth :/
[11:49] tasdomas`, in general we'd expect them to be retried, is there something particular you're looking at?
[11:49] dimitern, sorry, meeting took over my brain
[11:50]