[01:34] <mup> Bug #1512191 opened: worker/uniter: update tests to use mock clock <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1512191>
[02:25] <cherylj> anastasiamac: ping?
[02:25] <anastasiamac> cherylj: pong?
[02:25] <cherylj> hey :)
[02:25] <cherylj> got a few minutes to chat?
[02:25] <anastasiamac> cherylj: isn't it sunday for u?
[02:25] <anastasiamac> of course
[02:25] <cherylj> technically, yes
[02:25] <cherylj> :)
[02:26] <anastasiamac> tomorrow's meeting?
[02:26] <cherylj> yes, let me get my headset
[03:00] <davechen1y> mwhudson: so far joining the IBM partner network has granted me
[03:00] <davechen1y> 1. zero access to the things I want
[03:01] <davechen1y> 2. spam
[03:05] <mwhudson> davechen1y: \o/
[03:06] <mwhudson> davechen1y: i'm not sure i've gotten as far as getting spam
[08:47] <jam> dimitern: did I miss you at our 1:1 ? I thought I was in the room a while
[08:48] <dimitern> jam, sorry, I overslept :/
[08:53] <jam> dimitern: k. no prob. I just was making sure we still had the right schedule with tz changes
[08:54] <dimitern> jam, yeah, the schedule is correct
[09:44] <voidspace> dimitern: ping
[09:45] <dimitern> voidspace, pong
[09:48] <voidspace> dimitern: hang on - trying something
[09:48] <voidspace> dimitern: may still need your help, will re-ping if necessary :-)
[09:48] <dimitern> voidspace, :) sure
[09:56] <voidspace> dimitern: dooferlad: frobware: just grabbing coffee and taking a loo break, will be a couple of minutes late to standup
[09:56] <voidspace> sorry!
[09:56] <dimitern> voidspace, np
[10:03] <dimitern> jam, standup?
[10:07] <voidspace> dimitern: omw
[10:07] <voidspace> wallyworld: ping
[10:10] <frobware> jam: today is the day, my first bootstrap failed with the replica set failure; the day keeps getting better.... ??? :)
[10:32] <voidspace> dimitern: are addressable containers in 1.24?
[10:33] <dimitern> voidspace, there are some parts of it, but it's not working fully
[10:34] <voidspace> dimitern: so there could be people using deployed environments with addressable containers
[10:35] <dimitern> voidspace, in 1.24 ?
[10:36] <voidspace> dimitern: yep
[10:36] <dimitern> voidspace, that's possible of course, but I highly doubt it
[10:36] <voidspace> dimitern: so making them "not work" on maas 1.8 would be a backwards compatibility issue...
[10:37] <dimitern> voidspace, they won't work on maas without devices support, i.e. <1.8.2
[10:37] <voidspace> dimitern: what do you mean by won't work?
[10:39] <dimitern> voidspace, juju cannot guarantee container resources will be fully released
[10:39] <voidspace> dimitern: don't we support the older ways of requesting addresses - we used to
[10:40] <voidspace> dimitern: so by "won't work" you mean "will work but there might be a temporary issue later under some circumstances"
[10:40] <voidspace> dimitern: I really dislike the abuse of the phrase "won't work"
[10:40] <dimitern> voidspace, the "temporary" issue is quite critical for some of our users
[10:41] <voidspace> dimitern: specific users in specific cases
[10:41] <voidspace> dimitern: that we can communicate with
[10:41] <dimitern> voidspace, since maas 1.8.2 is in trusty, as a user you most likely won't even see that error
[10:44] <voidspace> dimitern: we have many users with many use cases, breaking stuff that works for one use case - when we have fixed the problem for the other use case and can communicate with them - seems like a real backwards step to me
[10:45] <voidspace> dimitern: we're [potentially] breaking things for some users - to avoid a problem that we've already fixed another way!
[10:46] <dimitern> voidspace, all of that depends on the definition of "works"
[10:47] <dimitern> voidspace, does it work if you can re-do the same deployment on the same maas only a certain number of times?
[10:47] <voidspace> dimitern: well sort of but "the feature does what it says but under some circumstances might temporarily leak resources when you've *finished using it*"  is a funny definition of "doesn't work"
[10:48] <voidspace> dimitern: using thousands of containers within a short space of time is a pretty specific use case - and one we *have addressed*
[10:48] <voidspace> dimitern: we're not ignoring that use case, but to block *all other use cases* because of it is not good
[10:48] <perrito666> morning
[10:48] <dimitern> voidspace, sorry, but that sounds to me like saying "leaving instances around after destroy-environment is not our problem, as it did work fine while the environment was running"
[10:48] <voidspace> perrito666: morning
[10:49] <voidspace> dimitern: leaving instances around, that cost money, would be much worse and we should really avoid it
[10:49] <voidspace> dimitern: temporarily leaking a dhcp lease is not the same
[10:49] <dimitern> voidspace, it's the same, but it takes more retries for it to become a problem
[10:50] <dimitern> voidspace, the same as with memory leaks really -  a small leak won't be a problem, unless you run your application for a long time
[10:50] <dimitern> :)
[10:50] <voidspace> dimitern: they're not at all the same
[10:50] <voidspace> dimitern: resource leakage is not a good thing
[10:51] <wallyworld> voidspace: hi
[10:51] <voidspace> dimitern: having temporary leaks that don't cost money under specific known corner cases - addressed in a later release - is not the end of the world
[10:51] <voidspace> dimitern: stopping *existing deployments* working, is much worse
[10:52] <dimitern> voidspace, why do you keep calling the leaks "temporary"? it's not like they're going away by themselves after a while
[10:52] <voidspace> dimitern: the dhcp lease expires, true?
[10:52] <dimitern> voidspace, that depends on the dhcp server config
[10:53] <voidspace> wallyworld: I would like to talk to you about bug 1403689
[10:53] <mup> Bug #1403689: Server should handle tools of unknown or unsupported series <upgrade-juju> <upload-tools> <juju-core:Fix Released by wallyworld> <juju-core 1.24:Triaged> <juju-core 1.25:Fix Released by wallyworld> <https://launchpad.net/bugs/1403689>
[10:53] <wallyworld> sure
[10:53] <dimitern> voidspace, I think MAAS dhcpd uses rather long leases by default
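Lease lifetime is what decides how quickly a leaked container address is reclaimed. For context, in isc-dhcp-server (which MAAS drives) it is controlled by two directives; a minimal illustrative fragment with made-up values, not MAAS's actual defaults:

```
# /etc/dhcp/dhcpd.conf -- lease lifetimes decide how fast a leaked
# address returns to the pool after the container is gone.
default-lease-time 600;   # seconds granted when the client asks for nothing specific
max-lease-time 7200;      # upper bound the server will grant on request
```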
[10:54] <voidspace> wallyworld: did you fix it in the server or client?
[10:54] <voidspace> wallyworld: server I assume
[10:55] <wallyworld> voidspace: tim had already found and fixed most of the cases of mapping series -> version, but there was one place in simplestreams search that was not covered
[10:55] <voidspace> dimitern: this is a specific use case for *experimentation*, real users aren't burning through all their dhcp leases! I'm not saying ignore the issue - we've fixed it! (Requiring maas 1.8).
[10:55] <voidspace> dimitern: however blocking all other "normal uses" because of it, seems wrong / bad
[10:55] <wallyworld> voidspace: so all the usages that i can see that would panic or return an error have been patched
[10:56] <voidspace> wallyworld: where?
[10:56] <voidspace> wallyworld: we have this problem on 1.20 / 1.22 servers...
[10:56] <dimitern> voidspace, can you explain which "normal users" will be blocked?
[10:56] <voidspace> wallyworld: which can't be upgraded with --upload-tools
[10:56] <wallyworld> i fixed it in master
[10:56] <wallyworld> and 1.25 i think
[10:56] <voidspace> dimitern: anyone using addressable containers
[10:56] <dimitern> voidspace, for existing environments, it will keep working as before
[10:56] <wallyworld> upload-tools is bad
[10:56] <dimitern> voidspace, for new environments, the new behavior is enforced by default
[10:57] <wallyworld> we try not to encourage its use
[10:57] <voidspace> wallyworld: however if you want to give users new binaries to test a fix it is what we have
[10:57] <wallyworld> is there a use case for it?
[10:57] <voidspace> wallyworld: unless you can suggest an alternative?
[10:57] <voidspace> dimitern: upgrading a deployed environment
[10:57] <dimitern> voidspace, the only affected users will be those using maas 1.7 or earlier
[10:57] <wallyworld> our policy afaik is to get them to upgrade to latest stable release
[10:57] <wallyworld> aka 1.25
[10:58] <wallyworld> unless that's changed
[10:58] <voidspace> wallyworld: that upgrade doesn't work
[10:58] <wallyworld> from 1.22 to 1.25?
[10:58] <voidspace> wallyworld: we need to know if a proposed change has fixed the problem they have
[10:58] <voidspace> wallyworld: yep
[10:58] <voidspace> wallyworld: lots of horrible problems
[10:59] <wallyworld> we haven't caught that in CI?
[10:59] <voidspace> nope
[10:59] <wallyworld> CI should have flagged those issues
[10:59] <voidspace> wallyworld: https://bugs.launchpad.net/juju-core/+bug/1507867
[10:59] <mup> Bug #1507867: juju upgrade failures <canonical-bootstack> <upgrade-juju> <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1507867>
[10:59] <voidspace> wallyworld: for a specific user
[10:59] <wallyworld> looking
[10:59] <wallyworld> voidspace: ah right ignore-machine-addresses
[11:00] <voidspace> wallyworld: not just that though
[11:00] <wallyworld> what else?
[11:00] <wallyworld> there was a mongo corruption
[11:00] <voidspace> yep
[11:00] <wallyworld> but we were waiting for logs
[11:00] <wallyworld> mongo got corrupt before upgrade
[11:00] <wallyworld> and could be fixed by running repairDatabase()
[11:00] <voidspace> wallyworld: meanwhile, I've fixed the ignore-machine-addresses issue
[11:01] <wallyworld> yay
[11:01] <voidspace> wallyworld: but I can't get them to test that
[11:01] <wallyworld> what about trying upload-tools with a 1.25 client?
[11:01] <wallyworld> and having a custom jujud in the path
[11:02] <wallyworld> unless we backport all the series version fixes (and there were several), older clients will get stuck I expect
[11:02] <wallyworld> and we are not doing any new 1.20/1.22 releases
[11:02] <dimitern> voidspace, so is the whole argument about displaying a warning if we detect no devices api instead of an error?
[11:04] <voidspace> wallyworld: is the fix in the client then?
[11:04] <voidspace> wallyworld: the 1.25 client won't attempt to upload a version that the 1.22 server rejects?
[11:05] <voidspace> dimitern: a warning would be better, rather than refusing to create a new container (for deployed environment that may already have addressable containers created under 1.24)
[11:05] <voidspace> wallyworld: sinzui said he *would* do new 1.22 release if required for this bug
[11:05] <wallyworld> voidspace: --upload-tools will IIRC choose a jujud in the path - so you put the jujud that you want to test where the client can see it
[11:05] <dimitern> voidspace, in this specific case, I agree
[11:05] <voidspace> dimitern: \o/ :-)
[11:06] <dimitern> voidspace, :)
[11:06] <wallyworld> voidspace: and using a 1.25 client with all the series version fixes should work (that's my theory)
[11:06] <voidspace> wallyworld: ok
[11:06] <dimitern> voidspace, how about in other cases - new environment on older maas (<1.8)?
[11:06] <wallyworld> voidspace: if we are to do a new 1.22, then all the series version fixes from tim and me would need backporting
[11:06] <voidspace> wallyworld: right
[11:07] <voidspace> dimitern: I care less I guess - but I don't think addressable containers are broken just because deploying thousands of them and using destroy-environment force causes an issue
[11:07] <voidspace> dimitern: that's a very specific (and experimental) use case - that we have a fix for
[11:07] <voidspace> dimitern: so even then, preventing addressable containers seems wrong to me
[11:07] <voidspace> dimitern: not the world's worst wrong, only a minor wrong...
[11:07] <voidspace> dimitern: so I would prefer a warning then too
[11:08] <wallyworld> voidspace: we can do that backport if needed. but it would be interesting to try 1.25 client with custom 1.22 jujud pushed up via upload-tools
[11:08] <voidspace> wallyworld: however, we are seeing --upload-tools *not work* on 1.22 (with a custom jujud in the path)
[11:08] <voidspace> wallyworld: try it yourself, deploy 1.22 then try --upload-tools with only the new jujud in the path
[11:08] <voidspace> wallyworld: you hit the wily bug
[11:08] <wallyworld> voidspace: it would be interesting to see the error then so we can see where the issue is
[11:08] <dimitern> voidspace, how about an extra flag - error by default, with the flag - warning and proceed?
[11:09] <voidspace> dimitern: more flags! don't like it
[11:09] <wallyworld> voidspace: ok, i'll try, but likely tomorrow
[11:09] <wallyworld> need to finish some other stuff tonight
[11:09] <voidspace> wallyworld: I'll try again today and email you (currently working on a different environment)
[11:09] <wallyworld> ok
[11:09] <voidspace> wallyworld: and confirm that I can't upgrade from 1.22 to latest trunk
[11:09] <voidspace> wallyworld: and you can believe me or not! :-)
[11:10] <wallyworld> man, we need to fix our upgrades
[11:10] <dimitern> voidspace, users are unlikely to see a mere warning in the case where juju is used as a tool (e.g. autopilot or a scripted deployer-based deployment)
[11:10] <wallyworld> and figure out why CI didn't catch the issues
[11:10] <voidspace> yeah
[11:10] <wallyworld> voidspace:  i believe you but just don't have enough info yet
[11:10] <voidspace> wallyworld: sure :-)
[11:10] <wallyworld> :-P
[11:10] <voidspace> wallyworld: I'll email you and you can tell me what more diagnostic information you need
[11:10] <wallyworld> once i see the symptoms i can look at the code and see where the issue might be
[11:11] <voidspace> dimitern: users are unlikely to hit the problem
[11:11] <wallyworld> ok, and i'll try also
[11:11] <voidspace> dimitern: and if they do we have a known fix for them
[11:11] <voidspace> wallyworld: thanks
[11:11] <dimitern> voidspace, which is?
[11:11] <voidspace> dimitern: upgrade maas...
[11:11] <dimitern> voidspace, and how are we communicating that to the users?
[11:12] <voidspace> dimitern: all our available communication channels
[11:13] <voidspace> dimitern: creating hundreds of containers and then destroying them is a pretty specific use case
[11:13] <dimitern> voidspace, like the docs that suggest "oh, and by the way just in case set disable-network-management: true in your environments.yaml" ? :)
[11:13] <voidspace> heh
[11:14] <dimitern> voidspace, yeah, it's one of the cases we should support well - for density
[11:14] <voidspace> dimitern: yep, I definitely agree we should make it work
[11:14] <voidspace> we don't really have any choice in that matter
[11:19] <voidspace> dimitern: new topic
[11:19] <voidspace> dimitern: hopefully less contentious
[11:19] <dimitern> voidspace, ok :)
[11:19] <voidspace> dimitern: I'm trying to recreate the ignore-machine-addresses issue
[11:19] <dimitern> voidspace, yeah?
[11:19] <voidspace> dimitern: I have a deployed environment (current trunk) with a deployed unit of wordpress
[11:20] <voidspace> dimitern: on that machine I've added a new nic
[11:20] <voidspace> dimitern: this is my nic definition http://pastebin.ubuntu.com/13081446/
[11:21] <voidspace> dimitern: I see "eth0:1" with that assigned (and spurious) 10.0 address when I do ifconfig
[11:21] <voidspace> dimitern: but I don't see any issue with the machine from juju
[11:21] <voidspace> dimitern: it isn't visibly picking up that new (wrong) address
[11:21] <dimitern> voidspace, did you wait ~10m for the instance poller to try refreshing the machine addresses?
[11:21] <voidspace> dimitern: no...
[11:21] <voidspace> dimitern: :-)
[11:21] <voidspace> dimitern: I'll go get coffee and see what happens
[11:22] <dimitern> voidspace, :)
[11:22] <voidspace> dimitern: thanks
[11:22] <dimitern> voidspace, np, might not be the only thing, but I'd start there
[11:22] <voidspace> dimitern: cool
[11:27] <jam> dooferlad: frobware: I'm back around if you wanted to chat
[11:28] <rogpeppe> i need a review of this please, towards fixing a juju critical bug: https://github.com/juju/persistent-cookiejar/pull/9
[11:29] <rogpeppe> mgz_: ^
[11:29] <rogpeppe> mgz_: i don't think that this will entirely fix CI problems with the cookies though
[11:41] <jam> frobware, dooferlad, dimitern, voidspace: did any of you get a chance to play with the updated kvm_mass script?
[11:41] <dooferlad> jam not I
[11:41] <jam> k
[11:41] <jam> dooferlad: did you have any other questions about bug #1510651?
[11:41] <mup> Bug #1510651: Agents are "lost" after terminating a state server in an HA env <bug-squad> <ensure-availability> <juju-core:Triaged by dooferlad> <https://launchpad.net/bugs/1510651>
[11:42] <dimitern> jam, not yet, I have to fix my vmaas first
[11:42] <dooferlad> jam: I probably will have, just not yet.
[11:43] <frobware> jam: not me either
[11:43] <jam> k. I'm happy  to get feedback if there are thoughts about what could make it better
[11:43] <jam> the next step I was considering was creating networks
[11:43] <frobware> jam: I have 12+ nodes in various combos already.
[11:43] <jam> say you could tell maas what networks and what spaces you wanted, and then it would make sure those existed in libvirt
[11:44] <jam> frobware: hopefully a given node isn't in more than one maas, given each maas wants to control its subnet
[11:44] <frobware> jam: no I have some half-baked naming scheme that mostly keeps me out of trouble.
[11:47] <jam> frobware: heh. I was just using "m1-foo1" and was planning to go to "m2" if I set up another maas.
[11:49] <voidspace> jam: not yet
[11:50] <voidspace> dimitern: no dice
[11:50] <voidspace> dimitern: it still reports imaginative-hose.maas as the dns name
[11:50] <voidspace> dimitern: and I can still ssh to the machine via "juju ssh 1"
[11:51] <voidspace> dimitern: I guess imaginative-hose still sorts earlier
[11:51] <voidspace> dimitern: although 10.0 should sort before 172.16 - anything else I can do to trigger the bug?
[11:51] <dimitern> voidspace, but do you see the extra address you added?
[11:51] <voidspace> dimitern: see it where?
[11:52] <dimitern> voidspace, well, in the log - as part of the machine addresses
[11:52] <voidspace> dimitern: I'll check
[11:53] <voidspace> dimitern: not in all-machines.log
[11:53] <voidspace> dimitern: I'll change the log level and check again
[11:53] <voidspace> in 10 minutes...
[11:54] <dimitern> voidspace, for the sake of the test, you could reduce the instance poller timeout
[11:54] <voidspace> dimitern: yeah, adding better instrumentation would be a good idea too
[11:54] <voidspace> dimitern: thanks
[11:55] <dimitern> voidspace, looking at the network package, the address sort order is: public IPs first, hostnames next (except "localhost"), cloud-local, machine-local, link-local
[11:55] <voidspace> dimitern: I'll try and find the bug report and see if it has repro instructions
[11:56] <dimitern> voidspace, there's also the piece of code in maas that *always* adds the hostname of the machine in the response of the provider addresses
[11:57] <voidspace> dimitern: yeah, but the bug was a problem for maas users - so it is obviously possible to trigger it
[11:58] <dimitern> voidspace, I think the difference is machines hosting units (and needing a preferred private address) and machines not hosting units (which only need the public address to display in status)
[11:58] <dimitern> voidspace, so I'd try not add-machine + add extra IP, but deploy a unit and then add extra IP on that machine
[12:00] <voidspace> dimitern: I did the latter anyway (used deploy and not add-machine)
[12:00] <voidspace> I'll find the bug report
[12:00] <rogpeppe> if you want master unblocked, could someone please review this? https://github.com/juju/persistent-cookiejar/pull/9
[12:01] <dimitern> rogpeppe, reviewed
[12:01] <rogpeppe> dimitern: ta!
[12:59] <rick_h__> morning
[14:45] <mup> Bug #1512371 opened: Using MAAS 1.9 as provider using DHCP  NIC will prevent juju bootstrap <juju-core:New> <https://launchpad.net/bugs/1512371>
[15:16] <cmars> wwitzel3, can i get a review of http://reviews.vapour.ws/r/3040/ ? (fixes-1511717)
[15:17] <cmars> wwitzel3, thanks!
[15:21] <mgz_> cmars: if a user has both an old juju client installed, and a newer juju in ~/.local or something for testing a shiny new feature
[15:22] <mgz_> do we break them?
[15:23] <cmars> mgz_, hmm.. i guess such a user would need to use separate JUJU_HOME directories in that case, wouldn't they?
[15:23] <mgz_> well, I know they don't in practice
[15:23] <cmars> mgz_, but they'd have to, because the newer juju will have providers that the old juju doesn't understand
[15:23] <mgz_> when I give someone a binary to test I don't say "only use this with JUJU_HOME=/tmp"
[15:34] <mgz_> natefinch: I believe we are still on step #1: make the unit tests pass with go 1.5
[15:35] <natefinch> mgz_: oh man.  is there a list of what needs to be fixed?  The LXD provider is dependent on go 1.3+ due to limitations with the LXD Go library
[15:35] <mgz_> the remaining issues with run-unit-tests-wily-amd64 look like big environmental things rather than nice easy things like map ordering
[15:36] <mgz_> bug 1494951 looks like one place to start
[15:36] <mup> Bug #1494951: Panic "unexpected message" in vivid and wily tests <bug-squad> <ci> <intermittent-failure> <panic> <unit-tests> <wily> <juju-core:Triaged> <https://launchpad.net/bugs/1494951>
[15:37] <natefinch> mgz_: do you know if there's a team assigned to get us working on 1.5?
[15:37] <mgz_> I know that some of the other-way-uppers have fixed bugs relating to it, but just as good citizens
[15:37] <mfoord> dimitern: ping
[15:37] <katco> frobware 's team is on bug squad i think
[15:38] <katco> natefinch: mgz_: frobware: seems like getting 1.5 bugs fixed needs to be high priority
[15:39] <ericsnow> with a plugin provider it wouldn't be a short-term issue...
[15:39] <ericsnow> just sayin' :)
[15:39] <mgz_> the other thing I see a lot of in the history is worker/peer group related test failures
[15:40] <natefinch> ericsnow: we'd still need 1.5 support in trusty, and I don't think we'd also have 1.2 in trusty, so that would be a problem
[15:40] <ericsnow> natefinch: true
[15:40] <mgz_> yeah, we can't backport toolchain to trusty
[15:41] <frobware> katco, ack
[15:41] <dimitern> mfoord, pong
[15:42] <mfoord> dimitern: reading through the ignore-machine-addresses bug it looks like it only affected containers
[15:42] <mfoord> dimitern: is that true?
[15:42] <mfoord> dimitern: https://bugs.launchpad.net/juju-core/+bug/1463480
[15:42] <mup> Bug #1463480: Failed upgrade, mixed up HA addresses <blocker> <canonical-bootstack> <ha> <upgrade-juju> <juju-core:Fix Released by wallyworld> <juju-core 1.22:Fix Committed by thumper> <juju-core 1.24:Fix Released by wallyworld> <hacluster (Juju Charms Collection):New> <https://launchpad.net/bugs/1463480>
[15:42] <mfoord> I assume without addressable containers on as they're starting pre-1.24
[15:42] <dimitern> mfoord, yeah
[15:42] <mfoord> also it looks hard to reproduce (timing related)
[15:43] <mfoord> dimitern: so to reproduce this I really need to add an lxc container
[15:43] <mfoord> and add the virtual nic there (?)
[15:43] <mfoord> I have a build with extra instrumentation and a shorter poll time on the instancepoller
[15:43] <dimitern> mfoord, let me think
[15:43] <natefinch> mgz_: how can we require 1.5 if 1.5 is not in trusty?
[15:43] <mfoord> although it looks to me like the instancepoller only requests provider addresses and that machine addresses are done by the machiner
[15:44] <dimitern> mfoord, yeah, the machine addresses are updated on machiner startup
[15:44] <mfoord> dimitern: so really rebooting the machine should trigger it
[15:44] <mfoord> dimitern: I'll add a container and reboot
[15:44] <mfoord> once the container is up
[15:44] <dimitern> mfoord, no need to reboot - just restart the machine agent
[15:45] <mfoord> dimitern: but should I add the extra nic to the container or to the host
[15:45] <mfoord> or both just to be sure...
[15:45] <dimitern> mfoord, I guess both, and that address should be like 10.0.0.x
[15:45] <mfoord> dimitern: ok
[15:45] <mfoord> thanks
[15:47] <mgz_> natefinch: you can't, but how are you running an lxd provider on trusty?
[15:47] <natefinch> mgz_: Is trusty never having anything beyond juju 1.25?
[15:50] <mgz_> natefinch: that's not the plan, but I don't know what the intention is with your lxd provider work
[15:50] <natefinch> mgz_: our intention is to have a juju provider that uses lxd in 1.26
[15:51] <mgz_> so, what's your plan with the existing backports to trusty scheme?
[15:52]  * mgz_ enjoys circular conversations
[15:53] <mfoord> our normal plan is to leave that up to QA to sort out...
[15:53] <natefinch> ^^
[15:53] <mgz_> how to release software you're writing is not someone else's problem
[15:53] <natefinch> mgz_: it is when someone else is putting up the restrictions while simultaneously telling us to deliver software that has a problem with those restrictions
[15:54] <mgz_> you do know those two things are not from me, and are different parties, right?
[15:54] <natefinch> mgz_: absolutely.  Sorry if my tone indicated I thought it was your fault.  I know it's not.
[15:54] <mgz_> the distro, and sanity in general, limits what we can do in terms of backports
[15:55] <mgz_> and mark, and the desire for shiny features, wants everyone to have a great experience
[15:55] <natefinch> mgz_: I guess the answer is, people at a higher pay grade are going to have to figure out what to do
[15:56] <natefinch> mgz_: the LXD provider will fail gracefully if lxd is not installed... but the code still requires 1.5 to build.
[15:56] <mgz_> there's no such thing as a !build for go versions, right...
[15:57] <mgz_> we can always do the equivalent in the debian rules, just rm the package
[15:57] <natefinch> mgz_: no, but you can just set flags at build time to trigger !build code
[15:58] <natefinch> mgz_: however, having the same codebase support both 1.2 and 1.5 would seem to be adding a lot of developer/qa/etc overhead.... but again, that's above my pay grade.
[15:58] <mgz_> anyway, the first thing is getting it working well in development
[15:59] <mgz_> natefinch: well, it's what we do currently, and isn't too hard
[15:59] <mgz_> I know it's anti-go, but python code manages to support multiple *interpreter* versions okay
[16:01] <mgz_> trusty has go 1.2.1. vivid has go 1.3.3. wily/xenial have go 1.5.1+.
[16:01] <alexisb> mgz_, natefinch trusty will need 1.5 for lxd as well as juju
[16:01] <alexisb> the current plan is to work on getting it into backports
[16:01] <alexisb> so it can be used both by juju and lxd
[16:02] <mgz_> alexisb: actual backports? or SRUed?
[16:02] <alexisb> mgz_, actual backports
[16:02] <alexisb> sru not needed
[16:03] <mgz_> okay, ace. so, the provider failing neatly is a requirement.
[16:06] <mup> Bug #1512399 opened: ERROR environment destruction failed: destroying storage: listing volumes: Get https://x.x.x.x:8776/v2/<UUID>/volumes/detail: local error: record overflow <amulet> <openstack> <uosci> <juju-core:New> <https://launchpad.net/bugs/1512399>
[16:10] <mgz_> anyway, this is something to work out early, thanks for asking nate, we do want to know exactly how we're getting the lxd provider distributed.
[16:25] <rogpeppe> mgz_: do you know whether cookie isolation in CI has been done yet?
[16:28] <mgz_> jog set the env var as you'd discussed, not sure it's everywhere it's needed but at least in the obvious place.
[16:28] <mgz_> hm, actually that change got reverted
[16:28] <mgz_> rogpeppe: gimme a sec, I'll find out.
[16:28] <jog> mgz_, It broke other tests
[16:28] <jog> older versions of Juju
[17:09] <mup> Bug #1511771 changed: regression setting tools-metadata-url <blocker> <ci> <regression> <set-env> <juju-core:Triaged> <https://launchpad.net/bugs/1511771>
[17:28] <natefinch> ericsnow: you mentioned in your review of my better error message PR that I should rebase against either your or wayne's personal branches... I hesitate to rebase against a personal branch. Are you guys going to get one of those things landed soon so I can just rebase against the main lxd branch?
[17:28] <ericsnow> natefinch: just waiting for your reviews :)
[17:29] <natefinch> ericsnow: that the support using local lxd as remote?
[17:29] <ericsnow> natefinch: http://reviews.vapour.ws/r/3012/ and http://reviews.vapour.ws/r/3013/
[17:30] <natefinch> ericsnow: ok, yeah, I'm looking at those now.  Guess it'll be an unofficial review day for me
[17:30] <ericsnow> natefinch: thanks
[17:32] <frobware> mfoord, a heads-up on our recent changes to rendering /e/n/i --- https://bugs.launchpad.net/juju-core/+bug/1512371
[17:32] <mup> Bug #1512371: Using MAAS 1.9 as provider using DHCP  NIC will prevent juju bootstrap <bug-squad> <maas-provider> <network> <juju-core:Triaged> <https://launchpad.net/bugs/1512371>
[17:34] <natefinch> ericsnow: gah... saw the copied file from github.com/lxd, so I went to look at their repo for licensing... and they don't even have a LICENSE file.  Geez
[17:34] <ericsnow> natefinch: yep
[17:39] <cmars> mgz_, here's a cookie update for 1.25, what do you think? http://reviews.vapour.ws/r/3041/
[17:54] <natefinch> ericsnow: wow, that lxd/shared.GenCert function is awful.  Have you filed a bug to their project to de-awful it?
[17:54] <ericsnow> natefinch: was waiting :)
[17:55] <natefinch> ericsnow: for what?
[17:55] <ericsnow> natefinch: until we had settled down on our LXD provider work
[17:56] <natefinch> ericsnow: If that's the only way to create certs for LXD, seems pretty awful regardless of what anyone else is doing
[17:57] <ericsnow> natefinch: agreed
[17:58] <natefinch> ericsnow: I'm willing to write a bug now if you'd prefer.
[17:58] <ericsnow> natefinch: sure, though I'd prefer the reviews first :)
[17:58] <natefinch> ericsnow: right right
[18:03] <mfoord> frobware: I saw
[18:03] <mfoord> frobware: ouch
[18:03] <mfoord> frobware: although I don't think it's the recent changes to be fair, I think it's maas 1.9
[18:03] <mfoord> frobware: will look tomorrow
[18:03] <mfoord> EOD
[20:57] <davechen1y> thumper: afk breakfast
[20:57] <davechen1y> i'll call you after that
[21:01] <mup> Bug #1512481 opened: register dns names for units in MAAS <juju-core:New> <https://launchpad.net/bugs/1512481>
[21:09] <thumper> davechen1y: ack
[21:34] <cmars> hey waigani i'd like to try writing a CI test. where should I start?
[21:37] <waigani> cmars: https://github.com/juju/juju/wiki/ci-tests :)
[21:38] <cmars> waigani, thanks
[21:38] <waigani> np
[22:33] <perrito666> ah finally, the server holding my irc got cut from part of the world
[22:33] <perrito666> (and by cut I mean something sliced the fiber)
[22:39] <mwhudson> perrito666: https://www.reddit.com/r/cablefail/comments/1y2ei8/lost_in_the_woods_call_a_backhoe/cfh4wox
[22:41] <perrito666> lol
[23:26] <davechen1y> thumper: ping