[00:00] <wallyworld> cherylj: so, we'll need to get that fix for bug 1567170 fixed
[00:00] <mup> Bug #1567170: Disallow upgrading with --upload-tools for hosted models <upgrade-juju> <juju-core:In Progress by cox-katherine-e> <https://launchpad.net/bugs/1567170>
[00:00] <cherylj> yes, it landed in next, so we have some time
[00:00] <wallyworld> ah phew ok
[00:02] <mwhudson> wallyworld: so juju is looking for mongodump in /usr/lib/juju/bin/mongodump?
[00:02] <wallyworld> mwhudson: from memory, yeah
[00:02] <mwhudson> ERROR while creating backup archive: while dumping juju state database: error dumping databases: error executing "/usr/lib/juju/bin/mongodump": 2016-04-13T00:02:21.072+0000	error parsing command line options: --dbpath and related flags are not supported in 3.0 tools.; See http://dochub.mongodb.org/core/tools-dbpath-deprecated for more information; 2016-04-13T00:02:21.072+0000	try 'mongodump --help' for more information;
[00:02] <mwhudson> sounds like juju needs moar fixing
[00:03] <mwhudson> but well no reason not to upload my package
[00:03] <wallyworld> mwhudson: yeah, juju is currently hard coded to look in the mongo 2.4 binary path for mongodump
[00:03] <perrito666> wallyworld: I fixed my patch, can you look at it so I can merge it after dinner?
[00:03] <wallyworld> perrito666: will do
[00:04] <wallyworld> mwhudson: it should be a quick fix to sort out that mongodump path, i'll do that today
[00:04] <mwhudson> wallyworld: that was after i copied the 3.2 tools to /usr/lib/juju/bin/
[00:04] <mwhudson> wallyworld: looks like the command line needs to change too
[00:04] <wallyworld> oh right, yes, i didn't read the error
[00:04] <wallyworld> damn it
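The fix being discussed (mongodump living at a version-specific path, and the 3.x tools rejecting `--dbpath`) could be sketched roughly like this. This is a minimal illustration only, not juju's actual code; `buildDumpArgs` and the exact flag sets are assumptions:

```go
package main

import "fmt"

// buildDumpArgs sketches version-dependent mongodump invocation:
// the 2.4 tools accepted --dbpath to dump straight from the data
// directory, but the 3.x tools dropped that flag and must connect
// to a running mongod instead.
func buildDumpArgs(majorVersion int, dbPath, host string) []string {
	if majorVersion < 3 {
		// Legacy 2.4 behaviour: dump directly from the data directory.
		return []string{"--dbpath", dbPath}
	}
	// 3.x tools: --dbpath is gone; dump through the server.
	return []string{"--host", host, "--ssl"}
}

func main() {
	fmt.Println(buildDumpArgs(2, "/var/lib/juju/db", "localhost:37017"))
	fmt.Println(buildDumpArgs(3, "/var/lib/juju/db", "localhost:37017"))
}
```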
[00:04] <mwhudson> BTW MIR for this is going to be a doozy
[00:05] <wallyworld> yes
[00:05] <mwhudson> but that's not today's problem either
[00:05] <wallyworld> mwhudson: so we have today to get a blessed beta4 for inclusion in xenial main
[00:05] <wallyworld> is my understanding
[00:06] <mwhudson> wallyworld: well i guess Someone (tm) should get working on filing MIRs for the deps
[00:06] <wallyworld> mwhudson: yeah, i must admit i have nfi about the process with all that
[00:06] <wallyworld> it is being driven by others
[00:07] <wallyworld> mwhudson: but i thought we had permission to put mongo packages needed for juju into main
[00:07] <wallyworld> along with juju itself
[00:07] <wallyworld> isn't that already approved?
[00:08] <mwhudson> wallyworld: oh maybe
[00:08] <wallyworld> i thought so but am out of the loop a little on all that
[00:29] <menn0> wallyworld, axw: I'm thinking about picking up bug 1456916. this also relates to these old tickets: bug 892552 and 802117
[00:29] <mup> Bug #892552: juju does not extract system ssh fingerprints <docs> <feature> <ssh> <pyjuju:Triaged> <juju-core:Triaged> <https://launchpad.net/bugs/892552>
[00:30] <wallyworld> menn0: it will be a fair bit of work
[00:30] <menn0> wallyworld, axw: to me, the easiest win seems to be to use per-model known hosts files for juju ssh and juju scp (instead of /dev/null)
[00:31] <perrito666> wallyworld: why would we embed base suite?
[00:31] <wallyworld> perrito666: because that suppresses logging output to console and other things
[00:31] <perrito666> k
[00:32] <menn0> wallyworld, axw: that doesn't help with the "first connect" problem but does prevent MITM after the first connect while avoiding the scary warnings when addresses are reused between models
[00:32] <wallyworld> menn0: seems ok to me - so long as we can extract the fingerprints from cloud init, but i am not an expert
[00:32] <menn0> I guess there's still the problem of killing a machine and recreating it in the same model which could lead to the same address being reused
[00:32] <wallyworld> that is the issue
[00:33] <wallyworld> one of
[00:33] <menn0> wallyworld: yeah, one of
[00:36] <menn0> wallyworld: ok so the ultimate solution would be for juju to extract the ssh key for each new machine from cloud-init and store it... and then have juju ssh and juju scp get told the key before connecting so it can be written into the client's known_hosts
[00:36] <menn0> wallyworld: something like that?
[00:37] <axw> menn0: yep
[00:37] <wallyworld> menn0: +1
[00:37] <wallyworld> menn0: i assume we pass a bespoke known hosts file to the client each time
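The per-model known_hosts idea menn0 and wallyworld are circling could look roughly like this; a minimal sketch, assuming a hypothetical `sshArgsForModel` helper and directory layout (neither is juju's real code):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sshArgsForModel sketches the idea from the discussion: instead of
// passing "-o UserKnownHostsFile=/dev/null" to ssh, give each model
// its own known_hosts file so MITM becomes detectable after the first
// connection, without scary warnings when an address is reused by a
// different model.
func sshArgsForModel(baseDir, modelUUID, target string) []string {
	knownHosts := filepath.Join(baseDir, modelUUID, "known_hosts")
	return []string{
		"-o", "UserKnownHostsFile=" + knownHosts,
		"-o", "StrictHostKeyChecking=ask",
		target,
	}
}

func main() {
	fmt.Println(sshArgsForModel("/home/user/.local/share/juju/ssh", "deadbeef", "ubuntu@10.0.0.5"))
}
```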
[00:39] <wallyworld> mwhudson: are we putting juju-mongo3.2 into trusty?
[00:41] <wallyworld> cherylj: i think you did a fix so that if mongo 3.2 were not found, it would not retry? i'll need to retest, but it appeared not to work when i bootstrapped a trusty controller. took ages to bootstrap and cloud init was filled with mongo3.2 retries
[00:42] <wallyworld> i may be able to get a fix done today if i can sort out some other issues
[00:46] <axw> wallyworld: we don't pass any known_hosts, we pass /dev/null. that's part of the problem
[00:47] <axw> I railed against that at the time it was done, but the change went in anyway
[00:47] <wallyworld> sigh
[00:47] <wallyworld> axw: awesome, i removed juju-mongodb (only have mongo 3.2 installed) and tests fail \o/
[00:48] <axw> wallyworld: :/
[00:48] <axw> wallyworld: I'm guessing testing/mgo.go needs updating to look in the right spot
[00:48] <wallyworld> yep
[00:48] <wallyworld> i might worry about that later
[00:53] <wallyworld> perrito666: btw, removing ensureAdminUser is fine; need to double check restore, but bootstrap etc all works as expected
[00:53] <perrito666> sweeeet, this is a great week
[01:08] <redir> see you tomorrow juju-dev
[01:16] <cherylj> wallyworld: yeah, I did. The fix was in juju/utils, so you'll need to make sure your deps are up to date to pull it in
[01:18] <wallyworld> ok, i'll double check
[01:19] <wallyworld> cherylj: ah, i changed my scripts (long story) and my godeps one had an issue, utils just got updated, so i'll retry
[01:20] <cherylj> ok, phew
[01:20] <cherylj> :)
[01:20] <thumper> menn0: if you get a minute https://github.com/juju/gomaasapi/pull/33
[01:22] <wallyworld> mwhudson: when you upload the mongo 3.2 tools, the package will get installed automatically with mongo 3.2 db right?
[01:23]  * thumper needs food
[01:23]  * thumper goes to hunt in the kitchen
[01:24] <wallyworld> mwhudson: i'm testing a fix for that mongodump issue (the path and the args) and have no tools available - can you provide me with the deb i can install manually?
[01:24] <mup> Bug #1569632 opened: status: decide on statuses for migration <juju-core:Triaged> <https://launchpad.net/bugs/1569632>
[01:30] <mup> Bug #1569632 changed: status: decide on statuses for migration <juju-core:Triaged> <https://launchpad.net/bugs/1569632>
[01:33] <mwhudson> wallyworld: it's in ppa:juju/experimental
[01:33] <wallyworld> ta
[01:33] <mwhudson> wallyworld: juju-mongo-tools32
[01:33] <mwhudson> wallyworld: juju-mongo-tools3.2
[01:33] <mwhudson> rather
[01:38] <wallyworld> mwhudson: yay, my fix worked, backups good now
[01:38] <mwhudson> wallyworld: nice
[01:39] <wallyworld> mwhudson: but we'll need that tools deb in the repos etc before beta4 goes out
[01:39] <mwhudson> wallyworld: slangasek said he'd look at it today
[01:39] <wallyworld> awesome
[01:45] <mup> Bug #1569632 opened: status: decide on statuses for migration <juju-core:Triaged> <https://launchpad.net/bugs/1569632>
[01:48] <alexisb> wallyworld, menn0, axw is one of you available to look at: https://bugs.launchpad.net/juju-core/+bug/1569467
[01:48] <mup> Bug #1569467: backup-restore loses the hosted model <backup-restore> <blocker> <ci> <regression> <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1569467>
[01:49] <cherylj> I suspect that it has come from the made-model-workers branch
[01:49] <axw> alexisb: this is a known deficiency of backup/restore, meant to be fixed in 2.1 or later IIRC
[01:49] <cherylj> but it's just a theory at this point
[01:49] <cherylj> I don't think it is
[01:49] <cherylj> I can do this test manually and it works
[01:49] <cherylj> It's just failing in CI
[01:49] <cherylj> for some reason
[01:49] <axw> cherylj: oh? I was under the impression that backup/restore didn't work with hosted models
[01:50] <cherylj> axw: the backup is being created from the admin model
[01:50] <alexisb> axw, BR should work on the admin model
[01:50] <axw> ok
[01:50] <cherylj> the problem that's happening in CI is that the dummy hosted model that is created to rebootstrap with is not removed
[01:50] <cherylj> and the existing models don't seem to get "hooked up"
[01:51] <cherylj> I'm wondering if the restore is actually dying in the controller
[01:51] <wallyworld> i'm working on some other backup fixes for mongo 3.2 (with tools), can look a bit later
[01:51] <cherylj> and it doesn't complete
[01:51] <cherylj> I think we need another run with the environment kept after the failure
[01:51] <alexisb> thank you wallyworld we leave it in your capable hands
[01:51] <cherylj> I'll update the bug with where I am with things
[02:07] <cmars> anyone seen messages like this when a unit gets a hook error? https://paste.ubuntu.com/15803425/
[02:08] <cmars> juju resolved --retry doesn't seem to get picked up unless i reboot the machine that it's happening on
[02:08] <cmars> should i open a bug?
[02:09] <natefinch> cmars: probably
[02:09] <cmars> ok
[02:13] <cmars> d'oh, i already did. ok, attached a full log to LP:#1566130
[02:13] <mup> Bug #1566130: awaiting error resolution for "install" hook <juju-core:Triaged> <https://launchpad.net/bugs/1566130>
[02:28] <natefinch> wallyworld, anastasiamac: I have a question about the imagemetadata api endpoint.  It has code to default the arch to amd64: https://github.com/juju/juju/blob/master/apiserver/imagemetadata/metadata.go#L184   ...when would that be triggered? Why would we ever get image metadata that doesn't specify an arch?  Isn't the arch like one of the most important pieces of information?
[02:29] <anastasiamac> natefinch: hopefully never...
[02:29] <anastasiamac> (as in hopefully nver triggered)
[02:29] <natefinch> anastasiamac: the reason I ask is because of bug #1564791
[02:29] <mup> Bug #1564791: 2.0-beta3: LXD provider, jujud architecture mismatch <blocker> <lxd> <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1564791>
[02:30] <anastasiamac> natefinch: so when u reproduce the bug, u fall into this code?
[02:30] <natefinch> anastasiamac: when using LXD, machines after the bootstrap machine on non-amd64 hosts are downloading amd64 juju binaries for some reason, which obviously don't work
[02:31] <natefinch> anastasiamac: it's easy to reproduce the bug, it's very hard to add debugging changes, because it only happens if you *don't* use upload-tools.  I'm actually not sure how to test this with my own binary
[02:32] <anastasiamac> natefinch: I *think* u will only have no arch by this stage if ur image metadata does not have arch... which would b weird
[02:32] <wallyworld> natefinch: i think some early simplestream metadata may have omitted the arch, not sure now. but nowadays, it should never be ""
[02:32] <natefinch> wallyworld: ok, this may be a red herring then.
[02:33] <wallyworld> natefinch: well, depends on the simplestreams metadata
[02:33] <wallyworld> if they generate metadata without an explicit arch, then boom
[02:34] <natefinch> actually, looks like they're explicitly requesting the wrong tools, now that I re-read the logs in the bug
[02:35] <natefinch> Attempt 1 to download tools from https://10.0.3.164:17070/tools/2.0-beta3-xenial-amd64
[02:36] <natefinch> so, not really the fault of the streams that it gave them what they were asking for
[02:36] <natefinch> I'm having a heck of a time trying to figure out how we determine the arch that eventually goes into that download URL, though
[02:41] <wallyworld> natefinch: it should be the arch of the host on which the tools are to be run
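The URL natefinch quotes ("/tools/2.0-beta3-xenial-amd64") is version-series-arch, and the bug is that the arch component is wrong for non-amd64 hosts. A hypothetical sketch of how such a URL is assembled (the `toolsURL` name and layout are illustrative, not juju's actual code); the key point, as wallyworld says, is that arch must be the target host's architecture, e.g. `runtime.GOARCH` on that machine:

```go
package main

import (
	"fmt"
	"runtime"
)

// toolsURL sketches how an agent-binaries download URL like the one in
// the log could be built. If the controller's arch leaks in here
// instead of the target machine's, you get exactly the mismatch in
// bug 1564791: an arm64/s390x machine fetching amd64 binaries.
func toolsURL(apiAddr, version, series, arch string) string {
	return fmt.Sprintf("https://%s/tools/%s-%s-%s", apiAddr, version, series, arch)
}

func main() {
	// On the target machine itself, runtime.GOARCH is the right arch.
	fmt.Println(toolsURL("10.0.3.164:17070", "2.0-beta3", "xenial", runtime.GOARCH))
}
```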
[02:50] <wallyworld> menn0: can i bother you for a 99% red one? http://reviews.vapour.ws/r/4562/
[02:52] <menn0> wallyworld: looking
[02:54] <menn0> wallyworld: best. change. ever.
[02:55] <wallyworld> yay :-)
[02:58] <natefinch> wallyworld: I know the arch should be the arch of the host on which it is run, but in this case, the bug is that it's not :)
[02:59] <wallyworld> i need to read the bug
[03:00] <natefinch> wallyworld: basically just that non-amd64 LXD environments are downloading amd64 tools for some reason
[03:00] <mup> Bug #1554677 changed: Provider help topics need to be updated for 2.0 <2.0-count> <docteam> <juju-release-support> <juju-core:Invalid> <https://launchpad.net/bugs/1554677>
[03:00] <mup> Bug #1569652 opened: help text for juju grant needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569652>
[03:00] <mup> Bug #1569654 opened: help text for juju revoke needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569654>
[03:01] <wallyworld> were they bootstrapped on amd64
[03:01] <wallyworld> with --upload-tools
[03:02] <natefinch> wallyworld: no. it's been seen on arm64 and s390x
[03:02] <natefinch> wallyworld: upload-tools actually fixes the problem
[03:05] <wallyworld> hmmm, not sure then, i'd have to debug
[03:05] <wallyworld> you'd need to trace the simplestreams request and response
[03:06] <natefinch> wallyworld: do you have tips on how to debug? Since upload-tools avoids the bug, I don't really know how to get my own jujud into the system, but still have it look to simplestreams to get tools for further machines
[03:07] <wallyworld> i'd just hack the code somewhere, eg don't store the uploaded tools in state to force a simplestreams check or something
[03:08] <wallyworld> not sure off hand, i'd need to read the code
[03:08] <natefinch> wallyworld: ok, np
[03:12] <mup> Bug #1567104 changed: Unable to connect to API <ci> <intermittent-failure> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1567104>
[03:19] <terje> hi, I'm trying to understand when config.yaml is parsed when deploying a charm.
[03:19] <terje> I'd like to use some values stored in config.yaml when my install, start and stop hooks are executed.
[03:20] <terje> Do I have the right idea/
[03:20] <terje> ?
[03:23] <natefinch> terje: yep.... anything the user sets in the charm's config will be available to the charm code by having the charm call the config-get command, which will return any configuration set on the charm (with defaults coming from config.yaml)
[03:23] <natefinch> terje: https://jujucharms.com/docs/stable/authors-hook-environment#config-get
[03:25] <terje> ok, great.
[03:25] <cherylj> wallyworld: looks like perrito666's fix for the mongo path has some genuine test failures:  http://juju-ci.vapour.ws:8080/job/github-merge-juju/7344/console
[03:25] <wallyworld> cherylj: already on it :-)
[03:26] <cherylj> thanks :)
[03:26] <wallyworld> a couple of missing "."
[03:26] <terje> natefinch: is config-get a shell function, sourced prior to executing my hooks?
[03:26] <terje> (I don't see a shell command on my system)
[03:27] <natefinch> terje: yep, there's a bunch of shell commands that get installed with a charm.  They're called "hook tools" in juju terminology: https://jujucharms.com/docs/stable/authors-hook-environment#hook-tools
[03:27] <natefinch> terje: they are in the path when a hook is executing
[03:27] <terje> awesome, thank you.
[03:27] <terje> ok
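The hook-tool mechanism natefinch describes (config-get on PATH during hook execution, queried by the charm) could be used from a Go hook roughly like this. A hedged sketch: `parseConfig`/`charmConfig` are made-up helper names, and the JSON parsing is split out so it can be exercised without a live hook environment. config-get does support `--format=json`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// parseConfig decodes the JSON that config-get emits with --format=json.
func parseConfig(raw []byte) (map[string]interface{}, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

// charmConfig shells out to the config-get hook tool, which is only on
// PATH while a hook is executing, and returns the charm's settings.
func charmConfig() (map[string]interface{}, error) {
	out, err := exec.Command("config-get", "--format=json").Output()
	if err != nil {
		return nil, err
	}
	return parseConfig(out)
}

func main() {
	// Outside a hook, demonstrate the parsing half with sample output.
	cfg, _ := parseConfig([]byte(`{"port": 8080, "name": "demo"}`))
	fmt.Println(cfg["name"])
}
```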
[03:30] <wallyworld> axw: https://bugs.launchpad.net/bugs/1551779 is fixed right?
[03:30] <mup> Bug #1551779: New azure provider ignores agent mirrors <2.0-count> <azure-provider> <jujuqa> <regression> <streams> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1551779>
[03:31] <axw> wallyworld: no, didn't make the cut: https://github.com/juju/juju/pull/5068
[03:31] <wallyworld> ok, np
[03:31] <axw> it's ready to go whenever
[03:31] <wallyworld> just saw the bug get changed, i thought it was fix committed
[03:40] <wallyworld> axw: i have to go afk for 45 minutes or so, if you see any landing failures, could you please retry?
[03:40] <axw> wallyworld: okey dokey
[03:41] <wallyworld> ta
[03:43] <mup> Bug #1568177 changed: configFunctionalSuite.TestUsingTCPRemote lxdclient no addresses match <ci> <lxd> <regression> <test-failure> <unit-tests> <juju-core:Invalid by sinzui> <https://launchpad.net/bugs/1568177>
[04:05] <thumper> sweet
[04:06] <thumper> well that was out of context...
[04:06] <thumper> it was for this: <wallyworld> mwhudson: yay, my fix worked, backups good now
[04:06] <mwhudson> :)
[04:06] <thumper> irc window was scrolled
[04:21] <natefinch> man, juju debug-log is sloooooooow
[04:45] <menn0> wallyworld: http://reviews.vapour.ws/r/4563/ reviewed
[04:45] <wallyworld> menn0: ty
[04:45] <wallyworld> menn0: the previous PR was actually lgtm, i just fixed the tests, i'll see if i can pull the job
[04:46] <wallyworld> or i can tweak those method names in a driveby next time
[04:48] <menn0> wallyworld: it's really not that important
[04:48] <menn0> wallyworld: don't pull the current job
[04:48] <wallyworld> menn0: ok, i'll fix real soon in another branch
[04:59] <menn0> wallyworld, axw: regarding SSH host key management... looks like there's 2 ways to tackle it
[04:59] <menn0> you can have cloud-init post them to a URL
[04:59] <menn0> or you can pre-generate them and have cloud-init install them for you
[05:00] <menn0> I prefer the latter as it's simpler, but it does mean a little extra work for the server to do for every new machine
[05:01] <wallyworld> menn0: we did explore having a listener on the cloud init url at one point (for provisioning status and errors), but it is more moving parts
[05:01] <wallyworld> what's the cpu cost of generating a key?
[05:01] <menn0> wallyworld: pre-generating them server side does seem better
[05:01] <menn0> wallyworld: I will check
[05:04] <axw> menn0: I don't think we can do option 1 without a lot of added complexity
[05:05] <axw> as in, where do we post to
[05:05] <axw> menn0: my intention was to generate client-side for bootstrap, then have the server generate a new key and publish it into state
[05:05] <axw> menn0: then we'd query state for public keys when we want to SSH
[05:06] <axw> no prompting needed for verifying fingerprints
[05:06] <menn0> axw: I don't like the complexity of #1 either. we'd need a endpoint on the API server.
[05:06] <menn0> axw: I think we're thinking the same thing.
[05:07] <menn0> axw: what do you mean generate the key client side and then have the server generate a new key though?
[05:08] <menn0> axw: what i'm thinking is that when a new machine is to be created, the host keys are generated on the server and inserted into state, and also passed to cloud-init
[05:08] <menn0> when you pass the keys to cloud-init, ssh keys aren't generated on the machine and the ones that were passed are used instead
[05:08] <menn0> axw: http://cloudinit.readthedocs.org/en/latest/topics/examples.html#configure-instances-ssh-keys
[05:08] <menn0> axw: "ssh_keys"
[05:11] <axw> menn0: sec, trying to find something. there's something about cloud-config not being completely secure. I think it might just be because anything on the machine can query the metadata
[05:11] <menn0> axw: ok, that's worth following up
[05:12] <axw> menn0: eh can't find it, I'm sure there's a bug about it tho. but basically on AWS and others, you can get the cloud-config metadata by GETting a statically defined URL
[05:13] <axw> menn0: no ACLs other than being on the machine
[05:13] <menn0> axw: ok that would be bad
[05:13] <axw> so if you wanted to run something non-privileged on the machine, it could easily become privileged
[05:13] <axw> menn0: so, what I was thinking is that we'd pass through ssh_keys, and then have the machine agent generate a new one on startup
[05:14] <axw> menn0: the initial ssh_keys one is really only necessary for the bootstrap machine tho
[05:14] <menn0> axw, wallyworld: FWIW generating both the RSA and DSA keys at the default (recommended) bit sizes takes less than 0.2s (combined) on my machine
[05:15] <wallyworld> menn0: and for dense openstack deployments with containers, there may be several spun up at once
[05:15] <axw> (and containers can access the metadata URL)
[05:16] <wallyworld> i guess the host controller needing to run that many containers would cope
[05:16] <axw> menn0: not sure if what I said about bootstrap & ssh_keys made any sense, let me know if a hangout would help
[05:17] <menn0> axw: it mostly makes sense but a hangout would be helpful
[05:17] <axw> menn0: ok just eating lunch, give me 15 mins or so
[05:17] <menn0> axw: np
[05:22] <axw> wallyworld: sorry, got distracted and forgot to check your merge again
[05:23] <wallyworld> axw: tis ok. there was a spurious failure, i resubmitted
[05:23] <axw> would be good to have the CI bot sending messages in here
[05:24] <menn0> axw: +1
[05:38] <axw> menn0: https://plus.google.com/hangouts/_/canonical.com/juju-ssh-keys?authuser=1
[05:38] <axw> if you're free now
[05:40] <menn0> axw: coming!
[05:50] <axw> menn0: wrong button
[05:50] <axw> brb
[05:50] <menn0> LOL
[05:50] <axw> what have I done
[06:01] <dimitern> yay! master can bootstrap on xenial again :)
[06:13] <dimitern> there's another issue today though
[06:13] <dimitern> 2016-04-13 06:12:08 ERROR juju.worker.dependency engine.go:526 "proxy-config-updater" manifold worker returned unexpected error: unknown watcher
[06:13] <dimitern>  id (not found)
[06:48] <wallyworld> axw: this shit just keeps popping up everywhere :-( http://reviews.vapour.ws/r/4564/
[06:49] <axw> wallyworld: :/  looking
[06:49] <wallyworld> ta
[06:54] <axw> wallyworld: LGTM
[06:54] <wallyworld> ty
[06:59] <dimitern> wallyworld: hey, thanks for fixing that juju-mongodb3.2 issue!
[06:59] <wallyworld> dimitern: np, one more to go, landing now, affects restore
[07:00] <wallyworld> was a joint effort with horatio
[07:00] <dimitern> wallyworld: I've found a new proxyupdater issue on master, which I'm testing a fix for - such a fix should qualify for landing on master, but first I'll file a bug report for it
[07:01] <wallyworld> ok. we need to get a bless within 6 hours or so, maybe a bit more
[07:13] <mup> Bug #1569725 opened: proxyupdater api facade does not set NotifyWatcherId in the result <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1569725>
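The failure mode in dimitern's bug (the facade returning a watch result without its `NotifyWatcherId`, so the client looks up an empty watcher id and gets "unknown watcher id (not found)") can be illustrated with simplified stand-in types; these are not juju's real params structs:

```go
package main

import "fmt"

// NotifyWatchResult is a simplified stand-in for the wire struct a
// facade returns when a client asks to watch for changes.
type NotifyWatchResult struct {
	NotifyWatcherId string
	Error           string
}

// watchForProxyChanges sketches the shape of the fix: register the
// watcher server-side and propagate its id back in the result. If the
// id is left empty, the client's follow-up Next() call fails with
// "unknown watcher id (not found)", as seen in the log above.
func watchForProxyChanges(registerWatcher func() string) NotifyWatchResult {
	return NotifyWatchResult{NotifyWatcherId: registerWatcher()}
}

func main() {
	res := watchForProxyChanges(func() string { return "1" })
	fmt.Println(res.NotifyWatcherId)
}
```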
[07:23] <dimitern> frobware, voidspace, dooferlad: http://reviews.vapour.ws/r/4565/ fixes the bug above, please take a look
[07:40] <frobware> dimitern: looking
[07:44] <frobware> dimitern: apart from the test changes is the fix just in proxyupdater.go?
[07:44] <frobware> dimitern: ah, I see it.
[07:50] <dooferlad> dimitern: +1 from me.
[07:51]  * dooferlad will be back a bit later
[07:51] <dimitern> frobware: the fix is mostly in the api/proxyupdater changes
[07:52] <frobware> dooferlad: are we doing 1:1 in 10?
[07:53] <frobware> dooferlad: oh, it's 9:30...
[07:53] <dooferlad> frobware: indeed
[07:53]  * dooferlad really is going now
[07:53] <dimitern> frobware: I've observed at last what you're describing - lxd containers coming up with 1 NIC (as reported by lxc info), but inside having multiple NICs; applying the machine profile we create directly on the container manually seems to fix this (lxc info shows 8 NICs as expected)
[07:55] <frobware> dimitern: it depends... the output from lxc list (with netinfo) can dribble in
[07:55] <frobware> dimitern: if you can get to the same stage then don't apply the profile manually, just sit on watch lxc list for a bit
[07:55] <dimitern> frobware: so not seeing all nics there is ok as long as they actually work - that's what you're saying?
[07:55] <dimitern> :)
[07:56] <frobware> dimitern: there are two cases: 1) we don't get anything. 2) as you've described, but I'm not 100% sure this is because the info from lxc list can dribble in over time.
[07:57] <frobware> dimitern: I want to change the way we apply the profile anyway. it can be applied to the container directly (need to expose some LXD api)
[07:57] <dimitern> frobware: I've seen the dribbling you're talking about
[07:58] <frobware> dimitern: want to sync for 30 mins?
[07:58] <dimitern> frobware: don't you have another call?
[07:58] <frobware> dimitern: would be interested in any results from LXD bashing
[07:58] <frobware> dimitern: yep, but in 30mins
[07:58] <dimitern> frobware: I'm not there yet (heavily testing lxd deployments in w/ w/o charms)
[07:59] <frobware> dimitern: sure, but (gentle nudge) let's sync anyway. :)
[07:59] <dimitern> frobware: so if you don't mind I'd like to get the AC removal well tested first and finish it, so I can focus on LXD testing
[07:59] <dimitern> i.e after standup will be better
[07:59] <frobware> dimitern: sure
[08:00] <dimitern> frobware: cheers
[08:02] <frobware> jam: ping
[08:02] <jam> frobware: hi
[08:02] <frobware> jam: I'm guessing my PR for detectSubnet() does the wrong thing...
[08:03] <jam> frobware: so it's entirely possible that we're using it incorrectly, but the intent of detectSubnet is to find a subnet that is not in use
[08:04] <jam> and the highest one we found +1 is not in use, but clearly the highest one we find *is* in use.
[08:04] <frobware> jam: so then my fix is not a fix...
[08:04] <frobware> jam: it would read better as detectSubnetNotInUse() :-
[08:04] <jam> frobware: renaming "detectSubnet" to findUnusedSubnet would probably be a step in figuring out what is wrong
[08:04] <jam> frobware: jinx?
[08:05] <jam> paraphrase jinx
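jam's description of the intended behaviour (take the highest in-use subnet and go one past it) could be sketched like this; a toy illustration over 10.0.X.0/24 third octets, with a made-up `findUnusedSubnet` name, not the real detectSubnet code:

```go
package main

import "fmt"

// findUnusedSubnet sketches the intent jam describes: scan the
// 10.0.X.0/24 subnets already in use and return highest+1 for the
// third octet, i.e. a /24 that is not currently in use.
func findUnusedSubnet(usedThirdOctets []int) (int, error) {
	highest := -1
	for _, o := range usedThirdOctets {
		if o > highest {
			highest = o
		}
	}
	if highest >= 255 {
		return 0, fmt.Errorf("no free 10.0.X.0/24 subnet left")
	}
	return highest + 1, nil
}

func main() {
	next, _ := findUnusedSubnet([]int{0, 3, 7})
	fmt.Printf("10.0.%d.0/24\n", next)
}
```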
[08:05] <dimitern> 2016-04-13 08:04:42 DEBUG juju.cmd.juju.commands ssh.go:180 proxy-ssh is false
[08:05] <frobware> jam: OK, I'll revert and look at that this morning.
[08:05] <dimitern> wtf?! since when
[08:05] <dimitern> can't juju ssh 1/lxd/0 because of that
[08:05] <jam> dimitern: there was a big discussion about it recently
[08:05] <dimitern> only juju ssh 0 works
[08:06] <dimitern> jam: and it was decided proxy-ssh is bad?
[08:06] <frobware> dimitern, jam: interesting. pretty sure I was ssh'ing to 0/lxd/0 yesterday
[08:06] <jam> dimitern: the main motivation was that in multi-model controllers, proxy-ssh means you have to have ssh access to the API server.
[08:06] <jam> which is very bad for security
[08:06] <voidspace> dimitern: frobware: is "next" unblocked yet? (as in - do lxd tests pass again, can we land branches)
[08:06] <jam> If I let you create a model on my controller, that doesn't mean I want you to have SSH access to root my API server.
[08:07] <voidspace> ah, I see updates on master and next
[08:07] <dimitern> jam: it depends whether 0/lxd/0 has a dns-name coming from a vlan (e.g. 10.100.19.0/24), which might only be accessible via the controller
[08:07] <voidspace> so probably
[08:07] <jam> frobware: dimitern: well aren't X/lxd/Y supposed to be on the routable subnets anyway?
[08:08] <jam> dimitern: so for people that have complex networking setups, they can do the jumps themselves, which is slightly unfortunate, or they can pass "--proxy" to juju ssh
[08:08] <frobware> jam: yes, though it's nominally easier (for me) to think of them as logical machines (i.e., 0/lxd/3) than to dig (ho) out its IP addr
[08:08] <jam> but the default is that you can only use it if you are an admin on the controller.
[08:08] <dimitern> jam: they are, but what we deem the machine's private address for ssh still uses the legacy scoped selection + sorting
[08:08] <dimitern> nice! --proxy it is then
[08:08] <dimitern> yay! works
[08:09] <jam> dimitern: so for people that aren't admins "--proxy" will fail (which is why we set it to default false)
[08:09] <jam> but if you have access to the API server its available for you
[08:10] <dimitern> jam: yeah, this makes sense, but unfortunately also means we have to sort out the private address picking now
[08:10] <jam> dimitern: well, we should generally fix that anyway, right ? :)
[08:10] <dimitern> jam: indeed :)
[08:13] <frobware> dimitern, jam: http://reviews.vapour.ws/r/4566/
[08:14] <jam> frobware: shipit
[08:14] <frobware> ty
[08:14] <jam> 'this change is not incorrect' is confusing
[08:14] <jam> but the change itself is good
[08:14] <dimitern> frobware: LGTM
[08:16] <frobware> jam: :) the change is incorrect (was in my head, but clearly not my fingers).
[08:16] <dimitern> wallyworld: hey, is there a way to disable the model switching to "default" from "admin" after bootstrap?
[08:17] <dimitern> it's a bit frustrating to do parallel bootstrap tests on different controllers
[08:17] <dimitern> axw: ^^
[08:17] <wallyworld> dimitern: no, that was the intended behaviour
[08:18] <axw> dimitern: not at the moment. that shouldn't stop parallel bootstrap though
[08:18] <wallyworld> we don't want users in general doing things with the admin model
[08:18] <frobware> wallyworld: it's an odd experience first time round though
[08:18] <dimitern> wallyworld, axw: makes sense, but it's a bit inconvenient to have to pass -m local.xxx:yyy each time.. I guess needs to be scripted
[08:19] <wallyworld> frobware: why is it odd? you deploy workloads to hosted models
[08:19] <wallyworld> dimitern: juju switch is your friend
[08:19] <dimitern> the only reason I do it it's because I don't want to wait yet another 5m after bootstrap to be able to add a container to a machine in the default model
[08:20] <frobware> wallyworld: as a new user I bootstrap and then try and add-machine lxd:0 and it doesn't work. I guess most people deploy workloads.
[08:20] <dimitern> wallyworld: switch doesn't work well with parallel bootstraps - each does a switch at the end
[08:20] <frobware> wallyworld: heh, dimitern's comment is what I try to avoid too
[08:21] <wallyworld> frobware: the default hosted model is empty that is true
[08:21] <wallyworld> machine 0, the old so called bootstrap machine, is not something we want users to think about
[08:21] <frobware> wallyworld: and I still don't understand how I can get upgrade-juju to work in the default model
[08:21] <wallyworld> the controller is an internal detail
[08:22] <dimitern> I almost wish to have a --use-admin-model flag to bootstrap :) i.e. I want you to not switch to "default" as I know what I'm doing
[08:23] <axw> perhaps we could special case --default-model=admin? (pretty sure that is broken atm, now that I think of it)
[08:23] <axw> so if you did "juju bootstrap -d admin", then you would have no secondary model
[08:23] <dimitern> axw: that sounds exactly what I'd use
[08:23] <wallyworld> frobware: i think you need to upgrade the controller tools first
[08:24] <wallyworld> frobware: hosted models need to use tools compatible with their host
[08:25] <frobware> wallyworld: ok, let me spin up a machine in the default model and report on my upgrade step. my experience may actually be a symptom of bug #1569361 as I'm generally only trying to use LXD containers
[08:25] <mup> Bug #1569361: LXD containers fail to upgrade because the bridge config changes to a different IP address <network> <juju-core:In Progress by frobware> <https://launchpad.net/bugs/1569361>
[08:32] <voidspace> babbageclunk: did your storage stuff land?
[08:37] <voidspace> babbageclunk: I see that it didn't yet
[08:37] <voidspace> babbageclunk: I have a branch with deployment status implemented for maas 2 - trying it with your storage stuff merged in
[08:37] <voidspace> ok, it fails because a default zone is specified
[08:38] <babbageclunk> voidspace: No, I was out by the time they got Jenkins unblocked. Do you know, is the next branch still going?
[08:38] <voidspace> babbageclunk: stuff has landed on next - so I assume we're still using it
[08:38] <babbageclunk> voidspace: How do I retry a merge - is it just $$retry$$
[08:38] <voidspace> thumper: should we be targetting next or master
[08:38] <babbageclunk> voidspace: cool, thanks.
[08:38] <voidspace> babbageclunk: $$anything$$
[08:39] <babbageclunk> voidspace: really? I thought there were different commands?
[08:39] <voidspace> babbageclunk: nope
[08:39] <voidspace> it's a regex match
[08:42] <babbageclunk> voidspace: ok, fixing merge conflicts and kicking it off again.
[08:44] <babbageclunk> voidspace: I've heard mention of $$jfdi$$ though - is that a special case?
[08:46] <voidspace> babbageclunk: that is special
[08:46] <voidspace> babbageclunk: so, *something* is adding a zone of "default" to my StartInstance call - which is causing the AllocateMachine call to fail
[08:46] <voidspace> this didn't happen before I don't think
[08:46] <voidspace> but I haven't found where it's added yet
[08:47] <voidspace> ah no - the machine was deployed, so there actually were no machines available
[08:47] <voidspace> the error message was just confusing!
[08:48] <babbageclunk> voidspace: ok, that was an easy merge. Just running the tests and pushing now, then I'll queue the merge again.
[08:48] <voidspace> ok, now I get: "ERROR could not record instance in provider-state: cannot record state instance-id: Not Found
[08:48] <voidspace>  - 4y3h7p"
[08:48] <voidspace> babbageclunk: so somewhere we're being inconsistent with what we use as an id
[08:49] <voidspace> however, it looks like bootstrap continues
[08:50] <voidspace> babbageclunk: this is with your storage branch and my branches with storage/constraints/deploymentStatus
[08:52] <voidspace> cloud-init is running
[08:53] <voidspace> dimitern: frobware: so with the latest of my branches (not yet tested) and babbageclunk's not-yet-merged storage branch
[08:53] <voidspace> frobware: dimitern: bootstrap gets as far as deploying the node (maas status changes to Deployed)
[08:53] <dimitern> voidspace, babbageclunk: great!
[08:54] <voidspace> waiting to see if it gets any further - bootstrap has not yet returned. It has already reported one error, which will probably be fatal later but it hasn't stopped bootstrap.
[08:55] <dimitern> I'll give maas2 bootstrap a try today then
[08:58] <voidspace> ERROR failed to bootstrap model: bootstrap instance started but did not change to Deployed state: instance "4y3h7p" is started but not deployed
[08:58] <voidspace> not true - it is Deployed
[08:58] <voidspace> probably caused by:
[08:58] <voidspace> ERROR could not record instance in provider-state: cannot record state instance-id: Not Found
[08:58] <voidspace>  - 4y3h7p
[09:00] <wallyworld> axw: off to soccer, i had to requeue that restore fix, pprof test error, if it fails again and you notice, could you requeue for me?
[09:00] <axw> wallyworld: ok, gotta go make dinner shortly though
[09:01] <wallyworld> np, only if you notice
[09:01] <wallyworld> i'll check later
[09:01] <wallyworld> looks like restore is f&cked
[09:01] <wallyworld> errors setting restore status
[09:01] <axw> :/
[09:01] <wallyworld> txn assert error
[09:01] <wallyworld> but Assert is empty
[09:01] <wallyworld> need to diagnose
[09:19] <axw> fwereade_: I have pulled Status and Life back out of params.Model, and made worker/undertaker responsible for setting model status. if you have time, PTAL
[09:20] <fwereade_> axw, will do, tyvm
[09:21] <fwereade_> axw, while they're not *immediately* relevant, did you have any thoughts re the philosophy-of-status essay in my reply?
[09:33] <voidspace> babbageclunk: if you merge voidspace/maas2-deployment-status into your storage branch
[09:33] <voidspace> babbageclunk: and then attempt to bootstrap
[09:33] <voidspace> babbageclunk: you should see the "cannot record state instance-id" error
[09:34] <voidspace> babbageclunk: it would be great to work on that
[09:35] <voidspace> babbageclunk: isn't there also your interfaces work to complete? (a MAAS 2 version of maasObjectNetworkInterfaces)
[09:36] <babbageclunk> voidspace: yup - I need to get stop-instances and interfaces up to date with next and do PRs for them.
[09:36] <voidspace> babbageclunk: maas2Instance.volumes also needs implementing - and *may* be the cause of this failure (unlikely though)
[09:36] <babbageclunk> voidspace: then I'll grab your branch and see if I can find the instance-id problem
[09:43] <voidspace> babbageclunk: cool
[09:55] <voidspace> dimitern: that system id is correct
[09:55] <voidspace> dimitern: the deployed machine is 4y3h7p, the rack controller is 4y3h7n
[09:57] <dimitern> voidspace: hmm, but does the provider-state file get created ok?
[09:57] <voidspace> dimitern: so it's possible a bug in the way we fetch the instance (machine)
[09:57] <voidspace> dimitern: no idea
[09:57] <voidspace> dimitern: it needs investigating
[09:57] <dimitern> voidspace: and then we get 404 trying to get instance 4y3h7p?
[09:57] <voidspace> dimitern: but the id is correct
[09:58] <voidspace> dimitern: well, that's what it looks like on the basis of no investigation beyond checking the system id of the machine
[09:58] <axw> fwereade_: sorry, went afk. I do like the philosophy of separating collection, and summarisation/representation/visualisation. part of the reason why I removed the migration statuses from my PR
[09:58] <dimitern> voidspace: maybe we're not passing agent-name and it doesn't find it..
[09:58] <voidspace> dimitern: babbageclunk is going to look into it
[09:58] <voidspace> dimitern: we are passing agent name
[09:58] <dimitern> voidspace: ok
[09:58] <voidspace> dimitern: well, at the juju level
[09:58] <voidspace> it's possible gomaasapi screws things up :-)
[09:58] <voidspace> but we can check that too
[09:58] <axw> fwereade_: i.e. because migration and lifecycle are quite different, and so their statuses should probably be recorded separately
[10:01] <axw> fwereade_: what I don't have a good idea about is how to represent all of that to the user. but at least we can make that call at the UI level
[10:12] <babbageclunk> frobware, dimitern: StopInstances branch - http://reviews.vapour.ws/r/4567/
[10:13] <babbageclunk> (menn0 told me the trick of removing changes from the branch it was chained off - sorry about the other times it wasn't done!)
[10:27] <dimitern> babbageclunk: LGTM
[10:28] <dimitern> babbageclunk: urm.. I meant the other one of yours, looking at the one above now
[10:28] <mup> Bug #1569802 opened: add support for "decrement-container-interfaces-mtu" config option <juju-core:Triaged> <https://launchpad.net/bugs/1569802>
[10:33] <babbageclunk> dimitern: awesome, thanks
[10:36] <dimitern> babbageclunk: reviewed
[10:58] <dimitern> mgz: aren't we using go 1.6 for merge gating now? just noticed "go version go1.2.1" still used by github-merge-juju
[10:59] <babbageclunk> :( My maas2-storage merge run got a failure in worker/resumer tests.
[11:01] <babbageclunk> They pass when I run them locally - anyone else seeing that?
[11:02] <dimitern> babbageclunk: yeah, that's one of the flaky ones - seen it before, just $$retry$$
[11:46] <dimitern> frobware, babbageclunk: fyi - https://github.com/juju/juju/pull/5130 if you have no objections, I suggest merging this
[11:49] <babbageclunk> dimitern: makes sense to me.
[11:50] <frobware> dimitern: I got a failed CI unit test run this morning for my LXD revert - do you know if this was already present in next?
[11:52] <dimitern> frobware: which one?
[11:52] <frobware> dimitern: http://juju-ci.vapour.ws:8080/job/github-merge-juju/7359
[11:56] <dimitern> frobware: that looks like it's on next and also a mongo-related flakiness
[11:57] <frobware> dimitern: yeah, submitted again
[12:11] <mup> Bug #1555211 changed: Model name that "destroy-model" accepts doesn't match "list-models" output <2.0-count> <juju-release-support> <juju-core:Fix Released> <https://launchpad.net/bugs/1555211>
[12:18] <TheMue> (late) morning
[12:18] <voidspace> babbageclunk: did you make any progress tracking that issue?
[12:21] <babbageclunk> voidspace: no, sorry - was making review changes and wrangling branches.
[12:22] <babbageclunk> voidspace: I just got up to trying out your branch now, but when I try bootstrapping I get "ERROR Requested map, got <nil>."
[12:23] <wallyworld> fwereade_: i am stupid, can i ask you to look at something?
[12:24] <wallyworld> i sort of hope for an answer without having to dig too deeply and debug
[12:24] <babbageclunk> voidspace: Ed Hope-Morley was just here.
[12:26] <voidspace> babbageclunk: you need to set the MAAS2 feature flag
[12:26] <voidspace> babbageclunk: ah, cool
[12:26] <voidspace> babbageclunk: export JUJU_DEV_FEATURE_FLAGS=maas2
[12:27] <babbageclunk> voidspace: thanks, was just asking that.
[12:27] <voidspace> babbageclunk: otherwise it assumes maas 1 (we put the maas 2 work behind a feature flag)
[12:28] <babbageclunk> voidspace: now I get a panic!
[12:29] <babbageclunk> voidspace: nil-pointer dereference. Do I need to be on a later version of MAAS?
[12:31] <babbageclunk> voidspace: http://pastebin.ubuntu.com/15809705/
[12:33] <babbageclunk> voidspace: Oh, is that because the storage is nil? Does this branch have the storage changes in it?
[12:33] <voidspace> babbageclunk: no - you need my branch plus storage
[12:34] <voidspace> babbageclunk: is storage merged?
[12:34] <babbageclunk> voidspace: ok
[12:34] <babbageclunk> voidspace: No, grr - and the build for it just failed, I think because interfaces just got merged.
[12:35] <babbageclunk> voidspace: 5th or 6th time's the charm!
[12:36] <voidspace> :-)
[12:43] <babbageclunk> voidspace: the fun bit is, I've still got a merge for stop-instances in the queue. That's based off storage, and I've updated it now so storage might get merged with that.
[12:45] <babbageclunk> voidspace: What happens to the storage PR in that case? Will it just become merged automatically when all of its changes are already in the destination branch?
[12:45] <babbageclunk> voidspace: Right, my head's feeling a bit explody, I'm going for a run.
[12:47] <voidspace> babbageclunk: if you set stop-instances to merge and that is based off storage then yes - merging stop-instances will merge storage I think
[12:47] <voidspace> babbageclunk: but as you want to merge both that doesn't matter
[12:48] <voidspace> babbageclunk: not sure what will happen to the PR, it will *probably* get marked as merged
[12:48] <babbageclunk> voidspace: yup, that's all good.
[13:18]  * babbageclunk is actually going for a run now
[13:19] <fwereade_> wallyworld, sorry, missed you, in meeting
[13:19] <fwereade_> wallyworld, how can I help, if you haven't already solved it?
[13:20] <wallyworld> fwereade_: there's a txn that returns Aborted - it appears to have just started to act up
[13:20] <wallyworld> func (info *RestoreInfo) SetStatus(status RestoreStatus)
[13:20] <wallyworld> in state/backups/restore.go
[13:21] <wallyworld> the initial state being passed in is Pending
[13:21] <wallyworld> and the txn fails, so restore aborts
[13:21] <wallyworld> i can't see why off hand
[13:21] <wallyworld> if i comment out the assert, it works
[13:21] <wallyworld> initially, it's an empty assert for pending
[13:22] <wallyworld> ah, not state/backups, just state
[13:22] <fwereade_> wallyworld, hold on, ringing a bell...
[13:23] <fwereade_> wallyworld, heh, there might be a few things going on
[13:24] <wallyworld> it's such a simple txn
[13:24] <fwereade_> wallyworld, the failure would seem to indicate that something else has already set the status away from pending
[13:24] <wallyworld> fwereade_: not really
[13:25] <wallyworld> for pending, the assert is empty
[13:25] <wallyworld> ah, but initially there won't be a doc
[13:26] <fwereade_> wallyworld, d'oh, yes
[13:26] <wallyworld> so it should be Insert
[13:26] <wallyworld> so how the fuck did this ever work
[13:26] <wallyworld> the code is from 2014
[13:27] <fwereade_> wallyworld, heh, I actually think I must have broken it
[13:27] <wallyworld> maybe something else inserts a doc initially
[13:27] <wallyworld> oh, when?
[13:28] <fwereade_> wallyworld, creating a RestoreInfoSetter would implicitly set a status, which would trigger txn failures when two things created them at once
[13:28] <fwereade_> wallyworld, especially infuriating when one of them was only creating the "setter" in order to get, grrmbl
[13:28] <wallyworld> see https://bugs.launchpad.net/juju-core/+bug/1569467/comments/2
[13:28] <mup> Bug #1569467: backup-restore loses the hosted model <backup-restore> <blocker> <ci> <regression> <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1569467>
[13:29] <wallyworld> that comment has a timeline of sorts
[13:29] <fwereade_> wallyworld, but, yeah, if there wasn't *something* else creating the restore status, I don't see how that could have ever worked
[13:29] <wallyworld> fwereade_: a quick code search shows nothing now is creating an initial entry
[13:30] <wallyworld> cherylj: see scrollback - that restore issue has been identified as due to a refactoring earlier - i don't think it can be fixed in time for beta4
[13:31] <fwereade_> wallyworld, concur
[13:31] <wallyworld> cherylj: we will just have to release note it and fix for the next release
[13:32] <fwereade_> wallyworld, but, well, I made that change a long time ago and I'm 99% sure it was on MADE-model-workers pre-bless... maybe something else got mangled in a merge?
[13:32] <wallyworld> fwereade_: or CI didn't pick it up
[13:32] <wallyworld> because the reason it has started failing is due to newer multi-model tests
[13:33] <fwereade_> wallyworld, right, but I'm with you in that I can't see how it could have worked at all
[13:33] <wallyworld> where the hosted model is created with a name taken from the test
[13:33] <fwereade_> wallyworld, (by the way, fun feature of that file: const currentRestoreId = "current")
[13:34] <wallyworld> fwereade_: one option maybe just to get it working is to comment out the restore status setting
[13:34] <fwereade_> wallyworld, ha
[13:34] <wallyworld> as in restore will work but status will be unknown; i'd have to check to see where status is used
[13:35] <fwereade_> wallyworld, yeah... you know, I'm actually not sure it is even used
[13:36] <wallyworld> fwereade_: looks like a watcher is used to set a restoring flag on the machine agent
[13:37] <wallyworld> so we can block api calls
[13:37] <wallyworld> so it sort of is needed i think
[13:38] <fwereade_> wallyworld, yeah, sort of, even though that flag is not goroutine safe, and doesn't appear to have any way to dc existing connections
[13:39] <fwereade_> wallyworld, wait a mo
[13:39] <fwereade_> wallyworld, if the assert is empty
[13:40] <fwereade_> wallyworld, I don't think that'd ErrAborted, would it?
[13:40] <wallyworld> fwereade_: tl;dr; no quick fix for beta4 for inclusion in xenial
[13:40] <wallyworld> given we need to get a CI bless asap now
[13:40] <fwereade_> wallyworld, surely it would run and try to modify a document that wasn't there and fail silently?
[13:41] <fwereade_> wallyworld, yeah, agreed
[13:41] <wallyworld> fwereade_: that's what i thought, but it seems it does because the doc doesn't exist and Insert is not used
[13:41] <fwereade_> wallyworld, oh, that scenario explicitly gives us ErrAborted?
[13:41] <wallyworld> i'm going by what the logs report
[13:41] <wallyworld> appears so
[13:41] <wallyworld> according to the logs
[13:42] <wallyworld> either way, it's broken
[13:42] <wallyworld> fwereade_: i hacked up a version where for pending, the first status set call, i made the Assert nil, and that seemed to allow it to get further
[13:43] <wallyworld> so in that case it may have silently failed, because the next status set call to restoring failed
[13:43] <fwereade_> wallyworld, it'll still fail later though, won't it? FinishRestore will never succeed
[13:43] <wallyworld> since that call did have an assert
[13:43] <wallyworld> yep
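The Assert-vs-Insert confusion above can be modelled without mongo at all: an update op against a missing document aborts, while an insert guarded by a doc-missing assertion succeeds and makes later updates work. The toy store below mimics the shape of mgo/txn ops to illustrate that distinction; it is not the real library:

```go
package main

import (
	"errors"
	"fmt"
)

var errAborted = errors.New("transaction aborted")

// op mirrors the shape of an mgo/txn Op: either an update of an
// existing doc, or an insert that asserts the doc is missing.
type op struct {
	id     string
	insert bool // true: create the doc, asserting it does not exist yet
	status string
}

// run applies one op against a toy document store, aborting the way the
// logs above report: updating a doc that was never inserted fails.
func run(store map[string]string, o op) error {
	_, exists := store[o.id]
	if o.insert {
		if exists {
			return errAborted // the DocMissing-style assert failed
		}
		store[o.id] = o.status
		return nil
	}
	if !exists {
		return errAborted // nothing to update: this is the beta4 bug
	}
	store[o.id] = o.status
	return nil
}

func main() {
	store := map[string]string{}
	// First SetStatus(Pending) done as an update: no doc yet, so it aborts.
	fmt.Println(run(store, op{id: "current", status: "PENDING"}))
	// Done as an insert instead, it succeeds, and later updates then work.
	fmt.Println(run(store, op{id: "current", insert: true, status: "PENDING"}))
	fmt.Println(run(store, op{id: "current", status: "RESTORING"}))
}
```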
[13:44] <mup> Bug #1569898 opened: cmd/pprof: sporadic test failure <juju-core:New> <https://launchpad.net/bugs/1569898>
[13:44] <wallyworld> so the restore nazi says "no restore for you in beta4" (with apologies to Seinfeld and the Soup Nazi)
[13:45] <wallyworld> cherylj: when is the drop dead cutoff for beta4?
[13:45] <cherylj> ha, yesterday
[13:45] <cherylj> heh
[13:45] <cherylj> we need to get a release out asap
[13:46] <cherylj> this whole restore situation is confusing to be because it works for me on joyent
[13:46] <cherylj> to me, not to be
[13:46] <cherylj> heh
[13:48] <cherylj> wallyworld: in your recreate, did you see anything weird for mongo in syslog?
[13:49] <wallyworld> cherylj: didn't look in syslog - the error is quite apparent from the juju logs
[13:49] <wallyworld> cherylj: are you sure it worked?
[13:49] <wallyworld> it may have appeared to work
[13:49] <wallyworld> it would look like it did from juju status
[13:49] <wallyworld> but it would not have
[13:50] <cherylj> wallyworld: I could run juju status on the hosted model and it showed me the right info (correct machine / service) and list-models showed the right info
[13:50] <cherylj> I bootstrapped with a different model name for the default model
[13:50] <cherylj> and that was right too
[13:51] <wallyworld> hah ok, i did not see that in my testing, it failed deterministically, and the cause is now apparent from the code
[13:51] <wallyworld> cherylj: did you use upload-tools
[13:51] <wallyworld> your restore may have grabbed beta3 tools
[13:51] <wallyworld> which may not have the problem
[13:52] <cherylj> I did upload-tools
[13:52] <cherylj> er wait
[13:52] <cherylj> not for restore, you're right
[13:52] <wallyworld> that may well be the reason
[13:52] <cherylj> thank goodness there's some sense to the situation then
[13:52] <cherylj> heh
[13:52] <wallyworld> yeah, agreed, we needed to be able to explain it
[13:53] <wallyworld> but sadly we will go to beta4 without restore working
[13:54] <wallyworld> cherylj: i landed a few branches today to fix various issues with restore and mongo3.2; so all that should be ok now, but of course now that that is fixed, we see this other issue, although i did much of my testing on trusty containers for the latest failures
[13:56] <mup> Bug #1569898 changed: cmd/pprof: sporadic test failure <juju-core:New> <https://launchpad.net/bugs/1569898>
[14:02] <natefinch> wallyworld: do you know where I should be looking for the code I need to hack for upload-tools?
[14:02] <mup> Bug #1569898 opened: cmd/pprof: sporadic test failure <juju-core:New> <https://launchpad.net/bugs/1569898>
[14:02] <mup> Bug #1569914 opened: help text for juju show-controller needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569914>
[14:02] <wallyworld> natefinch: to do which bit?
[14:03] <wallyworld> natefinch: did you see john's explanation?
[14:03] <wallyworld> you  may not need to hack anything
[14:04] <natefinch> wallyworld: I saw the text, but it doesn't really help me.
[14:04] <wallyworld> doesn't it explain why amd64 is being chosen?
[14:05] <wallyworld> the provider is not asking for the right arch
[14:05] <wallyworld> i only skimmed it
[14:05] <natefinch> wallyworld: it doesn't say where the code is that it's talking about. The provider is a ton of twisty code
[14:05] <wallyworld> i have not looked at it
[14:06] <natefinch> wallyworld: also, when he says images, does he mean the OS image or the tools image?
[14:06] <wallyworld> os image
[14:06] <natefinch> wallyworld: well, ok, I know that's incorrect
[14:07] <natefinch> wallyworld: the alias of the OS image may be ubuntu-trusty, but the arch is correct... we can see that in lxd... and the fact that the OS image runs at all means it's the right arch
[14:07] <natefinch> wallyworld: the problem is that cloud-init is requesting the wrong tools version
[14:08] <natefinch> wallyworld: e.g. Attempt 1 to download tools from https://10.0.3.164:17070/tools/2.0-beta3-xenial-amd64...
[14:08] <mup> Bug #1569914 changed: help text for juju show-controller needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569914>
[14:09] <wallyworld> right ok, so that last bit is the binary version string
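The tail of that download URL is the binary version string, laid out as <version>-<series>-<arch>. A rough splitter for that layout (illustrative only; Juju has its own version parsing, and this helper is not it):

```go
package main

import (
	"fmt"
	"strings"
)

// splitBinary pulls a tools filename tail like "2.0-beta3-xenial-amd64"
// apart into version number, series, and arch by splitting on the last
// two hyphens (the version itself may contain hyphens, e.g. "-beta3").
func splitBinary(s string) (number, series, arch string, err error) {
	i := strings.LastIndex(s, "-")
	if i < 0 {
		return "", "", "", fmt.Errorf("malformed binary version %q", s)
	}
	j := strings.LastIndex(s[:i], "-")
	if j < 0 {
		return "", "", "", fmt.Errorf("malformed binary version %q", s)
	}
	return s[:j], s[j+1 : i], s[i+1:], nil
}

func main() {
	n, ser, a, err := splitBinary("2.0-beta3-xenial-amd64")
	if err != nil {
		panic(err)
	}
	fmt.Println(n, ser, a) // 2.0-beta3 xenial amd64
}
```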
[14:09] <natefinch> wallyworld: what I have been trying to figure out is how we generate the URL to request.... it's just really hard to trace it back just by looking at the code, and I haven't been able to figure out how to use upload-tools and still let the code request stuff from streams
[14:09] <wallyworld> i gave you instructions :-)
[14:10] <wallyworld> which i have not tried, i think they will work
[14:10] <natefinch> wallyworld: I don't know where any of that code lives
[14:10] <natefinch> wallyworld: I spent a few hours trying to find that code last night
[14:11] <wallyworld> i'd have to search for it to be exact. there's a ToolsStorage struct in state which is used to store the tools
[14:11] <wallyworld> then i'd search for what increments the version Build field
[14:11] <wallyworld> to see how to stop that being incremented
[14:12] <wallyworld> you could also trace what sets the cloud init config
[14:12] <wallyworld> to see where it is telling it to download tools of a certain arch
[14:13] <natefinch> I have tried that, too.
[14:13] <katco> wallyworld: cherylj: doh... sorry i inverted the fix to that bug
[14:13] <wallyworld> katco: you had your glasses on back to front
[14:13] <cherylj> heh
[14:13] <katco> wallyworld: lol
[14:14] <mup> Bug #1569914 opened: help text for juju show-controller needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569914>
[14:17] <wallyworld> natefinch: i just did a quick code search - did you look at func InstanceConfig() in instanceconfig.go ?
[14:17] <wallyworld> that's where the cloud init tools are set up i think
[14:17] <cherylj> natefinch: some (possibly) interesting factoids
[14:17] <cherylj> 1 - This test had been passing on arm64 until the host was updated to xenial
[14:18] <tych0> frobware: hi, what's the current state of lxd/juju?
[14:18] <tych0> frobware: i'm finally done with some customer stuff and can get back to looking at it now
[14:18] <tych0> (well, i have to fly back to denver, but i can look this afternoon/late evening)
[14:19] <cherylj> 2 - the problem with s390 could be attributed to s390x not being a valid arch in beta3 (see bug 1554675)
[14:19] <mup> Bug #1554675: Unable to bootstrap lxd provider on s390x <lxd> <s390x> <juju-core:Fix Committed by cherylj> <juju-core 1.25:Fix Released by cherylj> <https://launchpad.net/bugs/1554675>
[14:21] <wallyworld> natefinch: so looking at that code, it appears the tools arch comes from the machine's recorded HardwareCharacteristics, which may be nil
[14:22] <wallyworld> this is just a guess - i've not seen this code before, it's all been refactored
[14:22] <wallyworld> so when the container is created, if the hardware characteristics do not include the arch, that could explain it
[14:23] <wallyworld> but i'd need to trace it through
[14:23] <wallyworld> you don't need to hack around upload tools to do that
[14:23] <wallyworld> just add extra debugging
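wallyworld's guess above amounts to the pattern below: if the machine's recorded hardware characteristics omit the arch, tools selection quietly falls back to a default that masks the real problem. The type and field names here are illustrative stand-ins, not Juju's actual HardwareCharacteristics:

```go
package main

import "fmt"

// hardwareCharacteristics is a toy stand-in for the machine record
// wallyworld describes; a nil Arch models a provider that never set it.
type hardwareCharacteristics struct {
	Arch *string
}

// toolsArch shows the suspected failure mode: an unset arch silently
// becomes amd64, which then appears in the tools download URL.
func toolsArch(hc hardwareCharacteristics) string {
	if hc.Arch == nil || *hc.Arch == "" {
		return "amd64" // default masks the missing value
	}
	return *hc.Arch
}

func main() {
	arm := "arm64"
	fmt.Println(toolsArch(hardwareCharacteristics{Arch: &arm}))
	fmt.Println(toolsArch(hardwareCharacteristics{})) // falls back to amd64
}
```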
[14:25] <wallyworld> cherylj: natefinch: using upload tools would get around that invalid arch issue in beta3
[14:25] <cherylj> wallyworld:  for the s390 case at least.  Don't know if it's the same for arm
[14:26] <wallyworld> cherylj: arm should have been properly defined in juju since forever
[14:27] <cherylj> yeah, I know :(
[14:27] <cherylj> but, it may not make a difference
[14:28] <cherylj> how could they have bootstrapped a s390x lxd provider anyway?
[14:28] <cherylj> I don't know...  just spewing data points
[14:28] <sinzui> wallyworld: cherylj: since agents use deb archs, "dpkg --print-architecture" on the client host will probably show the arch for the container
[14:28] <babbageclunk> voidspace, dimitern: gah! Is allWatcherStateSuite.TestStateWatcherTwoModels another flaky test?
[14:29] <cherylj> yes
[14:29] <babbageclunk> :(
[14:29] <babbageclunk> oh, cherylj, that probably wasn't to me, was it.
[14:30] <cherylj> babbageclunk: it was, sorry :)
[14:31] <babbageclunk> cherylj: d'oh
[14:31] <babbageclunk> cherylj: ok, requeuing. Thanks!
[14:31] <natefinch> wallyworld: where's the code that gets the arch from the machine?
[14:32] <wallyworld> can't recall, i 'll look
[14:32] <tych0> frobware: i just tried a deploy with master on GCE, looks like it's still broken with no network. i know you said you had a patch for this in progress, what's the state of it?
[14:34] <katco> cherylj: really quick review for inverting that bug fix? http://reviews.vapour.ws/r/4571/diff/#
[14:34] <cherylj> katco:  we should probably cancel the $$merge$$ request on my revert, then
[14:35] <cherylj> sinzui: can we do that? ^^
[14:35] <wallyworld> natefinch: i've traced it back from the state docs - there's a SetInstanceInfo() API on the machine facade called by the provisioner
[14:35] <cherylj> it's still in the queue
[14:35] <sinzui> cherylj: me trues
[14:35] <wallyworld> natefinch: and that gets its info from StartInstance()
[14:36] <katco> cherylj: whatever is easier
[14:36] <sinzui> cherylj: your PR?
[14:36] <wallyworld> natefinch: this is maas right?
[14:36] <natefinch> wallyworld: lxd
[14:36] <cherylj> sinzui: https://github.com/juju/juju/pull/5131
[14:36] <natefinch> wallyworld: sounds like the problem may be that startinstance isn't setting the arch when it should be
[14:37] <wallyworld> yes, which is what john said i think
[14:37] <sinzui> cherylj: I don't see it in the queue to cancel
[14:37] <sinzui> wow 2 hours ago
[14:37] <cherylj> yeah, there's quite a backlog for merging
[14:38] <sinzui> cherylj: revert-upload-tools is canceled
[14:38] <cherylj> thanks, sinzui  :)
[14:38] <katco> sinzui: ty!
[14:38] <wallyworld> natefinch: looks like lxd does the right thing at a quick look
[14:38] <wallyworld> but all this can be debugged
[14:39] <natefinch> wallyworld: yeah, doing that now
[14:39] <wallyworld> ok, good luck, i'm out of here for a few hours to zzzzzzzzz
[14:39] <natefinch> wallyworld: good, talk to you tomorrow
[14:40] <wallyworld> ttyl
[14:42] <bogdanteleaga> sinzui, cherylj, can this get bumped to a higher priority? https://bugs.launchpad.net/juju-core/+bug/1567676
[14:42] <mup> Bug #1567676: windows: networker tries to update invalid device and blocks machiner from working <juju-core:Triaged> <https://launchpad.net/bugs/1567676>
[14:42] <bogdanteleaga> the machiner is broken
[14:44] <cherylj> bogdanteleaga: sure I can target for rc1
[14:45] <babbageclunk> voidspace: ok, so I see the power-settings clearing bug Tim was talking about, I think.
[14:46] <voidspace> babbageclunk: heh, much better not to set the power :-)
[14:47] <katco> cherylj: any suggestions on what to pick up next?
[14:47] <cherylj> you can take bogdanteleaga's bug he just mentioned :)
[14:47] <katco> cherylj: let me tal
[14:47] <cherylj> I just added a card for it
[14:48] <katco> cherylj: can i reproduce this on linux, or do i need a windows installation?
[14:50] <cherylj> katco: I suspect it's windows only
[14:50] <katco> cherylj: do we have any windows installations to test against?
[14:51] <cherylj> katco: there is in CI.  sinzui, abentley - can katco access a MAAS that has windows images to deploy?
[14:52] <sinzui> cherylj: katco sure. I can add katco to munna
[14:52] <katco> sinzui: cherylj: ty... any documentation on how to utilize this?
[14:53] <sinzui> katco: I have no documentation. These MAASes are all new to me
[14:54] <katco> sinzui: ok
[14:54] <katco> sinzui: do i need to be on a vpn or anything?
[14:54] <sinzui> katco: cloud-city and the CI runs have configurations in them
[14:55] <sinzui> katco: you will not, you will enter as the CI bot with all its privs
[15:02] <katco> natefinch: standup time
[15:03] <frobware> tych0: it's currently in master
[15:03] <frobware> alexisb, tych0: is is possible to get GCE creds to poke around?
[15:07] <dimitern> alexisb: ping
[15:09] <alexisb> dimitern, omw
[15:10] <dimitern> ok
[15:14] <mup> Bug #1569948 opened: help text for juju list-machines needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1569948>
[15:14] <mup> Bug #1569949 opened: log spam: "skipping observed IPv6 address ..." <juju-core:New> <https://launchpad.net/bugs/1569949>
[15:25] <natefinch> katco, ericsnow: and once I figure out how to do this dance with upload-tools, I'll write it down so others can benefit from it.
[15:27] <katco> natefinch: sounds good
[15:28] <ericsnow> natefinch: nice
[15:28] <katco> ericsnow: natefinch: btw they're doing something to the sidewalk again -.- i'm hopeful they don't cut anything this time
[15:28] <ericsnow> katco: buena suerte
[15:34] <terje> I'm deploying a charm I'm working on to private openstack clouds. In doing so, I need a few things (endpoints, username/pw, tokens), etc.
[15:35] <terje> Currently I put them in config.yaml and use config-get to access them when the charm deploys, but since they differ per install this isn't going to work
[15:35] <terje> is there a better way to go about this?
[15:36] <katco> terje: you might have more luck in #juju
[15:37] <katco> terje: that's where the charming folks hang out, and i'm sure they've run into that problem before
[15:37] <cherylj> perrito666: ping?
[15:37] <perrito666> cherylj: pong
[15:37] <natefinch> yeah, folks on this channel just aren't as charming
[15:37] <cherylj> perrito666: looks like the mongo path changes broke windows unit tests:  http://paste.ubuntu.com/15813652/
[15:38] <perrito666> cherylj: whaaa? I am pretty sure I +build !windows to that test
[15:38] <mup_> Bug #1569963 opened: log spam:  image metadata, apiworkers manifold worker, KVM <juju-core:New> <https://launchpad.net/bugs/1569963>
[15:38] <mup_> Bug #1569969 opened: No way to set credential for current model. <juju-core:New> <https://launchpad.net/bugs/1569969>
[15:39] <perrito666> cherylj: there it is https://github.com/juju/juju/blob/master/mongo/internal_test.go#L4 I wonder what is going on
[15:39] <perrito666> natefinch: do you know if build flags are valid for tests?
[15:39] <terje> lulz, ok. thanks. There's never anyone active in #juju but I'll give it a shot.
[15:42] <cherylj> perrito666: the failing test is  in state/backups/internal_test.go
[15:42] <natefinch> perrito666: absolutely
[15:42] <cherylj> which does not have the windows build flag
[15:42] <perrito666> cherylj: oh, I see, I did not touch that why would it be failing
[15:42] <cherylj> (sorry, I cut that part off from the paste)
[15:43]  * perrito666 goes check
[15:44] <cherylj> here's the full paste:  http://paste.ubuntu.com/15813799/
[15:44] <perrito666> cherylj: I see no state/backups/internal_test.go
[15:45] <perrito666> cherylj: actually, grep says that test is not here
[15:45] <cherylj> https://github.com/juju/juju/blob/master/state/backups/internal_test.go#L28
[15:46] <perrito666> cherylj: ohhh, I see, Ian most likely merged this while my patch was on the queue
[15:47] <perrito666> there it is
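For anyone following along: as natefinch confirms, build constraints do apply to _test.go files, which is how mongo/internal_test.go is kept out of the Windows run. A minimal sketch of such a file (the package and test body are invented for illustration; the constraint line and the mandatory blank line after it are the point):

```go
// +build !windows

// This file compiles only on non-Windows platforms. The blank line
// between the constraint and the package clause is required, or the
// comment is treated as ordinary documentation and ignored.
package mongo

import "testing"

// TestMongoPathIsUnixStyle is a placeholder test that only makes sense
// off Windows, mirroring the unix-path assumptions discussed above.
func TestMongoPathIsUnixStyle(t *testing.T) {
	const path = "/usr/lib/juju/bin/mongodump"
	if path[0] != '/' {
		t.Fatalf("expected absolute unix path, got %q", path)
	}
}
```

Note this is a file fragment, not a runnable program: it belongs in a `_test.go` file and runs under `go test`.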
[15:49] <terje> crickets in #juju ..
[15:49] <katco> marcoceppi: anyone that can help terje out?
[15:50] <marcoceppi> terje: katco hi, I'll answer in #juju
[15:50] <katco> marcoceppi: thx dude
[15:50] <terje> yea, thanks!
[15:59] <katco`> gah... stupid construction! power keeps cutting out
[15:59] <katco`> thank god the internet seems unaffected
[16:01] <fwereade_> praise be to the internet
[16:02] <katco`> may we all bask in its warm glow and loving embrace
[16:02] <katco`> so sayeth we all
[16:03] <natefinch> sinzui: we have 2.0beta3 in streams somewhere, right?
[16:03] <natefinch> or mgz_ ^
[16:03] <sinzui> natefinch: devel streals
[16:03] <sinzui> devel streams
[16:04] <natefinch> sinzui: what's the url for that? I'm trying to write some documentation and I can't actually find those tools in streams. I'm sure I'm just missing them somewhere
[16:04] <sinzui> natefinch: agent-stream: devel
[16:05] <sinzui> no need to set agent-metadata-url
[16:05] <natefinch> sinzui: no no... what do I type into chrome?
[16:05] <natefinch> sinzui: I just want to see what's available
[16:06] <sinzui> natefinch: This is the official streams location https://streams.canonical.com/juju/tools/streams/v1/index2.json
[16:08] <natefinch> sinzui: where's the actual list of available images?  I don't know how to parse that json
[16:09] <sinzui> natefinch: images? oh, that would be cloud-images.ubuntu.com
[16:09] <natefinch> sinzui: s/images/tools
[16:10] <natefinch> sinzui: for example: https://streams.canonical.com/juju/tools/proposed/
[16:10] <natefinch> sinzui: except, with 2.0beta3 in the list
[16:10] <natefinch> sinzui: this is for human consumption, not machine
[16:10] <sinzui> natefinch: that is obsolete from a long time ago
[16:11] <natefinch> sinzui: somewhere there is a link to juju-2.0beta3-trusty-amd64.tgz .... can you please just find that url for me?
[16:12] <sinzui> natefinch: https://streams.canonical.com/juju/tools/streams/v1/index2.json is what juju uses
[16:12] <sinzui> natefinch: in that file is a path to the stream you want to use, "devel"
[16:13] <sinzui> natefinch: a devel juju will then read https://streams.canonical.com/juju/tools/streams/v1/com.ubuntu.juju-devel-tools.json
[16:16] <natefinch> sinzui: what is the base of the relative path that they give? e.g. "agent/2.0-beta3/juju-2.0-beta3-trusty-amd64.tgz" ?
[16:17] <sinzui> natefinch: the agent-metadata-url used by Juju. Juju's default is https://streams.canonical.com/juju/tools
[16:18] <mup> Bug #1569982 opened: pathsSuite.TestPathDefaultMongoExists failing on windows <blocker> <ci> <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1569982>
[16:18] <natefinch> sinzui: ahh, thank you
[16:19] <sinzui> natefinch: The directories are browsable on that server :) https://streams.canonical.com/juju/tools/agent/
[16:19] <natefinch> sinzui: yes, that's the URL I was looking for
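The lookup chain sinzui describes (index2.json → per-stream product file → path relative to the agent-metadata-url) can be sketched as below. The JSON fragment and the field layout are illustrative of the simplestreams shape, not the full schema; `resolveStreamURL` is a hypothetical helper, not juju's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// baseURL is juju's default agent-metadata-url, per the discussion above.
const baseURL = "https://streams.canonical.com/juju/tools"

// Illustrative fragment of what index2.json contains: stream names
// mapped to relative paths of per-stream product files.
const index = `{
  "index": {
    "com.ubuntu.juju:devel:tools": {
      "path": "streams/v1/com.ubuntu.juju-devel-tools.json"
    }
  }
}`

type indexFile struct {
	Index map[string]struct {
		Path string `json:"path"`
	} `json:"index"`
}

// resolveStreamURL finds the product file for a stream ("devel",
// "released", ...) by matching the stream label embedded in the
// index entry name, then joins it onto the agent-metadata-url base.
func resolveStreamURL(raw, stream string) (string, error) {
	var idx indexFile
	if err := json.Unmarshal([]byte(raw), &idx); err != nil {
		return "", err
	}
	for name, entry := range idx.Index {
		if strings.Contains(name, ":"+stream+":") {
			return baseURL + "/" + entry.Path, nil
		}
	}
	return "", fmt.Errorf("stream %q not found", stream)
}

func main() {
	url, err := resolveStreamURL(index, "devel")
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```

The product file found this way then lists items with paths like `agent/2.0-beta3/juju-2.0-beta3-trusty-amd64.tgz`, which resolve against the same base URL.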
[16:23] <sinzui> cherylj: bug 1559715 is about the model instances being left behind
[16:23] <mup> Bug #1559715: restore-backup is unreliable <backup-restore> <ci> <destroy-controller> <destroy-environment> <regression> <juju-ci-tools:Triaged> <juju-core:Triaged> <https://launchpad.net/bugs/1559715>
[16:23] <sinzui> cherylj: I will re-title it
[16:29] <cherylj> perrito666: https://github.com/juju/juju/pull/5135
[16:29] <cherylj> sinzui: yeah, good idea
[16:46] <tych0> jam: frobware: https://github.com/juju/juju/pull/5136
[16:47] <frobware> tych0: your testing for this problem is from the tip of master?
[16:47] <tych0> frobware: yes
[16:48] <frobware> tych0: ok, makes perfect sense
[16:48] <tych0> frobware: although i haven't tested against "next", but it looked to me like they were the same
[16:48] <frobware> tych0: the reason I was asking is because there is (or has been a race) in that part of the code
[16:49] <tych0> right, i assume that is what you were working on?
[16:49] <frobware> tych0: the "if len(networkConfig.Interfaces) > 0 " can happen very occasionally on MAAS
[16:49] <tych0> you mean it is == 0 even on maas?
[16:49] <frobware> tych0: or, put another way, it's possible for len == 0 even on MAAS...
[16:49] <frobware> tych0: let me dig out the bug
[16:49] <tych0> frobware: right, i'm not testing on mass, but gce
[16:50] <tych0> and it is always broken for me
[16:50] <frobware> tych0: bug #1564395
[16:50] <mup> Bug #1564395: newly created LXD container has zero network devices <bootstrap> <conjure> <network> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1564395>
[16:50] <tych0> that's what i was complaining about a few weeks ago
[16:51] <frobware> tych0: can you send me the machine-0.log without your fix?
[16:51] <mup> Bug #1566531 changed: Instances are left behind testing Juju 2 <ci> <destroy-environment> <ec2-provider> <jujuqa> <juju-ci-tools:Invalid> <juju-core:Fix Released> <https://launchpad.net/bugs/1566531>
[16:51] <mup> Bug #1570009 opened: pathsSuite.TestPathDefaultMongoExists fails because of windows path <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1570009>
[16:51] <tych0> frobware: sure
[16:51] <tych0> (i'm on a plane right now so it might be slow :)
[16:51] <frobware> tych0: planes are fast. :-D
[16:51] <tych0> frobware: but i think without the above juju/lxd container type is basically broken everywhere, no?
[16:52] <cherylj> can I get a quick review?  http://reviews.vapour.ws/r/4574/
[16:52] <frobware> tych0: yes-ish. what's not clear is whether that's due to the bug ^^ -- ie., by the time we get to "your patch" it was going to be empty come what may
[16:52] <tych0> frobware: http://paste.ubuntu.com/15815783/
[16:53] <tych0> frobware: oh. does juju provide network config in all cases?
[16:53] <tych0> frobware: i read somewhere in the code that it was optional
[16:53] <tych0> maybe in the struct InstanceSpec comments or something
[16:57] <frobware> tych0: checking...
[16:59] <tych0> frobware: ah, it's on StartInstanceParams
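The fallback being debated here can be sketched as follows. When the provider supplies interface details (as MAAS normally does) a per-container profile is built; when it supplies none (GCE, AWS), the patch falls back to LXD's "default" profile. The types and the profile names are illustrative assumptions, not juju's actual `container/lxd` code; note frobware's point that even on MAAS a race (bug #1564395) can leave the list empty.

```go
package main

import "fmt"

// interfaceInfo and networkConfig stand in for the provider-supplied
// network configuration mentioned in the quoted check
// `if len(networkConfig.Interfaces) > 0`; hypothetical shapes.
type interfaceInfo struct {
	DeviceName string
}

type networkConfig struct {
	Interfaces []interfaceInfo
}

// containerProfile picks an LXD profile for a new container. The
// fallback must be safe because Interfaces can be empty both on
// providers that never populate it and, occasionally, on MAAS.
func containerProfile(cfg *networkConfig) string {
	if cfg != nil && len(cfg.Interfaces) > 0 {
		return "juju-custom" // hypothetical per-container profile
	}
	return "default"
}

func main() {
	// No provider network info, as on GCE/AWS.
	fmt.Println(containerProfile(nil))
}
```

frobware's later AWS paste shows the remaining hazard with this approach: the fallback assumes a "default" profile exists on the host, which his test machine (profiles: `docker` only) did not have.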
[17:00]  * babbageclunk is BACK!
[17:21] <mup> Bug #1570009 changed: pathsSuite.TestPathDefaultMongoExists fails because of windows path <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1570009>
[17:31] <frobware> tych0: followed up in RB
[17:31] <cherylj> perrito666: review, pretty please?  http://reviews.vapour.ws/r/4574/
[17:31] <frobware> cherylj: updated the release notes but will follow-up again tomorrow in the cold light of day...
[17:32] <perrito666> cherylj: ship it
[17:32] <cherylj> tyvm, frobware!
[17:41] <tych0> frobware: i'm not sure i understand your comment
[17:42] <tych0> frobware: since we don't create those devices at the moment, don't we want to use the default profile whether they're nil or not?
[17:51] <mup> Bug #1305509 changed: state/watcher: possible data race in commonWatcher <race-condition> <juju-core:Fix Released> <https://launchpad.net/bugs/1305509>
[17:51] <mup> Bug #1517747 changed: provider/joyent/gomanta: data race <2.0-count> <race-condition> <juju-core:Triaged> <https://launchpad.net/bugs/1517747>
[17:51] <mup> Bug #1570031 opened: Cannot bootstrap MAAS2 since Juju2 does not use MAAS API V2 <juju-core:New> <https://launchpad.net/bugs/1570031>
[17:53] <frobware> tych0: it's not clear whether we should immediately fall back to "default"
[17:54] <frobware> tych0: I need to able to test this.
[17:54] <tych0> frobware: what other options are there?
[17:55] <natefinch> I love the way in juju status that we try to save horizontal space by labelling the Instance ID column "INS-ID" ... but then the actual ID is juju-c9d1b54c-95ba-4046-8b16-06e16b02ada8-machine-0
[17:55] <frobware> tych0: NetworkConfig.Device and NetworkConfig.NetworkType may have valid values
[17:55] <tych0> frobware: right, but nothing in the code uses those right now right?
[17:56] <frobware> tych0: in container/lxd.go no, but that could be an oversight having only tested this with MAAS
[17:57] <frobware> tych0: I'm trying to understand from the POV of the other providers, particularly AWS with and without juju's address-allocation feature flag
[17:57] <tych0> right. i'm not proposing a fix for that, i'm proposing a fix for something else: that there are no networks at all in containers on GCE or AWS
[17:57] <frobware> tych0: so it wasn't clear to me this was broken on AWS.
[17:58] <tych0> i think it's broken everywhere that doesn't provide network configuration, which includes AWS right?
[17:58] <tych0> i haven't actually tested it there, i just heard someone (maybe you?) mentioned that GCE and AWS were similar in that respect
[18:05] <frobware> tych0: just trying on AWS
[18:12] <mup> Bug #1570031 changed: Cannot bootstrap MAAS2 since Juju2 does not use MAAS API V2 <juju-core:New> <https://launchpad.net/bugs/1570031>
[18:12] <mup> Bug #1570035 opened: Race in api/watcher/watcher.go <ci> <race-condition> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1570035>
[18:17] <cherylj> katco: ericsnow just opened up a bug with the same error of that windows bug you're working on
[18:17] <cherylj> katco: and I'm assuming he's not on windows :)
[18:17] <katco> cherylj: k
[18:17] <cherylj> katco: so maybe there's a way to recreate without windows
[18:17] <katco> cherylj: dunno... ericsnow, are you? ;p
[18:17] <cherylj> bug 1569963
[18:17] <mup> Bug #1569963: log spam:  image metadata, apiworkers manifold worker, KVM <juju-core:New> <https://launchpad.net/bugs/1569963>
[18:17] <ericsnow> cherylj: not on windows :)
[18:17] <ericsnow> katco: ^^^
[18:18] <katco> cherylj: this looks like a completely different error?
[18:18] <cherylj> machine-0: 2016-04-13 14:54:23 ERROR juju.worker.dependency engine.go:526 "apiworkers" manifold worker returned unexpected error: setting controller network config: cannot set controller provider network config: cannot set link-layer devices to machine "0": invalid device "unsupported0": Type "" not valid
[18:18] <katco> cherylj: oh nm i see the message now
[18:18] <cherylj> ok, thought I was going crazy
[18:18] <cherylj> which is entirely possible
[18:19] <katco> aren't we all? heh heh heh
[18:25] <frobware> tych0: I'm using your tree and AWS seems busted (for add-machine lxd:0)
[18:26] <tych0> frobware: oh? what's the symptom?
[18:27] <frobware> tych0: bleh, scrap that. Warning != Error.
[18:33] <tych0> frobware: ok, my battery is going to die. let me know what you figure out about aws
[18:34] <frobware> tych0: just testing one more path...
[18:39]  * frobware mulls over the fact that he is doing basic sanity bootstrap... that a machine and/or CI could do...
[18:41] <frobware> tych0: not sure if you're still about but ... http://pastebin.ubuntu.com/15818135/
[18:41] <frobware> tych0: that's your tree, bootstrapped as: JUJU_DEV_FEATURE_FLAGS=address-allocation juju bootstrap a1 aws --upload-tools
[18:41] <frobware> tych0: I'll need to dig into this more tomorrow
[18:48] <frobware> tych0: so, the problem above is because:
[18:48] <frobware> ubuntu@ip-10-229-48-181:/var/log/lxd/juju-machine-0-lxd-0$ sudo lxc profile list
[18:48] <frobware> docker
[18:48] <frobware> tych0: there is no default profile
[19:03] <mup> Bug #1568668 changed: landing bot uses go 1.2 for pre build checkout and go 1.6 for tets <juju-ci-tools:New> <https://launchpad.net/bugs/1568668>
[19:18] <mup> Bug #1568668 opened: landing bot uses go 1.2 for pre build checkout and go 1.6 for tets <juju-ci-tools:New> <https://launchpad.net/bugs/1568668>
[19:19] <katco> natefinch: how's your stuff going?
[19:21] <natefinch> katco: I may have just found a key piece.... I still haven't figured out how to disable all the upload-tools stuff, but I just found a spot where we haven't set the arch when getting user data config (i.e. the thing that sets up cloudinit and which determines the tools we download)
[19:21] <natefinch> katco: maybe... still not sure
[19:22] <cherylj> katco: do you know if there was ever a plan to support nested lxd containers for the lxd provider (such that deploy --to lxd:# would work)?
[19:23] <katco> cherylj: that's certainly a use-case
[19:24] <cherylj> katco: but would we expect it to work now? (because we don't allow it because of the "can host containers" check)
[19:24] <katco> cherylj: i don't think we specifically added support for that in the provider
[19:24] <cherylj> ok
[19:25] <katco> cherylj: so right now it doesn't work because of a security setting?
[19:25] <rick_h_> cherylj: katco that's the profile issue
[19:25] <cherylj> katco:  I think it's because of the check that we do to determine if a machine can host a container
[19:25] <rick_h_> cherylj: katco where you have to use the docker profile to get that to work
[19:25] <cherylj> the check is "are we a container" I think
[19:25] <katco> rick_h_: cherylj: right, that's where i was going... can we tweak the profile to disable that security check
[19:26] <katco> cherylj: oh, it's a juju thing, not lxd?
[19:26] <cherylj> wait, I'm thinking of a different bug
[19:26] <cherylj> maybe it's not that check that's failing
[19:26] <cherylj> (sorry, I'm slowly losing my mind)
[19:28] <cherylj> ah, yes it is the same error output:  "cannot add a new machine: machine 1 cannot host lxd containers"  but it doesn't fail at deploy time?  weird.
[19:28] <cherylj> this is bug 1569106 for reference
[19:28] <mup> Bug #1569106: juju deploy  <service> --to lxd:0 does not work with lxd provider <conjure> <lxd> <placement> <juju-core:New> <https://launchpad.net/bugs/1569106>
[19:30] <mup> Bug #1568668 changed: landing bot uses go 1.2 for pre build checkout and go 1.6 for tets <juju-ci-tools:New> <https://launchpad.net/bugs/1568668>
[19:36] <natefinch> katco or ericsnow: care to pair on this?  A fresh pair of eyes may be a big help
[19:37] <ericsnow> natefinch: sure
[19:37] <katco> natefinch: trying to get this windows test going
[19:37] <ericsnow> natefinch: I need a break :)
[19:37] <natefinch> ericsnow: cool
[19:37] <natefinch> katco: np
[19:38] <natefinch> katco: good luck
[20:37] <tych0> frobware: wat
[20:37] <tych0> frobware: how is there no default profile? that is odd.
[20:45] <thumper> morning
[20:46] <perrito666> hi thumper
[20:47] <thumper> hey perrito666
[20:48]  * perrito666 wonders what exactly he changed in his linkedin lately that provoked a wave of recruiters
[20:52] <mup> Bug #1569072 changed: juju2 bundle deploy help text out of date <landscape> <juju-core:New> <https://launchpad.net/bugs/1569072>
[21:00] <natefinch> katco, cherylj: pretty sure ericsnow and I figured out the problem with lxd on non-amd64.
[21:00] <cherylj> yay!
[21:00] <cherylj> natefinch: what was it?
[21:01] <natefinch> cherylj: we were getting a list of valid tools, and not filtering it for valid architectures before picking one off the list.  It just happens to always be amd64 first... and just happens to be ignored during bootstrap
[21:02] <cherylj> nice
[21:28] <mup> Bug #1570096 opened: No way to remove user; remove-user command is missing <juju-core:New> <https://launchpad.net/bugs/1570096>
[21:34] <redir> that seems like a pretty solid bug
[22:03] <thumper> wallyworld: meeting?
[22:03] <wallyworld> thumper: on way, last meeting ran over
[22:38] <perrito666> wallyworld: I am leaving for like an hour or so, if you change your mind regarding the standup please have anastasiamac_ or axw contact me via some communication protocol from this century, preferably one that rings my phone such as twitter
[22:39] <wallyworld> perrito666: ok, will do
[23:05] <anastasiamac_> wallyworld: axw: I proposed a quick fix for cloud lookup to include built-in providers consistently http://reviews.vapour.ws/r/4579/
[23:06] <anastasiamac_> cherylj: if u don't want it in beta4 ^^^, I can re-propose against next with more testing around the area...
[23:19] <thumper> trivial review for someone: https://github.com/juju/gomaasapi/pull/37
[23:20] <thumper> axw: you on call?
[23:20] <axw> thumper: I am
[23:20] <axw> (currently reviewing in fact)
[23:20]  * axw enqueues
[23:23] <axw> thumper: can we not change gomaasapi to drop the empty-value ?op=
[23:23] <thumper> axw: we *could*, but I'm more hesitant to change parts that are used already...
[23:24] <thumper> it is almost certainly likely to be fine
[23:24] <thumper> but still has me a bit squeamish
[23:24] <axw> thumper: ok
[23:52] <menn0> axw: quick state API question: http://paste.ubuntu.com/15822102/
[23:52] <menn0> I think I prefer the latter
[23:52] <wallyworld> natefinch: how goes the non amd64 lxd provider fix?
[23:57] <ericsnow> wallyworld: we paired right before he had to go for dinner and it looks like we came up with a fix
[23:58] <ericsnow> wallyworld: natefinch should be back in a little while to wrap it up
[23:58] <wallyworld> ericsnow: oh, that is awesome
[23:58] <wallyworld> what was the root cause?
[23:59] <axw> menn0: sorry going to meeting, will look in a little while
[23:59] <ericsnow> wallyworld: the list of tools that gets passed in is basically "all available" and amd64 is the first one
[23:59] <menn0> axw: no rush
[23:59] <ericsnow> wallyworld: the provider was simply using the first one in the list
[23:59] <ericsnow> wallyworld: needed to filter for arch first