=== thumper-dogwalk is now known as thumper
[00:05] blahdeblah: I'll let you know, thanks for filing it
[00:05] externalreality: Just making some toasted sandwiches, with you in about 15-20min
[00:06] thumper: +1 for toasted sandwiches. Cya in 20
[00:23] blahdeblah: your comment one seems to be missing the results of the actual calls
[00:24] thumper: That was intentional; mostly for anyone in the general public who runs into the problem but doesn't have access to the pastebins
[00:24] blahdeblah: ah
[00:24] ok
[00:27] Bug #1662272 changed: Agents stuck with "dependency not available"
[01:29] thumper: i have a question, got 2 minutes?
[01:29] gimmie 5?
[01:32] wallyworld: now is fine
[01:32] sure, HO?
[01:35] anyone seen this yet: https://bugs.launchpad.net/juju/+bug/1666722
[01:35] Bug #1666722: juju 2.1 fails to deploy machines in localhost with lxd 2.9.2
[01:35] not sure why it was marked incomplete though
[01:37] stokachu: i've marked incomplete coz i cannot read the pastebin with log, plz attach log as a file :D
[01:37] uh
[01:38] well i can read it just fine
[01:39] :D
[01:40] k added
[01:43] wallyworld: hangouts dropped you
[01:43] now I get "can't start call due to an error"
[01:43] huzzah
[01:43] thumper: yeah, trying to rejoin
[01:44] thumper: i'm back in, waiting for people to join
[01:44] well i was
[01:44] dropped me again
[01:44] I'm still getting the error
[01:45] wallyworld: oh well...
[01:45] let's not bother then
[01:45] ok
[01:45] you answered my question, ty
[01:53] wallyworld: thumper: m looking at some logs from today's 2.1 tip and am seeing 2017-02-21 23:09:40 ERROR juju.state database.go:243 using unknown collection "remoteApplications"
[01:53] think i figured it out
[01:53] i thought we had dealt with "remoteApplications"?
[01:53] anastasiamac, wallyworld: all access should be behind a feature flag
[01:54] stokachu: m seeing
[01:54] Status:"provisioning error", Info:"A root disk device must have the \"pool\" property set."
[01:54] stokachu: what did u figure? :D
[01:54] anastasiamac, yea lxd 2.9 requires a pool defined
[01:54] root:
[01:54]   path: /
[01:54]   type: disk
[01:54] so that needs to be set in the profile
[01:54] which it does by default, it looks like
[01:54] it all was behind a flag as far as i knew. there was an issue in 2.0
[01:54] in the megawatcher, but that was fixed
[01:54] anyway anastasiamac you can close that issue
[01:54] thumper: m fairly certain the feature flag was not enabled on this environment
[01:54] just requires "pool: default"
[01:55] stokachu: could u plz add a note as to what u did for posterity :D
[01:55] yea
[01:55] stokachu: and just to confirm, u did not have any feature flags enabled?
[01:55] the latest juju 2.1 should set up the profile
[01:55] there's nothing that conjure-up should need to do
[01:56] IIANM
[01:56] nope no feature flags
[01:56] wallyworld: stokachu is on the latest (from today) 2.1 :D
[01:56] wallyworld, yea i need to update our spells now
[01:56] to account for the new way lxd storage works
[01:56] added a note to the bug
[01:57] stokachu: thnx \o/
[01:58] np
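For anyone landing here from bug 1666722: the fix stokachu describes is giving the profile's root disk device a "pool" key, e.g. `lxc profile device set default root pool default` from the CLI. Below is a minimal sketch of the same change via the Go LXD client (github.com/lxc/lxd/client); the profile and pool names ("default") are assumptions, and this is illustrative rather than juju's actual provider code.

```go
package main

import (
	"log"

	lxd "github.com/lxc/lxd/client"
)

func main() {
	// Connect over the local unix socket ("" picks the default path).
	c, err := lxd.ConnectLXDUnix("", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Fetch the profile that the containers use ("default" is assumed).
	profile, etag, err := c.GetProfile("default")
	if err != nil {
		log.Fatal(err)
	}
	// LXD 2.9 requires a root disk device to name its storage pool.
	root, ok := profile.Devices["root"]
	if !ok {
		root = map[string]string{"path": "/", "type": "disk"}
		profile.Devices["root"] = root
	}
	root["pool"] = "default" // assumed pool name
	if err := c.UpdateProfile("default", profile.Writable(), etag); err != nil {
		log.Fatal(err)
	}
}
```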
[01:58] wallyworld: I am also seeing failures related to trying to config mongo... http://pastebin.ubuntu.com/24043782/
[01:59] that doesn't make sense, those kernel params do exist
[01:59] wallyworld: ¯\_(ツ)_/¯
[01:59] is this being done inside a container?
[02:00] stokachu: ^^
[02:00] anastasiamac, what is that from?
[02:01] stokachu: from reading the logs u've attached to the bug... were u inside the container?
[02:01] yes
[02:01] those line numbers don't match
[02:02] yea im not sure those error messages relate to what i was doing
[02:02] wallyworld: the full log is in bug 1666722 ... m calling it as I see it :D
[02:02] Bug #1666722: juju 2.1 fails to deploy machines in localhost with lxd 2.9.2
[02:02] but it implies the latest 2.1 is not being used
[02:03] this is from my snap which i rebuilt several hours ago
[02:03] oh wait
[02:03] to make sure i pulled in the 2.9 fixes
[02:03] sorry, i'm looking at the wrong branch
[02:03] sigh, too many tabs
[02:03] wallyworld, :D
[02:04] ok, so just looked, those "errors" are poor debug messages
[02:04] sigh
[02:05] they can be ignored
[02:05] wallyworld, all good :D
[02:14] wallyworld: what about the "remoteApplications" one?
[02:14] NFI. it's harmless. will need more context to track it down. what was the user doing, what commands were being run etc
[02:15] everything looks like it's behind the flag
[02:15] something is leaking though it seems
[02:15] wallyworld, also keep in mind i bumped the logging way up
[02:15] stokachu: yeah, but here juju was logging this as an error
[02:15] wallyworld, http://astokes.org/juju/2/api/debugging i do this every time i need to investigate stuff
[02:15] and it should not have been
[02:15] ah ok
[02:16] wallyworld: i believe that stokachu was just bootstrapping at the time of that message
[02:16] bootstrap actually worked
[02:16] it was the deploying of applications
[02:16] yeah, that remote thing is harmless
[02:16] if it showed up after a deploy was run that gives some context
[02:17] stokachu: m not talking about what operation failed but what operation was run when that log message appeared.. it's in your log anyway
[02:17] ok
[02:17] but since wallyworld is happy there is no impact (and he would know)...
[02:17] or so i think
[02:18] it doesn't appear to, from what i've seen
[02:18] just need to track it down and remove the noise
[02:18] wallyworld: m not clear why this message would appear if everything is behind a feature flag (regardless of whether it was bootstrap or deploy)
[02:19] exactly
[02:19] we have been bitten before by similar things :D
[02:19] that's my point
[02:19] k. i'll leave it with u, master
[02:19] there's been a leak somewhere
[02:20] it all should be behind a flag. i've looked through the code and it appears that's the case, but something somewhere is misconfigured
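For context on the flag discussion above: juju's developer feature flags are read from the JUJU_DEV_FEATURE_FLAGS environment variable, and the argument is that nothing should touch the "remoteApplications" collection unless the cross-model flag is enabled. A minimal, self-contained sketch of that kind of guard follows; the flag name "cross-model" and the helper are assumptions for illustration, not juju's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// enabled reports whether the named developer feature flag appears in
// the comma-separated JUJU_DEV_FEATURE_FLAGS environment variable.
func enabled(flag string) bool {
	for _, f := range strings.Split(os.Getenv("JUJU_DEV_FEATURE_FLAGS"), ",") {
		if strings.TrimSpace(f) == flag {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical guard: only reach for cross-model collections (such
	// as "remoteApplications") when the flag is set, so nothing leaks
	// into ordinary bootstrap/deploy logging.
	if !enabled("cross-model") { // assumed flag name
		fmt.Println("cross-model disabled; skipping remoteApplications")
		return
	}
	fmt.Println("cross-model enabled")
}
```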
[02:43] perrito666: pong
[03:02] thumper: tech board?
[05:03] jam: still doing standup? do you have anything to discuss?
[05:04] axw: brt
[05:12] axw: when u get a chance, PTAL https://github.com/juju/juju/pull/7017
[05:13] anastasiamac: How is 1662272 considered non-critical, if restarting machine-0 agent, juju-db, and unit agent doesn't fix it?
[05:13] Seems like it's 1587644, only with more severe symptoms.
[05:35] anastasiamac: LGTM
[05:36] axw: \o/
[06:14] axw: anastasiamac: wrt persistent storage, what happens if you destroy the model or even the controller?
[06:14] axw: do we intend to leave *persistent* storage as above Model or Controller scope?
[06:14] axw: or at least have a way to say "this disk outlives us all"
[06:15] axw: (came up in a discussion with anastasiamac about 'destroy-controller' and how it interacts with disks)
[06:15] specifically, if the storage is going to outlive the machines we are killing, we probably need to *try* to do a clean shutdown, so the content on the disks can be in a consistent state.
[06:24] jam: eventually I want to give users a way of disowning storage, but for 2.2 it'll still be owned by the model
[06:25] axw: what are your thoughts about fast-pathing tear-down? I believe right now we tell everything "I want you to die", and then wait for everything to tear themselves down and fire the Dead hooks
[06:26] jam: indeed that is what we do. what's the problem?
[06:27] axw: it takes 10 minutes to "juju destroy-controller" when you're throwing everything away
[06:27] * axw nods
[06:27] axw: what value is there in triggering "relation-departed" on a machine that is being terminated?
[06:28] jam: so I think we want to at least wait for the units to be Dead, because they could interact with external things
[06:28] axw: it's different if you're just killing 1 machine
[06:28] but you also know that you're killing all of its peers
[06:28] jam: we also want to clean up manual machines, because their lifetime is not under our control
[06:29] jam: we could probably fast-path cloud machines, though there is some interaction with storage at least (destroying an instance with cloud storage attached can have negative consequences)
[06:30] IIRC on AWS, destroying an instance with an EBS volume attached can make AWS sad
[06:31] axw: how so? I thought terminating a machine is natural
[06:32] jam: I just recall instances getting wedged in a state where they wouldn't cleanly terminate, and having to force-terminate them. pretty sure it's documented too
[06:34] axw: so you have to do "umount" inside the instance before terminate?
[06:35] jam: indeed. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html
[06:35] "You can detach an Amazon EBS volume from an instance explicitly or by terminating the instance. However, if the instance is running, you must first unmount the volume from the instance."
=== frankban|afk is now known as frankban
[08:57] axw: perrito666: https://github.com/juju/testing/pull/121 is a small tweak to the testing/mgo startup code. My Mac laptop only has mongo 3.4, and that seems to have changed the content of the "waiting for connections" line.
[08:58] the tweak is to be slightly less strict, which I'm hoping is still sane. It seems odd that we would have a different 'waiting for connections' that we *shouldn't* treat as mongo being ready
[08:59] jam: LGTM
[09:06] jam: I suspect it's to do with how the logging works, it's probably not skipping a frame like on other platforms. not entirely sure tho
[09:07] looking at blame in mongo, seems the code that logs that hasn't changed in a way that would explain the difference
[09:07] anyway, it's a safe change
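A rough sketch of the shape of the juju/testing tweak discussed above — matching on the "waiting for connections" substring instead of the full mongo 3.2-era line — assuming a reader over mongod's output. This is illustrative only, not the actual diff in PR 121; the function name is made up.

```go
package mongotest

import (
	"bufio"
	"errors"
	"io"
	"strings"
)

// waitForMongo scans mongod's log output and returns once the server
// reports it is ready. Matching on the substring alone tolerates the
// line-prefix differences between mongo 3.2 and mongo 3.4.
func waitForMongo(out io.Reader) error {
	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "waiting for connections") {
			return nil // mongod is accepting connections
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return errors.New("mongod exited before accepting connections")
}
```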
[09:20] axw: I believe I responded to all of your review comments. Can you look at https://github.com/juju/juju/pull/6988 again?
[09:20] jam: sure
[09:49] It is too early to exist
[09:50] Jam you have too much faith in me if you expected me to be up and reviewing code at 5:57 am
[09:52] Bug #1662272 opened: Agents stuck with "dependency not available"
[10:19] jam: ping me when you are around please
[10:19] perrito666: good morning. It was more of a "I don't know who might end up being around, so I'll leave a note for someone to handle in the future"
[10:19] I'm around, just got done with the cloud town hall call
[10:21] i'll wait for the recording
[10:21] * perrito666 started the day with an interesting headache
[10:22] jam: I thought about the same thing your mail mentions (re noproxy) last night, but the code is confusing; when tracing where the NO_PROXY contents come from, they seem to be ultimately lifted from the env, which makes no sense
[10:23] perrito666: there's another one in a couple of hours that is for the US-based timezones
[10:24] perrito666: so there are a couple of things about NO_PROXY, one is that the "proxyupdater" object actually sets the ENV variables
[10:24] so that things we spawn have them set
[10:25] and we've done some semi-bad things during 'init' times where we potentially race with the values that we're going to be setting in the future.
[10:25] however, the concrete "what is in no-proxy" has an answer that I posted in the bug
[10:25] where we had a bug about "API server addresses should be in no_proxy"
[10:25] which led us to just iterating over APIHostPorts (which can include 127.0.0.1 I'm *pretty* sure)
[10:26] we can just always add "localhost" to it, and we can consider if we want to add the target machine's known IP addresses as well.
[10:26] it does include both the localhost address and a third IP which I think is the state server
[10:30] duh, I was right next to that code looking for a cleaner implementation of no proxy :p
[10:31] perrito666: we have a loop over APIHostPorts, and it happens that all controller machines have a 127.0.0.1 address
[10:31] so it isn't really by *intent* that we add 127.0.0.1, it's more by accident because we are adding "all known addresses for Controllers"
[10:32] yes, just landed there, I passed right through it expecting to find a place where we set default values for no proxy instead
[10:33] we only set those values if you've set anything in no_proxy
[10:33] arguably we should be setting those values either
[10:33] a) always
[10:33] b) when any of the *_proxy values are set
[10:33] otherwise you might set http_proxy, but never set no_proxy, and then we're back to leaving the Controllers as being accessed via a proxy
[10:34] to be fair, no_proxy as an env variable seems like a poorly thought out hack that has to be interpreted by every application we interact with
[10:34] all of which might have small variations on how they use it
[10:34] and it seems *very* much focused on Domain names, and *not* on IPs
[10:34] and we're pretty heavily abusing it with IPs
[10:34] jam: I think no_proxy should be set as a default for config values
[10:35] well, I believe proxying is rather a big hack
[10:36] perrito666: we could. It's certainly arguable that what we're doing now is guessing that local traffic shouldn't be proxied, and thus we force our own addresses to not be proxied
[10:36] some would say we should flag all IPs for all hosts in the model to not be proxied
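To make the APIHostPorts discussion concrete: a small sketch of building a no_proxy value from the controller's API addresses, deduplicating and including the loopback names by intent rather than by accident. The function name and the always-add-localhost behaviour are assumptions for illustration, not juju's current code.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
)

// noProxy derives a no_proxy value from the controller API host:port
// addresses, deduplicating and always including the loopback names so
// controller traffic is never sent through the proxy.
func noProxy(apiHostPorts []string) string {
	seen := map[string]bool{}
	for _, hp := range apiHostPorts {
		host, _, err := net.SplitHostPort(hp)
		if err != nil {
			host = hp // tolerate bare addresses with no port
		}
		seen[host] = true
	}
	// Exclude local traffic explicitly instead of relying on
	// APIHostPorts happening to contain 127.0.0.1.
	for _, local := range []string{"localhost", "127.0.0.1", "::1"} {
		seen[local] = true
	}
	hosts := make([]string, 0, len(seen))
	for h := range seen {
		hosts = append(hosts, h)
	}
	sort.Strings(hosts) // deterministic output
	return strings.Join(hosts, ",")
}

func main() {
	fmt.Println(noProxy([]string{"10.0.8.1:17070", "127.0.0.1:17070"}))
	// Output: 10.0.8.1,127.0.0.1,::1,localhost
}
```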
[10:36] jam: is there a limit to the size of no_proxy?
[10:37] perrito666: I've heard of people doing 'export no_proxy=10.0.0.{1..255}', which means it expands to 255 values * 10 chars or so; some have asked to do 'export no_proxy=10.0.{1..255}.{1..255}'
[10:37] but I *think* the 65535 version wasn't working
[10:38] we could certainly *inside juju* support
[10:38] 10.0.0.0/8 sort of syntax
[10:38] the problem there is that when we talk to 'wget' or 'curl' they don't do anything with it
[10:38] (I should test again)
[10:38] meh, what will they read?
[10:39] perrito666: let me confirm, but I'm pretty sure they just ignore that one
[10:39] perrito666: the default golang one says "either it's a domain suffix, or it's an exact IP match"
[10:42] perrito666: curl, at least, doesn't listen to "export no_proxy=192.168.0." or "192.168.0.*" or "192.168.0.0/24"
[10:42] I don't know if there *is* an IP based rule that it would respect
[10:42] it does respect DNS suffixes
[10:44] I am a bit worried that we are relying on something half the popular software ignores
[10:57] perrito666: well, we aren't; we only support explicit IP addresses, which appear to be supported everywhere
[10:58] perrito666: also, anastasia helpfully reminded me about https://bugs.launchpad.net/juju/+bug/1488139 https://bugs.launchpad.net/juju/+bug/1615719 https://bugs.launchpad.net/juju/+bug/1421650
[10:58] Bug #1488139: juju should add nodes IPs to no-proxy list
[10:58] Bug #1615719: [juju-2.0-beta15] during the bootstrap stage the no-proxy config is ignored
[10:58] Bug #1421650: allow cidr notation for no-proxy
[10:58] which are things we should be aware of, but we don't have to solve everything in one pass
[11:01] on #1488139 I mentioned I'm worried about the fact that no_proxy then becomes an O(N^2) bug for everyone
[11:01] Bug #1488139: juju should add nodes IPs to no-proxy list
[11:02] I moved 1421650 to Won't Fix because I don't think we can deviate from interpreting it like the 'standard' that other tools do
[11:02] and I'm tempted to mark 1615719 as Won't Fix because I think he's just using it wrong, based on how people assume 'no_proxy' works rather than how it actually works
[11:03] maybe we should validate that the value of no_proxy contains only domain suffixes and concrete IP addresses, and fail if you pass a wildcard or CIDR notation?
[11:03] that would probably be more helpful
[11:03] it might make people unhappy with *us* but at least we're preventing them from setting something that won't actually work
=== frankban is now known as frankban|afk
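A minimal sketch of the validation jam floats at 11:03, assuming the semantics described above for Go's default handling (an entry is an exact IP or a domain suffix) and rejecting wildcards and CIDR notation up front. The function is hypothetical, not an existing juju API.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateNoProxy rejects no_proxy entries that common tools will not
// actually honour: wildcards and CIDR notation. Everything else is
// treated as either an exact IP or a domain suffix.
func validateNoProxy(value string) error {
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if strings.ContainsAny(entry, "*/") {
			return fmt.Errorf("no_proxy entry %q: wildcards and CIDR notation are not supported", entry)
		}
		if net.ParseIP(entry) != nil {
			continue // exact IP match, understood everywhere
		}
		// Anything else is taken as a domain suffix, e.g. ".example.com".
	}
	return nil
}

func main() {
	fmt.Println(validateNoProxy("localhost,127.0.0.1,.internal")) // <nil>
	fmt.Println(validateNoProxy("10.0.0.0/8"))                    // error
}
```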
[13:43] perrito666: btw, if our fix is changing the model-default value, I think that's a 2.2 vs a 2.1
[13:45] jam: I was going to target 2.2 actually :)
[13:46] is a migration required for that? it isn't, right?
[13:51] perrito666: migration? probably not, upgrade-step? maybe
[13:51] actually likely, since at the least we would have been putting 127.0.0.1 into the field and wouldn't anymore.
[13:53] jam: that is what I meant, my django just crept into me :p
[13:53] the current default is ""
=== frankban|afk is now known as frankban
[16:35] morning juju-dev
[16:36] me reboots for new kernel
=== redir_holiday is now known as redir
[17:28] evening perrito666, I hope everything is going well
[17:28] hi redir
[17:28] jam: yes, listening to CTH
[17:29] jam: and also running tests for no_proxy
[17:29] howdy jam
=== externalreality_ is now known as externalreality
=== frankban is now known as frankban|afk
[19:12] phew, email mostly filtered
[19:13] * redir puts on OCR hat
[22:55] wallyworld: hey, I'm updating the validation in the model description to know about remote applications.
[22:56] wallyworld: at the moment it checks that there is a correspondence between the application units and the endpoint units.
[22:57] wallyworld: There isn't really the same thing with the remote applications, is there? Are there unit settings for the remote units in the local db?
[22:59] wallyworld: I think I've forgotten most of the Barcelona braindump
[23:05] babbageclunk: sorry, was in release call, just finished, did you want to chat now?
[23:06] wallyworld: yes please!
[23:07] wallyworld: standup HO?
[23:07] sure
[23:18] anastasiamac: 4:30pm is likely to be school pickup so i'll be a few minutes late to the meeting
[23:18] wallyworld: would 5pm b better?
[23:19] wallyworld: i know there is soccer...
[23:19] yeah, i can squeeze in 5
[23:22] i'd rather u did not squeeze ;) but i'll move the meeting \o/
[23:59] Oi, IDM rears its ugly head again