[00:14] * redir wonders where — and – fall in that spectrum...
[00:14] they look so similar
[00:16] and how about ‗ ?
[00:21] babbageclunk, dude! it is time for sleeping
[00:21] * alexisb goes to pick up kiddo talk to you all later
=== alexisb is now known as alexisb-afk
[00:22] alexisb: you are right! I'm all done, except can't do a final test because of some weird dns issue.
[00:22] * babbageclunk goes to bed!
[01:37] axw: i'd like to get this landed for beta if possible, it's a small tweak to list controllers http://reviews.vapour.ws/r/5635/
[01:45] looking
[01:57] wallyworld: reviewed
[01:57] ta
[02:02] axw: 1:1?
=== natefinch-afk is now known as natefinch
[02:11] I hate it when I fix a simple bug and it reveals a huge gnarly bug behind that one.
[02:14] natefinch: only one gnarly behind? consider it ur lucky day :D
[02:15] well.. it's one gnarly bug because of two pieces of gnarly code working at cross purposes... both assuming too much about what the other one will do or not do
[02:17] axw: got a minute?
[02:17] axw: I'm going to ask you about code you wrote three years ago ;)
[02:18] start dusting off the tape drive :)
[02:49] natefinch: sorry, missed your message
[02:49] natefinch: what's up?
[02:50] axw: heh, no problem. I've actually engineered a fix around it, but we can talk a little...
[02:51] axw: juju help-tool
[02:51] axw: https://github.com/juju/juju/blame/master/cmd/juju/commands/helptool.go#L123
[02:51] axw: we make a dummy hook context and then pass that into all the functions and hope they don't look too closely at it :p
[02:52] let me guess, something is looking at it? :)
[02:52] axw: yes :)
[02:53] axw: but they don't *really* need to, at least not so early, and since all this code does is call info... I just deferred the code that looks at the value until we actually run the command.
[02:54] natefinch: ok. perhaps the interface for those commands should be changed, so that Run is the thing that takes the hook context?
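Editor's sketch of the fix described at 02:53: the constructor only stores the hook context, Info never touches it, and the context is first consulted in Run. Every name here (HookContext, dummyContext, Command, NewCommand) is a hypothetical stand-in for illustration, not the actual juju types or the code in the linked review.

```go
package main

import (
	"errors"
	"fmt"
)

// HookContext is a hypothetical stand-in for a hook context (assumption).
type HookContext interface {
	UnitName() (string, error)
}

// dummyContext mimics a placeholder context that carries no real data,
// like the one help-tool is described as passing in.
type dummyContext struct{}

func (dummyContext) UnitName() (string, error) {
	return "", errors.New("no unit available in dummy context")
}

// Command is a hypothetical hook-tool command.
type Command struct {
	ctx HookContext
}

// NewCommand stores the context without inspecting it, so constructing
// a command with a dummy context cannot fail.
func NewCommand(ctx HookContext) *Command {
	return &Command{ctx: ctx}
}

// Info returns static help text and never reads the context.
func (c *Command) Info() string {
	return "example-get: print the unit name"
}

// Run is the first point at which the context is consulted.
func (c *Command) Run() (string, error) {
	name, err := c.ctx.UnitName()
	if err != nil {
		return "", fmt.Errorf("cannot run: %w", err)
	}
	return name, nil
}

func main() {
	cmd := NewCommand(dummyContext{})
	fmt.Println(cmd.Info()) // safe even with the dummy context
	if _, err := cmd.Run(); err != nil {
		fmt.Println("error:", err) // the failure surfaces only when running
	}
}
```

The broader change suggested at 02:54 would go one step further and pass the context to Run as a parameter, so a command could not hold one at construction time at all.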
[02:55] axw: that's a good general solution. I don't think anything should really need to use the hook context on command creation, and "context" is really more of a "while running" concept anyway (which is why we pass in the command context there)
[03:02] axw: I made the minor change to get this one to work. I think in the future it would be good to make the fix you suggested, but that seems like a bigger change than we really have time for right now.
[03:02] axw: http://reviews.vapour.ws/r/5636/diff/#
[03:10] or if anyone else wants an easy review ^ (+26 -12)
[03:12] natefinch: reviewed, please add QA steps tho
[03:12] (code LGTM)
[03:14] oops, right, updated
[03:15] natefinch: do you know if we have a CI test that uses resource-get?
[03:15] axw: not sure, I can check
[03:25] axw: yeah, there is
[03:25] natefinch: cool, thanks
[03:49] ffs, landing bot is down :-(
[04:19] wallyworld: can you please check if you can see the data I just shared on drive?
[04:20] axw: yep, am looking at it now. i think it would be good to extract the 3 interesting numbers for each run and tabulate
[04:21] wallyworld: indeed, I need to prepare something for my talk later tho :/ I've responded with rough figures that are most interesting, will see what else I can do
[04:21] axw: i can pull out some numbers if i get time a bit later
[04:22] axw: ah i see the next email; i think that's ok for now, it gives the key data point
[04:23] axw: only other data point would be what it was before your changes
[04:24] wallyworld: yeah, but I figured it's not that useful to know how bad the really bad implementation is :p
[04:24] I'd prefer to focus on how to get it better
[04:25] fair enough
[04:39] axw or wallyworld: could I get a review of this please? http://reviews.vapour.ws/r/5629/
[04:39] it's fully QAed now
[04:39] ok
[04:39] menn0: except it appears landing bot is down :-(
[04:40] wallyworld: ah... oh well
[04:40] wallyworld: the review would be good to get done anyway
[04:40] sigh
[04:40] yep
[04:40] * menn0 looks at the bot
[04:40] menn0: oh, it's back
[04:41] wallyworld: ok cool
[04:41] got a server error before
[04:48] wallyworld: TestCertificateUpdateWorkerUpdatesCertificate is failing really often
[04:48] awesome
[04:48] it has been ok for a while, was failing a bit a while back
[04:48] menn0: how is it failing?
[04:49] natefinch: http://juju-ci.vapour.ws/job/github-merge-juju/9153/artifact/artifacts/windows-out.log
[04:49] natefinch: the test thinks the certificate isn't being updated
[04:49] natefinch: it's happened to me twice in a row for a PR that changes a very unrelated shell script
[04:49] it should even be running on windows
[04:49] shouldn't
[04:50] it failed for me yesterday a few times too
[04:50] anything controller related does not need to run on windows
[04:50] wallyworld: agreed, but this doesn't seem like a windows specific problem
[04:50] i.e. maybe we're just lucky that it's not failing more often on linux
[04:51] [LOG] 0:05.803 DEBUG juju.worker.certupdater addresses haven't really changed since last updated cert
[04:52] I was looking into this cert updater stuff a lot for a manual provider bug I was looking into, so I'm fairly familiar with the code. There's a watcher here that's getting fired, but we think nothing has changed. If nothing has changed, why is the watcher firing?
[04:57] natefinch: could be an update to the doc which just set the same values? that would still cause the watcher to fire.
[04:57] yeah, true. Still, seems suspicious that we're getting that "nothing changed" firing and then the test failed because nothing changed.
[04:59] natefinch: true
[05:00] natefinch: I just dug through the code. the watcher is tracking the machine document so any change to the machine will fire it. not all those changes will be address updates.
[05:00] oh right, that makes sense
[05:00] natefinch: so that message could be completely normal
[05:01] or it's a clue :)
[05:02] is the landing bot working, or not?
[05:09] menn0: lgtm
[05:09] natefinch: it is again now
[05:18] wallyworld: thanks
[05:31] menn0: i think i found the pinger issue. haven't tested yet, just looking at code
[05:31] but i don't fully understand it either
[05:33] wallyworld: for the cert updater?
[05:34] menn0: in apiserver/admin.go there's a call to startPingerIfAgent()
[05:34] but it is inside an if{}
[05:34] and i think nothing ever calls it
[05:34] that's the only place i can see where the pinger would be started
[05:35] i'll add some logging and test a bit
[05:42] wallyworld: I believe the if is there to avoid unnecessarily starting pingers for controller machine agents which log in on behalf of a model.
[05:43] the thinking is there's no need to run the pinger if the client is on the same machine
[05:43] ok. i'll confirm if we are starting the pinger or not and go from there
[05:43] sounds good
[07:16] fwereade: you around?
[07:28] wallyworld, yeah, sorry
[08:34] Bug #1493058 changed: ensure-availability fails on GCE
[08:46] wallyworld: u've kindly filed this bug... but I *think* this has now been fixed as of last week. right? :D
[08:46] https://bugs.launchpad.net/juju/+bug/1559701
[08:46] Bug #1559701: kill-controller manual provider broken <2.0-count>
[08:46] axw: do u know? ^^
[08:47] anastasiamac_: that's fixed. destroy-controller auto destroys manual machines now
[08:48] axw: \o/
[09:11] mgz: hey there - about?
[09:12] babbageclunk: ping
[09:13] voidspace: pong
[09:52] dimitern: hey
[09:54] mgz: I'm looking into bug 1621538
[09:54] Bug #1621538: container networking: cannot juju ssh to container
[09:55] mgz: it seems the error is different
[09:56] mgz: not about parent br-eth0 not having an address but something even weirder - node with that [device] system_id already exists
[09:59] dimitern: there are only two changes in the regression window
[09:59] pr #6156 and pr #6158
[10:00] mgz: still digging in both maas and juju sources to see how that duplicate error can even happen - we're namespacing the hostnames(instance ids) we generate for containers and their backing maas devices
[10:33] dimitern: dhcp won't start on my maas :( complaining about a missing conf file
[10:34] fwereade: ping
[10:34] voidspace, pong
[10:34] babbageclunk: is it inside LXD ?
[10:34] fwereade: so I'm trying to build an operation that "marks the action as failed and restores local state to a sensible state"
[10:34] dimitern: I think it's because the drive filled up - I made the fs bigger. Any way I can get it to rewrite the config file?
[10:34] dimitern: no, kvm
[10:34] voidspace, ah yes
[10:35] fwereade: afaics marking an action as failed means calling "state.State.ActionFinished"
[10:35] babbageclunk: I'd try dpkg-reconfigure maas-dhcp first
[10:36] fwereade: and nothing in the uniter/actions/resolver has access to that
[10:36] fwereade: hmm... op_callback has some code that does that
[10:36] voidspace, I thought I saw a method on callbacks that did that
[10:36] fwereade: right
[10:36] voidspace, so you should be able to supply that capability in the op factory
[10:36] dimitern: just tried that, no dice.
[10:37] dimitern: I guess it's reinstall time!
[10:37] fwereade: right, sounds good to me :-) thanks
[10:38] babbageclunk: what's the error you're getting?
[10:38] voidspace, and the "sensible state" would be "that achieved by running the same code as in RunAction.Commit", I think
[10:38] dimitern: http://paste.ubuntu.com/23154038/
[10:38] fwereade: yeah, that I'm fine with (well - until I actually get to doing it...)
[10:38] dimitern: And it's right - that file's not there.
[10:39] voidspace, cool :)
[10:39] babbageclunk: ok, before reinstalling, try also dpkg-reconfigure maas-rack-controller
[10:40] dimitern: ooh yeah, that looks better!
[10:42] dimitern: didn't seem to help - I'll try bouncing the machine just in case
[10:43] babbageclunk: yeah - and also check the permissions on /var/lib/maas/*
[10:43] dimitern: Hold the phone, actually I think that did it! Yay thanks!
[10:43] babbageclunk: \o/ :)
[11:04] Bug # changed: 1543660, 1546805, 1587644, 1592887, 1621658
[11:10] Bug # opened: 1543660, 1546805, 1587644, 1592887, 1621658
[11:16] Bug # changed: 1543660, 1546805, 1547806, 1587644, 1592887, 1621658
[11:28] Bug #1547806 opened: open-port does not work on EC2
[11:28] dimitern: morning, how goes the battle with the functional container network tests?
[11:29] rick_h_: morning, still trying to repro the new error
[11:30] rick_h_: not related to missing addresses on host bridges
[11:30] dimitern: rgr
[11:30] rick_h_: it seems maas 2.0 specific so far
[11:30] dimitern: they just upgraded to maas 2.0 final in CI yesterday before this run
[11:31] rick_h_: I thought at first that job runs on maas 1.9 and wasted some time trying to find the error in lp:maas/1.9
[11:32] rick_h_: good that I checked the job's jenkins config
[11:32] dimitern: it came up yesterday that it was on 2.0rc3 I think and was upgraded to 2.0 final
[11:33] rick_h_: ok, that's useful - I'll concentrate on changes since 2.0rc3
[11:34] Bug #1547806 changed: open-port does not work on EC2
[11:42] * rick_h_ goes to get the boy to school and make a coffee run, biab
[11:45] babbageclunk, http://reviews.vapour.ws/r/5637/ is up if you have a moment; ask many questions if required :)
[12:28] dimitern: ping
[12:28] frobware: pong
[12:28] dimitern: any joy with the container failure?
[12:29] frobware: so far it seems something broke after the maas 2.0rc3 on finfolk got upgraded to 2.0 final
[12:30] dimitern: the maas update that we did with mgz yesterday?
[12:30] dimitern: (I thought that 1.9)
[12:30] frobware: I thought so as well, but the job runs on 2.0
[12:30] that was a different maas
[12:31] and this failure has been happening since the landings on the 7th
[12:32] mgz: the first time it happened was after frobware's PR got reverted, so it can reach a bit further preparing container NICs
[12:34] * frobware jumps on a train; back in 90 mins... or so.
[12:37] fwereade: was lunching - looking now
[12:42] babbageclunk, cheers
[12:48] rick_h_: got a suggestion for another bug to pick up?
[12:49] natefinch: for that one I pushed back on not doing anything on that
[12:49] natefinch: did you get the reply?
[12:49] rick_h_: missed that, will go look
[12:49] ty
[12:53] gah, I need multiple return values on my executables :/
[12:53] GAH, can someone please tell me what I should avoid so that the bot will post to RB?
[12:53] heh
[12:53] double gah!
[12:53] o/
[12:53] babbageclunk: not sure what's wrong with the rb bot
[12:55] Is it working for others at the moment?
[12:56] babbageclunk: last night it made the review for me, but didn't update my PR description with the link
[12:57] I think it's something to do with things I put in the description - it's looking for (Review request: http://blah) so it knows whether to update or create a new one. But it gets confused.
[12:57] natefinch: what about going with resource-get-fingerprint
[13:00] rick_h_: well, seems like the information we really want is "is the info different on the server" Do we care what the actual fingerprint is? I don't think that tells us anything by itself
[13:00] dimitern, voidspace, frobware: can I get a review of http://reviews.vapour.ws/r/5638/ please?
[13:01] babbageclunk: looking
[13:01] dimitern: thx!
[13:01] natefinch: I just sent an email off to chuck to clarify and make sure I'm understanding things correctly
[13:02] rick_h_: cool
[13:02] natefinch: while that goes on how about looking at https://bugs.launchpad.net/juju/+bug/1620056 please
[13:02] Bug #1620056: constraints should support cores=X
[13:03] natefinch: need to create a card on the board for it
[13:03] rick_h_: ok, will do
[13:03] heh, I was like "don't we already support that?" but it's a name change. I get it.
[13:03] natefinch: ty
[13:03] natefinch: right, little tweaking
[13:25] rick_h_: the main problem is that the best interface for resource get would be to return two values: the path of the file on disk, and a boolean indicating whether or not it has been updated. But that's kind of hard to do on the command line.
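Editor's sketch of one way a command-line tool can hand back the two values mentioned at 13:25 (a path plus an "updated" boolean): emit a single structured record on stdout. The resourceResult type, its field names, and the example path are assumptions made for illustration; nothing here reflects an actual resource-get flag or output format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceResult is a hypothetical shape for structured command output:
// where the resource lives on disk, and whether it changed since last fetch.
type resourceResult struct {
	Path    string `json:"path"`
	Updated bool   `json:"updated"`
}

// render encodes both values as a single JSON object that a hook script
// could parse, rather than overloading the exit code.
func render(path string, updated bool) (string, error) {
	out, err := json.Marshal(resourceResult{Path: path, Updated: updated})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// The path below is a made-up example value.
	s, err := render("/var/lib/example/resource.tgz", true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Keeping the exit code at 0 for success and carrying the boolean in the payload avoids the concern about scripts treating any non-zero status as a failure.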
[13:26] natefinch: right, so that leads to two commands
[13:27] rick_h_: but if you're running two commands already, you might as well just use md5sum
[13:28] natefinch: but thing there is that folks can/will use different tools/etc. Making things like building a charm with layers and such more complicated. What mechanism did this guy use to hash/etc
[13:28] natefinch: if it's a common use case thing we can make an opinionated standard that everyone just follows
[13:28] natefinch: honestly, I want something like a return code from resource-get that says "did the file on disk change or not"
[13:29] but I have a feeling anything non-0 might be more problematic for folks to rely on and build on
[13:29] fwereade: LGTM
[13:30] rick_h_: that's what I was thinking too.... if it's behind a new flag, it's probably not a big deal, since we wouldn't be breaking backwards compatibility. like resource-get --check-new
[13:30] rick_h_: probably most people would just use whatever function we put in charm helpers
[13:35] rick_h_: replied to your email. The other option is to have a --yaml flag (or --json) that outputs structured data, so we can return the path and a boolean
[14:01] voidspace: fwereade ping for standup
[14:01] dimitern: ping for standup
[14:01] omw
[14:01] hmm - anyone have advice about how to get into a 2nd gen x1 carbon to replace the ssd?
[14:02] rick_h_: omw
[14:02] wat
[14:02] why you drop me google
[14:23] babbageclunk, ping
[14:25] alexisb: pong
[14:25] heya babbageclunk, sorry moving slowish this morning
[14:26] did you want to touch base this morning?
[14:26] I forgot I was sitting in the hangout - camera's been on while I've been replacing my ssd.
[14:26] lol
[14:26] ok I will pop on
[14:28] * dimitern is back
[14:58] fwereade: was it you I'm due to be reviewing something for?
[14:59] voidspace, http://reviews.vapour.ws/r/5637/
[14:59] voidspace, ta
[15:09] voidspace: If you're looking for a pretty relaxing Friday afternoon-style review you can look at this one: http://reviews.vapour.ws/r/5638/
[15:09] babbageclunk: hah, because I have nothing else to do :-p
[15:09] fwereade: looking
[15:13] Ok, so do I want to do full-disk encryption? Will I still be able to install Windows for indie gaming fun?
[15:14] Duh, no, that's what full-disk means.
[15:14] babbageclunk: in my opinion, full disk encryption will just make your life harder. In theory, if Something Bad™ happens, you'll have protection against the government grabbing your laptop and using its contents against you. However, it doesn't prevent said government from hitting you with a pipe until you give them your password.
[15:19] babbageclunk: LGTM, sorry it took so long
[15:19] natefinch: true.
[15:20] dimitern: No worries - sounds like there was some other stuff going on!
[15:20] dimitern: Thanks!
[15:21] babbageclunk: yeah, :/ a friday MAAS mystery (or likely even misery)
[15:21] mgz: how goes?
[15:21] dimitern: stink
[15:24] dimitern: doing some futzing still, will yell in a sex
[15:24] -x+c
[15:25] wow, that really changes the meaning of your sentence
[15:25] ;)
[15:25] ;_;
[15:25] mgz: ok
[15:28] * rick_h_ goes for long lunch with family today, biab
[15:29] dimitern: running now
[15:29] mgz: I'm tailing the maas logs in the meantime
[15:34] dimitern: state server coming up on 10.0.30.13
[15:34] you should be able to ssh in there with the ci staging key shortly
[15:34] mgz: yeah, trying now
[15:46] dimitern: okay, we're at the fail point
[15:47] mgz: no errors so far in the controller log
[15:56] fwereade: LGTM
[15:56] mgz: ok, got the error - looking earlier in the logs
[15:56] voidspace, ta
[16:10] mgz: /var/log/postgresql/postgresql-9.3-main.log on maas has some very interesting errors
[16:10] frobware: ^^
[16:13] morning juju
[16:14] dimitern: categorically works for me on 1.9
[16:14] frobware: it works for me on both 1.9 and 2.0
[16:22] dimitern: hmm; I have a few teething problems on my maas 2.0 setup atm
[16:26] frobware, mgz: I think I nailed it
[16:26] the db got corrupted
[16:27] when the upgrade was done yesterday
[16:27] dimitern: what's up with having both 9.5 logs (last from -09-08) and 9.3 logs (current)?
[16:27] instead of upgrading from 2.0.0rc3 to 2.0.0, it was done first to 2.1.0alpha3, then downgraded to 2.0.0 final
[16:28] dimitern: aha
[16:28] unfortunately the alpha3 package ran some db migrations, which dropped maasserver_node_system_id_seq
[16:28] the migrations *should* be two ways
[16:28] and that's causing the device creation to fail
[16:29] I bet I can repro it manually even with the maas cli only
[16:29] pgsql 9.3 is the one maas uses, the newer was installed with the intent of upgrading the maas one, but that didn't happen
[16:31] all those hints can be observed in /var/log - ./apt/term|history.log, ./postgresql/.. but unfortunately not in any of the maas logs
[16:34] frobware, mgz: and indeed I did - http://paste.ubuntu.com/23155215/
[16:36] that maasserver_node_system_id_seq got dropped months ago after 2.0 switched to shorter node names (check the MP - the only google result for the seq name)
[16:37] I guess nobody seriously thought about the existing maas 1.9 node ids will be a problem after upgrading to 2.0 and then to 2.1
[16:42] dimitern, mgz: so I just don't run into this because I don't upgrade... :(
[16:42] dimitern, mgz: bitter experience from Windows 95.
[16:45] frobware: well isn't it better not to run into these issues? :)
[16:46] dimitern: yes, but customers (you would hope) are using our older versions...
[16:46] mgz: I'm trying to see if it's salvageable without reinstall
[16:46] by reverting the 2.1.0 migrations
[16:46] dimitern: thanks...
[16:47] dimitern, mgz: funnily enough I am having psql problems installing a fresh version of maas 2.0
[16:49] dimitern: that's my third install which has failed.
[17:29] frobware: dimitern any progress on identifying the issue?
[17:30] rick_h_: see issues above about psql failures in migrations
[17:30] rick_h_: yeah
[17:30] frobware: reading backlog, so there's issues with upgrading and how the ids change over time?
[17:30] frobware: dimitern k, sounds like we need to file a bug against MAAS?
[17:31] frobware: dimitern and then to update our install to be a fresh install of MAAS in CI and we should pass tests cleanly?
[17:31] mgz: ^ ?
[17:31] rick_h_: botched maas upgrade 2.0r3->2.1.0a1->2.0.0 corrupted the dn
[17:31] db
[17:31] dimitern: oh, it was a botched upgrade?
[17:31] dimitern: ok
[17:31] some db migrations ran, but not all are reversible
[17:31] sinzui: mgz so are we able to clean the MAAS and rerun?
[17:31] rick_h_: agreed to the clean install; leave the other as-is for continued investigation
[17:31] mgz: ^^
[17:32] rick_h_: I was trying to manually revert those 2.1.0 migrations, but not all can be reverted
[17:32] dimitern: ok, let's not bother with that imo
[17:32] ok
[17:32] dimitern: rick_h_ ouch, and the error happened before the botched upgrade
[17:32] sinzui: right, but it was a different error
[17:32] sinzui: that was backed out with the revert, but then we kept failing with a new error due to the botched upgrade
[17:32] sinzui: yeah, a bit more complicated than usual
[17:33] sinzui: mgz dimitern frobware hangout to set the path forward?
[17:33] rick_h_: dimitern: I don't know what clean up means in this case. A clean install is something I have never done. I won't promise it for today
[17:33] rick_h_: sure - standup HO?
[17:33] dimitern: rgr
[17:33] I need to go soon
[17:33] I think we need a new maas vm
[17:33] sinzui: I can help with a clean install
[17:33] or some more serious help from a maas expert
[17:33] mgz: agreed
[17:33] alexisb: BrettD mgz sinzui welcome to join https://hangouts.google.com/hangouts/_/canonical.com/core?authuser=1
[17:34] rick_h_, not atm
[17:34] alexisb: k, will fill you in afterwards then
[17:46] mgz hop back up
[17:46] back on
[17:48] mgz: really, hop back on to https://hangouts.google.com/hangouts/_/canonical.com/core?authuser=1
[17:48] sinzui: omw
[18:01] natefinch: what version of juju did you disable ciphers
[18:01] like the shitty insecure ciphers
[18:01] marcoceppi: https://bugs.launchpad.net/juju/+bug/1604474 b16
[18:01] Bug #1604474: Juju 2.0-beta12 userdata execution fails on Windows
[18:02] marcoceppi: uh
[18:03] rick_h_: nice, thanks
[18:03] yes that
[18:15] rick_h_: mgz: functional-container-networking does pass on maas 1.9. maas 2.0 is being rebuilt now
[18:16] sinzui: ty
[18:16] sinzui: feel a bit better about things now
[18:22] method returns value, error. The only place we call the method... we ignore the error. FanTASTIC.
[20:27] silly question, i've got an application deployed, its a subordinate, and its active in the "unit view", but its not listed as active in the "app view" - is this known and expected? http://imgur.com/AGp83ik
[20:28] lazyPower: dunno... there's been a lot of churn in that area. Not sure who last updated that code.
[20:29] ok, i figured it was intentional as subordinates dont technically occupy a unit, they co-locate.. but it was a bit startling of a realization
[20:29] thanks natefinch
[20:29] lazyPower: I know we intentionally hide them in some parts... not sure if this was an intentional part or not :)
[22:01] ready for a review if anyone is still around: http://reviews.vapour.ws/r/5640/
[22:02] bbiab