blahdeblah | axw: Welcome back! How do you feel about advising on how to handle some juju2 controller unresponsive type of bugs? :-) | 00:31 |
---|---|---|
axw | blahdeblah: thanks. absolutely miserable ;p just kidding, what can I help with? | 00:31 |
blahdeblah | So, I've got a juju2 controller not responding, meaning juju status doesn't work on either the controller model, or the single model it's supporting. | 00:33 |
axw | blahdeblah: ok, got logs for the controller? | 00:34 |
blahdeblah | Yep, but there's been nothing in machine-0.log since 2017-01-20 10:24:10 | 00:34 |
blahdeblah | (UTC) | 00:34 |
blahdeblah | Oh, looks like logsink.log is the new hotness | 00:35 |
blahdeblah | Nothing in that for over half an hour, though | 00:36 |
axw | blahdeblah: is CPU pegged? memory? disk? | 00:36 |
blahdeblah | axw: None of the above? :-) | 00:37 |
axw | hmmk | 00:37 |
blahdeblah | axw: Our normal process for this is just to restart jujud-machine-0, and it usually works, but I'm keen to get it narrowed down to which bug this is, and get you folks something useful to make progress on. | 00:37 |
blahdeblah | Oh, just got some log activity | 00:38 |
blahdeblah | Lots of API terminations | 00:38 |
axw | blahdeblah: what ports is jujud listening on? | 00:38 |
blahdeblah | tcp6 0 0 :::17070 :::* LISTEN 53852/jujud | 00:39 |
blahdeblah | after that flurry of log messages, juju status is now responsive | 00:39 |
axw | blahdeblah: huh :/ | 00:40 |
axw | blahdeblah: can you share the logs? maybe there's something buried that'll jump out | 00:40 |
blahdeblah | axw: So it turns out someone else started working on this too, and just restarted the agents | 00:42 |
axw | blahdeblah: doh | 00:42 |
blahdeblah | So I guess this will have to wait until next time | 00:43 |
blahdeblah | It's been happening reasonably regularly, though, so hopefully it will be something we can gather again soon. | 00:44 |
menn0 | axw, wallyworld or thumper: https://github.com/juju/juju/pull/6859 | 01:09 |
wallyworld | says it's merged | 01:10 |
axw | menn0: wrong PR? | 01:10 |
menn0 | axw: right you are: https://github.com/juju/juju/pull/6866 | 01:10 |
axw | ok, looking | 01:11 |
wallyworld | menn0: should we also be changing the cleanup job to now remove dead local charm docs, since we don't need them for revnum checking anymore | 01:15 |
menn0 | wallyworld: yeah, I almost did that but left it out for now to save time | 01:16 |
menn0 | wallyworld: i'll do that tomorrow | 01:16 |
wallyworld | ok | 01:16 |
thumper | axw: do you remember if we create an ubuntu user for centos? | 01:29 |
thumper | axw: and whether centos has sudo? | 01:29 |
axw | thumper: cloud-init creates the ubuntu user | 01:30 |
axw | thumper: and yes it does | 01:30 |
thumper | axw: can lxd provider on ubuntu deploy a centos container? | 01:31 |
axw | menn0: LGTM | 01:31 |
axw | oh, I see wallyworld has some comments tho | 01:31 |
menn0 | axw: thank you | 01:31 |
wallyworld | a few yeah | 01:31 |
axw | thumper: not at the moment. there's probably not *too* much to do to get it to work | 01:32 |
thumper | hmm... ok | 01:33 |
axw | thumper: I did try using manual with the centos LXD image at one point, IIRC it didn't work because they didn't have sshd running by default? | 01:33 |
axw | thumper: or something else was missing, I forget | 01:33 |
* thumper nods | 01:33 | |
menn0 | thumper: lxc image list lcorg: | grep -i centos | 01:37 |
menn0 | thumper, axw: if sshd isn't running you can always lxc exec in and sort that out | 01:37 |
* redir is eod | 01:55 | |
wallyworld | menn0: if there are existing tests for removing dead charm docs and checking new rev numbers, how did they pass without sequence support? | 01:58 |
menn0 | wallyworld: previously charm revs were calculated by keeping all charm docs around, including dead ones | 01:59 |
axw | menn0: yeah, it was just that it wasn't working OOTB. I think there may have been other things too | 01:59 |
wallyworld | also, with the addition of "-1" to the charm urls in MakeCharm(), was that an across the board change or just for those metrics ones? | 01:59 |
axw | too long ago, can't remember now | 01:59 |
menn0 | axw: yeah, the images don't even have sshd installed, they are very bare bones | 02:00 |
menn0 | wallyworld: it should never have been valid to add charms with no revision, but State.AddCharm was allowing it and those (buggy) tests were taking advantage of it | 02:00 |
menn0 | wallyworld: with my changes that's no longer possible, which caused the tests to fail | 02:01 |
wallyworld | so in apiserver/uniter/uniter_test | 02:04 |
wallyworld | s.meteredCharm = s.Factory.MakeCharm(c, &jujuFactory.CharmParams{ | 02:04 |
wallyworld | Name: "metered", | 02:04 |
wallyworld | URL: "cs:quantal/metered", | 02:04 |
wallyworld | isn't that a case where the revision is missing? | 02:04 |
wallyworld | my other comment about testing the url rev number incrementing - unless we remove dead charm docs manually in the unit tests, they will still be around right? which means we are not 100% sure we are testing that the sequence behaviour is hooked in | 02:06 |
menn0 | wallyworld: ah ha, the prep code which requires a revision is specified only applies to local charms and that's a store charm | 02:08 |
menn0 | wallyworld: that's why the test still passed | 02:08 |
wallyworld | doh! missed that | 02:08 |
menn0 | wallyworld: it's technically not correct though right? | 02:08 |
wallyworld | probably not, but outside the scope of this change | 02:08 |
menn0 | wallyworld: I'll do the removal of charm docs tomorrow (and will make it part of that PR) | 02:09 |
wallyworld | ok, ta. do you agree with my point about thAT? | 02:09 |
* wallyworld hates caps lock being next to the A key | 02:09 | |
menn0 | wallyworld: I see your point (although you can obviously see that the code which queries the collection to find the next rev number has been replaced) | 02:10 |
thumper | oh FFS | 02:10 |
thumper | poop | 02:11 |
wallyworld | i can. but...... unless we remove the old artifacts that made it possible for the old behaviour to work, we can't be sure | 02:11 |
wallyworld | i realise it's pedantry | 02:11 |
wallyworld | TDD, write failing test, fix code, make tests pass | 02:11 |
wallyworld | there's no failing tests here | 02:12 |
wallyworld | since the dead docs still exist | 02:12 |
wallyworld | well, i guess if the code is removed they would have failed | 02:12 |
redir | menn0: that dutch skit was great thanks | 02:17 |
menn0 | redir: my wife showed it to me last night. i had tears rolling down my face | 02:19 |
thumper | well, fuck me, that worked | 02:22 |
thumper | I'm mildly surprised... | 02:22 |
babbageclunk | axw: azure pagination - if I don't pass anything for the top-n parameter to a list, will it return all the results at once, or do I still need to handle next-result getting? | 02:36 |
axw | babbageclunk: I believe you have to handle next result still | 02:36 |
axw | babbageclunk: (we're not doing that anywhere yet, I know) | 02:36 |
babbageclunk | axw: yeah, thought that might be the case - thanks | 02:36 |
axw | babbageclunk: limits on VMs, vnets, etc. preclude there being too many of a single resource AFAIK, but when you're listing all resources it might be needed | 02:37 |
babbageclunk | axw: ok - it's not too fiddly, just thought I'd make sure. | 02:38 |
babbageclunk | axw: gah - I can't work out what resourceProviderNamespace and parentResourcePath should be in the CreateOrUpdate call. The docs only tell me helpful things like "The namespace of the resource provider." | 02:49 |
babbageclunk | axw: sorry to keep bugging | 02:50 |
axw | babbageclunk: no worries | 02:50 |
axw | babbageclunk: un moment | 02:50 |
axw | babbageclunk: the namespace is something like "Microsoft.Compute", and parent resource path is the URL path following the namespace, preceding the resource type. e.g. for security rules, the parent resource path will be something like "networkSecurityGroups/{network-security-group-name}" (I think) | 02:54 |
axw | babbageclunk: compare https://docs.microsoft.com/en-us/rest/api/resources/resources to the path template in https://msdn.microsoft.com/en-us/library/azure/mt163645.aspx | 02:54 |
axw | babbageclunk: Microsoft.Network is the namespace, "securityRules" is the resource type, {security-rule-name} is the resource name | 02:55 |
axw | babbageclunk: everything between the namespace and the resource type is the parent resource path | 02:55 |
babbageclunk | axw: thanks, that's awesome! There turns out to be a lot of azure to understand. | 02:56 |
menn0 | thumper: are we using the tech-board HO? | 02:59 |
babbageclunk | axw: can you take a look at this? https://github.com/juju/juju/pull/6868 | 04:58 |
axw | babbageclunk: okey dokey | 04:59 |
babbageclunk | axw: WIP at the moment - I'm doing tests now. | 04:59 |
axw | babbageclunk: just eating atm, will look soon | 04:59 |
babbageclunk | axw: thanks, no rush! | 04:59 |
axw | babbageclunk: you should be able to get the azure creds that CI uses | 05:19 |
axw | babbageclunk: and test with them | 05:19 |
axw | babbageclunk: just looking for them, will pm | 05:21 |
axw | wallyworld: can you please review https://github.com/juju/juju/pull/6869 when you're free? | 05:35 |
wallyworld | sure, give me 5 | 05:35 |
wallyworld | axw: looks good, a nice refactoring | 06:01 |
axw | wallyworld: thanks | 06:01 |
axw | wallyworld: got time for a HO? I need to brainstorm on a problem I've encountered with LXD creds | 07:20 |
wallyworld | axw: otp with uros, give me 10? | 07:21 |
axw | wallyworld: no rush, let me know when | 07:21 |
wallyworld | axw: free now | 07:42 |
axw | wallyworld: I think I figured out what I need to do. I'll ping you later if I'm wrong :) | 07:43 |
wallyworld | axw: ok, no worries. i may be afk for dinner or whatever but i'll be around | 07:47 |
axw | wallyworld: np. it can wait if not | 07:47 |
axw | wallyworld: enjoy your long weekend | 07:47 |
wallyworld | will do :-) | 07:47 |
wallyworld | i still need to land a pr first | 07:48 |
=== frankban|afk is now known as frankban | ||
=== rvba` is now known as rvba | ||
perrito666 | Morning | 09:33 |
jam | morning perrito666 | 12:09 |
jam | anyone want to review my attempt to make "juju ssh" better when the remote machine has multiple addresses? | 12:09 |
jam | https://github.com/juju/juju/pull/6857 | 12:09 |
perrito666 | Hey I'll be unresponsive for the next couple of hours my wife is at the hospital with a stomach issue mail me If you need me | 12:18 |
jam | perrito666: I hope everything is ok. hope she feels better soon | 12:21 |
jam | axw: babbageclunk: so I noticed both of you guys are making changes to how we're doing clouds between 2.1 and 2.2. I'd like to make sure your changes are resolved relative to each other. | 13:37 |
jam | specifically babbageclunk worked on playing with the names, while it looks like axw got rid of the clouds.BuiltinClouds map | 13:37 |
jam | I'm trying to make sure other 2.1 landed features get forward ported to 2.2 | 13:37 |
jam | can I get some hints as to how this should be resolved? | 13:37 |
perrito666 | Another bless what a week | 14:46 |
rick_h | lol congrats jujuteam | 14:46 |
jam | balloons: ping | 14:56 |
balloons | hey jam | 14:59 |
jam | hi balloons. I just sent you an email. Is it possible for us to carve out some time to go over CI and 2.1-dynamic-bridges? | 14:59 |
balloons | jam, sure | 14:59 |
jam | I need to go spend time with my family before they disown me, but I can come back in a few hours. | 14:59 |
balloons | jam, see to that and we'll talk later | 15:00 |
jam | balloons: I sent a tentative invite | 15:01 |
jam | have a good one | 15:01 |
frobware | jam: AFAICT, trusty seems ok for the testing I have done. | 15:37 |
redir | morning | 16:24 |
frobware | jam: https://github.com/juju/juju/pull/6870 <<- bond-reconfigure-delay | 16:39 |
mbruzek | cmars: Is there a way to remove a term from a charm? One of the IBM charms still has lorem-ipsum | 17:18 |
cmars | mbruzek, push a new version of the charm to the charmstore without that term in the metadata | 17:18 |
=== natefinch is now known as natefinch-afk | ||
jam | frobware: I reviewed it. I feel like it would be better to keep the config as 'bond-reconfigure-delay' but have everything from the API down to the script itself just think in terms of "reconfigure-delay" | 18:33 |
jam | so the API server notices this is a bond, and sends a different value | 18:33 |
jam | rather than having the "is this a bond" in the script | 18:33 |
jam | is that too invasive? | 18:33 |
frobware | jam: yep, we could do that. i think... | 18:34 |
frobware | jam: can we discuss this in the morning? | 18:35 |
jam | frobware: certainly | 18:35 |
frobware | jam: it would be ideal to shore up these changes _this_ _week_ | 18:35 |
* frobware sighs. | 18:36 | |
jam | yep | 18:36 |
jam | frobware: I think it's ready to land with that change, as I'm happy with the rest | 18:36 |
frobware | jam: there was one additional test I thought of in config.go; we test for a custom value, but I don't think for the default value. | 18:37 |
frobware | jam: when you say "just think in terms of "reconfigure-delay"" - is that just naming or a deeper semantic change? | 18:37 |
jam | frobware: the script should not have the "is this a bond", but the api server should be making that decision and returning a value for the bridges it is creating | 18:40 |
frobware | jam: ok, so if you pass reconfigure-delay=<some-value-gt-0> it will do the sleep regardless of type. yes? | 18:40 |
frobware | jam: we're just talking about the origin of truth, correct? | 18:41 |
jam | frobware: if the script gets that value yes | 18:41 |
frobware | jam: makes sense. | 18:41 |
jam | the thing deciding how long to sleep is the API server | 18:41 |
frobware | jam: and it should know the device type/kind/category. | 18:41 |
jam | right | 18:41 |
frobware | jam: caveat emptor: if you run the script manually (as I do a lot) don't forget to pass a value. Just sayin... | 18:42 |
jam | frobware: have a default value of 1-3 ? | 18:43 |
jam | or a default value of 10-20 ? | 18:43 |
frobware | jam: 3 was a winner for so long. I vote for 4.2. | 18:43 |
jam | frobware: then if you forget it still fails safe | 18:43 |
jam | :) | 18:43 |
frobware | jam: so we're saying cli arguments win, but if not supplied we'll default to 4.2 (or 3). correct? | 18:44 |
jam | frobware: yeah, you had a default of 30 in the script already if I read it correctly | 18:44 |
frobware | jam: ah, crap. Hmm. | 18:45 |
jam | frobware: add_argument(..., default=30) I thought I read | 18:45 |
frobware | jam: this from my hacking.... probably.... | 18:45 |
frobware | jam: I had meant to leave it to whatever it was previously knowing full well that it would be driven from model-config. but perhaps 30 is nice and safe if you run it manually. | 18:46 |
jam | frobware: seems a little too safe | 18:47 |
jam | frobware: especially if it is the one that SSH is talking on and you wonder if everything is dead | 18:47 |
frobware | jam: an interface is better up than down. :) | 18:47 |
frobware | jam: so the only time you'll get 30 is if you specify 0 from juju for bond-reconfigure-delay. | 18:48 |
jam | frobware: we should just always pass the aggregate value | 18:48 |
jam | even if it is 0 so it overrides the default | 18:48 |
* frobware needs to go and look at this to make sure. But tomorrow now... | 18:48 | |
frobware | jam: I think this comes back to: who drives the value and why do we need a default. if juju passes 0, you get 0. if it passes 1234, you get ... and so on. | 18:49 |
frobware | jam: should the script carry a default? | 18:49 |
jam | frobware: the reason for a default is if you run the script by hand (IMO) | 18:50 |
frobware | jam: right. but I would counter, it has no documentation, is wrapped up by juju so in the true spirit of *nix you have rope to hang yourself with. | 18:51 |
jam | sounds like you're just trying to be spiteful now :) | 18:52 |
frobware | jam: you might be interested in: https://github.com/frobware/juju/tree/go-debinterfaces | 18:52 |
frobware | jam: I started a rewrite of the script in Go. The parsing side is done, there's 100% test coverage. I didn't do the bridging side as... | 18:53 |
frobware | jam: code is in juju/network/debinterfaces/... | 18:54 |
=== redir is now known as redir_exercise | ||
=== redir_exercise is now known as redir | ||
=== tvansteenburgh1 is now known as tvansteenburgh | ||
=== menn0_ is now known as menn0 | ||
=== frankban is now known as frankban|afk | ||
alexisb_ | thumper, ping | 23:50 |
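
Note on the Azure exchange at 02:54-02:55: axw's breakdown (namespace first, resource type and resource name last, everything in between is the parent resource path) can be sketched in Go roughly as follows. This is only an illustration of that rule, not the Azure SDK's API. The `resourcePath` type, the `splitResourcePath` helper, and the example names `my-nsg` and `allow-ssh` are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// resourcePath holds the pieces axw describes. Hypothetical type,
// not part of any Azure SDK.
type resourcePath struct {
	Namespace    string // e.g. "Microsoft.Network"
	ParentPath   string // everything between namespace and resource type
	ResourceType string // e.g. "securityRules"
	ResourceName string // e.g. the security rule's name
}

// splitResourcePath splits "namespace/[parent path/]type/name".
// The parent path may be empty for top-level resources.
func splitResourcePath(path string) (resourcePath, error) {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	n := len(segs)
	if n < 3 {
		return resourcePath{}, fmt.Errorf("path %q needs at least namespace/type/name", path)
	}
	return resourcePath{
		Namespace:    segs[0],
		ParentPath:   strings.Join(segs[1:n-2], "/"),
		ResourceType: segs[n-2],
		ResourceName: segs[n-1],
	}, nil
}

func main() {
	p, err := splitResourcePath("Microsoft.Network/networkSecurityGroups/my-nsg/securityRules/allow-ssh")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

For the security-rule case in the chat, the parent path comes out as `networkSecurityGroups/my-nsg`, which matches the "networkSecurityGroups/{network-security-group-name}" shape axw suggests.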
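
Note on the jam/frobware exchange from 18:33 onward: the design they converge on is that the API server decides the delay (it knows whether the device is a bond), and the script just receives a plain reconfigure delay, always passed explicitly so that even 0 overrides the script's own default. A minimal Go sketch of that split follows. The `device` type, both function names, the `--reconfigure-delay` flag spelling, and the choice of 0 for non-bond devices are assumptions made for illustration, not Juju's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// device is a hypothetical stand-in for what the API server knows
// about a network device it is about to bridge.
type device struct {
	Name   string
	IsBond bool
}

// reconfigureDelay is the server-side decision: only bonds get the
// configured bond-reconfigure-delay. Using 0 for everything else is
// an assumption for this sketch.
func reconfigureDelay(d device, bondReconfigureDelay int) int {
	if d.IsBond {
		return bondReconfigureDelay
	}
	return 0
}

// scriptArgs always passes the delay, even when it is 0, so the
// script's built-in default never applies when driven by the server.
// The flag name is hypothetical.
func scriptArgs(d device, bondReconfigureDelay int) []string {
	delay := reconfigureDelay(d, bondReconfigureDelay)
	return []string{"--reconfigure-delay=" + strconv.Itoa(delay), d.Name}
}

func main() {
	fmt.Println(scriptArgs(device{Name: "bond0", IsBond: true}, 17))
	fmt.Println(scriptArgs(device{Name: "eth0"}, 17))
}
```

With this shape the script needs no "is this a bond" check, and its default only matters when someone runs it by hand, which is the case jam raises at 18:50.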