[00:02] vino_: a few small things, let me know if you have questions [00:03] veebers: not sure tbh, charm push may not know to expand the "~" [00:06] wallyworld: ok, think I've found the problem, just checking now. [00:07] yay [00:07] wallyworld: sure [00:12] wallyworld: that seemed to work. Trying it again with a 3-machine controller [00:12] wallyworld: /tmp/blah didn't work, ~/tmp/blah did work (both are actual files). It's ok, not blocking me (suspect it's a snap/containment thing) [00:16] veebers: ah, i had it the wrong way around in my mind. that makes sense for a snap yeah [00:28] wallyworld: yay, that worked too, checking one more scenario (2 machines in the controller, so one non-voter) [00:29] babbageclunk: i'll be interested to see what the fix was [00:31] It was in the upgrade step - I didn't defer-close the log store, so the raft worker just hung on startup. In the past that had always been caused by the transport not getting an address, so I was looking in the wrong place, but the logging I added showed it wasn't the problem. [00:31] wallyworld: ^ [00:32] ah, cool [00:32] kelvin: i left a couple of small comments [00:34] wallyworld, thanks for reviewing, i just pushed to resolve ur comments. [00:34] ok [00:35] kelvin: looks good to land! on to the next step [00:36] wallyworld, thanks, merging now [00:38] wallyworld: actually, my idea of testing a 2-machine controller relies on stuff in 2.4, so it's not necessary. Tidying up and pushing a PR now and then I'll do the unit tests. [00:39] sgtm [00:39] (Oh, and I'll check bootstrapping 2.4 directly is still fine.) [01:03] wallyworld: i am using IsolationSuite from github.com/juju/testing instead of BaseSuite in juju/juju/testing. [01:03] that sounds fine [01:03] it introduced a few issues in reading the EchoArgs. Trying to fix it. [01:03] all the rest of the comments make sense. [01:03] you could just leave the base suite out [01:04] i don't think any of the other charm tests use it [01:04] yes correct. And that will fix the CI issue i was facing. [01:04] so let's drop it then [01:04] Yes. charmDir_test uses IsolationSuite [01:04] it's required for the env patching [01:04] which is a good substitute for BaseSuite [01:06] Just hold on - using IsolationSuite has introduced a path error, which i need to identify. [01:07] swapping to BaseSuite works well. So there is something different between them. I will fix that and push the commit. [01:40] wallyworld: https://github.com/juju/juju/pull/8850 [01:40] ok [01:40] Sorry, laptop died at a super-inopportune moment, but all better now [01:48] babbageclunk: looks ok i think. for 2.5 i think we could consider moving the common raft business logic out of the worker and into a core/raft package that the upgrader also uses [01:49] wallyworld: yeah, sounds good [01:52] vino_: using IsolationSuite should be all you need as that provides the PatchEnvironment functionality. Or you could just use CleanupSuite [01:54] wallyworld: do you think the ReplicaSetMembers function in upgrades.go needs a test? [01:54] wallyworld: yes. IsolationSuite gave me trouble. I found this CleanupSuite which is good. [01:54] no PathError. [01:55] babbageclunk: not for this PR [01:55] i am pushing the commit now. [01:55] cool [02:25] wallyworld: i haven't addressed one review comment - testing.ReadArgs() [02:25] happy to discuss. [02:25] ok, i'll wait for that [02:26] did you need help with it? [02:31] not really for that. [02:31] I am seeing a weird CI issue for this NOVCS case.
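(An aside on the IsolationSuite/PatchEnvironment exchange above: a minimal sketch of that pattern, assuming the usual github.com/juju/testing and gopkg.in/check.v1 imports; the suite, test, and variable names here are hypothetical, not the actual charm test code.)

    package charm_test

    import (
        jujutesting "github.com/juju/testing"
        gc "gopkg.in/check.v1"
    )

    // exampleSuite embeds IsolationSuite, which (via the embedded
    // CleanupSuite) provides PatchEnvironment and automatic cleanup,
    // so BaseSuite is not needed just for environment patching.
    type exampleSuite struct {
        jujutesting.IsolationSuite
    }

    var _ = gc.Suite(&exampleSuite{})

    func (s *exampleSuite) TestUsesPatchedEnv(c *gc.C) {
        // The original value is restored automatically when the test ends.
        s.PatchEnvironment("EXAMPLE_VAR", "example-value")
        // ... exercise code that reads EXAMPLE_VAR ...
    }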
[02:31] i don't see this issue locally. [02:32] It should be nil returned from the function. [02:32] i am trying to look into the CI issue here. [02:33] vino_: it's just a typo [02:33] c.Assert(err, gc.NotNil) [02:34] should be c.Assert(err, jc.ErrorIsNil) [02:35] vino_: actually, all the other gc.IsNil for error checking added to the pr should also be c.Assert(err, jc.ErrorIsNil) [02:37] ok. thank u. I was confused that i couldn't see that issue locally on my machine. [02:37] thanks wallyworld. [02:37] i'm surprised by that - it should have failed locally [02:37] i can show u. [02:37] the logic error was checking for a not nil error when it would have been nil [02:38] i do see here all 185 OK [02:38] right but the issue is that the error should not have been returned as not nil [02:39] the new function should just return nil if no vcs dir exists [02:39] Yes. I am checking that if not nil then assert [02:39] i am explicitly returning NIL if NOVCS [02:39] The new function does exactly what u r describing. [02:39] right but the PR was checking that the error was not nil [02:40] line 300 was c.Assert(err, gc.NotNil) [02:40] this says that we expect a non-nil error [02:40] but we want the error to be nil [02:41] if that code passes when you run it locally i'm confused as to how [02:41] the new function does the correct thing. But my check here is wrong. [02:41] is that what u mean - correct ? [02:41] yes, if the PR says c.Assert(err, gc.NotNil) then that's wrong [02:42] and hence the CI failure [02:44] I get what u r saying. [02:44] but u r telling me it all has to be changed. How come it is passing for me locally here?! [02:46] that's a different issue [02:46] what needs to be changed is the syntax for error checking. gc.IsNil is not how we do it as it sometimes can give wrong results if there's a pointer [02:47] c.Assert(err, jc.ErrorIsNil) is what we use [02:47] i didn't notice the use of c.Assert(err, gc.IsNil) before [02:48] so there's 2 issues: 1. fix the incorrect assertion to make the test pass, 2. replace gc.IsNil to make the code more correct [03:00] ok sure wallyworld. [03:01] thumper: 1:1 ? [03:14] wallyworld: It seems (in the code) that one can provide a 'revision' as a resource arg, but I don't see mention of that in the help etc. Does that ring a bell at all? [03:21] vino_: tim is away sick today [03:21] veebers: resources have revisions - each time one is updated the revision increments [03:22] there's a published tuple of charm rev, resource rev [03:22] that's what is used by default [03:22] wallyworld: ack, understand that; it's the "juju deploy --resource " that I was confirming, i.e. use x revision of the resource. [03:22] right [03:22] there doesn't seem to be any docs around that (at least not in deploy help) [03:22] if you wanted to use a rev that's not published [03:23] could be the case, haven't read the docs [03:23] ack, no worries. Work needed on the help docs there :-) [03:26] ok. Thanks wallyworld [03:56] babbageclunk: how goes the unit test writing? [03:56] almost done - just need to fix logging [05:25] wallyworld: with the upcoming docker resource type, if the user supplies the resource as an arg to deploy, will we (somehow) allow them to supply secret details too?
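(Picking up the gc.NotNil/jc.ErrorIsNil exchange above: a minimal sketch of the assertion fix being described, assuming the conventional aliases gc for gopkg.in/check.v1, jc for github.com/juju/testing/checkers, and jujutesting for github.com/juju/testing; the function and test names are hypothetical, not the PR's actual code.)

    package charm_test

    import (
        jujutesting "github.com/juju/testing"
        jc "github.com/juju/testing/checkers"
        gc "gopkg.in/check.v1"
    )

    type vcsSuite struct {
        jujutesting.IsolationSuite
    }

    var _ = gc.Suite(&vcsSuite{})

    // readVCSVersion is a hypothetical stand-in for the new function being
    // discussed: it returns ("", nil) when the directory has no VCS metadata.
    func readVCSVersion(path string) (string, error) {
        return "", nil
    }

    func (s *vcsSuite) TestNoVCSReturnsNilError(c *gc.C) {
        version, err := readVCSVersion(c.MkDir())

        // Wrong (the typo in the PR): asserting a non-nil error makes the
        // test fail when the function correctly returns nil.
        //     c.Assert(err, gc.NotNil)

        // Preferred juju style: jc.ErrorIsNil rather than gc.IsNil, which
        // can give misleading results when a nil typed pointer ends up
        // wrapped in a non-nil error interface value.
        c.Assert(err, jc.ErrorIsNil)
        c.Assert(version, gc.Equals, "")
    }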
[05:26] no, these will come from the charm store [05:27] the charm store manages all that stuff, as it does with macaroons for private charms etc [05:27] sweet, ah right you are [05:38] wallyworld: can u take a look at the PR [05:38] when u get time [05:38] ok [05:39] Hi everyone [05:40] vino: have you pushed changes? [05:40] yes. [05:40] Because of this issue https://stackoverflow.com/questions/50970133/installed-kubernetes-on-ubuntu-and-i-see-a-lot-of-nodes-are-getting-created-in-m [05:40] I deleted the VMs manually [05:41] I didn't know that this would affect the kubernetes deployment [05:41] aren't u seeing the changes yet wallyworld [05:41] ? [05:41] Is there a way that I can remove the kubernetes deployments completely manually since conjure-down or up won't work? [05:41] i can see some but not all? there's still var version and I can't see ReadEchoArgs [05:41] u wanted to discuss the ReadArgs() call [05:42] When I run juju list-models --debug, I see that it still connects to a vm that has already been deleted [05:42] yes. what it reads after that is the empty string. [05:42] we can discuss that. [05:43] once it is read it resets the ptr. [05:43] that's why we can't use ReadArgs again. [05:45] adham: deleting the vms directly in k8s will leave orphaned entries in the juju model. you could try remove-machine --force [05:47] vino: ok. so instead of the expectedVersion slice in the PR, it's cleaner to do what lines 167-170 of EachAsArgs() does [05:48] to build up the string to compare with the version file contents [05:48] how can I clean up at this point? @wallyworld: I had to do this if you saw my question on Stack Overflow, I observed a huge flood of VMs once I deployed Kubernetes [05:48] and that was a clean installation [05:48] what do you want to clean? the entire controller? just the model? [05:49] So I have MAAS installed on the same server [05:49] I do not want to affect MAAS [05:49] I just want to clean up the kubernetes and then reinstall it again (but after I know why those 70+ vms got generated) [05:50] i've not used conjure-up to deploy k8s so i can't answer that question - the conjure-up guys will be online in a few hours [05:51] Ok, I can wait for the conjure team, but in the meanwhile, have you ever seen a list of VMs like this in the Stack Overflow question? [05:51] if you want to keep the existing juju controller running, i think you will need to remove-machine --force all the machines for which you have manually deleted the k8s nodes [05:51] I am referring to the link to my question there... [05:51] wallyworld: yes. that code looks a bit professional. [05:52] how do you mean Wallyworld? [05:52] can you send me a sample command? [05:52] after you have deleted the orphaned machines from juju, you should hopefully be able to then delete the model [05:52] juju remove-machine --force [05:52] how can I get the machine id? [05:52] juju status [05:53] will show all the k8s nodes [05:53] from there you can see which ones correspond to the ones manually deleted [05:53] it's all a bit abstract without being totally familiar with your setup [05:54] is the juju controller running on a maas node? [05:54] thx wallyworld, i'm running juju status, it's taking too long [05:54] what version of juju?
[05:55] if you wanted to blow everything away and start again, you could kill that one maas node and allocate it back to the maas pool, and manually delete the k8s cluster [05:56] you'd also need to reclaim any worker nodes [05:56] but i've not done what you're doing myself so can't give specific advice [05:58] adham: maybe you can get better help by asking in #conjure-up if it is a conjure-up issue [06:00] Sorry wallyworld, just received the messages now, the version is 2.3.8-xenial-amd64 [06:00] wallyworld: pushed the commit. plz let me know. there was another minor fix for the debug log comment. [06:01] this server actually has the master maas controller [06:01] also, the juju status command still has no response [06:02] I will cancel it and re-run it with debug [06:02] ahh, ok, it's connecting to API addresses (which have been deleted, I'm assuming) [06:09] adham: juju status connects to the juju controller. i had thought you had only deleted k8s worker nodes? [06:10] I deleted all of the extra VMs [06:10] maybe I deleted that as well [06:10] poof [06:10] vino: looks good! [06:10] Wallyworld [06:11] is there a way that I can reverse everything about this deployment [06:11] for the kubernetes? [06:11] ok. i just pushed the fix for the last comment u gave :) [06:11] i will land it now. [06:12] wallyworld: on the juju side i will update dependencies.tsv first and then push the commit [06:12] adham: so if the juju controller is really gone, then you probably just need to blow away the k8s cluster itself. i assume the worker nodes for that were all on maas? if so, and the juju controller node has already been deleted, you could just decommission those nodes in maas? [06:13] vino: you update the juju deps after landing the charm.v6 change [06:14] vino: and we need to make a small juju change also - we should trim the version string to, say, 100 characters before writing to the charm doc [06:14] just to be defensive [06:15] how can I blow away the k8s cluster itself then? [06:16] sounds like you have already started to do that? if the k8s nodes are all in maas, and the juju controller node has been decommissioned, then you could also decommission the k8s nodes as well [06:16] and return all those nodes to the maas pool [06:17] In maas, I no longer see any traces of the k8s [06:17] but not sure if i'm missing any area to look into [06:18] i *think* you may have managed to clean it all up then :-) [06:19] then why does juju status keep freezing? what do I do about this? [06:19] if the juju controller is gone and all k8s nodes are gone then you should be ok [06:19] you need to remove the controller reference [06:19] from your local client [06:19] juju controllers [06:19] and then juju unregister [06:20] how can I be sure that this controller won't be a reference to MAAS itself? [06:20] Because when I deployed k8s, i made it use MAAS's cloud [06:21] maas is treated in juju as a cloud. you can see it via "juju clouds". the juju unregister command removes a controller entry from a local yaml file [06:21] ahh, I see conjure-canonical-kubern-a4d [06:21] juju unregister conjure-canonical-kubern-a4d [06:21] should work [06:21] it just removes a yaml file entry [06:21] maas will still be running [06:21] and can be used again with conjure-up [06:22] $ juju unregister conjure-canonical-kubern-a4d [06:22] ERROR controller conjure-canonical-kubern-a4d not found [06:22] conjure-up-server-01-f34* [06:22] is * part of the name?
[06:22] the * indicates that's the current controller juju is using [06:23] but the names above differ [06:23] the last one I mentioned with * is under Controller, the one before it is under Model [06:23] what does juju controllers say? that's what you pass to unregister [06:23] Doing so will prevent you from accessing this controller until you register it again. [06:24] I am about to deregister it now [06:24] right, but here the controller machine has been shut down [06:24] you have removed the controller machine manually right? [06:24] yes [06:24] I guess so [06:24] how can I ensure that? [06:25] check that there's no machine in maas with the listed ip address [06:25] but since status hangs and times out, it's a good bet it's gone [06:25] no, there is no machine in maas [06:26] If that's the case, I really want to redeploy K8s again [06:26] but that flood of VMs doesn't make sense, and I don't know why they were created [06:26] Do you have any idea about those machines and why they would possibly be created like that in https://stackoverflow.com/questions/50970133/installed-kubernetes-on-ubuntu-and-i-see-a-lot-of-nodes-are-getting-created-in-m ? I would really appreciate any knowledge around this [06:26] so unregister will remove that orphaned entry [06:26] from the juju client [06:28] adham: that's a question best asked in #conjure-up [06:28] i don't have any insight off the top of my head [06:29] lots of folks use conjure-up so if there is a bug it would be good to get it fixed [06:29] no, that's alright, I really appreciate your help here, it just made me progress, I was stuck for the last 4 days [06:30] thx Wallyworld [06:30] adham: sorry i couldn't help more with the conjure-up side. i'm more a juju person [06:31] adham: you should be able to get the help you need in that other channel. if not come back here and we can chase them up :-) [06:32] thx, will do! I'm in the conjure-up channel [06:34] adham: the folks there are mainly USA based so you may not catch them for a few hours yet [06:35] wallyworld: we can chat here regarding the version string for the charm. [06:35] that's why [06:35] that's fine* [06:35] so where are you based? I'm in Australia [06:35] so u want to keep a 100 character limit before writing to the charm manifest [06:37] adham: brisbane [06:38] vino: not the manifest. as we read in the charm from the zip, before we write to charmDoc [06:38] lol [06:38] you too? [06:38] neighbours :D [06:39] vino: since we don't always control what goes into the charm zip [06:39] the best we can do is be careful about what we accept [06:44] wallyworld: ok before we repackage and write to charmDoc [06:45] So what we write to charmDoc needs to be sensible. [06:46] * vino makes coffee and will be back [06:47] Wallyworld, I know that this is off topic and beyond your knowledge [06:47] But I just wanted to double check with you if I have any luck [06:47] when you bring up a k8s environment, do you actually see VMs with funny names? [06:48] Cuz I am conjuring up the k8s again and now I see casual-guinea, famous-jackal, etc.. Do you have similar VMs in your environment? [06:49] adham: those names are autogenerated. there's a "pet names" library that is used. i can't recall the exact name [06:50] they take a random adjective and pair it with a random animal name [06:50] vino: correct [06:50] Hmm, and do you have similar names in your environment? [06:50] Is there a way to have a proper naming convention?
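(Picking up the charm version discussion above: a minimal sketch of the kind of defensive trim being described before the version read from a charm zip is written to charmDoc; the helper name and exact limit are illustrative assumptions, not the actual juju code.)

    package charm

    // maxVersionLen is the illustrative cap mentioned above (~100 characters).
    const maxVersionLen = 100

    // truncateVersion trims a version string defensively: since we don't
    // control what ends up in a charm zip, cap what we accept before it is
    // persisted to the charm doc. The trim is byte-based, which is fine for
    // a defensive length cap.
    func truncateVersion(version string) string {
        if len(version) > maxVersionLen {
            return version[:maxVersionLen]
        }
        return version
    }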
[06:51] I think this is 100% pure conjure-up since I'm using conjure-up [06:52] adham: that's actually a maas naming convention i think, i.e. the hostnames. juju doesn't care about or use hostnames as such [06:52] juju mainly uses machine numbers 1, 2, 3 etc that it generates [06:53] there might be a maas option to control hostname generation, not sure off hand [06:54] ok that explains it [06:54] and would deploying k8s deploy 70+ vms? [06:55] I just conjured up with a minimal installation, no addons, and it's still "deploying", and yet I see 10 vms in maas? [06:55] I mean 10 new vms [06:56] 20 machines now [06:57] adham: there's kubernetes-core and CDK. which bundle did you choose? [06:58] CDK [06:58] Also, shall I report the naming problem in Juju? I really recommend at least a friendly [06:58] i would expect to see several VMs for that as it is full HA so several redundant nodes for easyrsa etc [06:58] name [06:59] which naming problem? the machine name generation in maas? [06:59] I used to think that this is a virus or something [06:59] the maas node naming is a deliberate design decision to at least give the nodes unique english names [07:01] Yes, I understood that part, I meant to put an RFE in juju asking that when juju creates a machine, it provides an english machine name at the beginning [07:01] i.e. controller_1 [07:01] lb_1 [07:01] etc...? [07:02] adham: juju lets you specify whatever controller name you want - conjure-up is what's generating the names you see [07:02] ahh, ok, then this RFE would go to conjure-up [07:03] quite tricky when you have multiple projects and many teams [07:03] for the controller name and model names [07:03] there may be a way to tell conjure-up, not sure [07:17] thx Wallyworld, i've redirected this to the conjure-up team, I am waiting for them to be available [07:18] np, good luck [08:03] stickupkid: Took a look at your patch. Looks good. I would make 2 changes. [08:03] 1) Put the mock that is currently in environs/mocks into environs/testing. [08:05] 2) Instead of making a GetServerEnvironmentCert, make the cert a property. We already call GetServer on instantiation to get data from Environment - just populate it there. [08:38] manadart: thanks for the comment [08:42] stickupkid: NP. [15:05] Need a review: https://github.com/juju/juju/pull/8855 [16:00] hml: thanks for that! Can you please file a bug around the OS_CACERT needing to be in OS config, part of add-cloud, etc. and I'd be curious about your thoughts on implementation time for that. [16:01] rick_h_: there's a bug already along these lines: https://bugs.launchpad.net/juju/+bug/1777897 [16:01] Bug #1777897: add-cloud fails when adding an Openstack with a self-signed certificate [16:01] hml: ah right thanks [16:01] rick_h_: will ponder the time question [18:05] wallyworld: Since conjure-up is just doing a juju deploy under the hood, I would think that the runaway VM creation would fall to juju in some way. I've never seen this with conjure-up or juju before though. I was hoping that adham had some old version of juju or something. I assume the juju debug-log would be useful if it happens again. [18:18] knobby: definitely debug-log what's going on if there's some issue there [20:35] Morning o/ [21:30] veebers: can you jump into the release call [21:31] wallyworld: omw [21:55] babbageclunk: now that the release is out, you can haz review we mentioned last week? :-D pretty please with sugar on top https://github.com/juju/juju/pull/8839 [21:55] oh yes!
[21:55] looking now [21:55] Thanks for sending the email btw [21:56] babbageclunk: no worries, i also had to fix the lp upload thing etc - can't wait for that bug to be fixed [21:57] wallyworld: have we pushed on that bug yet? (I haven't >_>) [21:59] veebers: i *think* tim has, was going to check again this week [22:02] wallyworld: ack, cheers