[00:01] --config="features=[legacy-leases]"
[00:01] wallyworld: ^
[00:01] uh-oh, why?
[00:01] found a bug?
[00:01] babbageclunk: in k8s bootstrap testing, the initialisation is getting lease errors and we want to rule out a raft issue in the docker image
[00:01] cool
[00:08] wallyworld: got 5 minutes?
[00:09] thumper: sure, just talking to kelvin, give me a minute
[00:13] wallyworld: I'll jump into our 1:1, come when you're ready
[01:08] wallyworld: what size did you suggest to use as a minimum for downloading the agent binaries again?
[01:08] i think they are approx 80MB but we should allow for unpacking temp space etc
[01:09] so say 250MB to be safe?
[01:10] heh, that's the same as in upgrades/preupgradesteps.go
[01:11] It's checking after the binaries are downloaded/unpacked and the agent has restart, though, so not quite the same.
[01:12] *restarted
[01:19] kelvinliu_: no rush, would love a review on this today sometime https://github.com/juju/juju/pull/9174
[01:21] wallyworld, it's a big one! looking now.
[01:22] kelvinliu_: yeah - a lot of it is shifting code from one facade to another
[01:22] no hurry
[01:22] wallyworld, yup
[01:27] wallyworld, wondering if it's easy to change the APIAddresses to [juju-controller-service-internal-endpoint]
[01:28] kelvinliu_: i don't quite understand? can you explain?
[01:29] for example, juju-controller[.juju-namespace]:17070
[01:30] wallyworld, not for this PR, just thinking about it for the juju k8s version
[01:31] i'm sure we can discuss that
[01:31] * wallyworld goes to buy coffee, bbiab
[01:38] oh man, with search command history I should be more careful. delete and describe namespace both start with de but do very different things
[01:40] - -!!
[02:22] veebers: only just saw that comment. i shouldn't laugh but i can't help it :-D
[02:25] hah, it's ok it was an *almost* mistake, I caught it in time
[02:25] wallyworld: we're still expecting caas charms to set Active after setting pod spec right?
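kelvinliu_'s idea at [01:27] to [01:29] is to hand out an in-cluster service name instead of raw IP addresses as the controller API address. A minimal sketch of building such an address, assuming a Kubernetes service named `juju-controller`, an optional namespace suffix and port 17070 as typed in the chat; `controllerServiceAddress` and the `controller-ns` namespace are hypothetical, not existing Juju code:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// controllerServiceAddress is a hypothetical helper that builds an address of
// the form juju-controller[.<namespace>]:17070, as floated in the chat.
func controllerServiceAddress(service, namespace string, port int) string {
	host := service
	if namespace != "" {
		// Clients outside the service's namespace need <service>.<namespace>.
		host = service + "." + namespace
	}
	// JoinHostPort also brackets IPv6 literals, should a raw IP be passed in.
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(controllerServiceAddress("juju-controller", "", 17070))              // juju-controller:17070
	fmt.Println(controllerServiceAddress("juju-controller", "controller-ns", 17070)) // juju-controller.controller-ns:17070
}
```

Kubernetes cluster DNS resolves both the bare service name (from pods in the same namespace) and the `<service>.<namespace>` form, which is why the namespace part can be optional.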
[02:27] i think so for now; but they only really need to if they have set "maintenance". if the unit status remains "waiting for container" we will override that
[02:28] wallyworld: ack, sweet that matches my expectation. I did a run-through but using the non-updated charm, so it didn't work quite as expected; watch this space, just waiting for the bootstrap to complete
[02:28] yeah, demo charms will need updating
[02:37] wallyworld: re: this comment https://github.com/juju/juju/pull/9081/files/#r215465303, the tests added to state/caasmodel_test.go should cover the caas model and tests in state/status_model_test.go should cover IAAS models
[02:37] actually state/status_model_test.go might not show up on that diff as I've modified my changes to it etc.
[02:46] ok, so long as we have coverage. i can check the final PR
[02:46] kelvinliu_: you ask a good question - i have answered in the PR, but basically I am allowing for the future case where we want to use only a single config map for all applications
[02:47] does that make sense?
[02:47] wallyworld, yeah, it makes sense. LGTM, thanks
[02:47] tyvm
[03:16] anastasiamac: tough question, have we ever seen the 'local charm archive' test fail in merge, or just the check-merge jobs?
[03:28] wallyworld: FYI: https://pastebin.canonical.com/p/vFjmPhPrJc/
[03:29] Note the "Instantiting pod." is poor messaging from me
[03:29] veebers: also though, the active status for workload is premature
[03:29] since the pod has still not come up
[03:30] i think at that stage the container status is "waiting"?
[03:30] wallyworld: didn't we discuss having the charm set active after it sets pod spec?
[03:30] yes
[03:30] but
[03:30] (I'm with you that it seems dishonest to have it set active then)
[03:31] we said that would be overridden from container status if the container is not running
[03:31] ie if container status is blocked, that takes precedence
[03:32] wallyworld: I'm pretty sure the container was never in the blocked stage (or that we polled at least), it was a happy deploy so we set pod spec and the pod came up real quick
[03:32] the container status message would be the reason why it is blocked
[03:33] I'm happy to confirm this though
[03:33] it had to be blocked because the pvc failed
[03:33] the k8s pod status would have been Unschedulable
[03:33] ah right, yeah that did happen, then after the trust addition the pod came up happy
[03:33] yup
[03:34] so until trust is run, the unit needs to show blocked
[03:34] which it gets from container status
[03:35] it did show that, but it went from active -> blocked (as the charm did 'set spec', set active) so we saw active, then blocked once k8s realised the pod was unschedulable
[03:38] the container status would not have been "running" yet though?
[03:39] i think unless the container status is running, we should not show the unit status as active
[03:41] veebers: let me check...
[03:42] veebers: the one babbageclunk linked yesterday was in merge, not the check...
[03:42] why?
[03:42] wallyworld: (unless I have the whole charm should set to active when doing the podspec) then there will be a gap, charm sets pod spec then sets active, k8s/pod will spin up then find out that it's blocked but by then juju has seen the 'active' status from the charm
[03:42] (it's too late, it's seen everything)
[03:43] anastasiamac: ah ok, I'm thinking that the way the pr check job is there is a bit of space for interruption from other jobs or jenkins. Really wanting to get my reshuffle of that stuff sorted and proposed
[03:43] yeah right.
we are stuck because the charm cannot see the workload
[03:43] veebers: and fwiw, I'm not seeing it at all locally... more than 24hr running under stress...
[03:43] veebers: but at that stage, there will be no container status right?
[03:43] anastasiamac: aye, this is why I think it's how it's run in jenkins
[03:44] so we could also say if container status is not found, don't go to active
[03:44] wallyworld: right, because the mechanism that does that has only just kicked off
[03:44] treat it as container is still allocating
[03:44] wallyworld: ok, will look at adding that into it too :-)
[03:44] ty
[03:44] so close
[03:44] veebers: i think it's test data setup time when run in parallel... i think just waiting for the charm to be set up will solve the issue...
[03:45] wallyworld: feels like we're shoring it up with toothpicks and duct tape though ^_^
[03:45] anastasiamac: hopefully so
[03:45] a bit but there's not much else we can do yet
[03:45] aye
[04:36] veebers, wallyworld, thumper: who knows about statfs?
[04:36] babbageclunk: what is statfs?
[04:36] or anastasiamac, kelvinliu_
[04:36] (I guess that answers for me . . .)
[04:36] what's the question?
[04:36] It's a syscall to get filesystem stats
[04:36] ah /me googles
[04:37] the same as veebers 'wot's statfs'?
[04:37] babbageclunk: why do u need it? what's wrong?
[04:37] I'm trying to test my check-space change, but having a bit of a mare.
[04:37] try stallion?
[04:38] but srsly.... what kind of nightmare?
[04:38] wallyworld: I'm trying to understand the difference between the Bfree and Bavail fields (and how they interact with fallocate, which is how I'm filling up the disk)
[04:39] hmmm, shrug, sorry :-(
[04:39] you can always test with a number > your current free disk space
[04:39] just to see
[04:39] without having to fill the disk
[04:40] Yeah, I found that in your tests. :)
[04:40] *my* tests?
[04:40] I'm doing a manual test.
[04:40] what a clever hack that was :-)
[04:40] not very sure what's the difference.. sry
[04:40] right, so a manual test, just compile a jujud with a bogus free space requirement
[04:41] wallyworld: https://github.com/juju/juju/commit/47b2bf184636b31076de11979429304ecd8a78a9
[04:42] wallyworld: well, the other way is to really fill up the disk using fallocate, which is what I'm doing (on someone else's computer)
[04:42] babbageclunk: that was 2015!! another era...
[04:42] Ian had long hair
[04:42] it is, but if it's too hard to do, is there a real benefit?
[04:42] s/long hair/hair
[04:43] * wallyworld sobs quietly
[04:43] It's not that it's hard to do, it's that my code doesn't seem to be working...
[04:43] thanks for mentioning it :-(
[04:43] :)
[04:44] so you can't just code it to require 100000000000000000000000000000GB of free space and watch it fail?
[04:44] And the code I cribbed from you (in that commit) uses Free, which from my testing still returns a big number even though df -h says I only have 68M available.
[04:44] i like the idea of asking for an unrealistic number.. this could be codified...
[04:44] I can, but I'd rather test the real code or understand why it doesn't work
[04:45] Yes, I'm doing that in tests, but without really testing it it's not obvious that the code is actually right.
[04:45] babbageclunk: ur very dedicated if u want to fill up someone else's machine just to test the space check...(or u have exceptional friends!!)
[04:45] babbageclunk: why, what doesn't work? if the bogus number fails in "prod" with a special jujud, that seems ok right?
[04:45] I mean, it's Jeff Bezos' machine, we're not super close.
[04:46] I have a machine with only 68M available space. Upgrading on it doesn't fail, I'd like to understand why.
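The check being debugged here compares free disk space against a minimum (about 250MB, from the [01:09] discussion). statfs(2) reports both `f_bfree` (free blocks in the filesystem) and `f_bavail` (free blocks available to an unprivileged user); the difference is space reserved for root, so a figure computed from `Bfree` can stay large while `df -h` shows very little available. A minimal Linux sketch using Go's `syscall.Statfs`, assuming a 250 MiB threshold and that block counts are in units of `Bsize` (true on common Linux filesystems); `availableBytes`, `checkFreeSpace` and `minAgentDownloadBytes` are hypothetical names, not Juju's code:

```go
package main

import (
	"fmt"
	"syscall"
)

// minAgentDownloadBytes follows the ~250MB figure from the chat: roughly
// 80MB of agent binaries plus temporary space to unpack them (assumed MiB).
const minAgentDownloadBytes uint64 = 250 * 1024 * 1024

// availableBytes returns the space an unprivileged process can use on the
// filesystem containing path. It uses Bavail, not Bfree: Bfree also counts
// blocks reserved for the superuser.
func availableBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// checkFreeSpace takes the requirement as a parameter so a test can ask for
// an impossible amount and watch the check fail without filling the disk.
func checkFreeSpace(path string, required uint64) error {
	avail, err := availableBytes(path)
	if err != nil {
		return fmt.Errorf("checking free space in %s: %w", path, err)
	}
	if avail < required {
		return fmt.Errorf("not enough free space in %s: need %d bytes, have %d", path, required, avail)
	}
	return nil
}

func main() {
	fmt.Println(checkFreeSpace("/", minAgentDownloadBytes))
	fmt.Println(checkFreeSpace("/", ^uint64(0))) // always fails: no disk is this big
}
```

Taking the requirement as an argument is one way to codify the "ask for an unrealistic number" test suggested above, while the production caller passes the real threshold.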
[04:48] ah i see, so it does fail in prod
[04:48] yeah that seems like a bug
[04:48] I'm worried that setting the number to a huge one (like I do in the tests) will work, even though it would still not prevent downloading the agent if it was the real value.
[04:48] can you debug it on your machine with only 68M?
[04:48] yes
[04:49] That's what I'm doing
[04:50] I'm trying to find out if anyone understands what the difference is between Bfree and Bavail, which seems to be the problem
[04:51] NFI sorry
[04:54] babbageclunk: the best i could find - https://community.hpe.com/t5/System-Administration/vxfs-jfs-bavail-vs-bfree/td-p/3786809
[04:54] babbageclunk: diff seems to b user based?...
[04:55] babbageclunk: "bavail normally means blocks available to a non-superuser"
[04:56] yeah. That's what I've found in the docs too.
[04:56] https://linux.die.net/man/2/statfs
[04:57] babbageclunk: yes, was reading this one too
[04:57] ok, that gives me an idea
[04:58] \o/
[04:58] on a side note, isn't it funny that the manual was my last reference? i went to forums first :)
[04:59] The manual is not very useful in this case - I've been trying to find more info for ages.
[05:01] wallyworld: can you think of a better place than Unit SetStatus to do the 'if no container status and active ignore it' check?
doing it there feels a bit off
[05:03] yeah, we normally don't want to mess with the raw data model
[05:03] it should be done in the apiserver layer
[05:04] (or ideally a separate business logic layer we don't have yet)
[05:04] ack, thanks
[05:04] but unless we store the raw data, things will still mess up
[05:04] so that's not really the answer either - it really is a presentation issue
[05:05] we need to store the raw unit and container status as set
[05:05] and transform when we hand off to status or when storing history
[05:05] hence we talked yesterday about that helper method
[05:06] so in the Done() of the unit ops
[05:06] it would use the same helper as FullStatus() does
[05:07] wallyworld: aye, that's in place at the moment, but I have the charm setting active and that's done outside of the caas provisioner updateUnit ops bits
[05:07] that's fine - we need to store the raw info
[05:07] If I remove that part of the charm it should work, I think, but that does mean that if anyone sets active in their charm it'll be displayed wrong
[05:08] why? we transform in FullStatus()
[05:08] we need to store the raw data as set by the various actors and transform as necessary to present
[05:09] unit SetStatus() does call out to update history so we'll need to use the helper method there too
[05:09] but only for caas models
[05:09] wallyworld: Unit.SetStatus will update history, we're not doing anything there
[05:09] hah right, being a bit slow at typing
[05:09] :-)
[05:11] wallyworld: so we need to tweak our helper function logic even more: if the charm sets active then sets podspec, unit status is no longer active w/ default message. So unless the pod encounters an error it won't overwrite the unit status.
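The approach discussed from [03:31] to [05:09] is to store the unit status (set by the charm) and the container status (set from the pod) exactly as reported, and derive what users see in one helper that both FullStatus() and history recording go through. A minimal sketch under those assumptions; the `Status` strings and the `rawStatus`, `presented` and `unitRecord` names are hypothetical and far simpler than Juju's state and status packages:

```go
package main

import "fmt"

// Status values are plain strings here; Juju's real status types differ.
type Status string

const (
	Active      Status = "active"
	Waiting     Status = "waiting"
	Blocked     Status = "blocked"
	Maintenance Status = "maintenance"
	Running     Status = "running"
	Allocating  Status = "allocating"
)

// rawStatus holds what each actor reported, unmodified. Container stays
// empty until the first pod status arrives.
type rawStatus struct {
	Unit      Status
	Container Status
}

// presented applies the precedence rules from the chat (hypothetical helper).
func presented(r rawStatus) Status {
	switch {
	case r.Container == "":
		// No container status yet: treat it as still allocating and
		// never show an early "active" from the charm.
		if r.Unit == Active {
			return Allocating
		}
		return r.Unit
	case r.Container != Running:
		// e.g. blocked while the pod is unschedulable: this takes precedence.
		return r.Container
	case r.Unit == Waiting:
		// Default "waiting for container" left in place: the running
		// container is enough to report the unit as active.
		return Active
	default:
		return r.Unit
	}
}

// unitRecord stores raw data only; history and FullStatus both derive
// their view through presented, so they cannot disagree.
type unitRecord struct {
	raw     rawStatus
	history []Status
}

func (u *unitRecord) SetUnitStatus(s Status)      { u.raw.Unit = s; u.record() }
func (u *unitRecord) SetContainerStatus(s Status) { u.raw.Container = s; u.record() }
func (u *unitRecord) record()                     { u.history = append(u.history, presented(u.raw)) }
func (u *unitRecord) FullStatus() Status          { return presented(u.raw) }

func main() {
	var u unitRecord
	u.SetUnitStatus(Active)       // charm sets the pod spec, then active
	u.SetContainerStatus(Blocked) // pod unschedulable until trust is granted
	u.SetContainerStatus(Running) // trust granted, pod comes up
	fmt.Println(u.history, u.FullStatus()) // [allocating blocked active] active
}
```

Because only raw values are stored, a charm that sets active right after setting the pod spec no longer produces the premature active -> blocked sequence seen at [03:35]; the first recorded state is allocating.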
[05:13] if the pod spec is updated, the deployment controller will do a rolling update; we should get events for that and can update the data model accordingly
[05:13] we can do that in another PR
[14:20] manadart: structs vs closures - do you mean pass back a struct or pass an argument as a struct?
[14:21] re: comment from #9163
[14:21] Pass on struct instead of the 3 mock pointers.
[14:21] s/on/one
[14:21] fair
[16:20] how does one know what zones are available to the juju client? for instance, 'juju bootstrap --to zone=eu-west-2 aws' doesn't work from North America
[16:21] pmatulis, does juju clouds list zones?
[16:21] `juju clouds`
[16:21] pmatulis, well I guess it lists the default zone
[16:22] `juju regions` maybe
[16:23] zone == region, correct?
[16:23] or is zone a generalization of region?
[16:23] externalreality, i originally tried 'juju show-cloud aws'
[16:24] pmatulis, juju regions aws
[16:24] yeah, that gives the same as show-cloud except with less detail
[16:26] it would be very useful if the client could somehow know what zones are available
[16:26] seems it should be able to query AWS
[16:30] funny. i can't get *any* zones to work. even the one that gets used by default successfully
[16:31] pmatulis, is what `juju regions` lists not compatible with the "zone" placement directive?
[16:32] pmatulis: honestly we don't want folks knowing/dealing with zones
[16:32] pmatulis: Juju automatically attempts to spread units across zones to make workloads resilient to outages
[16:32] rick_h_, well we shouldn't say it works then
[16:32] pmatulis: and having users custom-load them is a sign of a bad install or doing too heavy custom/snowflake stuff that's not portable across clouds
[16:33] 'cause it fails fast and hard
[16:33] pmatulis: it can/does work but we don't go out of the way to put it in normal user commands like clouds/etc
[16:33] externalreality: zone != region
[16:33] externalreality: each region has several zones typically
[16:33] externalreality: so when you deploy to us-east1 you can get units in various zones in that region
[16:33] externalreality: think of zones as racks in a room, for instance
[16:37] rick_h_, ack, I see, I just found this which was written by Andrew Wilkins https://awilkins.id.au/post/blogger/availability-zones-in-juju/
[16:38] externalreality: cool, yea been a while.
[16:38] rick_h_, pmatulis - I am with pmatulis: how do you use the "zone" directive if you are unaware of any acceptable parameters?
[16:40] I guess juju doesn't want to make it easy for users to do the wrong thing - Ack - just make it possible.
[16:40] externalreality: understand, and maybe we can document something but if you're going to use it you need to understand the cloud you're on, the region you're in (different regions have different zones) and what it means to manually place
[16:40] externalreality: exactly
[16:40] so I'm definitely -1 on making it "easy"
[16:42] rick_h_, what's the point in showing the regions? in 2 commands even. is it for something else?
[16:43] pmatulis: ? every time you create a model you have to know what region it's in. It could be in the US, in EU, in APAC. The region is very important to performance and geo-spreading workloads across the world.
[16:43] pmatulis: I'm not sure what you mean in 2 commands?
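rick_h_ describes zones as subdivisions of a region ("racks in a room") and says Juju spreads units across them automatically. For reference, AWS zone names add a letter suffix to the region name (for example `eu-west-2a` in region `eu-west-2`). The sketch below shows one simple spreading policy: place each new unit in the zone that holds the fewest units of the application so far. `pickZone` is hypothetical and the policy is an assumption for illustration, not Juju's placement code:

```go
package main

import "fmt"

// pickZone returns the zone holding the fewest units of an application,
// breaking ties by the order the zones are listed. Hypothetical helper.
func pickZone(zones []string, unitsPerZone map[string]int) (string, error) {
	if len(zones) == 0 {
		return "", fmt.Errorf("no availability zones in this region")
	}
	best := zones[0]
	for _, z := range zones[1:] {
		if unitsPerZone[z] < unitsPerZone[best] {
			best = z
		}
	}
	return best, nil
}

func main() {
	// Illustrative zone names; every region has its own set.
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	counts := map[string]int{}
	var placed []string
	for i := 0; i < 4; i++ {
		z, err := pickZone(zones, counts)
		if err != nil {
			panic(err)
		}
		counts[z]++
		placed = append(placed, z)
	}
	fmt.Println(placed) // [us-east-1a us-east-1b us-east-1c us-east-1a]
}
```

Spreading like this is what makes a single-zone outage take down only part of an application, which is the resilience argument made at [16:32] for leaving zone placement to Juju.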
[16:43] commands 'regions' and 'show-cloud'
[16:43] * pmatulis didn't even know about the 'regions' command
[16:44] pmatulis: ah, ok. Well the regions available are part of cloud details but folks thought that list-regions made sense once you bootstrapped, since you can only add-model to regions of the same cloud
[16:44] pmatulis: so basically show-cloud is pre-bootstrap useful and list-regions is post-bootstrap (add-model) useful
[16:46] but just listing regions doesn't tell you what your model is in. the show-model command does though
[16:46] pmatulis: right, it's meant to help you pick the right region names available for add-model
[16:47] pmatulis: once the model is added you're right, what region the model is in is about the model
[16:47] pmatulis: so list-regions is more about what you can do given the controller you're on
[16:47] pmatulis: if you switch controllers from aws, gce, openstack and run list-regions you'd get different answers
[16:55] rick_h_, ok
[19:47] small pr for review: https://github.com/juju/juju/pull/9178
[19:47] fixes to errors.Annotatef() with go test (using go 1.11)
[19:48] build failures
[19:48] crackers, must be friday afternoon - need to check that the tests still pass. :-)
[19:53] hml: ^ heads up
[19:53] sorry, never mind
[19:54] lol, somehow I read that hatch was submitting the pr
[19:54] rick_h_: happy friday afternoon
[19:54] :-)
[19:54] * rick_h_ tries to halt checkout procedures