/srv/irclogs.ubuntu.com/2017/11/16/#juju-dev.txt

hmlveebers: ping00:32
veebershml: pong o/ how's things?00:44
hmlveebers: good - getting cold00:44
hmlveebers: is there a good way to force a model migration failure? - i’m working on your bug for show-model00:45
veebershml: babbageclunk would have a good idea, I think attempting to migrate before units are idle would do it (i.e. do the commands really quick)00:46
hmlveebers: at least with the tests, i’m not sure any error status will be more helpful.  :-/  e.g. "machine sanity check failed, 2 errors found"00:46
veebershml: it's starting to get warmer here, although we've had really nice warm days then it descends into cold days00:47
hmlveebers: of course, he’s the one who knew how to fix this .00:47
hmlveebers: the fun days - hot, cold, hot00:47
veebersaye ^_^00:47
veebershml: I love that Christmas (and thus Christmas break) is over the middle of summer here :-)00:49
hmlveebers: that’s just plain weird00:49
veebers^_^00:49
hmlveebers: that said, the family heads south for christmas00:49
veebershml: hah, flying in arrow formation I hope :-)00:50
hmlveebers: ha00:50
veebersHmm, with lxd, now when publishing an image from an existing container it takes ages and creates tmp files > 100GB. I think something is wrong :-\01:22
babbageclunkhml: sorry, was grabbing lunch - unfortunately the units not being ready won't fail in the right way now.01:22
hmlbabbageclunk: i’m thinking this is what you had in mind?  https://github.com/hmlanigan/juju/commit/06db34cb637594cbe6e80db4ecb1f22778b2988f01:31
hmlbabbageclunk: is there a good way to force a model migration abort?01:32
hmlnot having any luck at it01:32
babbageclunkhml: not that I know of - it generally indicates a failure of prechecking. You could simulate it with a wrench?01:36
hmlbabbageclunk: wrench?01:36
babbageclunkhml: we've got a mechanism for throwing errors if a wrench file is present...01:37
babbageclunkhml: for testing hard-to-reach scenarios...01:37
babbageclunkhml: hang on, finding it01:37
hmlbabbageclunk: haven’t run into that yet.01:37
hmlbabbageclunk: the wrench part :-)01:38
babbageclunkhml: oh, there's one in the migrationmaster already...01:38
babbageclunkhml: see the call to wrench.IsActive?01:39
hmlbabbageclunk: yes01:39
babbageclunkbasically you could throw an error from transferModel if wrench.IsActive("migrationmaster", "die-in-export")01:41
hmlbabbageclunk: okay - and wrench active would be a line in  /var/lib/juju/wrench/machine-agent?01:42
babbageclunkI think it would be wrench/migrationmaster - add die-in-export to that file01:44
hmlbabbageclunk: ah, ty01:44
babbageclunkhml: no worries01:46
hmlbabbageclunk: the change was what you had in mind?01:46
babbageclunkhml: yup01:46
* thumper sighs02:00
thumperwe have found two new buts that we really need to fix for 2.302:01
thumperheh02:01
thumperbuts02:01
thumpergugs02:01
thumperbugs02:01
thumperthose things02:01
thumperbabbageclunk: got 10 minutes?02:01
thumperbabbageclunk: I'd like to talk a few things through02:01
* thumper steps away02:05
babbageclunkthumper: sorry, looking again!02:15
babbageclunkthumper: grab me when you come back02:15
hmlbabbageclunk: veebers: at least the last error won’t be overwritten by the abort msg with model migration.  we’ll see how useful the errors are.  :-)02:21
axwwallyworld: looking at your PR shortly, I've just put up https://github.com/juju/juju/pull/8087 if you have time to look; moves state pool to worker/state02:23
wallyworldaxw: will do, just getting a quick bite after finishing interview02:24
axwsure, no great rush02:24
veebershml: excellent :-) It should be very helpful when migrations fail (especially in tests so we can categorise them)02:25
jamaxw: so we would still try a specific zone, but doesn't your patch mean we'll at least fall back to another one?02:30
jamI suppose if the issue is that pods will give you one when they otherwise would want to fallback to explicit hardware02:31
jambut apparently the underlying bug was actually that MAAS would ignore a 'tag' constraint which is a better way to target real hardware02:31
blahdeblahthumper: Did you get a note from the meeting about checking whether juju create-backup excludes previous backups?02:32
blahdeblahthumper: Also, has any thought been given to leaving the backups in the filesystem instead of mongodb?02:32
hmlwallyworld: babbageclunk: i have a quick review if you’re around: https://github.com/juju/juju/pull/808802:36
babbageclunkyup02:36
wallyworldok02:36
wallyworldhml: the migration message won't actually say that the migration is aborted though will it? it will show some arbitrary error text but the user won't know what the final outcome is. maybe it's a recoverable, temporary error, who knows02:41
hmlwallyworld: i left as info, since it was calling setInfoStatus…02:41
wallyworldyou mean the log message?02:42
hmlwallyworld: yes02:42
wallyworldwhat's written to logs is a warning though since something failed02:42
wallyworldthe setInfoStatus() is just an api name02:42
axwjam: in the cases where it would *fail* because of going to the zone, then sure. for this bug, I was only thinking about the case where we can get a machine in either zone, but we really want the one in the default zone02:43
wallyworldis there a way to track the last migration error in the worker and when aborting, prepend the error with "aborted: " or something02:43
hmlwallyworld: in this case, goes to info02:43
hmlwallyworld: i’ll have to look02:43
wallyworldsetStatusInfo() goes to the model status right? not the logs02:43
hmlit goes to both02:43
wallyworldok, but the final logged message should be a warning - sys admins grep for warnings and knowing something has failed is important02:44
hmlokay, can be changed02:45
hmlthere doesn’t appear to be a last failure.02:46
hmlwould have to change for every abort, instead of in abort02:47
hmland some places not easily known -02:47
wallyworldi haven't looked at code - can we record last failure as an attribute on the migration op. there's only one place where the final abort status is set? or no?02:50
hmlthere is only one place abort status is set -02:51
wallyworldthat's good - we just need to check for last error and prepend that with "migration aborted: " or whatever02:52
wallyworldthat way it is 100% clear that the thing has stopped and why02:52
hmlwhere is the last error?02:52
wallyworldyou need to record it!02:53
wallyworldcreate a variable to hold it02:53
wallyworldhave a helper method somewhere used to update status with an error, and overwrite a lastError each time. or something02:54
hmlsorry i’m being dense here.02:54
hmlthere’s one i can add on to02:54
wallyworldcould be me oversimplifying02:54
wallyworldi only have a little brain02:55
hmli was thinking the worker in this case was at a highlevel02:55
hmlbut it’s in migrationmaster02:55
wallyworldyeah, i think migration master entity should be able to track errors encountered doing the job02:58
wallyworldbabbageclunk: you loving simple streams? :-D03:07
wallyworldaxw: just started looking - why was srv.run() put in a go routine? maybe it will be clear when i read more of the PR?03:09
wallyworldjust curious if it was due to a drive by or part of the PR03:10
axwwallyworld: apiserver.Server is a worker, I'm just making it conform to our usual patterns03:10
wallyworldok, ta03:10
axwwallyworld: it's not a big deal, just tidying up03:10
wallyworldno worries, sgtm03:10
wallyworldaxw: and that means you were able to pull expireLocalLoginInteractions() out of a separate go routine03:13
hmlwallyworld: changes done03:15
babbageclunkwallyworld: I am not03:15
wallyworldhml: awesome tyvm, looking03:15
wallyworldhml: much nicer, thank you!03:16
wallyworldbabbageclunk: did we need a HO?03:17
babbageclunkwallyworld: probably couldn't hurt03:17
wallyworldrighto03:17
* hml headed to the airport to pick up my dad03:20
wallyworldhml: see you later03:20
axwwallyworld: yeah, the idea was to stop using the waitgroup for two different things (tracking sub-workers, and outstanding connections/requests)03:25
babbageclunkthumper: did you come back?03:30
wallyworldaxw: sorry about delay, had meeting, lgtm04:03
axwwallyworld: no worries. I've left a bunch of comments on your PR. I'd like it to be split up a bit (see comments), and worker responsibilities separated04:15
wallyworldok looking04:15
axwwallyworld: re renaming StateTracker to StatePoolTracker or whatever, I'd really rather not. I'm considering StatePool just an implementation detail, as the entry point to "state". I want to replace it with a different type which manages all the state workers, as we've talked about recently04:17
wallyworldok04:17
babbageclunkwallyworld: if we find the same binary version in multiple streams it doesn't matter, does it? They'll be the same.04:24
wallyworldthey are today but not guaranteed04:24
wallyworldalthough they will always be the same in practice04:25
babbageclunkWe won't find 2.3-rc1 in both released and proposed.04:25
wallyworldcorrect04:25
wallyworldrc1 should only be in proposed04:25
wallyworldsorry, i think i misunderstood the first time04:25
babbageclunkNo, that was a slightly different question.04:25
thumperwallyworld: https://github.com/juju/juju/pull/808904:47
wallyworldok04:47
* thumper heads out for more kid duty04:47
thumperbbl04:47
thumperwallyworld: I'm not sure I agree with you on the select bits05:06
wallyworldthumper: why have 2 select blocks which creates bloat when we just need one?05:06
thumperwallyworld: because it is explicit and clear05:06
thumperyou aren't allowed to send to a nil channel05:07
thumpernot even allowed to try05:07
wallyworldsending to a nil channel as a no-op is fairly standard05:07
thumperno05:07
wallyworldoh wait05:07
wallyworldreceiving05:07
wallyworldsorry05:07
thumperpulling off a nil channel05:07
wallyworldyeah, ignore me05:07
thumperok05:07
thumperI'll look at the Timeout bit05:07
wallyworldok05:07
thumperI think in order to keep the patch small, we shouldn't rename Timeout in this branch05:08
thumperin case we need to back port it05:08
thumperI'm ok in principle with renaming Timeout05:09
thumperalthough Timeout is fairly self-explanatory05:09
wallyworldexcept that there's 2 and it's a bit ambiguous, but the doc helps and agree about minimising change05:11
wallyworldaxw: why do you want separate NewIAASModel() and NewCAASModel() when the code for both is 95% the same? there's just a couple of different checks which should disappear at some point eg the IAAS restriction on new model being in the same cloud as the controller we have talked about removing06:02
wallyworldgiven model type is essentially just a model attribute, i'm not sure what such a far reaching change buys us06:03
wallyworldthere are > 30 usages of NewModel()06:04
axwwallyworld: because it keeps the callers simpler, not having to pass in irrelevant things like storage providers and constraints. so each time you add something IAAS specific you don't have to worry about CAAS code, and vice versa06:08
axwwallyworld: all the existing ones are for IAAS models, right?06:08
wallyworldwe already support adding a caas model06:08
wallyworldmost of the test cases are iaas06:09
wallyworldthis NewModel() code with caas type has been there for a while06:09
wallyworldi think06:09
wallyworldi can change, will just add some noise to the pr06:10
wallyworldi'm starting a new pr06:10
wallyworldfor the below the water line stuff06:10
axwwallyworld: leave that one then06:10
axwit can be refactored later06:10
wallyworldaxw: i can easily do in a followup06:11
axwwallyworld: my biggest beef is with responsibility of making/maintaining connections, and head was getting too full to dive into the state innardy bits06:12
axwthe rest is small stuff06:12
wallyworldyup fair point06:12
axwexcept ModelActive06:12
wallyworldi've changed but still dislike having to make 2 separate mgo queries06:12
wallyworldyou don't have a model object at the call site - just a uuid06:13
wallyworldand the only thing that calls ModelActive() is there06:13
axwwallyworld: but you can change that, by using a state pool -no?06:13
wallyworldno, this is before the agent is started06:13
wallyworldwhen it is figuring out what manifolds to use06:13
axwwallyworld: it's in the model worker manager? that's not before any agents have started...06:14
axwit's run outside of the machine manifold, which is part of the problem06:15
axwit needn't be though06:15
wallyworldok, i'll look closer06:15
axwwallyworld: I can move that into the machine manifold if you like, while you're doing other things?06:15
axwchasing down a test failure atm, which I'll need to fix first06:16
wallyworldok, that would be grand. i can rebase on top of that. i've got about 3 branches on the go06:16
wallyworldgot the skeleton operator agent going06:16
wallyworldwith a bit of common stuff extracted from uniter06:17
wallyworldsmall steps06:17
axwwallyworld: I'm going to merge develop into the feature branch, OK?06:22
wallyworld\o/06:22
axwthere's some agent fixes I don't want to work around06:22
wallyworldyup06:23
wallyworldi wish git had pipelines like bzr06:23
wallyworldwould make working on several dependent branches so much easier06:23
wallyworldaxw: you still working on the state pool tracker pr or can that land now the feature branch has been updated from develop?07:02
axwwallyworld: I'm trying to figure out why a test failed07:03
wallyworldrighto07:03
axwwallyworld: I've been unable to reproduce07:03
wallyworldjoy07:03
axwmight just land and see if it pops up again, might be blue moon type of thing07:04
=== frankban|afk is now known as frankban
babbageclunkwallyworld: can you take a look at another WIP and let me know whether you're happier with that approach?08:57
axwwallyworld: took a bit longer than expected, here's my PR to move modelworkermanager to the dependency engine: https://github.com/juju/juju/pull/809308:58
babbageclunkwallyworld: oops https://github.com/juju/juju/pull/809208:58
axwpipped to the post!08:58
wallyworldbabbageclunk: looking09:01
babbageclunkaxw: I don't know, you got the link there first09:08
axw:)09:08
wallyworldbabbageclunk: looks pretty good09:09
babbageclunkwallyworld: cool cool, just testing09:10
wallyworldbabbageclunk: i'm not sure there's more to do to handle searching across datasources? as the current behaviour might be sufficient?09:10
wallyworldaxw: ty, i'll look after dinner09:11
babbageclunkwallyworld: maybe? It'll still stop on the first datasource that has an index though09:11
wallyworldyeah09:11
wallyworldif datasource A only has devel and B has released09:12
wallyworldit needs to consider all09:12
babbageclunkright09:12
wallyworldi think the branch as is can land though09:12
wallyworldand another done to do the other bit09:12
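[Editor's note] The remaining gap described above (the search stops at the first datasource that has an index, even when the wanted stream only exists in a later one) can be illustrated with a toy model. Everything here is hypothetical: dataSource, firstWithIndex and allSources are invented for the sketch and do not reflect the simplestreams code, and the version "2.2.6" is a made-up example value.

```go
package main

import "fmt"

// dataSource is a hypothetical stand-in: a name plus the versions it
// publishes per stream.
type dataSource struct {
	name    string
	streams map[string][]string
}

// firstWithIndex mimics the behaviour described in the log: it stops at the
// first datasource that has any index, even if that one lacks the stream.
func firstWithIndex(sources []dataSource, stream string) []string {
	for _, ds := range sources {
		if len(ds.streams) > 0 {
			return ds.streams[stream]
		}
	}
	return nil
}

// allSources considers every datasource and collects matches from each.
func allSources(sources []dataSource, stream string) []string {
	var found []string
	for _, ds := range sources {
		found = append(found, ds.streams[stream]...)
	}
	return found
}

func main() {
	sources := []dataSource{
		{name: "A", streams: map[string][]string{"devel": {"2.3-rc1"}}},
		{name: "B", streams: map[string][]string{"released": {"2.2.6"}}},
	}
	fmt.Println(firstWithIndex(sources, "released")) // A has an index but no released stream
	fmt.Println(allSources(sources, "released"))     // B's released entry is found
}
```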
babbageclunkI need to fix unit tests09:12
babbageclunkok, will do that09:13
wallyworldjeez, it's late for you09:13
babbageclunkhad to break to feed and bathe kids09:13
wallyworldsee how you go, i might have to pick it up tomorrow09:14
wallyworldi need to go afk for a short while for dinner09:14
babbageclunkok09:14
wallyworldaxw: awesome, ty. i will have some conflicts but I'll deal :-)09:41
axwwallyworld: cool09:41
wallyworldaxw: i'll have 2 PRs tomorrow. one for the state/infra stuff, one for the worker and manifold stuff. and then I'll weave in the facade stuff. and after that I'll propose the operator skeleton. so I guess that's 4 :-)09:42
axwwallyworld: sounds good. I'll try and move the remaining machine workers to manifolds in the morning, and we can figure out what CAAS things I can do in 1:!09:43
axw1:1 even09:43
wallyworldyay, more caas stuff09:44
babbageclunkgit question because I'm feeling stupid - if I want to see all the diff-stats for the accumulated commits in my branch, how do I do it?09:47
babbageclunkI've tried git diff --stat upstream/develop..my-branch but it includes changes on develop that I haven't rebased into my branch yet.09:48
babbageclunkbah, just rebased so that I didn't need to worry about it.09:49
=== salmankhan1 is now known as salmankhan
axwballoons: looks like jenkins has died in the ass again - https://github.com/juju/juju/pull/809310:25
babbageclunkwallyworld: ping? unlikely13:18
jambabbageclunk: go to bed13:25
jam:)13:25
jambabbageclunk: but if there is something I can help with13:25
jamaxw: balloons said it was out of disk the other day13:25
babbageclunkjam: I'm thiiiis close to stopping! Thanks, I think I worked it out.13:26
wpkjam: it's not even 3am, leave him alone!13:32
babbageclunkwpk :)13:45
babbageclunkis mergebot down?13:48
babbageclunkballoons: is there a problem with mergebot? This PR isn't getting built: https://github.com/juju/juju/pull/809213:50
* babbageclunk goes to bed13:50
balloonsI can look. Mean13:58
balloonsEvery morning this week13:58
balloonsI need to share the doc with you all on how to troubleshoot13:59
balloonsAxw, so perhaps we need to clean and reboot the vsphere daily to keep things going?14:02
balloonslooks like cloud instance died14:10
balloonsrestored, jobs running again14:18
balloonswpk, does this look better? https://github.com/juju/juju/pull/808616:14
=== frankban is now known as frankban|afk
hmlaxino: ping18:18
bdxhttps://bugs.launchpad.net/juju/+bug/173276418:53
mupBug #1732764: series + spaces + artful = fail <juju:New> <https://launchpad.net/bugs/1732764>18:53
hmlballoons: is that doc for troubleshooting the merge bot around?  :-)  github-check-merge-juju-pipline thinks it has a config error.19:22
balloonsbdx, xenial works and artful doesn't?19:23
balloonshml, needs updated and it was shared with you at one point :-) https://docs.google.com/document/d/1TN4SG8QXNbXpFn_9QTpPgI7XxD85uzdlttGB0QRlm2I/edit#19:26
hmlballoons: i have to remember that far back?  ;-)19:27
bdxballons: yeah19:28
balloonsbdx, ty. We've actually been testing this over the last couple weeks19:28
bdxthe only combo that breaks it is "series + spaces + artful + beta3"19:28
bdxballons np19:28
balloonshml, ohh the old job was restored19:29
balloonsjust needs to be removed from jenkins again19:29
bdxIm making an official redis snap right now, so it wont matter to me anymore - I was chasing the 4.x in artful lol19:29
bdxgood to get these things fixed anyway though19:30
balloonsbdx, ack. And indeed19:30
balloonshml, anyways notice the actual bot ran fine on your pr19:32
hmlballoons: i did, wasn’t sure what the pipeline thing was19:33
hmlwallyworld: i looked at the bug for a panic in the storageprovisioner, it’s easy to see the cause, but i’m wondering why the pointer is nil at that point.  who’s been in that area?21:58
hmlwallyworld: just to make sure there isn’t a bad root case21:58
hmlor cause even21:58
balloonswallyworld, babbageclunk, please test sigfile in the edge snap22:08
balloonsand let me know; since we're going to ship that in rc1 if it's good22:10
balloonsor if you don't say anything :p22:16
axwballoons: I guess we could try, but it didn't take long for it to start misbehaving after I restarted it last time23:31
