/srv/irclogs.ubuntu.com/2015/06/16/#juju-dev.txt

rick_h_axw: I commented through the resources spec. Please let me know if anything I say makes little to no sense or a chat through would help make things clear/etc.01:26
rick_h_axw: well, not the spec I guess, but the use case/design doc which had a lot more interesting details01:27
axwrick_h_: ok, wallyworld has been doing that mostly, but I can take a look if you like01:27
rick_h_axw: ah np, I was going to ping you both but he's afk from irc atm01:27
rick_h_axw: if he comments just let him know I'm happy to be schooled in person/etc.01:28
axwrick_h_: heh :)  no worries01:28
thumperwallyworld: damn hangouts02:41
thumperI could hear you, but you weren't moving02:41
wallyworldyeah, tis ok. i'll reply02:41
thumperand stuff I typed into the chat window didn't come through02:41
thumpercheers02:41
menn0axw, wallyworld: i just noticed that our tests prefer the system installed mongod over the Juju one. is that intentional?02:50
axwmenn0: hrm, I didn't think so. I thought we were meant to use juju-db if it existed, otherwise whatever's available02:50
wallyworldum, don't think so02:51
menn0axw, wallyworld: getMongod (in juju/testing) tries $PATH first so that one wins if you have the system mongodb package installed02:51
axwmenn0: oh sorry, the *tests*. I think the reason for that is so we get the javascript-enabled version if it's there02:52
axwmenn0: IIRC juju-db doesn't have javascript on some arches02:52
axwpossibly all02:52
menn0axw: why do we need a JS enabled mongodb?02:54
axwmenn0: for charmstore I think02:54
menn0axw: ah ok02:54
menn0axw: it's just unfortunate that we can end up running tests with the mongodb version that isn't actually used in production02:54
menn0axw: it probably doesn't matter most of the time02:55
axwmenn0: true. perhaps it should be juju-db by default, and configurable (for charmstore)02:55
menn0axw: except if we unwittingly take advantage of some newer feature of mongodb02:55
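The lookup order axw suggests (juju-db by default, but configurable, with $PATH only as a fallback) could be sketched roughly as below. This is not the real getMongod from juju/testing; the install path, the JUJU_MONGOD variable name and the function signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// jujuMongodPath is an assumed install location for the juju-db mongod.
const jujuMongodPath = "/usr/lib/juju/bin/mongod"

// findMongod picks the mongod binary for a test run:
//  1. an explicit override (e.g. for charmstore tests that want a JS-enabled build),
//  2. the juju-db binary, if present,
//  3. whatever is first on $PATH.
// exists and lookPath are injected so the choice can be tested without touching disk.
func findMongod(override string, exists func(string) bool, lookPath func(string) (string, error)) (string, error) {
	if override != "" {
		return override, nil
	}
	if exists(jujuMongodPath) {
		return jujuMongodPath, nil
	}
	return lookPath("mongod")
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	// JUJU_MONGOD is a hypothetical override variable for this sketch.
	path, err := findMongod(os.Getenv("JUJU_MONGOD"), fileExists, exec.LookPath)
	if err != nil {
		fmt.Println("no mongod found:", err)
		return
	}
	fmt.Println("using", path)
}
```

With this ordering the tests would exercise the production mongod unless someone opts in to a different one.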
anastasiamacdavecheney: o/03:22
anastasiamacdavecheney: what were the problems with importing "time"?03:23
davecheneyanastasiamac: it is a symptom of a flakey test03:24
davecheneykicking off an action then waiting for some time to pass03:24
davecheneyis likely to be flakey03:24
anastasiamacso it's only a "symptom" if imported in tests?03:25
anastasiamacit's k to import into actual code..03:25
anastasiamacdavecheney: ?03:25
davecheneyyes03:27
davecheneybut imo test files that import "time" tend to be flakey03:27
anastasiamacdavecheney: :D ty for ur passion! (and answers)03:45
davecheneyno03:55
davecheneynp03:55
davecheneyi say they are flakey because there is inherent uncertainty03:56
davecheneythe test is usually saying something like "I'm going to hope that X happens in under NNN milliseconds, or Y shouldn't take more than NNN milliseconds to happen"03:56
davecheneyand if you look at our build bot03:56
davecheneyneither of those tend to be true03:56
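One common way to avoid the wall-clock dependence davecheney describes is to inject a clock into the code under test and advance it by hand in the test. The sketch below is only an illustration of that idea; the Clock interface, fakeClock and lease types are invented here and are not juju's.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is the only view of time the code under test gets.
type Clock interface {
	Now() time.Time
}

// fakeClock is a test double whose time only moves when Advance is called.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time          { return c.now }
func (c *fakeClock) Advance(d time.Duration) { c.now = c.now.Add(d) }

// lease is a stand-in for any code with time-dependent behaviour.
type lease struct {
	clock  Clock
	expiry time.Time
}

func newLease(c Clock, ttl time.Duration) *lease {
	return &lease{clock: c, expiry: c.Now().Add(ttl)}
}

// Expired reports whether the lease's expiry time has been reached.
func (l *lease) Expired() bool {
	return !l.clock.Now().Before(l.expiry)
}

// expiryDemo checks expiry before and after advancing the fake clock.
// No sleeping is involved, so the result does not depend on machine load.
func expiryDemo() (before, after bool) {
	clock := &fakeClock{now: time.Unix(0, 0)}
	l := newLease(clock, 50*time.Millisecond)
	before = l.Expired()
	clock.Advance(50 * time.Millisecond)
	after = l.Expired()
	return before, after
}

func main() {
	before, after := expiryDemo()
	fmt.Println(before, after)
}
```

The test no longer says "I hope X happens in under NNN milliseconds"; it states exactly how much time has passed.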
anastasiamac:)04:01
anastasiamacI am going to add a "r u absolutely sure?!" to a review I am doing based on ur opinion. I would love for us to move away from using actual "time" in tests... whether it will ever happen is a separate question...04:03
* davecheney bursts into tears of applause04:04
* anastasiamac can't believe she made Dave cry.. of joy nonetheless.. 04:06
anastasiamac:D04:06
davecheneyit doesn't take much these days04:07
davecheneyhttp://gophercon.com/04:07
davecheneynote the helpful countdown, telling me how much work I haven't done04:07
anastasiamacdavecheney: it's impressive! well done :D m not going but it's looking to be really exciting!!04:08
menn0can i trouble someone for an easy review? http://reviews.vapour.ws/r/1940/05:21
anastasiamacmenn0: looking ...05:59
menn0anastasiamac: thanks05:59
anastasiamacmenn0: rhetorical question :D...06:01
anastasiamacso the deque with max length as 0 will behave as before?06:02
anastasiamacdropping from the other side?06:02
anastasiamacmenn0: never mind - re-read and re-understood :D06:03
menn0anastasiamac: with a max len of 0, there is no maximum length (as before)06:03
menn0ok :)06:03
menn0anastasiamac: thanks for the review!06:09
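For readers without access to the review, the behaviour being discussed can be sketched as follows. This is not the code under review: the type and method names are made up, and dropping from the opposite end when full is inferred from anastasiamac's question rather than confirmed by the source.

```go
package main

import "fmt"

// Deque is a slice-backed double-ended queue of ints.
// A maxLen of 0 means there is no maximum length.
type Deque struct {
	items  []int
	maxLen int
}

func NewDeque(maxLen int) *Deque { return &Deque{maxLen: maxLen} }

// PushBack appends v; if the deque is over its limit, the front item is dropped.
func (d *Deque) PushBack(v int) {
	d.items = append(d.items, v)
	if d.maxLen > 0 && len(d.items) > d.maxLen {
		d.items = d.items[1:]
	}
}

// PushFront prepends v; if the deque is over its limit, the back item is dropped.
func (d *Deque) PushFront(v int) {
	d.items = append([]int{v}, d.items...)
	if d.maxLen > 0 && len(d.items) > d.maxLen {
		d.items = d.items[:len(d.items)-1]
	}
}

func (d *Deque) Len() int { return len(d.items) }

// Items returns a copy of the contents, front first.
func (d *Deque) Items() []int { return append([]int(nil), d.items...) }

func main() {
	bounded := NewDeque(2)
	unbounded := NewDeque(0)
	for _, v := range []int{1, 2, 3} {
		bounded.PushBack(v)
		unbounded.PushBack(v)
	}
	fmt.Println(bounded.Items(), unbounded.Items())
}
```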
fwereadeaxw, anastasiamac: thanks for the reviews06:24
fwereadeanastasiamac, about the exported test suites06:24
axwfwereade: nps06:24
fwereadeanastasiamac, do you know why people *avoid* exporting test suites?06:25
fwereadeanastasiamac, I've seen people do it06:25
fwereadeanastasiamac, and I've always thought it's a bit weird, but AFAICS it makes no practical difference at all06:25
fwereadeanastasiamac, so I always chalked it up to matter-of-taste-don't-worry-about-it06:26
anastasiamacfwereade: well, I am kind of on the fence of this one...06:26
anastasiamacfwereade: i just do not see the point of exporting a suite06:26
anastasiamacfwereade: and all suites that i have "accidentally" written with Capitals, I was told to change06:27
anastasiamacfwereade: :D06:27
axwfwereade: it doesn't really make a difference. I would argue if it's an internal test package (i.e. not _test), then not exporting keeps the exports clean for the _test package (not the case here)06:27
axwotherwise matter of taste06:27
fwereadeanastasiamac, (fwiw, exporting "feels" right to me because the tests *are* the reason for the package to exist -- and I know it *really* works by passing it as an interface{} into gc.Suite, but the semantic payload of "this component is important and part of the public expression of the package" feels relevant)06:28
anastasiamacfwereade: from practical perspective, I use autocompletion a lot06:28
anastasiamacfwereade: and exported tests are visible06:28
anastasiamacfwereade: they tempt me to re-use them06:29
anastasiamacfwereade: I would rather they were not exported if no one else is meant to use them :D06:29
fwereadeanastasiamac, when do you import a foo_test package? I thought you couldn't06:29
fwereadeanastasiamac, I submit that your autocomplete is buggy if it's pulling in stuff exported from _test packages ;p06:30
anastasiamacfwereade: i do not import... I auto-complete.. which tries to import if I select something06:30
anastasiamacfwereade: possibly... but it's not a deal breaker either way...06:30
anastasiamacfwereade: i guess i wonder why export if it's not re-used?06:31
fwereadeanastasiamac, semantic payload06:31
anastasiamacfwereade: so mayb we should have a consensus - export all suites :D06:32
fwereadeanastasiamac, possibly a bit of habit from python? I wouldn't keep a test private because I'd expect something like nose to find it for me, and my views on how tests should be expressed are probably coloured by that06:32
anastasiamacfwereade: m happy to let it go.. altho "semantics" does not really give an argument a tangible benefit IMHO :D06:33
fwereadeanastasiamac, heh, I feel it's terribly important06:34
anastasiamacfwereade: :D if u feel "terrible" importance, I am happy to go with ur gut feeling06:34
fwereadeanastasiamac, the set of associations triggered in the developer's mind have a massive effect on how they use the code, and what expectations they have of it06:34
anastasiamacand take it up next sprint :D06:35
fwereadeanastasiamac, the trouble is that it's hard to measure those effects objectively06:35
fwereadeanastasiamac, and even if you could they'd differ across sets of developers06:35
fwereadeanastasiamac, (sorry I'm not saying that test-exports are terribly important -- but I think the semantic associations of the names we pick *are*)06:36
anastasiamacfwereade: mayb because I do not have python goggles on, I m a bit reserved about exporting un-reusable bits of code (or tests for that matter)06:36
anastasiamacfwereade: semantic associations of names r important...06:37
anastasiamacfwereade: exporting or not exporting test suites is an approach (pattern?) not a naming semantics... no? ;))06:38
fwereadeanastasiamac, (...and I guess the extension of that is that exporting a type itself has semantic associations; and in my mind the "these are the parts someone using this package cares about" associations are stronger, in my own mind, than the "someone will actually want to import this type" ones)06:38
fwereadeanastasiamac, that's only true because you can't import test packages though06:38
anastasiamacfwereade: :D06:39
fwereadeanastasiamac, if you could then the choice would not be completely academic06:39
anastasiamacfwereade: i am happy for an executive decision here :D what would u like to see as a convention - export or not export? to be or not to be?06:40
* anastasiamac @ school run06:41
fwereadeanastasiamac, my personal preference would be to export, but I also feel like tastes can legitimately differ, which is why I never made a fuss when I saw people starting to not-export-by-habit06:41
fwereadeanastasiamac, if we can predict harm caused by unpredictably-exported tests, yeah, we should pick a convention, but I can't really see anyone being inconvenienced either way?06:44
anastasiamacfwereade: tyvm for an amazing discussion point :D it's always re-freshing to hear ur thoughts and read ur reasoning :))07:10
fwereadeanastasiamac, thank you, always a pleasure :)07:20
TheMuedimitern: ping07:45
dimiternTheMue, pong07:45
fwereadeanastasiamac, what'd be the advantage of enums for fieldnames?07:46
TheMuedimitern: one question regarding API errors. they have a message and a code. if an operation on state returns with an error it's clear to put it, sometimes with additional information, into the message. but do we have a common strategy which codes to use? e.g. when ipaddress.Remove() fails.07:46
fwereadeoops brb07:47
dimiternTheMue, yes, use common.ServerError() to wrap an error as *params.Error07:47
dimiternTheMue, btw I couldn't see an update in any of your branches07:48
TheMuedimitern: aaaah, the missing puzzle piece07:48
TheMuedimitern: yeah, just wanted to finish the remove method07:48
dimiternTheMue, ok, cool07:49
TheMuedimitern: feeling better with the current state of the code, even if I'm still not happy with our API segmentation :D07:50
dimiternTheMue, well, one step at a time :)07:52
TheMuedimitern: hehe, yes07:52
TheMuewrapping an error into an error result into error results, seems to be a very brittle piece of data07:58
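The nesting TheMue describes (an error inside an error result inside error results) looks roughly like the sketch below. The types and the serverError helper are local stand-ins written for this illustration; they are not the real params.Error or common.ServerError from juju, and the single "not found" code is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// codeNotFound is an example error code; a real API would define a fixed set.
const codeNotFound = "not found"

// errNotFound stands in for an error returned by the state layer.
var errNotFound = errors.New("ip address not found")

// Error is the wire form of an error: a message plus an optional code.
type Error struct {
	Message string
	Code    string
}

func (e *Error) Error() string { return e.Message }

// ErrorResult holds the outcome of one operation in a bulk call.
type ErrorResult struct {
	Error *Error
}

// ErrorResults holds one ErrorResult per requested entity.
type ErrorResults struct {
	Results []ErrorResult
}

// serverError wraps err for the wire, choosing a code from the error's identity.
// It returns nil for a nil error so callers can assign the result directly.
func serverError(err error) *Error {
	if err == nil {
		return nil
	}
	code := ""
	if errors.Is(err, errNotFound) {
		code = codeNotFound
	}
	return &Error{Message: err.Error(), Code: code}
}

// removeAll applies remove to each address and collects per-address results.
func removeAll(addrs []string, remove func(string) error) ErrorResults {
	results := ErrorResults{Results: make([]ErrorResult, len(addrs))}
	for i, addr := range addrs {
		results.Results[i].Error = serverError(remove(addr))
	}
	return results
}

func main() {
	res := removeAll([]string{"10.0.0.1", "10.0.0.2"}, func(addr string) error {
		if addr == "10.0.0.2" {
			return fmt.Errorf("removing %s: %w", addr, errNotFound)
		}
		return nil
	})
	for i, r := range res.Results {
		fmt.Println(i, r.Error)
	}
}
```

Keeping the code choice in one helper means individual facade methods only decide what goes in the message.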
TheMueah, when wanting to use a mock state from inside a mock entity, this field should not be nil *facepalm*08:37
fwereadeaxw, anastasiamac: have several responses on http://reviews.vapour.ws/r/1934/ ; thoughts appreciated08:46
axwfwereade: thanks, responding now.08:46
axwfwereade: BTW, it makes it much easier to follow up if you reply on the review page, rather than in the diff. your responses came up as new issues08:50
dimiternvoidspace, fwereade, standup?09:01
voidspacedimitern: omw09:01
fwereadeaxw, yeah, sorry about that, will try to remember -- I tend to click through for context and forget to go back09:15
fwereadedimitern, oops omw09:15
axwfwereade: yeah, it's kind of an annoying interface when replying to comments.09:16
axwfwereade: responded. thanks, my primary issues were just misunderstandings. LGTM, thanks!09:18
fwereadeaxw, awesome09:19
fwereadeaxw, it's not ready yet though. not enough tests :)09:19
axwfwereade: I know, I just meant the logic itself09:19
axwand interface09:19
fwereadeaxw, perfect, tyvm09:19
jamvoidspace: 7m30s in a trusty vm on my laptop with an SSD, but modest CPU09:39
fwereadeaxw, hey, one thought: I'm becoming convinced that info.EarliestExpiry is an internal implementation detail, and exposing it can only encourage people to DTWT09:45
fwereadeaxw, opinion?09:45
fwereadeaxw, we still need it internally ofc09:45
axwfwereade: I think I agree, let me take another look09:45
fwereadeaxw, and then Info is just Holder;Expiry;AssertOp09:46
axwfwereade: yes, SGTM09:46
fwereadeaxw, cool, thanks09:46
dimiterndooferlad, as you drop that test about allowing loop mounts not returning an error, you might close bug 1465404 as a side-effect10:00
mupBug #1465404: worker/provisioner: fail lxcBrokerSuite.TestStartInstanceLoopMountsDisallowed <juju-core:New> <https://launchpad.net/bugs/1465404>10:00
dooferladdimitern: thanks, will do.10:01
* dimitern ffs.. it takes more than 30m to run all tests on my machine, even with gt10:01
dooferladdimitern: gt will only help once you have run once. It should now run in about a second10:02
dooferladdimitern: until you change something that is, but it is still much faster not re-running tests on code that hasn't changed.10:02
dimiterndooferlad, I can't wait for it to complete so I can see if there will be an improvement the second run10:02
dooferladdimitern: :-)10:04
dooferladdimitern: guess a new machine is needed!10:05
dooferladdimitern: not that I have a problem and buy new computers way too often...10:05
dimiterndooferlad, it's high time for a new one; I might not even wait until october for my laptop refresh :)10:06
dooferladdimitern: well, the laptop refresh is just money right? If you can afford to spend now then you still recover the money in october.10:07
dimiterndooferlad, yeah, of course - it's not *yet* bugging me that much, but it's close10:08
dimiternwow!10:09
dimiternit does indeed run in a couple of seconds after the first run - and prints out nice cached panics as well :)10:10
dooferladdimitern: so, perhaps gt needs a cloud option - share the cached results for a particular file hash...10:12
dooferladdimitern: should be really easy to add...10:12
dimiterndooferlad, that'll make it reusable a lot easier among teams10:12
dooferladdimitern: exactly. All of juju (or any project) could basically share CPU time across all their machines10:13
dooferladdimitern: new friday labs project then?10:14
dimiterndooferlad, sounds like it, yeah - I was just about to say the same :)10:14
dooferladdimitern: https://github.com/rsc/gt/blob/master/main.go already uses sha1 to hash files and packages, so just storing them somewhere that isn't ~/.cache/go-test-cache is all you need10:16
dooferladdimitern: ok, that looks just too simple not to do10:16
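The caching idea dooferlad describes (key test results by a hash of the inputs, and skip the run when the key is already known) can be sketched as below. This is not gt's actual code: the in-memory map stands in for ~/.cache/go-test-cache or any shared store, and the way files are fed into the hash is an assumption.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
)

// packageHash returns a SHA-1 over the names and contents of a package's files.
// Names are sorted so the hash does not depend on map iteration order.
func packageHash(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha1.New()
	for _, name := range names {
		// Writing name and length first keeps file boundaries unambiguous.
		fmt.Fprintf(h, "%s %d\n", name, len(files[name]))
		h.Write(files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// resultCache maps a package hash to the recorded test output.
// A shared ("cloud") variant would swap this map for a remote key-value store.
type resultCache map[string]string

// run returns the cached output if the inputs are unchanged; otherwise it
// runs test, stores the output, and reports a cache miss.
func (c resultCache) run(files map[string][]byte, test func() string) (out string, hit bool) {
	key := packageHash(files)
	if out, ok := c[key]; ok {
		return out, true
	}
	out = test()
	c[key] = out
	return out, false
}

func main() {
	cache := resultCache{}
	files := map[string][]byte{"a.go": []byte("package a")}
	for i := 0; i < 2; i++ {
		out, hit := cache.run(files, func() string { return "ok" })
		fmt.Println(out, hit)
	}
}
```

Because a cached result is replayed verbatim, a cached failure (or panic) is replayed too, which matches what dimitern saw on his second run.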
dimiterndooferlad, :)10:20
dimiterndooferlad, but if it takes more time, please put it on hold10:21
dooferladdimitern: oh, I am not doing it *right now*10:26
dooferladdimitern: I have some self control!10:26
dimiterndooferlad, I know it's tempting :)10:28
voidspacejam: just seen your message10:57
voidspacejam: I still don't know the full runtime on this desktop - it timed out after 40 minutes so trying with an even more ridiculous timeout (2 hours)10:57
voidspacejam: there's something up on this machine10:57
jamvoidspace: that sounds more like just a simple deadlock. If you do "go test -timeout" won't it print a panic for you?10:58
voidspacejam: well I know *one* test takes 9 minutes so it may not be a deadlock10:58
voidspacejam: (one specific test)10:58
voidspacejam: yeah, I get the panic - not dug into it10:59
wallyworlddimitern: hi11:44
dimiternwallyworld, hey11:49
wallyworlddimitern: i had on my todo list today bug 1464616 for 1.24.1 but i've only just finished some work in progress. maybe one of your guys could take a look and i'll fix it tomorrow if needed if you don't get to it?11:50
mupBug #1464616: destroy-machine --force no longer forceably destroys machine <destroy-machine> <regression> <juju-core:Triaged> <juju-core 1.24:Triaged by wallyworld> <https://launchpad.net/bugs/1464616>11:50
dimiternwallyworld, Iooking11:52
dimiternlooking even11:53
wallyworldty, see how you go, i'll fix tomorrow if needed11:53
dimiternwallyworld, cheers, have a good one11:53
anastasiamacfwereade: re enums :D11:53
voidspacejam: less than an hour is the answer... 2990.665s :-) definitely something up there.11:54
anastasiamacfwereade: my personal guiding force is 1. type protection vs magic strings/numbers; 2. grouping of related constants11:54
wallyworldfwereade: hey, could you take a look at perrito666's latest uniter hook work to see if you're happy for the pr to land?11:56
fwereadewallyworld, sorry, will do11:56
wallyworldty11:56
anastasiamacfwereade: of course, since most of ur consts are unexported and internal to the package, u could say that putting enums might be a bit on the paranoid side...11:57
fwereadeanastasiamac, sorry phone11:58
anastasiamacfwereade: nps :) can be another of these topics we discuss informally over drinks ;D11:59
fwereadeanastasiamac, I never felt that golang really helped that much there -- and when they're all internal you tend to just trust yourself to use the symbolic constants and not the magic, rather than adding another whole layer of types and validation methods and so forth12:00
anastasiamacfwereade: i agree - internally it may not be as useful (unless u forsee more values coming in the future under the same "group", or the need to be exported, etc)12:02
anastasiamacfwereade: feel free to drop the issue :)12:02
fwereadeanastasiamac, yeah -- I'm hoping those will stay static and internal and forgotten for many years ;p12:02
anastasiamacfwereade: (and just for the record, no I don't trust myself as much as u trust urself or I trust u)..12:03
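Both sides of the enum discussion can be seen in a small sketch. The field names below are hypothetical (the ones in the review are not shown in the log). A named string type groups related constants and stops variables of plain string type from being passed by accident, but, as fwereade notes, Go will still convert an untyped string literal implicitly, so real protection needs a validation step.

```go
package main

import "fmt"

// fieldName is a distinct type for document field names.
type fieldName string

// The known field names, grouped in one place (hypothetical values).
const (
	fieldHolder fieldName = "holder"
	fieldExpiry fieldName = "expiry"
)

// valid reports whether f is one of the declared field names.
func (f fieldName) valid() bool {
	switch f {
	case fieldHolder, fieldExpiry:
		return true
	}
	return false
}

// setField stores value under name, rejecting names that were never declared.
// The type alone cannot do this: setField(doc, "bogus", "x") still compiles,
// because the untyped constant "bogus" converts to fieldName implicitly.
func setField(doc map[fieldName]string, name fieldName, value string) error {
	if !name.valid() {
		return fmt.Errorf("unknown field %q", string(name))
	}
	doc[name] = value
	return nil
}

func main() {
	doc := map[fieldName]string{}
	fmt.Println(setField(doc, fieldHolder, "unit/0"))
	fmt.Println(setField(doc, "bogus", "x"))
}
```

For a handful of unexported constants the extra type and validation method may well be more ceremony than the risk justifies, which is where the conversation ended up.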
jamvoidspace: your machine got scared when 50 minutes rolled around :)12:12
jammakes me wonder if something like shortWait got set to 50s instead of 50ms or something.12:12
voidspacejam: no-one else is seeing the same thing12:12
voidspacejam: I have this on trunk12:13
jamvoidspace: you're sure you don't have any local changes, right?12:13
voidspacejam: yep12:13
voidspacejam: I'll dig in and see where the slowdown is actually occurring and see if I can work out why12:13
jamvoidspace: so the only thing *I* would get out of the panic is what test was running, you could use that to see what test is slow12:13
jamI generally find golang's 200 tracebacks hard to debug since we use so many goroutines12:14
voidspacejam: I know at least one test that takes 9 minutes12:14
voidspacejam: yep :-/12:14
voidspacejam: so I have somewhere to start12:14
=== anthonyf is now known as Guest89967
perrito666internet over 3g here feels a lot like dial up, I am suddenly invaded by nostalgia... and lag, lots of lag13:28
katcowwitzel3: standup14:01
wwitzel3katco: yes14:02
mupBug #1465694 opened: TestDiesOnFatalError fails <ci> <test-failure> <juju-core:Incomplete> <juju-core db-log:Fix Committed by menno.smits> <https://launchpad.net/bugs/1465694>14:34
mupBug #1465695 opened: StorageAddSuite setup fails <ci> <intermittent-failure> <test-failure> <juju-core:New> <juju-core db-log:Triaged> <https://launchpad.net/bugs/1465695>14:34
perrito666first internet goes off and now the lights... I feel like in the prelude of a b movie about catastrophes14:34
katco"the first reports came from south america"14:46
perrito666katco: well I had cellphone signal and 3g inside my house, that was quite apocalyptic14:48
katcomaybe the usual electrical interference was gone14:48
perrito666fantastic, the power outage reset the ISP box for the block and I am back online14:48
katco:)14:48
voidspacewhat's the environment variable for setting logging level during tests (alongside -check.vv) ?14:48
perrito666katco: also I live in a suburb mostly populated by families or working age people, I am most likely the only person in 100m around me14:49
voidspacedimitern: ping15:30
dimiternvoidspace, pong15:32
mgzgsamfira: are you free to talk windows maas images in 10 mins maybe?15:36
gsamfiramgz: sure15:36
mgzgsamfira: ace, will invite you in a min15:37
voidspacedimitern: how do I enable trace logging for a test run?15:42
dimiternvoidspace, TEST_LOGGING_CONFIG='<root>=TRACE' go test ...15:46
dimiternmgz, ping15:52
mgzdimitern: hey15:52
dimiternmgz, have a look at this http://paste.ubuntu.com/11725493/15:52
dimiternmgz, I believe that test should be dropped entirely15:53
dimiternmgz, as it will only work as expected when type==manual and AWS_ACCESS_KEY is in os.environ15:53
natefinchbtw all, we have a customer on #juju who might have hit #1464304 in a production environment15:54
mupBug #1464304: Sending a SIGABRT to jujud process causes jujud to uninstall (wiping /var/lib/juju) <cts> <sts> <juju-core:Triaged> <https://launchpad.net/bugs/1464304>15:54
natefinchand so juju has happily uninstalled itself, and he sorta needs juju on that machine, since there are services running there...15:54
voidspacedimitern: thanks15:54
dimiternnatefinch, by "uninstalled" what do you mean? apt? upstart?15:55
natefinchdimitern: if you send sigabrt to jujud, it wipes out /var/lib/juju/  I presume it also stops the service15:56
natefinchthis is a "feature" to help manual provider clean up15:56
mgzdimitern: do you mean drop the test, or drop the wrapper with the special case?15:57
mgzit's intended to just be an extra safety net after destroy-enviroment15:58
mgzthe test could easily be fixed by overriding the envvar15:58
natefinchperrito666: sounds like the guy mostly needs to recover the apipassword... do you happen to know where we can get that for the machine?15:59
dimiternmgz, well, the code this test checks requires both AWS_ACCESS_KEY in os.environ and config.type==manual, so unless both are set in that and the previous test, it will fail15:59
=== kadams54 is now known as kadams54-away
mgzdimitern: yup, not overriding the env is wrong16:00
perrito666natefinch: context?16:00
dimiternmgz, ok, but if it's overridden then it's almost the same as the previous one (save for e.g. setting the aws_access_key to '')16:00
natefinchperrito666: sorry... guy on #juju had his jujud uninstall itself... he's mostly recovered by copying files from other units, but the agent keeps failing saying it has a bad API password....16:01
mgzdimitern: but that's the intention of the test I believe16:01
natefinchperrito666: wasn't sure if in your work with restore, you might know a way to hack that.16:01
mgzie, assert d_j_i is not called if the envvar is not set16:01
perrito666natefinch: well if he is copying the agent.conf from another unit he will have problems because the tag will not be the right one16:01
natefinchperrito666: right.16:02
perrito666natefinch: and even then I am not sure he could recover the old password16:02
perrito666he might need to do some state server side magic16:03
natefinch*nod* he said he's been poking at mongo, which is scary16:03
dimiternmgz, sorry, I'm apparently confused :)16:05
dimiternmgz, are you saying this test, as written is expected to fail always?16:05
perrito666natefinch: when someone says he has been poking at mongo I just raise my hands like a soccer player after a foul16:06
mgzdimitern: no, I'm saying it should make sure the envvar is unset, then it would do what it intends16:06
natefinchperrito666: yeah....16:06
mgzgsamfira: ready to roll?16:06
perrito666but yeah, we lack a way to recover; it is an interesting feature though16:07
gsamfiramgz: in about 10-15 minutes. Got pulled into another meeting16:07
gsamfiramgz: will ping you soon16:07
mgzgsamfira: cool, thanks16:07
perrito666juju --Iswearthatisaunitjustreaprovisionthenecessaryfiles16:07
natefinchperrito666: yeah, I mentioned on bug  #1464304 that we should probably not use a signal as our way of telling jujud to commit seppuku and instead use some really arcane jujud command like `jujud DIEPLEASEDIE -y --yesIreallyMeanIt --deleteAllMyStuffForReal`16:09
mupBug #1464304: Sending a SIGABRT to jujud process causes jujud to uninstall (wiping /var/lib/juju) <cts> <sts> <juju-core:Triaged> <https://launchpad.net/bugs/1464304>16:09
perrito666natefinch: why in the universe do we have a commit seppuku command in the first place?16:09
natefinchperrito666: that's how manual provider tells a unit to clean itself up (since it can't just de-provision the machine).16:10
perrito666and why does it work in any other provider :p16:11
natefinchthat was going to be my next point, too16:11
natefinchbut it shouldn't even work on the manual provider... it's just too easy to do by accident, regardless of what signal we use16:11
dimiternmgz, you mean instead of 'AWS_ACCESS_KEY' in os.environ; add something like "and os.environ['AWS_ACCESS_KEY'] != '' ?16:11
mgzdimitern: I mean like, patch(os.environ, {})16:14
voidspacemgz: patch.dict(os.environ, clear=True)16:15
mgzor zat.16:16
dimiternmgz, that won't work16:17
mgzdimitern: I don't see why not?16:18
dpb1natefinch: thanks for the look yesterday.  I'll do more work to narrow down the problem.  If I get it live, can you hop on and take a look?16:18
gsamfiramgz: ready16:20
mgzgsamfira: jog pasting you hangout link16:20
dimiternmgz, hmm, ok, it works, sorry for the noise :)16:22
mgzdimitern: :P *hugs*16:22
dimiternmgz, I rather like the dummy stack actually - I'm extending how it's deployed so it can be used concurrently in pairs16:24
mgzdimitern: neat16:25
dimiternmgz, e.g. deploy one pair sink/source in a couple of lxc containers, then deploy another pair in a kvm and on a machine (using different service names ofc)16:25
=== natefinch_ is now known as natefinch
natefinchkatco: I've been trying to help MrOJ over on #juju but have to run to pick up my daughter at preschool.  Handing off to perrito666 for now.  But wanted you to know I'd spent some time on it.16:32
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
natefinchericsnow: you around?19:15
ericsnownatefinch: yep19:15
ericsnownatefinch: just wrapping up a review on your patch19:15
natefinchericsnow: cool.  We have a user on #juju that evidently had a problem with restoring his state server19:16
natefinchericsnow: or rather, he restored and then the agents on all his machines killed themselves19:16
natefinch(on all the non-state-servers)19:16
ericsnownatefinch: that sounds familiar19:17
ericsnownatefinch: perrito666 fixed (?) something like that recently, I believe19:17
natefinchericsnow: he's on 1.23.3, so pretty recent.19:18
natefinchperrito666: was helping out, but had to go run an errand... unfortunately that was before I thought to ask how the guy got in this state19:18
ericsnownatefinch: well, 1.23 is the first version with the new restore implementation19:20
natefinchwell at least that19:21
ericsnownatefinch: what info do we have?19:22
natefinchhe said he had some DNS issues during restore, and restore never finished, so he had to kill it.19:22
natefinchI presume he has fixed the DNS issues and reran restore (verifying now)19:23
ericsnownatefinch: which provider?19:26
natefinchericsnow: maas19:26
ericsnownatefinch: lovely19:26
natefinchiknorite?19:26
natefinchhe updated his state servers to 1.24 before trying restore after fixing DNS19:27
natefinchI wonder if ^ was the cause of a problem, since now the agents are on the wrong version19:41
ericsnownatefinch: I expect it depends on at what point they canceled the restore19:46
=== anthonyf is now known as Guest41199
ericsnownatefinch: restore really shouldn't impact the agents on other machines (other than to make the API unavailable)19:47
ericsnownatefinch: so the fact that those agents died implies to me that something in mongo got corrupted (or there was some API timeout that triggered the grim reaper)19:48
natefinchericsnow: yeah, it seems like a manifestation of the bug where if the agent can't get to the state server, it kills itself. Except that I thought we'd fixed that.19:49
ericsnownatefinch: yeah, that's what I was thinking of19:51
ericsnownatefinch: it may be that it's not fixed in the version(s) he is running19:51
natefinchericsnow: he's on 1.23.3 ... that's the latest release version.  If it's fixed anywhere, it should be fixed there.19:52
ericsnownatefinch: it seems like we haven't been backporting as many (any?) fixes to 1.23 as we have to 1.2419:53
natefinchericsnow: looks like it was fixed in 1.20.1: https://bugs.launchpad.net/juju-core/+bug/133977019:55
mupBug #1339770: Machines are killed if mongo fails <canonical-is> <landscape> <maas-provider> <juju-core:Fix Released by wallyworld> <juju-core 1.18:Won't Fix> <https://launchpad.net/bugs/1339770>19:55
ericsnownatefinch: ah, fixed last year so it should be fixed in 1.2319:57
natefinchthank god juju uninstalling itself doesn't also blow away its logs19:59
perrito666natefinch: so, we have  logs for the restore issue?20:01
ericsnownatefinch: FYI, https://github.com/juju/juju/commit/b4e37a8d31bedb56ce46318779c04387f833302620:01
ericsnownatefinch: from what I can tell, the actual fix is on 1 line: https://github.com/juju/juju/commit/b4e37a8d31bedb56ce46318779c04387f8333026#diff-edccfba67a01587c9faca9185781e5dbR28920:04
mupBug #1465844 opened: juju action-set skips values with underscore in the key <juju-core:New> <https://launchpad.net/bugs/1465844>20:44
thumperfwereade_: ping?20:56
alexisbthumper, that is wishful thinking at this hour :)21:01
thumperalexisb: I know, but sometimes he is around21:01
fwereade_thumper, pong21:20
fwereade_thumper, how's it going?21:20
thumperfwereade_: good,21:20
thumperfwereade_: many of the CI failures seem to be jujud/agent tests getting deadlocked21:20
thumperI *think* this is lease related...21:20
thumperyou are fixing this yes?21:21
fwereade_goddammit that is not at all unlikely21:21
thumpersome of this is due to the agent not stopping21:21
fwereade_that is one of the features of the current implementation that I intend to do differently yes21:21
fwereade_thumper, I might even have a fix for it in a rotted branch somewhere that I didn't realise I hadn't finished21:21
thumperfwereade_: was just poking to see if that could be fixed soonish21:22
fwereade_thumper, in the short term the core of it is *similar* to the following (and axw made most recent changes so can verify I think)21:22
wwitzel3ericsnow: you still around?21:22
fwereade_thumper, there's an implicit dependency between the api server and the lease manager; and between the lease manager and state21:23
mupBug #1465694 changed: TestDiesOnFatalError fails <ci> <test-failure> <juju-core db-log:Fix Committed by menno.smits> <https://launchpad.net/bugs/1465694>21:23
fwereade_thumper, if you shut them down in the right order (apiserver; lease manager; state) then the tests should be safe21:23
fwereade_thumper, but to do that you need to get all the way down to the dummy provider21:24
thumperfwereade_: jujud/agent calls a.Stop(nil)21:24
thumperi think21:24
fwereade_thumper, which is what creates the apiserver and state and shuts them down21:24
fwereade_thumper, yeah, I can well believe it21:24
fwereade_thumper, what I saw was deadlocks on test shutdown21:25
fwereade_thumper, but I didn't have a repro that wasn't tied up with other changes21:25
fwereade_thumper, specifically what I saw was that the apiserver had started to respond to a leadership request21:26
fwereade_thumper, but the lease manager had been shut down already21:27
fwereade_thumper, so the api call was blocking forever and preventing the apiserver from completing21:27
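The ordering fwereade describes (apiserver first, then the lease manager, then state, so nothing is stopped while something that depends on it is still serving requests) can be sketched as below. The component type and stopInOrder helper are invented for this illustration and do not correspond to juju's actual worker or dummy provider code.

```go
package main

import "fmt"

// component is anything with a name that can be stopped.
type component struct {
	name string
	stop func() error
}

// stopInOrder stops components strictly in the order given, dependents first.
// It keeps going after a failure so later components are still stopped,
// and returns the first error encountered.
func stopInOrder(components ...component) error {
	var first error
	for _, c := range components {
		if err := c.stop(); err != nil && first == nil {
			first = fmt.Errorf("stopping %s: %w", c.name, err)
		}
	}
	return first
}

func main() {
	logStop := func(name string) component {
		return component{name: name, stop: func() error {
			fmt.Println("stopped", name)
			return nil
		}}
	}
	// The API server depends on the lease manager, which depends on state,
	// so shutdown runs in that order.
	err := stopInOrder(logStop("apiserver"), logStop("lease manager"), logStop("state"))
	fmt.Println("error:", err)
}
```

Stopping the lease manager before the API server is the situation described above: an in-flight leadership request then has nothing left to answer it and blocks forever.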
thumperfwereade_: this timeout is affecting everyone21:28
thumpercan we prioritise the fixing of it?21:28
perrito666sinzui: have a moment to help me not make a fool of myself?21:28
sinzuiperrito666: I am entering the release meeting. I may be distracted for 15-30 minutes21:29
fwereade_thumper, not tonight, but send me a mail about it and I should be able to tomorrow21:30
thumpersinzui: can we reduce the 1200s timeout on the bot tests runs to be 600s?21:32
thumpersinzui: the longest tests only take about 6 or 7 minutes21:32
thumper10 should be enough21:32
thumper20 means we have an extra 10 minutes to work out that we have a problem21:32
=== kadams54-away is now known as kadams54
sinzuithumper: for the maas bootstrap? If so, I did 60 minutes ago21:32
thumperno, just the bot unit tests runs21:33
sinzuithumper: oh, unit tests. Yes, I think I can21:33
sinzuithumper: all series and arch tests should pass in 600s?21:34
perrito666sinzui: I intend to propose a CI test buuut I am not sure how to push both repository and juju-ci-tools for that purpose (and am not sure how private all that should be)21:35
sinzuithumper: I will get this change commit in the next hour21:35
sinzuiperrito666: juju-ci-tools is public, nothing will be private if we merge it21:36
perrito666good21:37
thumpersinzui: I'm not sure about all series and arch21:39
thumpersinzui: but amd64, for sure21:39
sinzuithumper: ok. I think the 386 allows greater timeouts because we only get underpowered machines. and 386 unit tests don’t pass in lxc on a fast machine21:40
sinzuiperrito666: I see your branch, You can use the “Propose for merging” link on https://code.launchpad.net/~hduran-8/juju-ci-tools/add_status_ci_tests and I will review it21:45
perrito666sinzui: following up with the "I feel like an idiot" series of questions21:46
perrito666how do I push repository?21:46
perrito666man, this makes me feel like a toddler trying to operate a nuclear reactor21:47
=== kadams54 is now known as kadams54-away
sinzuiperrito666: you don’t push repos. LP sets up each project with a repo with shared branches. All branches are in one place to see, no searching22:01
perrito666sinzui: yeah but it is lp:juju-ci-tools/repository that I have no clue how to propose a branch to22:02
sinzuiperrito666: i see your two branches at https://code.launchpad.net/juju-ci-tools22:02
perrito666sinzui: I created my work env by using https://github.com/juju/juju/wiki/ci-tests22:03
perrito666which suggests to bzr branch repository inside juju-ci-tools22:03
sinzuiperrito666: at https://code.launchpad.net/~hduran-8/juju-ci-tools/add_status_ci_tests/+register-merge , “target branch” is lp:juju-ci-tools/repository22:03
perrito666sweet, tx (the search says repository is a term too generic :p )22:04
sinzuiperrito666: It isn’t a bzr repo. It is what we set LOCAL_REPOSITORY to22:04
sinzuithumper: 1200 is hardcoded in  juju’s Makefile. See the “check” target22:10
thumperah...22:10
thumperhaha22:10
thumperwe can fix that :)22:10
perrito666sinzui: I think I didn't mess up, I totally might have :p22:11
=== kadams54-away is now known as kadams54
sinzuithank you perrito666 I will review this after dinner22:21
perrito666sinzui: thank you, there is more incoming, storage is in the making :p22:21
asanjarhi all, is anyone else experiencing bundle deployment from charmstore with broken icons and/or no relations?22:31
arosalesasanjar, are you seeing it with just the spark bundle or even simple bundles like mysql-wordpress22:32
arosalesasanjar, also what version of juju22:32
asanjararosales: mainly with big-data related bundles.. spark, hadoop or clouderamanager22:37
asanjar juju --version  1.23.3-trusty-amd6422:37
asanjarI can reproduce the problem 90% of the time on aws just by executing: juju quickstart u/bigdata-dev/apache-hadoop-spark-zeppelin/122:39
arosalesasanjar, does juju quickstart u/jorge/wordpress/5 also fail?22:43
arosalesin aws?22:43
=== kadams54 is now known as kadams54-away
asanjararosales: have not tried jorge/wordpress .. however it looks like the problem is easier to reproduce with complex and time consuming deployments22:50
arosalesasanjar, gotcha, just wanted to see how prevalent the problem is22:51
arosalesasanjar, do you have a bug opened on it?22:51
mupBug #1465873 opened: Environment.Users does not take into consideration the current environment <juju-core:Triaged by waigani> <https://launchpad.net/bugs/1465873>22:53
asanjarhttps://bugs.launchpad.net/juju-core/+bug/146008722:54
mupBug #1460087: quickstart deployment fails to add relations when bootstrap goes "down" <deploy> <quickstart> <juju-core:Triaged> <https://launchpad.net/bugs/1460087>22:54
arosalesasanjar, but you are seeing this in AWS now with deploy time taking far less than 3 hours, correct?22:56
asanjaroh yes..22:57
asanjarthis might help>> juju debug-log for the last 30+ minutes http://paste.ubuntu.com/11727687/22:59
=== anthonyf is now known as Guest21341
thumperwaigani: hello on call reviewer: http://reviews.vapour.ws/r/1943/23:14
menn0thumper: mongodb is weird. i just spent ages figuring out why there were strange 1s pauses in my oplog using tests, but only when I faked the oplog with my own capped collection.23:28
menn0thumper: turns out that if I move the fake oplog out of the "local" DB the pauses go away.23:28
menn0thumper: makes the tests heaps faster23:29
waiganithumper: done. that's a lot of duplicated factory calls.23:33
thumperwaigani: ta23:35
thumpermenn0: weird23:35
wallyworldperrito666: did you see the conflict in the mp?23:36
perrito666wallyworld: just saw it, when I did the PR it said it needed a moment to generate the diff and therefore I left after my attention span timed out23:36
perrito666wallyworld: so, I'll not make comments about the aesthetic properties of that conflict display but omg, what a dumb merge tool23:37
perrito666I think I need to fix that but I presume I'll get a shower of comments from the reviewer so I should do all that together23:38
perrito666wallyworld: sinzui said he would review it after dinner23:38
wallyworldperrito666: better to fix conflict first23:38
menn0thumper: yeah. at first I thought it was related to the timeout I was using (also 1s) but changing it to other values didn't affect the 1s pauses23:38
wallyworldperrito666: as most reviewers prefer no conflicts before looking23:39
mupBug #1456714 changed: assignCleanSuite.TearDownTest fails <ci> <intermittent-failure> <unit-tests> <juju-core:Invalid> <https://launchpad.net/bugs/1456714>23:53
