[00:24] mwhudson: testing now, thanks
[00:25] this stuff is a bit subtle
[00:26] davecheney: as has been noted before the go linker is terrible
[00:27] davecheney: i know what i was doing wrong though
[00:27] basically the value of goarm was being appended to the value from the runtime.a file
[00:27] so it was \x00\x07
[00:27] and it's only a uint8 so the runtime was just seeing 0
[00:31] wallyworld_: I've landed that fix to unblock master (thanks for the review)
[00:32] np, thanks for fixing :-)
[00:32] wallyworld_: CI is supposed to auto-unblock once CI passes, right?
[00:32] yes
[00:32] * ericsnow crosses fingers
[00:32] g'night
[00:45] ericsnow: thanks for fixing that
[01:59] ran into an interesting bug: https://bugs.launchpad.net/juju-core/+bug/1492088
[01:59] Bug #1492088: juju bootstrap fails inside a wily container
[01:59] anyone seen this before?
[02:04] Bug #1492088 opened: juju bootstrap fails inside a wily container
[02:11] davecheney: say, do you know off hand the difference between GOARM=5 and GOARM=6?
[02:12] oh soft float seems like a major suspect
[02:13] 12:10 < dfc> thumper: http://reviews.vapour.ws/r/2588/
[02:13] 12:10 < dfc> remove version.Binary.OS
[02:13] mwhudson: yes, and no atomics
[02:13] no STREX/LDREX
[02:16] i think this is soft float
[02:17] davecheney: if you enable the -shared flag for arm things go very wrong in hashmap code
[02:17] and there is floating point code that could explain it
[02:21] yes, the hash factor
[02:21] or fill factor or something
[02:21] from memory hashmap.go:mapinit
[02:26] cmd/juju/status_test.go is a clear form of modern torture
[02:27] yes it is :-(
[02:28] * perrito666 uses the time between testruns to learn emacs
[02:29] * perrito666 wonders if he can set kb layouts only for a given app
[02:29] davecheney: yeah, luckily the softfloat is so broken that it doesn't get 10.0*100.0 right...
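
The goarm mix-up described at 00:27 can be reproduced in miniature. This is only a sketch of the symptom, not the linker's actual code: `appendValue`, `replaceValue` and `readUint8` are hypothetical helpers, and the single zero byte standing in for the value from runtime.a is an assumption.

```go
package main

import "fmt"

// appendValue models the buggy behaviour: the new value is tacked on
// after whatever bytes were already recorded for the variable.
func appendValue(existing []byte, v byte) []byte {
	out := append([]byte{}, existing...)
	return append(out, v)
}

// replaceValue models the intended behaviour: the new value is the
// only byte stored for the variable.
func replaceValue(v byte) []byte {
	return []byte{v}
}

// readUint8 models a reader that treats the data as a uint8, so it
// only ever looks at the first byte.
func readUint8(data []byte) uint8 {
	return data[0]
}

func main() {
	fromArchive := []byte{0x00} // assumed placeholder value from runtime.a

	appended := appendValue(fromArchive, 7) // bytes: 00 07
	replaced := replaceValue(7)             // bytes: 07

	fmt.Printf("appended=% x seen=%d\n", appended, readUint8(appended))
	fmt.Printf("replaced=% x seen=%d\n", replaced, readUint8(replaced))
}
```

With the appended form the reader sees 0, with the replaced form it sees 7, which matches the "runtime was just seeing 0" symptom.
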
[02:29] perrito666: i'm working on the great american novel while running tests on ppc64
[02:29] heh
[02:30] oh finally, success
[02:31] I changed one value in formatted status
[02:31] git diff status_test.go | grep "^\+" | wc -l
[02:31] 105
[02:35] ARGH
[02:52] sinzui, I thought we had a voting race builder now ?
[02:53] https://bugs.launchpad.net/juju-core/+bug/1492095
[02:53] mwhudson: soft float only works for values of 1 with extremely large exponents
[02:53] Bug #1492095: worker/statushistorypruner: data race
[02:53] haha i fixed it
[02:54] not sure that was a sensible use of my time, but it took waaaay less than figuring out what was going on
[03:02] Bug #1492095 opened: worker/statushistorypruner: data race
[03:03] thumper: thanks for your comments, please see my reply
[03:04] davecheney: I agree with your summation
[03:04] davecheney: all within time
[03:04] davecheney: thanks for finding the race (that was me, most likely)
[03:04] perrito666: np
[03:05] thumper: it's a valid operation
[03:05] but I think it deserves to be broken out into its own logic
[03:05] possibly in an upcoming juju/series package ?
[03:26] perhaps
[05:03] mwhudson: congrats on your +2
[05:03] proving the motto of the Go team: "you get commit rights when we get sick of committing your stuff"
[05:31] davecheney: thanks
[05:31] and, yeah
[05:33] we're classy like that
=== Spads_ is now known as Spads
[07:09] axw: a small one if you have a moment http://reviews.vapour.ws/r/2590/
[07:29] wallyworld_: looking
[07:29] ta
[07:30] wallyworld_: filesystem CLI is delayed, the existing volume stuff needs cleaning up first. I created another card on the board
[07:30] np
[07:38] wallyworld_: I don't really understand why this branch is required at all. when would we ever get to the end of the resolver and have Started==false?
[07:39] axw: during my testing i saw the update-status hook fire before the start hook had run (after install hook i think, before leader-elected)
[07:39] so it's in response to observed behaviour
[07:40] hmmm
[07:40] wallyworld_: ah hm, apparently Start isn't run until after the first ConfigChanged
[07:40] maybe the refactoring got rid of the problem
[07:41] wallyworld_: so if the update status trigger comes in before config changed, this could happen I think
[07:41] config changed may come first yeah
[07:41] yeah
[07:41] wallyworld_: but no... we always wait for the first config changed
[07:41] oh, ok, i forgot that
[07:42] wallyworld_: although I think it could still happen in the case of a failed/resolved config-changed
[07:43] could do. adding this extra started check is trivial and seems like a good safety net
[07:43] wallyworld_: we should probably move that logic into the resolver (out of operation/runhook.go), and have it drop out of the resolver if !Started and waiting for config changed
[07:43] maybe not now
[07:44] the stuff in run hook commit?
[07:44] wallyworld_: yes
[07:44] yeah, makes sense to do that now i think, but not earlier with the old code
[07:48] wallyworld_: actually even that doesn't make sense. if we resolved the config-changed, it'd still commit and go to Started
[07:50] so maybe now that the update status trigger has been pulled into the main loop processing, the issue is moot
[07:50] whereas before it was a concurrency lottery
[07:50] wallyworld_: right, yeah. I thought you were testing with your change
[07:51] i just added a bunch of cards, but i should have retested after the first refactor
[07:51] i'll retest and drop this branch probably
[07:51] just goes to show how fragile the uniter was before
[07:52] and how this rework has fixed a bunch of stuff implicitly
[07:58] wallyworld_: that particular bit was okay before maltese-falcon, it got broken during (I think)
[07:58] that may well be true
[07:58] axw: and i think those other cards about duplicate status may be bugs in status history (need to dig further). so the feature branch may well be almost ready
[07:59] wallyworld_: cool.
[07:59] i have soccer now but will continue testing after
[09:26] cmars, axw: you know the worker/gate thing I did?
[09:26] cmars, axw: I think that we want a sort of extension of the concept to describe what charmdir.worker really does
[09:27] cmars, axw: because I think it really is just a custom synchronisation construct, the charmdir relationship is entirely incidental
[09:31] cmars, axw: metaphorically something like Fortress sorta works -- clients can Visit(func() error), the person in charge can Unlock (unblock Visits) and Lockdown (stop accepting new Visits, wait for existing ones to complete)
[09:31] cmars, axw: but that's mainly just because I'm thinking about gates
[09:32] or anyone who's interested in naming problems :) ^^
[11:11] davecheney: still here?
[11:24] dimitern: http://reviews.vapour.ws/r/2593/
[11:25] fwereade: ping
[11:25] voidspace, cheers
[11:39] Bug #1492232 opened: backup hogs resources.
[12:09] Bug #1492237 opened: juju state server mongod uses too much disk space
[12:09] Bug #1492241 opened: juju upgrade-juju cli doesn't provide clear feedback on action being taken
[12:39] fwereade, hey, are you around?
[12:39] dimitern, heyhey -- and voidspace, oops, sorry
[12:40] fwereade, :) voidspace has a branch that I'd like you to have a look at, if possible
[12:40] dimitern, just saw; voidspace, looking forward to it :)
[12:40] fwereade, hopefully rectifies some of the issues with unit addresses changing randomly
[12:40] fwereade, awesome, thanks :)
[12:42] fwereade, we were careful not to break api compatibility
[12:46] mgz: could we get a CI run against master to clear that blocking bug?
[13:08] fwereade: cool
[13:08] fwereade: it touches the uniter which is why we pinged you particularly
[13:08] fwereade: (touches it in a very minor way)
[13:08] voidspace, phew :)
[13:08] voidspace, what with maltese-falcon and all ;)
[13:12] fwereade: yeah...
[13:12] fwereade: hopefully a good touch not a bad touch...
[13:14] mgz, dooferlad: ping
[13:14] ericsnow: pong
[13:14] dooferlad: could you kick off a CI run against master to clear the blocker bug?
[13:15] ericsnow: I wouldn't know how...
[13:15] dooferlad: ah
[13:16] ericsnow, abentley, mgz, jog_, or sinzui are better people to ask :)
[13:16] dimitern: duh, mixed up irc handles :)
[13:17] abentley, mgz, jog_: could you kick off a CI run against master to clear the blocker bug?
[13:29] ericsnow: looking...
[13:30] abentley: thanks!
[13:41] ericsnow: The lxc on our wily slave is unhappy. We're still in the middle of testing 1.25. I'm looking into fixing lxc.
[13:42] abentley: k
[13:42] abentley: any way we could get an exception for unblocking master?
[13:42] abentley: I've verified locally that Windows builds and passes the test suite now
[13:43] ericsnow: We need the lxc working before we can test master.
[14:01] katco: having google issues
[14:02] wwitzel3: there's no issues like google issues!
[14:07] ericsnow: I've fixed the lxcs and queued master to be tested next.
[14:08] abentley: thanks!
[14:08] abentley: is it still about 2 hours to run?
=== benji is now known as Guest60778
[14:10] ericsnow: yes.
[14:10] k
[14:44] wwitzel3: we should circle up on my bug... I think I'm going to end up being out a lot today.
[14:44] ok, now is good
[14:51] fwereade: ping
[14:52] wwitzel3, pong
[14:52] fwereade: question about deploy.go and bug #1486553
[14:52] Bug #1486553: i/o timeout errors can cause non-atomic service deploys
[14:52] wwitzel3, ah yes
[14:53] fwereade: if you look at the DeployService code, is there a specific reason all of that AddService and UpdateConfig and AddUnits aren't in a single transaction?
[14:54] fwereade: is there some chicken egg thing going on? Or is a possible fix to prevent the empty service being created just to perform all those in a single transaction?
[14:54] fwereade: if that isn't possible, then my other thought was to wrap all of those so I could handle the error and properly clean up previous transactions manually
[14:54] wwitzel3, apart from sheer inertia, the trickiest bit of fixing that would be to unpick the machine assignments
[14:55] wwitzel3, I am generally a bit underwhelmed by "clean up the mess" approaches, because it's hard to guarantee that they get run
[14:55] fwereade: yeah, I thought about that, but given that the placement to the unit is the last thing that happens, if that errors, I wouldn't have to actually worry about unpicking the assignments right?
[14:55] wwitzel3, I *think* it goes add/assign/add/assign/add/assign etc
[14:56] fwereade: hrmm, ok, so even if the AddUnitsWithPlacecode returns an error, it may have done the add but not the assign?
[14:56] fwereade: and that won't be cleaned up?
[14:56] wwitzel3, yeah :(
[14:56] fwereade: that's shitty
[14:56] lol
[14:57] wwitzel3, it has always been like that: my justification is that at least it's *possible* to clean it up manually, and there are only so many hours in the day
[14:57] wwitzel3, however
[14:57] fwereade: :)
[14:58] wwitzel3, now that at last we've been told it's important to fix it, we can actually dive into Doing It Right
[14:58] wwitzel3, which I *think* is not that hard
[14:58] wwitzel3, because:
[14:58] fwereade: take my hand, show me the way
[14:58] wwitzel3, as you observe, add-service/set-config/add-unit go very nicely together
[14:59] wwitzel3, there will be tweaks necessary -- eg set all the unit refcounts in the service doc at once
[14:59] wwitzel3, and *create* service settings with X data instead of create-empty and set-later
[15:00] wwitzel3, and I really don't think there's anything terribly *hard* there
[15:01] wwitzel3, but, obviously, that leaves us with a bunch of unassigned units
[15:02] wwitzel3, ...and *that* sounds to me like a candidate for a watcher/worker that just assigns unassigned units
[15:02] wwitzel3, now this is clearly not a *small* bugfix
[15:03] fwereade: right, I can probably put a good dent in it today though and hand it off (I'm out next week)
[15:03] wwitzel3, but if we have traction on fixing it I think it would be worthwhile
[15:03] wwitzel3, sweet
[15:05] fwereade: so in the case where they used placement .. the worker/watcher would handle that too
[15:05] wwitzel3, (placement directives might be a touch fiddly -- some (`--to 0`) you can just run (or reject) directly, but others will need to be stored somewhere and used by the assigner)
[15:05] fwereade: but then, there would be a delay right?
[15:05] wwitzel3, yeah, there would, I think that's just the price we have to pay
[15:06] fwereade: we would still have to validate placement as part of the operation though right?
[15:06] fwereade: we wouldn't want the assigner to come back later and give the user a bad placement error
[15:07] wwitzel3, I think it's equivalent to a provisioning error
[15:07] wwitzel3, any pre-validation we can do, hell yes
[15:08] wwitzel3, in fact I think that covers everything, right?
[15:08] wwitzel3, we reject invalid ones on the way in, and we can ask the environ about them
[15:08] fwereade: so it looks like .. add service w/ config, increment unit refs, validate and store placement directives .. run that transaction
[15:09] wwitzel3, but they *might* induce provisioning errors on the associated machines later, just like any other machine
[15:09] wwitzel3, yeah exactly
[15:09] fwereade: assigner picks up job, attempts to use unassigned units, surface an error to the user like a provisioning error
[15:09] wwitzel3, yeah
[15:10] wwitzel3, in fact I think it is possible that assignment could fail there
[15:10] wwitzel3, manual provider with unhelpful assignment policy?
[15:11] wwitzel3, we don't have any way to retry machine provisioning *with different constraints/placement*, do we?
[15:11] fwereade: right, so in the case of this bug, does this fix their issue though .. I'm not sure. If we fail at assignment, we still have a service?
[15:12] fwereade: I guess this bug was caused by the adding of the service and the updating the config and units not being atomic .. this does solve that
[15:13] fwereade: an assignment error would be something else and wouldn't be caused by a timeout to the API since the worker would be retrying in cases of timeout
[15:14] wwitzel3, sorry just a sec
[15:14] fwereade: np
[15:19] wwitzel3, so, yes, I think that it's a different situation, even if I'm not 100% sure why
[15:19] wwitzel3, failing to put my finger on what it is about the assignment logic that plays badly with transactions
[15:21] wwitzel3, but I think if we (1) surface the errors and (2) think through how users'd want to address them, we will provide a much better experience there
[15:21] fwereade: well, at the least, this is an improvement over the current implementation and it addresses the bug
[15:21] wwitzel3, (the assignment stuff must be coming up to 3 years old now... memory is hazy)
[15:21] wwitzel3, yeah
[15:22] fwereade: thank you
[15:22] wwitzel3, np :)
=== frankban_ is now known as frankban
[16:10] who is ocr today??
=== frankban_ is now known as frankban
[16:35] wwitzel3, katco, ericsnow: sorry, my wife is not really getting much better, so I won't be getting anything done today.
[16:35] natefinch: hope she starts feeling better soon :(
[16:35] natefinch: hope she gets better soon!
[16:36] thanks, hopefully it'll get better overnight
[16:36] * natefinch stays on to be able to read scrollback
=== natefinch is now known as natefinch-afk
[17:04] Bug #1492396 opened: Misleading error when agent-version doesn't match juju version on bootstrap
[19:07] anyone willing to rubber stamp a couple of fwports?
[19:25] mmpdf, seem to be having flaky tests again
[19:25] MachineWithCharmsSuite.TestManageEnvironRunsCharmRevisionUpdater <-- anyone seen that one ?
[20:01] wwitzel3: you around?
[20:01] natefinch-afk: yeah
=== natefinch-afk is now known as natefinch
[20:01] wwitzel3: let's catch up