[00:24] <davecheney> mwhudson: testing now, thanks
[00:25] <davecheney> this stuff is a bit subtle
[00:26] <mwhudson> davecheney: as has been noted before the go linker is terrible
[00:27] <mwhudson> davecheney: i know what i was doing wrong though
[00:27] <mwhudson> basically the value of goarm was being appended to the value from the runtime.a file
[00:27] <mwhudson> so it was \x00\x07
[00:27] <mwhudson> and it's only a uint8 so the runtime was just seeing 0
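The failure mwhudson describes can be reproduced in miniature. The following is only a sketch, not the linker's actual code: `readGoarm` and the byte slices are hypothetical stand-ins, and it assumes the value stored in the archive was a single zero byte. It shows why a one-byte read of appended data sees 0 instead of 7.

```go
package main

import "fmt"

// readGoarm mimics a runtime that reads a one-byte (uint8) variable
// from the start of a data symbol: only the first byte is ever seen.
// The name is hypothetical, chosen for this illustration.
func readGoarm(symbolData []byte) uint8 {
	if len(symbolData) == 0 {
		return 0
	}
	return symbolData[0]
}

func main() {
	fromArchive := []byte{0x00} // assumed placeholder value stored in the archive
	linkerValue := []byte{0x07} // GOARM=7 supplied at link time

	// Appending keeps the stale byte in front of the new one.
	appended := append(append([]byte{}, fromArchive...), linkerValue...)
	// Replacing leaves only the new value.
	replaced := append([]byte{}, linkerValue...)

	fmt.Printf("appended: % x -> goarm=%d\n", appended, readGoarm(appended))
	fmt.Printf("replaced: % x -> goarm=%d\n", replaced, readGoarm(replaced))
}
```

With the appended data the reader only looks at the leading zero byte, which matches the "runtime was just seeing 0" symptom.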
[00:31] <ericsnow> wallyworld_: I've landed that fix to unblock master (thanks for the review)
[00:32] <wallyworld_> np, thanks for fixing :-)
[00:32] <ericsnow> wallyworld_: CI is supposed to auto-unblock once CI passes, right?
[00:32] <wallyworld_> yes
[00:32]  * ericsnow crosses fingers
[00:32] <ericsnow> g'night
[00:45] <davecheney> ericsnow: thanks for fixing that
[01:59] <stokachu> ran into an interesting bug: https://bugs.launchpad.net/juju-core/+bug/1492088
[01:59] <mup> Bug #1492088: juju bootstrap fails inside a wily container <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1492088>
[01:59] <stokachu> anyone seen this before?
[02:04] <mup> Bug #1492088 opened: juju bootstrap fails inside a wily container <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1492088>
[02:11] <mwhudson> davecheney: say, do you know off hand the difference between GOARM=5 and GOARM=6?
[02:12] <mwhudson> oh soft float seems like a major suspect
[02:13] <davecheney> 12:10 < dfc> thumper: http://reviews.vapour.ws/r/2588/
[02:13] <davecheney> 12:10 < dfc> remove version.Binary.OS
[02:13] <davecheney> mwhudson: yes, and no atomics
[02:13] <davecheney> no STREX/LDREX
[02:16] <mwhudson> i think this is soft float
[02:17] <mwhudson> davecheney: if you enable the -shared flag for arm things go very wrong in hashmap code
[02:17] <mwhudson> and there is floating point code that could explain it
[02:21] <davecheney> yes, the hash factor
[02:21] <davecheney> or fill factor or something
[02:21] <davecheney> from memory hashmap.go:mapinit
[02:26] <perrito666> cmd/juju/status_test.go is a clear form of modern torture
[02:27] <wallyworld_> yes it is :-(
[02:28]  * perrito666 uses the time between testruns to learn emacs
[02:29]  * perrito666 wonders if he can set kb layouts only for a given app
[02:29] <mwhudson> davecheney: yeah, luckily the softfloat is so broken that it doesn't get 10.0*100.0 right...
[02:29] <davecheney> perrito666: i'm working on the great american novel while running tests on ppc64
[02:29] <perrito666> heh
[02:30] <perrito666> oh finally, success
[02:31] <perrito666> I changed one value in formatted status
[02:31] <perrito666> git diff status_test.go | grep "^\+" | wc -l
[02:31] <perrito666> 105
[02:35] <mwhudson> ARGH
[02:52] <davecheney> sinzui, I thought we had a voting race builder now ?
[02:53] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1492095
[02:53] <davecheney> mwhudson: soft float only works for values of 1 with extremely large exponents
[02:53] <mup> Bug #1492095: worker/statushistorypruner: data race <juju-core:New> <https://launchpad.net/bugs/1492095>
[02:53] <mwhudson> haha i fixed it
[02:54] <mwhudson> not sure that was a sensible use of my time, but it took waaaay less than figuring out what was going on
[03:02] <mup> Bug #1492095 opened: worker/statushistorypruner: data race <juju-core:New> <https://launchpad.net/bugs/1492095>
[03:03] <davecheney> thumper: thanks for your comments, please see my reply
[03:04] <thumper> davecheney: I agree with your summation
[03:04] <thumper> davecheney: all within time
[03:04] <perrito666> davecheney: thanks for finding the race (that was me, most likely)
[03:04] <davecheney> perrito666: np
[03:05] <davecheney> thumper: it's a valid operation
[03:05] <davecheney> but I think it deserves to be broken out into its own logic
[03:05] <davecheney> possibly in an upcoming juju/series package ?
[03:26] <thumper> perhaps
[05:03] <davecheney> mwhudson: congrats on your +2
[05:03] <davecheney> proving the motto of the Go team: "you get commit rights when we get sick of committing your stuff"
[05:31] <mwhudson> davecheney: thanks
[05:31] <mwhudson> and, yeah
[05:33] <davecheney> we're classy like that
[07:09] <wallyworld_> axw: a small one if you have a moment http://reviews.vapour.ws/r/2590/
[07:29] <axw> wallyworld_: looking
[07:29] <wallyworld_> ta
[07:30] <axw> wallyworld_: filesystem CLI is delayed, the existing volume stuff needs cleaning up first. I created another card on the board
[07:30] <wallyworld_> np
[07:38] <axw> wallyworld_: I don't really understand why this branch is required at all. when would we ever get to the end of the resolver and have Started==false?
[07:39] <wallyworld_> axw: during my testing i saw the update-status hook fire before the start hook had run (after install hook i think, before leader-elected)
[07:39] <wallyworld_> so it's in response to observed behaviour
[07:40] <wallyworld_> hmmm
[07:40] <axw> wallyworld_: ah hm, apparently Start isn't run until after the first ConfigChanged
[07:40] <wallyworld_> maybe the refactoring got rid of the problem
[07:41] <axw> wallyworld_: so if the update status trigger comes in before config changed, this could happen I think
[07:41] <wallyworld_> config changed may come first yeah
[07:41] <wallyworld_> yeah
[07:41] <axw> wallyworld_: but no... we always wait for the first config changed
[07:41] <wallyworld_> oh, ok, i forgot that
[07:42] <axw> wallyworld_: although I think it could still happen in the case of a failed/resolved config-changed
[07:43] <wallyworld_> could do. adding this extra started check is trivial and seems like a good safety net
[07:43] <axw> wallyworld_: we should probably move that logic into the resolver (out of operation/runhook.go), and have it drop out of the resolver if !Started and waiting for config changed
[07:43] <axw> maybe not now
[07:44] <wallyworld_> the stuff in run hook commit?
[07:44] <axw> wallyworld_: yes
[07:44] <wallyworld_> yeah, makes sense to do that now i think, but not earlier with the old code
[07:48] <axw> wallyworld_: actually even that doesn't make sense. if we resolved the config-changed, it'd still commit and go to Started
[07:50] <wallyworld_> so maybe now that the update status trigger has been pulled into the main loop processing, the issue is moot
[07:50] <wallyworld_> whereas before it was a concurrency lottery
[07:50] <axw> wallyworld_: right, yeah. I thought you were testing with your change
[07:51] <wallyworld_> i just added a bunch of cards, but i should have retested after the first refactor
[07:51] <wallyworld_> i'll retest and drop this branch probably
[07:51] <wallyworld_> just goes to show how fragile the uniter was before
[07:52] <wallyworld_> and how this rework has fixed a bunch of stuff implicitly
[07:58] <axw> wallyworld_: that particular bit was okay before maltese-falcon, it got broken during (I think)
[07:58] <wallyworld_> that may well be true
[07:58] <wallyworld_> axw: and i think those other cards about duplicate status may be bugs in status history (need to dig further). so the feature branch may well be almost ready
[07:59] <axw> wallyworld_: cool.
[07:59] <wallyworld_> i have soccer now but will continue testing after
[09:26] <fwereade> cmars, axw: you know the worker/gate thing I did?
[09:26] <fwereade> cmars, axw: I think that we want a sort of extension of the concept to describe what charmdir.worker really does
[09:27] <fwereade> cmars, axw: because I think it really is just a custom synchronisation construct, the charmdir relationship is entirely incidental
[09:31] <fwereade> cmars, axw: metaphorically something like Fortress sorta works -- clients can Visit(func() error), the person in charge can Unlock (unblock Visits) and Lockdown (stop accepting new Visits, wait for existing ones to complete)
[09:31] <fwereade> cmars, axw: but that's mainly just because I'm thinking about gates
[09:32] <fwereade> or anyone who's interested in naming problems :) ^^
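A rough sketch of the Fortress idea fwereade outlines, built on `sync.RWMutex`. The method names Visit, Unlock and Lockdown come from the discussion; everything else (the `ErrLocked` error, starting in the locked state, rejecting visits instead of blocking them) is an assumption made for illustration, not the eventual juju implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrLocked is a hypothetical error returned when visits are not allowed.
var ErrLocked = errors.New("fortress is locked down")

// Fortress guards a resource. It starts locked (an assumption of this sketch).
type Fortress struct {
	mu   sync.RWMutex
	open bool
}

// Unlock starts accepting visits.
func (f *Fortress) Unlock() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.open = true
}

// Lockdown waits for visits already in progress to finish (they hold the
// read lock), then refuses new ones. Visits that arrive while Lockdown is
// waiting block until it completes and are then refused.
func (f *Fortress) Lockdown() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.open = false
}

// Visit runs visit while holding a read lock, so many visits may run at
// once. Caveat: a visit must not call Visit again on the same Fortress,
// because a pending Lockdown would then deadlock.
func (f *Fortress) Visit(visit func() error) error {
	f.mu.RLock()
	defer f.mu.RUnlock()
	if !f.open {
		return ErrLocked
	}
	return visit()
}

func main() {
	f := &Fortress{}
	noop := func() error { return nil }

	fmt.Println("before Unlock:", f.Visit(noop))
	f.Unlock()
	fmt.Println("after Unlock:", f.Visit(noop))
	f.Lockdown()
	fmt.Println("after Lockdown:", f.Visit(noop))
}
```

Using the read lock for visits and the write lock for state changes gets the "wait for existing visits" behaviour from the mutex itself, without a separate counter.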
[11:11] <perrito666> davecheney: still here?
[11:24] <voidspace> dimitern: http://reviews.vapour.ws/r/2593/
[11:25] <voidspace> fwereade: ping
[11:25] <dimitern> voidspace, cheers
[11:39] <mup> Bug #1492232 opened: backup hogs resources. <juju-core:New> <https://launchpad.net/bugs/1492232>
[12:09] <mup> Bug #1492237 opened: juju state server mongod uses too much disk space <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1492237>
[12:09] <mup> Bug #1492241 opened: juju upgrade-juju cli doesn't provide clear feedback on action being taken <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1492241>
[12:39] <dimitern> fwereade, hey, are you around?
[12:39] <fwereade> dimitern, heyhey -- and voidspace, oops, sorry
[12:40] <dimitern> fwereade, :) voidspace has a branch that I'd like you to have a look at, if possible
[12:40] <fwereade> dimitern, just saw; voidspace, looking forward to it :)
[12:40] <dimitern> fwereade, hopefully rectifies some of the issues with unit addresses changing randomly
[12:40] <dimitern> fwereade, awesome, thanks :)
[12:42] <dimitern> fwereade, we were careful not to break api compatibility
[12:46] <ericsnow> mgz: could we get a CI run against master to clear that blocking bug?
[13:08] <voidspace> fwereade: cool
[13:08] <voidspace> fwereade: it touches the uniter which is why we pinged you particularly
[13:08] <voidspace> fwereade: (touches it in a very minor way)
[13:08] <fwereade> voidspace, phew :)
[13:08] <fwereade> voidspace, what with maltese-falcon and all ;)
[13:12] <voidspace> fwereade: yeah...
[13:12] <voidspace> fwereade: hopefully a good touch not a bad touch...
[13:14] <ericsnow> mgz, dooferlad: ping
[13:14] <dooferlad> ericsnow: pong
[13:14] <ericsnow> dooferlad: could you kick off a CI run against master to clear the blocker bug?
[13:15] <dooferlad> ericsnow: I wouldn't know how...
[13:15] <ericsnow> dooferlad: ah
[13:16] <dimitern> ericsnow, abentley, mgz, jog_, or sinzui are better people to ask :)
[13:16] <ericsnow> dimitern: duh, mixed up irc handles :)
[13:17] <ericsnow> abentley, mgz, jog_: could you kick off a CI run against master to clear the blocker bug?
[13:29] <abentley> ericsnow: looking...
[13:30] <ericsnow> abentley: thanks!
[13:41] <abentley> ericsnow: The lxc on our wily slave is unhappy.  We're still in the middle of testing 1.25.  I'm looking into fixing lxc.
[13:42] <ericsnow> abentley: k
[13:42] <ericsnow> abentley: any way we could get an exception for unblocking master?
[13:42] <ericsnow> abentley: I've verified locally that Windows builds and passes the test suite now
[13:43] <abentley> ericsnow: We need the lxc working before we can test master.
[14:01] <wwitzel3> katco: having google issues
[14:02] <katco> wwitzel3: there's no issues like google issues!
[14:07] <abentley> ericsnow: I've fixed the lxcs and queued master to be tested next.
[14:08] <ericsnow> abentley: thanks!
[14:08] <ericsnow> abentley: is it still about 2 hours to run?
[14:10] <abentley> ericsnow: yes.
[14:10] <ericsnow> k
[14:44] <natefinch> wwitzel3: we should circle up on my bug... I think I'm going to end up being out a lot today.
[14:44] <wwitzel3> ok, now is good
[14:51] <wwitzel3> fwereade: ping
[14:52] <fwereade> wwitzel3, pong
[14:52] <wwitzel3> fwereade: question about deploy.go and bug #1486553
[14:52] <mup> Bug #1486553: i/o timeout errors can cause non-atomic service deploys <cisco> <landscape> <juju-core:Triaged> <juju-core 1.25:In Progress by natefinch> <https://launchpad.net/bugs/1486553>
[14:52] <fwereade> wwitzel3, ah yes
[14:53] <wwitzel3> fwereade: if you look at the DeployService code, is there a specific reason all of that AddService and UpdateConfig and AddUnits aren't in a single transaction?
[14:54] <wwitzel3> fwereade: is there some chicken egg thing going on? Or is a possible fix to prevent the empty service being created just to perform all those in a single transaction?
[14:54] <wwitzel3> fwereade: if that isn't possible, then my other thought was to wrap all of those so I could handle the error and properly cleanup previous transactions manually
[14:54] <fwereade> wwitzel3, apart from sheer inertia, the trickiest bit of fixing that would be to unpick the machine assignments
[14:55] <fwereade> wwitzel3, I am generally a bit underwhelmed by "clean up the mess" approaches, because it's hard to guarantee that they get run
[14:55] <wwitzel3> fwereade: yeah, I thought about that, but given that the placement to the unit is the last thing that happens, if that errors, I wouldn't have to actually worry about unpicking the assignments right?
[14:55] <fwereade> wwitzel3, I *think* it goes add/assign/add/assign/add/assign etc
[14:56] <wwitzel3> fwereade: hrmm, ok, so even if the AddUnitsWithPlacecode returns an error, it may have done the add but not the assign?
[14:56] <wwitzel3> fwereade: and that won't be cleaned up?
[14:56] <fwereade> wwitzel3, yeah :(
[14:56] <wwitzel3> fwereade: that's shitty
[14:56] <wwitzel3> lol
[14:57] <fwereade> wwitzel3, it has always been like that: my justification is that at least it's *possible* to clean it up manually, and there are only so many hours in the day
[14:57] <fwereade> wwitzel3, however
[14:57] <wwitzel3> fwereade: :)
[14:58] <fwereade> wwitzel3, now that at last we've been told it's important to fix it, we can actually dive into Doing It Right
[14:58] <fwereade> wwitzel3, which I *think* is not that hard
[14:58] <fwereade> wwitzel3, because:
[14:58] <wwitzel3> fwereade: take my hand, show me the way
[14:58] <fwereade> wwitzel3, as you observe, add-service/set-config/add-unit go very nicely together
[14:59] <fwereade> wwitzel3, there will be tweaks necessary -- eg set all the unit refcounts in the service doc at once
[14:59] <fwereade> wwitzel3, and *create* service settings with X data instead of create-empty and set-later
[15:00] <fwereade> wwitzel3, and I really don't think there's anything terribly *hard* there
[15:01] <fwereade> wwitzel3, but, obviously, that leaves us with a bunch of unassigned units
[15:02] <fwereade> wwitzel3, ...and *that* sounds to me like a candidate for a watcher/worker that just assigns unassigned units
[15:02] <fwereade> wwitzel3, now this is clearly not a *small* bugfix
[15:03] <wwitzel3> fwereade: right, I can probably put a good dent in it today though and hand it off (I'm out next week)
[15:03] <fwereade> wwitzel3, but if we have traction on fixing it I think it would be worth while
[15:03] <fwereade> wwitzel3, sweet
[15:05] <wwitzel3> fwereade: so in the case where they used placement .. the worker/watcher would handle that too
[15:05] <fwereade> wwitzel3, (placement directives might be a touch fiddly -- some (`--to 0`) you can just run (or reject) directly, but others will need to be stored somewhere and used by the assigner)
[15:05] <wwitzel3> fwereade: but then, there would be a delay right?
[15:05] <fwereade> wwitzel3, yeah, there would, I think that's just the price we have to pay
[15:06] <wwitzel3> fwereade: we would still have to validate placement as part of the operation though right?
[15:06] <wwitzel3> fwereade: we wouldn't want the assigner to come back later and give the user a bad placement error
[15:07] <fwereade> wwitzel3, I think it's equivalent to a provisioning error
[15:07] <fwereade> wwitzel3, any pre-validation we can do, hell yes
[15:08] <fwereade> wwitzel3, in fact I think that covers everything, right?
[15:08] <fwereade> wwitzel3, we reject invalid ones on the way in, and we can ask the environ about them
[15:08] <wwitzel3> fwereade: so it looks like .. add service w/ config, increment unit refs, validate and store placement directives .. run that transaction
[15:09] <fwereade> wwitzel3, but they *might* induce provisioning errors on the associated machines later, just like any other machine
[15:09] <fwereade> wwitzel3, yeah exactly
[15:09] <wwitzel3> fwereade: assigner picks up job, attempts to use unassigned units, surface an error to the user like a provisioning error
[15:09] <fwereade> wwitzel3, yeah
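The plan agreed above (validate placement up front, create the service with its config and units in one step, let a separate assigner place units later) might look roughly like this. It is an in-memory sketch only: `state`, `unit`, `deployService`, `assignPending` and the placement rule are all hypothetical names and simplifications, and a real fix would build a single database transaction rather than mutate maps.

```go
package main

import (
	"fmt"
	"strconv"
)

// unit is a hypothetical stand-in for a unit document.
type unit struct {
	name      string
	placement string // stored directive; "" lets the assigner choose
	machine   string // "" until the assigner has run
	err       error  // assignment failure, surfaced like a provisioning error
}

// state is an in-memory stand-in for the database, not juju's state package.
type state struct {
	services map[string]map[string]string // service name -> config
	units    []*unit
}

func newState() *state {
	return &state{services: make(map[string]map[string]string)}
}

// validatePlacement is a deliberately simple rule for this sketch:
// only "" or a machine id such as "0" is accepted.
func validatePlacement(p string) error {
	if p == "" {
		return nil
	}
	if id, err := strconv.Atoi(p); err != nil || id < 0 {
		return fmt.Errorf("invalid placement %q", p)
	}
	return nil
}

// deployService checks everything before changing anything, so a failure
// leaves no empty service behind. One placement entry means one unit.
func (s *state) deployService(name string, config map[string]string, placements []string) error {
	if _, exists := s.services[name]; exists {
		return fmt.Errorf("service %q already exists", name)
	}
	for _, p := range placements {
		if err := validatePlacement(p); err != nil {
			return err
		}
	}
	s.services[name] = config
	for i, p := range placements {
		s.units = append(s.units, &unit{name: fmt.Sprintf("%s/%d", name, i), placement: p})
	}
	return nil
}

// assignPending plays the assigner worker: it visits units that have
// neither a machine nor a recorded error and records the outcome.
func (s *state) assignPending(assign func(*unit) (string, error)) {
	for _, u := range s.units {
		if u.machine == "" && u.err == nil {
			u.machine, u.err = assign(u)
		}
	}
}

func main() {
	s := newState()
	// A bad directive rejects the whole deploy; no service is created.
	fmt.Println(s.deployService("mysql", nil, []string{"0", "nonsense"}), len(s.services))
	fmt.Println(s.deployService("mysql", map[string]string{"port": "3306"}, []string{"0", ""}))
	s.assignPending(func(u *unit) (string, error) {
		if u.placement != "" {
			return u.placement, nil
		}
		return "1", nil // illustrative choice of machine
	})
	for _, u := range s.units {
		fmt.Println(u.name, u.machine)
	}
}
```

The point of the split is that a timeout can no longer leave a service without its config or units, while assignment failures show up later on the unit itself.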
[15:10] <fwereade> wwitzel3, in fact I think it is possible that assignment could fail there
[15:10] <fwereade> wwitzel3, manual provider with unhelpful assignment policy?
[15:11] <fwereade> wwitzel3, we don't have any way to retry machine provisioning *with different constraints/placement*, do we?
[15:11] <wwitzel3> fwereade: right, so in the case of this bug, does this fix their issue though .. I'm not sure. If we fail at assignment, we still have a service?
[15:12] <wwitzel3> fwereade: I guess this bug was caused by the adding of the service and the updating the config and units not being atomic .. this does solve that
[15:13] <wwitzel3> fwereade: an assignment error would be something else and wouldn't be caused by a timeout to the API since the worker would be retrying in cases of timeout
[15:14] <fwereade> wwitzel3, sorry just a sec
[15:14] <wwitzel3> fwereade: np
[15:19] <fwereade> wwitzel3, so, yes, I think that it's a different situation, even if I'm not 100% sure why
[15:19] <fwereade> wwitzel3, failing to put my finger on what it is about the assignment logic that plays badly with transactions
[15:21] <fwereade> wwitzel3, but I think if we (1) surface the errors and (2) think through how users'd want to address them, we will provide a much better experience there
[15:21] <wwitzel3> fwereade: well, at the least, this is an improvement over the current implementation and it addresses the bug
[15:21] <fwereade> wwitzel3, (the assignment stuff must be coming up to 3 years old now... memory is hazy)
[15:21] <fwereade> wwitzel3, yeah
[15:22] <wwitzel3> fwereade: thank you
[15:22] <fwereade> wwitzel3, np :)
[16:10] <perrito666> who is ocr today??
[16:35] <natefinch> wwitzel3, katco, ericsnow: sorry, my wife is not really getting much better, so I won't be getting anything done today.
[16:35] <katco> natefinch: hope she starts feeling better soon :(
[16:35] <ericsnow> natefinch: hope she gets better soon!
[16:36] <natefinch> thanks, hopefully it'll get better overnight
[16:36]  * natefinch stays on to be able to read scrollback
[17:04] <mup> Bug #1492396 opened: Misleading error when agent-version doesn't match juju version on bootstrap <bootstrap> <ci> <juju-core:Triaged> <https://launchpad.net/bugs/1492396>
[19:07] <perrito666> anyone willing to rubber stamp a couple of fwports?
[19:25] <perrito666> mmpdf, seem to be having flaky tests again
[19:25] <perrito666> MachineWithCharmsSuite.TestManageEnvironRunsCharmRevisionUpdater <-- anyone seen that one?
[20:01] <natefinch-afk> wwitzel3: you around?
[20:01] <wwitzel3> natefinch-afk: yeah
[20:01] <natefinch> wwitzel3: let's catch up