[02:23] <cherylj> wallyworld: ping?
[02:24] <wallyworld> cherylj: hey give me a sec
[02:24] <cherylj> np
[02:26] <wallyworld> cherylj: hey. btw do you have a link to the 2.0 todo list that has been started for the sprint?
[02:27] <cherylj> wallyworld: I don't know that one has been started yet
[02:27] <wallyworld> ah, ok, np. next week then
[02:28] <cherylj> wallyworld: I know that when you guys were working the bootstack upgrade issues, you came across the problem in this bug:  https://bugs.launchpad.net/juju-core/+bug/1459033
[02:28] <mup> Bug #1459033: Invalid binary version, version "1.23.3--amd64" or "1.23.3--armhf" <constraints> <maas-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1459033>
[02:28] <wallyworld> yes
[02:28] <cherylj> but I cannot find / remember what the cause of that was determined to be
[02:28] <wallyworld> they had bad data in their tools metadata collection in state
[02:29] <wallyworld> at some point, juju i think must have returned "" for an unknown series lookup
[02:29] <wallyworld> and so when tools were being imported, the wily or xenial tools got stored under that bad version
[02:29] <cherylj> did you have to manually recover the db?
[02:29] <wallyworld> yeah, i went in and deleted the tools metadata
[02:30] <wallyworld> this causes juju to go out to streams.canonical.com
[02:30] <wallyworld> to fetch tools instead of using any cached values
[02:30] <wallyworld> and the newly downloaded tools are then cached again
[02:30] <cherylj> okay, cool.  Thanks!
[02:31] <wallyworld> but hard to reproduce
[02:31] <wallyworld> i don't know how old that "" series issue is
[02:31] <cherylj> I can take a look
[02:32] <wallyworld> there's a lot of moving parts
[02:32] <wallyworld> may be hard to pin down a definitive "this is it" release
[02:32] <wallyworld> it should be an upgrade check
[02:32] <wallyworld> the upgrader checks for bad metadata and deletes it
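The bad records wallyworld describes have an empty series component ("1.23.3--amd64" rather than, say, "1.23.3-trusty-amd64"). A minimal sketch in Go of the kind of validation such an upgrade check could run — `hasEmptySeries` is illustrative, not the actual juju upgrade-step code:

```go
package main

import (
	"fmt"
	"strings"
)

// hasEmptySeries reports whether a tools version string of the
// "number-series-arch" form is missing its series, which yields
// the malformed "1.23.3--amd64" style values seen in bug 1459033.
func hasEmptySeries(v string) bool {
	parts := strings.SplitN(v, "-", 3)
	if len(parts) != 3 {
		return true // not number-series-arch shaped at all
	}
	return parts[1] == ""
}

func main() {
	fmt.Println(hasEmptySeries("1.23.3--amd64"))       // malformed: true
	fmt.Println(hasEmptySeries("1.23.3-trusty-amd64")) // well-formed: false
}
```

An upgrade step could scan the tools metadata collection with a check like this and delete matching documents, forcing a re-fetch from streams.canonical.com as described above.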
[02:50] <wallyworld> axw: here's the pr http://reviews.vapour.ws/r/3317/
[02:50] <axw> wallyworld: ta, looking
[03:07] <axw> wallyworld: reviewed
[03:40] <wallyworld> axw: good pickups on the other issues; i've explained one point, hopefully it makes sense
[03:40] <axw> wallyworld: ok, looking again
[03:41] <axw> wallyworld: alternatively just "ServiceOffers" if URL is the canonical identifier
[03:41] <axw> wallyworld: your call tho
[03:42] <wallyworld> that might work
[03:42] <wallyworld> shit, just saw a typo
[03:42] <wallyworld> formatOfferedServiceDetailss
[03:42] <wallyworld> will fix that
[03:58] <axw> wallyworld: responded
[04:00] <wallyworld> looking
[04:01] <axw> wallyworld: going to go have lunch and start packing, will check back in a little while
[04:01] <wallyworld> ok, np, i'll push changes
[05:04] <axw> wallyworld: under what circumstances? re  "even if the query(filter) succeeds and the Error above is nil, converting the data from a particular query result item may have an error."
[05:05] <axw> wallyworld: just wondering if it's actually worthwhile departing from the conventional all-or-nothing per result
[05:05] <wallyworld_> axw: it gets the query result and then does things like look up the service and/or charm details (can't recall exactly)
[05:05] <wallyworld_> so that op per record could fail
[05:06] <axw> wallyworld_: does it make sense to include that in the result at all then? what're you going to do with that? you didn't specifically ask for an item, you just said "give me all the things that this filter matches"
[05:07] <wallyworld_> the doc in offered services collection does match. but composing the result errors
[05:07] <wallyworld_> we could ignore such errors i guess
[05:07] <axw> wallyworld_: yep... why would the user care about that?
[05:07] <wallyworld_> and pretend the item doesn't exist
[05:08] <wallyworld_> i'll rework it
[05:08] <axw> wallyworld_: it's just not clear to me how the user can action on that error
[05:08] <axw> wallyworld_: it's not due to an error in input, it's a server-side error
[05:09] <axw> wallyworld_: if you like, defer to a follow up
[05:09] <wallyworld_> as person making an offer, i would want to know if one of my offers went bad somehow
[05:10] <wallyworld_> let's land now and iterate next week? there's a fair bit of other cruft to fix also
[05:10] <wallyworld_> this is a good start though
[05:10] <axw> wallyworld_: then I think we need to move the error inside the details, rather than outside the details. that would be more like the errors we have in machine status, I think
[05:10] <axw> wallyworld_: sure
[05:10] <wallyworld_> hmm, ok, i could move inside
[05:11] <wallyworld_> i'll land as is and we can think a bit. i need to go pack etc
[05:11] <axw> wallyworld_: do it later, I'll take a once over now
[05:11] <wallyworld_> ok
[05:15] <axw> wallyworld_: couple small fixes please, then shipit
[05:20] <wallyworld> axw: thanks. my eyes get sore looking at all those params structs. they all start to merge into a big mess
[05:20] <axw> wallyworld: :)
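The shape axw argues for — a per-item error inside each result, as with machine status errors, rather than failing the whole filtered query — might look roughly like this in Go. The struct and function names are illustrative, not the actual juju params types:

```go
package main

import "fmt"

// OfferedServiceResult mirrors the pattern discussed: the filter query
// as a whole succeeds, but composing any individual item (e.g. looking
// up its service or charm details) can fail, so the error lives inside
// each result instead of aborting the batch.
type OfferedServiceResult struct {
	URL   string
	Error error // nil unless composing this item's details failed
}

// listOffers is a stand-in for the server-side call: it returns one
// result per matched offer, marking items whose details could not be
// composed. An empty URL simulates a failed lookup.
func listOffers(urls []string) []OfferedServiceResult {
	results := make([]OfferedServiceResult, 0, len(urls))
	for _, u := range urls {
		r := OfferedServiceResult{URL: u}
		if u == "" {
			r.Error = fmt.Errorf("cannot compose details for offer")
		}
		results = append(results, r)
	}
	return results
}

func main() {
	for _, r := range listOffers([]string{"local:/u/me/db", ""}) {
		if r.Error != nil {
			fmt.Println("bad offer:", r.Error)
			continue
		}
		fmt.Println("offer:", r.URL)
	}
}
```

This keeps wallyworld's requirement (an offer's owner can see that one of their offers "went bad") without departing from the all-or-nothing convention at the call level.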
[08:29] <TheMue> Heya, old team. Next week in OAK/SFO?
[10:04] <voidspace> dimitern: standup?
[10:04] <dimitern> voidspace, omw - having some HO issues
[10:24] <mup> Bug #1522484 changed: state package tests no longer run since PR #3806 <blocker> <ci> <regression> <unit-tests> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1522484>
[10:27] <mup> Bug #1522484 opened: state package tests no longer run since PR #3806 <blocker> <ci> <regression> <unit-tests> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1522484>
[10:30] <mup> Bug #1522484 changed: state package tests no longer run since PR #3806 <blocker> <ci> <regression> <unit-tests> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1522484>
[10:53] <fwereade> dimitern, axw, rogpeppe: you have all contributed to this logic, so you may have insight: ISTM that the broken/closed logic in api/apiclient.go is really rather likely to deadlock on Close(); which would match a bug I've seen; any comments/recollections?
[10:55] <dimitern> fwereade, I don't recall much around that code, i have to do a refresher
[10:55] <fwereade> dimitern, axw, rogpeppe2: and ISTM that http://paste.ubuntu.com/13665970/ would fix it
[10:55] <dimitern> fwereade, that looks like it should have been like this to begin with :)
[10:56] <fwereade> dimitern, it's just about the interaction of 2 channels (s.closed/s.broken) and how the heartbeatMonitor goroutine doesn't always close broken; but Close always waits for broken to be closed
[10:56] <rogpeppe2> fwereade: looking
[10:57] <rogpeppe2> fwereade: what's the difference between those two pieces of code?
[10:57] <rogpeppe2> fwereade: i can't see that using defer makes a difference
[10:57] <fwereade> rogpeppe2, ...goddammit
[10:57] <fwereade> rogpeppe2, this is why it's a good idea to talk to you about these things ;)
[10:57] <rogpeppe2> fwereade: :)
[10:58] <fwereade> rogpeppe2, hadn't picked up that we always ping until we fail
[10:58]  * fwereade wonders if a ping could hang somehow...
[11:09] <rogpeppe2> uiteam: support removing multiple entities at once: https://github.com/CanonicalLtd/charmstore-client/pull/150
[11:35] <dimitern> voidspace, ping
[11:40] <frobware> dooferlad, your python observations on add-juju-bridge.py. I think I'm going to land on master as-is because I'm going to work on the multiple bridge / multiple NICs straight away and I can a) fix them there and b) don't want to invalidate any of the manual testing. Ok?
[11:41] <dooferlad> frobware: sure
[11:48] <dimitern> frobware, voidspace, I found out why David is having that issue - I'm about to propose a PR that fixes it by allowing Subnets() to be called without an instanceId (returning all subnets)
[11:55] <frobware> dimitern, great & thanks.
[11:57] <voidspace> dimitern: pong
[11:58] <voidspace> dimitern: without an instanceId... interesting
[11:58] <dimitern> voidspace, yes - like "gimme all subnets there are"
[11:58] <voidspace> dimitern: sure, what will use that?
[11:58] <dimitern> voidspace, with the new api that's easy, and in fact already implemented
[11:58] <dimitern> voidspace, in Spaces()
[11:58] <voidspace> dimitern: for maas, yes
[11:59] <dimitern> voidspace, yes, only for maas and until we have import in place
[11:59] <dimitern> voidspace, this will allow David et al to try spaces in constraints
[11:59] <voidspace> dimitern: ok, cool
[12:24] <rogpeppe2> uiteam: now updated to allow a -n flag: https://github.com/CanonicalLtd/charmstore-client/pull/150
[12:25]  * rogpeppe2 lunches
[12:56] <voidspace> dimitern: frobware: dooferlad: wife ill in bed, I'm looking after the boy
[12:56] <voidspace> hopefully she'll be rested and up soon
[12:56] <frobware> voidspace, ack
[12:57] <dimitern> voidspace, sure - speedy recovery!
[12:58] <dooferlad> voidspace: hope things improve soon :-(
[13:23] <frobware> dimitern, dooferlad, voidspace: PTAL @ http://reviews.vapour.ws/r/3319/
[13:25] <dimitern> frobware, looking
[13:27] <dimitern> frobware, LGTM
[13:33] <frobware> dimitern, once that lands on master I plan to rebase maas-spaces as I can do the multi nic / multi bridge with that change in place
[13:34] <dimitern> frobware, great
[13:34]  * dimitern steps out for ~1h
[13:34]  * frobware lunches
[13:56] <fwereade> rogpeppe2, re that heartbeatMonitor
[13:56] <rogpeppe2> fwereade: yes?
[13:57] <fwereade> rogpeppe2, the only reason I can see for it to block on Close is if Ping somehow hangs; and while I can't explain exactly why that would happen, it doesn't seem *intrinsically* implausible if the state server is misbehaving somehow
[13:58] <fwereade> rogpeppe2, (I'm mainly asking you because I think you wrote v1? anyway)
[13:58] <rogpeppe> fwereade: yeah, i'm responsible for the design of most of that code, i think
[13:58] <rogpeppe> fwereade: have you got a reproducible test case?
[13:58] <fwereade> rogpeppe, anything obviously insane about timing the ping out and stopping? AFAICS none of the clients need to block until it's *actually* stopped?
[13:58] <fwereade> rogpeppe, sadly not
[13:59] <fwereade> rogpeppe, I am experimenting with messing around with connections and it's been flawless for me so far
[13:59] <rogpeppe> fwereade: is there a bug report?
[14:00] <fwereade> rogpeppe, but there's a bug open -- and that includes a panicking state server somewhere -- yeah, https://bugs.launchpad.net/juju-core/+bug/1522544
[14:00] <mup> Bug #1522544: workers restart endlessly <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1522544>
[14:00] <rogpeppe> fwereade: how would a hung-up heartbeatMonitor cause endless restarts?
[14:00] <rogpeppe> fwereade: i'd've thought it would have the opposite effect
[14:01] <rogpeppe> fwereade: i.e. it *can't* restart when needed
[14:02] <fwereade> rogpeppe, the worker that wraps it never finishes and is never restarted; so the conn resource doesn't get replaced, and everybody keeps gamely trying to connect with the old one
[14:02] <fwereade> rogpeppe, after all, it might just have been some transient error ;)
[14:02] <rogpeppe> fwereade: hmm
[14:02] <fwereade> rogpeppe, (but ofc they all just start up, try to do something, fall down)
[14:02] <fwereade> rogpeppe, anyway
[14:04] <rogpeppe> fwereade: before putting a timeout on the ping, i would make sure that that actually is happening
[14:04] <rogpeppe> fwereade: for example by getting a *full* stack trace of all goroutines when this is happening
[14:04] <rogpeppe> fwereade: if the connection to the state machine has gone, the Ping *should* exit
[14:04] <fwereade> rogpeppe, yeah, I'm not worried about that
[14:05] <rogpeppe> fwereade: if it doesn't then it may be a bug in the rpc package
[14:06] <fwereade> rogpeppe, I'm worried about the worst case of what a confused state server might induce I guess
[14:07] <fwereade> rogpeppe, ehh, just a thought -- the interesting thing I suppose is that the same failure could have happened silently before, I suspect, it'd just manifest as an agent blocked for some reason and when there's no clear cause it's all too easy to bounce it and move on
[14:09] <rogpeppe> fwereade: all the client requests *should* return regardless of the server state
[14:09] <rogpeppe> fwereade: because we close the connection
[14:09] <fwereade> rogpeppe, do you recall the rationale for waiting for a failed ping, rather than just exiting on close? I don't think a successful ping-failure implies anything useful about whether any other clients are using the conn
[14:10] <rogpeppe> fwereade: and if the connection's closed the response reader should close
[14:10] <rogpeppe> fwereade: no, i think it would be just fine to return after reading on s.closed
[14:11] <rogpeppe> fwereade: i don't think that'll make any difference though
[14:12] <rogpeppe> fwereade: because sending an API request after the client is closed will immediately return an error without actually doing anything
[14:12] <rogpeppe> fwereade: actually i do see at least one rationale
[14:12] <rogpeppe> fwereade: which is that State.Close waits on s.broken before returning
[14:13] <fwereade> rogpeppe, I'm thinking of scenarios like "a panicking apiserver has somehow deadlocked itself mid-rpc"
[14:13] <rogpeppe> fwereade: thus ensuring that the heartbeat monitor is cleaned up
[14:13] <rogpeppe> fwereade: even then, it should cause a client to hang up
[14:13] <rogpeppe> fwereade: i mean, feel free to experiment
[14:14] <rogpeppe> fwereade: but i think that if you try making an rpc server that never replies on any request, you should still be able to close the client ok
[14:14] <rogpeppe> fwereade: (if not it's a bug)
[14:16] <fwereade> rogpeppe, yeah, I think you're right
[14:16] <fwereade> rogpeppe, cheers :)
[14:17] <rogpeppe> fwereade: np
[14:42] <voidspace> frobware: you rebased yet?
[14:51] <mup> Bug #1522861 opened: Panic in ClenupOldMetrics <juju-core:New> <https://launchpad.net/bugs/1522861>
[14:55] <frobware> voidspace, just about to. make check works ok on maas-spaces.
[14:57] <frobware> voidspace, I forget (doh!) what we agreed for rebasing. Just push, or PR and push?
[14:57] <frobware> dimitern, ^^
[14:57] <voidspace> frobware: I'd say just push
[14:58] <voidspace> frobware: you can't merge the PR anyway, so no point (IMO)
[14:58] <frobware> voidspace, exactly.
[14:58] <frobware> voidspace, couldn't remember what I did last time...
[14:58] <voidspace> you PR'd then pushed
[15:02] <frobware> voidspace, for the record, diff against master looks like: http://pastebin.ubuntu.com/13669884/
[15:02] <dimitern> frobware, +1 for just push
[15:06] <frobware> dimitern, dooferlad, voidspace: rebased. http://pastebin.ubuntu.com/13669966/
[15:07] <dooferlad> frobware: cool, thanks!
[15:19] <dimitern> frobware, sweet!
[15:44] <natefinch> mgz:  you around?
[15:45] <mgz> natefinch: yo
[15:47] <natefinch> mgz: something wonky with my feature branch here: http://juju-ci.vapour.ws:8080/job/github-merge-juju-charm/20/console
[15:47] <natefinch> + /var/lib/jenkins/juju-ci-tools/git_gate.py --project gopkg.in/juju/charm.nate-minver
[15:47] <natefinch> should be --project gopkg.in/juju/charm.v6-unstable
[15:48] <natefinch> probably the first time we've ever tried a feature branch on a gopkg.in library
[15:49] <natefinch> mgz: not really sure how we're supposed to make that work.  I can understand what the CI code is doing... I don't know how to tell it "pretend this is charm.v6-unstable"
[15:50] <mgz> yeah, I don't think gopkg.in and feature branches are compatible
[15:51] <mgz> well, they are, it's just via the go mechanism of rewriting all your imports
[15:51] <mgz> you can't name a branch something other than v6-unstable then import it as that via gopkg.in
[15:58] <natefinch> well, I mean, it works fine using godeps from juju/juju.... because godeps just sets the right commit... but yeah, I guess there's no real way to do that solely from the gopkg.in branch directly
[15:58] <natefinch> like, you can do go get gopkg.in/juju/charm.v6-unstable and then git checkout minver and it'll work fine
[15:59] <natefinch> er nate-minver
[16:00] <mgz> you can get git to give you any rev it can find in the repo
[16:00] <natefinch> right
[16:00] <mgz> that doesn't mean it's in any relevant history
[16:01] <natefinch> I think we need feature branch support in CI.  Otherwise we get into the case where a change to one of these versioned branches breaks juju/juju ... just like that email I sent a week ago or so
[16:02] <mgz> we can't really hack around the way gopkg.in works
[16:02] <mgz> ci on github.com/ stuff is fine
[16:02] <mgz> but we'd need to actually sed imports to make gopkg.in work I think, and that's... not very productive
[16:03] <natefinch> sure you can.  I just said how: You do go get gopkg.in/juju/charm.v6-unstable and then git checkout nate-minver
[16:03] <natefinch> (and then godeps dependencies.tsv)
[16:03] <natefinch> that's how I do development on my feature branch
[16:04] <natefinch> let me double check that that actually works from scratch
[16:05] <natefinch> yep, totally works
[16:05] <mgz> hm, the simple case works with just mv on the dir yeah
[16:06] <natefinch> yeah, you just need the code from the branch to be in the directory go expects
[16:07] <natefinch> so the current CI code could work if we added a simple mv statement... though again, it has to understand what the code "expects" to be called, which now has to live outside the branch name.
[16:11] <mgz> natefinch: can we just naming scheme it?
[16:11] <mgz> how many dots is too many dots?
[16:12] <natefinch> mgz: whatever is easiest for you guys. *I* don't care what the name of my branch is
[16:12] <natefinch> mgz: I'm pretty sure that gopkg.in only cares about that first dot
[16:12] <natefinch> mgz: we could do gopkg.in/juju/charm.v6-unstable.nate-minver
[16:13] <natefinch> lol almost semantic versioning at that point
[16:13] <mgz> charm.v6-unstable.featurename seems somewhat workable... right
[16:58] <natefinch> mgz: would you like me to poke at the CI code to get something like that working? I need it to get my min juju version stuff tested and landed
[16:59] <mgz> lp:juju-ci-tools git_gate.py I'd add a flag to do magic dot handling
[17:00] <natefinch> good lord bzr is slow
[17:00] <natefinch> mgz: thanks, I'll look at it
[17:00] <mgz> you should just be able to mangle the directory variable in go_test()
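The "simple mv statement" natefinch and mgz converge on can be simulated locally. This is a throwaway sketch of what such a CI step might do, using a temp dir instead of a real GOPATH; the directory layout is real gopkg.in convention, but the script itself is illustrative, not git_gate.py:

```shell
#!/bin/sh
set -e
# Fake workspace standing in for the CI GOPATH.
workdir=$(mktemp -d)
# Pretend CI cloned the feature branch under its branch-derived name.
mkdir -p "$workdir/src/gopkg.in/juju/charm.nate-minver"
echo 'package charm' > "$workdir/src/gopkg.in/juju/charm.nate-minver/charm.go"
# Rename the checkout to the import path the code actually expects,
# so builds against gopkg.in/juju/charm.v6-unstable resolve.
mv "$workdir/src/gopkg.in/juju/charm.nate-minver" \
   "$workdir/src/gopkg.in/juju/charm.v6-unstable"
ls "$workdir/src/gopkg.in/juju"
```

As noted above, the catch is that the expected import path ("what the code expects to be called") has to come from somewhere other than the branch name — hence the `charm.v6-unstable.featurename` naming-scheme idea.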
[17:03] <natefinch> mgz: can you rename my branch?
[17:04] <mgz> natefinch: I can
[17:08] <natefinch> lol, writing python just broke sublime
[17:08] <natefinch> bbiab
[17:41] <dimitern> frobware, voidspace, dooferlad, any of you guys still around?
[17:44] <frobware> dimitern, yep
[17:44] <dimitern> frobware, before I go today I need a review on the PR I'm proposing to unblock David
[17:45] <dimitern> frobware, doing a final live test on maas 1.9 now and I'll publish it
[17:45] <frobware> dimitern, OK
[17:45] <frobware> dimitern, it's on the maas-spaces branch so we're pretty contained
[17:45] <dimitern> frobware, yep
[17:49] <frobware> dimitern, the alternative is to just email a patch to David if you're unsure that you want to land it.
[17:49] <dimitern> frobware, oh boy :/ I've just discovered it won't work because of yet another maas bug
[17:50] <dimitern> frobware, http://paste.ubuntu.com/13673162/
[17:50] <dimitern> frobware, well, I can still propose it and land it, as it's useful but it won't unblock David until this is resolved
[17:56] <frobware> dimitern, why does that fail?
[17:56] <dimitern> frobware, no idea - looking at the maas logs now
[17:58] <dimitern> frobware, PR: http://reviews.vapour.ws/r/3322/
[17:58] <dimitern> frobware, I don't know why it includes already merged commits though :/
[17:59] <dimitern> my changes are only in provider/maas/environ.go and environ_whitebox_test.go
[17:59] <frobware> dimitern, is that because of my rebase?
[17:59] <dimitern> frobware, well, I did a rebase of my origin/maas-spaces onto upstream/maas-spaces
[18:00] <dimitern> frobware, and then rebased the last 2 commits on top of that
[18:00] <dimitern> anyway, I need to start packing..
[18:00] <frobware> dimitern, your commit touches a lot of files
[18:01] <frobware> dimitern, that's the single patch for David?
[18:01] <dimitern> frobware, most of the changed files come from voidspace's ProviderId for subnets/spaces
[18:01] <frobware> dimitern, are you merging from his branch?
[18:01] <dimitern> frobware, I guess I'll redo it cleanly starting from fresh upstream/maas-spaces
[18:02] <dimitern> frobware, no, I was rebasing.. well I did something wrong obviously
[18:03] <voidspace> dimitern: I'm here, sort of
[18:03] <frobware> dimitern, the review is split over two pages for me
[18:04] <dimitern> voidspace, false alarm it seems :)
[18:04] <dimitern> frobware, yeah, I'll redo it cleanly, but not now
[18:04] <voidspace> dimitern: how come that PR has my "already landed" changes in it
[18:05] <dimitern> voidspace, a git late-friday-evening mystery :)
[18:05] <voidspace> dimitern: heh