[00:03] wallyworld: can you cast your eyes over this when you have a moment: http://reviews.vapour.ws/r/1955 (ignoring unknown tools)
[00:03] already looking
[00:05] thanks :)
[00:16] davecheney: got a minute?
[00:17] davecheney: nm...
[00:17] thumper: free now
[00:17] wallyworld: let me try something first...
[00:18] ok
[00:18] davecheney: do you know why this happens? http://paste.ubuntu.com/11733285/
[00:19] (trivial is a GOARCH=arm binary)
[00:39] thumper: what's up
[00:39] davecheney: I was going to talk to you about dumping the logs on timeouts, but I have another approach right now
[00:39] * thumper is deep in it
[00:39] mwhudson: first guess, alignment
[00:40] hmm, not that, that's aligned properly
[00:40] mm
[00:41] yeah, no functions that have the same alignment get disassembled ok
[00:41] bah grammer
[00:41] it seems everything before main.main gets disassembled ok
[00:41] maybe thumb/arm confusion?
[00:42] http://paste.ubuntu.com/11733349/ <- i wonder what that ... is hiding
[00:46] * mwhudson reads binutils, is not enlightened
[00:47] that never helps
[00:48] FFFAAAAARRRRRKKKKKK
[00:48] (╯°□°)╯︵ ┻━┻
[00:48] ??
[00:48] I think I have found the cause of the deadlock test
[00:48] and it harks back to a worker not created properly
[00:48] not using the agent config for the logdir
[00:48] * mwhudson has a super bad feeling
[00:49] damnit...
[00:50] perhaps not...
[00:50] * thumper digs some more
[00:50] thumper: surely we don't need the rsyslog worker churning in the background all the time
[00:50] davecheney: i think it's 'marker symbols'
[00:50] to indicate thumb, data or arm
[00:50] mwhudson: O_O
[00:51] no...
[00:51] $a/$t/$d
[00:51] oh those buggers
[00:51] davecheney: while that sucks, it isn't the cause of this problem
[00:51] davecheney: they end up in the stuff from runtime/cgo
[00:51] and their friends $f and $t
[00:51] davecheney: this is a naked channel write
[00:51] and the last one is a $d i guess?
[00:53] thumper: I'm here now
[00:53] ok this is definitely the problem
[00:53] axw: that's ok, don't need you now :)
[00:54] excellent
[00:56] davecheney: yes, that's the problem
[00:56] I'm pretty sure I can make the tests pass and not block, but I have a feeling the blocking may still happen in the wild
[00:56] i guess i can use objcopy to remove them all...
[00:59] mwhudson: what do marker symbols do ?
[00:59] shift objdump into another 'mode' or something ?
[00:59] yes, exactly
[00:59] $a == what follows is arm insns
[01:00] $d == what follows is data
[01:00] $t == what follows is thumb
[01:00] i don't know if anything other than objdump cares about them
[01:02] given that GOARCH=arm uses constant pools after the insns, would be nice to generate them
[01:03] i'll file a bug and see what minux says :-)
[01:04] i think we already go to some length to strip them out
[01:05] ah!
[01:05] this is external linking only i bet
[01:06] yeah, ldelf ignores them
[01:06] well not $t, that's a bug i guess
[01:07] alternatively, i wonder if we can stop gcc making them
[01:07] wallyworld: hey sorry I missed the standup. I'll set aside volume destruction today and work on that relation-set bug for 1.24.1
[01:08] axw: np, ty. just reviewing the last branch so you could land those also in the background
[01:08] damn...
[01:08] that isn't it
[01:08] wallyworld: cool, thanks
[01:09] wallyworld: shouldn't be too far off now. need to update state to mark entities as Dead (possibly another big branch), and then the last big piece will be adding the lifespan binding
[01:09] then a couple of bits and pieces
[01:09] and CLI for persistent volume mgmt
[01:11] wallyworld: oh yeah, UI should be fairly trivial I think.
[01:11] i think so too
[01:13] thumper: how do I turn up the logging output for a test
[01:13] i tried -gocheck.v
[01:13] how do I get the debug output that we get when a test fails ?
[01:13] -gocheck.vv
[01:13] two v's means send it straight out
[01:13] this is what I'm going through now
[01:14] you need test.v too, or check.v doesn't do anything
[01:20] natefinch: wrong
[01:21] natefinch: it does do stuff, just perhaps also different with test.v
[01:23] thumper: hrmph... I've definitely had test.v be the difference between no output and some output, when specifying check.vv ..but maybe it was a different scenario
[01:30] thumper: I was thinking about proposing adding something like the hookLogger directly to loggo, since I want to do the same thing for the workload process plugins (i.e. redirect stdout to a loggo log). What do you think? Here's the hookLogger for reference: https://github.com/juju/juju/blob/master/worker/uniter/runner/logger.go#L23
[01:30] natefinch: I'm kinda deep in debugging right now
[01:31] thumper: np
[01:42] wallyworld: in your review you said "update the CI tests" -- are they in already?
[01:42] axw: not yet but soon. i should have added "when they're done"
[01:42] wallyworld: ok
[01:42] we'll create a card for next week
[01:43] wallyworld: can't do a feature test until we have the last two big pieces done. I'll make a card for it
[01:43] np, ty. i expected we may need to wait
[01:51] Bug #1466269 opened: bootstrap instance started but did not change to Deployed
[01:51] Bug #1466272 opened: Remove{Volume,Filesystem} etc. should behave like other Remove methods w.r.t. NotFound
[02:00] thumper: fix coming for 1466011
[02:01] it's a problem with the test suite helper that sets up the apiserver and creates a mock api.Info
[02:13] axw: so with that worker loop log change - won't that mean we log twice in production?
[02:14] wallyworld: yes. I can remove it, I'll just add it back in if I need to when testing
[02:14] ty
[02:14] maybe we can change test setup to show error
[02:15] wallyworld: with rootfs's DestroyFilesystems, it could be made into a no-op but I don't know if we should.
it could remove the directory we create in the agent's storage dir. I think sabdfl wanted stuff to be left behind though, for post-mortems
[02:16] axw: won't unmount be sad though if you try and use it against a dir that is not mounted?
[02:16] thumper: https://github.com/juju/juju/pull/2590
[02:16] menn0: https://github.com/juju/juju/pull/2590
[02:17] wallyworld: I don't understand your question. I'm not talking about Detach (which does umount), I'm talking about Destroy. currently Destroy does nothing.
[02:17] axw: oh sorry, thought we were talking about detach
[02:17] wallyworld: bad segue :)
[02:18] wallyworld: Create will create a directory in the agent's storage dir. Attach will bind-mount (if possible). Detach will umount. Destroy does nothing atm
[02:18] that sounds reasonable for rootfs
[02:19] wallyworld: only risk with that is that people will fill up their rootfs with crap, and it'll never get cleaned up even if they destroy the storage
[02:19] axw: bind-mount is used to map to different user specified location. but what if they don't ask for that mapping, then they access the dir on rootfs directly right?
[02:21] wallyworld: we create something like "/var/lib/juju/agent//storage/", then "mount --bind" that into the location specified in the charm, or the path that juju generates
[02:22] wallyworld: if mount --bind fails (e.g. strict LXC/AppArmor), then we just give them the latter location if (a) it's on the rootfs, and (b) it's empty
[02:22] axw: right, so my question is if there's no location specified in the charm, we just hand that created dir straight to the charm i think don't we
[02:22] wallyworld: we do the bind-mount-or-else regardless of whether the location is specified
[02:23] wallyworld: we could probably skip it if the location is unspecified, but the code's a bit simpler this way
[02:23] ok, that i couldn't recall off hand.
i was worried about if we didn't bind mount, then detach would fail when we tried to unmount
[02:24] wallyworld: that's why we check whether the location has the same mount-point as its parent dir. if it does, then it's not a bind mount
[02:24] wallyworld: and in that case detach is a no-op
[02:24] sounds good
[02:24] wallyworld: that also covers the case where we *did* bind mount, but we already detached; i.e. detach is idempotent
[02:24] and that's all tested?
[02:24] yep
[02:25] wallyworld: I have tested that in the past. the latest code is only tested in unit tests. I'll test it more rigorously when the bits are in place to mark things as dying etc.
[02:25] sure, ok
[02:27] wallyworld: oops, I already merged the log message. I'll do another branch to take it out
[02:28] axw: just do it as a driveby
[02:28] not that critical
[02:28] wallyworld: ok, I'll do it when I update the code to destroy filesystems
[02:28] sounds good
[02:31] wallyworld: would you mind also reviewing http://reviews.vapour.ws/r/1952? pretty straightforward one
[02:31] sure
[02:32] davecheney: looking
[02:33] axw: why remove environmanager = true?
[02:33] wallyworld: it's set by default in SetUpTest
[02:34] ok
[02:36] axw: done
[02:37] wallyworld: thanks
[02:44] davecheney: I think since you referenced a previous reviewboard thingy, it didn't add a new one
[02:44] davecheney: reviewing on GH
[02:44] davecheney: shipit
[02:45] thumper: sorry lost context
[02:45] my machine rebooted
[02:46] davecheney: https://github.com/juju/juju/pull/2590
[02:46] thumper: please review the comments here http://reviews.vapour.ws/r/1962/
[02:48] specifically my claims that we never write a bind all address into the jenv files
[02:50] ah fuckit
[02:50] * thumper is a dumb arse
[02:50] ?
[02:50] the blocking
[02:50] that I thought I fixed
[02:50] I realised why my fix didn't
[02:51] davecheney: yes we never write bind all addresses to the jenv
[02:55] waigani: thanks :)
[02:55] thumper: that's what I thought
[02:55] the test had caused the code to change to accommodate it
[02:55] jw4: np
[02:57] davecheney: am I right in recalling there is no way to see if a channel is closed before trying to send stuff down it?
[02:57] thumper: that is correct
[02:57] but the corollary is only the sender should close a channel
[02:58] if you have a channel, which you don't own (don't have the rights to close), then that is the bug
[02:59] the tests are hanging because the cert updater is trying to tell the api server that the cert has changed
[02:59] but I don't think the apiserver is listening any more
[02:59] it is a buffered channel, with 1 entry
[02:59] and this is the second change event...
[02:59] so it blocks
[03:00] which means the certupdater worker doesn't finish
[03:00] which means the agent doesn't finish
[03:02] thumper: if a channel is closed the sender will panic
[03:02] do you want a non blocking send ?
[03:03] no, I want to work out why the apiserver isn't consuming the change event
[03:03] I can make the tests pass by not running the cert change worker
[03:03] which a number of other tests do
[03:03] fair
[03:03] but I want to know why, because this may happen in the real world
[03:03] thumper: can I press you for a review of http://reviews.vapour.ws/r/1962/
[03:03] sorry menn0, no offence
[03:04] davecheney: i've just responded to and dropped those issues.
[03:04] davecheney: but another set of eyes wouldn't hurt
[03:04] cool
[03:04] thanks for the review
[03:04] if I submit this
[03:04] how can I check
[03:04] when will I know if it's fixed ?
[03:05] it looks ok to me...
[03:05] I am curious though
[03:06] about '127.0.0.1' vs localhost
[03:06] AIUI localhost will be [::] with IPv6
[03:06] false
[03:06] it is ::1
[03:06] ah
[03:06] anyway...
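The hang described above (a second send on a one-entry buffered channel that nobody is receiving from) can be reproduced in a few lines. The sketch below is not the juju code: `trySend`, `sendOrAbort`, `certChanged` and `done` are hypothetical names chosen for illustration. It shows the two usual ways a sender avoids blocking forever: a non-blocking send, and a send that gives up when a separate "done" channel is closed.

```go
package main

import (
	"fmt"
	"time"
)

// trySend attempts a non-blocking send and reports whether the value
// was accepted (either by a waiting receiver or by free buffer space).
func trySend(ch chan<- string, v string) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// sendOrAbort blocks until the value is sent or done is closed,
// whichever happens first. It reports whether the send happened.
func sendOrAbort(ch chan<- string, v string, done <-chan struct{}) bool {
	select {
	case ch <- v:
		return true
	case <-done:
		return false
	}
}

func main() {
	// Hypothetical stand-in for the cert-changed channel: buffered, one entry.
	certChanged := make(chan string, 1)

	fmt.Println(trySend(certChanged, "first"))  // buffer has room
	fmt.Println(trySend(certChanged, "second")) // buffer full, no receiver

	// A plain `certChanged <- "second"` here would block forever.
	// Selecting on a done channel lets the sender give up on shutdown.
	done := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(done)
	}()
	fmt.Println(sendOrAbort(certChanged, "second", done))
}
```

A non-blocking send silently drops the event, which is why it does not answer the question of why the receiver stopped consuming; the done-channel variant only keeps the sender from hanging during shutdown.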
[03:06] the problem is if you have a net.Listener
[03:06] no
[03:06] you do
[03:06] do we need the listener in the base test to support ipv4 and v6?
[03:07] l, _ := net.Listen("tcp", ":0")
[03:07] the address reported by l.Addr(), will be a wildcard address
[03:07] there is no way to convert that into a dialable address
[03:07] see my second comment
[03:08] we can bind to a wildcard address and then assume localhost points back to the wildcard (safe assumption) if you like
[03:08] what confuses me, is whatever logic writes out the jenv file, must not use server.Addr(), because that was previously hard coded to be "localhost:35687" (port number guessed)
[03:35] have I mentioned recently how completely fucking crazy our codebase is?
[03:37] I'm feeling the need to go to the gym and pound on a heavy bag
[03:37] * thumper feels the rage rising
[03:41] * davecheney pats thumpers shoulder, soothingly
[03:49] * thumper breathes deeply
[03:53] thumper: what can I do to help ?
[03:54] nothing right now
[03:58] i have my shotgun at the ready
[03:59] davecheney: in which case, shoot our watcher infrastructure
[04:00] because it is truly shite
[04:00] * thumper crosses fingers
[04:01] I *think* I may have it
[04:02] ok...
[04:02] for the first time since I started looking, I got the test to pass 5 times in a row
[04:02] previous failure rate was 20-33%
[04:02] running 20 times now
[04:02] BTW, our code is shite
[04:03] and distributed, asynchronous, parallel, real life network systems are fucking hard to get right
[04:03] I feel the likelihood of success rising
[04:04] 9 passes in a row
[04:09] * thumper submits fix
[04:14] http://reviews.vapour.ws/r/1964/
[04:14] for the curious
[04:14] davecheney: you are reviewer right?
[04:17] reviewing
[04:17] * thumper runs out to collect daughter
[04:17] * thumper pauses
[04:18] * thumper runs again
[04:19] thumper: i dunno
[04:19] this looks like it's fixing the symptoms, not treating the cause
[04:37] thumper: I agree with davecheney, that fix seems like a band aid to me
[04:38] thumper: great that you figured out the problem though. that's friggin awesome.
[04:39] thumper: i'm also concerned that the pr description mentions channel sends and bare writes
[04:40] but the patch doesn't make any change in that respect
[04:40] either the code, or the diagnosis, or the description are wrong
[04:48] thumper: i've just had a dig and i see that fixing this properly is going to be somewhat harder
[04:49] thumper: i reckon merge the fix so far and then add a card to do a better fix next iteration
[04:50] thumper: Susanna has a meeting now so I have dad duties
[04:51] thumper: but I will do some more work later tonight.
[04:53] axw: whenever you get a chance http://reviews.vapour.ws/r/1965/
[04:53] wallyworld: awesome!
looking
[04:58] davecheney: part of the problem was that the way the notify workers are, they don't pass the tomb in to the handle method
[04:58] so we can't select on it
[04:59] that is a somewhat bigger refactoring
[04:59] but perhaps worthwhile
[04:59] davecheney: I think one of the biggest issues was trying to work out what the problem is
[04:59] davecheney: and the more I think about it, the more I think we should pass the tomb into the handler functions
[05:00] davecheney: because anyone doing any channel work in the workers should use it
[05:00] * thumper enfixorates
=== kadams54 is now known as kadams54-away
[05:11] wallyworld: reviewed
[05:11] ty
[05:52] davecheney: http://reviews.vapour.ws/r/1964/diff/# now passing the tomb dying channel to the handle func
[05:53] davecheney: so to treat the problem not just the symptom
[05:56] * thumper heading out to dinner now
[05:56] back later for meetings
=== frankban_ is now known as frankban
[07:24] meanwhile, reviews from the future http://dave.cheney.net/paste/wat.png
[07:25] *lol* "3 minutes from now"
[08:07] good morning and happy birthday, dimitern
[08:09] TheMue, thank you! and good morning indeed :)
[08:09] dimitern: want to continue our ipaddress id discussion?
[08:11] dimitern: the current id may not be unique for future solutions, so we have to change it anyway. and the output for logging could be done by a fine String() on IPAddress
[08:12] dimitern: also the id has the only role to identify one record, finding those matching some criteria is task of a query
[08:12] dimitern: so far ok?
[08:14] TheMue, I think we should get "ipaddress--" implemented as a tag, space can be hardcoded to "default" for now
[08:14] dimitern: where value is the ip address or hostname?
[08:15] dimitern: could it be possible to have multiple same named spaces in one database in the future?
[08:18] davecheney: still around? http://reviews.vapour.ws/r/1964/
[08:19] TheMue, the value is either an ip or hostname, yes
[08:20] dimitern: so my question remains, while a value is unique for a space, could the combination of space and value be double in one database?
[08:21] TheMue, however... hmm to make it easier to parse (as space names can have dashes in them, as can hostnames), let's use "ipaddress-@" ?
[08:21] TheMue, no it should not be possible
[08:22] TheMue, we can have "ipaddress-default@localhost", but that's just stupid to store in state
[08:24] a local-machine scoped address/hostname cannot participate in a space
[08:24] fwereade: ping
[08:24] dimitern: ok, reasonable
[08:25] fwereade: https://plus.google.com/hangouts/_/canonical.com/chat and http://reviews.vapour.ws/r/1964/diff/#
[08:25] as it defeats the purpose of the space being a collection of interchangeable subnets which can see each other
[08:25] dimitern: thought about fwereade 's arguments for having a generated unique key, may be UUID but also could be different one?
[08:25] o/ dimitern and TheMue
[08:25] dimitern: it will always be unique, regardless what we think about in the future
[08:25] thumper: o/
[08:25] thumper, hey there
[08:27] TheMue, for one, space names are unique - even across environments, as they can be reused by multiple environments once created
[08:28] dimitern: really? are you sure?
[08:28] TheMue, about which part?
[08:28] thumper, sorry ^^
[08:28] space names unique across environments
[08:28] I'm not entirely sure that makes sense
[08:29] consider this...
[08:29] thumper, o/
[08:29] yes, as the point is for the admin to set them up once and then they can be discovered for new envs
[08:29] bob in environment A in system B creates space C
[08:29] dimitern: if env foo has space bar and env yadda has space bar?
[08:29] mary in environment D in system E creates space C
[08:29] bob's env A migrated to system E
[08:29] boom
[08:29] fwereade: in the hangout
[08:30] spaces are bound to the underlying substrate, not to an environment per se
[08:30] * TheMue still prefers keys with global unique ids
[08:30] e.g. space "foo" means in AWS a bunch of subnets with tag "juju-space-foo"
[08:31] however, if we have environments on different substrates, they can have the same space names I guess
[08:32] dimitern, please just use an opaque key
[08:32] TheMue, ok, I'm convinced - let's go with: 1) add a UUID field to ipaddressesDoc (and a unique index on it), 2) upgrade step to generate one if missing, 3) use "ipaddress-" for tag format, 4) finish the addresser and change other workers that use api + addresses to use tags
[08:33] dimitern: +1, will do (and add cards)
[08:34] TheMue, cheers
[09:01] omw btw, but hangout dislikes me :(
[09:03] dimitern: voidspace: dooferlad: same troubles with hangout?
[09:03] TheMue, no, just joined
[09:03] TheMue: no, but make sure you are using the right link. It isn't the same every day...
[09:04] dooferlad: I've used the one from the mail notification
[09:04] TheMue: OK, that is odd
[09:04] dooferlad: it hangs at "try to joining the call"
[09:04] jam, fwereade, standup?
[09:04] dooferlad: will restart my browser
[09:05] on our way
[09:49] dimitern: ... ¸¸♬·¯·♩¸¸♪·¯·♫¸¸Happy Birthday To You¸¸♬·¯·♩¸¸♪·¯·♫¸¸ ...
[09:54] anastasiamac, thank you so much :)
[09:56] :D
[12:11] fwereade: http://reviews.vapour.ws/r/1968/
[12:27] wallyworld, LGTM with a trivial
[12:28] fwereade: actually, i'm testing on the scenario outlined in the bug and it's broken again with this fix. i've done an install hook that never exits. i'm wondering if this would have worked in 1.21
[12:29] wallyworld, force-destroy machine always should have, yes
[12:29] hmmm.
i can't see off hand how the latest fix is different from 1.21
[12:29] wallyworld, I suspect that something about the status changes has made possible a db state which breaks unit.Destroy
[12:30] wallyworld, the fact that it's triggering in obliterateUnit is coincidental
[12:30] hmmm. i'll need to look at the logs i guess
[12:31] wallyworld, if you've got a repro you should look at the actual transactions we tried to run against that unit
[12:32] wallyworld, and it sounds like a stable broken state it gets into
[12:32] wallyworld, that way you can at least see what ops the chain of failed build attempts produced
[12:33] wallyworld, see if they're changing (there's a race we've missed) or stable (we've fucked up the memory/db assertion equivalence)
[12:33] yup
[12:33] sigh
[12:33] so i guess there's another problem somewhere besides the one just fixed
[12:34] yeah
[12:34] wallyworld, if we're seeing reproducible ErrExcessiveContention we should at least be able to repro it in unit tests and have a fix that rests on something more durable than mere painstaking analysis ;p
[12:35] tests pass, i've got to figure out failure scenario
[12:36] i don't know that we have a unit test for a hanging hook
=== anthonyf is now known as Guest73520
[12:58] fwereade: oh, i'm a dick - i forgot --upload-tools. just verified and it's fine
[13:09] wallyworld, phew :)
[13:09] fwereade: i replied to your comment.
also, we don't define a slice of statuses so hard to cycle through them
[13:10] wallyworld, yeah, I thought so
[13:10] but so long as it's != Allocating that's all we need
[13:10] wallyworld, and, ah yes, ofc, I'm dumb
[13:10] wallyworld, ship it :)
[13:10] i just added those extras just because
[13:10] ty
[13:11] at least cause i forgot upload tools i reproduced it :-)
[13:23] Bug #1466498 opened: serverSuite.TestStop fails
[13:26] Bug #1466498 changed: serverSuite.TestStop fails
[13:35] Bug #1466498 opened: serverSuite.TestStop fails
[13:53] Bug #1466513 opened: destroyTwoEnvironmentsSuite teardown fails
[13:53] Bug #1466514 opened: TestCertificateUpdateWorkerUpdatesCertificate fails
=== mgz_ is now known as mgz
[14:14] Bug #1466520 opened: serviceSuite setup fails
[14:14] Bug #1466525 opened: restore fails due to mongo login failure
=== Spads is now known as M-SpaceHobo
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== M-SpaceHobo is now known as Spads
[14:36] wwitzel3: lmk if you would like to skip our 1:1
[14:40] wallyworld, don't suppose you're awake?
[14:43] katco: right, I was planning to take some medicine and lay down for a bit, unless you had something specific we needed to cover
[14:43] wwitzel3: get some rest
[14:44] ericsnow: natefinch: do you need anything?
[14:45] katco: not currently :)
[14:45] katco: no, thanks
[14:46] katco: other than a LGTM on that metadata patch :)
[14:46] ericsnow: done :)
[14:46] katco: thanks!
[14:47] i'd very much like a review of this from someone in juju-core please.
moving towards having a client-inspectable schema for juju environment config: https://github.com/juju/juju/pull/2597
=== rogpeppe1 is now known as rogpeppe
[14:56] perrito666, I had fun playing with the new service status yesterday
[14:57] it is mightily useful
[14:57] I especially like the status-history
[14:57] I am glad :D I had fun playing with it too :p
[15:12] hey, quick question perhaps somebody here knows the answer
[15:13] does juju use some sort of connection pool for mongodb? or is a new connection established for each query?
[15:14] perrito666: ^^^
[15:15] mramm: it creates a new session for each connection
[15:15] mramm: *but*
[15:15] mramm: mgo is supposed to do connection pooling for us
[15:15] mramm: including disposing of dead connections
[15:17] voidspace: so there is connection pooling in mgo?
[15:17] mramm: yes
[15:17] thanks!
[15:38] perrito666, was just looking at your github PR, one question:
[15:38] * perrito666 braces
[15:39] perrito666, this is just testing the existing behaviour, right? in which update-status fires only when the timer ticks?
[15:40] fwereade: yes, as that pr does not add the new behaviour
[15:40] perrito666, great, so that is why I can't see any changes in startupHooks despite the addition of update-status to baseCharmHooks :)
[15:40] I split both things after you suggested it yesterday
[15:41] fwereade: yes :)
[15:44] perrito666, ok, LGTM with some naming quibbles, but check back with ian about the idle timer
[15:45] fwereade: I will tx
[15:45] perrito666, I'm pretty confident it shouldn't be below 5s
[15:45] perrito666, even if everything responds super-fast
[15:45] * perrito666 uses randInt(2,10)
[15:45] perrito666, relation-set gets written when hook completes
[15:46] perrito666, it could easily take 5s before a counterpart even sees that
[15:47] perrito666, and even if *that* unit runs a hook with a response really fast
[15:47] perrito666, we won't see it until 5s after it did at best
[15:47] perrito666, and more likely 10s
[15:47] fwereade: this https://github.com/juju/juju/pull/2586#discussion_r32743555
[15:48] wallyworld, this conversation is worth reading back when you're around
[15:49] perrito666, wallyworld, I'd actually say that it's more like 15s of idleness before you can have confidence in your idleness
[15:49] it is because as the logic stands now the default is inactive until and unless the call to getMetricsTimer(charm) says otherwise
[15:49] fwereade: I'll make sure wallyworld gets to this conversation tonight
[15:49] perrito666, think of the timerchooser on its own
[15:49] perrito666, what makes the inactive one "default"?
[15:50] perrito666, you must at least have "active" and "inactive" as well as "default"
[15:50] perrito666, otherwise "default" is just another name for "inactive"
[15:50] perrito666, but I maintain that inactive is *not* the default
[15:51] fair
[15:51] perrito666, would you implement a BoolChooser with methods True and Default?
[15:51] ;p
[15:52] perrito666, wallyworld: if you think this is reason enough to tweak the watcher resolution, so we can detect idleness early, I'm prepared to investigate that but we'd need to stress-test it pretty hard
[15:55] perrito666, wallyworld: (and I think there are UX considerations too -- ISTM that traffic lights that flicker back and forth will inspire less confidence than those that take a little longer to go solid green and stay there)
[15:56] fwereade: I definitely would compare this with hdd light rather than traffic lights
[15:56] back when hdd light was useful
[15:56] Bug #1466565 opened: Upgraded juju to 1.24 dies shortly after starting
[15:57] perrito666, that's what solid yellow indicates
[15:57] perrito666, "I'm doing stuff, leave me be"
[15:57] perrito666, solid green indicates "everything is configured and working as it should"
[15:58] I'll agree nevertheless that entering and leaving idle too fast feels a bit like whack-a-mole
[15:58] perrito666, if our units look like they can't decide which state they're in we all look dumb
[15:58] perrito666, look at it this way
[15:59] perrito666, status needs to represent things users care about
[15:59] perrito666, they will pay attention to, and react to, changes in status
[16:00] perrito666, but every status change they don't need to react to dilutes the value of the data stream
[16:00] perrito666, (and IMO yellow->green should inspire the reaction "great, now I can relax")
=== kadams54_ is now known as kadams54-away
[16:08] Bug #1466570 opened: Juju add-unit stuck on new lxc containers
[16:18] fwereade: here's another step along the provider schema thing. i didn't change the EnvironProvider interface yet because i didn't want to potentially break external provider implementors yet. http://reviews.vapour.ws/r/1969/
[16:18] rogpeppe, nice
[16:18] rogpeppe, queued it :)
[16:18] fwereade: ta!
[16:19] fwereade: i won't hold my breath :)
[16:19] * fwereade considers rogpeppe to be a wise man
[16:26] heh, the deputy repo I made 2 days ago under github.com/juju already has 12 stars
[16:31] mramm: there is connection pooling *but* it's likely that each concurrently running operation uses one connection
[16:31] mramm: which is something we've been dealing with in the charm store recently
[16:32] mramm: see https://github.com/go-mgo/mgo/issues/124
[16:33] natefinch: i was thinking of suggesting for someone recently, but the stderr output needed too much massaging for it to be useful as is
[16:34] natefinch: i wonder if it might be good to have an optional TransformStderrToError([]byte) error (or something) function that can be used to cope with situations like that
[16:36] rogpeppe: you can always get the error string and massage it afterward, if you really need to. That seems a little too specialized to broaden the API. Generally the stderr-as-error is only good if you know the command produces reasonably sized stderr output.
[16:37] natefinch: that's true
[16:37] natefinch: in the case above, we really wanted just the first line
[16:38] natefinch: i wish there was some nice way of dealing with multiline errors
[16:40] gah, I wish loggo had non-f methods
[17:00] katco: so, with a bunch of people already watching the repo I created (github.com/juju/deputy), I'm hesitant to follow through with my previous plan of deleting it in order to follow our new process. Maybe we can make this the exception, and just make all future work on the repo via PRs?
[17:01] natefinch: that's fine. but we still need the code reviewed
[17:01] katco: absolutely
[17:13] rogpeppe: btw, updated the deputy package to not use the cute options thingy anymore. I think it's a lot more understandable without it. Also added a way to have it pipe the output to a logging function, similar to how we do hook output.
[17:14] natefinch, katco: what's the "new process" ?
[17:14] rogpeppe: first, you like 3 and a half candles
[17:14] rogpeppe: create the repo with readme & license only, then add all code via PRs
[17:14] rogpeppe: then, you place a blue lotus from the slopes of mt. kilimanjaro in the middle
[17:14] katco: well good start, i already like candles
[17:14] lol
[17:15] rogpeppe: er, what nate said. it will be on the wiki soon
[17:15] katco: ha, i'd forgotten we even had a wiki :)
[17:15] the candles and lotus are optional but recommended
[17:16] rogpeppe: we started using the wiki after I got annoyed with the state of our developer documentation
[17:16] rogpeppe: the github wiki that is
[17:16] natefinch: where is it?
[17:16] rogpeppe: https://github.com/juju/juju/wiki
[17:17] natefinch: what are the interface{} arguments to StdoutLog and StderrLog ?
[17:17] natefinch: wouldn't that be better as just a single string arg?
[17:17] I need an adult if there's one around?
[17:17] rogpeppe: it's a concession to the way most log packages work, e.g: http://golang.org/pkg/log/#Print
[17:18] mattyw: i can pretend
[17:18] natefinch: i think it would be better a func(string)
[17:18] as
[17:18] ah rogpeppe you might be just the man - your name is on the file I'm interested in
[17:18] mattyw: i'm afraid that might be true of quite a few files :)
[17:18] rogpeppe: I started that way... I can put it back. It means you need to wrap most log package's logging functions, but it is a lot more clear what it'll send
[17:19] natefinch: yeah, i think it's worth it.
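The trade-off discussed above (a `func(string)` callback versus a `log.Print`-style variadic one) comes down to a one-line adapter on the caller's side. The sketch below is illustrative only and does not reproduce the deputy package's API: `fromPrint` and `withPrefix` are hypothetical helpers. It shows the wrapping a caller would need for a standard-library logger, and how a plain string sink makes adding a prefix straightforward.

```go
package main

import (
	"log"
	"os"
)

// fromPrint adapts a log.Print-style function to a plain func(string) sink.
func fromPrint(print func(...interface{})) func(string) {
	return func(line string) { print(line) }
}

// withPrefix wraps a sink so that every line is sent with a prefix.
func withPrefix(prefix string, sink func(string)) func(string) {
	return func(line string) { sink(prefix + line) }
}

func main() {
	// No timestamp flags, so the output is exactly the message.
	logger := log.New(os.Stdout, "", 0)

	// A hypothetical stderr callback built from the standard logger.
	stderrLog := withPrefix("stderr: ", fromPrint(logger.Print))
	stderrLog("disk full")
}
```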
[17:19] natefinch: and it's easier to use if you want to add a prefix, for example
[17:19] rogpeppe, for acme fans: apiserver/charms/client_test.go:L120 for github fans https://github.com/juju/juju/blob/master/apiserver/charms/client_test.go#L120
[17:19] rogpeppe: yeah, I agree, after looking at a consumer using loggo
[17:19] mattyw: not *quite* :)
[17:19] mattyw: no L
[17:20] rogpeppe, that call seems to be testing the CharmInfo function in the client, but as it's in the charms facade package I'd be expecting it to test that one right?
[17:20] rogpeppe, ah yeah - sorry
[17:21] mattyw: it's quite usual to test the server side functions by calling the client side functions
[17:22] mattyw: because the client side functions work by invoking the server side functions
[17:22] rogpeppe, but shouldn't it be calling the charms facade client?
[17:22] rogpeppe, there are two charminfo functions one in the charms facade, and another in the enormous client
[17:23] mattyw: ah, i guess
[17:23] mattyw: that stuff is well after my involvement in this area
[17:24] mattyw: i've no idea why there are two implementations of the same call
[17:24] rogpeppe, I'm not looking for blame - I'm looking for another set of eyes to check my sanity
[17:28] mattyw: looks like you're probably right
[17:28] mattyw: i've no idea why the duplication; i guess all the tests need to be duplicated too
[17:28] rogpeppe, there is some dubiousness added by the basecharmSuite and charmSuite stuff
[17:29] rogpeppe, duplication is probably a wip attempt at moving away from the huge client
[17:29] mattyw: i would guess that too
[17:29] mattyw: fat lot of difference it makes to the understandability or maintainability of the code though :)
[17:29] if a state server is in a bad way, is it safe to recommend cycling it? or is juju going to do something weird like uninstall itself?
[17:30] katco: restarting it should be just fine [17:30] katco: restarting the service, that is [17:30] katco: in case you had something else in mind [17:31] rogpeppe: i was just suggesting to cycle the container (it's local) [17:31] katco: reboot it? [17:31] rogpeppe: yes [17:31] katco: you should probably be ok. how is the server in a bad way? [17:32] rogpeppe: https://bugs.launchpad.net/juju-core/+bug/1466565 [17:32] Bug #1466565: Upgraded juju to 1.24 dies shortly after starting [17:33] rogpeppe: it looks to me like somehow juju got in a bad node in its state machine [17:34] katco: the log linked to in that issue doesn't look like it shows the issue [17:34] katco: it says it's restarting because an upgrade is available [17:35] rogpeppe: it never comes back up though, wondering if it actually shut down [17:35] katco: if you just restart the state server service, do you just get the same thing? [17:35] rogpeppe: waiting to hear from sparkiegeek [17:35] rogpeppe: if you're interested, #juju@canonical [17:40] It's been so long since I've done some juju stuff I have no idea what I'm doing [17:41] mattyw: just write if err != nil { return err } a bunch of times, and it'll come back to you ;) [17:41] lol [17:42] natefinch, go is fine, it's the facades that do it :) [17:42] mattyw: yeah, juju is a beast of a codebase. Just the pure LOC make it hard to get a handle on [17:44] natefinch, ah ha! I've just remembered the magic [17:44] (the juju in juju you could say) [18:24] Bug #1466565 changed: Upgraded juju to 1.24 dies shortly after starting [18:27] How come my software updates all get shown with the description instead of the name of the package? e.g.: http://i.imgur.com/E4Q3uBV.png Is that an ubuntu bug, or some weird setting I set by accident? 
[18:33] Bug #1466565 opened: Upgraded juju to 1.24 dies shortly after starting [18:36] Bug #1466565 changed: Upgraded juju to 1.24 dies shortly after starting === kadams54-away is now known as kadams54_ [18:49] ericsnow: unintended nice side effect of using json for plugin output - it's easy to output without line returns [18:49] natefinch: yay [18:51] is anyone familiar with the FORCE-VERSION file? [18:51] natefinch: ah, same happened today [18:51] perhaps the name is collapsed? [18:54] Bug #1466565 opened: Upgraded juju to 1.24 dies shortly after starting [18:57] Bug #1466565 changed: Upgraded juju to 1.24 dies shortly after starting [18:58] katco: I think that's what gets included when you use --upload-tools [19:00] I think it is necessary to allow juju to use the uploaded version of juju (for which there is no stream) [19:02] perrito666: apparently not only then. was installed by default by just bootstrapping 1.23.3 [19:03] natefinch: ^^ [19:06] Bug #1466565 opened: Upgraded juju to 1.24 dies shortly after starting [20:09] Bug #1466629 opened: Containers fail to get ip when non-maas dhcp/dns is used [20:42] ericsnow: how do I upload a patch to reviewboard? [20:51] thumper, not that you need any distractions today, but when you have a moment I need your help today [20:51] heh [20:51] ok, will ping you when I have time [20:57] ericsnow: if you have time before EOD, could you take a look at the deputy repo? 
I made a PR to a branch that shows all the code: https://github.com/juju/deputy/pull/1 I don't know how to get that onto reviewboard or if we care (can always review on github) [20:59] (or anyone else up for a review ^ ) === kadams54_ is now known as kadams54-away [21:24] Bug #1466660 opened: Unable to create hosted environments on EC2 [21:24] * fwereade would like to express a brief moment of rage to discover that we're still just crapping charm.Meta et al into the database [21:24] * fwereade is sure that multiple people have promised to fix that over the last year [21:50] fwereade, you should be sleeping, but if it makes you feel better I am listening [21:51] wallyworld: thumper: if anyone in your half of the world gets some bandwidth, this bug is targeted for 1.24.2: bug 1466565 [21:51] Bug #1466565: Upgraded juju to 1.24 dies shortly after starting [21:52] alexisb, I have subsided to mild background grumpiness :) [21:52] menn0, ping [22:02] fwereade: hi [22:06] what happened to 1.24.1? [22:07] thumper, what do you mean what happened to it? [22:07] alexisb: katco says it is targeted to 1.24.2? I thought we just released 1.24.0 [22:08] thumper, we are going to cut 1.24.1 soon [22:08] thumper: we have a 1.24.1 as well [22:08] so new bugs may be targeted to 1.24.2 instead of 1.24.1 [22:10] menn0, re the stateCollection interface [22:10] menn0, what motivated the set of methods we picked? [22:10] * menn0 looks [22:11] fwereade: they were what was in use with Juju at the time of writing [22:11] fwereade: it's grown a few methods since it was created as people have wanted more [22:12] ericsnow: do you know what we have forked because of lack of 1.4? was it just the stdlib? [22:12] fwereade: why do you ask? 
[22:12] katco: the only things of which I'm aware are govmomi and that package for windows [22:14] menn0, I wanted to use it myself from a state subpackage to take advantage of the env-aware bits [22:14] menn0, and I found myself thinking "hmm, I'm pretty sure we shouldn't ever Insert" [22:14] menn0, and I found where we inserted and GAAAH WTF in more than one way [22:15] ericsnow: looks like sys package from stdlib: http://reviews.vapour.ws/r/1897/diff/# [22:15] menn0, so I got scared and I looked for Update and couldn't spot it [22:15] fwereade: AddCharm is one place we do where we shouldn't [22:16] katco: that's not the stdlib sys package, is it? [22:16] ericsnow: i believe so [22:16] menn0, and then I just found UpdateId and got grumpy again because I'm damn sure I told thumper *not* to do txn-layer-skipping writes into docs that use txns, and to please put LastLoginTime somewhere else [22:16] ericsnow: golang.org/x/sys --> github.com/gabriel-samfira/sys [22:17] katco: there's a syscall in the stdlib, but no sys [22:17] menn0, so, ehh, anyway [22:17] fwereade: I think we agreed to disagree on that point... 
[22:17] and I felt that the overhead of putting it elsewhere wasn't worth it [22:17] and no one watches it [22:17] ericsnow: sorry, supplemental library [22:18] thumper, I apologise then for my lack of clarity [22:18] katco: regardless, yeah, that's the one [22:18] thumper, the txn system is a layer as much as the api is [22:20] thumper, and now we've got a document with a magic field that needs as much documentation as everything else in that serialization thing [22:21] thumper, and depends on people reading and understanding and remembering and eschewing the temptation to just autocomplete something that looks like a good idea [22:21] thumper, there are times when we want txn and times when we don't [22:22] thumper, but we need very very good reasons to mix them [22:22] thumper, the relation data cleanup ones are bad enough and they only touch documents known to be unreferenced by the rest of the system [22:24] thumper, mixing txn-aware and non-txn-aware fields in the same document just triggers my twitch-froth-gesticulate reflex [22:24] * thumper watches fwereade twitch [22:24] * fwereade is indeed something to behold [22:25] thumper, I'm also suffering residual grumpiness from observing that the charm document schema is still largely coming from a different repository [22:26] fwereade: ENOCONTEXT [22:26] thumper, ehh, this is all coming from our chat this morning [22:27] thumper, trying to get the stateCollection interface out of state so I could consume things that implemented it and get useful env-awareness in my lease bits [22:27] environment based lease collection? 
[22:28] thumper, I think most of the lease namespaces will be per-environment, yes [22:28] thumper, I could hand-hack it but that would be dumb [22:29] agreed [22:29] thumper, so anyway I started looking at what we actually did with that interface and that led me to the naked Insert of the charm document and then I looked at the charm serialization and gibber-gibber-flail-moan [22:30] thumper, menn0: so, what I want to do is [22:30] the serialization structure isn't in the state package? [22:30] that seems like a fail [22:31] thumper, it is a goddam massive fucking failure and I am certain that more people than just those working on metrics and actions have promised to (1) DTRT themselves and (2) fix the legacy shittiness [22:31] thumper, and yet it has somehow failed to occur in both those instances [22:32] alexisb, huh, it looks like I did have a rant in me after all :/ [22:32] haha [22:32] I'm thinking my fix I landed yesterday should probably be ported to 1.24 [22:33] hi, is it normal to run out of space on juju machine-0??? [22:33] im using aws default constraints on bootstrap [22:33] juju is using 7.5G of 8GB [22:33] redelmann: check the logs [22:34] as in, check if the logs are overly sized [22:34] oversized [22:34] perrito666, 36M of logs [22:34] thumper, and even more infuriatingly [22:34] perrito666, /var/lib/juju/db is using 3.7GB [22:35] thumper, there's a *bit* of input sanitization for *one* of those fields [22:35] * thumper chuckles [22:35] thumper, but that just munges the *Config into a different one [22:36] thumper, and just for fun that only happens on *one* of the two code paths that can lead to config data being written to the document [22:36] haha [22:36] fwereade: were you around yesterday when I said how shite our codebase was? 
[22:36] nothing surprises me any more [22:36] redelmann: could you provide a detailed du -shc /var/lib/juju/* [22:36] redelmann: pastebin :) [22:37] thumper, so, I do not think that graduating reviewers has done us any good [22:38] fwereade: I don't think that is the source of the problem to be honest [22:38] thumper, it is all too easy for these things to happen when both the coder and the reviewer have only passing familiarity with the code [22:39] thumper, I think it was an attempt to enforce tighter controls and has not really worked out on that front [22:39] perrito666, http://paste.ubuntu.com/11737759/ and http://paste.ubuntu.com/11737763/ [22:39] fwereade: architectural documentation pls [22:39] redelmann: looking [22:39] fwereade: the code doesn't convey all the knowledge needed to review [22:39] fwereade: there are too many devs to push all reviews through the relatively few people who know more how things hang together [22:39] katco: agreed [22:40] katco, it is hard to bring oneself to do that when the stuff I have written seems to be completely ignored [22:40] fwereade: we need a wiki page of "distilled fwereade ranting" [22:40] perrito666, mhh.. look at this: almost 1 hour of this "ps -aux" http://paste.ubuntu.com/11737765/ [22:41] katco, that is no doubt at least partly a perception issue [22:47] wallyworld: did you fix 1452745 in master too? [22:48] bug 1452745 [22:48] Bug #1452745: 386 constant 250059350016 overflows int <386> [22:48] i think so [22:48] i'll check [23:01] (╯°□°)╯︵ ┻━┻ [23:02] thumper: yes, it got ported forward [23:02] https://bugs.launchpad.net/juju-core/+bugs?field.tag=intermittent-failure&field.tags_combinator=ALL [23:02] * thumper sadface [23:12] thumper: ow! that's a lot of "intermittent" failures... [23:12] anastasiamac: yeah... 
[23:13] something for the bug squad to focus on [23:13] :D [23:13] I'm hoping my team will have some time next cycle [23:13] s/cycle/iteration/ [23:13] well, dave is living in the future - he must have more time than anyone :D [23:14] according to rb at least :D [23:31] yup, living in the future, three minutes at a time [23:32] make it 8 and u'd be almost like the sun :)) [23:36] thumper, menn0: oh, yeah, before sleep: I want to make that interface smaller and at least force the questionable bits to go via .Underlying() so they know they're being bad [23:36] thumper, menn0: objections? [23:37] thumper, menn0: and I know it doesn't cover everything and you can write via Query but one thing at a time :) [23:38] thumper, menn0: (I refer to state.stateCollection) [23:38] fwereade: that sounds reasonable to me [23:40] fwereade: I defer to menn0 on this [23:41] menn0, thumper: cool, thanks, and happy weekends both :) [23:41] cheers
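(Editor's sketch of the proposal above to shrink the stateCollection interface and route writes through .Underlying(). It is a rough illustration, not juju's actual code: rawCollection is a map-backed stand-in for the real database collection, readCollection and envCollection are hypothetical names, and the "envUUID:id" key scheme is an assumption made so the example runs without a database.)

```go
package main

import (
	"errors"
	"fmt"
)

// rawCollection is a hypothetical stand-in for the underlying database
// collection. It is not environment-aware and allows direct writes.
type rawCollection struct {
	docs map[string]string
}

// FindId looks up a document by its full (already prefixed) id.
func (c *rawCollection) FindId(id string) (string, error) {
	doc, ok := c.docs[id]
	if !ok {
		return "", errors.New("not found: " + id)
	}
	return doc, nil
}

// Insert writes a document directly, bypassing any transaction layer.
func (c *rawCollection) Insert(id, doc string) { c.docs[id] = doc }

// readCollection is the narrowed interface: reads only, plus an explicit
// escape hatch so that questionable writes are visible at the call site.
type readCollection interface {
	FindId(id string) (string, error)
	Underlying() *rawCollection
}

// envCollection scopes lookups to one environment by prefixing ids
// (assumed scheme for this sketch).
type envCollection struct {
	raw     *rawCollection
	envUUID string
}

func (c *envCollection) FindId(id string) (string, error) {
	return c.raw.FindId(c.envUUID + ":" + id)
}

func (c *envCollection) Underlying() *rawCollection { return c.raw }

func main() {
	raw := &rawCollection{docs: map[string]string{}}
	var coll readCollection = &envCollection{raw: raw, envUUID: "env-1"}

	// A direct write is only reachable through Underlying(), and the
	// caller has to handle the environment prefix itself.
	coll.Underlying().Insert("env-1:lease", "holder=unit/0")

	doc, err := coll.FindId("lease")
	fmt.Println(doc, err)
}
```

The point of the shape is that Insert no longer appears on the interface, so a grep for Underlying() lists every place that skips the normal write path.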