[01:11] thumper: back, ready to chat when you are [01:17] axw, you must of eaten a big breakfast ;) [01:17] alexisb: :p may have done some other things too... [01:18] axw: gimmie 10 to finish up what I'm doing [01:18] thumper: np, ping whenever [01:28] axw: now is good [01:29] thumper: standup? [01:29] sure [01:30] axw: as in bluejeans or hangout? [01:31] thumper: sorry, I meant hangout [01:31] derp [01:31] heh [01:31] link? [01:31] thumper: https://hangouts.google.com/hangouts/_/canonical.com/a-team-standup?authuser=1 === frankban|afk is now known as frankban === mup_ is now known as mup === mup_ is now known as mup === mup_ is now known as mup [08:01] morning all [08:02] any storage gurus around? I'm seeing this with the openstack provider and juju 2.0 rc-1 [08:02] juju add-storage ceph-osd/3 osd-devices=4,10G [08:02] failed to add "osd-devices": adding storage to unit ceph-osd/3: pool "cinder" not found [08:27] anastasiamac: https://bugs.launchpad.net/juju/+bug/1466514 [08:27] Bug #1466514: MachineSuite.TestCertificateUpdateWorkerUpdatesCertificate timeout while waiting for certificate to be updated === frankban is now known as frankban|afk [09:06] fwiw raised https://bugs.launchpad.net/juju/+bug/1629229 for ^^ [09:06] Bug #1629229: unable to add-storage with openstack provider [09:07] axw: i wonder if you might be amenable to giving this the thumbs up... https://github.com/juju/juju/pull/6353 [09:07] axw: large but fundamentally trivial [09:07] jamespage: https://bugs.launchpad.net/juju/+bug/1615095 [09:07] Bug #1615095: storage: volumes not supported [09:08] jamespage: (same underlying issue) [09:08] axw, dupe then - thanks! [09:09] rogpeppe: my brain is fried, but seeing as it's mechanical [09:10] axw: :) thanks! [09:12] fsck [09:12] formatted text output comes from iteration over a dictionary [09:12] so testing output is a damn pain [09:13] voidspace: ouch! [09:14] dooferlad: going to sort in the code just so I can test it [09:14] voidspace: +1 [09:15] dooferlad: I bet the default string sort is asciibetical [09:15] dooferlad: but I don't think I care... [09:15] dooferlad: and of course the only way to get all the keys out of a golang map is to iterate and build a slice [09:15] dooferlad: which is just inelegant code [09:15] but ah well [09:16] rogpeppe: +1 [09:16] axw: tyvm [09:33] babbageclunk macgreagoir https://github.com/juju/juju/wiki/Intermittent-failures === frankban|afk is now known as frankban [09:40] great page and diagrams redir! [09:45] redir: interesting. isn't that kind of thing why the Alarms channel exists? [09:45] redir: +1, really nice and clear with links ot code. [09:48] heads up: my rather large branch which standardises acronym spelling in juju-core has just landed. you might wanna rebase before submitting any existing PRs. [09:49] oh no, it hasn't actually landed, darn [11:13] it has now, finally [11:27] rogpeppe: I think the problem with the Alarms channel here is that you might have had many calls to clock.After, but have nothing in the waiting queue. [11:30] rogpeppe: So WaitAdvance would continue even though there isn't anything in waiting. [11:30] babbageclunk: unless someone's passing zero duration to time.After, how could there be nothing in the waiting queue? [11:30] babbageclunk: (assuming the waiting queue is drained before you call Advance) [11:30] babbageclunk: hmm, i guess if you advance a long way [11:31] rogpeppe: Yeah, that would clear multiple waiters, potentially. [11:31] babbageclunk: what's WaitAdvance? [11:31] babbageclunk: is something added in a feature branch [11:31] ? [11:32] rogpeppe: Oh, this is the proposed solution. It waits until there are waiters before advancing. [11:32] babbageclunk: that's never going to be sufficient [11:32] babbageclunk: because you need to wait for *all* the waiters [11:32] rogpeppe: We're aware that the name isn't very clear :( [11:33] babbageclunk: the problem is always going to be that you need to know exactly how many entities there are that will be waitinmg [11:33] rogpeppe: True [11:33] babbageclunk: which is always going to be fragile [11:34] rogpeppe: But so's advancing before the waiters get there. [11:34] babbageclunk: well exactly. you always need to wait for everyone to block on the clock [11:34] babbageclunk: before advancing [11:34] rogpeppe: Another possibility would be to have the waiters specify an absolute time instead of a delta. [11:34] babbageclunk: i've been toying with that idea [11:35] babbageclunk: but often waiters will actually want to wait for a certain length of time [11:35] babbageclunk: so they'll call time.Now [11:35] sorry, clock.Now [11:35] babbageclunk: and that will depend on whether advance has been called or not [11:36] babbageclunk: so i think you'll end up back in the same situation [11:36] rogpeppe: yup :/ [11:36] babbageclunk: one possibility might be to assume that the number of goroutines remains constant [11:37] babbageclunk: then the first time you wait for a while to gather them all, then always wait for that number of alarms in the future [11:37] babbageclunk: but that's not great either [11:37] babbageclunk: another possiblity, perhaps better but more work is this: [11:37] babbageclunk: if you start a goroutine that's going to block, then register it with the clock [11:38] babbageclunk: and unregister when it exits [11:38] babbageclunk: then you know exactly how many alarms to wait for [11:38] rogpeppe: Makes sense, but that's really invasive. [11:38] babbageclunk: yup [11:39] babbageclunk: it could also potentially be useful for other things though - it could give an idea of how many long-lived workers there are in the system. [11:40] babbageclunk: hmm, but it doesn't work in general [11:40] babbageclunk: because most workers don't block on time events. [11:41] babbageclunk: i think that you've bitten off a problem that's too hard to chew [11:41] babbageclunk: i don't think there's going to be any decent solution to this in the large scale [11:42] rogpeppe: But we already have intermittent failures for tests using clock.Advance. [11:42] babbageclunk: yup, i'm not that surprised [11:42] rogpeppe: Fixing the easier constant number of waiters problem still seems worthwhile even though we can't do the broader one. [11:43] babbageclunk: the problem is that the number of waiters won't remain constant [11:43] babbageclunk: for example, you might have a worker that mostly blocks on a network event, but occasionally makes a call with a retry and wait [11:43] rogpeppe: Well, if we make the number of waiters a parameter then the test can change as it needs. [11:44] babbageclunk: it's gonna be inevitably fragile because the test needs to know exactly all the things underneath that can block, and that's an implementation detail [11:44] rogpeppe: True, but I can't see any other way. [11:45] babbageclunk: i suspect that clock mocking is useful in the small scale but becomes an anti-pattern in more integration-level tests [11:45] rogpeppe: Well, other than not using clock [11:46] rogpeppe: Yeah, I think you're right [11:46] babbageclunk: at that level, better just to have very small durations for waiting and use wall clocks [11:50] rogpeppe: Nothing's ever easy! :( [11:50] babbageclunk: :) [11:50] babbageclunk: sometimes it's better to cut yer losses [11:51] rogpeppe: Thanks for the discussion/pep-talk anyway. [11:51] babbageclunk: np. it's a very interesting area. [11:51] * babbageclunk lunches [11:52] * perrito666 forgot to tell hr that today is a local holiday... and also forgot himself [12:07] hey, is anyone available for a second review for https://github.com/juju/juju/pull/6352 ? thanks [12:13] frankban: i'll take a look [12:22] frankban: it seems a little weird that we serve the default icon instead of a not-found error for any file in the archive at all. what's the rationale behind it? [12:23] frankban: oh, i see now, it's only if you ask for the icon [12:33] rogpeppe: yes [12:37] frankban: reviewed [12:41] rogpeppe: ty [12:50] babbageclunk: frobware anastasiamac macgreagoir https://github.com/juju/testing/pull/113 is ready for review PTAL [13:04] add official-dns-name controller attribute: https://github.com/juju/juju/pull/6363; reviews appreciated, thanks. [13:24] katco: ping? [13:26] wwitzel3: ping [13:26] wwitzel3: actually, unping - but hi o/ [13:26] wwitzel3: gotta nip off - will catch you in a bit :-) [13:27] review for someone https://github.com/juju/juju/pull/6364 [14:01] babbageclunk: hey [14:06] dooferlad: ooh, thanks for the review [14:16] katco`: Hey, I'm chasing an intermittent test failure in https://github.com/juju/juju/blob/master/worker/logforwarder/logforwarder_test.go#L221 [14:17] happy friday all! [14:17] alexisb: o/ happy friday [14:18] babbageclunk: ok [14:18] katco`: Trying to understand the logforwarder - do you know why the loop method has two goroutines in https://github.com/juju/juju/blob/master/worker/logforwarder/logforwarder.go#L188 [14:19] babbageclunk: confused; i only see 1? [14:20] babbageclunk: do you mean why does it start a goroutine when it's running in a catacomb goroutine? [14:20] katco`: Oh, sorry - you're right. I mean, why the internal goroutine pushes records over to the main loop via a channel, rather than just doing sender.Send() itself. [14:21] babbageclunk: ah, let me read through the code rq [14:22] katco`: Thanks! [14:22] this may be a remnant of the fact that it used to be two separate workers [14:23] this is some sloppy code... [14:23] e.g. why is L189 not after L188 [14:23] doh L187 after L188 [14:24] katco`: Oh, yeah - that's pretty dangerous. [14:25] natefinch: Yeah, that would explain it a bit. [14:25] stream is used in the closure [14:25] natefinch: right but it's not used outside the closure [14:26] oh ha [14:27] really, that whole closure should have been pulled out into a separate function. No sense putting almost 40 lines of code inside another function. [14:27] Also would make the interdependencies more obvious [14:28] there's no need for the member variable enabled either. could use enabledCh, have: func (l *LogForwarder) Enabled() bool { _, ok := <- l.enabledCh; return ok == false } [14:28] lf.waitForEnabled, and then we do if !enabled? [14:30] babbageclunk: my guess is that it's in a goroutine because we're not using channels for streaming records; so we have to obtusely convert a method into a channel [14:31] babbageclunk: if it were a pipeline all the way down, and stream returned a channel, we could just have one loop, one select [14:33] babbageclunk: does that make sense? if you all you have is a method to retrieve records, you can't use it in a select block unless you create a goroutine to convert the calls into a channel [14:33] katco`: Ah, makes sense - because stream.Next() blocks? [14:33] babbageclunk: yeah, and it must be a preemptable op [14:33] babbageclunk: this is why i advocate for API's returning channels. bc you end up having to do this dance [14:33] katco`: Thanks! That's much clearer. [14:34] select is way more fundamental of an op than people realize; it's how modules come together sanely [14:34] babbageclunk: hth [14:35] babbageclunk: if you're touching this code, please try and clean it up a bit? again, waitForEnabled could be returning whether the enabled chan is closed or not [14:35] katco`: definitely - I'm beginning to get a feel for this but it's still not second nature. [14:35] awesome article on using channels in APIs, for anyone who hasn't seen it: https://inconshreveable.com/07-08-2014/principles-of-designing-go-apis-with-channels/ [14:35] katco`: ok, will do. Trying to understand the test now. [14:36] gets rid of a variable and a bunch of locking [14:36] * babbageclunk clicks [14:36] the link I mean. [14:40] babbageclunk: it's probably too much surgery/controversial to change stream.Next() to channels, but i would encourage you to pull that goroutine out into a method that has a clear name [14:41] katco`: Yeah, I'll definitely do that - seems like a nice Friday afternoon refactoring task. [14:41] +! ^ [14:41] +1 that is :) [14:42] anastasiamac, ping [14:42] alexisb: pong? [14:42] + it might help me understand the timing issue causing the intermittent failure. [14:43] anastasiamac, can you jump in a HO with me real fast [14:43] alexisb: i can. let me get ears :) [14:44] ok, anastasiamac I will meet you on teh a-team bluejeans [14:44] k [14:46] alexisb: anastasiamac looks like they're using 1.25? [14:47] katco ping [14:49] natefinch, ping [14:49] alexisb: yo [14:49] babbageclunk: https://github.com/juju/juju/pull/6366 PTAL [14:49] heya natefinch can you come join the party [14:49] alexisb: I love parties [14:49] https://bluejeans.com/5036865018 [14:49] o you are really going to love this party [14:50] ominous [14:54] alexisb: sorry, i have 2 nicks in here right now =/ [14:54] babbageclunk, natefinch was soooooo glad he joined the party ;) [14:55] power pc AND simplestreams. I'm so lucky [14:55] oh, and 1.25 [14:55] =| [14:55] d'oh [14:55] * redir backs away slowly [14:55] s/quickly [14:55] lol [14:56] * natefinch jumps on the grenade [14:56] katco natefinch may need your morale support today [14:56] :) [14:57] http://i.imgur.com/YQ6kI9z.gif [14:57] i'm afraid i'm fresh out of morale =| [14:57] heh [14:57] v1/2 -> v3 charm url has drained me [14:57] katco`: ouch! Sorry! [14:58] babbageclunk: yeah i had to pull in the latest charm lib. breaks a bunch of tests. like a lot. [14:59] katco`: I know - I fixed most of them! In a branch that isn't landed yet. :( [14:59] babbageclunk: well too late now ;p [14:59] Oh no. [14:59] babbageclunk: https://github.com/juju/juju/pull/6348 [14:59] babbageclunk: doesn't have my latest pushed up yet [15:06] katco`: I updated juju/names to handle the new charm urls: https://github.com/juju/names/pull/74 [15:06] any chance of a review of this branch, by any chance? https://github.com/juju/juju/pull/6363 [15:07] i need a review from someone in juju-core please [15:08] babbageclunk: yeah: https://github.com/juju/juju/pull/6348/files#diff-bc1c339eba7450f0fe12ab61e4e1987aR81 [15:08] katco`: The thing that stalled me was that the cmd/juju/applcation tests were failing because the charmstore would send bundle URLs back in the V3 format (which doesn't include "bundle") [15:12] voidspace, do you have a moment to review rogpeppe PR https://github.com/juju/juju/pull/6363 [15:24] alexisb: free for a hangout? [15:26] mgz: yt? [15:28] redir: yo [15:29] mgz: there's two things that landed this week that will hopefully reduce or remove some intermittent issues. 1. a patch in patches to fix a race in mgo.Stats 2. an update to the testing clock. [15:30] 1. mgz there's a change in clock at https://github.com/juju/testing/blob/master/clock.go#L107 which should show up if we experience intermitttent failures like https://bugs.launchpad.net/juju/+bug/1607044 [15:30] Bug #1607044: WorkerSuite.TestUpdatesAfterPeriod timed out [15:32] gotcha, so we want to look out for that new message in failures [15:32] and also see if there's an obvious change in incidence [15:33] mgxz exactly [15:33] and the other is related to https://bugs.launchpad.net/juju/+bug/1604817 [15:33] Bug #1604817: Race in mgo Stats implementation [15:34] which might fix various tests where that race is the cause. [15:34] mgz I plan to send an email to the qa list, but thought I'd mention it in case I forget over the long weekend. [15:34] :) [15:34] redir: thanks! [15:38] babbageclunk, I am now [15:39] alexisb: ok, jumping into the hangout [15:39] babbageclunk, I am there [15:45] i'm trying to get a cloud-utils upload into yakkety. it is blocked on juju's dep8 test due to failures on ppc64: [15:45] http://autopkgtest.ubuntu.com/packages/j/juju-core-1/yakkety/ppc64el [15:45] and: [15:45] http://autopkgtest.ubuntu.com/packages/j/juju-core-1/yakkety/amd64 [15:45] (linked to from http://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses.html) [15:45] i really do not think cloud-utils is related at all, as the onyl change to that package is in mount-image-callback, which i'm pretty sure is not used by juju [15:45] anyone able to help refute or confirm that ? [15:47] Setting up lxc1 (2.0.4-0ubuntu4) ... [15:47] ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. [15:48] looks like your rebuild is being unfairly blamed [15:49] but, does seem to be a real issue of some kind [15:50] actually, maybe it is the utils change [15:51] "Tools checksum mismatch" [15:51] is not the clearest message [15:51] but seems to be the final failure? [15:56] smoser: we can run these tests in CI, and perhaps with extra debugging [15:57] mgz, did you look at ppc64el or amd64 [15:57] they fail differently [15:58] babbageclunk: you still around? [15:58] katco`: yup yup [15:58] babbageclunk: what's the new bundle url format for v3? [15:59] It's not distinguishable from the charm url [15:59] katco`: ^ [16:00] babbageclunk: do i specify a series for a bundle then? [16:00] smoser: the amd64 failures all seem to be testbed related [16:00] it's possible the ppc64 one is as well, in a more subtle way [16:01] katco`: No, I don't think so. Some more details here: https://bugs.launchpad.net/juju/+bug/1584193 [16:01] Bug #1584193: juju deploy is in a different form than jujucharms.com <2.0> [16:01] perhaps we want to poke pitti about it? [16:01] babbageclunk: ta. if i specify "wordpress-with-endpoint-bindings/1" i get "series not specified" [16:02] katco`: In deploying on command line or in tests? [16:02] tests [16:03] right - in that case it's going through a charmstore running locally and the url that comes back is in the V3 format. [16:03] babbageclunk: And there's code in the deploy command looking for series == "bundle" [16:04] duh, who am I again? [16:04] katco`: ^ [16:04] babbageclunk: https://github.com/juju/juju/blob/master/cmd/juju/application/bundle_test.go#L124 [16:04] babbageclunk: "cannot post archive: series not specified" [16:04] babbageclunk: i've changed the test to be wordpress-with-endpoint-bindings/1 [16:05] babbageclunk: seems like it's an error on uploading the bundle... maybe i need to pull in a newer copy of charmrepo? [16:06] katco`: Ah, I fixed that (in my branch) but then got to the next problem and had an existential crisis. [16:06] katco`: Hang on - finding link [16:06] babbageclunk: well maybe we can meet halfway :) i think i had already fixed what you ran into [16:06] honestly it's all becoming a bit of a blur [16:07] katco`: same - sorry you're running into all of this too. [16:11] katco`: So I've got a local change that sets the series to bundle when uploading http://paste.ubuntu.com/23256099/ [16:12] katco`: Then the deploy line in the test fails because the urls coming back from the local charmstore don't indicate that it's a bundle. [16:12] babbageclunk: this looks... scary. if i introduced this, what exactly are we testing? you say deploy has something similar? [16:13] katco`: I'm certainly not sure it's the right thing to do. [16:13] babbageclunk: hm... thanks for the diff. i will think on this for a few mins [16:16] katco`: Here's the bit of deploy that is checking for series == "bundle". If this fails then it falls back to assuming it's a charm and you get UnsupportedSeries errors back. [16:16] https://github.com/juju/juju/blob/master/cmd/juju/application/deploy.go#L882 [16:17] babbageclunk: ah, i remember that bit! :) === frankban is now known as frankban|afk [16:19] katco`: basically all of this stems from the change I made in charm splitting URL.String() and URL.Path(). [16:19] katco`: I think it's possible to fix the tests by changing the charmstore code to use url.Path() where it's currently using .String()... [16:20] katco`: But I'm not sure that's the right thing to do. [16:22] katco`: Or another fix would be to make charm urls distinct from bundle urls by including "bundle" in the V3 format. [16:31] alexisb: rogpeppe: sorry, missed your message - it has a couple of approves already, but I can see changes since those [16:31] alexisb: rogpeppe: I'll take a look [16:31] voidspace, thanks, we just want to make sure we get one person from core to review [16:32] alexisb: ah, cool [16:35] katco`: seen this? https://mobile.twitter.com/filler/status/781175066742534144 [16:36] perrito666: yes :) [16:39] hi guys! with a workload state of blocked, how often will update-status retry? [16:40] retry/execute/whatever [16:42] rogpeppe: yup, LGTM [16:46] babbageclunk: hi! ;) [16:46] anastasiamac_: hello also :) any ideas ^^? [16:47] admcleod: hi! I don't know off the top of my head, and this isn't a great time for code-spelunkery, sorry! [16:48] admcleod: perrito666, do u know ^^ [16:48] sinzui: ^^^^ :} [16:50] anastasiamac_: 5 something, sec or min, cant recall which [16:50] perrito666: and then apparently it increases? [16:50] admcleod: ^ gimme a sec and Il lookl [16:51] update-status is every 5 or 10 minutes, I forget which [16:51] 5 minutes [16:52] admcleod: doesn't look like it changes [16:52] it doesnt [16:52] it runs every 5 mins [16:52] http://paste.ubuntu.com/23256225/ [16:54] admcleod: I wonder if other hooks are preventing it from running on time [16:55] admcleod: i did a --replay and the output didn't change from your pastebin.. 'active' charms update-status every 5. 'blocked' charms are every 25. (at least in this case) [16:56] other things might be preventing the update-status hook from being invoked [16:56] maybe, but show-status-log isn't telling... [16:56] 30 Sep 2016 15:08:36Z juju-unit idle [16:56] 30 Sep 2016 15:12:55Z juju-unit executing running update-status hook [16:56] 30 Sep 2016 16:56:21Z idle last 3 statuses repeated 20 times [16:57] i really dont think anything else will be getting in the way [16:57] kwmonroe: what version juju? [16:57] rc1 [16:58] admcleod: the code running that hook is very different since I was last there so cant give you a more complete answer without some digging [16:59] ya know what's interesting admcleod? the last line of show-status-log above says "repeated 20 times", which is 100 minutes if update-status runs every 5... which is the time difference between the last 2 messages in that output. === alexisb is now known as alexisb-afk [17:01] kwmonroe: ah [17:02] i reckon i'll just watch the stupid thing for 5 minutes and put this to rest. [17:03] ha! don't have to. admcleod, machine logs on the unit show it running every 5. not sure why debug-log --replay showed every 5th one. [17:04] (and did so consistently over the last 1.5 hours) [17:05] kwmonroe: nice one, thanks [17:06] wishing these tests didn't spin up an entire charmstore and actually deploy charms right about now [17:33] voidspace: thanks! === alexisb-afk is now known as alexisb [18:51] while trying to test goose library changes in juju, i’m hitting the known issue where godeps leaves your git respository in a detached head state. and my changes don’t seem to take. [18:52] google ways to fix involve not using godeps restore or something. [18:52] does anyone have a better way in juju land? [19:03] hml: git checkout master [19:03] hml: if you just want to get the repo back into a known state [19:03] hml: not sure exactly what you're doing, but I have hit similar problems with gopdes [19:04] godeps [19:04] natefinch: ideally i’d like to test juju using the testservices from my goose branch - it’s not in the master yet because testing isn’t complete. [19:04] hml: ahh, ok. [19:05] hml: so, run godeps -u dependencies.tsv for juju. *then* switch your goose code to your branch, then build juju [19:05] make sure your goose code is in the same directory as the default (so, $GOPATH/src/gopkg.in/goose.v1/) [19:06] the way I usually do it is to just git remote add [19:07] natefinch: i’m working in a branch off of … maybe that’s the problem [19:07] hml: totally doesn't matter. go install/go build etc only care about what code is in $GOPATH/src/gopkg.in/goose.v1/ it doesn't know about git or branches or anything [19:08] the *only* thing that understands git / vcs at all is "go get" after the code is on your drive, the rest of the commands only look at the files on disk in the directories they're told to look at by import statements [19:09] natefinch: running off to try this === benji is now known as Guest10314 [19:38] natefinch: got it working - thank you, i think i was confused about how often to run godeps to update the dependencies… i’m picking up my code to test now. [19:51] hml: yeah, you only have to do that once, immediately after you switch branches [20:34] Where can I find what uuid the cloud provider of choice is using for bootstrap?