[00:14] <gary_poster> if there's a review of https://codereview.appspot.com/8811043/ we could get it in under the wire
[00:14] <gary_poster> which would be nice :-)
[00:15] <gary_poster> and it is very small
[00:16]  * davecheney looks
[00:22] <davecheney> gary_poster: this diff is dirty
[00:22] <davecheney> there is some bzr .THIS crap in there
[00:22] <gary_poster> davecheney, I was assuming that it was a clean up
[00:23] <gary_poster> davecheney, might a repush clean it up?
[00:23] <gary_poster> Makyo, ^^^ could you try a repush?
[00:23] <gary_poster> davecheney, thank you for looking
[00:23] <davecheney> actually, this removes a mistaken checkin
[00:23] <davecheney> let me publish my review, there is some more dirtiness
[00:23] <gary_poster> ok thanks
[00:24] <davecheney> hazmat: this was your fault!
[00:25] <davecheney> done
[00:25] <davecheney> thumper: https://codereview.appspot.com/8811043/
[00:25] <davecheney> can you give the gui guys a second LGTM
[00:25] <davecheney> gary_poster: or you could just commit it
[00:25] <davecheney> if time is a factor
[00:25] <gary_poster> davecheney, thanks much.  I'll check what is possible.
[00:25]  * thumper looks
[00:27] <thumper> done
[00:28]  * thumper off to the gym shortly for some sparring
[00:28] <davecheney> gary_poster: Makyo you've got two, make those changes and fire at will
[00:28] <gary_poster> awesome thanks very much davecheney and thumper
[00:38] <davecheney> does anyone have a link to last week's hangout agenda ?
[00:38] <davecheney> m_3: ping
[00:46] <mgz> okay, have the other bug down as well
[00:49] <m_3> davecheney: hey
[00:49] <m_3> mgz: we're stuck
[00:49] <davecheney> m_3: how's it going ?
[00:49] <m_3> mgz: I'm leaving it be atm
[00:50] <m_3> davecheney: seems hung up on something
[00:50] <davecheney> hang on, changing hosts
[00:50] <davecheney> m_3: tell me where it hurts
[00:51] <m_3> davecheney: we brought up 200 earlier to check on adding incremental nodes via nova cli
[00:51] <mgz> m_3: I saw it's not increasing, but didn't see anything in the log
[00:51] <m_3> davecheney: mgz played a bit
[00:51] <m_3> davecheney: then added another 200 (juju add-unit hadoop-slave -n200)
[00:51] <mgz> I have two of the three issues I've seen tracked down
[00:51] <m_3> davecheney: it added 15 more and hung
[00:51]  * m_3 gotta relocate... getting kicked out of the expo area
[00:52] <davecheney> mgz: i've seen an issue where the PA will get into a loop if there are problems creating security groups
[00:52] <m_3> back on after a while... I'll leave it up for y'all
[00:52] <davecheney> m_3: mgz: i suggest destroying this instance
[00:52] <davecheney> bzr pull to rev 1164
[00:52] <davecheney> then trying again
[00:53] <davecheney> 2013/04/16 23:41:06 ERROR provisioning worker: failed to GET object provider-state from container goscale2-1
[00:53] <davecheney> caused by: https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/provider-state%!(EXTRA string=failed executing the request)
[00:53] <davecheney> i think i have a fix for these formatting errors
[00:55] <davecheney> mgz: i'd like to destroy this environment and start again with a fresh build
[00:56] <mgz> okay, we could probably do with fixing a couple of the dumb issues too
[00:56] <davecheney> rev 1164 cuts down on 66% of the log spam
[00:56] <mgz> that particular one is probably api flood issue again
[00:56] <davecheney> i'll grab the all-machines log
[00:57] <davecheney> then blow away the environment
[00:59] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ scp 15.185.160.145:/var/log/juju/all-machines.log .
[00:59] <davecheney> all-machines.log                                                            20%  141MB   5.6MB/s   01:38 ETA
[00:59] <davecheney> that is a great network hp
[01:03] <mgz> filed bug 1169778
[01:03] <mgz> davecheney: compress first :)
[01:04] <davecheney> mgz: sure
[01:04] <davecheney> but 5.4Mb/s is shit
[01:04] <davecheney> my pandaboard can do better than that
[01:04] <davecheney> mgz: it's nice to know you can resume juju destroy-environment
[01:08] <davecheney> mgz: m_3 we have 7 stray nova instances after destroy-environment
[01:08] <davecheney> hmm
[01:08] <davecheney> actually only 2
[01:08] <davecheney> never mind
[01:12] <mgz> one was mine, that I killed, and one is the manage one?
[01:12] <davecheney> fixed it
[01:12] <davecheney> wc -l was the wrong thing to use
[01:15] <davecheney> mgz: m_3 deploying 200 hadoop-slave units
[01:17] <davecheney> 100% of the time is being consumed waiting on hp to respond to new instance requests
[01:57] <m_3> davecheney: hey
[01:57] <m_3> davecheney: how's it going?
[01:57] <m_3> davecheney: btw, you never want to use the 'hadoop-slave' charm... (long story)
[01:58] <davecheney> m_3: 2013/04/17 01:52:25 NOTICE worker/provisioner: started machine 200 as instance 1509985
[01:58] <m_3> davecheney: make sure you're doing `juju deploy hadoop hadoop-slave -n200`
[01:58] <davecheney> m_3: sorry, i was looking in the history, that is all I could find
[01:58] <m_3> ack
[01:58] <davecheney> nm, we can destroy the instance
[01:58] <m_3> always ran from bin/hadoop-stack
[01:59] <davecheney> ok, let me nuke this environment
[01:59] <davecheney> (it worked fine btw)
[01:59] <davecheney> rev 1164 is much less chatty
[01:59] <m_3> davecheney: did it get above 200?
[01:59] <m_3> davecheney: we have perms to go to 2000
[01:59] <davecheney> just did -n200
[01:59] <m_3> we're just having probs right near the 200 mark
[01:59] <davecheney> i'll bootstrap a fresh env
[01:59] <davecheney> do bin/hadoop-stack
[01:59] <m_3> I did a `-n1997` before
[02:00] <davecheney> yeep!
[02:00] <davecheney> that'll take hours
[02:00] <m_3> did you get mgz's changes in?
[02:00] <davecheney> no, but the logging change went in
[02:00] <m_3> yup... about 8hrs or so
[02:00] <davecheney> so we can actually see what is going on
[02:00] <m_3> ah, cool
[02:00] <m_3> wanna do 300?
[02:00] <davecheney> ok
[02:00] <m_3> or shall we go big
[02:00] <davecheney> 500 should see us through to drinks
[02:00] <davecheney> how is ODS ?
[02:00] <davecheney> is the marketing completely off tap
[02:01] <davecheney> ?
[02:01] <m_3> dude
[02:01] <davecheney> hookers and blow ?
[02:01] <m_3> it's pretty over the top
[02:01] <m_3> haha
[02:01] <m_3> yeah, all hookers and blow
[02:01] <m_3> supposed to meet kapil in the lobby for dinner in a bit
[02:01] <davecheney> actually, card tables, laptops and t-shirts
[02:02] <m_3> yeah :(
[02:02] <m_3> better than couch, popcorn, and underwear at least
[02:02] <davecheney> m_3: i don't need to go to a conference for that
[02:04] <davecheney> m_3: we're good to go
[02:04] <davecheney> hit it with the old -n300
[02:05] <davecheney> hmm, that isn't good
[02:05] <davecheney> let me drive for a second
[02:05] <davecheney> ok, still piss farting around
[02:05] <davecheney> the last bootstrap was much faster
[02:06] <m_3> ack
[02:06] <davecheney> still scrolling in apt
[02:06] <davecheney>      ├─cloud-init-cfg─┬─add-apt-reposit
[02:06] <davecheney>      │                └─sh───tee
[02:06] <davecheney> wow
[02:06] <davecheney> that is slow
[02:06] <davecheney> this instance may fail to bootstrap
[02:07] <m_3> might just kill it
[02:07] <davecheney> do it
[02:07] <davecheney> oh wait
[02:07] <davecheney> yeah, that is really slow
[02:07] <davecheney> nuke it
[02:08] <m_3> once that one's up just run bin/hadoop-stack
[02:08] <m_3> I'll check in after dinner if that's cool with you?
[02:08] <davecheney> m_3: ok, is that configured for -n300 ?
[02:08] <m_3> maybe you can catch what's going wrong near the 200 mark
[02:08] <davecheney> totally cool
[02:08] <m_3> yup
[02:08] <davecheney> you go have dinner with K
[02:08] <m_3> awesome thanks man... ttyl
[02:09] <m_3> oh, and let's archive the logs in ~/arch/<date>
[02:09] <davecheney> sure, i've been scp'ing them onto my machine
[02:09] <davecheney> will fix
[02:09] <m_3> at least to keep some sort of record of what we're doing
[02:09] <m_3> thanks
[02:09] <davecheney> actually, i'll put it in google drive
[02:09] <m_3> I'm still rsyncing the whole control node's /home/ubuntu offsite daily
[02:09] <m_3> perfect
[02:10] <davecheney> probably isn't a bad idea
[02:10] <davecheney> make sure you use rsync -z
[02:10] <m_3> (-azvP)
[02:10] <m_3> cool... /me food
[02:10] <m_3> thanks man
[02:12] <davecheney> m_3: also, status now shows relations
[02:13] <davecheney> 2013/04/17 02:13:05 DEBUG environs/openstack: openstack user data; 2708 bytes
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/6"
[02:13] <thumper> \o/
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/6"
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/5"
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/5"
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/6"
[02:13] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/6"
[02:14] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/5"
[02:14] <davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/5"
[02:14] <davecheney> if this is an error, why does the provisioner not stop
[02:14] <thumper> davecheney: that doesn't look good
[02:14] <davecheney> ?
[02:15] <davecheney> 2013/04/17 02:12:29 INFO environs/openstack: starting machine 14 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
[02:15] <davecheney> 2013/04/17 02:12:29 DEBUG environs/openstack: openstack user data; 2710 bytes
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/1"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/1"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#provider#hadoop-master/0"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#provider#hadoop-master/0"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/1"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/1"
[02:15] <davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#provider#hadoop-master/0"
[02:15] <davecheney> 2013/04/17 02
[02:15] <davecheney> starts to screw up here
[02:17] <davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers%!(EXTRA string=failed executing the request)
[02:17] <davecheney> i really want to fix this stupid formatting error
[02:47] <davecheney> shit shit shit
[02:47] <davecheney> the 1.9.14 release cannot seem to download any tools
[02:47] <davecheney> however when I build the release from source
[02:47] <davecheney> it can !
[02:51] <davecheney> 2013/04/17 02:50:21 DEBUG checking tools 1.9.14-precise-amd64
[02:51] <davecheney> 2013/04/17 02:50:21 INFO environs/openstack: starting machine 298 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
[02:51] <davecheney> 2013/04/17 02:50:21 ERROR worker/provisioner: cannot start instance for machine "298": cannot find image satisfying constraints: failed to get list of flavours
[02:51] <davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors%!(EXTRA string=failed executing the request)
[02:51] <davecheney> caused by: Get https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors: lookup az-2.region-a.geo-1.compute.hpcloudsvc.com: Temporary failure in name resolution
[03:12] <davecheney> i'm having a lot of trouble validating the 1.9.14 release
[03:13] <davecheney> is anyone in a position to install 1.9.14 from ppa:juju/devel
[03:13] <davecheney> and try
[03:13] <davecheney> (don't forget to make sure you're calling /usr/bin/juju, not ~/bin/juju)
[03:33] <fwereade> hazmat, if you're there, I don't think we necessarily want to destroy machines
[03:34] <fwereade> hazmat, I suspect the deploy-ubuntu-to-prime, force-machine-to-deploy (anti-)pattern will be popularly used
[03:54] <fwereade> davecheney, thumper: if you're there, can I get a trivial on https://codereview.appspot.com/8815043 please?
[03:57] <davecheney> fwereade: i'll take a look
[03:57] <davecheney> fwereade: i'm very concerned that 1.9.14 is stillborn
[03:58] <davecheney> i'm thinking of proposing we move to daily point releases
[03:58] <davecheney> so m_3 and I stop using the development chain
[03:58] <fwereade> davecheney, I don't think it'll build, given this stuff
[03:58] <fwereade> davecheney, it's an old test file that somehow crept in
[03:59] <davecheney> wut ?
[04:00] <fwereade> davecheney, at a casual glance it looks like it slipped in from the force-machine branch
[04:00] <davecheney> this is the second turd to fall out of that branch
[04:00] <fwereade> davecheney, although it's not quite clear how
[04:00] <davecheney> without being rude, 4 people looked at that branch
[04:00] <fwereade> davecheney, ouch :(
[04:00] <davecheney> and nobody spotted the .THIS file there
[04:02] <davecheney> fwereade: LGTM, please submit
[04:03] <fwereade> davecheney, happening right now
[04:03] <fwereade> davecheney, looks like that file slipped in somewhere between 1 and 5 days ago after the main round of reviewing
[04:04] <davecheney> so that means the submitter did not run the tests before submitting
[04:04] <davecheney> do we need to make another addition to .lbox.check ?
[04:05] <fwereade> I never found out what happened to the bot proposal but that also seems to be stillborn
[04:05] <fwereade> in its absence, maybe we do
[04:05] <thumper> hi fwereade
[04:06] <thumper> what are you doing up?
[04:06] <fwereade> thumper, fell asleep early last night, back up now to merge a whole pile of things and do some reviews
[04:06] <davecheney> fwereade: we can't have a bot til the tests pass reliably
[04:06] <davecheney> that was my understanding
[04:06] <thumper> fwereade: oh, I have four reviews up?
[04:06]  * thumper goes to drink the coffee
[04:08] <fwereade> davecheney, if we can't have a bot without reliable tests but we can't commit reliably without a bot *something* needs to change
[04:09] <davecheney> fwereade: i think we can at least do a test compile of the tests
[04:09] <davecheney> that would have caught this error
[04:09] <davecheney> but it does seem a bit futile
[04:09] <davecheney> we're like the TSA
[04:09] <davecheney> running after the horse
[04:09] <fwereade> davecheney, oh! how do we compile tests without running them?
[04:10] <fwereade> davecheney, actually, don't tell me, I might start doing it ;p
[04:11] <davecheney> fwereade: i'm sure there is a way
[04:11] <davecheney> (cd /tmp ; go test -c $PKG) will do it
[04:12] <fwereade> davecheney, emphatic +1 then
[04:17] <davecheney> fwereade: http://paste.ubuntu.com/5714987/
[04:17] <davecheney> can anyone else test the 1.9.14 release ?
[04:18] <davecheney> the top invocation is from the package
[04:18] <davecheney> the bottom is the same source
[04:23] <davecheney> 2013/04/17 02:50:17 INFO environs/openstack: starting machine 207 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
[04:23] <davecheney> 2013/04/17 02:50:17 ERROR worker/provisioner: cannot start instance for machine "207": cannot find image satisfying constraints: failed to get list of flavours
[04:23] <davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors%!(EXTRA string=failed executing the request)
[04:23] <davecheney> caused by: Get https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors: lookup az-2.region-a.geo-1.compute.hpcloudsvc.com: Temporary failure in name resolution
[04:23] <davecheney> after 200 machines, we lose the ability to resolve the name of the hp endpoint
[04:24] <fwereade> davecheney, hmm, I seem to be bootstrapping ok in us-east-1 with 1.9.14 from the package
[04:24] <thumper> fwereade: so is it like 6:30am for you now?
[04:25] <fwereade> thumper, yeah
[04:25] <thumper> fwereade: cool, not even mind-numbingly early
[04:25] <thumper> fwereade: don't suppose you have time for a quick hangout?
[04:25] <fwereade> as predicted I fell asleep early and woke up early
[04:25] <fwereade> thumper, sure
[04:25] <davecheney> fwereade: i cannot bootstrap into ap-southeast-1 or 2 with that release
[04:25] <thumper> fwereade: I'll start
[04:25] <davecheney> but it works perfectly using the source
[04:31]  * davecheney has had a gutful of unreliable cloud providers
[04:31] <davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ juju destroy-environment
[04:31] <davecheney> error: failed to delete server with serverId: 1510323
[04:31] <davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers/1510323%!(EXTRA string=failed executing the request)
[04:31] <davecheney> caused by: remote error: handshake failure
[04:50] <fwereade> davecheney, ap-southeast-2 is... well, appearing to bootstrap anyway
[04:50] <fwereade> davecheney, but acting a bit weird, I will let you know how I get on
[04:56] <fwereade> davecheney, ok, the good news is that the bootstrap node comes up and works fine, the bad news is that the CLI will not acknowledge its existence
[04:57] <davecheney> fwereade: -v ?
[04:57] <davecheney> did it bootstrap the wrong tools
[04:57] <davecheney> ?
[04:57] <fwereade> 2013/04/17 06:55:37 INFO environs/ec2: waiting for DNS name(s) of state server instances [i-79c80744]
[04:57] <fwereade> 2013/04/17 06:55:42 ERROR command failed: no instances found
[04:57] <fwereade> error: no instances found
[04:58] <davecheney> eep
[04:58] <fwereade> davecheney, the instance definitely exists and it's definitely got the right tools
[04:58] <davecheney> fwereade: i'm getting these errors constantly during the load tests
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/2"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/2"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/0"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/0"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/2"
[04:58] <fwereade> davecheney, so that is good anyway
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/2"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/0"
[04:58] <davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/0"
[04:59] <fwereade> davecheney, ah ffs there is no way those should be logged
[05:00] <fwereade> davecheney, they're not errors, it's just the allwatcher getting its knickers in a twist because it's watching things it doesn't know about
[05:01] <fwereade> davecheney, btw, we seem to be waiting only a very few seconds for the DNS name (apse2 issues), which is surprising to me
[05:10] <davecheney> fwereade: oh, so that is the api server
[05:10] <fwereade> davecheney, yeah
[05:10] <davecheney> i was wondering why provisioning didn't land in a screaming heap
[05:10] <davecheney> fwereade: i'll log a bug
[05:11] <fwereade> davecheney, they're not even really errors -- it's just logging and stopping processing for those docs
[05:11] <fwereade> davecheney, cheers
[05:11] <davecheney> i'll log two bugs
[05:11] <davecheney> can I get a review on https://codereview.appspot.com/8818043/
[05:14] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1169825
[05:21] <fwereade> davecheney, what happened to errors.Newf (if anything)?
[05:22] <davecheney> nothing, just people were not calling it correctly
[05:22] <davecheney> URL was not a format specifier
[05:22] <davecheney> it was arg 1
[05:23]  * fwereade sees now
[05:23] <fwereade> davecheney, LGTM
[05:24] <davecheney> http://paste.ubuntu.com/5715085/
[05:32] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1169826
[06:05] <TheMue> morning
[06:07] <TheMue> fwereade: seen my part 1? during sleep I found a far simpler approach. will continue after breakfast.
[06:21] <jam1> afternoon wallyworld, how's it going today?
[06:21] <fwereade> TheMue, great news, I'm afraid I fell asleep last night ;p
[06:21] <wallyworld> jam1: going well thanks, just sorting out a test isolation issue and i will send a mp for my work
[06:22] <TheMue> fwereade: no wonder after your night before
[06:28] <jam> fwereade: second LGTM on https://codereview.appspot.com/8818043/
[06:29] <jam> sorry, I meant davecheney ^^
[06:31] <jam> bigjools, jtv: Is the maas provider ready to be exposed in jujud/juju client?
[06:32]  * jtv is not here
[06:33] <bigjools> jam: what is the effect of that?  Sorry, I have no context on it.
[06:34] <jam> bigjools: if you have it enabled, people can configure it in ~/.juju/environments.yaml and try to bootstrap against a maas service.
[06:34] <bigjools> as far as I know it all works, so if there's something missing I don't know why it's needed
[06:34] <jam> If it is ready for people to do so, then we should expose it.
[06:34] <jam> (which it has been)
[06:34] <jam> But we also want to configure jujud to require it
[06:34] <bigjools> jam: we can already bootstrap with it, I don't understand
[06:34] <jam> so we don't accidentally lose it later.
[06:35] <jam> bigjools: I was just confirming that the code has gotten to the "working" state. I hadn't heard a specific update to where you guys were.
[06:35] <jam> great.
[06:35] <davecheney> jam: bzr: ERROR: Cannot lock LockDir(chroot-72851216:///%2Bbranch/goose/.bzr/branch/lock): Transport operation not possible: readonly transport
[06:35] <davecheney> I cannot submit that mp
[06:35] <jam> davecheney: so I would say LGTM trivial on exposing it.
[06:35] <bigjools> jam: ok :)  yes it should all work
[06:35] <davecheney> any ideas ?
[06:35] <jam> davecheney: you can't use 'lbox submit' for goose. You mark the MP as approved (make sure it has a commit message), and then goose bot will run the tests and land it for you.
[06:35] <jam> I can do it if you need help.
[06:36] <davecheney> ta
[06:36] <m_3> davecheney: how's it going?
[06:36] <jam> davecheney: so on here: https://code.launchpad.net/~dave-cheney/goose/001-fmt-error/+merge/159287
[06:36] <davecheney> m_3: hey
[06:36] <jam> there is a "Set commit message" which you can just copy the description if you are happy with it.
[06:36] <jam> and then mark it Approved.
[06:37] <davecheney> sent you an email with the sitrep
[06:37] <davecheney> basically can't get above 228 machines
[06:37] <davecheney> i suspect we're running out of ipv4 allocations
[06:37] <m_3> ouch
[06:37] <davecheney> and it is being reported in a strange way
[06:37] <m_3> actually, that's awesome... this is the shit we're doing this to flush  out
[06:38] <davecheney> the way the dns address of the keystone endpoint stops resolving is both
[06:38] <davecheney> a. repeatable
[06:38] <davecheney> b. fucked up
[06:38]  * m_3 reading mail
[06:39] <m_3> davecheney: we can see our ip quotas... one sec
[06:40] <davecheney> 228 is close enough to 254 that it makes me go hmmm
[06:43] <davecheney> m_3: ahh
[06:43] <davecheney> interesting
[06:43] <davecheney> 228 isn't near any of those numbers
[06:44] <davecheney> jam: now what happens ?
[06:45] <m_3> davecheney: hmmmm.... thought that showed ip allocations too
[06:45]  * m_3 rtfm-ing
[06:49] <davecheney> jam: is the bot broken ?
[06:50] <m_3> davecheney: I'll dig through the acct setup with antonio tomorrow
[06:51] <m_3> davecheney: jorge and I are talking tomorrow, so won't get to this til later in my daytime
[06:51] <jam> davecheney: usually it takes about 5 min for it to wake up and run the test suite
[06:51] <jam> but if it fails it *should* report a failure to Launchpad and unmark the proposal as pending
[06:52]  * davecheney thinks its dead
[06:52] <m_3> davecheney: thanks for the help... /me beddy-bye
[06:52] <davecheney> m_3: no worries mate
[06:53] <jam> davecheney: from what I can tell 'canonistack' thinks the machine is running, but I'm getting 0 response from port 22
[06:56] <davecheney> jam: i remember a discussion a while ago where you mentioned that you weren't able to contact the machine
[06:56] <davecheney> but it appeared to be doing stuff, so whateva
[07:14] <wallyworld> jam: i'm off to soccer, but here's a wip mp. dimiter was interested to see it. i can't land till his work goes in since i need to account for whatever changes he has made. i also need to land a small goose change
[07:14] <wallyworld> https://code.launchpad.net/~wallyworld/juju-core/openstack-image-lookup/+merge/159301
[07:15] <wallyworld> the covering letter also needs to be expanded, but i am late
[07:30]  * TheMue is happy, the simpler approach works fine. now part two, adding the subordinate unit information.
[07:54] <rogpeppe2> mornin' all
[07:56] <TheMue> rogpeppe2: morning
[09:25] <rogpeppe2> in case my earlier email didn't get through, my phone line is down currently
[09:25] <rogpeppe2> i'm connected through my mobile currently, but i don't know how reliable it will be
[09:33] <rogpeppe2> trivial branch that fixes #1169825 ; review appreciated please: https://codereview.appspot.com/8821043
[09:58] <TheMue> YEEEEEAH!
[09:59]  * TheMue does the subordinate dance ...
[10:09] <rogpeppe2> fwereade: i'm after a review of this fairly trivial branch, please. i think it's important to get in before the release. https://codereview.appspot.com/8823043
[10:09] <fwereade> rogpeppe2, ok, I'll chat about s3 in a mo
[10:10] <rogpeppe2> fwereade: ok
[10:11] <fwereade> rogpeppe2, nice, LGTM
[10:12] <rogpeppe2> fwereade: trivial?
[10:12] <TheMue> fwereade: you've got a proposal :)
[10:12] <fwereade> rogpeppe2, yeah, I think so
[10:12] <rogpeppe2> fwereade: cool, thanks
[10:12] <fwereade> TheMue, cheers
[10:12]  * rogpeppe2 wonders how long it'll take to run lbox submit through the mobile data connection
[10:13] <TheMue> fwereade: it's https://codereview.appspot.com/8824043/
[10:13] <fwereade> TheMue, cheers
[10:13] <fwereade> rogpeppe, I am somewhat ambivalent about changing Storage.Put
[10:14] <rogpeppe> fwereade: i think that's going to be a necessary thing if we ever want retries to work.
[10:15] <fwereade> rogpeppe, I would be much happier if it took a Reader and tried to ReadSeeker it
[10:15] <rogpeppe> fwereade: i wondered about that, but i was very glad i didn't
[10:15] <fwereade> rogpeppe, oh yes?
[10:16] <rogpeppe> fwereade: because the static type caught lots of places that i would not otherwise have found
[10:17] <rogpeppe> fwereade: it's not as if we can stream data without having it all first, either
[10:17] <rogpeppe> fwereade: because we need to know the length
[10:17] <fwereade> rogpeppe, it is not unheard of for content lengths to be declared
[10:17] <rogpeppe> fwereade: that is true, i suppose
[10:18] <fwereade> rogpeppe, if they are, then great, but even if not I don't really think we should force all clients to buffer everything
[10:18] <rogpeppe> fwereade: but in all the cases we care about, we do want retries to work
[10:18] <rogpeppe> fwereade: and they won't work on a straight io.Reader
[10:19] <fwereade> rogpeppe, maybe it's our own responsibility to buffer though
[10:19] <rogpeppe> fwereade: i think so
[10:19] <fwereade> rogpeppe, ha, different "our" I suspect
[10:19] <rogpeppe> fwereade: ah, maybe so :-)
[10:19] <rogpeppe> fwereade: i really don't think that Put should do the buffering itself
[10:20] <rogpeppe> fwereade: if that's what you were thinking of
[10:20] <fwereade> rogpeppe, yeah, I definitely see that side of it too
[10:20] <fwereade> rogpeppe, but I'm not sure it's a very strong objection -- convince me?
[10:21] <rogpeppe> fwereade: in many of the places, we already have the data in a file
[10:21] <fwereade> rogpeppe, in which case, yay, easy ReadSeeker
[10:21] <rogpeppe> fwereade: buffering in Put would force us to make another copy of that before doing the Put
[10:21] <fwereade> rogpeppe, I'm only suggesting we should buffer ourselves if we can't ReadSeeker it
[10:21] <rogpeppe> fwereade: oh i see, you mean "if it doesn't implement ReadSeeker, then pull everything into a buffer" ?
[10:22] <fwereade> rogpeppe, more ideally a temp file
[10:22] <rogpeppe> fwereade: i don't really see the point.
[10:22] <rogpeppe> fwereade: in all the places we have, it's trivial to make a ReadSeeker
[10:22] <rogpeppe> fwereade: and i much prefer to avoid dynamic type checking magic where reasonable
[10:23] <fwereade> rogpeppe, so that we can inject the reliability as close to the problematic bit as possible, and so we don't push a new restriction into the API that I'm not sure is justified
[10:24] <rogpeppe> fwereade: you're saying we shouldn't force all clients to buffer everything, but we're going to buffer everything anyway, right?
[10:24] <rogpeppe> fwereade: for the rare (currently non-existent) case where the client doesn't have a seeker, it's easy for them to make one.
[10:25] <rogpeppe> fwereade: and know about the trade-offs involved
[10:26] <rogpeppe> fwereade: as i said, having the static type caught lots of cases where i could trivially use a readseeker but didn't. if i hadn't made that change, we'd be creating a temp file for each of those.
[10:26] <fwereade> rogpeppe, or maybe we'd be making use of the length param to decide whether or not to create one ;)
[10:27] <rogpeppe> fwereade: that seems to me like lots of heuristics and extra code where currently we don't need any
[10:28] <rogpeppe> fwereade: if we find lots of places that want to use an io.Reader, we can easily make a func PutBuffered(s Storage, name string, r io.Reader, length int64) error
[10:28] <fwereade> rogpeppe, isn't that because you're just pushing the responsibility onto the client?
[10:28] <rogpeppe> fwereade: so it's obvious to the caller that we're going to be creating a temporary file
[10:28] <rogpeppe> fwereade: sure. but creating a temporary file is something i think the client should be aware of
[10:29] <rogpeppe> fwereade: and it means we don't have to write more code now
[10:30] <fwereade> rogpeppe, possible compromise: at the point where you'd seek, just error out if it's not a ReadSeeker?
[10:31] <rogpeppe> fwereade: that's worse, i think
[10:31] <fwereade> rogpeppe, automatic reliability in the cases we already have, no external code changes, no magic buffering
[10:31] <fwereade> rogpeppe, I guess there's something I'm missing
[10:31] <fwereade> rogpeppe, (ok, maybe we should check first to know where to seek back to)
[10:32] <rogpeppe> fwereade: except that i want automatic reliability in all cases, and it's not hard
[10:32] <rogpeppe> fwereade: i don't really see why there's a problem with requiring ReadSeeker. it loses no generality
[10:33] <rogpeppe> fwereade: yes, that is also an issue with just dynamically type converting to ReadSeeker
[10:33] <rogpeppe> fwereade: we'd have to say "*if* this is a ReadSeeker, it must be positioned at the start"
[10:34] <rogpeppe> fwereade: that's ok i guess, but i'd much prefer to just require a ReadSeeker
[10:35] <fwereade> rogpeppe, the issue is that it's a restrictive interface change to all the providers to implement a change that's ec2-specific
[10:36] <fwereade> rogpeppe, I feel like it's ec2 driving the interface, not the other way round
[10:36] <rogpeppe> fwereade: it's not necessarily ec2-specific. it applies to any provider where the request might encounter a transient failure
[10:38] <fwereade> rogpeppe, it is in practice ec2-specific... and any other provider that implements Put-reliability will need to make similar internal changes to those in ec2 regardless
[10:40] <rogpeppe> fwereade: i agree. but if you've just got an io.Reader, you *cannot* retry a request - so regardless of the provider implementation, they'll want some way of rewinding the stream.
[10:40] <rogpeppe> fwereade: so we're going to force all providers that want to retry to implement their own temp-file buffering stuff
[10:40] <rogpeppe> fwereade: i'd prefer to keep that cleverness outside of the individual providers
[10:41] <fwereade> rogpeppe, the tradeoff is between asking a limited number of providers to do so, and asking every client to do so
[10:42] <fwereade> rogpeppe, our perspective on this may be skewed because we are focused on writing one client, and we have 3+ providers to think about
[10:42] <rogpeppe> fwereade: if it's just a difference between s.Put(r, name, len) and environs.PutBuffered(r, name, len) then i don't see the issue
[10:42] <rogpeppe> PutBuffered(s, name, r, len) of course
[10:43] <rogpeppe> fwereade: then the temp-file buffering is implemented in a single place, once and for all
[10:43] <rogpeppe> fwereade: (and currently we don't even need it, because there's not a single place that's really streaming)
[10:43] <fwereade> rogpeppe, the question I am trying to get clarity on is: purely at the level of the interface, why is it better to have a ReadSeeker than a Reader?
[10:44] <rogpeppe> fwereade: because a Reader allows streaming only, and here we can't stream
[10:44] <fwereade> rogpeppe, we can stream, it's just not necessarily so reliable
[10:45] <rogpeppe> fwereade: and that's a good thing?
[10:46] <fwereade> rogpeppe, considering the capabilities of the various providers, I think it is all we can reasonably guarantee
[10:47] <rogpeppe> fwereade: we can guarantee to make available to a provider some data that they can retry on. i think that's a nice thing to do. it makes the provider's job easier.
[10:48] <rogpeppe> fwereade: your solution is asking me to put more intelligence and code in the provider, where actually we want to move as much as possible *outside* the provider
[10:48] <fwereade> rogpeppe, *we* can, yes -- but I thought we weren't writing the interfaces just for our own convenience?
[10:48] <fwereade> rogpeppe, I don't know how you are going to put S3-specific buffering outside the provider
[10:49] <rogpeppe> fwereade: as far as i'm concerned, it's the job of that interface to make it as easy as possible to write a provider
[10:49] <fwereade> rogpeppe, the provider should absolutely contain *provider-specific* intelligence and code
[10:49] <rogpeppe> fwereade: for clients, we can easily write convenience functions that wrap it
[10:49] <rogpeppe> fwereade: agreed. but buffering the reader is not provider-specific. i'm not sure what you mean by "S3-specific buffering".
[10:53] <rogpeppe> fwereade: another possibility is to pass a function (func() io.Reader) that can be called multiple times to get the source.
[10:54] <rogpeppe> fwereade: that would make it easier for a client to do streaming.
[10:54] <fwereade> rogpeppe, does every provider have to implement this retry stuff?
[10:54] <fwereade> rogpeppe, it seems it is a provider-specific enhancement
[10:54] <rogpeppe> fwereade: it's a good question. i suspect it's not unlikely.
[10:55] <rogpeppe> fwereade: the point is that any provider that *does* want to implement retry *must* implement the same buffering code.
[10:55] <rogpeppe> fwereade: and that client code must work against any provider
[10:56] <rogpeppe> fwereade: i suppose your argument could be: for providers that *don't* implement retry, we'll be doing more work than necessary.
[10:57] <rogpeppe> fwereade: assuming we actually have some client code that wants to stream.
[10:57] <fwereade> rogpeppe, well, we already did, even if it was just a smattering of test code
[10:58] <rogpeppe> fwereade: really?
[10:58] <rogpeppe> fwereade: i thought the changes were just changing bytes.Buffer to bytes.Reader in the main
[10:58] <rogpeppe> fwereade: no extra buffering required
[10:58] <fwereade> rogpeppe, my argument is really that *most* providers don't retry, and that even if we get to a point where most *do* I would rather have a new provider able to implement the interface to the full more easily
[10:59] <rogpeppe> fwereade: huh? this doesn't make it harder in the slightest for a new provider
[10:59] <fwereade> rogpeppe, rather than make Put-retries a fundamental requirement for a provider, I would like them to be an optional enhancement
[10:59] <rogpeppe> fwereade: they are
[10:59] <rogpeppe> fwereade: a provider can totally ignore the fact that it's a ReadSeeker
[10:59] <fwereade> rogpeppe, if an Environ is always required to retry, ReadSeeker makes sense
[10:59] <fwereade> rogpeppe, Storage sorry
[11:00] <rogpeppe> fwereade: i don't see any requirement - it makes it *possible* for a provider to retry, but by no means mandatory
[11:01] <fwereade> rogpeppe, a Reader in the interface implies potentially-unreliable streaming, a ReadSeeker tenacious persistence
[11:01] <fwereade> rogpeppe, if the lowest common denominator is unreliable streaming, that is what we should advertise
[11:01] <rogpeppe> fwereade: i don't see any difference in reliability between a Reader and a ReadSeeker
[11:01] <rogpeppe> fwereade: they both implement io.Reader
[11:02] <rogpeppe> fwereade: a seeker has no more reliability guarantees
[11:02] <fwereade> rogpeppe, if any, or all, providers want to get smarter and buffer, then they can do so and everyone who uses them gets to see the benefits transparently
[11:02] <fwereade> rogpeppe, I'm talking about the reliability of the operation, not its data source
[11:02] <rogpeppe> fwereade: but we can implement this behaviour once and for all providers
[11:03] <rogpeppe> fwereade: another possibility would be to hoist the retry logic outside the providers, but i think it's too inherently provider-specific
[11:04] <rogpeppe> fwereade: and even then, we'd need a stream we could rewind
[11:05] <fwereade> rogpeppe, how about EnsureSeeker(Reader) (ReadSeeker, error), and fall back to only trying once?
[11:05] <rogpeppe> fwereade: who calls that?
[11:06] <fwereade> rogpeppe, any provider that wants to add retry logic
[11:06] <fwereade> rogpeppe, they just have to be prepared for it to fail and fall back to non-retrying
[11:06] <rogpeppe> fwereade: what would EnsureSeeker do other than .(io.ReadSeeker) ?
[11:07] <fwereade> rogpeppe, one day, maybe, buffer to a file if that fails
[11:08] <fwereade> rogpeppe, I think that's a cleaner separation of concerns
[11:09] <rogpeppe> fwereade: i'd find it preferable for a client to explicitly disable retrying (for instance by passing in a ReadSeeker where the Seek always fails)
[11:09] <rogpeppe> fwereade: then a client has to make a positive action to *disable* reliability, rather than the other way around
[11:09] <fwereade> rogpeppe, except in the cases where it isn't reliable
[11:09] <rogpeppe> fwereade: func ErrorReadSeeker(r io.Reader) io.ReadSeeker
[11:10] <rogpeppe> fwereade: sure, some providers just aren't reliable. but when we use a reliable provider, we are assured of reliability
[11:11] <rogpeppe> fwereade: rather than, some time in the future "oh, that's failing because i forgot to use a bytes.Reader rather than a bytes.Buffer"
[11:12] <rogpeppe> fwereade: and this question is currently academic - we have a seekable reader in every single case, trivially.
[11:12] <fwereade> TheMue, https://codereview.appspot.com/8824043/ reviewed
[11:12] <TheMue> fwereade: cheers
[11:12] <fwereade> rogpeppe, if not all providers are reliable, why would we imply they are?
[11:13] <rogpeppe> fwereade: we are not. we are implying that a provider *may* be reliable.
[11:13] <fwereade> rogpeppe, and placing the burden for supporting reliability on the *client*
[11:14] <rogpeppe> fwereade: ... which is trivial!
[11:14] <fwereade> rogpeppe, despite the fact that it may not actually be present
[11:14] <rogpeppe> fwereade: there is no burden
[11:14] <rogpeppe> fwereade: honestly - how does it actually burden the client?
[11:15] <fwereade> rogpeppe, you just suggested ErrorReadSeeker
[11:15] <fwereade> rogpeppe, which STM to be boilerplate you expect every client to use if they don't know the provenance of their Reader
[11:16] <fwereade> rogpeppe, which, honestly, I do not think they should have to care about
[11:16] <rogpeppe> fwereade: yes - i'd like each client to be fully aware when they might not get reliability against an otherwise bulletproof provider
[11:17] <rogpeppe> fwereade: but currently there are *no* clients that would use it
[11:17] <fwereade> rogpeppe, IIRC Put returns an error
[11:18] <fwereade> rogpeppe, under-promise, over-deliver ;)
[11:19] <rogpeppe> fwereade: this isn't promising anything, really.
[11:19] <rogpeppe> fwereade:
[11:19] <rogpeppe> 1) this does not require any providers to do anything different
[11:19] <rogpeppe> 2) this enables a provider to implement retry easily
[11:19] <rogpeppe> 3) there is no significant impact on how easy it would be to write a client that uses the Storage.
[11:19] <rogpeppe> i really can't see a down side
[11:21] <fwereade> rogpeppe, *every* provider needs to Read; *some* providers may be able to be smarter if they can also Seek
[11:21] <rogpeppe> fwereade: sure.
[11:22] <fwereade> rogpeppe, this seems to be exactly why we can do `rs, ok := r.(io.ReadSeeker)`
[11:22] <rogpeppe> fwereade: and *every* client needs to work with *every* provider
[11:22] <rogpeppe> fwereade: so what the client provides will enable those *some* providers to be smarter.
[11:23] <rogpeppe> fwereade: if there was a significant difficulty for clients to provide a Seeker, i'd agree with you.
[11:23] <rogpeppe> fwereade: but really, there's not AFAICS.
[11:23] <fwereade> rogpeppe, whether or not a given Reader is also a Seeker is not something clients should have to think about
[11:24] <rogpeppe> fwereade: it really is - a client should be aware that by using a reader that's not a seeker it is making the operations more unreliable on some providers.
[11:24] <rogpeppe> fwereade: and we can let the compiler tell us that
[11:24] <rogpeppe> fwereade: i think that's awesome
[11:25]  * rogpeppe really doesn't like dynamic type checks where they're easily avoided.
[11:25] <fwereade> rogpeppe, you're proposing that introducing a magic implicit reliability switch will make people's lives *easier*?
[11:26] <fwereade> rogpeppe, the client should not get to affect that choice
[11:26] <rogpeppe> fwereade: it's not magic. doing a dynamic type check under the hood to see if the client *might* have turned on that switch *is* magic.
[11:27] <fwereade> rogpeppe, how does it affect the client, except to cause fewer errors?
[11:27] <rogpeppe> fwereade: fewer errors is a huge thing
[11:27] <fwereade> rogpeppe, yes
[11:28] <fwereade> rogpeppe, and a good thing
[11:28] <fwereade> rogpeppe, that the client doesn't have to think about
[11:28] <rogpeppe> fwereade: i'm glad we agree on something :-|
[11:28] <rogpeppe> fwereade: indeed - the io.ReadSeeker contract implies that they get reliability for free
[11:29] <rogpeppe> fwereade: and is trivial for all current (and probably most future) clients to provide
[11:29] <rogpeppe> fwereade: that's really the crux - if that's not true, then i think you're right.
[11:30] <fwereade> rogpeppe, saying "you have to provide a readseeker, and hence reliability, but it might not be used" is not better than saying "just give us a Reader -- if it's also a ReadSeeker, you might see some benefits on some providers"
[11:30] <rogpeppe> fwereade: i think it is
[11:31] <rogpeppe> fwereade: because it means that the compiler tells us where things might be unreliable.
[11:31] <rogpeppe> fwereade: so we *know* that in all current cases, you *will* see benefits on the providers that can use it
[11:32] <rogpeppe> fwereade: and when making this change, i was surprised by the places that it told me
[11:32] <rogpeppe> fwereade: and they were all trivially changeable to make work
[11:32] <dimitern> jam: you around?
[11:32] <rogpeppe> fwereade: i think that's a concrete benefit
[11:33] <fwereade> rogpeppe, you'll have to explain further
[11:33] <rogpeppe> fwereade: so, when i changed the signature, i ran the build
[11:33] <rogpeppe> fwereade: and it said, in various places "that argument does not implement io.ReadSeeker"
[11:33] <rogpeppe> fwereade: and i went there
[11:34] <fwereade> rogpeppe, so, it doesn't get to be reliable on ec2, just like it doesn't everywhere
[11:34] <rogpeppe> fwereade: and, lo! it was using bytes.Buffer not bytes.Reader
[11:34] <wallyworld> dimitern: https://code.launchpad.net/~wallyworld/juju-core/openstack-image-lookup/+merge/159301
[11:34] <jam> dimitern: I'm back
[11:34] <rogpeppe> fwereade: i don't understand that
[11:35] <fwereade> rogpeppe, the interface should reflect the minimum set of required capabilities
[11:35] <fwereade> rogpeppe, if I've just got a Reader I *know* it's potentially unrepeatable anyway
[11:36] <fwereade> rogpeppe, you're asking me to turn it into a ReadSeeker that you might not bother to use
[11:36] <fwereade> rogpeppe, how does it help?
[11:37] <fwereade> rogpeppe, OTOH transparently making use of reliability, or injecting it ourselves if it's really critical, frees the client to solve their own problem
[11:37] <rogpeppe> fwereade: we require that code to run against every provider. we *know* that when we run with the ec2 provider that the ReadSeeker *will* be used (assuming an operation fails in a transient way)
[11:37] <fwereade> rogpeppe, why do all the clients for all the other providers have to care?
[11:38] <rogpeppe> fwereade: there is no client "for another provider". a client is for *all providers*
[11:38] <fwereade> rogpeppe, you have been talking like the client knows its using ec2
[11:39] <rogpeppe> fwereade: we are indeed writing code that we know will be used against ec2 (and all the other providers too)
[11:39] <fwereade> rogpeppe, the *all providers* context should only make it more starkly clear that we are messing with a perfectly good interface so we can show off a special feature of a single provider
[11:40] <rogpeppe> fwereade: on the contrary, i think that the ec2 requirement shows the inadequacy of the interface
[11:42] <rogpeppe> fwereade: and i'm sure there will be other providers that will want to retry a Put too.
[11:42] <rogpeppe> fwereade: by making the change now, we make it easy for them to do so
[11:45] <fwereade> rogpeppe, AFAICT the only "inadequate" thing about the interface is that it requires that you do about the most innocuous possible type check
[11:45] <fwereade> rogpeppe, in order to add *provider-specific* functionality
[11:46] <fwereade> rogpeppe, ReadSeeker makes no sense unless reliability-vs-transient-errors is in the contract of the Storage type
[11:46] <rogpeppe> fwereade: the inadequate thing is that it's not obvious to people using that interface that if they *happen* to pass a ReadSeeker in, that somehow things will magically get more reliable on some providers.
[11:47] <fwereade> rogpeppe, but you seemed to be saying that people would usually be using ReadSeekers anyway
[11:47] <rogpeppe> fwereade: as far as i'm concerned, the contract of the Storage type is "do whatever you can to try and fulfil these requests"
[11:48] <rogpeppe> fwereade: not really - i'm saying that people would usually be using something that could trivially *be* a ReadSeeker
[11:48] <fwereade> rogpeppe, so you're breaking the contracts of maas and openstack, because they don't do that
[11:48] <rogpeppe> fwereade: not at all
[11:48] <rogpeppe> fwereade: perhaps they don't get transient errors
[11:48] <fwereade> rogpeppe, then they *definitely* don't need ReadSeekers!
[11:49] <rogpeppe> fwereade: that's fine. all they need is the reader bit then.
[11:49] <rogpeppe> fwereade: there's no requirement to use the seeker part
[11:49] <fwereade> rogpeppe, then why would we require that it be supplied?
[11:49] <rogpeppe> fwereade: ec2 won't use the seeker part unless something fails with a certain kind of error
[11:49] <rogpeppe> fwereade: because thing client needs to supply a reader for *all* providers!
[11:50] <rogpeppe> s/thing/the/
[11:50] <fwereade> rogpeppe, and it does
[11:50] <fwereade> rogpeppe, and ec2 would make use of it if you would just make use of the language feature designed explicitly for that purpose
[11:50] <fwereade> rogpeppe, type casts and type switches were not just put in for fun
[11:51] <rogpeppe> fwereade: *and* if clients actually passed in a readseeker
[11:51] <fwereade> rogpeppe, we don't get to control our inputs and probably nor do they
[11:51] <fwereade> rogpeppe, *they* then have to advertise ReadSeeker
[11:51] <rogpeppe> fwereade: when would it ever *not* be appropriate for a client to pass in a ReadSeeker?
[11:53] <fwereade> rogpeppe, all the places you changed, for a start
[11:53] <rogpeppe> fwereade: in that branch, there are various places where the changed code does "bytes.NewReader(buf.Bytes())". that code is totally non-obvious if the argument to Put is just a Reader. but it has significant runtime implications.
[11:55] <fwereade> rogpeppe, being?
[11:56] <rogpeppe> fwereade: that when running against ec2 the operation will be less reliable if you don't
[11:57] <rogpeppe> fwereade: essentially we want all our client code to work as well as possible against all providers. that means that the client code should work as well as possible against *at least* the ec2 provider. so *all* clients should provide the thing that makes ec2 work well - i.e. a ReadSeeker. given that, why not make the interface appropriate to that?
[11:58]  * rogpeppe thinks it might be possible to phrase most of that in mathematical notation
[11:59] <wallyworld> fwereade: are you free sometime soon for a meeting with me and dimitern ?
[11:59] <wallyworld> soon = next 30 minutes ?
[11:59] <dimitern> fwereade: about openstack constraints/images/flavors selection
[12:02] <fwereade> wallyworld, dimitern: sure, I can do now
[12:02] <dimitern> fwereade: ok, I'll start a g+
[12:02] <wallyworld> ok
[12:03] <dimitern> https://plus.google.com/hangouts/_/edc5333a9131548ea93258b2c5c90f6a9ef4af15?authuser=0&hl=en << fwereade, wallyworld
[12:04] <wallyworld> trying, not connecting
[12:06] <dimitern> wallyworld: william was having problems with sound
[12:13] <jam> wallyworld: as a side thing, when you know your holidays, try to put them on the juju team calendar. I think I did it correctly for you this time.
[12:13] <wallyworld> jam: i did add them
[12:13] <wallyworld> the week before i left
[12:14] <jam> wallyworld: I mean the ones you just applied for national holidays in April and June
[12:14] <wallyworld> np sorry
[12:14] <jam> april 25th june 10th I think
[12:48] <rogpeppe> fwereade: "Why aren't we doing this at the end of cmd.Main? We already get a log message on error."
[12:48] <rogpeppe> fwereade: that's the way i tried to do it first
[12:48] <rogpeppe> fwereade: but unfortunately it breaks the jujuc log command
[12:49] <rogpeppe> fwereade: we really don't want jujuc log to say "command finished" every time it runs.
[12:52] <rogpeppe> jam: "
[12:52] <rogpeppe> Some
[12:52] <rogpeppe> platforms allow renaming an empty dir over another empty directory.
[12:52] <rogpeppe> "
[12:52] <rogpeppe> really?
[12:52] <rogpeppe> i thought the whole point of using directories in fslock was that that wasn't the case
[12:53] <rogpeppe> fwereade: if that *is* the case, then fslock is broken AFAICS
[13:01]  * dimitern bbiab
[13:06] <TheMue> fwereade: the proposal is in again
[13:06]  * TheMue is at lunch
[13:07] <fwereade> rogpeppe, I'm not at fslock properly yet
[13:07] <rogpeppe> fwereade: ok
[13:08] <rogpeppe> fwereade: a review from you would be good before it goes in
[13:08] <fwereade> rogpeppe, re jujuc: hmm, I'm not so sure jujuc logging is entirely a misfeature
[13:09] <fwereade> rogpeppe, if anything I'd say "good point, add start logging to jujuc commands"
[13:09] <fwereade> rogpeppe, I would expect that in the eventual case that would be logged at a lower level than that specified for juju-log
[13:10] <rogpeppe> fwereade: the problem is the jujuc *log* command. do we really want that to produce two lines of log for every line the user intended to log?
[13:10] <rogpeppe> fwereade: i thought that the level was one of the arguments to juju-log
[13:10] <rogpeppe> fwereade: well, one that we currently discard, i'm aware
[13:11] <rogpeppe> fwereade: but given that we have those levels now, i think it probably should support that
[13:11] <fwereade> rogpeppe, it is one of the arguments to juju-log, and ofc it should support that
[13:12] <fwereade> rogpeppe, but that's the level at which the supplied message should be logged, and it's independent of the level at which we log command completion
[13:12] <rogpeppe> fwereade: so if i do juju-log --level=debug foo, do i really want to see two lines at that level in the log?
[13:13] <rogpeppe> fwereade: if you think that's ok, then i'll go with logging in cmd.Main. i didn't think it was, which was why i changed it.
[13:13] <fwereade> rogpeppe, yes, I think so, because if we unfuck juju-log we will also, I imagine, badge its output clearly such that we can filter that stuff easily
[13:13] <fwereade> rogpeppe, might be wrong
[13:14] <rogpeppe> fwereade: i'm just imagining a scenario where the user is producing *lots* of output using juju-log. we just doubled the number of lines produced
[13:14] <rogpeppe> fwereade: and log file size is actually a significant issue for us.
[13:16] <fwereade> rogpeppe, how much would we save if we dropped the logging in state.Open except (1) before the dial func and (2) on errors inside the dial func?
[13:17] <fwereade> rogpeppe, saving 240 lines/agent/hour will probably leave us some wiggle room:)
[13:17] <rogpeppe> fwereade: i don't know. i have thrown away the last huge log file i acquired
[13:18] <fwereade> rogpeppe, not as much help as silencing the txn spam, I agree
[13:18] <rogpeppe> fwereade: i just think it's a bit weird that a command whose sole purpose is to produce a line of log output actually produces two.
[13:19] <rogpeppe> fwereade: well, probably more still actually, as the uniter probably logs when it gets a jujuc command execution request
[13:20] <rogpeppe> fwereade: i think the changes to logging in state.Open sound reasonable.
[13:20] <mramm> morning all (well early my morning)
[13:20] <rogpeppe> mramm: hiya
[13:20] <fwereade> rogpeppe, I think that the general "it's good to know what commands we run" principle *probably* beats the (otherwise clearly sane) juju-log consideration
[13:21] <rogpeppe> mramm: i won't be able to join the kanban meeting today as my broadband connection is out
[13:22] <rogpeppe> fwereade: i think i agree. but the "finished" message perhaps doesn't
[13:22] <fwereade> rogpeppe, sorry, the agent finished messages?
[13:22] <mramm> bummer that
[13:22] <rogpeppe> fwereade: the "finished" message in cmd.Main
[13:23] <rogpeppe> mramm: you never know, the engineers might turn up (chortle, chortle)
[13:24] <fwereade> rogpeppe, I don't see one
[13:24] <fwereade> rogpeppe, I see"command failed"
[13:24] <rogpeppe> fwereade: that's the one you're suggesting
[13:24] <rogpeppe> fwereade: "Why aren't we doing this at the end of cmd.Main?"
[13:24] <fwereade> rogpeppe, and I just suggested an alternative: in SuperCommand
[13:25] <rogpeppe> fwereade: orly? i don't think i've seen that
[13:25] <fwereade> rogpeppe, or... apparently I didn't
[13:25] <fwereade> rogpeppe, the insanity peppers must be kicking in
[13:25] <fwereade> rogpeppe, or maybe I'll discover it in the wrong buffer somewhere in an hour
[13:25] <rogpeppe> :-)
[13:25]  * fwereade pokes himself with something sharp
[13:26] <rogpeppe> fwereade: that happens to me too often
[13:26] <rogpeppe> fwereade: why would doing it in SuperCommand help? doesn't jujuc use SuperCommand?
[13:26] <fwereade> rogpeppe, I *think* niemeyer talked me out of it, let me check
[13:27] <rogpeppe> fwereade: you're right, i don't think it does
[13:27] <rogpeppe> pwd
[13:28] <rogpeppe> fwereade: still seems a bit iffy to me. we're logging in SuperCommand because use of SuperCommand *happens* to correspond exactly with the commands we want to log finished messages for.
[13:28] <fwereade> rogpeppe, feels a bit icky but maybe not too much so
[13:28] <fwereade> rogpeppe, because SuperCommand at least does know about logs
[13:28] <fwereade> rogpeppe, and in general I don't think we should expect either main.Main *or* cmd.Main to have logging configured
[13:28] <rogpeppe> fwereade: and juju doesn't?
[13:29] <fwereade> rogpeppe, not much point logging if you don't have a target IMO
[13:29] <rogpeppe> fwereade: true, but no harm either, no?
[13:29] <fwereade> rogpeppe, message that looks like it should be printed but actually isn't
[13:29] <rogpeppe> fwereade: true of all log messages...
[13:31] <fwereade> rogpeppe, I think that , all other things being equal, it is better to write a given message when you know that logging setup has occurred, and to write important messages elsewhere if you can't be sure
[13:31] <fwereade> rogpeppe, but that is kinda a derail
[13:31] <fwereade> rogpeppe, how do you feel about dropping it from cmd.Main and logging success/failure in SuperCommand?
[13:31] <fwereade> rogpeppe, independent of reasons I might like it ;)
[13:32] <rogpeppe> fwereade: that seems reasonable to me. i'd forgotten that logging was so tightly bound up with SuperCommand.
[13:38] <fwereade> TheMue, https://codereview.appspot.com/8824043/ LGTM
[13:39] <TheMue> fwereade: yeah, just read it, thanks.
[13:39] <fwereade> dimitern, rogpeppe, jam, anyone: a second look at that would be handy; I am not entirely happy about some of the code but it produces the correct output and that's the critical thing at this point
[13:41] <TheMue> fwereade: the deleting of the units of subordinate service is done this way in py too. the data is used before to add the unit info to the principal units.
[13:42] <rogpeppe> fwereade, TheMue: looking
[13:42] <fwereade> rogpeppe, https://codereview.appspot.com/8821043/ LGTM trivial
[13:43] <rogpeppe> fwereade: ta!
[13:48]  * rogpeppe is trying hard to grok the code in processServices
[13:49] <TheMue> rogpeppe: maybe i should move those loops into two preprocessing funcs with descriptive names
[13:49] <rogpeppe> istm that it should be possible to get the type system working for us there, rather than fighting it in every line
[13:49] <TheMue> rogpeppe: it's only a mix of the collected data into the output
[13:50] <TheMue> rogpeppe: especially the lower loop is exactly how it is done today in py
[13:50] <rogpeppe> TheMue: the difficulty i have is that we're spending a lot of code dynamically type casting stuff that perhaps we can already know the types of.
[13:50] <TheMue> rogpeppe: but this map[string]interface{} indeed complicates it *sigh*
[13:50] <TheMue> rogpeppe: yes, i'm not happy about it too
[13:51] <TheMue> rogpeppe: but regarding the feature freeze i think we should change this afterwards
[13:52] <rogpeppe> TheMue: at the least it could do with some comments so we know what "post-processing the subordinates" is actually doing.
[13:52] <TheMue> rogpeppe: will add them
[13:59] <rogpeppe> TheMue: what do subFromMap and subToMap do?
[14:00] <TheMue> rogpeppe: will find better names. the subToMap had its name from collecting the service names for the output "subordinate-to". ;)
[14:01] <rogpeppe> TheMue: what are the keys and what to the keys map to?
[14:01] <rogpeppe> s/to the/do the/
[14:01] <TheMue> rogpeppe: they do hangout ;)
[14:01] <rogpeppe> TheMue: given that the data types are so dynamic, we really need more comments
[14:01] <TheMue> rogpeppe: i'll add comments and change the names
[14:02] <rogpeppe> hmm, the link box says it's got a link, though the phone's still dead. will try broadband again.
[14:03] <rogpeppe1> woo!
[14:04] <mramm> https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083
[14:05] <rogpeppe1> fwereade: hmm, i'm not sure if it's working
[14:06] <rogpeppe1> very limited bandwidth!
[14:06] <rogpeppe1> dimitern: i saw you for a moment...
[14:34] <TheMue> fwereade: is it ok to move the status in with your comments handled and improvements in readability and comments?
[14:35] <fwereade> TheMue, has anyone else at least glanced at it?
[14:35] <fwereade> TheMue, I care a lot more about output than about readability though :)
[14:36] <TheMue> fwereade: roger is looking, and i could ask dimitern
[14:37] <fwereade> TheMue, but yes, you have my LGTM, just get someone else's too :)
[14:37] <TheMue> fwereade: ok, will do
[14:38] <rogpeppe1> interesting, bandwidth tester indicates 1Mb download (not too bad) but 52Kb upload. not surprising that the hangout worked ok when i turned off my vid and microphone
[14:39] <dimitern> TheMue: you really hate merging conflicts it seems ;)
[14:39] <TheMue> dimitern: yeah, really, i do
[14:39] <TheMue> dimitern: and also i got my first lgtm ;)
[14:40] <dimitern> TheMue: no worries, it'll be easier for me as well once you land your stuff
[14:41] <TheMue> dimitern: great. had a chance to take a look at it?
[14:42] <dimitern> TheMue: just a brief glance
[14:43] <TheMue> dimitern: can i charm you a lgtm? ;)
[14:44] <dimitern> TheMue: in a bit, just looking through cards and bugs to make sure we're in sync
[14:52] <TheMue> dimitern: just wait, a new proposal will fly in in a few moments
[14:52] <dimitern> TheMue: sure, np
[14:54] <TheMue> fwereade: ping
[14:55] <fwereade> TheMue, pong
[14:56] <TheMue> fwereade: your comment in the status_test, line 842, how do you think it shall work?
[14:56] <fwereade> dimitern, rogpeppe1, TheMue: trivial if anyone wants it: https://codereview.appspot.com/8768045
[14:56] <TheMue> fwereade: i'm lost with this comment.
[14:56] <dimitern> fwereade: LGTM, trivial
[14:57] <TheMue> fwereade: LGTM
[14:57] <fwereade> thanks guys
[14:57] <TheMue> fwereade: i thought i followed your first idea, but it seems that this has been wrong
[14:58] <fwereade> TheMue, I mean that adding a relation is a prereq of adding a subordinate, and that won't (shouldn't ;)) work if you add a subordinate of service S to more than one unit of service T
[14:58] <fwereade> TheMue, it's a good start
[14:58] <fwereade> TheMue, but relateServices already exists and shouldn't be duplicated
[14:59] <fwereade> TheMue, while adding the subordinate is an operation on an existing unit and an existing service
[14:59] <TheMue> fwereade: ah, ic
[15:01] <fwereade> popping out for a mo, bbiab
[15:13] <benji> gary_poster/rick_h_: yeah, I'm looking at it
[15:13] <gary_poster> benji, hi channel jumper
[15:13] <benji> pff
[15:26] <TheMue> dimitern: https://codereview.appspot.com/8824043/ is back in again
[15:26] <dimitern> TheMue: I'm on it
[15:33] <dimitern> TheMue: you got LGTM from me
[15:34] <dimitern> TheMue: please go ahead and land it, so I can continue on mine
[15:34] <TheMue> dimitern: thx, will do
[15:48] <TheMue> anyone experiencing the same bootstrap_test problems like me? i just merged the trunk and since then the test fails
[15:52] <dimitern> TheMue: what's your series?
[15:53] <TheMue> precise
[15:53] <dimitern> TheMue: i'm on quantal - just got latest trunk and running tests now; haven't seen this issue before though
[15:56] <fwereade> TheMue, if you're seeing https://bugs.launchpad.net/juju-core/+bug/1169826 then I need help tracking it down please :)
[15:57] <TheMue> fwereade: hmm, no, i don't get a panic
[15:58] <dimitern> I also didn't get a panic, but got this error in environs/openstack: http://paste.ubuntu.com/5716314/
[15:58] <TheMue> fwereade: it's in bootstrap_test line 105
[15:58] <TheMue> fwereade: c.Check(urls, HasLen, len(test.uploads))
[15:59] <TheMue> fwereade: if i get it right it's one url instead of the expected two
[16:00] <fwereade> TheMue, expected/actual? which test?
[16:00] <dimitern> and i can see the cmd/juju tests almost doubled in running time the past 2 weeks
[16:01] <TheMue> fwereade: cmd/juju/bootstrap_test.go line 105
[16:01] <fwereade> dimitern, that's because we're actually testing stuff
[16:01] <fwereade> TheMue, that's a table test, the line number tells me very little
[16:01] <dimitern> fwereade: that's good then :)
[16:02] <fwereade> dimitern, ha, I can repro yours (one of them anyway) if I edit /etc/lsb-release
[16:02] <TheMue> fwereade: does "test 10: --upload-tools always bumps build number" help you more?
[16:02] <fwereade> TheMue, helpful, tyvm
[16:02] <fwereade> TheMue, dimitern, sorry I have to take a break :/ back soon
[16:03] <TheMue> fwereade: as my CL doesn't cover this, is it ok to still submit it?
[16:03] <dimitern> fwereade: that's awesome, I'll remember this trick to change current series
[16:03] <dimitern> TheMue: submitting onto a broken trunk is a no no
[16:04] <hazmat> ods keynote livestream fwiw http://openstackportland2013livestream.eventbrite.com/
[16:05] <TheMue> dimitern: ok, so status has to wait
[16:05] <dimitern> hazmat: it says it's ended, but I can't see a link to the video..
[16:06] <hazmat> dimitern, by bad.. better link.. http://www.openstack.org/
[16:07] <dimitern> hazmat: cool, 10x!
[16:08] <hazmat> dimitern, mark's keynote is in +1hr 15m
[16:08] <dimitern> TheMue: just dump the whole error output + logs and paste it please
[16:11] <dimitern> hazmat: isn't it streaming live now? I can see it has started..
[16:11] <hazmat> dimitern, the keynotes have started, but there are some other ones scheduled first
[16:12] <dimitern> hazmat: ah, ok, cheers
[16:14] <dimitern> fwereade: I can reproduce the error I posted above consistently
[16:42] <TheMue> dimitern, fwereade: http://paste.ubuntu.com/5716425/
[16:46] <dimitern> TheMue: cheers
[16:59]  * TheMue has to take a break
[17:23] <dimitern> mark's keynote is starting now
[17:50] <rogpeppe1> dimitern: link?
[17:50] <dimitern> rogpeppe1: http://www.openstack.org/home/Video/
[17:53] <rogpeppe1> dimitern: is it really not possible to find out a service's subordinate or principal without looking at its units?
[17:54] <dimitern> rogpeppe1: I don't think you can
[17:54] <rogpeppe1> dimitern: seems a bit weird. so if there are no units, there's no subordinate-principal relationship between two units?
[17:54] <rogpeppe1> s/two units?/two services?/
[17:55] <dimitern> rogpeppe1: you can take a look at the service's charm as well
[17:55] <rogpeppe1> dimitern: how does that help?
[17:55] <rogpeppe1> dimitern: presumably you actually need to look at the relations
[17:55] <dimitern> rogpeppe1: if the service is running a subordinate charm, then all its units are/will be subordinates
[17:56] <rogpeppe1> dimitern: yes, but for status we need to find out a *service's* principal service, irrespective of units, no?
[17:57] <dimitern> rogpeppe1: well, from the list of relation a subordinate service is participating in then
[17:57] <dimitern> rogpeppe1: relations*
[17:58] <rogpeppe1> dimitern: yup. i think that if we see that there's a relation between two services and scope is container, then we know
[17:59] <dimitern> rogpeppe1: makes sense, yeah
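The check rogpeppe1 and dimitern settle on above — a container-scoped relation between two services implies a subordinate/principal pair — can be sketched as follows. The types here are simplified, hypothetical stand-ins, not juju-core's actual state API:

```go
package main

import "fmt"

// Endpoint and Relation are toy versions of the state entities discussed
// above; field names are assumptions for illustration only.
type Endpoint struct {
	ServiceName string
}

type Relation struct {
	Scope     string // "global" or "container"
	Endpoints []Endpoint
}

// principalOf finds the principal service of sub, if any, by scanning the
// relations sub participates in for one whose scope is container -- no
// units need to exist for this to work, which was the point above.
func principalOf(sub string, relations []Relation) (string, bool) {
	for _, rel := range relations {
		if rel.Scope != "container" {
			continue
		}
		participant := false
		other := ""
		for _, ep := range rel.Endpoints {
			if ep.ServiceName == sub {
				participant = true
			} else {
				other = ep.ServiceName
			}
		}
		if participant {
			return other, true
		}
	}
	return "", false
}

func main() {
	rels := []Relation{
		{Scope: "global", Endpoints: []Endpoint{{"wordpress"}, {"mysql"}}},
		{Scope: "container", Endpoints: []Endpoint{{"logging"}, {"wordpress"}}},
	}
	p, ok := principalOf("logging", rels)
	fmt.Println(p, ok) // wordpress true
}
```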
[18:00] <rogpeppe1> niemeyer: is there any way of telling goyaml to ignore a field when marshalling? (unexported fields seem to cause an error)
[18:01] <niemeyer> rogpeppe1: unexported fields shouldn't cause errors
[18:01] <niemeyer> rogpeppe1: Example?
[18:01] <rogpeppe1> niemeyer: one mo
[18:03] <rogpeppe1> niemeyer: http://paste.ubuntu.com/5716622/
[18:03] <rogpeppe1> niemeyer: one mo, perhaps i'm running an old version of goyaml
[18:03] <rogpeppe1> niemeyer: when i run that code, i see "error: YAML error: reflect.Value.Interface: cannot return value obtained from unexported field or method"
[18:03] <niemeyer> rogpeppe1: Haven't made any changes about that
[18:03] <niemeyer> rogpeppe1: Will test
[18:03] <niemeyer> rogpeppe1: That's a bug
[18:03] <niemeyer> rogpeppe1: Will fix
[18:04] <rogpeppe1> niemeyer: ah, ok
[18:04] <rogpeppe1> niemeyer: there's another bug i just found too
[18:04] <niemeyer> Surprising to see that broken
[18:05] <rogpeppe1> niemeyer: i haven't looked into the source, but it seems like it might be hashing types by name somewhere
[18:05] <niemeyer> I'd have guessed someone would have reported by now
[18:05] <niemeyer> rogpeppe1: Can't tell what that means
[18:05] <rogpeppe1> niemeyer: because if you use two function-scoped types of the same name, goyaml gets confused
[18:05] <niemeyer> rogpeppe1: Sorry, I still have no idea about what you mean.. an example helps
[18:07] <rogpeppe1> niemeyer: ok, here's an example (slightly bigger than it could be, but it's the stage i got to when i realised i knew what the problem was):
[18:07] <rogpeppe1> niemeyer: http://paste.ubuntu.com/5716642/
[18:08]  * rogpeppe1 has a very dodgy network connection currently
[18:08] <rogpeppe1> niemeyer: search for "noMethods" in that code
[18:09] <niemeyer> rogpeppe1: Ok?
[18:11] <niemeyer> rogpeppe1?
[18:12] <rogpeppe1> niemeyer: sorry, i'm just finding the output
[18:16] <rogpeppe1> niemeyer: ok, this is a slightly smaller example: http://paste.ubuntu.com/5716665/ and this is its output: http://paste.ubuntu.com/5716667/
[18:17] <rogpeppe1> niemeyer: the fields showing under exposed-service/0 are from the serviceStatus type, not from the unitStatus type as they should be
[18:18] <rogpeppe1> niemeyer: for example "charm" is showing the field from slot 1 of the unitStatus struct
[18:19] <niemeyer> rogpeppe1: It looks correct given the code
[18:20] <niemeyer> func (s serviceStatus) GetYAML() (tag string, value interface{}) {
[18:20] <niemeyer> 	type noMethods serviceStatus
[18:20] <niemeyer> 	return "", noMethods(s)
[18:20] <niemeyer> }
[18:20] <niemeyer> rogpeppe1: serviceStatus *is* returning a serviceStatus
[18:20] <rogpeppe1> niemeyer:
[18:20] <rogpeppe1> yuip
[18:20] <rogpeppe1> niemeyer: and unitStatus is returning a unitStatus
[18:20] <niemeyer> rogpeppe1: Yeah, and how will it ever get into a unitStatus?
[18:21] <rogpeppe1> niemeyer: but goyaml thinks that the unitStatus returned (well actually a noMethods value) is a serviceStatus
[18:22] <niemeyer> rogpeppe1: let me run the code here rather than guessing.. hold on
[18:23] <niemeyer> rogpeppe1: Yeah, there's a bg
[18:23] <niemeyer> bug
[18:24] <rogpeppe1> niemeyer: indeed :-)
[18:24] <niemeyer> rogpeppe1: I actually improved that logic a long time ago in bson, and forgot to implement it in goyaml
[18:24] <niemeyer> rogpeppe1: Easy to fix, though
[18:24] <rogpeppe1> niemeyer: cool
[18:49] <niemeyer> rogpeppe1: Both fixed and pushed.
[18:50] <rogpeppe1> niemeyer: woah!
[18:50] <rogpeppe1> niemeyer: nice one!
[18:51] <niemeyer> rogpeppe1: I already had the context for the latter issue, and the former was a silly mistake
[18:52] <rogpeppe1> niemeyer: at some point, it would be nice for goyaml to support `goyaml:"-"` like json does
[18:55] <niemeyer> rogpeppe1: Done and pushed. Next? :)
[18:56]  * rogpeppe1 can't think of any more at the moment :-)
[18:57] <niemeyer> rogpeppe1: Cool, thanks for the reports
[18:58] <rogpeppe1> niemeyer: np. i was also surprised by the unexported field error, BTW.
[18:58] <niemeyer> rogpeppe1: I found a better naming for the flusher stuff, btw, which made me happier as well.
[18:58] <niemeyer> rogpeppe1: v2 is coming
[18:58] <rogpeppe1> niemeyer: great!
[18:58] <rogpeppe1> niemeyer: new name is?
[18:58] <niemeyer> rogpeppe1: type Task interface { Run; Kill }
[18:59] <rogpeppe1> niemeyer: perfect
[18:59] <rogpeppe1> niemeyer: that maps much better to how i came to understand it
[19:00] <niemeyer> rogpeppe1: Yeah, it maps better to the overall problem indeed.. and, funny enough, it was the previous name
[19:00] <niemeyer> rogpeppe1: But there were several refactorings after that which made things flow back and forth
[19:01] <rogpeppe1> niemeyer: i know how that goes...
[19:01] <niemeyer> rogpeppe1: The current design was finally sound, but the naming wasn't perfect anymore
[19:01] <rogpeppe1> niemeyer: overall it's a nice small package. i really like it.
[19:01] <niemeyer> rogpeppe1: Nice, I'm glad to hear it
[20:18]  * TheRealMue goes to bed now, good night
[20:42] <thumper> morning
[20:44] <fwereade> thumper, heyhey
[20:44] <thumper> hi fwereade
[20:45] <fwereade> thumper, I WIPped most of your fslock branches but there's only one serious question at the root of it, and then a feature we'd all like that will hopefully not hurt too much ("message" or similar on acquire)
[20:45] <thumper> fwereade: ok, what is the one serious question?
[20:45] <fwereade> thumper, the latter will enable units to reliably break their own locks, which I think is a Good Thing
[20:46] <fwereade> thumper, if some FSs can rename over empty dirs, shouldn't we be breating the held file before moving the dir into place?
[20:46] <fwereade> thumper, or does that break something I'm not seeing?
[20:46] <fwereade> s/breating/creating/
[20:46] <thumper> actually that makes a lot of sense, and yes, I'm +1 on that
[20:47] <fwereade> thumper, sweet
[20:49] <mgz> dumb go question, I have n *int, how do I do (n != nil ? *n : 0) idiomatically?
[20:49] <mgz> this is to fill in a bunch of struct fields, so I really don't want a three line if block per one...
[20:52] <fwereade> mgz, func valueOrZero(*int) int, I suspect
[20:52] <fwereade> mgz, nothing neater springs to mind
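fwereade's suggestion is about as idiomatic as Go gets here; a minimal version of the helper (the name is just the one suggested in the chat):

```go
package main

import "fmt"

// valueOrZero dereferences n, treating nil as zero -- the closest Go
// equivalent of the C-style (n != nil ? *n : 0) one-liner asked about.
func valueOrZero(n *int) int {
	if n == nil {
		return 0
	}
	return *n
}

func main() {
	x := 5
	fmt.Println(valueOrZero(&x), valueOrZero(nil)) // 5 0
}
```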
[21:07] <thumper> fwereade: reading your review comment, how would you confirm that you didn't take a lock while someone else had it?  when running in a go routine, repeated?
[21:09] <fwereade> thumper, first thought, may not be very smart, send a bunch of "acquired" and "released" thingies down a channel from n goroutines and check that you always get true/false/true/false/true/false
[21:10] <fwereade> thumper, can't think of anything obviously wrong with it though
[21:11] <fwereade> thumper, acq1/rel1/acq3/rel3/acq3/rel3/acq1/rel1 etc
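fwereade's test idea can be sketched as below; a `sync.Mutex` stands in for the fslock under test (an assumption to keep the sketch self-contained), and mutual exclusion holds iff the recorded events strictly alternate:

```go
package main

import (
	"fmt"
	"sync"
)

// contend runs n goroutines through rounds lock/unlock cycles each,
// recording "acquired"/"released" events on a channel from inside the
// critical section, and returns the event sequence for checking.
func contend(n, rounds int) []string {
	var lock sync.Mutex // stand-in for the fslock being tested
	events := make(chan string, 2*n*rounds)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < rounds; j++ {
				lock.Lock()
				events <- "acquired"
				events <- "released"
				lock.Unlock()
			}
		}()
	}
	wg.Wait()
	close(events)
	var out []string
	for e := range events {
		out = append(out, e)
	}
	return out
}

func main() {
	for i, e := range contend(4, 50) {
		want := []string{"acquired", "released"}[i%2]
		if e != want {
			panic("lock held twice concurrently")
		}
	}
	fmt.Println("events alternated correctly")
}
```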
[21:11]  * thumper nods
[21:12] <thumper> I get the feeling that I should smash the pipes together, and have a single proposal...
[21:12] <fwereade> thumper, I'd be fine with that I think
[21:12] <fwereade> thumper, (oh, I would also really like tests for the uniter)
[21:12] <thumper> fwereade: yeah, read that, will look into it
[21:13] <thumper> fwereade: is there any way to nicely simulate a crash during hook execution?
[21:13] <thumper> fwereade: actually, we can trivially have a test that has a lock for that uniter
[21:13] <fwereade> thumper, not nicely -- I think best to create the lock dir really
[21:13] <thumper> fwereade: and we can check when we create one, it unlocks it
[21:13] <fwereade> thumper, exactly
[21:13]  * thumper nods
[21:14] <thumper> fwereade: also, I don't like the message being the held file...
[21:14] <thumper> fwereade: feels wrong some how
[21:15] <thumper> fwereade: how do I check I own it? before unlocking
[21:15] <thumper> if we use message?
[21:15] <thumper> I'll update the code to use atomic writes for message
[21:15] <fwereade> thumper, message doesn't have to be the held file, you can create the two independently before the copy
[21:15] <thumper> and have an expected message on Lock
[21:15] <thumper> fwereade: jam was suggesting to use the message for held
[21:16] <thumper> just saying I'm -1 on that
[21:16] <fwereade> thumper, yeah, I rather like the nonce in held
[21:17] <fwereade> thumper, there is I think a meaningful distinction between "did this identity create the lock" and "did this specific chunk of memory create the lock"
[21:17] <fwereade> thumper, so, yeah, I don't think it's necessary to conflate the two
[21:17]  * thumper nods
[21:18] <thumper> fwereade: one problem with logging every time you can't get a lock
[21:18] <thumper> fwereade: when trying, is that retries are every second (by default)
[21:18] <thumper> fwereade: and hook execution could be a while
[21:18] <fwereade> thumper, ok, that's too many
[21:18] <thumper> in the order of minutes
[21:19] <thumper> fwereade: how about writing it the first time
[21:19] <fwereade> thumper, (not too many checks, just too many messages)
[21:19] <fwereade> thumper, first time is fine by me
[21:19] <thumper> fwereade: and each subsequent time the message changes?
[21:19] <fwereade> thumper, ah, yeah, nice
[21:19] <thumper> should give an obvious, not too invasive trail
[21:20] <fwereade> thumper, and *maybe* one every 120 times through the loop or something?
[21:20] <thumper> fwereade: I'd rather use time than a loop counter
[21:20] <thumper> although
[21:21] <thumper> hmm...
[21:21] <thumper> synchronization is hard
[21:21]  * thumper recalls his talk on "multithreading 101"
[21:21] <fwereade> thumper, follow your heart :)
[21:21] <thumper> gave that talk at a conference 7 years ago
[21:22] <thumper> I thought I knew heaps about it, in the end, I learned so much researching my talk it scared me
[21:22] <thumper> :)
[21:22] <fwereade> haha
[21:22] <fwereade> I know enough to be scared in general
[21:22] <fwereade> I have learned a lot over the years but somehow the amount I know remains characterizable by "enough to be scared"
[21:23] <thumper> I'm busy downloading sabdfl's openstack keynote in the background
[21:23] <thumper> youtube keeps killing it watching live
[21:23] <fwereade> it's on youtube, I'm listening to it
[21:23] <fwereade> ah it's only fallen over once for me
[21:23] <thumper> I had a huge amount of fun writing some lock free algorithms
[21:23] <rogpeppe1> thumper: i might be tempted to use atomic.AddInt32 to check that two things aren't holding a lock at the same time
[21:23] <thumper> after the third time it fell over, I decided o download it
[21:24] <thumper> rogpeppe1: can you explain more?
[21:24] <thumper> I don't quite get it
[21:24] <thumper> oh, so increment on lock
[21:24] <thumper> and decrement on unlock?
[21:24] <rogpeppe1> thumper: yup
[21:24] <thumper> and assert values
[21:24] <rogpeppe1> thumper: and after the increment, you look at the value returned
[21:24] <rogpeppe1> thumper: if >1 you've fucked up
[21:24]  * thumper nods
[21:24]  * fwereade expected rogpeppe1 to show up and explain how to do it better , cheers :)
[21:24] <thumper> sounds like a plan
[21:25] <rogpeppe1> thumper: also, i'd change the retry interval to zero when doing that test
[21:25] <rogpeppe1> thumper: all the better to stress with
[21:26] <thumper> rogpeppe1: ok, sounds good
[21:30] <rogpeppe1> thumper: what platform are we worried about not having O_EXCL, BTW?
[21:30] <thumper> rogpeppe1: I don't have an explicit reason, just went with what bzrlib had because I had talked to lifeless a lot about it, and it is convenient to add informational messages to
[21:32] <rogpeppe1> thumper: ok. messages would be just as convenient with a file though, i think, no?
[21:32] <thumper> I'm not familiar with O_EXCL
[21:32] <thumper> but I trusted the giant who came before me :)
[21:32] <rogpeppe1> thumper: :-)
[21:33] <rogpeppe1> thumper: bzr probably needs to run on way more platforms than us, but i suppose it can't harm
[21:33]  * thumper nods
[21:34] <rogpeppe1> thumper: i'm just slightly concerned about EBS network traffic though, when we could really do it all locally.
[21:34] <thumper> what EBS network traffc?
[21:34] <rogpeppe1> thumper: doesn't disk storage on amazon instances with EBS go across the network?
[21:35] <thumper> well, we are storing in the agent datadir, where it puts tools
[21:35] <thumper> we could do it anywhere
[21:35] <thumper> /var/run/juju would make sense to me
[21:35] <rogpeppe1> thumper: /tmp ?
[21:35] <thumper> I would assume some of these would be local
[21:35] <thumper> no, /tmp is fugly
[21:36] <thumper> /var/run was made for this reason
[21:36]  * rogpeppe1 has never heard of /var/run
[21:36] <thumper> AFAIK
[21:36] <fwereade> thumper, btw, I forget, do you definitely create the temp dirs on the same filesystem for actual atomic renames?
[21:37] <thumper> fwereade: I was assuming that /tmp and the juju data dir were on the same filesystem...
[21:38] <thumper> could create a temp filename in the lock dir to be sure
[21:38] <fwereade> thumper, +1
[21:38] <thumper> but I thought that was a bit messy
[21:38] <fwereade> thumper, subdir with invalid lock name?
[21:38] <thumper> .nonce or something
[21:38] <thumper> where nonce is the hex nonce
[21:39]  * thumper will work something out
[21:39]  * thumper goes to make a coffee and toast
[22:27] <fwereade> rogpeppe1, thumper: can I get a quick look at https://codereview.appspot.com/8804044/ please?
[22:27]  * thumper looks
[22:27] <fwereade> rogpeppe1, thumper: I promise I was using quantal for some of my testing across the pipeline, but clearly not quite as much as I thought I had
[22:29] <thumper> fwereade: given, with trivial
[22:29] <fwereade> rogpeppe1, thumper: the only other quantal complaints I am aware of are (1) verified transient, fixed by completing pipeline and (2) cannot repro at all -- that's a *weird* error in UpgradeJujuSuite that I propose to ignore until I can squeeze more details out of themue
[22:30] <fwereade> thumper, tyvm
[22:30] <rogpeppe1> fwereade: looking
[22:30] <thumper> np
[22:32] <rogpeppe1> fwereade: PTAL  https://codereview.appspot.com/8842043
[22:33] <rogpeppe1> thumper: would appreciate it if you could take a look too
[22:33]  * thumper looks
[22:33] <rogpeppe1> fwereade: reviewed
[22:39]  * rogpeppe1 has to go to bed very soon
[22:43] <thumper> rogpeppe1: +1
[22:44] <rogpeppe1> thumper: thanks
[22:44] <thumper> rogpeppe1: personally I wouldn't have had map[string]map[string]interface{}, but instead had a few other structs instead of just reusing *state bits, but that's just me
[22:44] <thumper> and your approach is equally valid
[22:44] <thumper> so no review comment on that :)
[22:45] <rogpeppe1> thumper: that's about my maximum tolerance
[22:45] <rogpeppe1> thumper: thanks for forbearing :-)
[22:45] <thumper> np
[22:48] <rogpeppe1> thumper: actually, i *don't* have map[string]interface{} - i don't use interface{} at all and i'm really happy about that
[22:48] <thumper> actually you are right...
[22:48] <thumper> I typed from memory, misremembering
[22:49] <rogpeppe1> thumper: what really prompted this branch was the million occurrences of .(sometype) and thinking "i have no idea where this might panic or if it's valid"
[22:49] <thumper> :)
[22:49] <fwereade> rogpeppe1, LGTM just trivials
[22:49] <thumper> for me, I would have had the units map inside a struct with the service state
[22:49] <thumper> that's all
[22:49] <rogpeppe1> fwereade: oh sorry, i've just submitted
[22:49] <thumper> the rest looks really good
[22:50] <fwereade> rogpeppe1, meh, fix them tomorrow
[22:50] <rogpeppe1> fwereade: i saw your earlier LGTM
[22:50] <fwereade> rogpeppe1, huge win regardless
[22:50] <fwereade> rogpeppe1, yep, np at all
[22:53] <rogpeppe1> fwereade: i generally agree with your remarks. putting StatusError alongside other fields would be wonderful and mean that the json/goyaml custom marshallers could go
[22:53] <fwereade> rogpeppe1, no worries tonight though
[22:53] <rogpeppe1> fwereade: i just tried to be ultra-compatible because i didn't know what the constraints were
[22:53] <fwereade> rogpeppe1, +1
[22:53] <fwereade> rogpeppe1, in general adding fields is fine
[22:54] <fwereade> rogpeppe1, existing stuff we're trying to be a bit more careful about
[22:54] <rogpeppe1> fwereade: right, it's that time of day. actually way past. and if it's that time for me, then i suspect it might be for you too :-)
[22:54] <rogpeppe1> thumper, fwereade: g'night
[22:54] <fwereade> SLEEP IS FOR THE WEAK
[22:54] <thumper> rogpeppe1: night
[22:54] <fwereade> rogpeppe1, gn :)
[22:54]  * thumper is writing the lock stress test
[23:04] <davecheney> m_3: ping
[23:07] <mgz> davecheney: I can go find him if it's anything urgent
[23:08] <thumper> ah poos
[23:11] <davecheney> mgz: nah, just wanted to talk about load testing stuffs
[23:15]  * thumper facepalms
[23:15] <wallyworld_> thumper: how's the face?
[23:15] <thumper> covered
[23:16] <wallyworld_> what exasperated you?
[23:17] <thumper> $ go test
[23:17] <thumper> throw: all goroutines are asleep - deadlock!
[23:19] <wallyworld_> \o/
[23:24]  * thumper sighs c.Check(state, Equals, 1)
[23:24] <thumper> ... obtained int32 = 1
[23:24] <thumper> ... expected int = 1
[23:24] <wallyworld_> lol
[23:25] <thumper> davecheney: is there a nice way to get a goroutine to release its time slice?
[23:25] <thumper> davecheney: I want to say, go run something else for a bit
[23:25] <thumper> davecheney: will time.Sleep(0) do that?
[23:25]  * thumper recalls it does in other languages
[23:26] <davecheney> that will do it
[23:26] <davecheney> runtime.Gosched()
[23:26]  * thumper goes to the manual
[23:27] <davecheney> godoc runtime Goschel
[23:27] <thumper> that's what I want, awesome
[23:27] <davecheney> godoc runtime Gosched
[23:27] <davecheney> sleep will do the same thing
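`runtime.Gosched` is exactly the "let something else run for a bit" thumper asked for. A small sketch of why it matters in a busy-wait loop (especially under the cooperative scheduler of that era with GOMAXPROCS=1, where a tight loop could starve other goroutines):

```go
package main

import (
	"fmt"
	"runtime"
)

// waitYielding busy-waits for done, yielding the processor on every
// pass so other goroutines (including the one that will signal done)
// get a chance to run. It returns how many times it yielded.
func waitYielding(done chan bool) int {
	yields := 0
	for {
		select {
		case <-done:
			return yields
		default:
			runtime.Gosched() // without this, the loop could spin without progress
			yields++
		}
	}
}

func main() {
	done := make(chan bool)
	go func() { done <- true }()
	fmt.Println("yielded", waitYielding(done), "times before done")
}
```

`time.Sleep` also yields, as davecheney says, but `Gosched` reschedules without imposing any sleep duration.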
[23:28] <thumper> hmm...
[23:28] <davecheney> thumper: sorry, massive lag in australia
[23:28] <thumper> stress test now fails
[23:28] <thumper> :(
[23:28] <thumper> it shouldn't
[23:28]  * thumper goes to read some more
[23:29] <davecheney> also try with GOMAXPROCS=8 (your n cores) go test
[23:29] <davecheney> will probably have a similar effect
[23:31] <thumper> heh, passes with max procs
[23:31] <thumper> fails without
[23:32] <thumper> failed that time
[23:34] <thumper> bugger
[23:34] <thumper> debugging this is going to be a PITA
[23:38] <davecheney> thumper: tried the race detector
[23:38] <davecheney> ?
[23:38] <davecheney> if you have a branch, I can test it for you
[23:39] <thumper> what is the race detector?
[23:39] <davecheney> it's a feature in go 1.1
[23:39] <davecheney> it is the same thread sanitiser that is available in clang 3.2 / gcc 4.8
[23:39] <davecheney> thumper: http://tip.golang.org/doc/articles/race_detector.html
[23:40] <thumper>  davecheney: lp:~thumper/juju-core/fslock-mashup has a failing stress test in utils/fslock
[23:40] <thumper> passed with 1 and 2 concurrent locks, failed with 3
[23:41] <thumper> that is why it currently says 3
[23:41] <thumper> I want 10 in the end...
[23:44] <davecheney> thumper: two secs
[23:48] <davecheney> mgz: is mramm around ?
[23:49] <mgz> not near me currently, could either be at the booth or in a meeting
[23:50] <davecheney> mgz: nm
[23:57] <davecheney> thumper: sorry, got distracted by submitting my own branch