[00:08] good news everyone, save joyent, http://paste.ubuntu.com/7517545/
[00:08] ppc64el is passing
[00:14] davecheney: o/
[00:14] i am fixing joyent this morning
[00:14] davecheney: have you seen the jenkins job? it fails with a compile error
[00:15] what's your take on that?
[00:15] eg http://juju-ci.vapour.ws:8080/job/walk-unit-tests-ppc64el-trusty-devel/309/console
[00:15] * davecheney looks
[00:15] is there a bug raised for this ?
[00:16] not that i know of, i only just saw it this morning. there's a history of job failures we should extract bugs for
[00:16] how is the jenkins set up different to your?
[00:17] it is possible that the gccgo fix has not landed in main yet
[00:17] i am running the fix from the ppa
[00:17] ah
[00:17] can you ssh tot he test machine and look at dmesg
[00:18] the last 20 lines should be sufficient
[00:18] i'll have to determine what the machine is
[00:18] let me get the upstream oyent fixes in the core first
[00:25] ummm
[00:25] + juju_version=juju-core_1.19.3
[00:25] + set -x
[00:25] + set +e
[00:25] I think this is wrong
[00:39] where is the above from?
[00:43] that link
[00:43] you posted
[00:43] +e turned off break on error afaik
[00:44] we can talk to curtis about the test scripts used
[00:44] davecheney: this will make the joyent tests go faster https://codereview.appspot.com/101740043
[00:50] davecheney: is this the ppa i would need to use to get the same gccgo as you? https://launchpad.net/~ubuntu-toolchain-r/+archive/ppa/?field.series_filter=trusty
[00:56] morning all
[00:56] davecheney: juju-ci.vapour.ws:8080/job/walk-unit-tests-ppc64el-trusty-devel/312/console
[00:57] davecheney: ec2 tests paniced
[00:57] davecheney: is this new or known about?
[00:58] actually, that isn't ec2
[00:58] looks like it panicked in the go tool itself
[00:58] is it joyent?
[00:58] yeah, looks like compiling error somewhere
[00:58] doesn't seem like it gives good feedback as to which bit causes the problem
[00:59] reminds me of the "internal compiler error" I used to get with gcc and hairy templates
[00:59] the gccgo on jenkins is old apparently
[00:59] doesn't have the latest fixes
[00:59] the one from the ppa is better
[00:59] ah...
[00:59] i'm trying it out now
[00:59] that makes sense
[00:59] see my above link
[01:00] axw: morning. if you have a moment, could you +1 this small mp. fixes the joyent tests. https://codereview.appspot.com/101740043
[01:00] sure, looking
[01:00] ta
[01:00] yay, they merged
[01:00] yeah, finally :-)
[01:01] the tests still take a little too long, but much better
[01:02] thumper: we're just discussing that
[01:02] kk
[01:02] my hunch is the gccgo fix hasn't landed in trusty-updates yet
[01:02] * thumper nods
[01:03] wallyworld: doko moves ppa's more often than the tide
[01:03] that one will probably work
[01:04] cool, trying it locally
[01:06] heh
[01:20] Guest78498: trying the joyent fixes now
[01:20] davecheney: sadly, i don't get a clean test run. http://pastebin.ubuntu.com/7517679/
[01:20] a test failure or two and compile errors
[01:20] that's with the ppa
[01:21] but joyent tests pass :-)
[01:21] still 3 times slower than maas tests, or 6 times slower tahn ec2
[01:21] but, it's a start
[01:22] Guest78498: what have you don't with wallyworld?
[01:22] s/don't/done/
[01:22] Guest78498: can you send me the last 20 lines of dmesg from that host
[01:22] thumper: computer crashed, and freenode takes way too f*cking long to drop the connection, so it won't let me back in
[01:22] cmd juju ran too long
[01:22] Guest78498: ghost wallyworld
[01:23] hmm...
[01:23] davecheney: i ran the tests on my laptop, you want dmesg from that?
[01:23] signal: segmentation fault (core dumped)
[01:23] FAIL launchpad.net/juju-core/environs/simplestreams 1.552s
[01:23] Guest78498: gotta be on ppc
[01:23] that is where the bug is
[01:23] skip the dmesg request
[01:24] you don't need to apply the ppa to gccgo on amd64
[01:24] it is unaffected
[01:24] looking at the rest of these tests
[01:24] looks like they just took to long
[01:25] what about the seg fault
[01:25] no idea
[01:25] check dmesg
[01:25] (yes, I know i just told you no to)
[01:26] davecheney: oh joy, lookie
[01:26] [ 674.033377] CPU7: Core temperature above threshold, cpu clock throttled (total events = 1921)
[01:26] [ 674.034391] CPU7: Core temperature/speed normal
[01:26] [ 674.034392] CPU3: Core temperature/speed normal
[01:26] [ 675.690153] mce: [Hardware Error]: Machine check events logged
[01:27] maybe that had something to do with it?
[01:27] Guest78498: if I had to guess, 8 tests compiled with gccgo plus 8 mongodbs is causing you to swap
[01:27] i have 16GB RAM
[01:27] but no swap partition
[01:27] some of the tests consume gigabytes when run under gccgo
[01:28] i've never needed a swap partition till now with that much memory. sigh
[01:28] adding swap won't help
[01:28] probably reducing the number of tests run concurrently will
[01:28] go test -p 4 ./...
[01:29] go test -p 4 -compiler=gccgo ./...
[01:29] even if i don't have gomaxprocs set currently?
[01:30] unrelated
[01:30] go test tries to start as many test jobs in parallel as you have cpis
[01:30] ok, i'll try that after getting the jpyent branch landed
[01:32] ffs, how long does freenode want to hold open my old connection for
[01:33] ok, good news and bad news
[01:33] good news: ok launchpad.net/juju-core/provider/joyent 127.919s
[01:34] bad news: some other transient error, http://paste.ubuntu.com/7517706/
[01:34] usual unreachable servers bullshit
[01:41] Guest78498: I managed to reproduce the issue kapil had where kvm containers are leaking. I'll be looking at that today
[01:41] \o/ great thanks
[01:41] I think it doesn't happen in 1.19.3, but I'd like to figure out what's going out to be sure
[01:42] sounds good
[01:42] davecheney: that replicset stuff has been so f*cking unreliable
[01:42] it's been made more robust, but......
[01:42] still can fail
=== Guest78498 is now known as wallyworld
=== thumper is now known as thumper-otp
[02:19] davecheney: much better with -p 4. the jujud watcher error occurs sometimes on amd64 also, but there's a fault in the openstack provider tests http://pastebin.ubuntu.com/7517849/
[02:30] davecheney: here's the dmesg from the CI ppc machine used to run the tests every hour and for which i posted the output with all the faults etc earlier http://pastebin.ubuntu.com/7517890/
[03:10] thumper: have you run a local provider with kvm before?
[03:11] yes
[03:11] seenthis?
[03:11] ian@wallyworld:~$ juju status
[03:11] ERROR failed getting all instances: exit status 1
[03:11] ERROR Unable to connect to environment "local-kvm".
[03:11] Please check your credentials or use 'juju bootstrap' to create a new environment.
[03:11] i just bootstrapped, no errors
[03:11] no
[03:11] :-(
[03:13] hmmm, seems KVMObjectFactory.List() is sad
[03:14] ian@wallyworld:~$ virsh -q list --all
[03:14] error: failed to connect to the hypervisor
[03:14] error: no valid connection
[03:14] error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Permission denied
[03:14] thumper: could it be that we need sudo and it's not prompting?
[03:31] wallyworld: seems odd that "juju status" is trying to list providers; that implies your .jenv doesn't have a valid API server address cached
[03:32] axw: works fine for local with lxc
[03:32] wallyworld: you should add your user to the group that owns libvirt-sock tho
[03:32] ok
[03:32] then you should be able to use virsh
[03:32] I think it's libvirtd
[03:33] if we need that for local kvm to work, we should check for it
[03:33] it shouldn't be necessary, sounds like something else is wrong
[03:33] but it's useful for using virsh
[03:33] does local with kvm work for you?
[03:33] yes
[03:33] just works
[03:33] hmmmm
[03:34] i'm trying to reproduce bug 1322281
[03:34] <_mup_> Bug #1322281: local provider deployment of mysql to kvm hangs in pending state
[03:34] I am in the group, however all the important bits run under root anyway
[03:34] I tried that, it just worked for me
[03:34] i.e. I installed mysql in a kvm container and it worked fine
[03:34] :-(
[03:34] so my setup is screwed somehow
[03:35] we might need to ask for the cloud init log
[03:35] for the user of that bug
[03:36] yeah may be useful
=== thumper-otp is now known as thumper
[04:18] Guest21046: crash again?
[04:18] Guest21046: I have a branch where the only failing bit is tool selection and syncing
[04:19] thumper: nah, had to log out and back in so a new group membership would stick
[04:19] Guest21046: not sure why... perhaps you could take a look?
[04:19] sure
[04:19] Guest21046: lp:~thumper/juju-core/enable-alpha-versions
[04:19] Guest21046: I can't see where the version comparison stuff is different for tool selection
[04:19] and since you most recently touched it, you may know more
[04:20] sure, no problem
=== jam1 is now known as jam
[04:24] thumper: just pulling your branch and looking without running tests, ottomh there are a number of tests which rely on minor version being odd to be a dev version. and that is now no longer the case in you branch
[04:26] ah...
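wallyworld's diagnosis of thumper's branch is that several tests assume a version is a development version exactly when its minor number is odd (1.19.x dev, 1.18.x stable). A minimal sketch of that convention, using a hypothetical `number` type and `isDev` method rather than juju's own version package, shows which fixtures would flip if the rule changes:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// number is a hypothetical stand-in for a juju version (major.minor.patch).
type number struct {
	Major, Minor, Patch int
}

// parse splits "major.minor.patch" into a number.
func parse(s string) (number, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return number{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return number{}, fmt.Errorf("invalid version %q: %v", s, err)
		}
		n[i] = v
	}
	return number{n[0], n[1], n[2]}, nil
}

// isDev applies the odd-minor convention the tests rely on:
// 1.19.x is a development series, 1.18.x and 1.20.x are stable.
func (v number) isDev() bool {
	return v.Minor%2 == 1
}

func main() {
	for _, s := range []string{"1.18.4", "1.19.3", "1.20.0"} {
		v, err := parse(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s dev=%v\n", s, v.isDev())
	}
}
```

Test fixtures that use a version such as "1.19.3" and expect dev-release treatment encode this rule implicitly, so a branch that changes the rule has to update those fixtures as well.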
[04:26] ok
[04:26] i'll run the tests though to conirm
[04:26] almost certain that'll be the problem then
[04:27] * thumper goes to enfixorate
[04:27] ok
[04:27] let me know i you want me to do anything to help
[04:30] axw: sorry I missed this a few days ago, but the fix here https://code.launchpad.net/~axwalk/juju-core/lp1321025-mongod-path-1.18/+merge/220544
[04:30] should really be to default to juju-mongodb except for precise and raring (I think)
[04:30] otherwise everything will fail again when V is released.
[04:31] jam: and raring is out of support
[04:33] jam: that is what it does, see the code above
[04:33] jam: it's using the result of MongoPackageForSeries, which was already doing the right thing
[04:34] axw: ah, I see. It was just that we had a line that was using == trusty, gotcha
=== Guest21046 is now known as wallyworld
[04:34] yup
[04:34] I misread the diff given you added explicit "utopic: juju-mongodb" line
[04:34] but that was just in tests
[04:37] wallyworld: I think you put the wrong LP# in the replicaset card
[04:37] sigh yes
[04:38] bigjools was distracting me with drivel
[04:38] heh :)
[04:38] he's easy to distract
[04:39] I just throw a ball
[04:39] ball? where?? me fetch? can I? huh? huh?
[04:39] hmmm... i'm trying to add a field to the FullStatus API response and everything hangs if I use a map keyed by int but it's fine if the map is keyed by string
[04:39] I am NOT rubbing your tummy
[04:39] is this a known thing?
[04:39] * wallyworld tries so hard to be pc
[04:40] makes a change :)
[04:40] does our API serialisation only like string keys on maps?
[04:40] as thumper says, it's a family channel
[04:40] hello menn0, apparently you used to work with my brother in law at BATS
[04:40] heh
[04:40] what's his name?
[04:40] Glyn
[04:41] yep :)
[04:41] I know Glyn well
[04:41] unlike his brother in law, gyln is an awesome guy
[04:41] I've known him for 19 years, some say 19 years too many
[04:41] Glyn is awesome... if a little grumpy at times :)
[04:41] lol
[04:42] any takers on my question? what's our serialisation format for the API?
[04:43] davecheney: apart from replicaset shite, ppc tests on CI server now pass also http://juju-ci.vapour.ws:8080/job/walk-unit-tests-ppc64el-trusty-devel/317/consoleFull
[04:43] if it's JSON I can see how there could be a problem...
[04:43] i think it is
[04:43] right
[04:43] that make ssense
[04:43] * menn0 will use strings
[04:44] don't ya love json for that stuff
[04:44] it's not a big deal in this case
[04:48] wallyworld: w00t
[04:48] davecheney: i upgraded gccgo-4.9 from the ppa
[04:48] cool
[04:49] we're just waiting for that to land in trusty-updates
[04:49] davecheney: did you see my local test run? fault in opestack tests http://pastebin.ubuntu.com/7517849/
[04:50] wallyworld: have you run godeps ?
[04:50] i did earlier
[04:50] which dep in particular?
[04:56] wallyworld: did you see https://bugs.launchpad.net/bugs/1321009
[04:56] <_mup_> Bug #1321009: juju-metadata doesn't produce content that 1.19.2 bootstrap can use
[04:56] mgo
[04:57] hmm,, mail client shows me the old stuff, but not the new
[04:57] looks like you already fixed it
[04:57] jam: yes, fixed, it wasn't a simplestreams data format bug but rather bootstrap looked up the simplestreams data for supported arches before it was uploaded in the case the --metadatasource was specified
[05:09] davecheney: i've got rev 275 o mgo locally even though juju-core's dep file says 273
[05:10] ... o_O
[05:12] that shouldn't cause the test failures though
[05:12] looking at the logs, it seems a mongo process is just dying unexpectedly
[05:13] we had a bug a few days ago where our client was causing mongo to crash
[05:13] looks like the replica set stuff is a bit fragile
[05:13] as in, mongos'
[05:13] and the test doesn't recover. it gets EOF, refreshes and calls Ping(), but tries to reach the process that has just died
[05:13] yeah, i reckon mongo itself is fragile :-(
[05:13] wish we didn't use it
=== vladk|offline is now known as vladk
[05:18] jam: how detailed is you mgo ha knowledge?
[05:21] wallyworld: I wouldn't say it is amazing, but I'm willing to tell you want I can :)
=== jam1 is now known as jam
[05:21] jam: quick hangout?
[05:21] sure
[05:21] link?
[05:21] https://plus.google.com/hangouts/_/gxweywbcs523zdbqunelr5u4uua
[05:21] jam: morning
[05:21] wallyworld: says the party is over..
[05:21] morning vladk
[05:22] invite sent
=== Ursinha-afk is now known as Ursinha
=== Ursinha is now known as Ursinha-afk
[05:58] * wallyworld -> school run bbiab
[06:01] jam: hangout time
[06:27] morning all
=== Ursinha-afk is now known as Ursinha
=== vladk is now known as vladk|offline
[07:54] morning
[08:04] fwereade: on closer inspection, my assumption about the keymanager test was correct; it's tested by TestImportKeys above. going to take that test back out again...
=== vladk|offline is now known as vladk
[09:00] mgz: standup?
[09:23] wallyworld: lost you
[09:23] mgz: we're here
[09:23] you're muted
[09:23] I.. can't now hear anything
[09:23] mgz: you dropped out and came back
[09:24] jam: I lost you. I can see you’re in, but muted and no video.
[09:25] TheMue: you're frozen for me as well, I'll reconnect
[09:51] jam: could it be that I’m not allowed to write into the team calendar?
[09:52] TheMue: fixed
[09:59] thx
=== vladk is now known as vladk|offline
[10:16] dimitern: fwereade: https://codereview.appspot.com/96600043 adds a caching layer to the srvRoot code so that all facades are now cached
[10:16] jam, looking
=== vladk|offline is now known as vladk
=== Ursinha is now known as Ursinha-afk
[10:43] jam, LGTM
[10:43] hx
[10:43] thx
[10:45] TheMue, fwereade, standup?
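jam's review (https://codereview.appspot.com/96600043) makes srvRoot cache its facades. The general shape of such a cache, sketched here with hypothetical `facadeKey` and `facadeCache` types rather than the code in that review, is a mutex-guarded map from (name, version, id) to the object built on first use:

```go
package main

import (
	"fmt"
	"sync"
)

// facadeKey identifies one facade instance; the fields are illustrative.
type facadeKey struct {
	Name    string
	Version int
	ID      string
}

// facadeCache creates each facade at most once and returns the same value
// on later requests. All names here are hypothetical, not juju's API.
type facadeCache struct {
	mu      sync.Mutex
	objects map[facadeKey]interface{}
	factory func(facadeKey) (interface{}, error)
}

func newFacadeCache(factory func(facadeKey) (interface{}, error)) *facadeCache {
	return &facadeCache{objects: make(map[facadeKey]interface{}), factory: factory}
}

// get returns the cached facade for key, creating it on first use.
// Errors are not cached, so a failed creation is retried next time.
func (c *facadeCache) get(key facadeKey) (interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if obj, ok := c.objects[key]; ok {
		return obj, nil
	}
	obj, err := c.factory(key)
	if err != nil {
		return nil, err
	}
	c.objects[key] = obj
	return obj, nil
}

func main() {
	created := 0
	cache := newFacadeCache(func(k facadeKey) (interface{}, error) {
		created++
		return fmt.Sprintf("%s(v%d)", k.Name, k.Version), nil
	})
	key := facadeKey{Name: "Client", Version: 0}
	for i := 0; i < 3; i++ {
		obj, _ := cache.get(key)
		fmt.Println(obj)
	}
	fmt.Println("created:", created)
}
```

Holding the lock while the factory runs keeps the sketch simple and guarantees a single instance per key, at the cost of serialising facade creation.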
[10:55] dimitern: we lost you after "I almost"
=== vladk is now known as vladk|offline
=== vladk|offline is now known as vladk
=== Ursinha-afk is now known as Ursinha
[12:03] hi all, is anyone available for reviewing https://codereview.appspot.com/92610045 ? thanks!
[12:05] good morning everyone
=== vladk is now known as vladk|offline
[12:20] vladk|offline: we're done for now, don't worry about coming back
=== vladk|offline is now known as vladk
[12:54] frankban, LGTM
[12:54] fwereade: thank you!
[13:17] fwereade: so for modelling "environ tag" as part of the data we store in the .jenv, we currently have a split of EnvironInfo.SetAPIEndpoint() and EnvironInfo.SetAPICredentials()
[13:17] I'm tempted to just put EnvironUUID into APIEndpoint informationd
[13:17] Which currently holds Addresses and CACert
[13:18] but seems a decent fit
[13:18] jam, +1
[13:19] k, thanks for confirming it was sane
[13:19] now I just have to figure out the spaghetti mess to figure out how to hook it together... :)
[14:02] perrito666: wwitzel3 is natefinch gone today?
[14:02] jam: natefinch and wallyworld_
[14:02] aghh
[14:02] wwitzel3:
[14:02] holiday
[14:03] ah right memorial day
[14:03] * perrito666 goes again
[14:03] jam1: nate and wwitzel3 and voidspace are on holiday today
[14:20] greetings
[14:25] bodie_: o/
[14:25] fwereade: I'm trying to gather my thoughts to ask you some questions about cleanup.go
[14:26] fwereade: are you going to be around for a bit?
[14:39] jcw4, let's talk now
[14:39] jcw4, if you can?
[14:39] jcw4, otherwise we can do the one in 80m
[14:40] jcw4, I have a meeting in 20m but if I can save you a bit of time I would be delighted
[14:42] fwereade: I've started working on cleanupDeadUnit
[14:43] fwereade: and it's unclear if it should be called in unit destroyOps
[14:43] fwereade: or in a new *Ops on the unit?
[14:43] fwereade: seems there should be some hook for EnsureDead
[14:43] fwereade: to call some *Ops fn
[14:44] jcw4, hmm, so, Destroy will go from alive to either dying or removed
[14:44] jcw4, if it's in fast-forward mode, we should add a cleanup there in destroy
[14:44] fwereade: I see... so EnsureDead is not guaranteed to run... makes sense.
[14:44] jcw4, wondering whether it makes more sense to tack it onto the remove ops
[14:45] fwereade: okay, so just add it as another op right after the cleanupDyingUnit...
=== vladk is now known as vladk|offline
[14:45] jcw4, the cleanups would not be guaranteed to run before the unit was removed anyway
[14:46] jcw4, not sure I follow you there
[14:46] fwereade: unit.go:325
[14:46] jcw4, nah, that'll run while the unit is still dying and not necessarily dead
[14:47] fwereade: ok. that's what I was stumbling against. so removeOps... line 367...
[14:47] I see
[14:47] jcw4, I'd add it in Service.unitRemoveOps
[14:48] fwereade: okay.. that makes sense
[14:48] jcw4, sorry, removeUnitOps
[14:48] jcw4, cool
[14:49] fwereade: I'll start looking there and ping you with further questions as needed
[15:23] looks like the dirt in xeipuuv/gojsonschema is in sigu-399/gojsonreference
[15:24] I need to make sure there's not more in gojsonschema itself, but gojsonpointer looks pretty OK if not terribly beautifully written
[16:00] mgz, you around?
[16:00] :)
[16:03] fwereade: If we cleanup actions in removeOps it seems that we could miss actions added while the unit is Dying...
[16:05] hey what does it means when godeps yields "blah is not clean"
[16:05] where blah is a dependency path
[16:08] jcw4, I don't think so?
=== Ursinha is now known as Ursinha-afk
[16:21] fwereade: should we be versioning the Admin api and not changing Login without a version bump?
[16:21] jam, ha, yes, good point
[16:21] jam, excellent testbed
[16:21] ... :(
[16:22] true, though more work for me
[16:22] jam, we will probably notice of login fails
[16:22] fwereade: well, it is our entry point where we were going to tack on the compat stuff
[16:22] so it is a bit hard to get right
[16:23] fwereade: namely old servers will happily pay no attention to a "Version: 2" in the Admin login request
[16:23] is that good/ok ?
[16:23] (as in, you pass V2, but you just get login v1)
[16:23] well, v0 at least
[16:24] jam, it's not ideal, but I think it's inescapable... trying to figure out the worst that could happen
[16:39] fwereade: well, we can just "Login", 2 and have it just work but not give us back the actual v2 of the call.
[18:22] okay this gojsonreference module is weird
[18:24] heh, discarding errors in the test....
[18:24] let's see how much of gojsonschema is broken once this is unbroken
[19:35] okay, implemented some much better testing in gojsonschema
[19:35] I'm pretty sure they just had some dumb typos, but it looks like there might be a problem with their implementation of gojsonpointer regarding URL scheme
[19:36] (fwereade, mgz, rogpeppe)
[19:36] sorry, implemented testing in gojsonreference, not gojsonschema
[19:36] the uglier of the dependencies
[19:37] bodie_: o/
[19:37] :)
[19:38] bodie_: you haven't pushed back up to github yet?
[19:38] I have, actually. it's under a dev branch
[19:39] bodie_: I see now.. tx
[19:39] pushed up latest bits
[19:42] oy, just noticed Go playground is more verbose about errors than my vim plugin.
[19:42] that's very annoying.
[19:42] oh?
[19:43] http://play.golang.org/p/dXHaRCeB_l
[19:43] mine was giving me something like "expression expected"
[19:44] ah, but not the actual syntax error "unexpected comma..."
[19:46] right
[20:08] urg.... found the really ugly bits of this
[20:39] jam, if you expect an echo/assert of the version in the response the client could handle appropriately
[20:39] with absence -> v0
=== _thumper_ is now known as thumper
[20:43] fwereade: around maybe?
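fwereade's suggestion for versioning Login is that the server echo the version it actually served, and that the client treat a missing value as v0. With Go's `encoding/json`, a field absent from the reply decodes to its zero value, so that rule falls out naturally. The `loginResult` type, its fields and `servedVersion` below are hypothetical, not juju's wire format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loginResult is a hypothetical login reply. A newer server would echo the
// version it actually served; an older server does not know the field and
// leaves it out.
type loginResult struct {
	Servers []string // hypothetical payload field
	Version int      // absent in replies from old servers
}

// servedVersion decodes a reply and returns the login version the client
// should assume. An absent Version decodes to 0, which means "v0".
func servedVersion(reply []byte) (int, error) {
	var r loginResult
	if err := json.Unmarshal(reply, &r); err != nil {
		return 0, err
	}
	return r.Version, nil
}

func main() {
	oldServer := []byte(`{"Servers":["10.0.0.1:17070"]}`)
	newServer := []byte(`{"Servers":["10.0.0.1:17070"],"Version":2}`)
	for _, reply := range [][]byte{oldServer, newServer} {
		v, err := servedVersion(reply)
		if err != nil {
			fmt.Println("bad reply:", err)
			continue
		}
		fmt.Printf("server served login v%d\n", v)
	}
}
```

If a client ever needs to tell "server said 0" apart from "server said nothing", the field can be a `*int`, which stays nil when the key is absent.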
[21:10] Any hints as to where in the code I would find a machine tag being calculated, or how to get a machine tag from an id?
[21:13] jimmiebtlr: names/machine.go ?
[21:13] thumper, heyhey
[21:13] fwereade: got some time to chat?
[21:13] thumper, 5 mins?
[21:13] fwereade: me next if possible ;)
[21:13] fwereade: as in "in 5 minutes" or "only have 5 minutes" ?
[21:13] (after thumper )
[21:14] thumper, sorry, in 5 minutes
[21:14] jcw4, sure
[21:14] fwereade: that's fine
[21:16] thanks
[21:18] jimmiebtlr: yw
[21:41] fwereade: I know it's late there... I'll just post my question and you can respond at your convenience
[21:41] fwereade: https://codereview.appspot.com/92630043 is the WIP branch
[21:42] fwereade: all good except the cleanup test TestCleanupEnvironmentServices is now failing because somehow the actions cleanup is getting queued but not run
[21:43] fwereade: when I explicitly call Unit.Remove() and then Service.Cleanup() it works...
[21:45] fwereade: and AFAICT cleanupUnitsForDyingServices *should* call unit.Destroy() and trigger the actions cleanup, but I *think* the unit is gone before that line of code gets run
[21:46] fwereade: http://paste.ubuntu.com/7524594/
[21:49] fwereade: I see that cleanupUnitsForDyingServices only processes Units that are Alive...
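For jimmiebtlr's question, the real code lives in names/machine.go. The sketch below assumes the convention there is a `machine-` prefix, with the `/` separators of container ids replaced by `-`; `machineTag` is a hypothetical stand-in, so check that file before relying on the exact rule:

```go
package main

import (
	"fmt"
	"strings"
)

// machineTag is a hypothetical re-creation of the id-to-tag rule; see
// names/machine.go for the real function. Assumed rule: prefix "machine-"
// and replace the "/" separators of container ids with "-".
func machineTag(id string) string {
	return "machine-" + strings.Replace(id, "/", "-", -1)
}

func main() {
	for _, id := range []string{"0", "3", "0/lxc/1"} {
		fmt.Println(id, "->", machineTag(id))
	}
}
```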
[22:05] jcw4, it's ok, I think: you clean up the environment services, but those cleanups schedule more cleanups because some units got removed; you then need to clean up *again* before there are no cleanups left
[22:06] fwereade, I think that's what I just came to
[22:06] jcw4, unit.Destroy will *queue* the actions cleanup but not run it
[22:06] I'm busy adding one more assertCleanupsRun to the test + comments explaining
[22:06] jcw4, assertCleanupCount is what I introduced last branch for exactly that reaosn wrt dying-unit cleanups
[22:07] jcw4, comments explaining why it's used will generally be appreciated, indeed
[22:07] fwereade: Okay, I'll plan on using assertCleanupCount too to make sure we're not succeeding accidentally
[22:07] jcw4, cool
[22:07] fwereade: tx
[22:07] fwereade: btw... do you have involvement in the ubuntu sprint in your neck of the woods this week?
[22:08] fwereade: I just saw niemeyers comment on G+ and thought it was an interesting co-incidence:)
[22:17] jcw4, not really, but I will surely pop down into sliema to say hi
[22:17] jcw4, it's client stuff really
[22:17] fwereade: i see :)
[22:18] fwereade: tests passing now.. .I think I'll do a real lbox propose now
[22:18] jcw4, cool, I need to do tidying up and stuff now, if I still have enrgy when I'm done I'll pass by and see what I can do
[22:19] fwereade: thx, no pressure :)
[22:35] fwereade, fwiw https://codereview.appspot.com/92630043/
[22:58] why does this test not run: http://pastebin.ubuntu.com/7524918 ?
[22:59] it is not inside juju-core, just an exercise to understand the testing setup
[23:10] waigani: you need to register the test
[23:10] func Test(t *testing.T) {
[23:10] gc.TestingT(t)
[23:10] }
[23:10] not the test i mean, but you need to set up to run with gocheck
[23:24] wallyworld_: is ppc passing in ci yet ?
[23:24] davecheney: yes andno. intermittent mongo failures, not releated to ppc http://juju-ci.vapour.ws:8080/job/walk-unit-tests-ppc64el-trusty-devel/
[23:24] there's a good run of blue there
[23:25] awesome
[23:25] do you think the mongo failure are because mongo on ppc is not well tested ?
[23:32] davecheney: it fails on amd64 too. mongo is not my favourite db, let's put it that way to stay polite
[23:36] wallyworld_: Thank you! so that is how gc stitches up to the standard go testing package?
[23:36] yep
[23:37] a lot of our packages have a package_test.go in them - that's the convention we use since there is often more than one test file
[23:37] and we only want to register once per package
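fwereade's explanation of the TestCleanupEnvironmentServices failure is that one round of cleanups can schedule more cleanups (unit.Destroy queues the actions cleanup rather than running it), so the test has to run cleanups again until none are left. A toy model of that behaviour, with a hypothetical `cleanupQueue` type and illustrative cleanup kinds standing in for juju's cleanup documents:

```go
package main

import "fmt"

// cleanupQueue is a toy model of pending cleanups: running one cleanup may
// schedule further cleanups instead of doing all the work inline.
// The kinds used below are illustrative, not juju's cleanup kinds.
type cleanupQueue struct {
	pending []string
	// follows maps a cleanup kind to the cleanups it schedules when run.
	follows map[string][]string
}

// runPass runs only the cleanups that were pending when it started;
// anything they schedule waits for the next pass.
func (q *cleanupQueue) runPass() {
	batch := q.pending
	q.pending = nil
	for _, kind := range batch {
		q.pending = append(q.pending, q.follows[kind]...)
	}
}

// runUntilEmpty repeats passes until nothing is pending and reports how
// many passes that took. maxPasses guards against a cleanup that keeps
// rescheduling itself.
func (q *cleanupQueue) runUntilEmpty(maxPasses int) (int, error) {
	passes := 0
	for len(q.pending) > 0 {
		if passes == maxPasses {
			return passes, fmt.Errorf("cleanups still pending after %d passes", passes)
		}
		q.runPass()
		passes++
	}
	return passes, nil
}

func main() {
	q := &cleanupQueue{
		pending: []string{"services"},
		follows: map[string][]string{
			"services":  {"dyingUnit"},   // destroying services removes units
			"dyingUnit": {"unitActions"}, // removing a unit queues its actions cleanup
		},
	}
	passes, err := q.runUntilEmpty(10)
	fmt.Println("passes:", passes, "err:", err)
}
```

Each pass only handles what was pending when it started, which mirrors the extra assertCleanupsRun jcw4 added to the test, with assertCleanupCount checking how many rounds were actually needed.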