[00:08] <wallyworld_> thumper: how long do the cmd/juju tests take for you? on my average laptop, they take about 160s. JujuConnSuite test set up is about 200ms for each
[00:08]  * thumper runs
[00:08] <thumper> ah...
[00:08] <thumper> my state is somewhat broken right now
[00:08] <wallyworld_> the bot seems way slower
[00:09] <wallyworld_> i'm pretty sure we're using tmpfs
[00:09]  * thumper scrolls way up
[00:09] <thumper> ~170s
[00:09] <wallyworld_> so that's well within the timeout limit
[00:10] <thumper> we are on fairly high speced machines though
[00:10] <wallyworld_> so yes, the tests are bad, but they should run
[00:10] <davecheney> wallyworld_: on my laptop it was 400s
[00:10] <wallyworld_> wow
[00:10] <thumper> I have ssd
[00:10] <wallyworld_> are you using tmpfs?
[00:10] <davecheney> yes
[00:10] <davecheney> core i5 thinkpad x220
[00:10] <davecheney> there is a 3-4 second delay between each test run
[00:10] <davecheney> i'm assuming that is setup/teardown
[00:10] <wallyworld_> i don't see that
[00:10] <wallyworld_> i see 200ms
[00:10] <davecheney> don't look at the times
[00:11] <davecheney> go test -gocheck.v
[00:11] <wallyworld_> i did
[00:11] <davecheney> well, your machine is the odd man out
[00:11] <davecheney> it takes 400s on my machine
[00:11] <davecheney> and 570+ in CI
[00:11] <wallyworld_> thumper gets the same times as me
[00:11] <davecheney> i only have 4 cores
[00:11] <davecheney> just like CI
[00:12] <wallyworld_> i have an 8 core i7
[00:13] <wallyworld_> shouldn't be that much faster
[00:13] <davecheney> yes, that is what I said
[00:14] <wallyworld_> i'm not sure CI should be blocked on this - it's not a regression
[00:14] <davecheney> i agree
[00:14] <wallyworld_> i'll update the bug
[00:14] <davecheney> thanks
[00:16] <wallyworld_> maybe in the interim we can run CI on a larger instance type
[00:32] <davecheney> maybe
[00:32] <davecheney> the largest that ec2 offer is an 8 core
[00:32] <davecheney> and in my testing
[00:32] <davecheney> that is still slower than my 2 year old core i5
[00:32] <davecheney> building gccgo as a test
[01:17] <wallyworld_> davecheney: thumper: changing the instance type to "c3.2xlarge" (8 core, 16GB) seems to have helped the cmd/juju tests to pass. More data points needed, but -p 2 was faster than full parallelisation for this run. there was a spurious apiserver/client failure which caused -p 2 to run. interestingly, the -p 2 run was faster for many tests this time around
[01:18] <wallyworld_> maybe we try running with -p 2 to start with for a bit
[01:18] <perrito666> thumper: are you around?
[01:18] <thumper> yeah...
[01:18] <perrito666> thumper: don't sound so happy
[01:18] <perrito666> :p
[01:18] <thumper> I'm writing docs/bootstrapping.txt
[01:19] <perrito666> thumper: my condolences
[01:19] <thumper> after having to work it out for the umpteenth time...
[01:19] <thumper> I thought I'd write it down
[01:19] <perrito666> thumper: how savvy are you on menn0's "upgrade mode" ?
[01:19] <thumper> ish
[01:19] <thumper> whazzup?
[01:20] <perrito666> thumper: I am trying to replicate the idea for restore and I would like some overview but apparently I missed menn0 this week
[01:21] <perrito666> I worked out a decent part of it
[02:04] <perrito666> ok guys brain shutting down, see you all or some of you tomorrow
[02:09] <waigani_> cya perrito666
[02:25] <waigani_> I'm getting a "state changing too quickly; try again soon" error
[02:26] <davecheney> o_O
[02:26] <waigani_> how soon? should I have a cup of tea?
[02:27] <waigani_> I'm looping over a collection of state users and adding each one as an environ user via a new AddEnvironmentUser func - maybe I need to add them all as one transaction?
[02:28] <waigani_> thumper: ^?
[02:42] <thumper> waigani_: it is lying to you
[02:42] <thumper> that is another error that returns when the assertion fails
[02:42] <thumper> it tries a few times
[02:42] <thumper> and then assumes that the assertion is due to someone else
[02:42] <thumper> in dev, it is most likely you
[02:53] <thumper> waigani_: did that help?
[02:53]  * thumper goes to turn the coffee machine on
[02:53] <waigani_> thumper: sorry I was afk
[02:57] <axw> wallyworld__: I've got to fix up my bootstrap tools branch, it's incompatible with your changes to use /var/lib/juju in the containers
[02:57] <axw> gah, not sure how I'm going to fix this...
[02:58] <wallyworld__> axw: that's ok, i mounted that directory to avoid the need for the container to call out to http (if i am thinking of the right thing)
[02:58] <wallyworld__> calling out to http on the host
[02:58] <axw> wallyworld__: yes that is the one. in my branch, file:// is treated specially. it means read the file contents locally and then add to the cloud-init script
[02:59] <wallyworld__> i did that change because people were seeing errors
[02:59] <wallyworld__> this branch i think obsoletes the need for that
[02:59] <wallyworld__> since the tools are copied into the container via ssh init
[02:59] <axw> wallyworld__: only for bootstrap ,not for containers...
[03:00] <wallyworld__> the containers will get from state server
[03:00] <wallyworld__> we can handle retries at that point
[03:01] <wallyworld__> it was only a short term quick fix
[03:01] <axw> wallyworld__: different branch. I guess I can revert it temporarily... I'm checking if we can do better tho. we may be able to load the tools into the userdata for the local provider
[03:02] <wallyworld__> could do. i just wanted a quick way to avoid the observed source of the errors
[03:02] <wallyworld__> i knew it would be throwaway
[03:44] <axw> wallyworld__: can you PTAL: https://github.com/juju/juju/pull/600/commits
[03:44] <wallyworld__> sure
[03:44] <axw> from the 5th commit on
[03:45] <axw> wallyworld__: it looks like lxc doesn't have the same limit on userdata size, so if we need to we have the option of serialising the tools in there for all lxc containers
[03:46] <wallyworld__> good to know
[03:46] <wallyworld__> i think that would be useful
[03:46] <wallyworld__> avoid networking calls back to the state server to get the tools
[03:47] <wallyworld__> axw: supportedArchitecturesCount is just for testing?
[03:48] <wallyworld__> ah nevermind
[03:48] <wallyworld__> i thought it was in production code
[03:53]  * thumper needs to work out how to poke inside of mongo
[03:54] <thumper> also noticed some wonderful weirdness in our code
[03:54] <thumper> the password hash of the admin-secret is used as the actual password for the admin user to mongo (which is "machine-0" btw)
[03:55]  * thumper goes to make that coffee
[03:56] <thumper> bugger, timer on the machine would have turned it off again by now
[04:12] <thumper> ugh...
[04:12] <thumper> spent the day teasing apart the layers of juju to work out where to put my change
[04:13] <thumper> still not entirely sure... but closer
[04:15] <stokachu> where does juju log its debugging output when attempting to bootstrap into openstack?
[04:15] <stokachu> i ran with --debug but its just sitting at apt-get update http://paste.ubuntu.com/8146668/
[04:20] <thumper> grr...
[04:20] <thumper> stokachu: not sure sorry
[04:21] <stokachu> no worries
[04:21]  * thumper can't push because master is dirty 
[04:21] <thumper> ericsnow: you need to fix your pre-push hooks
[04:36] <davecheney> thumper: hang on
[04:36] <davecheney> martin fixed that, twice
[04:37] <davecheney> how did a change get past the bot
[04:37] <thumper> wallyworld__: chat?
[04:37] <thumper> davecheney: no idea
[04:37] <wallyworld__> thumper: hiya, ok
[04:37] <wallyworld__> onyx standup?
[04:37] <thumper> https://plus.google.com/hangouts/_/g6ga27vzkwgy3dz4s7xly5sivia?authuser=1&hl=en
[04:38] <wallyworld__> ok
[04:47] <thumper> wallyworld__: https://github.com/juju/juju/pull/604
[05:02] <thumper> davecheney: https://github.com/juju/juju/pull/605
[05:04]  * davecheney looks
[05:04] <davecheney> thumper: not logm
[05:04] <davecheney> do not use NewXXTag in production code
[05:04] <davecheney> unless you are 100% sure that the tag is valid
[05:05] <thumper> davecheney: the line above it validates
[05:05] <thumper> davecheney: how else should we do it?
[05:05] <davecheney> names.ParseEnvironTag("environment-"+string)
[05:05] <davecheney> if you're sure the tag is valid then LGTM
[05:06] <davecheney> but this is a warning
[05:06] <thumper> hmm...
[05:06] <davecheney> honestly those NewXXTag functions shouldn't be in the package
[05:06] <davecheney> they are a footgun
[05:06] <thumper> davecheney: but they have to be created somewhere, right?
[05:06] <davecheney> in most cases they come as strings on the wire
[05:06] <davecheney> kind-id
[05:06] <davecheney> so we use parse
[05:06] <thumper> but something puts them on the wire
[05:07] <davecheney> yup tag.String()
[05:07] <thumper> but something creates the tag
[05:07] <thumper> you have to have trust somwhere
[05:07] <davecheney> no argument there
[05:07] <davecheney> but you should look suspiciously at every case
[05:08] <thumper> the same method is used in state.Initialize
[05:08] <thumper> we have a uuid
[05:08] <thumper> then create an environment tag from it
[05:08] <davecheney> sure
[05:09] <davecheney> but you are arguing that using a dangerous weapon is ok because others have used it heaps of times in the past with nothing going wrong
[05:09] <davecheney> past performance is no guarantee of future profit
[05:09] <thumper> no, I am saying that this is one of the places where you use the dangerous weapon carefully
[05:09] <davecheney> sure
[05:09] <thumper> there are always places where we need to create tags with known data
[05:10] <davecheney> yes
[05:10] <davecheney> but I don't see any validation there
[05:10] <davecheney> you just take what comes out of the jenv file
[05:10] <thumper> no, this isn't the jenv
[05:10] <davecheney> +ssInfo, err := st.StateServerInfo()
[05:10] <thumper> right, st here is *state.State
[05:10] <davecheney> ssInfo should have an envTag field or method
[05:11] <davecheney> no, StateServerInfo is not a *state.State
[05:11] <davecheney> it's some turd that got passed back from the api
[05:11] <thumper> no, st is
[05:11] <davecheney> ssInfo
[05:11] <thumper> StateServerInfo is a method on *State
[05:11] <davecheney> +st.environTag = names.NewEnvironTag(ssInfo.EnvUUID)
[05:11] <davecheney> you have a LGTM with reservations
[05:12] <davecheney> there is no value in grinding on this point
[05:12] <thumper> right, here is some ickyness...
[05:12] <thumper> which we can fix
[05:12] <thumper> StateServerInfo is a POD structure
[05:12] <thumper> Plain Old Data
[05:12] <thumper> all public
[05:12] <davecheney> yeah, this is the same POS that infests the state and the mongo packages
[05:12] <davecheney> and binds them tightly to the _client_ api
[05:13]  * thumper nods
[05:13] <thumper> we should separate the serialization structure from the info structure
[05:13] <davecheney> it's fine for it to be public fields
[05:13] <davecheney> it's returned by value
[05:13] <davecheney> we can't change the copy that state has
[05:13] <thumper> but then adding a method that creates an environ tag from the uuid in the struct is meaningless
[05:13] <thumper> as it gives you a false sense of security
[05:14] <thumper> when there is none
[05:14] <davecheney> obviously we'd remove the envUUID field
[05:14] <thumper> nah... I just put it there
[05:14] <davecheney> well shit
[05:14] <thumper> this is all about our shitty data structures
[05:21] <axw> wallyworld__: https://github.com/axw/juju/commit/b56a48d3bd760f9ab58ccada562dd663b1786a0d#commitcomment-7526685
[05:21] <wallyworld__> rightio
[05:21] <thumper> davecheney: let me ponder this env tag for a bit
[05:21] <axw> can you please take a look at that and see if I'm making sense
[05:21] <thumper> before I merge it
[05:21] <thumper> I'd like to ensure that we do it right
[05:26] <axw> wallyworld__: thanks. going to do some more testing before I land, and double check coverage
[05:27] <wallyworld__> ok, it's got potential to break things does this branch
[05:28] <wallyworld__> jam1: i got the bot "fixed" by throwing a larger instance at it - our tests are still horrible
[05:28] <axw> wallyworld__: well it's pretty invasive so yeah... I have done some targeted testing with non-amd64 arch, will test the whole lot before attempting merge tho
[05:29] <wallyworld__> ty
[05:42] <dimitern> morning all
[05:56] <jam1> wallyworld__:  :(
[06:07] <axw> hazmat: in case this got lost in the noise of github activity: https://github.com/juju/juju/pull/596
[08:10] <TheMue> morning
[08:31] <jam> morning TheMue
[08:37] <TheMue> jam: seen your mail, could you tell me a bit more about it?
[08:59] <dimitern> morning jam, TheMue
[08:59] <dimitern> jam, the meeting will start any minute now :)
[08:59] <TheMue> dimitern: heya
[09:02] <dimitern> jam, ping
[10:18] <jam> dimitern: pong
[10:18] <jam> sorry about the delay
[10:18] <jam> dimitern: dang it
[10:18] <jam> sorry, I missed it
[10:18] <jam> I have to take the dog out now, will be back in 20 min or so
[10:21] <dimitern> jam, no worries, i'll bring you up-to-speed at the standup
[10:44] <jam> dimitern: speaking of which :)
[10:45] <jam> TheMue: ^^
[10:46] <dimitern> brt
[10:54] <mattyw> dimitern, ping - when you have a moment
[10:57] <dimitern> mattyw, a tentative pong (doing standup now)? :)
[11:49] <hazmat> do actions use a hook context? (ie. long running)
[11:53] <hazmat> bodie_, fwereade ^
[11:55] <gsamfira> hazmat: hi. There does not seem to be any difference between Hooks and Actions aside from location (when it comes to running). https://github.com/juju/juju/blob/master/worker/uniter/context.go#L330
[11:55] <hazmat> gsamfira, thanks
[11:58] <gsamfira> glad I could help :)
[14:26] <TheMue> eh, maybe I’m blind, but do we have a github.com/juju/juju/upstart?
[14:27] <TheMue> cmd/jujud/machine.go imports it, but I cannot find it (neither can my compiler)
[14:28] <TheMue> dimitern: you’re around for a little crosscheck?
[14:31] <TheMue> mattyw: ping
[14:32] <mattyw> TheMue, hey there
[14:33] <TheMue> mattyw: could you take a look please? it seems the repo has a problem
[14:33] <alexisb> mgz, I am on the hangout whenever you are ready
[14:34] <mattyw> TheMue, I certainly can but I might not be the best person for the job
[14:34] <mgz> alexisb: thanks for the poke
[14:36] <TheMue> mattyw: maybe I already found the checkin
[14:37] <TheMue> so currently our master won’t build, a package is missing
[14:37] <dimitern> TheMue, now I'm here
[14:38] <mattyw> TheMue, this one yeah? https://github.com/juju/juju/commit/1f7148c5e2ae9f68eb9f8b0c94f6c00b82ee4a18
[14:38] <TheMue> dimitern: thx, but found it already.
[14:38] <TheMue> mattyw: yeah, exactly
[14:38] <dimitern> ok
[14:39] <TheMue> dimitern: in jujud, machine.go imports a non-existent package :(
[14:39] <mattyw> TheMue, dimitern but it looks like the package isn't used either
[14:39] <dimitern> TheMue, what?
[14:39] <TheMue> mattyw: yep
[14:39] <TheMue> mattyw: I’m only wondering how it passed the bot
[14:40] <mattyw> TheMue, me too - who's the best person to ask about the bot?
[14:43] <mattyw> TheMue, the tests run http://juju-ci.vapour.ws:8080/job/github-merge-juju/431/console
[14:43] <mattyw> TheMue, but that error is in main - do tests on main get run?
[14:43] <dimitern> mattyw, TheMue, is this about upstart?
[14:43] <TheMue> mattyw: this seems to be the problem
[14:44] <TheMue> dimitern: yep
[14:44] <dimitern> TheMue, mattyw, it seems juju/upstart moved into juju/service/upstart
[14:45] <dimitern> and perhaps wallyworld had goimports installed and juju/upstart code was in GOPATH before juju/service/upstart, and probably the same happened on the bot
[14:46] <TheMue> dimitern: hmm, could be the reason
[14:46] <dimitern> it happened in https://github.com/juju/juju/commit/190f98fcab118b5dce269e8c0021a563455fee39#diff-88ad1ca7d18fe89a76f6348caf6ddd42
[14:47] <mattyw> dimitern, makes sense
[14:48] <mattyw> dimitern, TheMue any idea how we can stop this from happening next time?
[14:48] <dimitern> mattyw, TheMue, it just happened so that code importing the old path was merged last
[14:48] <dimitern> https://github.com/juju/juju/commit/880aaa83f1a474ef7856f1237c3781ab6a51dbfe
[14:48] <TheMue> also upgrades isn’t used in machine.go
[14:49] <dimitern> mattyw, I'm not sure if that's the case, but if it is, then we should look into the bot and see how it does fetch dependencies, etc.
[14:49] <mattyw> dimitern, where is the code for the bot?
[14:50] <dimitern> it might be that ian added the import line manually, rather than using goimports
[14:50] <dimitern> mattyw, mgz would know that I guess
[14:51] <mgz> which bot bit?
[14:52] <mgz> mattyw: you want to look at the make-release-tarball script in lp:juju-release-tools
[14:52] <mattyw> mgz, ok thanks TheMue ^^
[14:53] <TheMue> yep
[14:53] <mgz> what was the symptom exactly? I'm a little confused from the log
[14:53] <mgz> we had broken import that got past the landing?
[14:53] <mgz> or didn't get past the landing but did get past the build?
[14:54] <TheMue> mgz: cmd/jujud/machine.go imports packages it doesn’t use and that don’t exist
[14:54] <mgz> I see, on trunk currently.
[14:55] <dimitern> we should file a critical ci blocker bug for that
[14:55] <dimitern> so nothing lands until it gets fixed
[14:55] <mgz> and blame is on the last rev of master? or an earlier one?
[14:57] <dimitern> mgz, last rev
[14:57] <mattyw> tasdomas, dimitern in other news this is ready for more reviews when you have a moment https://github.com/juju/juju/pull/562
[14:57] <dimitern> mattyw, will have a look shortly
[14:57] <mgz> oh, I see
[14:57] <mgz> go fmt passes...
[14:58] <mgz> and the go build line has gone
[14:59] <mgz> dimitern: my suggestion, I land a backout of the last rev
[14:59] <mgz> add `go build ./...` back to the tarball script
[14:59] <dimitern> mgz, there seems to be another issue
[14:59] <mgz> reland the earlier rev and see that it borks?
[14:59] <dimitern> mgz, ../../state/backups/metadata/metadata.go:10:2: cannot find package "github.com/juju/utils/filestorage" in any of:
[14:59] <dimitern> 	/usr/lib/go/src/pkg/github.com/juju/utils/filestorage (from $GOROOT)
[15:00] <dimitern> and it's not in dependencies.tsv as well
[15:00] <mgz> also the same rev?
[15:00] <mgz> if so, covered by the backout
[15:00] <dimitern> mgz, let me check
[15:00] <mgz> seems not..
[15:01] <mgz> probably eric's change?
[15:01] <dimitern> mgz, yes, on trunk
[15:01] <dimitern> mgz, but it's not the same rev I think
[15:01] <mgz> yeah, looks like that's ericsnowcurrently backups-storage
[15:02] <dimitern> mgz, yep https://github.com/juju/juju/commit/f4da7f542947abb798da7da730a5482a029eee44
[15:02] <mgz> so, two borked landings from the build line going... now, why was that removed...
[15:02] <dimitern> mgz, so we're not even trying if it builds ? lol..
[15:04] <mgz> well, not at the tarball stage
[15:04] <dimitern> mgz, iirc there was a unit test for deps.tsv..
[15:04] <mgz> we do before running the tests, and that's working for some reason
[15:04] <dimitern> mgz, it takes like 10 secs - we should do it before running tests, not as late as tarball packaging time i think
[15:04] <mgz> tar
[15:04] <dimitern> ah, I see
[15:04] <mgz> sorry,
[15:05] <mgz> tarball build comes before tests
[15:05] <dimitern> mgz, hmm.. I wonder why that is
[15:05] <mgz> we get deps and make tarball on the main juju machine
[15:05] <mgz> then send the tarball to a new instance to run the tests
[15:05] <mgz> so, the bot *should* still be failing before we run tests, but from the logs it's not for some reason
[15:07] <mgz> can see the line `go build github.com/juju/juju/...` in the landing console log, and it's not got the error
[15:08] <mgz> a little worried it's not actually testing the right juju at present
[15:11] <mgz> hm,
[15:11] <mgz> I'm tempted to blame a godeps change
[15:11] <mgz> nothing on the ci side has changed
[15:14] <mgz> heh
[15:14] <mgz> okay, got it
[15:15] <dimitern> mgz, yep? what is it?
[15:16] <TheMue> ah?
[15:16] <mgz> for some reason, fixDetachedHead from cmd/go/vcs.go is now getting called, when it wasn't before
[15:17] <mgz> and that does checkout master... overwriting the merge
[15:17] <dimitern> yay! :D
[15:18] <mgz> not sure *what* has triggered this, but can fix at least
[15:18] <dimitern> lots of fun
[15:18] <dimitern> godeps perhaps
[15:18] <mgz> lets hope it was recent
[15:18] <TheMue> strange kind of error
[15:18] <dimitern> it depends on how it gets missing revisions from git - if it does not use fetch but pull it can happen
[15:18] <mgz> because we've only been testing the current head, rather than the pending merge, for the last few landings at least
[15:20] <TheMue> I first saw it when testing the dummy provider. here I got it in github.com/juju/juju/environs/jujutest/livetests.go:124: build command "go" failed …
[15:20] <dimitern> nope, scratch that - godeps uses git fetch, at least in lp:godeps trunk
[15:20] <mgz> for now, I want to just back those two changes out
[15:20] <mgz> and eat lunch...
[16:02] <alexisb> jcw4, bodie_ we are on the hangout when you guys are ready
[16:02] <jcw4> woo hoo
[16:08] <jcw4> TheMue: #jujuskunkworks
[17:00] <perrito666> hello everyone
[17:00] <mgz> hey!
[17:07] <mgz> anyone: pr #609
[17:09] <mgz> gd
[17:09] <mgz> urk
[17:09] <mgz> gsamfira, perrito666: ^
[17:13] <perrito666> mgz: on which repo?
[17:13] <mgz> juju/juju
[17:16]  * perrito666 receives confirmation from msdn of subscription... I wonder when I subscribed
[17:16] <perrito666> it was at least one month ago
[17:21] <bodie_> so what's the deal with upstart and how far back do we have to roll back to get it to build?
[17:22] <gsamfira> well, 2 options. there is one commit that was ported forward, and if we remove that one, it will build
[17:23] <gsamfira> or, we can do a PR, that removes the extra imports and calls agentConfig.Tag().String() instead of agentConfig.Tag() in a couple of places
[17:23] <gsamfira> and it will also build
[17:23] <gsamfira> but I have not investigated the issues related to the second PR that is being reverted by https://github.com/juju/juju/pull/609
[17:23] <gsamfira> mgz might have more info on that
[17:24] <perrito666> mgz: btw, can https://bugs.launchpad.net/juju-core/+bug/1361721 be reproduced in something other than utopic?
[17:24] <mup> Bug #1361721: MachineSuite.TestDyingMachine failing frequently <juju-core:Triaged> <https://launchpad.net/bugs/1361721>
[17:24] <mgz> perrito666: that's the only job it's on I think, but it's been a dodgy test for a while
[17:26] <mgz> bodie_: I'm not sure, which upstart what?
[17:26] <bodie_> the failing build on the latest master
[17:26] <gsamfira> bodie_ : the upstart package was moved to the service package quite a while ago.
[17:27] <jcw4> bodie_: mgz has a revert in the pipeline now
[17:27] <mgz> bodie_: my pr should fix the failing build
[17:27] <bodie_> ah, great
[17:28] <gsamfira> if there is no other issue with the commits that are being reverted, the change to get it to build without reverting is about 4 lines. I am running the tests now. Should I let them finish and see if that fixes it?
[17:31] <mgz> gsamfira: I want to just revert, because the tests were never run on those changes
[17:31] <mgz> then fix the landing before putting in new code
[17:33] <gsamfira> fair enough. As you wish. I am running the tests on that code now, with the fix. If you prefer to revert, it's fine with me :). I was just offering an alternative that would be shorter
[19:26] <ericsnow> perrito666: how's your morning go?
[19:26] <perrito666> ericsnow: wonderful
[19:27] <ericsnow> perrito666: glad to hear it
[19:27] <perrito666> ericsnow: btw, one of your PRs has just been reverted, please contact mgz for more info
[19:27] <ericsnow> perrito666: yeah, I saw :(
[19:29] <ericsnow> mgz: how exactly was my patch failing?
[19:30] <ericsnow> mgz: I'm guessing it's related to updating dependencies.tsv
[19:31] <mgz> 16:02 < dimitern> mgz, ../../state/backups/metadata/metadata.go:10:2: cannot find package "github.com/juju/utils/filestorage" in any of:
[19:31] <mgz> 16:02 < dimitern> I/usr/lib/go/src/pkg/github.com/juju/utils/filestorage (from $GOROOT)
[19:31] <mgz> 16:02 < dimitern> and it's not in dependencies.tsv as well
[19:32] <ericsnow> mgz: weird
[19:33] <mgz> worth trying the merge again and building locally, see if it's actually okay
[19:33] <ericsnow> mgz: github.com/juju/utils/filestorage has existed for some time and it's in the revision listed in depenedencies.tsv
[19:34] <ericsnow> mgz: at least as long as that revision didn't get reverted too
[19:36] <perrito666> all: I just sent an email to juju-dev in the thread "getting rid of all-machines.log"; your opinion will be greatly appreciated
[20:18] <perrito666> ok good news is: I don't need utopic to fix 1.20 tests
[20:19] <mgz> ace
[20:21] <perrito666> sweet, 8G of ram really did the trick for test running
[20:28] <perrito666> why cant I see builds before #545 for http://juju-ci.vapour.ws:8080/job/run-unit-tests-utopic-amd64/ ?
[20:41] <perrito666> abentley: mgz jog anyone can tell since when has 1.20 -> http://juju-ci.vapour.ws:8080/job/run-unit-tests-utopic-amd64/ been broken? I know it failed for the last revision, but do we know if this is indeed something new
[20:41] <perrito666> ?
[20:41] <perrito666> sorry, shift too close to enter
[20:42] <abentley> perrito666: The last revision that passed was eba6e37f
[20:43] <perrito666> abentley: thanks a lot
[20:43] <abentley> perrito666: r6dc9a588 was tested and failed, but I need to check to see whether it was the same failure mode.
[20:44] <abentley> perrito666: Silly me, that's the candidate.
[20:44] <perrito666> weird, gitk says r6dc9a588 is not part of 1.20
[20:46] <abentley> perrito666: I'm sorry, the last to pass was eba6e37f.  The way jenkins displays this is confusing.
[20:47] <perrito666> abentley: I know, dont worry, confuses me each time
[20:48] <perrito666> mm, changes from eba6e37f to head of 1.20 contain a patch mgz just reverted from master
[20:51] <wwitzel3> ping ericsnow, perrito666
[20:51] <ericsnow> wwitzel3: hey
[20:52] <wwitzel3> ericsnow, perrito666: got time for a quick meeting / standup?
[20:52] <ericsnow> wwitzel3: sure
[20:53] <wwitzel3> ok, going to moonstone
[20:59] <ericsnow> wwitzel3: sorry, thought I had joined!
[21:17] <perrito666> sorry was in another window, you guys still there?
[21:17] <ericsnow> perrito666: nope
[21:18] <ericsnow> perrito666: we didn't talk for long
[21:18] <ericsnow> perrito666: just a quick recap for Wayne
[21:18] <perrito666> well not much from me either I am fixing a bug in 1.20
[21:26] <thumper> morning
[21:27] <perrito666> thumper: morning
[21:27] <perrito666> mgz: still here?
[21:33] <mgz> perrito666: yup
[21:37] <perrito666> mgz: ah nevermind I was just curious if git revert worked for you
[21:38] <mgz> perrito666: it does, but is a little finickity
[21:38] <perrito666> mgz: I tried git revert -m 1 hash
[21:38] <perrito666> and got all kind of conflicts
[21:38] <perrito666> I really expected it to be slightly smarter
[21:39] <mgz> are you sure the -m was right?
[21:40] <perrito666> I... I am not sure, I guess it was not, given the result; I do not feel the explanation of what -m does was meant to be understood
[21:48] <perrito666> would anyone please https://github.com/juju/juju/pull/610
[21:48] <perrito666> it fixes https://bugs.launchpad.net/juju-core/+bug/1361721
[21:48] <mup> Bug #1361721: MachineSuite.TestDyingMachine failing frequently <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1361721>
[21:49] <mgz> well if it reverted the right stuff, even with conflict pain, presumably
[21:51] <perrito666> mgz: I end up doing it by hand way easier
[21:51] <perrito666> thumper: cmars you are the ocrs
[21:51] <thumper> perrito666: no, that was yesterday :)
[21:52] <perrito666> mgz: if you append .diff to the pr in ghub it will yield the patch in plain text
[21:52] <perrito666> thumper: ah true, it is still today for me lol
[21:52] <thumper> funny, it is still today for me too
[21:52] <wallyworld_> perrito666: why does my branch break that test? the tests pass if run with reduced parallelism so it's likely coincidental that that commit is blamed
[21:53] <perrito666> wallyworld_: well it is consistent when I run them and I believe mgz has the same results
[21:54] <wallyworld_> they pass for me locally
[21:54] <wallyworld_> and the bot or else it wouldn't have landed
[21:54] <perrito666> wallyworld_: mm, strange, they fail here and in CI
[21:55] <perrito666> wallyworld_: http://juju-ci.vapour.ws:8080/job/run-unit-tests-utopic-amd64/
[21:55] <wallyworld_> they will likely pass if the tests are run with reduced parallelism
[21:55] <wallyworld_> we have a number of tests that fail without -p 2
[21:55] <wallyworld_> that test as also failed intermittently previously
[21:56]  * perrito666 does
[21:56] <wallyworld_> my branch changes machine agent startup to write the tools version earlier in the startup, so it is hard to see how that could affect that test
[21:57] <wallyworld_> once machine agent is up and running, there's no difference
[21:58] <wallyworld_> since it only fails on utopic, it is very likely to be a timing issue, which is an issue many tests have sadly
[21:59] <wallyworld_> since we tend to use timeouts all over the place rather than channels and signals to coordinate
[21:59] <perrito666> wallyworld_: I can reproduce it with trusty
[21:59] <wallyworld_> with -p 2?
[21:59] <perrito666> wallyworld_: I am running with p2
[21:59] <perrito666> lets get coffee while we wait :p
[21:59] <wallyworld_> i have had that test fail even before my branch landed
[22:02] <mgz> wallyworld_: did you see the revert on trunk?
[22:03] <mgz> I still havent fully resolved what changed to make the bot pass borked merges, but will have it fixed
[22:04] <wallyworld_> mgz: haven't seen that revert yet, let me look
[22:08] <wallyworld_> mgz: how did the backup pr break the tests? looks very self-contained?
[22:09] <mgz> wallyworld_: it may have actually been okay, but dimitern flagged it as dodgy as well
[22:09] <mgz> the dep borked for him
[22:10] <wallyworld_> the utils dep? how did it bork?
[22:12] <mgz> lack of filestorage package
[22:12] <mgz> may have just been a mistake, I told eric to reland if it built for him locally
[22:13] <mgz> I just wanted to back out all suspect changes as the bot had not in fact been testing them
[22:14] <perrito666> wallyworld_: tests fail with go test -test.parallel=2 github.com/juju/juju/...
[22:14] <wallyworld_> perrito666: same test?
[22:15] <arosales> wallyworld_, mgz abentley: added a comment to https://bugs.launchpad.net/juju-core/+bug/1361721
[22:15] <mup> Bug #1361721: MachineSuite.TestDyingMachine failing frequently <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1361721>
[22:15] <perrito666> wallyworld_: exact same test
[22:16] <perrito666> golang-go                                             2:1.2.1-2ubuntu1
[22:16] <perrito666> wallyworld_: I can run any other sort of test for you if you want
[22:17] <wallyworld_> arosales: i've only just SOD, but will look into the test and try and see that the issue is, and we'll get 1.20.6 out today
[22:17] <wallyworld_> perrito666: thanks, i need to look at the test to see where it's failing
[22:17] <perrito666> wallyworld_: exact same output that in jenkins
[22:18] <wallyworld_> perrito666: and yet the test passed the bot
[22:19] <mgz> wallyworld_: you mean, you don't love us all bugging you before breakfast? :)
[22:19] <perrito666> wallyworld_: I recall mgz saying earlier that the bot was letting things pass
[22:19] <wallyworld_> we have so many tests that fail due to subtle changes in timing due to different instance types etc
[22:19] <wallyworld_> mgz: before breakfast is ok, not before coffee :-(
[22:20] <perrito666> wallyworld_: let me run one more test and I might be able to give you more info
[22:21] <wallyworld_> ok, thanks
[22:38] <wallyworld> perrito666: the test passes for me - can you try running it in isolation?
[22:39] <wallyworld> i'm running on an SSD, many of our tests pass more often with fast i/o
[22:39] <perrito666> wallyworld: I am running in an ssd too
[22:39] <perrito666> everything in this machine is ssdish
[22:40] <perrito666> wallyworld: what do you mean in isolation
[22:40] <wallyworld> go test -gocheck.f TestDyingMachine
[22:40] <wallyworld> cd to the cmd/jujud package
[22:40] <wallyworld> and just run that one test
[22:40] <perrito666> wallyworld: running
[22:40] <perrito666> it passes
[22:41] <wallyworld> yup, so it's just another case of our tests being stupid :-(
[22:41] <perrito666> wallyworld: well the tests were written by us so...
[22:41] <perrito666> :(
[22:42] <wallyworld> perrito666: agreed. there's sooo much that needs fixing
[22:51] <perrito666> wallyworld: well, I made a few attempts and I definitely can't figure out why your patch triggers this failure
[22:51] <wallyworld> perrito666: my patch doesn't - this test has failed several times in the past before my patch
[22:51] <perrito666> wallyworld: let me rephrase
[22:52] <wallyworld> i knew what you meant, sorry :-)
[22:53] <perrito666> wallyworld: I believe that your patch somehow triggers our underlying test error, yet I cannot figure out why in the universe
[22:53]  * perrito666 reruns
[22:53] <wallyworld> and sadly, it seems to happen just on utopic on CI, on trusty elsewhere, and not for me at all
[22:55] <perrito666> that says very little for the consistency of the affected piece of code
[22:55] <wallyworld> perrito666: if you can get it to fail, maybe try increasing the poll timeout to see if that makes a difference, just to see if the agent will eventually die or is hung
[22:56] <perrito666> wallyworld: I believe I tried
[22:56] <perrito666> wallyworld: I find it somewhat interesting that various log entries state that Open has been called without addresses
[22:57]  * perrito666 swims in seas full of red herrings
[22:58] <wallyworld> s/swims/drowns
[23:02]  * perrito666 does as always in case of fish and starts the barbecue grill
[23:09] <perrito666> wallyworld: I found it
[23:09] <perrito666> MachineSuite is not properly isolated
[23:10] <wallyworld> that and several other test suites :-(
[23:10] <wallyworld> what particular issue did you find?
[23:10] <perrito666> wallyworld: by removing the tests you added to machine_test the whole suite runs
[23:10] <wallyworld> it works for me even with those tests
[23:10] <perrito666> wallyworld: well I guess I'll have to be the one to find the bug then
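The isolation failure perrito666 describes typically looks like shared mutable state outliving a test. A minimal stand-alone sketch of the pattern (the `machines` map and test names are invented for illustration; MachineSuite's real fixtures are far more involved):

```go
package main

import "fmt"

// Hypothetical shared fixture: a package-level registry that a suite's
// SetUpTest is supposed to reset. If one test mutates it and teardown
// doesn't restore it, a later test sees the leaked state.
var machines = map[string]string{}

func testAddsMachine() {
	machines["0"] = "dying" // mutates shared state, never cleaned up
}

func testExpectsCleanState() bool {
	// Passes when run in isolation, fails after testAddsMachine.
	return len(machines) == 0
}

func main() {
	fmt.Println("in isolation:", testExpectsCleanState())
	testAddsMachine()
	fmt.Println("after earlier test:", testExpectsCleanState())
}
```

This is why the suite fails as a whole yet each test passes when run alone with `-gocheck.f`.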
[23:11] <wallyworld> i'm looking into why the agent is not stopping - well it is stopping according to the logs, but the test doesn't see it
[23:12] <wallyworld> trouble is there's not enough logging
[23:12] <wallyworld> in the machine agent Run() method
[23:12] <perrito666> wallyworld: what is primeAgent?
[23:13] <wallyworld> that creates a machine and tools and sets up a machine agent
[23:14] <perrito666> I have a hunch that at some point a machine is being shared
[23:18] <wallyworld> perrito666: maybe, but the logs show the agent dying in response to the machine being marked as dead. it's just that the test doesn't find that out. and for some reason, the agent tries to start again
[23:19] <wallyworld> perrito666: would be interesting, if you can get it to fail, to add logging around these lines at the end of machine agent's Run() method
[23:19] <wallyworld> 	if err == worker.ErrTerminateAgent {
[23:19] <wallyworld> 		err = a.uninstallAgent(agentConfig)
[23:19] <wallyworld> 	}
[23:19] <wallyworld> 	err = agentDone(err)
[23:19] <wallyworld> 	a.tomb.Kill(err)
[23:19] <wallyworld> 	return err
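For context, a self-contained sketch of the shutdown tail pasted above, with the extra logging wallyworld asks for. Names mirror the snippet, but this is an illustration only: the tomb is simulated with a bare channel rather than the real gopkg.in/tomb package, and uninstallAgent is a stub:

```go
package main

import (
	"errors"
	"fmt"
)

var errTerminateAgent = errors.New("agent should be terminated")

type agent struct {
	dead chan struct{} // stand-in for tomb.Tomb
	err  error
}

func (a *agent) uninstallAgent() error {
	fmt.Println("uninstalling agent") // extra logging
	return nil
}

// run mirrors the tail of the machine agent's Run() method, with log
// lines added around each step so a hung shutdown is visible in output.
func (a *agent) run(workerErr error) error {
	err := workerErr
	fmt.Printf("workers stopped: %v\n", err)
	if err == errTerminateAgent {
		err = a.uninstallAgent()
	}
	fmt.Printf("killing tomb with: %v\n", err)
	a.err = err
	close(a.dead) // tomb.Kill + done
	return err
}

func main() {
	a := &agent{dead: make(chan struct{})}
	_ = a.run(errTerminateAgent)
	<-a.dead // the test's wait: only returns if shutdown completed
	fmt.Println("agent exited cleanly")
}
```

If the test hangs waiting on the tomb while the logs show the machine marked dead, the gap is between "workers stopped" and "killing tomb", exactly where this logging would point.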
[23:20] <perrito666> wallyworld: ok, going
[23:24] <wallyworld> perrito666: so it looks like the logic in one of the runners is not detecting that the worker is dying, and is attempting to restart everything
[23:24] <wallyworld> the agent itself correctly notices that the machine is dead, which is what the test is testing for, but the worker doesn't allow the agent to exit
[23:26] <perrito666> so this actually is a bug
[23:28] <wallyworld> well, it's a bug somewhere because the test fails when it shouldn't. but not sure where exactly
[23:28] <wallyworld> i'm guessing it's in the worker/runner infrastructure
[23:31] <thumper> davecheney: https://github.com/juju/juju/pull/605/files
[23:31] <wallyworld> func (runner *runner) run() error {   <-- this function in worker/runner.go is noticing that the runner has been stopped but then attempts to restart because it doesn't know that it's deliberate
[23:32] <davecheney> thumper: looking
[23:32] <thumper> davecheney: ta
[23:33] <thumper> waigani: https://github.com/juju/juju/pull/519 has a merge conflict with master
[23:34] <perrito666> wallyworld: whoa
[23:34] <perrito666> wallyworld: I am running it with some more logging
[23:34] <waigani> thumper: thanks, looking
[23:34] <wallyworld> perrito666: it appears the logs are missing the 'killing "api"' line which means that the api work is not being killed like it should
[23:35] <wallyworld> that's the worker that is then being erroneously restarted
[23:35] <perrito666> wallyworld: good catch
[23:35] <perrito666> looking
[23:36] <wallyworld> perrito666: func killWorker(id string, info *workerInfo) {   <---- if this is not called, then info.start is not set to nil, so when the worker terminates, it will just be restarted again
[23:36] <wallyworld> which is not what we want
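The killWorker interaction wallyworld describes can be sketched as follows. This is a simplified illustration of the suspected logic, not juju's actual worker/runner.go: a worker whose `start` has not been cleared by `killWorker` is treated as having crashed, so the runner restarts it instead of letting the agent exit:

```go
package main

import "fmt"

type workerInfo struct {
	start func() // nil means "stopped deliberately, do not restart"
}

type runner struct {
	workers map[string]*workerInfo
}

// killWorker marks a worker as deliberately stopped. If this is never
// called (the missing `killing "api"` log line), info.start stays
// non-nil and the exit below looks like a crash.
func (r *runner) killWorker(id string) {
	if info, ok := r.workers[id]; ok {
		fmt.Printf("killing %q\n", id)
		info.start = nil
	}
}

// workerExited is the decision point inside the runner's main loop:
// restart only when the stop was not deliberate.
func (r *runner) workerExited(id string) {
	info := r.workers[id]
	if info.start == nil {
		fmt.Printf("%q stopped deliberately, not restarting\n", id)
		return
	}
	fmt.Printf("%q exited unexpectedly, restarting\n", id)
	info.start()
}

func main() {
	r := &runner{workers: map[string]*workerInfo{
		"api": {start: func() { fmt.Println(`starting "api"`) }},
	}}
	r.killWorker("api") // the step missing from the failing run's logs
	r.workerExited("api")
}
```

With the `killWorker` call removed, `workerExited` would restart the worker, which matches the erroneous restart seen in the failing test.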
[23:39] <perrito666> wallyworld: I am quite close to call you
[23:44] <wallyworld> i introduced a deliberate error into the test and compared the logs - it seems when it fails, the "deployer" worker is not killed as it should be
[23:47] <wallyworld> hmmm, but that's because the deployer is not started
[23:49] <thumper> davecheney: still happy with that? If so, I'll merge it (when landing unblocked)
[23:51] <davecheney> thumper: lgtm
[23:51] <davecheney> minor gripes
[23:51] <davecheney> but lgtm
[23:54] <thumper> what are the gripes?
[23:55]  * thumper looks at that test
[23:55] <davecheney> thumper: that's my only comment
[23:55] <davecheney> everything else looks good
[23:56] <perrito666> thumper: well, in Spanish "gripes" is the plural for the flu
[23:56] <davecheney> perrito666: lol
[23:57] <perrito666> thumper: also your last name pronounced as read in spanish means penis :p </end of trivia>
[23:58] <thumper> perrito666: yea, back to high school
[23:58] <perrito666> thumper: actually It triggered a very weird look from my wife when I told her your name when I was chatting with you the other night
[23:58] <perrito666> and then I realized