[00:19] <davecheney> fwereade: thumper https://github.com/juju/juju/pull/362
[00:30] <perrito666> fwereade: ping 2
[01:53] <davecheney> thumper: fwereade http://en.wikipedia.org/wiki/.local
[01:53] <davecheney> http://en.wikipedia.org/wiki/.local
[01:53] <davecheney> Networking device hostnames ending with .local are often employed in private networks, where they are resolved via the multicast Domain Name System (mDNS) and/or local Domain Name System (DNS) servers.
[01:55] <wallyworld> axw: if you have a moment, could you give https://github.com/juju/juju/pull/361 the once over as well? i've tested live on hp cloud with juju-gui deployed to ensure gui is updated as services are added/removed
[01:55] <axw> wallyworld: sure, looking
[01:57] <wallyworld> thanks
[01:57] <menn0> fwereade: https://github.com/juju/juju/pull/364
[01:59] <fwereade> axw, commented quickly on your doc, thoughts?
[02:01] <axw> fwereade: I was trying to avoid timers, because it seems unnecessary. we already have timer based pinging on the API, so we should be able to use connection liveness to monitor presence
[02:01] <axw> fwereade: since the state server is a machine agent, it will have an entry in the presence map too
[02:01] <axw> I just meant that we'll need to propagate that into state somehow
[02:02] <fwereade> axw, ok, cool -- I'm mainly just saying that I don't think we need to add to the peergrouper
[02:02] <axw> because EnsureAvailability needs something to decide whether a state server is healthy
[02:02] <axw> fwereade: something needs to monitor mongod's health though right? doesn't need to be the peergrouper, but some worker?
[02:02] <fwereade> axw, the api servers will be keeping track of their own connected clients, and those forwarded from other agents, and we may as well pipe that same info into state in the same sort of way we do for environs
[02:03] <fwereade> axw, do we specifically need to monitor mongo's health if we can monitor state server health?
[02:03] <axw> fwereade: no, it was just something rogpeppe and I wanted to do originally, because mongod and jujud could independently die
[02:03] <axw> could be deferred
[02:04] <fwereade> axw, that's true, but I worry it's an overcomplication in this specific context
[02:04] <axw> let's strike that out for now, it's strictly an improvement over what we've got now anyway
[02:05] <fwereade> axw, fwiw, if we were doing it, I feel we could probably keep it local to the individual state servers -- if they detect problems with their local mongo, they can mark themselves messed-up
[02:05] <axw> fwereade: I suppose they could just sever their connection to the other state servers
[02:09] <cmars> tasdomas, thumper: http://paste.ubuntu.com/7839596/
[02:17] <axw> fwereade: I feel like I must've missed something, because the existing implementation is quite complicated. were there previously other use cases?
[02:34] <axw> wallyworld: phew. lgtm
[02:34] <wallyworld> axw: yeah, sorry :-)
[02:34] <axw> :)
[02:34] <wallyworld> axw: i just found another collection in the watcher that was missed
[02:34] <axw> oops
[02:34] <wallyworld> so i'm retesting
[02:35] <wallyworld> axw: also, we can change oplog max size to 5GB
[02:35] <axw> okey dokey
[02:35] <wallyworld> but need to test a real deployment with a deploy -n X
[02:35] <axw> sure, I'll do some testing
[02:35] <wallyworld> i'm trying to use aws but keep running out of instances
[02:35] <axw> oh
[02:35] <axw> you're on it?
[02:36] <wallyworld> yeah, but feel free to test on another platform
[02:36] <wallyworld> but i think we'll be ok
[02:36] <axw> I'll try with Azure, but I can't see it being a problem
[02:36] <wallyworld> me either
[02:38] <axw> wallyworld: did you just change maxOplogSizeMB in mongo/prealloc.go?
[02:38] <wallyworld> axw: actually, i'll land the io timeout one and then will need to backport to 1.20 (will be messy), so if you could do the branch to reduce the max oplog size that would be great
[02:38] <wallyworld> yup
[02:38] <axw> sure
[02:39] <wallyworld> there's been a lot of change in trunk in the same areas as touched by the collection stuff
[02:40] <axw> fun :(
[02:41] <axw> wallyworld: has anyone managed to look at a production juju installation's oplog?
[02:41] <wallyworld> axw: i think kapil did in the bug report
[02:42] <wallyworld> and nate was going to ask in #mongo for advice
[02:42] <wallyworld> but i need to ping him cause i haven't heard back
[02:42] <wallyworld> kapil wanted the size reduced also
[02:42] <wallyworld> k
[02:43] <axw> hmm it shows the cap size, but not usage
[02:44] <wallyworld> yeah, i'm not sure we have that data
[02:46] <axw> err, azure simplestreams is busted
[02:47] <axw> wallyworld: do I poke IS or what?
[02:47] <axw> there's only trusty images for China
[02:47] <wallyworld> oh :-(
[02:47] <axw> in the index
[02:48] <wallyworld> sigh, yeah poke #is
[02:48] <wallyworld> they were meant to put tests in place to catch this
[02:48] <wallyworld> i gave them the tools
[02:49] <wallyworld> thumper: yo
[02:49] <wallyworld> or menn0
[02:50] <menn0> wallyworld: yep?
[02:50] <wallyworld> fresh deploy of a 1.21alpha1 environment, deployed gui, all looks ok, but
[02:50] <wallyworld> machine-0: 2014-07-23 02:49:28 ERROR juju.worker runner.go:218 exited "upgrade-steps": unexpected quit
[02:50] <wallyworld> machine-0: 2014-07-23 02:49:28 INFO juju.worker runner.go:252 restarting "upgrade-steps" in 3s
[02:50] <wallyworld> log is full of the above
[02:51] <menn0> hmmm
[02:51] <wallyworld> i can ssh in and poke around. have you seen that?
[02:51] <wallyworld> this is on aws
[02:51] <menn0> can I get the full log?
[02:51] <menn0> also, where can I get the code for that release? I'd like to see what made it in.
[02:52] <wallyworld> menn0: this is trunk
[02:52] <menn0> ok
[02:52] <wallyworld> i'll pastebin the log
[02:53] <menn0> built when? this code has changed quite a bit over the past 3 days (one change is testing for merge right now)
[02:53] <axw> wallyworld: hloeung says the server team manages the index, so I guess I'll just email Ben Howard?
[02:53] <wallyworld> axw: yep, and cc scott moser
[02:54] <menn0> wallyworld: I will try to repro
[02:54] <wallyworld> menn0: i built it just before. i have my own changes in there concerning copying sessions when talking to mongo, so it's possible that may be involved, but everything else works ok
[02:54] <wallyworld> by in there, i mean in state
[02:55] <wallyworld> not the upgrade code itself
[02:55] <wallyworld> the changes are below the waterline
[02:56] <menn0> wallyworld: ok. let me have a dig.
[02:58] <axw> best merge directive evar
[02:59] <davecheney> waigani: http://www.shag.com/
[02:59] <davecheney> something tells me that he wouldn't have his art reproduced on a 3 buck sticker sold on ebay
[03:02] <wallyworld> axw: say wot
[03:02] <axw> wallyworld: https://github.com/juju/juju/pull/364#issuecomment-49827041
[03:03] <wallyworld> omfg
[03:03] <wallyworld> trust menn0 to be a smartarse
[03:03] <wallyworld> well played, sir
[03:04] <axw> wallyworld: how many instances did you get up to on ec2?
[03:04] <wallyworld> axw: about 10 or so
[03:04] <wallyworld> and then the account ran out
[03:05] <wallyworld> very slow
[03:05] <axw> ok
[03:05] <axw> ah, there may be trusty images in the daily stream for azure... will try that
[03:06] <wallyworld> ok
[03:20] <davecheney> who wants to be mean to me ? https://github.com/juju/juju/pull/365
[03:22] <davecheney> gentle ping, https://github.com/juju/juju/pull/362
[03:22] <axw> wallyworld: actually, bootstrapping azure won't tell us much. it's root disks aren't that big (25G), so the oplog isn't large anyway
[03:22] <axw> wallyworld: did you create a large root-disk on ec2?
[03:23] <wallyworld> axw: no. but the idea was to see if a smaller oplog could handle a stress test with large numbers of units etc deployed
[03:23] <axw> wallyworld: ok, I see
[03:23] <wallyworld> if the smaller one could handle it, then 5GB will be plenty
[03:23] <axw> fair enough
[03:24] <davecheney> i think maas is unique in that it gives you a large /
[03:24] <davecheney> 'cos you get 100% of the underlying machine
[03:24] <axw> I thought you could request whatever you want with ec2?
[03:24] <axw> one of the cloud providers does that
[03:25] <wallyworld> davecheney: even if the disk is large, i don't get why it's not partitioned to have a smaller /
[03:25] <davecheney> wallyworld: maas is web scale
[03:25] <menn0> wallyworld: fwereade and I are going to change Runner slightly so that a worker doesn't have to wait for the stop channel to close if it wants to exit without error
[03:25] <menn0> wallyworld: this is the source of that problem
[03:26] <wallyworld> ok, sounds good
[03:26] <menn0> wallyworld: we thought that had already been done but it wasn't (we should have checked more closely...)
[03:26] <wallyworld> np, easy enough to fix :-)
[03:27] <menn0> wallyworld: at any rate, it's only log spam and shouldn't have any real adverse effects otherwise. the upgrade-steps worker will be getting restarted but then exiting immediately over and over.
[03:27] <wallyworld> yeah, that's what i saw
[03:29] <wallyworld> axw: yay, just merged trunk version of the copy session stuff back to 1.20, soooo many conflicts \o/
[03:30] <axw> fun times ahead
[03:30] <wallyworld> yeah, i so want to get this committed today so we can build a 1.20.2 rc
[03:54] <thumper> wallyworld: oh hai
[03:54] <wallyworld> yo
[03:54] <thumper> wallyworld: I have a question for you too
[03:54] <wallyworld> wasn't me
[03:55] <thumper> wallyworld: I'm looking at the disk configstore, and wanting to remove the "create an empty file" bit
[03:55] <thumper> wallyworld: fwereade said to ping you because you have experience with this
[03:55] <wallyworld> i do?
[03:55] <thumper> and he vaguely remembered talking to you about it
[03:55] <thumper> but if you didn't remember, then to ignore it
[03:55] <wallyworld> let me look at the code
[03:56] <axw> damnit
[03:57] <axw> wallyworld: I just found a bug in azure, load balancing is broken
[03:57] <axw> possibly just on trunk, will need to verify with 1.20
[03:57] <wallyworld> thumper: yeah, i must have been on crack cause i don't recall the conversation
[03:57] <thumper> wallyworld: ok, cool
[03:57] <wallyworld> axw: well at least you found it
[04:15] <davecheney> https://github.com/juju/schema/pull/3
[04:15] <davecheney> ^ anyone? anyone? Bueller?
[04:16] <axw> wallyworld: 3 state servers, 10 ubuntu units deployed = 2.34MB oplog over 1 hour
[04:16] <davecheney> axw: nice
[04:16] <axw> mostly quiescent tho
[04:16] <wallyworld> axw: well, 5GB should be *plenty*
[04:16] <axw> indeed
[04:16] <wallyworld> thanks
[04:19] <axw> wallyworld: I think that change I made to API host ports doesn't actually stop the oplog from getting spammed
[04:20] <axw> seems the assertion still gets inserted into the transaction log
[04:20] <wallyworld> ah
[04:20] <wallyworld> i wondered about that
[04:20] <axw> so it's either racy or spammy
[04:20] <wallyworld> whether assert noops got logged
[04:21] <wallyworld> given the frequency of change, it's ok to be "spammy"
[04:21] <wallyworld> imo
[04:21] <wallyworld> since it's not really that spammy anyway
[04:21] <wallyworld> luckily mongo is web scale :-)
[04:24] <wallyworld> axw: did you forward port the api host ports race fix to master?
[04:25] <axw> wallyworld: yep
[04:25] <wallyworld> ok, that bit of code is coming up as a conflict doing my backport
[04:25] <wallyworld> and it appeared to be showing the old code in master, but looks like my mistake
[04:26] <davecheney> thumper: while you're slcking off,  https://github.com/juju/juju/pull/362,  https://github.com/juju/juju/pull/365
[04:28] <axw> wallyworld: where did the 5GB number come from?
[04:29] <wallyworld> axw: out of thin air. greater than 1GB but an order of magnitude less than 50GB
[04:29] <wallyworld> we use max 1GB for local provider
[04:29] <axw> we use 1MB for local
[04:29] <wallyworld> ah, ooops
[04:29] <wallyworld> i thought it was G
[04:29] <wallyworld> so maybe 1GB is sufficient
[04:29] <axw> I think so
[04:29] <wallyworld> ok
[04:30] <wallyworld> now that there's some numbers
[04:30] <waigani> axw: hello :)
[04:30] <axw> waigani: ahoy
[04:30] <menn0> wallyworld: btw, fix for that upgrade-steps problem on the way: https://github.com/juju/juju/pull/366
[04:30] <waigani> axw: I've got a question - let's see how good your memory is
[04:31] <wallyworld> menn0: awesome, thanks
[04:31] <jam> axw: if you wanted a worst case, you could use the xplod charm
[04:31] <waigani> axw: https://codereview.appspot.com/70190050/diff2/20001:120001/state/state.go
[04:31] <jam> instead of ubuntu
[04:31] <waigani> axw: I'm trying to remember why I added additionalValidation
[04:32] <jam> axw: xplod
[04:32] <jam> axw: xplod
[04:32] <axw> ? :)
[04:32] <jam> copy & paste fail
[04:32] <jam> https://code.launchpad.net/~jameinel/charms/precise/peer-xplod/peer-xplod
[04:32] <axw> thanks
[04:32] <axw> I'll give that a shot
[04:32] <jam> axw: when you add units, they start a peer chatter amongst them
[04:32] <jam> each one tries to increment a number and report it back to the rest of the peers.
[04:33] <jam> axw: also, as it is a simple charm, you can "juju deploy --to 1" to get more of them
[04:33] <jam> (well add-unit --to 1)
[04:33] <jam> axw: though the fslock means they won't really scale super huge on one machine
[04:33] <axw> jam: cool, thanks
[04:34] <axw> waigani: umm
[04:34] <waigani> axw: yeah hehe
[04:34]  * axw greps for uses of UpdateEnvironConfig
[04:34] <waigani> axw: none
[04:34] <jam> davecheney: why no uppercase letters (A-F) ?
[04:35] <waigani> axw: fwereade has suggested removing it
[04:35] <waigani> axw: but just wanted to double check that there wasn't a good reason to keep it
[04:35] <axw> waigani: not none
[04:35] <waigani> oh?
[04:35] <axw> apiserver/client/EnvironmentSet
[04:35] <fwereade> waigani, there's one test that uses it to fuck things up creatively
[04:35] <fwereade> waigani, axw: and, yeah, we do need that functionality
[04:35] <axw> there's a bit of code in there that checks agent-version isn't set
[04:36] <fwereade> waigani, axw: the fact that the test *can* fuck it up is evidence the method itself is a bit broken
[04:36] <davecheney> jam: 'cos that is what we say is a uuid
[04:36] <fwereade> waigani, axw: but it's gradually been becoming less so over time which is nice
[04:36] <davecheney> jam: is that my old xplod charm ?
[04:49] <jam> davecheney: I adapted yours and put a peer relation in it to make it easier to use
[04:50] <jam> you still get N^2 by just adding units
[04:50] <axw> jam: stupid question, is there a charm URL to deploy that directly?
[04:50] <axw> or do I need to fetch it and local:
[04:50] <jam> axw: not that I know of. branch it locally and then --repository local:
[04:50] <axw> okey dokey
[04:52] <davecheney> jam: be careful
[04:52] <davecheney> that charm got my ec2 credentials locked out for abuse :)
[04:53] <jam> davecheney: we fixed that bug :), and I've used it on EC2 and scaled up to about 200*10 units
[04:53] <jam> (more than 100 doesn't help because of CPU bound on the individual machines)
[04:53] <axw> why did it get you locked out?
[04:53] <jam> axw: we used to have a bug where every hook
[04:54] <jam> would make a Provider call (for the API server IP addresses)
[04:54] <jam> so hundreds of calls per second?
[04:54] <axw> I see :)
[04:54] <davecheney> yup
[04:54] <davecheney> made it hard to kill that environment ...
[04:54] <jam> davecheney: aiui, you ran into rate limiting first, which meant you couldn't kill it
[04:54] <jam> which let it run away until they shut it down for you :)
[04:55] <davecheney> couldn't even use the aws console
[04:55] <davecheney> as your quota counts towards that api as well
[04:55] <davecheney> (probably calls the same endpoint under the hood)
[05:00] <jam> davecheney: I wonder if you used IAM credentials you could get just the one IAM account locked out, and still access it as your super users.
[05:01] <davecheney> dunno
[05:37] <axw> jam, wallyworld: thoughts on https://bugs.launchpad.net/juju-core/+bug/1344940/comments/17 ?
[05:37] <_mup_> Bug #1344940: Juju state server database is overly large <canonical-is> <cloud-installer> <landscape> <mongodb> <juju-core:Triaged by axwalk> <juju-core 1.20:Triaged by axwalk> <https://launchpad.net/bugs/1344940>
[05:38] <wallyworld> axw: i think 1GB to be safe
[05:38] <wallyworld> that's imo
[05:41] <axw> wallyworld: I'll decrease min to 512MB, and max to 1024MB
[05:41] <wallyworld> ok, sounds good
[05:41] <axw> and if mark doesn't like it, we'll take it from there
[05:44] <jam> axw: sounds good to me, there is a lot of "how out of date can a replica get and come back without a full sync". If we want to support 1 day? 5 hours? all that stuff is pretty arbitrary
[05:44] <axw> yeah
[05:44] <axw> I think 24 hours would be reasonable, but... I'm not an ops guy
[05:45] <jam> axw: again, it depends a lot on how big the actual DB becomes, to determine how long a full sync is going to cost
[05:45] <jam> again, our estimates are that it isn't that bad
[05:45] <wallyworld> axw: do you know why "git diff X...Y" works but "git merge X...Y" doesn't?
[05:45] <jam> axw: it would matter more if you had a 1TB database, and churn of less than 1GB/day, doing a full sync would be painful
[05:45] <axw> wallyworld: I don't
[05:46] <wallyworld> :-(
[05:46] <jam> wallyworld: git cherrypick, IIRC
[05:46] <jam> wallyworld: git cherry-pick -h
[05:46] <wallyworld> jam: i tried that but it brings in other stuff that doesn't show up in the diff
[05:47] <jam> wallyworld: so you can always do "git diff X..Y | patch -p1"
[05:47] <wallyworld> tried that but it says the patch can't apply
[05:47] <wallyworld> although i didn't do the -p1
[05:47] <jam> wallyworld: p1 matters, because they put "a/" and "b/" prefixes
[05:48] <jam> you need to be in the root as well
[05:48] <jam> it may be that conflicts/moved code/etc means a patch won't do a good job
[05:48] <jam> *I* would try cherry-pick as that is the intended git method (AFAICT)
[05:49] <wallyworld> jam: yeah, but for some reason, although the diff shows as correct, cherry-pick was bringing in revs i didn't see in the diff, and aborts part way through
[05:49] <wallyworld> i just want it to apply whats in the diff and let me resolve merge conflicts after
[05:49] <axw> wallyworld: I've just been listing the commits explicitly in "git cherry-pick" and that worked fine, FWIW
[05:50] <axw> I don't tend to have a lot of commits in a PR though
[05:50] <axw> https://github.com/juju/juju/pull/368
[05:50] <wallyworld> i just want to take the difference between master and a feature branch and backport
[05:50] <axw> oplog moar smaller
[05:50] <wallyworld> nfi why this is so easy in bzr and so hard in git
[05:53] <jam> wallyworld: http://stackoverflow.com/questions/449541/how-do-you-merge-selective-files-with-git-merge seems to recommend rebasing until you get clean commits, then cherry-picking them...
[05:54] <wallyworld> jam: thanks, will look. i've not had much luck with rebasing in the past, will try again
[05:54] <jam> wallyworld: the other approach is specifically rebasing the desired revisions into the target
[05:54] <wallyworld> so many revisions i'd rather just apply the whole diff and resolve conflicts
[05:54] <jam> and then remove the revisions you don't want merged
[05:55] <wallyworld> i can't see why diff works and I can't just use "merge" in place of "diff" in the command line
[05:55] <jam> wallyworld: patch can't handle merge conflicts, because it only has 2 inputs (you need 3 with a diff3 approach). I can think of how you could do it manually (checkout the common ancestor, checkout both merge tips, etc.)
[05:55] <jam> wallyworld: I don't understand git merge internals
[05:55] <jam> it may refuse to do merges that don't include the whole DAG which is what you are trying to do.
[05:55] <jam> they have cherrypick for that, but it only seems to support single revs
[05:55] <wallyworld> tl;dr: it sucks
[06:15] <axw> wallyworld: gotta go pick up my daughter, but I just confirmed that the azure bug is present in 1.20.1 too
[06:15] <axw> I've added it to the milestone
[06:15] <axw> bbs
[06:15] <wallyworld> axw: np, thanks, i have to head out for a bit as well
[07:08] <jam> cmars: are you still around?
[07:08] <jam> I'd like to discuss your plans with https://github.com/juju/juju/pull/367
[07:24] <TheMue> morning
[07:53] <Egoist_> Hello
[07:54] <Egoist_> How to set in environments.yaml for openstack to use only one security group?
[08:28] <axw> jam: I'm guessing wallyworld's branch has broken the tests, but not really sure
[08:28] <axw> it touched a lot of statey things
[08:28] <axw> my branch is failing the tests too
[08:30] <niedbalski> axw, thanks for the reply on the thread.
[08:30] <axw> niedbalski: hey nps. not sure if that helped at all
[08:31] <axw> hopefully gives some context tho
[08:31] <niedbalski> axw, yep, i needed some context
[08:40] <rogpeppe1> axw: hiya
[08:40] <axw> rogpeppe1: howdy
[08:40] <rogpeppe1> axw: how's tricks?
[08:40] <axw> not too shabby
[08:40] <axw> and with you?
[08:41] <rogpeppe1> axw: pretty good. currently in london for the gui sprint
[08:41] <rogpeppe1> axw: doing charm store stuff
[08:41] <rogpeppe1> axw: i was just wondering if you had any opinions about charm.Reference
[08:41] <axw> rogpeppe1: I don't know what it is, so nope ;)
[08:42] <rogpeppe1> axw: ah, ok. i think it has to die :-)
[08:42] <rogpeppe1> axw: but i'd like to speak to someone who might've been originally involved in creating it
[08:43] <axw> rogpeppe1: looks vaguely related to the changes made not too long ago to support cs:<charm> without specifying series
[08:43] <axw> and having the charm store tell us which one to use
[08:43] <rogpeppe1> axw: yeah
[08:44] <rogpeppe1> axw: the problem is that it makes it impossible to have a bunch of charm urls, some of which specify the series and some of which do not
[08:45] <axw> rogpeppe1: looks like cmars is the man to talk to
[08:45] <rogpeppe1> axw: unfortunately he's also sprinting... in NZ
[08:45] <axw> sorry, out of my domain
[08:45] <axw> ah :(
[08:45] <rogpeppe1> axw: np
[09:21] <bodie_> https://github.com/juju/juju/pull/370 fwiw --
[09:22] <bodie_> something's wacky and the sun is coming up, so I'm putting this one down for now, but if anyone knows about how the cmd.out.Write method handles maps and --format=json, that insight would be valued
[10:46] <TheMue> jam: I’m in da house
[10:52] <jam> TheMue: for some reason pidgin isn't beeping at me
[11:26] <axw> wallyworld: there's a "Revert" button on merged PRs if you didn't realise
[11:26] <axw> no need to do a reverse PR or anything
[11:32] <wallyworld> axw: ah, rightio, thanks, didn't see that
[11:35] <mgz> but reverse prs are fun...
[11:36] <mgz> (also, I'm not actually sure how much I trust github's merge algo stuff...)
[12:01] <wallyworld_> mgz: katco: axw: sorry, network problems, be there soon
[12:02] <mgz> wallyworld_: no probs
[12:26] <perrito666> good morning
[12:26] <mgz> hey perrito666
[12:32] <natefinch> I'm going to be out for a bit, have to take my 1 year old to the doctor.  Poor thing has had a 102.5° (39°C) fever for 36 hours.
[12:33] <perrito666> ouch natefinch best of lucks with that
[12:34]  * natefinch needs a bot to just automatically convert F/miles to C/km for the rest of the non-US (aka non-backwards) world
[12:38] <katco> natefinch: gosh hope he starts feeling better nate
[13:35] <perrito666> I am deleting and re-creating a folder in a given function. how can I test that the new folder is not the same? I tried FileInfo's ModTime, but the deletion/creation happens too fast for ModTime to change, and I really do not think this justifies adding a 1 sec sleep in the middle. ideas are welcome
[13:38] <mgz> inode?
[13:39] <katco> perrito666: i really hate to rely on sleeps; they're so non-deterministic. i like mgz's idea
[13:39] <perrito666> katco: exactly I am trying to find an alternative
[13:40] <mgz> you'll have to skip on windows but that's not too bad
[13:40] <perrito666> mgz: not an issue this is restore
[13:40] <perrito666> wow os.FileInfo could really have more information
[13:56] <hackedbellini> natefinch: hi! Is there an eta for when I will be able to update to that new version that will correct the issue I'm having?
[13:57] <katco> hackedbellini: i believe natefinch is out at the moment, taking care of a sick child.
[13:58] <hackedbellini> katco: ahhh, ok. np, will talk to him later so :)
[14:30] <natefinch> back
[14:30] <perrito666> natefinch: how was it?
[14:31] <natefinch> perrito666: possibly an ear infection.  Not conclusive, but possible, so she's on antibiotics.
[14:31] <ericsnow> natefinch: yuck
[14:32] <ericsnow> natefinch: hope she feels better soon!
[14:32] <natefinch> Thanks
[14:33]  * TheMue always felt bad when the kids have been ill. today the kids feel bad when I’m ill. :D
[14:34]  * perrito666 has no kids but knows a lot about ear pain
[14:36] <TheMue> I thankfully seldom had, but the kids from time to time.
[14:37] <TheMue> …oooOOO( Or at least I cannot remember if I had it as kid too sometimes. )
[14:44] <katco> have we ever discussed using the irc bot to do CI notifications in the room? maybe just failures or other important things?
[14:45]  * perrito666 upgrades ISP to double speed for U$D3/month
[14:45] <perrito666> I love promotional loopholes
[14:45] <natefinch> katco: no, but it's a good idea
[14:46] <katco> what bot are we using?
[14:47] <natefinch> perrito666: I called my ISP about upgrading, and they said "Oh, you have 50/25 right?"  And I was like, no I have 70/35.  And they said "you can't have that, we don't offer that".  Well, actually you do/did due to a promotion.  But thanks for being a jerk.
[14:48] <natefinch> katco: no idea  bug #1347715
[14:48] <_mup_> Bug #1347715: Manual provider does not respond after bootstrap <bootstrap> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1347715>
[14:48]  * natefinch wasn't sure if the bot said what it was... 
[14:49] <natefinch> I've been using IRC for approximately 360 days, so I'm not really the one to ask :)
[14:49] <perrito666> natefinch: what was the question?
[14:49] <katco> natefinch: it did. were you just posting that bug as an example, or are you wanting me to look at that particular one?
[14:49]  * perrito666 got upgraded but also disconnected
[14:50] <natefinch> katco: just an example. First one that I found
[14:51]  * natefinch wasn't sure if mup was the bot's name or just the name we gave it to post as
[14:51] <katco> natefinch: probably a name someone gave
[14:51] <katco> i'll post something to juju-dev
[14:51] <natefinch> mgz probably knows
[14:52] <mgz> mu p~muo mu?
[14:52] <perrito666> ah, iirc, jenkins has a bot builtin or can have one
[14:52] <perrito666> I had that working for a project once
[14:52] <katco> perrito666: yeah it does
[14:52] <katco> perrito666: it's a plugin i think
[14:52] <perrito666> very annoying thing
[14:52] <perrito666> a lot of finger pointing
[14:52] <katco> lol
[14:53]  * katco currently waiting for jenkins to fail so she can resubmit
[14:53]  * katco would rather not keep flipping to the jenkins console
[14:53] <hackedbellini> natefinch: hi! You weren't here before, so I'll resend the question I've made earlier =P
[14:53] <hackedbellini> natefinch: hi! Is there an eta for when I will be able to update to that new version that will correct the issue I'm having?
[14:55] <natefinch> hackedbellini: I'll get the fix in today, and then I think we're making a cut at end of week anyway, so.... early next week?
[14:55] <mgz> mup is supybot, but there's a launchpad project somewhere with stuff in
[14:56] <katco> mgz: while you're here... is there any way i can kill a jenkins job i know will fail?
[14:56] <mgz> katco: only with admin access... so you can poke me
[14:57] <katco> mgz: #103
[14:57] <mgz> the current one?
[14:57] <katco> yeah
[14:57] <mgz> done
[14:57] <katco> the command i used to run my tests locally skipped over one that's failing
[14:57] <katco> ty sir
[14:57] <mgz> you need a little magic to requeue it
[14:58] <katco> mgz: oh, i just did $$merge$$
[14:58] <mgz> either craft your own github comment with "merge failed: " in it, or get me to requeue via jenkins
[14:58] <katco> the second $$merge$$ seemed to work... huh
[15:01] <perrito666> natefinch: standing thingie
[15:01] <mgz> katco: oh, you got a tests failed message
[15:01] <katco> mgz: is that surprising?
[15:02] <hackedbellini> natefinch: ok, no problem! I'm just anxious for that fix because without it my juju is dead =P
[15:02] <hackedbellini> let me know if you need me to test anything.
[15:02] <natefinch> hackedbellini: understandable
[15:02] <natefinch> perrito666: ok
[15:03] <mgz> katco: if I got the abort in, that wouldn't happen - might have finished anyway or something
[15:03] <katco> ahhh ok
[15:03] <katco> i follow now; that's why i would have had to craft the failure manually
[15:05] <TheMue> aaaaah *jump* *jump* *jump* lxc-ls now shows an ipv6 address for a container
[15:05] <TheMue> sadly had to set it internally by hand, it is not set during deployment
[15:24] <perrito666> ericsnow: what is the whole cleanup thing https://github.com/juju/juju/pull/334/files#diff-076396fa7fd3f93945528111df2d8319R48 ?
[15:25] <ericsnow> perrito666: it's so that we don't leave any empty file behind in case of an error
[15:25] <perrito666> I meant the cleanup variable
[15:26] <ericsnow> perrito666: that could definitely use a comment
[15:26] <perrito666> it seems to me that you only delete when there is an error?
[15:26] <ericsnow> perrito666: that's right
[15:26] <perrito666> ericsnow: you can name the error return and check it
[15:27] <perrito666> since deferred call is after the function exit
[15:27] <ericsnow> perrito666: ah, good point
[15:28] <ericsnow> perrito666: note that in the review and I'll take care of it
[15:29] <perrito666> will do, just wanted to make sure I understood your intentions correctly
[15:29] <ericsnow> perrito666: you do
[15:29] <ericsnow> perrito666: and thanks :)
[15:30] <perrito666> np
[15:32] <perrito666> does anyone know what is the possible error output of filepath.ABS ?
[15:33] <perrito666> oh I see
[15:34] <natefinch> anyone familiar with the upgrade logic?
[15:36] <natefinch> mgz ^^ ?
[15:37] <mgz> hmm. not very
[15:37] <perrito666> sinzui: Ill take https://bugs.launchpad.net/juju-core/+bug/1342937
[15:37] <_mup_> Bug #1342937: Juju restore  fails Could not get lock /var/lib/dpkg/lock <backup-restore> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1342937>
[15:37] <sinzui> thank you perrito666
[15:37] <natefinch> sinzui: I'm working on #1342725
[15:37] <_mup_> Bug #1342725: C:/Juju/lib/juju/nonce.txt does not exist, bootstrap failed in win <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1342725>
[15:38] <natefinch> I had most of a fix, but it conflicts with the windows changes gsamfira is doing, so I'm going to grab the fix from his code.
[15:50] <jcw4> perrito666: https://github.com/juju/juju/pull/351 is updated... if you get a chance I'd appreciate a look
[15:59] <perrito666> jcw4: will do
[16:00] <jcw4> tx perrito666
[16:01] <perrito666> jcw4: interesting test https://github.com/juju/juju/pull/351/files#diff-7195d3d7d4a41d504d4c75799ca3e540R342
[16:04] <jcw4> perrito666: I knew I couldn't get anything past your eagle eye
[16:04] <jcw4> I remembered I hadn't done that test after merging in master and didn't want to make more changes... I suppose I'll have to now
[16:04] <jcw4> :)
[16:04]  * natefinch tries desperately to get his logs above 1 meg
[16:06] <natefinch> yay, log rotation
[16:07] <jcw4> natefinch: yay
[16:07]  * jcw4 assumes thats a good thing
[16:07] <jcw4> :)
[16:07] <natefinch> haha yes
[16:07] <perrito666> natefinch: if you have anything using mysql turn on debug mode
[16:07] <perrito666> that should do
[16:07] <perrito666> :p
[16:07] <natefinch> heh
[16:08] <natefinch> my log rotation package lets you specify a max size, but it's in megabytes, which is a surprisingly large amount of plaintext
[16:09] <natefinch> -rw-r--r-- 1 root   root   1.0M Jul 23 12:06 machine-0-2014-07-23T16-06-26.278.log
[16:09] <natefinch> -rw-r--r-- 1 root   root   288K Jul 23 12:07 machine-0.log
[16:16] <wwitzel3> natefinch: I thought we were just using logrotate?
[16:17] <natefinch> wwitzel3: logrotate won't work on windows
[16:18] <wwitzel3> hah, dumb, didn't think of that
[16:18] <natefinch> wwitzel3: and we weren't using anything.  I had investigated logrotate, and it probably would have been ok, but honestly probably more work than using a pure-go solution (which also happens to be portable to other OSes)
[16:25] <wwitzel3> natefinch: makes sense
[16:53] <natefinch> review for whoever https://github.com/juju/juju/pull/375
[17:08] <katco> backport for v1.20: https://github.com/juju/juju/pull/376
[17:15] <katco> natefinch: would we ever consider qualifying our variable names to include size units? e.g.: MaxSizeInMB?
[17:18] <natefinch> katco: I wouldn't.... I might consider making a custom type like type Megabyte int   so that you can't accidentally convert from one to the other.
[17:19] <katco> i still like axw's idea to duplicate the time package's implementation for size units
[17:19] <katco> but i wouldn't mind more discussion on the naming; what do you dislike about that technique?
[17:19] <natefinch> katco: I thought of that, and actually did that for v1 of lumberjack (had Megabyte = 1024*1024  and Gigabyte as 1024*Megabyte)
[17:20] <natefinch> katco: the problem that came up was deserializing from a config file.... the config file can't use those constants, and can't just use 1024*1024, so it ended up needing to be like MaxSize: 100000000
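(A sketch of the time.Duration-style approach natefinch describes trying in lumberjack v1, and why it breaks down for config files. ByteSize and its constants are illustrative names, not juju's or lumberjack's actual API.)

```go
package main

import "fmt"

// ByteSize mirrors the time.Duration pattern: a named integer type
// with unit constants, so code can write 100 * Megabyte.
type ByteSize int64

const (
	Byte     ByteSize = 1
	Kilobyte          = 1024 * Byte
	Megabyte          = 1024 * Kilobyte
	Gigabyte          = 1024 * Megabyte
)

func main() {
	maxSize := 100 * Megabyte
	// A config file has no access to these constants, so it would
	// have to spell out the raw byte count natefinch mentions:
	fmt.Println(int64(maxSize)) // prints 104857600
}
```

This reads nicely in code, but as the chat notes, deserialization is the sticking point: YAML/JSON can only carry the raw number.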
[17:21] <katco> i wonder how serialization works with the time package
[17:21] <natefinch> katco: plus it just wasn't necessary to let people specify log size down to the byte.  no one's going to want a log that rolls over at 54 bytes or something
[17:21] <katco> natefinch: no, agreed. it's all about clarity.
[17:21] <natefinch> katco: there are custom serialization formats
[17:22] <katco> i.e.: if i see MaxSize, i need to go look at the declaration to know what the unit is
[17:22] <katco> which is kind of annoying
[17:22] <natefinch> katco: yeah, I know
[17:22] <katco> hence, MaxSizeInMB
[17:23] <natefinch> it seems wrong.... like hungarian notation... but I can't explain why
[17:24] <katco> hungarian notation was wrong b/c it caused tons of churn all over the codebase if you ever changed the type. i suppose you could have the same issue here
[17:25] <katco> i guess the difference is that hungarian notation was trying to represent the syntax in the name, and this would be representing the value in the name; something appropriate for a variable name, and probably OK to cause refactoring since you would be evaluating all the places it's used anyhow
[17:26] <natefinch> It's also the kind of thing you only ever use once in your application and then never touch again
[17:27] <katco> what the variable?
[17:29] <natefinch> in this specific case
[17:29] <natefinch> you set the logging output in one spot one time and you're done.
[17:29] <perrito666> katco: you don't need to go to the declaration, just set MaxSize = "n", try to build, and the error will tell you :p also I am surprised emacs does not tell you that
[17:30] <katco> perrito666: rofl
[17:30] <katco> perrito666: emacs had already simulated, and predicted the outcome of this conversation, and decided it was not worth its time.
[17:31] <katco> perrito666: also, any stupid things i say are emacs trying to undermine my credibility. (nods)
[17:31] <katco> natefinch: yeah it's definitely not something i would dig my heels in on, but interesting conversation.
[17:32] <natefinch> katco: I'm not sure you're wrong.  I have a vague leaning against it, but like I said, I can't really explain why.
[17:32] <katco> natefinch: sometimes those gut checks are correct.
[17:33] <perrito666> well it reminds me of old php code where they did typing through variable naming
[17:33] <perrito666> people did, that is
[17:33] <katco> perrito666: see aforementioned comment about name trying to express syntax vs. type of value.
[17:33] <katco> perrito666: that strikes me as hungarian notation, slightly different from this.
[17:34] <perrito666> we will end up with a size struct and MB() GB() B() methods :p
[17:34] <natefinch> I rather like go's named types, like type Megabytes int, however it doesn't help in this case because you can still just do MaxSize = 100, since constants are untyped and mold to fit the type they're assigned to.
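(A small demonstration of the loophole natefinch points out: Go's untyped constants convert implicitly, so a named unit type doesn't force callers to mention the unit. Megabytes and Config here are hypothetical, following his `type Megabytes int` example.)

```go
package main

import "fmt"

// Megabytes is a named type, but untyped constants still mold to it.
type Megabytes int

// Config is a hypothetical logging config using the named type.
type Config struct {
	MaxSize Megabytes
}

func main() {
	// The untyped constant 100 converts implicitly to Megabytes;
	// nothing in this line says which unit is meant.
	c := Config{MaxSize: 100}
	fmt.Println(c.MaxSize) // prints 100
}
```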
[17:35] <perrito666> anyway, I was about to say, for me file size magnitudes should always be in bytes
[17:36] <perrito666> which is default for most unix tools
[17:47] <natefinch> perrito666: I know, but see above about deserializing from a config file.  I did that in v1, and ended up with a config file that had maxsize = 100000000
[17:47] <natefinch> and that's just ugly
[17:47] <natefinch> I don't ever want to have to count zeroes
[17:48] <perrito666> natefinch: most unix config files hold that kind of values
[17:48] <perrito666> or modern ones accept the unit as part of the value
[17:48] <natefinch> yeah, the unit deserialization is a possibility
[17:48] <natefinch> 200MB  or 1.5GB etc
[17:49] <perrito666> true
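(A sketch of the unit-aware deserialization idea — accepting "200MB" or "1.5GB" in a config value, as modern unix tools do. parseSize is a hypothetical helper using binary units; it is not juju's or lumberjack's implementation.)

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings like "200MB" or "1.5GB" to bytes.
// A bare number is taken as bytes. Units are binary (1 MB = 1024*1024).
func parseSize(s string) (int64, error) {
	units := []struct {
		suffix string
		mult   float64
	}{
		{"GB", 1 << 30},
		{"MB", 1 << 20},
		{"KB", 1 << 10},
		{"B", 1},
	}
	s = strings.TrimSpace(s)
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, err
			}
			return int64(n * u.mult), nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, v := range []string{"200MB", "1.5GB", "1024"} {
		n, _ := parseSize(v)
		fmt.Println(v, "=", n)
	}
}
```

This keeps the config file readable ("no counting zeroes") while the program still works in plain bytes internally.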
[17:49] <natefinch> meh
[17:49] <natefinch> megabytes is fine
[17:49] <perrito666> most unix commands evolved from times where bytes were something meaningful, so they added units once 1T was something you could have on your laptop, I guess
[17:49] <katco> 200 * size.MB
[17:50] <natefinch> katco: that's fine for code, but doesn't work in deserialization, which is a big use case in logging configuration
[17:50] <katco> oh you're talking about the use-case for fiddling with these settings in the config file where such niceties aren't present
[17:50] <katco> i missed that, sorry
[17:51] <natefinch> right, np
[17:51] <natefinch> makes me want to write a size package so I can have those niceties
[17:51] <natefinch> plus, "package size" is um.... amusing
[17:52] <katco> oh lord lol
[17:53] <natefinch> whelp, I know what my next project is :)
[17:53] <katco> haha
[17:54] <perrito666> ohh man, I arrived too late for the size and package jokes
[17:54] <katco> but you have to give it a name that's a double entendre
[17:54] <natefinch> the key is figuring out how much double entendre you can fit in and still make it unclear if it's on purpose or not
[17:54] <katco> rofl
[17:54] <katco> +1 nate. +1.
[17:55] <perrito666> natefinch: the thing is to actually name it size, make it a very useful package, and then get proper english speakers to compliment you on it
[17:56] <katco> "One of the driving principles of the X project is that size shouldn't matter. It's how you use it. With this, we thrust our package into the go community and await feedback."
[18:18] <katco> oh look, an emacs plugin to watch jenkins status. :)
[18:22] <perrito666> ericsnow: ping
[18:23] <ericsnow> perrito666: hey
[18:23] <perrito666> ericsnow: hey, I cannot find what you patched to make this work https://github.com/juju/juju/pull/334/files#diff-baa2cc9d463ab23cb9521ade2d84a5e9R94
[18:23] <perrito666> :p
[18:23] <perrito666> a little help?
[18:24] <ericsnow> perrito666: it's in setData()
[18:25] <perrito666> ericsnow: I guessed as much, I was not sure how that goes all the way down to backup
[18:26] <ericsnow> perrito666: the actual patching happens in SetUpTest()
[18:26] <perrito666> ohh I see
[18:26] <perrito666> I thought you were cheating the whole thing so it would return the same hash
[18:26] <ericsnow> perrito666: tempting but no :)
[18:27] <ericsnow> perrito666: by cheating I would have been "done" a lot earlier :)
[18:32] <perrito666> ericsnow: I am done reviewing up to api_test.go, later I'll continue with the rest
[18:32] <perrito666> but you have a handful of my comments
[18:32] <ericsnow> perrito666: awesome
[18:32] <perrito666> :) I am sure more savvy people can give you even better comments
[18:32] <ericsnow> perrito666: thanks so much
[18:35] <perrito666> actually today you could force our own natefinch or cmars to review your code :p or that's what the ocr schedule says
[18:35] <perrito666> bbl, bike time
[18:40] <natefinch> damn, am I OCR today?  Where's that OCR list?
[18:41] <perrito666> https://github.com/juju/juju/pull/377
[18:41] <perrito666> natefinch: sent you the link in priv
[18:45] <perrito666> natefinch: and since you are ocr, that link I just posted is a large part of restore
[18:45] <perrito666> ;)
[18:47] <perrito666> ericsnow: do you mind if I take over https://github.com/juju/juju/pull/113 ? or are you going to work on it? I am really looking forward to having a functional restore (after I fix this week's chapter of the old-restore bug)
[18:47] <ericsnow> perrito666: go ahead :)
[18:55] <arosales> Hello
[18:55] <natefinch> quick, everyone hide
[18:55] <arosales> fyi, we have a juju core panic on Power reported in bug https://bugs.launchpad.net/ubuntu/+source/juju-core/+bug/1347322
[18:55] <_mup_> Bug #1347322: juju ssh results in a panic: runtime error <ppc64el> <juju-core:Triaged> <juju-core (Ubuntu):Confirmed> <https://launchpad.net/bugs/1347322>
[18:56] <arosales> natefinch: lol :-)
[18:56] <arosales> previous bug was https://bugs.launchpad.net/ubuntu/+source/gccgo-4.9/+bug/1304754
[18:56] <_mup_> Bug #1304754: gccgo has issues when page size is not 4kB <ppc64el> <trusty> <gcc:Fix Released> <gcc-4.9 (Ubuntu):Fix Released> <gccgo-4.9 (Ubuntu):Invalid> <gcc-4.9 (Ubuntu Trusty):Invalid> <gccgo-4.9 (Ubuntu Trusty):In Progress by doko> <gcc-4.9 (Ubuntu Utopic):Fix Released> <gccgo-4.9 (Ubuntu Utopic):Invalid> <https://launchpad.net/bugs/1304754>
[18:56] <arosales> mbruzek can reproduce this bug pretty easily, so please feel free to ping him if any further data is needed.
[18:58] <arosales> If anyone has any insights into that bug it would be much appreciated as it is blocking juju deployments on power.
[18:58] <natefinch> the ssh one?
[18:59] <arosales> natefinch: correct, I think mbruzek also saw it on regular deploys
[18:59] <arosales> ssh reliably reproduces it though
[18:59] <mbruzek> natefinch, if you need access to a power system I can hook you up
[19:00] <arosales> natefinch: initially we thought it to be the same as the compiler bug 1304754
[19:00] <_mup_> Bug #1304754: gccgo has issues when page size is not 4kB <ppc64el> <trusty> <gcc:Fix Released> <gcc-4.9 (Ubuntu):Fix Released> <gccgo-4.9 (Ubuntu):Invalid> <gcc-4.9 (Ubuntu Trusty):Invalid> <gccgo-4.9 (Ubuntu Trusty):In Progress by doko> <gcc-4.9 (Ubuntu Utopic):Fix Released> <gccgo-4.9 (Ubuntu Utopic):Invalid> <https://launchpad.net/bugs/1304754>
[19:18] <natefinch> mgz: the machine agent and unit agent - they're separate processes running at the same time on the same machine, right?
[19:31] <natefinch> wallyworld: are you really there?
[19:42] <natefinch> mgz: you can ignore my previous question when and if you see it :)
[19:55] <mbruzek> natefinch, I saw your update to 1347322
[19:55] <mbruzek> Which log do you need more of?  The text in the bug body is from the console.
[19:56] <mbruzek> I included all of the dmesg output from that system.
[19:56] <natefinch> mbruzek: the juju machine log should
[19:56] <natefinch> mbruzek: have more info
[19:59] <natefinch> mbruzek: this looks like a plain old code problem, not a compiler problem, though it could be the latter that just happens to show up as the former
[19:59] <mbruzek> natefinch, I don't see an error in machine-1.log, do you want to see all-machines.log?
[20:01] <natefinch> mbruzek: oh, I think I was misunderstanding what I saw.  That's a panic in the CLI code.
[20:02] <natefinch> I think
[20:02] <natefinch> it's so hard to read when it's all wrapped wackily like that
[20:02] <mbruzek> natefinch, yeah it is.
[20:03] <natefinch> mbruzek: if you can repro easily, can you get me cleaner output of that text?  or is that the best you can get?
[20:03] <mbruzek> natefinch, when I get the panic the screen is garbled like that always
[20:03] <mbruzek> natefinch, I can work on cleaning it up for you
[20:03] <mbruzek> natefinch, I also got the machine logs from the system if they would be helpful
[20:04] <natefinch> mbruzek: never hurts to attach more logs to a bug :)
[20:05] <mbruzek> https://bugs.launchpad.net/ubuntu/+source/juju-core/+bug/1347322
[20:05] <_mup_> Bug #1347322: juju ssh results in a panic: runtime error <ppc64el> <juju-core:Triaged> <juju-core (Ubuntu):Confirmed> <https://launchpad.net/bugs/1347322>
[20:05] <mbruzek> natefinch, updated
[20:05] <natefinch> mbruzek: thanks
[20:05] <natefinch> mbruzek: i can clean up the log as easily as you, it's no problem
[20:08] <mbruzek> natefinch, the panic only happens *after* I juju ssh to the ubuntu unit.  I just use the terminal a bit and it goes sideways.  I don't believe it is related to what I am running on the terminal.
[20:08] <mbruzek> natefinch, to expedite the bug I juju sshed to the same unit 3 times and got the same garbled text in all three windows.
[20:08] <natefinch> oh weird.  so you're connected for a bit before it actually blows up?
[20:08] <mbruzek> natefinch, yes I am not sure what sets it off.  I was doing several different things
[20:09] <natefinch> mbruzek: does it happen if you ssh into the machine the old fashioned way?
[20:10] <natefinch> in theory, juju ssh just gets the ssh info from state and then runs ssh like a normal person would
[21:51] <davecheney> thumper: https://bugs.launchpad.net/juju-core/+bug/1347939
[21:51] <_mup_> Bug #1347939: build is unstable since 7524c62 <juju-core:Confirmed> <https://launchpad.net/bugs/1347939>
[21:52] <davecheney> wallyworld__: sad to say your session copy fix broke the build, https://bugs.launchpad.net/juju-core/+bug/1347939
[21:52] <_mup_> Bug #1347939: build is unstable since 7524c62 <juju-core:Confirmed> <https://launchpad.net/bugs/1347939>
[21:52] <wallyworld__> davecheney: i reverted it last night
[21:52] <davecheney> ok thanks
[21:52] <wallyworld__> it passed for me locally and on the bot
[21:53] <wallyworld__> but clearly there's a race in our tests
[21:53] <wallyworld__> :-(
[21:53] <davecheney> wallyworld__: it's not a race
[21:53] <davecheney> it's livelock
[21:53] <davecheney> when I run the test my cpu usage eventually goes to 0 and the test will timeout
[21:53] <wallyworld__> i haven't fully looked into it yet, just making an assumption as to why it passes sometimes and not others
[22:57]  * perrito666 reviewed a 30 file pr
[22:57] <davecheney> wallyworld_: right-o, thanks
[22:59] <perrito666> davecheney: your post about conditional compilation is very cool, thank you, I did not know about _$GOARCH.go
[23:00] <davecheney> perrito666: yup, the pattern is extended to
[23:00] <davecheney> _$GOOS_$GOARCH.go
[23:00] <davecheney> and even
[23:00] <davecheney> _$GOOS_$GOARCH_test.go
[23:02] <davecheney> mattyw: tasdomas http://blog.nuclearsecrecy.com/2014/05/23/oppenheimer-gita/
[23:02] <davecheney> ^ that quote
[23:02] <perrito666> waigani: funny network?
[23:03] <mattyw> davecheney, tasdomas the reflections of feynman: http://www.youtube.com/watch?v=6no328q_VGQ
[23:03] <fwereade> wallyworld_, ping
[23:03] <perrito666> sorry I meant wallyworld_
[23:03] <wallyworld_> hi
[23:04] <fwereade> wallyworld_, I'm trying to figure out tools selection for container provisioners
[23:04] <wallyworld_> ok
[23:04] <fwereade> wallyworld_, we have a bug that needs to be fixed -- that env provisioners start machines with the agentVersion in env config, not the current running version
[23:04] <wallyworld_> was there a specific question?
[23:05] <fwereade> wallyworld_, and it's looking like the container ones do that as well, but very indirectly
[23:05] <fwereade> wallyworld_, ie they grab the tools with the envconfig agent-version, with arch/series taken from the current machine
[23:06] <wallyworld_> fwereade: should the env config version not match the running version?
[23:06] <fwereade> wallyworld_, but ISTM that they're all going through the same path, ultimately, in which they hit simplestreams
[23:06] <fwereade> wallyworld_, not necessarily
[23:06] <fwereade> wallyworld_, after the agent version is set by upgrade
[23:06] <fwereade> wallyworld_, but before the upgrade actually happens
[23:06] <fwereade> wallyworld_, we could provision a machine with tools not matching those running in the provisioner
[23:07] <fwereade> wallyworld_, and *that* would come up (assuming it *did*) already running new tools
[23:07] <fwereade> wallyworld_, and would never run upgrade steps
[23:07] <perrito666> fwereade: hey, you are here, apparently the cloudsigma links I pointed to in my mails are the new prs, none has more than 6 files, and jam\d? said that those are the ones you guys are going to review
[23:07] <fwereade> wallyworld_, leading to all manner of potential unhappiness
[23:07] <wallyworld_> i'll have to look up the moving parts in the container provisioners to remember how the tools selection works in there
[23:07] <fwereade> perrito666, are they? ok I am out of date there, I last spoke to jam about a week ago, he is undoubtedly more current on that
[23:08] <fwereade> wallyworld_, in particular the brokers get initialised with one set of tools
[23:08] <perrito666> fwereade: yup, he answered to the thread I think
[23:08] <fwereade> wallyworld_, looked up over the api with arch/series initialised from the current machine agent
[23:09] <fwereade> wallyworld_, and then when we want to run a container with a different series we just hack up the tools struct so the wrong tools look like they have the right series
[23:09] <fwereade> wallyworld_, and as if by magic everything somehow currently works
[23:09] <fwereade> wallyworld_, more or less
[23:11] <wallyworld> fwereade: the container provisioner gets the tools to run via a call to the Tools client api method
[23:11] <fwereade> wallyworld_, yeah -- and that I think is the problem, because *that too* is using envconfig's agent-version
[23:12] <fwereade> wallyworld, not that actual version that's really running on the machine
[23:12] <wallyworld> i *think* the assumption was that the env version should match the running version
[23:12] <wallyworld> the upgrade procedure is designed to make that happen
[23:13] <wallyworld> ie set env version, trigger upgrade worker, download new tools, restart agents
[23:14] <wallyworld> so by design, env version should match running version (perhaps the implementation doesn't match that design)
[23:14] <wallyworld> i'm hand waving a bit because i didn't do the initial implementation of all this
[23:14] <wallyworld> what bug are you seeing?
[23:18] <wallyworld_> fwereade: sorry, dropped off again, my irc is so flakey for some reason
[23:18] <wallyworld_> not sure if you responded
[23:19] <fwereade> wallyworld_, sorry, i've decided that it is a rabbit warren of death and I'm not going to fix it today
[23:19] <fwereade> wallyworld_, I found that the upgrader is *also* interestingly fucked wrt what tools it picks
[23:19] <wallyworld_> ok. is there a bug number so i can see the issue?
[23:19] <fwereade> wallyworld_, there might be, but it's really just a race we realised was important for HA upgrades yesterday
[23:19] <fwereade> wallyworld_, I *think* there was a bug a while ago
[23:20] <wallyworld_> ok. perhaps the original design is sound for a single state server
[23:20] <wallyworld_> but not ha
[23:20] <fwereade> wallyworld_, i'm just trying to understand the situation and flailing around and grabbing onto you
[23:20] <perrito666> ok fine people, my brain just SIGQUITted on me, see you all tomorrow, cheers
[23:20] <fwereade> wallyworld_, it's still a bug, fwiw, but probably less impactful
[23:21] <fwereade> wallyworld_, we can't assume that the provisioner running on version X is capable of correctly setting up an instance running !X
[23:21] <perrito666> sinzui: I have a solution for the apt lock issue but I'll code it tomorrow, cheers
[23:21] <fwereade> wallyworld_, worst case non-HA we just fail with one machine
[23:22] <fwereade> wallyworld_, worst case with HA is a bit worse, we end up with a state server never participating in the upgrade synchronisation
[23:22] <fwereade> wallyworld_, anyway
[23:22] <fwereade> wallyworld_, it's not your problem and you don't need to worry about it
[23:22] <fwereade> wallyworld_, sorry noise
[23:24] <fwereade> perrito666, apt lock?
[23:25] <fwereade> perrito666, context please?
[23:25] <fwereade> perrito666, that solution *should* be the hook execution lock
[23:25] <fwereade> perrito666, if it's anything else we may have a problem
[23:25] <fwereade> perrito666, ehh, you're EOD
[23:26] <fwereade> perrito666, please find someone who knows about the hook lock tomorrow before writing anything else that tries to fix apt contention
[23:26] <fwereade> perrito666, sleep well :)
[23:32] <davecheney> FAIL
[23:32] <davecheney> FAIL    github.com/juju/juju/replicaset 173.495s
[23:32] <davecheney> always fails on my machine
[23:32] <davecheney> can anyone confirm they see the same thing ?
[23:38] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1347969
[23:39] <_mup_> Bug #1347969: FAIL: replicaset_test.go:155: MongoSuite.TestAddRemoveSetIPv6 <juju-core:New> <https://launchpad.net/bugs/1347969>
[23:47] <davecheney> thumper: fwereade https://github.com/juju/juju/pull/378