[01:05] <thumper> menn0: found anything yet?
[01:05] <menn0> thumper: still trying to replicate it locally
[01:06] <thumper> ok.
[01:06] <menn0> did you notice that the all-machines.log build artifact appears to be incomplete?
[01:06]  * thumper needs to do the self-review hr thingy
[01:06] <thumper> menn0: no... didn't look in too much detail
[01:06] <menn0> it doesn't even get to the upgrade to 1.19
[01:17] <menn0> thumper: so things don't look awesome
[01:17] <thumper> in what way?
[01:17] <menn0> thumper: I've just done an upgrade from 1.18.4 to 1.19.4 and the env is pretty screwed
[01:18] <menn0> no idea if it's the same problem as what showed up in CI
[01:18] <menn0> the main thing seems to be that the addresses for machine-1 and machine-2 are now 127.0.0.1 instead of what they should be.
[01:19] <menn0> does that ring any bells
[01:19] <thumper> no, but we do have an address updater worker thingy
[01:19] <thumper> I would have just started with no other machines and just done the bootstrap node
[01:20] <thumper> see if that works
[01:20] <thumper> then add machines
[01:20] <menn0> also, during the upgrade the db-relation-changed hook for mysql failed (I was testing with the standard wp / mysql setup)
[01:20] <menn0> thumper: the problem is my machine is trusty and the problem is supposedly precise specific
[01:21] <thumper> is it?
[01:21] <thumper> I have a precise machine here
[01:21] <menn0> according to the bug
[01:21] <menn0> I was about to test with canonistack because I have that set up and it's all precise there
[01:21] <thumper> ok, try that too
[01:21] <menn0> at any rate, what I did should have been fine and it definitely wasn't
[01:22] <menn0> even if it's not the same bug as what showed up in CI it's pretty bad
[01:23] <menn0> the CI test uses some simple dummy charms. I'll test with them on canonistack, replicating what CI does as closely as I can
[01:26] <thumper> ok
[01:32] <wallyworld> menn0: i think this 1.20 bug could also be related: https://bugs.launchpad.net/bugs/1334773
[01:32] <_mup_> Bug #1334773: Upgrade from 1.19.3 to 1.19.4 cannot set machineaddress <lxc> <maas-provider> <precise> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1334773>
[01:36] <menn0> wallyworld: could be
[01:36] <wallyworld> menn0: can you keep me in the loop as to where you get to?
[01:37] <menn0> wallyworld: will do
[01:37] <wallyworld> ta, i don't think that first bug is related necessarily, but i suspect the other could be
[01:48] <thumper> menn0: do you want some help diagnosing this issue?
[01:49]  * thumper just typed 'juju fetch upstream'
[01:49] <thumper> unsurprisingly it didn't work
[01:49]  * thumper considers writing a juju-fetch plugin
[01:51] <thumper> how do I tell git to have my 1.20 branch track upstream 1.20?
[01:51] <thumper> o/ mramm
[01:56]  * thumper branches 1.20
[02:05] <menn0> thumper: just got back from lunch
[02:05] <menn0> thumper: yes please
[02:05] <thumper> menn0: well there is certainly something wrong
[02:06] <thumper> i'm looking at my current setup
[02:07] <thumper> o..m..g...
[02:07] <thumper> hmm...
[02:07] <menn0> thumper: hangout?
[02:07] <thumper> yeah...
[02:08] <thumper> menn0: https://plus.google.com/hangouts/_/canonical.com/local-debugging
[02:18] <axw> wallyworld: I'm back, ready for 1:1 whenever you are
[02:18] <wallyworld> ok
[02:23] <bodie_> I'm digging in the worker/uniter/uniter_test a bit
[02:23] <bodie_> this is very unusual
[02:23] <bodie_> does anyone know how to pass values around between steps?
[02:24] <bodie_> I guess you'd have to set a value on the ctx struct?
[03:03] <menn0> axw: are you around? at least one of the upgrade issues appears to be mongo replicaset related...
[03:03] <axw> menn0: I am
[03:03] <axw> oh? :(
[03:03] <menn0> axw: https://plus.google.com/hangouts/_/canonical.com/local-debugging
[03:03] <axw> brt
[03:06] <waigani> axw: thanks
[03:13] <waigani> axw: setting locale to default of UTF8 for charm hooks. Is that just a matter of setting an environment variable  LANG=en_US.UTF8 ?
[03:13] <axw> waigani: yup
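A minimal sketch of the idea above, assuming the hook is launched as a subprocess whose environment is a list of KEY=value strings. The helper name withLang is hypothetical and not taken from the juju code base; the locale value is the one quoted in the question.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// withLang returns a copy of env in which LANG is set to lang,
// replacing any existing LANG entry rather than appending a duplicate.
func withLang(env []string, lang string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "LANG=") {
			out = append(out, kv)
		}
	}
	return append(out, "LANG="+lang)
}

func main() {
	// Start from the current process environment and force the locale.
	env := withLang(os.Environ(), "en_US.UTF8")
	fmt.Println(env[len(env)-1])
}
```

The resulting slice could be assigned to the Env field of an os/exec Cmd before the hook is started.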
[03:19] <axw> waigani: sorry gonna need a fix to the known_hosts path quoting
[03:20] <axw> waigani: left a comment on the PR
[03:20] <waigani> axw: ah shit, sorry I hit merge
[03:20] <axw> nps, it's not an immediate problem
[03:21] <waigani> axw: I ran the test and it did not quote
[03:22] <axw> waigani: otp, I'll take a look in a bit
[03:28] <waigani> axw: I think I've got it, using utils.CommandString in ssh_openssh.go
[03:36] <waigani> axw: okay fixed, I pushed up changes. What happens, considering I already started the merge?
[03:38] <axw> waigani: kinda dodgy now that I think of it, but if your original change passes it'll get merged :)
[03:44] <axw> waigani: sorry, I don't think you need to quote the string at all in utils/ssh
[03:45] <axw> waigani: it gets passed to os/exec.Command, not to a shell
[03:46] <waigani> axw: oh right. I'll take out the utils.CommandString then
[03:47] <waigani> axw: and not test for quotations? have a file name with no spaces?
[03:47] <axw> waigani: sec
[03:48] <axw> waigani: the way to test the quoting would be to update fakecommand in ssh_test.go
[03:48] <axw> waigani: surrounding $@ with double quotes should make it quote the args
[03:51] <waigani> axw: $@ already has double quotes
[03:52] <axw> waigani: not in my branch. I have: echo $@ | tee $0.args
[03:53] <waigani> axw: hehe, I was looking in utils
[03:54] <waigani> axw: okay I'll have a play
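A small illustration of the point that os/exec does not go through a shell: each argument is kept as a separate element of the argument vector, so a path containing spaces needs no shell quoting. The program name and the path are made up for the example, and the command is only constructed, never run.

```go
package main

import (
	"fmt"
	"os/exec"
)

// commandArgs builds a command for a hypothetical program and returns
// the argument vector that os/exec would pass to it.
func commandArgs(knownHosts string) []string {
	cmd := exec.Command("some-program", "-o", "UserKnownHostsFile="+knownHosts)
	return cmd.Args
}

func main() {
	// The path with spaces stays inside a single argument.
	for i, arg := range commandArgs("/tmp/dir with spaces/known_hosts") {
		fmt.Printf("%d: %q\n", i, arg)
	}
}
```

Whether the receiving program applies its own parsing to the option value is a separate question that this sketch does not address.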
[04:11] <waigani> axw: just spotted scp tests failing
[04:11] <waigani> axw: I'll fix those and push up again
[04:14] <thumper> menn0,axw: did you find it?
[04:15] <menn0> thumper: not yet but axw is pretty sure those asserts are it
[04:15] <menn0> axw wanted to get some food so we broke the call
[04:16] <menn0> I'm currently trying to understand why the problem has stopped happening for me
[04:16] <menn0> I have a theory which I'm testing now
[04:16] <axw> thumper: just making some lunch, will bbs
[04:16] <thumper> axw: ack
[04:17] <thumper> menn0: because the asserts aren't failing?
[04:17] <thumper> now
[04:17] <menn0> but they should be if I'm upgrading from 1.18 to 1.19 right?
[04:19] <menn0> thumper: where are those asserts?
[04:20] <thumper> state/machine.go:975
[04:27] <waigani> axw: okay, should be good to go now.
[04:35] <menn0> thumper: I can't make the problem happen again. Trying something else...
[04:40] <thumper> menn0, axw: so looking at the difference for me, the doc doesn't have Scope values in the db, and we are trying to set with scope values
[04:40] <thumper> and the new list has one more ipv6 address
[04:40] <thumper> however that doesn't explain why the transaction is aborted
[04:40] <axw> thumper: shouldn't matter what we're setting, only what we're comparing between the in-memory and in-db
[04:41] <thumper> hmm...
[04:41] <thumper> may have it...
[04:41] <thumper> maybe
[04:41] <thumper> we look at the actual serialized data we have...
[04:42] <thumper> but set using a string value in bson.D
[04:42] <thumper> what is the structure serialized as?
[04:42] <thumper> could that be it?
[04:42] <axw> "address"
[04:42] <axw> I compared the structs between 1.18 and 1.19 and didn't see a difference
[04:43] <axw> thumper: only difference is change in location of the structs... don't *think* that matters though...
[04:43] <thumper> well, it shouldn't, but it might
[04:44] <thumper> axw: is there any way to dump the raw bson structure ?
[04:44] <thumper> or, where are the bson serialisation commands?
[04:44] <axw> thumper: yeah I was about to figure out how :)  there's an mgo.Raw type like in encoding/json
[04:45] <thumper> here is a simpler case:
[04:45] <thumper> 2014-06-27 04:32:09 DEBUG juju.state machine.go:931 addresses currently: []state.address{state.address{Value:"localhost", AddressType:"hostname", NetworkName:"", Scope:""}, state.address{Value:"10.0.3.1", AddressType:"ipv4", NetworkName:"", Scope:""}}
[04:45] <thumper> 2014-06-27 04:32:09 DEBUG juju.state machine.go:978 updating addresses to: []state.address{state.address{Value:"localhost", AddressType:"hostname", NetworkName:"", Scope:"public"}, state.address{Value:"10.0.3.1", AddressType:"ipv4", NetworkName:"", Scope:"local-cloud"}}
[04:45] <thumper> 2014-06-27 04:32:09 DEBUG juju.state.txn txn.go:91 0: err: &errors.errorString{s:"transaction aborted"}
[04:46] <thumper> updating to add scope
[04:46] <thumper> and that is all
[04:46] <thumper> order is less likely to be an issue here with just two values
[04:47]  * thumper has to go and run to get sushi
[04:47] <thumper> before they run out
[04:47] <thumper> I'll check in when I get back
[04:59] <menn0> axw, thumper: I need to do a takeaway run myself but will join in again
[05:09] <axw> menn0 thumper-afk: found it
[05:09] <axw> a field name in state.address was changed
[05:10] <axw> NetworkScope -> Scope
[05:10] <axw> dimitern: I'm going to change the field name of state.address.Scope back to NetworkScope as it was before. let me know if you can think of any problem with that
[05:14] <dimitern> axw, why?
[05:14] <axw> dimitern: because it used to be called networkscope in state
[05:14] <axw> dimitern: the change breaks upgrade
[05:16] <dimitern> axw, ah.. dreaded schema changes
[05:17] <dimitern> axw, ok, can you make it Scope string `bson:"networkscope"` ?
[05:17] <axw> dimitern: yup, that's what I've done
[05:17] <axw> looks like there's no queries on that field
[05:17] <axw> so should be fine
[05:18] <dimitern> great
[05:20] <axw> dimitern: https://github.com/juju/juju/pull/183 please
[05:21] <dimitern> axw, looking
[05:22] <dimitern> axw, done
[05:24] <axw> thanks
[05:30] <thumper> axw: ah, so we were reading the data, scope now blank, assert a dict which now didn't match, right?
[05:32] <axw> thumper: something like that. we expected "scope", but state had "networkscope"
[05:32] <axw> same value, different field name
[05:32] <thumper> so our assertion failed
[05:32] <axw> yup
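The effect described above can be sketched with encoding/json standing in for bson (an assumption made so the example needs only the standard library; the struct-tag mechanism is analogous but the two packages are not identical). The type names and the sample document are illustrative and not copied from state.address.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedDoc mimics a document written by the older release, which
// used the key "networkscope".
const storedDoc = `{"value":"10.0.3.1","networkscope":"local-cloud"}`

// renamedAddress models the struct after the rename: the field now
// maps to the key "scope", which the stored document does not have.
type renamedAddress struct {
	Value string `json:"value"`
	Scope string `json:"scope"`
}

// pinnedAddress keeps the new Go field name but pins the stored key.
type pinnedAddress struct {
	Value string `json:"value"`
	Scope string `json:"networkscope"`
}

func decodeRenamed(doc string) (renamedAddress, error) {
	var a renamedAddress
	err := json.Unmarshal([]byte(doc), &a)
	return a, err
}

func decodePinned(doc string) (pinnedAddress, error) {
	var a pinnedAddress
	err := json.Unmarshal([]byte(doc), &a)
	return a, err
}

func main() {
	r, _ := decodeRenamed(storedDoc)
	p, _ := decodePinned(storedDoc)
	fmt.Printf("renamed: %q\n", r.Scope) // old key is ignored, field stays empty
	fmt.Printf("pinned:  %q\n", p.Scope) // tag maps the old key onto the new field
}
```

Pinning the key in the tag keeps the Go-side rename without touching data that is already stored, which is the shape of the fix agreed on above.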
[05:40] <axw> uh oh, the merge bot picked up my 1.20 PR, is going to test it on trunk, and then land it in 1.20
[05:40] <axw> oh well, it's trivial
[05:40] <thumper> heh
[05:41] <axw> waigani: sorry, looking at your changes now
[05:47] <thumper> davecheney: making progress on bug 1334493 ?
[05:47] <_mup_> Bug #1334493: Cannot compile/exec win client <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1334493>
[05:53] <davecheney> thumper: yup
[06:22] <davecheney> https://github.com/juju/juju/pull/186
[06:22] <wallyworld> axw: my irc sucks today. seems like you're making progress on bug 1334773?
[06:22] <davecheney> should fix the faulty tools
[06:22] <_mup_> Bug #1334773: Upgrade from 1.19.3 to 1.19.4 cannot set machineaddress <lxc> <maas-provider> <precise> <upgrade-juju> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1334773>
[06:22] <axw> wallyworld: yeah, I have a fix in the works
[06:22] <axw> being merged
[06:23] <wallyworld> great
[06:23] <axw> merged in fact
[06:23] <wallyworld> \o/
[06:36] <menn0> thumper, axw: do you think it's possible that bug 1334273 may also be caused by the Scope problem?
[06:36] <_mup_> Bug #1334273: Upgrades of precise localhost fail <local-provider> <precise> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1334273>
[06:37] <axw> menn0: possibly related, but it's not local-specific, and definitely not precise-specific
[06:39] <menn0> axw: yeah it's hard to see how the networkscope issue could be that specific.
[06:40] <menn0> I still don't see why it magically stopped happening on my machine though. It doesn't seem like the problem should have been intermittent.
[06:41] <menn0> axw: the symptoms of the precise local upgrade problem look a lot like what Tim saw when upgrading from 1.19.1 to 1.19.4 and that was the peer grouper setup not happening.
[06:41] <axw> yeah...
[06:41] <axw> hmm
[06:41] <menn0> axw: but the CI test was going from 1.18.4 to 1.19.4 so it's not quite the same.
[06:42] <wallyworld> menn0: when i looked at the logs on CI, there were no obvious errors logged by the peer grouper when it failed
[06:42] <wallyworld> it seems it could be related to going from non-ha to ha set up, and on precise we use an older mongo
[06:43] <menn0> wallyworld: sorry, it's not the peergrouper itself but the replicaset configuration that's done at startup for the peergrouper (it uses code in the peergrouper's module)
[06:43] <wallyworld> which has previously caused issues with local provider, hence it was disabled for a while
[06:43] <wallyworld> yup
[06:43] <wallyworld> my handwavy view is that mongo+precise+replicaset = shit
[06:44] <wallyworld> but i have little evidence
[06:44] <axw> wallyworld: was there ever a known problem with the older version of mongo? the only thing I absolutely know for sure that caused a problem was the oplog size
[06:44] <wallyworld> not sure
[06:45] <wallyworld> but the only thing that i can see as different on precise is the version of mongo
[06:45] <menn0> wallyworld: we found today that upgrading from 1.19 before the replicaset work was done in this series to 1.19.4 causes broken upgrades because the replicaset setup is only done if you're upgrading from pre-1.19.0.
[06:45] <wallyworld> joy
[06:46] <menn0> wallyworld: the thing is the 1.20 blocker for precise upgrades looks awfully similar.
[06:46] <wallyworld> but that's "ok"
[06:46] <wallyworld> we don't support upgrades for dev versions
[06:46] <menn0> very similar symptoms in the logs
[06:46] <wallyworld> menn0: so my view is replicaset is broken somehow, seems like you agree?
[06:46] <menn0> so I wonder if replicaset setup failed for some other reason
[06:47] <menn0> wallyworld: I'm saying there could be a problem initialising the replicaset in some cases.
[06:47] <wallyworld> if it works for straight up deploy from scratch, i wonder what's different when upgrading
[06:47] <menn0> I need to go again for a bit but I'll dig through the CI failure again when I'm back
[06:48] <menn0> something in the logs might jump out now that there's a theory as to what might be happening.
[06:48] <wallyworld> yeah
[06:48] <menn0> back in 30 mins ish
[06:48] <wallyworld> i'll try and look after soccer
[06:48] <wallyworld> but i want to talk to william also
[06:49] <wallyworld> so may not get time
[06:51] <axw> wallyworld: there's a bug in MachineAgent.ensureMongoServer, but I don't know if it's related or not. If we get as far as ensureMongoServer, but then maybeInitiateMongoServer fails (or something in between), then the process would exit and restart
[06:51] <axw> wallyworld: we'd then try the isPreHAVersion block again and fail to connect to state
[06:52] <axw> wallyworld: the reason being that we haven't yet initiated the replicaset, and haven't told mgo to make a Direct connection
[07:23] <menn0> axw: I'm hunting through the logs from the upgrade failure in CI again. Nothing yet.
[07:23] <axw> menn0: yeah nothing jumped out at me. I've put up https://github.com/juju/juju/pull/187 - it's a long shot, but possibly related
[07:25] <menn0> axw: just had a look at that PR. Seems reasonable. That applies even with a single state server right?
[07:26] <axw> menn0: yes, that's the only time it actually will be used (pre-HA)
[07:29] <menn0> axw: duh. of course :)
[07:29] <menn0> axw: have you tried a 1.18 to 1.19 upgrade with the changes in that PR?
[07:30] <axw> menn0: yes, just tried and it works fine
[07:31] <menn0> axw: cool, I'll LGTM it.
[07:31] <axw> cheers
[07:32] <menn0> axw: done
[07:33] <menn0> axw: one thing that jumps out at me about the failure in CI is that it takes 8 minutes from the time jujud restarts to the new version before MaybeInitiateMongoServer gets called. That seems awfully long.
[07:34] <axw> hmm
[07:35] <axw> it does seem like a long time...
[07:41] <menn0> another thing: would we expect the mongo admin user to get set up once jujud restarts into the new software version?
[07:41] <menn0> mongo was started with --noauth and all that
[07:41] <menn0> axw: ^^^
[07:41] <menn0> in the CI test failure
[07:42] <axw> menn0:  adding the admin user is part of the expected upgrade procedure
[07:44] <menn0> axw: ok... well in the 1.18 to 1.19 upgrades on my machine which went without a hitch I see no evidence of that happening. But it did happen in the failed CI upgrade run.
[07:44] <menn0> Could be coincidence but maybe not
[07:46] <dimitern> axw, still around?
[07:47] <axw> dimitern: yes
[07:47] <dimitern> axw, I was wondering about the progress on relation addresses wrt charms
[07:47] <dimitern> axw, there were some unresolved comments on the doc - did you reach agreement?
[07:48] <dimitern> axw, about the new hooks and stuff
[07:48] <axw> dimitern: not 100%. fwereade had a chat with hazmat and came to a vague agreement. I've got a PR up atm that triggers config-changed on units whenever the machine addresses change - will do relation addresses later
[07:48] <dimitern> axw, i have to prepare a doc to sync up on how to expose IPv6 addresses to charms, as this is the most important take on ipv6 support in core from charmers perspective
[07:49] <dimitern> axw, right, i'll ping fwereade and hazmat, cheers
[07:49] <axw> ah, that'll be interesting...
[07:50] <axw> menn0: did you get the full log for CI?
[07:51] <axw> menn0: I upgraded 1.18.1 to 1.19.4 on my machine and I got "starting mongo with --noauth" and "setting admin password" on upgrade
[07:52] <menn0> axw: no. all-machines.log at least is going to be incomplete because of the rsyslogd config changes during the upgrade. the separate machine logs are fine though.
[07:52] <menn0> axw: I think I see a race
[07:52] <axw> ah I see
[07:52] <axw> race? where?
[07:53] <menn0> axw: if the upgrade-steps worker finishes before ensureMongoServer is called by the state worker then the isPreHAVersion check will be false and we won't do the HA setup work.
[07:53] <menn0> Does that sound plausible?
[07:53] <menn0> upgrade-steps updates UpgradedToVersion once it's done
[07:53] <menn0> and that is what ensureMongoServer is checking against
[07:54] <menn0> might explain why I'm only able to see the issue sometimes.
[07:54] <axw> menn0: well shit, I think you're right
[07:55] <axw> we shouldn't do anything until we've upgraded mongo
[07:55] <menn0> by ensureMongoServer I mean the method not the function with the same name it calls
[07:55] <axw> yep
[07:56] <axw> menn0: that would better explain this bug
[07:56] <menn0> axw: and possibly some of the other replicaset weirdness we've seen?
[07:56] <axw> menn0: hmm actually...
[07:57] <axw> ah never mind
[07:57] <axw> yep, still looks like the culprit
[07:57] <axw> menn0: dunno about other weirdness, this is upgrade specific
[07:57] <menn0> ah ok. I thought the other issues were upgrade specific too.
[07:58] <menn0> axw: so what's the fix? Do the mongo upgrade work before starting any of the workers?
[07:59]  * menn0 suspects he needs to go soon lest he upsets his wife
[07:59] <axw> menn0: I can handle it, it's pretty late there
[07:59] <axw> but yes I think that's the solution
[08:00] <menn0> axw: sweet.
[08:00] <menn0> have a good weekend.
[08:01] <axw> cheers, you too
[08:01]  * menn0 is relieved that the day ended with something productive
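The race discussed above can be modelled deterministically by running the two steps in either order. This is a simplified sketch under stated assumptions: the types and function names are hypothetical, versions are reduced to major and minor, and the real workers run concurrently rather than in a fixed order.

```go
package main

import "fmt"

// version is a simplified stand-in for a juju version number.
type version struct{ major, minor int }

// isPreHA reports whether v predates 1.19, the series in which
// (per the discussion) the replicaset setup was introduced.
func isPreHA(v version) bool {
	return v.major < 1 || (v.major == 1 && v.minor < 19)
}

// agent holds the recorded "upgraded to" version and whether the
// HA (replicaset) setup has been performed.
type agent struct {
	upgradedTo  version
	haSetupDone bool
}

// runUpgradeSteps models the upgrade-steps worker finishing and
// recording the new version.
func (a *agent) runUpgradeSteps(target version) { a.upgradedTo = target }

// ensureMongo models the check that only does the HA setup when the
// recorded version is still pre-HA.
func (a *agent) ensureMongo() {
	if isPreHA(a.upgradedTo) {
		a.haSetupDone = true
	}
}

// upgrade runs both steps in the given order for a 1.18 to 1.19
// upgrade and reports whether the HA setup happened.
func upgrade(stepsFirst bool) bool {
	a := &agent{upgradedTo: version{1, 18}}
	target := version{1, 19}
	if stepsFirst {
		a.runUpgradeSteps(target)
		a.ensureMongo()
	} else {
		a.ensureMongo()
		a.runUpgradeSteps(target)
	}
	return a.haSetupDone
}

func main() {
	fmt.Println("mongo check first:", upgrade(false))
	fmt.Println("upgrade steps first:", upgrade(true))
}
```

When the upgrade steps win, the recorded version is no longer pre-HA and the setup is skipped, which matches the intermittent behaviour described. Doing the mongo work before any worker starts removes the second ordering.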
[08:12] <rogpeppe2> if anyone has some time, i'd very much like a review of this pull request by someone on core, please. It's currently only been reviewed by gui people. https://github.com/juju/charm/pull/9
[08:22] <fwereade> axw, ping
[08:22] <axw> fwereade: pong
[08:22] <fwereade> axw, any insight into https://bugs.launchpad.net/juju-core/+bug/1334683 ?
[08:22] <_mup_> Bug #1334683: juju machine numbers being incorrectly assigned <azure-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1334683>
[08:22] <axw> fwereade: fraid I've been bogged down with critical 1.20 things to even look into it
[08:23] <axw> no ideas off the top of my head
[08:23] <fwereade> axw, no worries, it's only 18h old :)
[08:23] <fwereade> axw, any suggestions for someone awake now/soon who'd know azure?
[08:24] <axw> fwereade: I don't know if anyone other than me does
[08:24] <axw> fwereade: I can take a look if someone wants to take over this jujud upgrade race
[08:25] <axw> fwereade: menn0 noticed that the upgrade steps could theoretically complete before the state worker starts, causing isPreHA to return false and things to go a bit pear shaped
[08:27] <axw> fwereade: eh I think I see the problem in azure
[08:27] <axw> fwereade: in the 1.18 code it's not returning the instances in the same order as the input ids
[08:28] <axw> fwereade: good news is, it's fixed in 1.19.4
[08:28] <axw> or 1.19.0 or whenever I made all the changes
[08:28] <fwereade> axw, you're too awesome, I'd resigned myself to hassling nate about it :)
[08:28] <fwereade> axw, would you add a really quick note to the bug to that effect please?
[08:28] <axw> will do
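One way to guarantee that results line up with the requested ids is to index what the provider returned and then walk the input ids in order. This is only a sketch of that technique; the instance type and the function name are hypothetical and do not reflect the azure provider code.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for a provider instance record.
type instance struct{ ID string }

// inInputOrder returns the found instances arranged to match ids
// position by position; ids that were not found yield nil entries.
func inInputOrder(ids []string, found []instance) []*instance {
	byID := make(map[string]*instance, len(found))
	for i := range found {
		byID[found[i].ID] = &found[i]
	}
	out := make([]*instance, len(ids))
	for i, id := range ids {
		out[i] = byID[id] // nil when the id was not returned
	}
	return out
}

func main() {
	ids := []string{"a", "b", "c"}
	found := []instance{{ID: "c"}, {ID: "a"}}
	for i, inst := range inInputOrder(ids, found) {
		if inst == nil {
			fmt.Printf("%s: missing\n", ids[i])
			continue
		}
		fmt.Printf("%s: %s\n", ids[i], inst.ID)
	}
}
```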
[09:31] <bac> fwereade: i've been trying to get an azure instance to bootstrap with trusty with no success for a day. i've a partially booted node up. who might know a thing or two about azure and have a look?
[09:32] <fwereade> bac, if axw is still up he's your best bet by a long way
[09:33] <axw> bac: are you using 1.19.x?
[09:33] <bac> axw: 1.19.4
[09:33] <axw> bac: ok, that's a good start. you've "partially booted a node up"?
[09:33] <bac> axw: i bootstrap with --debug and 'Running apt-get upgrade' is the last thing in the console
[09:33] <axw> how? where did it break?
[09:33] <axw> ok
[09:34]  * axw checks he can bootstrap
[09:34] <bac> axw: i can give you access to the instance
[09:34] <axw> bac: sure, just import lp:~axwalk please
[09:36] <bac> axw: juju-azure-ci3-7ul3u8075q.cloudapp.net
[09:36] <axw> ta
[09:38] <axw> bac: the agent is running... what happens when you run juju status?
[09:39] <bac> axw: instance-state remains ReadyRole
[09:39] <bac> axw: let me paste the whole thing.  until then, here is my yaml config block https://pastebin.canonical.com/112600/
[09:39] <axw> thanks
[09:39] <bac> axw: https://pastebin.canonical.com/112601/
[09:40] <axw> bac: looks fine - what's the issue with it?
[09:41] <bac> axw: well 'juju bootstrap' never terminates.  if i ctl-c out of it, the instance is torn down
[09:41] <axw> ah right. hmm weird
[09:41] <bac> axw: i assumed since it didn't complete it was still doing stuff.  and i don't know what ReadyRole is
[09:41] <bac> so it was less than comforting
[09:42] <axw> bac: that's just azure's term for "machine is started/running"
[09:42] <bac> ok
[09:43] <bac> i guess if you had to jam two random words together those are as good as any
[09:43] <bac> axw: so is my use of --debug possibly the culprit in the non-termination of the bootstrap?
[09:44] <axw> bac: I don't think so, I do that all the time...
[09:44] <axw> I'm just bootstrapping my own instance now
[09:45] <axw> bac: when you say you've been trying to get an azure instance to bootstrap for a day... do you mean that you left that one command running for a day, or you've tried it a bunch of times?
[09:46] <bac> axw: i tried it a bunch of times.  most was on US East and i got different errors.  last night i switched to US West and launched the bootstrap at my eod.  this one has been active overnight
[09:47] <axw> mk
[09:47] <bac> axw: upgraded from 1.19.3 in the middle of my attempts
[09:48] <bac> axw: if i cannot get it resolved this morning, we're going to have to switch our CI setup to another provider.
[09:49] <axw> bac: are you doing this on Linux?
[09:49] <axw> your client
[09:49] <bac> trusty vm
[09:50] <axw> and does it have ssh installed?
[09:50] <bac> yes
[09:55] <axw> bac: uh oh, I have reproduced this issue
[09:55] <bac> yay, boo.
[09:55] <axw> bac: I just realised I had "image-stream: daily" set in my environments.yaml; I took it out and bootstrap is just hanging there
[09:56] <axw> fwereade: ^^ critical bug for 1.20 I think
[09:57]  * fwereade reads back with an unhealthy sense of impending freakout
[09:57] <bac> axw: cool, so i can add image-stream to my config and perhaps get past, for testing purpose only, of course.
[09:58] <fwereade> axw, bac: so, wait, is this juju or the image that's messed up?
[09:59] <bac> fwereade: i can't say.
[09:59] <axw> fwereade: has to be juju. the machine agent is coming up, but the bootstrap client is just hanging there
[09:59] <axw> looks like someone already reported this: https://bugs.launchpad.net/juju-core/+bug/1316185
[09:59] <_mup_> Bug #1316185: juju bootstrap hangs on Azure <juju-core:Triaged> <https://launchpad.net/bugs/1316185>
[10:00] <axw> gonna try again with image-stream: daily back in
[10:01] <bac> yep, that's the same issue
[10:03] <bac> axw: i'll tear down that instance and try again with image-stream: daily to confirm from the other direction
[10:03] <bac> axw: unless you want the instance to remain up
[10:04] <axw> bac: nope, go for it
[10:04] <axw> thanks
[10:04] <bac> axw: fwiw, ctl-c yields
[10:04] <bac> ^C2014-06-27 10:04:21 INFO juju.cmd cmd.go:113 Interrupt signalled: waiting for bootstrap to exit
[10:04] <bac> 2014-06-27 10:04:21 ERROR juju.provider.common bootstrap.go:119 bootstrap failed: subprocess encountered error code 130
[10:04] <bac> Stopping instance...
[10:05] <axw> bac: thanks, 130 just means it was killed by Ctrl-C
[10:11] <axw> bac fwereade: yeah, putting "image-stream: daily" back in fixed it for me
[10:11] <axw> bugger.
[10:13] <fwereade> axw, I has a very confused -- so juju is depending on something in the daily image that's not in the released one?
[10:14] <axw> fwereade: I dunno what the difference is. The thing is, juju actually bootstraps and works with the released images. It's just that the ssh client doesn't see any output from the script past "apt-get upgrade"
[10:14] <axw> and doesn't get EOF
[10:14] <axw> fwereade: I guess one of the packages that gets upgraded is buggering up the communication
[10:15] <axw> about to have a look at what the differences are
[10:15] <bac> axw: confirmed that i can boot cleanly with image-stream: daily
[10:15] <axw> thanks bac
[10:16]  * bac breakfast
[10:29] <axw> fwereade: is there some reason other than "we'd like people to have up-to-date machines" for running "apt-get upgrade" at bootstrap?
[10:30] <bac> axw: this is almost certainly unrelated, but when i was trying to boot yesterday on US East i was getting repeated errors like http://paste.ubuntu.com/7707189/ -- it certainly muddied the waters.
[10:30] <axw> bac: fixed on trunk today
[10:30] <bac> ty
[10:34] <fwereade> axw, apart from the fact that we install stuff and it's generally preferred that we update before doing so, I can't think of one
[10:36] <axw> fwereade: bootstrapping with the daily images, at least on azure, is considerably faster without it
[10:36] <axw> whether it's considerably more broken, I don't know :)
[10:39] <fwereade> axw, so does it work if you drop the apt-get update?
[10:40] <fwereade> axw, /upgrade
[10:40] <axw> trying now
[10:40] <axw> also going to try holding back bash and apt
[10:41] <fwereade> cheers
[10:48] <dimitern> jam1 (if you're here), vladk, standup?
[10:48] <vladk> dimitern: yep
[10:50] <axw> fwereade: works without apt-get update
[11:05] <perrito666> morning
[11:06] <natefinch> morn
[11:06] <natefinch> ing
[11:06] <natefinch> tab completion should work on the word I'm thinking of
[11:12] <TheMue> hel
[11:12] <TheMue> lo
[11:12] <TheMue> na
[11:12] <TheMue> tefi
[11:12] <TheMue> nch
[11:12] <TheMue> ;)
[11:32] <wwitzel3> I do that in the shell sometimes, trying to tab complete a URL I'm calling curl with.
[11:33] <perrito666> yeah, I do it with passwords
[12:03] <bac> hi mgz
[12:07] <rogpeppe3> dimitern: any chance you could review this for me? it's blocked until i can get a review from someone from juju-core: https://github.com/juju/charm/pull/9
[12:12] <bac> fwereade: now that axw has finished up, is anyone working now who i can ask about azure issues?
[12:17] <axw> bac: I'm still around for the moment...
[12:18] <rick_h_> axw: so in azure mode there's no manual placement. Is that the default in azure then? This means no machine view/colocating?
[12:18] <bac> oh, hi axw.  had a question about azure availability sets.  just found a link to the doc
[12:18] <axw> rick_h_: correct
[12:18] <rick_h_> axw: can you do any form of colocation at all?
[12:18] <axw> not in the current implementation
[12:19] <axw> (unless you disable availability sets)
[12:19] <fwereade> axw, rick_h_: yeah, I thought that if you used rubbish-mode you could still do manual placement
[12:19] <rick_h_> axw: and is this something you can change after bootstrap? I see it's a 'bootstrap attribute' so is there a flag on juju bootstrap? Any other way to change afterwards?
[12:19] <fwereade> axw, rick_h_: forgive the looseness of my terminology
[12:19] <axw> rick_h_: no, it's immutable
[12:20] <rick_h_> wow, ok will process and ponder. This will definitely cause some fun with our current machine view work and GUI along with other projects.
[12:24] <axw> rick_h_: in case I wasn't clear, it's configurable at bootstrap time only, and immutable thereafter
[12:24] <rick_h_> axw: gotcha.
[12:24] <rick_h_> axw: just :(
[12:25] <rick_h_> axw: but will spend some time thinking through it and its chain of effects.
[12:25] <axw> it's a bit of a PITA, I know, but azure's model dictates this
[12:25] <bac> axw: is there a switch to juju bootstrap to turn it off?
[12:25] <axw> bac: yeah you can set availability-sets-enabled=false in environments.yaml
[12:25] <bac> axw: ah, ok
[12:44] <perrito666> wwitzel3: ericsnow is any of you taskless?
[12:45] <wwitzel3> perrito666: nope, I'm working on tests for the env client api stuff and then moving on to the legacy setmongopassword cleanup.
[12:46] <wallyworld__> fwereade: did you want a quick chat about charm storage?
[12:46] <fwereade> wallyworld__, sure, 5 mins?
[12:47] <wallyworld__> fwereade: ok, join me in https://plus.google.com/hangouts/_/canonical.com/tanzanite-daily
[12:47] <wallyworld__> rick_h_: did you want to join too?
[12:47] <rick_h_> wallyworld__: joining
[12:49] <bodie_> morning all
[12:49] <perrito666> well ericsnow if you are interested we need to implement this suggestion by rogpeppe "you could add a jujud subcommand which updates the addresses in the agent.conf file"
[12:50] <fwereade> perrito666, ericsnow: doesn't `juju endpoints` have that effect anyway?
[12:50] <fwereade> perrito666, ericsnow: if not, it should, because *every* command that hits the api ought to update the .jenv addresses if they've changed
[12:51] <perrito666> fwereade: tell me/us more, the idea is to be able to let the agents know that the state server has changed
[12:51] <fwereade> perrito666, oh! wait, agent.conf?
[12:52] <perrito666> :)
[12:52] <fwereade> perrito666, that should not be a command
[12:52] <fwereade> perrito666, we update them on initial connect already
[12:52] <fwereade> perrito666, what we should have is a watcher that updates them when they change
[12:52] <perrito666> fwereade: this is something we need to do after a restore
[12:53] <fwereade> perrito666, ah, ok I see, sorry, I misread two separate things
[12:53] <perrito666> so the idea is, state server hung and was killed and we restored, agents have no clue on what the state server is
[12:53] <fwereade> perrito666, just ignore me, if you haven't learned that lesson already
[12:53] <perrito666> lol
[12:53] <fwereade> perrito666, jujud command makes sense
[12:54] <perrito666> is at least an improvement from current method, which is ssh+sed
[13:02] <bac> mgz you around?
[13:07] <alexisb> natefinch, fwereade either you available to join the cloudbase call?
[13:08] <fwereade> alexisb, just finishing up another call, in there in a sec
[13:09] <rogpeppe> please, would someone be able to review this code?! https://github.com/juju/charm/pull/9
[13:09] <alexisb> awesome thankyou
[13:14] <rogpeppe> natefinch, wwitzel3, jam1, wallyworld__, axw, dimitern, mgz: ^
[13:15] <axw> sorry I'm logging off - will take a look on monday if it's still unreviewed
[13:15] <fwereade> wallyworld__, are you still around?
[13:15] <wallyworld__> yeah
[13:15] <axw> fwereade: I give up on the azure problem for tonight. I ended up modifying bootstrap to upgrade the packages individually, and it worked :/
[13:16]  * axw logs off
[13:16] <wallyworld__> fwereade: but i'm too tired to really review anything
[13:16] <fwereade> wallyworld__, np, you're not needed :)
[13:16] <wallyworld__> so situation normal then
[13:17] <natefinch> alexisb: coming
[13:22] <sinzui> Can someone explain or fix  me so that we can merge the juju version changes for master and 1.20. Juju CI will not test something that it knows is released
[13:22] <sinzui> https://github.com/juju/juju/pull/181 and https://github.com/juju/juju/pull/180
[13:44] <rogpeppe> *still* looking for a review of this, please: https://github.com/juju/charm/pull/9
[14:05] <bac> sinzui: turns out this was the bug that i was hitting yesterday: bug 1316185
[14:05] <_mup_> Bug #1316185: juju bootstrap hangs on Azure <juju-core:In Progress by axwalk> <juju-core 1.20:In Progress by axwalk> <https://launchpad.net/bugs/1316185>
[14:06] <sinzui> Don't use daily
[14:08] <natefinch> perrito666: how's your windows knowledge?
[14:09]  * perrito666 sees an avalanche coming
[14:09] <perrito666> natefinch: I know my way around windows 7 I rule at windows xp :p
[14:09] <perrito666> I might work well enough with windows server if it looks anything like windows nt
[14:10] <bac> sinzui: if daily is not a proper work around should it go back to critical?
[14:10] <sinzui> yes. daily was required last year because saucy was the only series that had azure support
[14:11] <natefinch> perrito666: we have work on getting charms deployable to windows
[14:11] <sinzui> bac also, daily now focuses on utopic. I think you want to use an LTS
[14:12] <natefinch> I'm working with the cloudbase guys getting their code into Juju, but I'm on vacation next week and need someone else to help them out
[14:13] <natefinch> perrito666: it's honestly less windows stuff and more just helping them get their code well integrated into Juju
[14:13] <perrito666> natefinch: I guess I could help, although I would love a bit more info :)
[14:16] <natefinch> perrito666: hop on here https://plus.google.com/hangouts/_/canonical.com/cloudbase-juju?authuser=1
[14:16] <sinzui> If the bot is going to ignore me or the 1.20 branch, I can merge the 1.20.0 version change myself to unblock the release
[14:17] <perrito666> natefinch: hold on a sec while I stop the radio, I was waiting for news on whether the country is entering economic default or not
[14:17] <natefinch> heh
[14:17] <natefinch> perrito666: no rush
[14:17] <natefinch> perrito666: we'll be on there for a long time
[14:17] <perrito666> also apparently our vice president might get arrested for defraudation :p
[14:20] <perrito666> natefinch: says google that the party is over
[14:20] <sinzui> I manually merged the 1.20.0 version change into the 1.20 branch. CI will test it in about 2 hours
[14:21] <natefinch> perrito666: invited via the UI, that should work
[14:21] <sinzui> oh, it will test it in an hour because master has an invalid version, so the test suite will exit early
[14:21] <perrito666> natefinch: yup, scared me to death
[14:22] <rogpeppe> another review if anyone cares to, much simpler one this time: https://github.com/juju/charmstore/pull/11
[14:29] <perrito666> when did fwereade pop into that call? I was about to say that natefinch sounded a lot like fwereade today :p
[14:30] <wwitzel3> England, New England .. same thing
[14:30] <natefinch> haha
[14:30] <TheMue> rogpeppe: 11 is reviewed
[14:30] <perrito666> well, clearly NewEngland is a factory for England
[14:30] <rogpeppe> TheMue: thanks
[14:31] <natefinch> lol
[14:31] <TheMue> perrito666: which package? and a reference of a copy?
[14:31] <TheMue> perrito666: usa := europe.NewEngland() ?
[14:31] <perrito666> TheMue: NewEngland() (*europe.England, error) {}
[14:32] <perrito666> I understand England used that a lot around 500 years ago
[14:32] <TheMue> perrito666: oh, shit, compiler error, didn't ask for the error
[14:32] <perrito666> :p
[14:33] <perrito666> more in the spirit of NewEngland(unconquered world.Country) (*europe.England, error) {}
[14:34] <perrito666> although that only makes sense in spanish where uk == england in daily use
[14:34] <natefinch> uk == england in most of the united states.  Took me forever to understand the political structure of that little batch of islands
[14:35] <perrito666> I was provided an educational video by a UK guy
[14:35] <perrito666> which explained all of that
[14:35] <TheMue> funnily many Country instances reacted with a panic() but England deferred a recover()
[14:35] <perrito666> we are a bunch of nerds
[14:36] <wwitzel3> http://xkcd.com/850/
[14:36] <TheMue> Meeeeee? Nooooo! *blush*
[14:38] <TheMue> wwitzel3: perrito666: regarding england: http://twistedsifter.files.wordpress.com/2013/08/the-only-countries-britain-has-not-invaded.jpg
[14:38] <wwitzel3> lol
[14:40] <ericsnow> natefinch, fwereade: are we having that meeting?
[14:54]  * perrito666 notices that he forgot to cook lunch
[14:55] <jcw4> perrito666: eating it raw?
[14:55] <jcw4> perrito666: that corruption scandal must have really got your attention
[14:55] <perrito666> jcw4: no, actually I was self documenting code
[14:55] <perrito666> :p
[14:56] <jcw4> :)
[15:21] <perrito666> alexisb: natefinch something seems wrong with the hangout
[16:10] <perrito666> ericsnow: ping me when you are available
[16:10] <ericsnow> perrito666: ping
[16:11] <perrito666> ericsnow: priv
[16:32] <rogpeppe> mgz: ping
[16:33] <mgz> rogpeppe: hey
[16:33] <rogpeppe> mgz: would you be able to review a change to godeps?
[16:33] <mgz> sure thing
[16:33] <rogpeppe> mgz: ta! https://codereview.appspot.com/106250043/
[16:34] <mgz> hmm
[16:35] <rogpeppe> mgz: do you think that fetching by default is a bad thing?
[16:35] <mgz> just thinking if it's going to bork anything
[16:35] <mgz> well, it borks the "wait, I'm actually on a different branch" case
[16:36] <mgz> but I guess it's not too bad to learn to use -F if needed
[16:37] <rogpeppe> mgz: i can't think of a case where i'd ever actually want to use -F
[16:37] <rogpeppe> mgz: after all, the repo may be updated anyway, regardless of -F
[16:37] <rogpeppe> mgz: i guess the only time i might want to use it is if my network connection is poor
[16:38] <mgz> rogpeppe: when you want to see deps that need updating, but not screw with trees because you're not sure of their current state
[16:38] <mgz> using godeps -u in that case is a little dodgy anyway
[16:38] <rogpeppe> mgz: there's always the -n flag for that
[16:39] <rogpeppe> ha, i've just spotted a bug
[16:39] <mgz> -P of 10 by default is also maybe a question
[16:39] <mgz> that's enough to make rural broadband pretty sad
[16:42] <rogpeppe> mgz: i bet your web browser fetches more than 10 things at once...
[16:42] <mgz> rogpeppe: really need some actual tests for the changes
[16:42] <rogpeppe> mgz: yeah, i thought you might say that. the tests have been broken for a while :-(
[16:43] <rogpeppe> i should really fix the tests
[16:43] <mgz> rogpeppe: sure, but running ten git processes in parallel is more than just an http get
[16:43] <mgz> rogpeppe: change seems fine in general though
[16:43] <rogpeppe> mgz: you're worried about cpu resources?
[16:43] <rogpeppe> mgz: or does git make lots of connections in fact?
[16:44] <rogpeppe> mgz: thanks
[16:46] <mgz> rogpeppe: more network/memory, but yeah, depending on the url of the repo, it's not just one connection
[16:46] <rogpeppe> mgz: got a suggestion for a better default?
[16:47] <mgz> 4?
[16:48] <rogpeppe> mgz: ok, 4 it is
[17:04] <alexisb> ericsnow, ping
[17:04] <ericsnow> alexisb: coming :)
[17:04] <alexisb> :)
[17:16] <rogpeppe> mgz: well, i've got the original tests passing now at any rate, but no time left to add -u tests.
[17:17] <rogpeppe> mgz: i'll wait until Mon before pushing the code, as i think it's worth having the changes even without tests
[17:17] <rogpeppe> mgz: and at least then i'll be around when the 'bot breaks :-)
[17:22] <jcw4> I've updated https://github.com/juju/juju/pull/164 to merge in bodie_'s work and address some of thumper's comments
[17:22] <jcw4> PTAL ^^
[17:23]  * jcw4 is eow
[18:01] <perrito666> sinzui: I never realised you could menace the bot into merging stuff
[18:02] <perrito666> wwitzel3: ericsnow standup?
[18:02] <ericsnow> wwitzel3: yep
[18:03] <sinzui> perrito666, I have a latent ability to do nigh impossible things. Intimidate bots, run 386 instances in HP cloud, create trans-cloud juju envs.
[18:03] <wwitzel3> perrito666: yep omw
[18:03] <sinzui> I attribute this to my new way of looking at issues, and I will cheat if necessary
[18:17] <sinzui> perrito666, wwitzel3 any insight into this critical bug https://bugs.launchpad.net/juju-core/+bug/1335243
[18:17] <_mup_> Bug #1335243: No tools available TestValidateConstraintsCalledWithMetadatasource <regression> <test-failure> <juju-core:Triaged> <juju-core 1.20:Triaged> <https://launchpad.net/bugs/1335243>
[22:45] <wallyworld__> mgz: around?