[00:20] machine-0: 2014-04-11 00:19:50 INFO juju.worker.instanceupdater updater.go:264 machine "0" has new addresses: [public:localhost local-cloud:10.0.3.1]
[00:20] localhost is public?
[00:21] also...
[00:21] $ juju destroy-environment local -y
[00:21] ERROR readdirent: no such file or directory
[00:21] where did that start coming from?
[00:27] thumper: that is um, wrong
[00:27] how does that even happy
[00:27] happen
[00:27] NewAddress requires you to give an address scope
[00:28] davecheney: axw landed something in the last day that changed it
[00:28] thumper: i've been thinking that Address.Scope should be private
[00:28] then we can force every creation to go through a helper function
[00:31] davecheney: chat with axw when he starts as he has done a lot of this
[00:31] anyone want to review the debug-log client hookup? https://codereview.appspot.com/85570044
[00:33] cmars: you got your lgtm, sorry about the delay. i've been head down landing some critical 1.18 fixes
[00:33] np, thanks wallyworld__
[00:34] thumper: suppose so
[00:40] if you remove juju-local juju-mongodb juju-core via apt-get with no bootstrapped environment, the mongod process is still around
[00:40] is that intentional?
[00:41] this is 1.18 on trusty
[00:46] stokachu: depends, do you have the mongodb-server package?
[00:46] thumper: just juju-mongodb
[00:47] and it's pointing to /usr/lib/juju/bin/mongod
[00:47] probably unintentional
[00:48] ok i'll probably file a bug because if you don't kill that process subsequent bootstraps will fail
[00:57] wallyworld__: I talked with rog last night about the error
[00:58] ok
[00:58] wallyworld__: he suggested that I change the connection error to a more generic NotSupported error
[00:58] agree that's better
[00:58] * wallyworld__ doesn't like inconsistency
[00:58] * thumper nods
[00:58] I have found another problem though
[00:59] but it is existing and elsewhere
[00:59] I'll land this then fix the bug
[00:59] s/CodeIsNotImplemented/IsNotSupported :-)
[01:00] well...
[01:00] not implemented has a different meaning to not supported
[01:00] not implemented implies that one day you might
[01:00] but yes, agree in general
[01:00] sure, but we are using this as a mechanism to delete running against older api servers
[01:01] detect
[01:01] and failing back to 1dot16foo()
[01:02] sinzui: you'll see the email, but i got both of john's fixes into 1.18, for scp and downgrades. hopefully that will allow CI to work again
[01:07] wallyworld__: do you want me to just use not implemented?
[01:07] I'd be ok with that
[01:08] thumper: nah, let's go with the new error and promise hand on heart to port to using it everywhere appropriate :-)
[01:08] fft
[01:08] the 1dot16 fallbacks will be disappearing anyway
[01:08] like that'll happen
[01:08] yeah, I also renamed the fallback from 1.16 to 1.18
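[editor's note: a minimal Go sketch of the fallback pattern thumper and wallyworld__ discuss above (00:57-01:08): try the current API call, and only when the server reports the method as not implemented drop to the 1.16-era code path. All names here (fullStatus, statusFrom1dot16, errNotImplemented) are illustrative stand-ins, not juju's real API.]

    package main

    import (
        "errors"
        "fmt"
    )

    var errNotImplemented = errors.New("not implemented")

    // fullStatus stands in for a call that an older API server rejects.
    func fullStatus() (string, error) {
        return "", fmt.Errorf("calling FullStatus: %w", errNotImplemented)
    }

    // statusFrom1dot16 stands in for the legacy code path.
    func statusFrom1dot16() (string, error) {
        return "status via 1.16 fallback", nil
    }

    // status tries the modern call first and only falls back when the
    // server explicitly reports the method as not implemented; any
    // other error is returned unchanged.
    func status() (string, error) {
        s, err := fullStatus()
        if errors.Is(err, errNotImplemented) {
            return statusFrom1dot16()
        }
        return s, err
    }

    func main() {
        s, err := status()
        fmt.Println(s, err)
    }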
[01:13] wallyworld__, I am hopeful that lp:juju-core/1.18 r2267 will pass. The azure-deploy test is very ill. I think we need to review both the test and azure cloud itself
[01:14] ok, we can ask axw for input there perhaps
[01:14] ?
[01:14] azure is bad on 1.18?
[01:14] axw: sinzui says there are issues with the azure CI test not working
[01:15] i haven't looked yet, but perhaps we need to review what's being tested and how
[01:15] to see where the issue is
[01:15] is there a bug I can look at?
[01:15] or build failure
[01:15] CI build
[01:16] http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/azure-deploy/
[01:16] ta
[01:16] wallyworld__, I reverted the changes that were made to help it pass
[01:16] thanks for looking, i'm flat out right now landing stuff
[01:17] sinzui: you mean the downgrade and scp changes?
[01:17] wallyworld__, axw. We increased the timeouts and tried to change the tools-metadata-url from stable to testing to help tests pass. The efforts didn't help
[01:17] i saw the metadata url bug
[01:17] sinzui: what's going on? it's just hanging there?
[01:17] wallyworld__, the downgrades restoration fixes 1.18.
[01:17] \o/
[01:18] sinzui: i'm porting to trunk now, conflict to solve but should be landed soon
[01:18] In trunk no env will upgrade within 30 minutes
[01:18] I see the last one failed in scp, but the current one is just stuck on bootstrap?
[01:18] The scp is secondary, though it also acts as a compatibility test
[01:20] 2014-04-11 01:20:06 ERROR juju.cmd supercommand.go:300 charm not found in "/var/lib/jenkins/repository": local:precise/dummy-source
[01:20] wat
[01:21] axw we tried replacing mysql and wordpress charms with charms that let us test just juju
[01:22] axw, we started on it earlier this week, then rushed it into use when we hoped to remove the many hard starts that both of those charms have
[01:22] the next run will use mysql and wordpress
[01:23] In theory charm-testing is responsible for making sure that mysql and wordpress are sane.
[01:24] ok
[01:36] sinzui: which location do the azure tests run in?
[01:37] US West...the only location that has ever worked
[01:37] heh ok
[01:41] sinzui: thumper http://paste.ubuntu.com/7233131/
[01:41] getting closer
[01:41] the ones marked failed > 600s
[01:41] are actually timeouts
[01:42] if the builder was faster (building gccgo at the same time)
[01:42] they may have passed
[01:42] well done
[01:42] SHIT
[01:42] is jesse around
[01:43] provider/common is still complaining about missing keys
[01:47] sinzui: I'm not convinced azure is entirely healthy, I'm getting errors I haven't seen before from the API
[01:47] e.g. the storage API refusing connections
[01:48] there's scheduled maintenance tomorrow on West US, I wonder if they started early on the "safe" parts
[01:49] though now I've said that, azure-deploy just passed
[01:49] axw, I see a lot of errors. The CI often retries. Azure is actually very healthy today. It only failed 4 out of 24 hours
[01:50] ok
[01:50] heh :)
[01:50] Blessed: lp:juju-core/1.18 r2267
[01:51] I have to backport another fix, but that's good to know
[02:00] wallyworld__: I'm about to backport the network addresses change - you're not already doing that, right?
[02:01] axw: no, i've been dealing with the other critical issues stopping CI from working
[02:01] just porting to trunk now from 1.18
[02:01] nps, thanks
[02:01] ah ok
[02:02] sinzui: does that mean you worked around the scp issues fixed by r2268?
[02:03] wallyworld__, we didn't succeed. It was a lower priority than getting a pass
[02:03] i'm not 100% sure, but maybe r2268 allows the original scripts to work?
[02:03] wallyworld__, the scp issue only comes into play when the tests fail
[02:04] ah ok
[02:04] anyways, it's merged into 1.18 and heading to trunk so if the tests fail again.... :-)
[02:04] wallyworld__, http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/canonistack-deploy-devel/ didn't capture logs from the last tests
[02:05] yeah, that's r2267
[02:05] r2268 should hopefully capture the logs
[02:06] CI's normal and fallback credentials are broken in canonistack. we cannot test it until the swift authentication issue is fixed. So every canonistack test will fail
[02:06] :-(
[02:49] anyone? https://codereview.appspot.com/85570045/
[02:52] Blessed: lp:juju-core/1.18 r2270
[02:53] thumper: I put some garbage into jujud/agent.go then ran cmd/juju/bootstrap_test.go TestTest: passes locally, fails on vm
[02:53] \o/
[02:53] wallyworld__: don't worry about the juju command there...
[02:53] waigani: sorry that was for you
[02:53] waigani: but check the others
[02:54] waigani: although I do challenge that
[02:54] waigani: if you delete ~/go/bin/jujud and rerun the test
[02:54] waigani: what happens?
[02:55] thumper: passes
[02:55] waigani: what is the error on the vm?
[02:55] https://bugs.launchpad.net/juju-core/+bug/1304767
[02:55] <_mup_> Bug #1304767: test failure in cmd/juju
[02:55] thumper: I'll paste the full error, hang on
[02:56] waigani: also, run a make check to run all the tests
[02:56] thumper: http://pastebin.ubuntu.com/7233312/
[02:56] with a broken jujud you'll get a lot of failures
[02:56] that I don't think should happen
[02:57] heh
[02:57] ok, to test locally
[02:57] we should do something like this...
[02:57] is it something to do with there not being a candidate match for 14.04:ppc ?
[02:57] PatchValue(&version.Current.Series, "magic")
[02:58] make the series be something it can never be anywhere
[02:58] and you'll hit the same problem locally
[02:58] (I think)
[02:58] thumper: still passes
[02:59] hmm...
[02:59] it is something like that...
[02:59] patch it before the start of setup
[02:59] ah okay
[02:59] the conn suite will bootstrap in setup
[03:00] thumper: still passes
[03:01] it is something like that....
[03:01] play a bit and break it
[03:01] thumper: I'll keep debugging on the vm - slowly but surely
[03:02] thumper: okay, I'm good at breaking things :)
[03:02] I have to run and get my girl now
[03:03] I've cornered the bug, with a bit more testing I should get it tonight.
[03:08] waigani: found it
[03:08] we patch version.Current, but don't patch arch.HostArch
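[editor's note: a rough illustration of the fix waigani and thumper converge on above — tests must patch *both* the series and the host architecture, or results depend on the machine running them. patchValue mirrors the PatchValue helper mentioned at 02:57; currentSeries and hostArch are hypothetical package variables standing in for version.Current.Series and arch.HostArch, and the whole thing would live in a _test.go file.]

    package example

    import "testing"

    var (
        currentSeries = "trusty" // normally detected from the running OS
        hostArch      = "amd64"  // normally detected from the running CPU
    )

    // patchValue swaps in a value for the duration of one test and
    // restores the original afterwards.
    func patchValue[T any](t *testing.T, dest *T, value T) {
        t.Helper()
        old := *dest
        *dest = value
        t.Cleanup(func() { *dest = old })
    }

    func TestBootstrapIsHostIndependent(t *testing.T) {
        // Patch both values: patching only the series is exactly the
        // trap described above ("we patch version.Current, but don't
        // patch arch.HostArch").
        patchValue(t, &currentSeries, "magic")
        patchValue(t, &hostArch, "ppc64")

        if currentSeries != "magic" || hostArch != "ppc64" {
            t.Fatalf("patching failed: %s/%s", currentSeries, hostArch)
        }
    }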
[03:12] wallyworld__, https://bugs.launchpad.net/juju-core/+bug/1302205 bothers me. It is critical, but is not targeted to the current milestone. I can see you working on it. I want to move it to 1.19.0 to reflect how it is being treated
[03:12] <_mup_> Bug #1302205: manual provisioned systems stuck in pending on arm64
[03:12] sinzui: what is the plan for releasing 1.19?
[03:13] sinzui: sure
[03:14] thumper, if 1.19.0 gets a passing rev today/tomorrow I might release it. lp:juju-core r2587 was the last time 1.19.0 passed.
[03:15] 1.18.1 has two passes today, so I think it is more likely to be released
[03:16] * thumper nods
[03:28] sinzui, i've marked 1303880 and 1295140 as fix-committed. fixes for these have landed in trunk & 1.18
[03:28] thank you cmars
[03:46] wallyworld__: backport for addresses fix, can you please review? https://codereview.appspot.com/86720043/
[03:46] sure
[03:51] axw: done, looks like a nice change
[03:52] wallyworld__: cheers
[04:02] davecheney: is this one you've seen before? http://pastebin.ubuntu.com/7233429/
[04:03] dannf: you still online?
[04:06] thumper: Maybe you can point a developer to work on bug 1306212. CI will fail trunk because of it
[04:06] <_mup_> Bug #1306212: juju bootstrap fails with local provider
[04:11] sinzui: make him fix it himself :-P
[04:12] except, it may be HA related
[04:12] wallyworld__, I don't make fixes after midnight. I do dangerous things when I am tired
[04:12] sinzui: no, not you, thumper :-)
[04:12] i didn't mean you to fix it
[04:12] you need to go to bed!
[04:13] I am very skeptical that any of the trunk upgrades will pass. They are all paused and approaching the 30-minute timeout
[04:13] :-(
[04:13] deployments look ok
[04:14] at least 1.18 works so that can be released soon
[04:16] I already downloaded the 1.18.1 tarball and win installer. I could release that tomorrow
[04:17] sinzui: fix for bug 1303735 landing now though
[04:17] <_mup_> Bug #1303735: public-address change to internal bridge post juju-upgrade
[04:17] \o/
[04:17] we should not release without that one
[04:18] the only other one is backup failure
[04:18] meh for 18.1 i reckon
[04:26] wallyworld__, There is not much point fixing that backup bug because the restore bug is targeted for 1.19.0.
[04:26] that's what i reckon also
[04:26] hence, "meh" :-)
[04:27] so we can retarget and release as soon as the final fix lands
[04:27] sinzui: fuck off to bed, it's way too late for you to be online
[04:27] :-)
[04:28] I have more email explaining what blocks 1.19.0 and then I will sleep
[04:44] wallyworld__: yes
[04:44] it turns out that there is a bug in gccgo
[04:44] things are only given small stacks
[04:44] and when they run off the end, they crash
[04:45] boom
[04:45] dunno why some ppc machines are ok
[04:45] i'm building a test gccgo now
[04:45] ok
[04:45] mwhudson: ping
[04:45] davecheney: btw, arm64 works now \o/
[04:46] wallyworld__: working on a compiler fix
[04:46] wallyworld__: it's actually a build option
[04:47] the issue i had was bug 1274558, transparent huge pages, there is a workaround
[04:47] eek
[04:47] autoconf is incorrectly turning on the -fsplit-stack option for gccgo
[04:47] the issue caused go executables to hang etc etc
[04:47] which is really only an intel thing
[04:50] wallyworld__: can you grab the last dozen lines of dmesg
[04:50] on that system
[04:50] ok
=== vladk|offline is now known as vladk
[05:46] davecheney: this is not a juju bug is it https://bugs.launchpad.net/juju-core/+bug/1300256
[05:46] <_mup_> Bug #1300256: juju status results in unexpected fault address on arm64 using local provider
=== vladk is now known as vladk|offline
=== liam_ is now known as Guest89383
[07:22] morning all
[07:23] fwereade, I hope you're feeling better today :), just as a reminder - my two VLAN CLs https://codereview.appspot.com/86010044/ and https://codereview.appspot.com/86600043/ when you can have a look
[07:24] * dimitern is away for 1h
=== vladk|offline is now known as vladk
[08:07] mornin' all
=== vladk is now known as vladk|offline
[08:37] davecheney: turned out to be a one liner: https://codereview.appspot.com/86760043/
[08:37] morning rogpeppe
[08:38] waigani: hiya
[08:38] Exciting Friday night here, coding in the kitchen
[08:47] do we have a way to version API methods yet?
[08:55] axw: renaming them is the only way currently
[08:55] axw: or going the backwards-compatible route
[08:55] rogpeppe: ok, thanks
[08:56] axw: there are a couple of directions i'd like to go with it, but it's a sensitive issue
[08:57] ha, i wondered what i'd done to break the cmd/jujud tests, but this happens in trunk (many times): [LOG] 36.85478 ERROR juju worker: exited "rsyslog": failed to write rsyslog certificates: cannot create temp file: open /var/log/juju/rsyslog-cert.pem824698669: no such file or directory
[08:57] this is not great
[09:02] rogpeppe: did you get a chance to look at the EnsureAvailability CL?
=== vladk|offline is now known as vladk
[09:02] axw: i'm about half way through the review
[09:02] ok
[09:02] axw: sorry, will get back to it!
[09:03] rogpeppe: nps
[09:04] axw: there was one thing i was having difficulty working out
[09:04] axw: i couldn't quite see whether it always preserves the invariant that the number of wants-vote machines is always odd
[09:05] it should always be the parameter, and that is checked at the top
[09:06] if one is taken out of VotingMachineIds, another will replace it
[09:07] rogpeppe: I must admit it was a little mindbendy to me, so don't take my word for it ;)
[09:20] davecheney: pong
[09:24] waigani: nice fix
[09:24] yeah, that was what I suspected
[09:24] we weren't specifying, so it fell through to 'this machine'
[09:24] which didn't match the fixtures
[09:25] waigani: interesting line wrapping, are you using a c64 ?
[09:26] davecheney: line wrapping? in the description you mean?
[09:26] axw: i'm wondering if we should be doing all the stateserverinfo ids manipulation inside a single Update op
[09:26] waigani: y
[09:27] rogpeppe: what's the benefit?
[09:27] axw: it means that other code can't see them in an intermediate state
[09:27] davecheney: lol - well I may be formatting out of nostalgia ;)
[09:28] rogpeppe: all the ops are run in a single transaction though?
[09:28] davecheney: how are we looking now on ppc?
[09:28] axw: read operations don't respect transactions
[09:28] waigani: no complain', just sayin'
[09:28] waigani: will know in < 300 seconds
[09:28] rogpeppe: ah, I see
[09:28] waigani: provider/common was whinging about missing ssh keys
[09:29] oh really?
[09:29] I can look into that if you like?
[09:29] waigani: i'll know in a few mins
[09:29] axw: the other thing i'm trying to persuade myself of is whether the $size asserts in maintainStateServersOps are still sufficient
[09:29] hold tight
[09:30] rogpeppe: I was thinking they should be changed to exact match - is there a use case for two concurrent callers?
[09:31] I figure someone might want to have a cron job calling this
[09:31] axw: well, we try to make everything work ok with concurrent callers
[09:31] axw: i'm not sure that an exact match is possible to do though
[09:31] axw: it might need to be a txn_revno assertion
[09:31] davecheney: I looked into 1262967. I suspect it is a similar problem. So far though, none of the tests are failing for me.
[09:32] waigani: bootstrap_test.go:69: c.Assert(err, gc.IsNil)
[09:32] ...
[09:32] value *errors.errorString = &errors.errorString{s:"no public ssh keys found"} ("no public ssh keys found")
[09:32] rogpeppe: sure, I just meant whether we try to allow concurrent modifications or lock each other out entirely
[09:32] axw: ah yes
[09:33] axw: if we've got two concurrent callers both calling ensure-availability with different numbers, that's likely to be problematic anyway
[09:34] axw: but if the server count hasn't changed, i think it should be fine to just let a concurrent call assume that the first one has worked, and return with success
[09:34] axw: s/server count/voting server count/
[09:35] rogpeppe: as we do now?
[09:35] len(VotingMachineIds) == numStateServers?
[09:36] axw: well, that was ok before, but isn't now, because we want to juggle available servers
[09:36] davecheney: I can't reproduce that?
[09:36] provider/common$ go test on ppc 31 tests pass?
[09:36] waigani: rm -rf ~/.ssh :)
[09:36] ugh god facepalm
[09:37] rogpeppe: it should still work - it checks if there are any machines being taken out of the voting set
[09:37] waigani: it should be the same as your provider manual fix, right
[09:37] davecheney: is there a bug for that one?
[09:37] the underlying cause is the same
[09:37] rogpeppe: info is updated by updateAvailableStateServersOps if that's not clear
[09:37] davecheney: yep, looks very similar
[09:37] waigani: the bug was for both I think
[09:37] so VotingMachineIds out may be smaller than going in
[09:38] ooh right, okay I can fix and link it to the same bug?
[09:38] yup
[09:38] sweet, will do
[09:38] axw: currently, we just return nil if the number of voting machines hasn't changed. i'm not sure we can still do that.
[09:38] mwhudson: i've found the cause of the juju crashes on ppc
[09:38] in the gccgo deb
[09:38] actually, i'll step back
[09:38] davecheney: oh?
[09:39] mwhudson: basically, libgo tests to see if the compiler supports -fsplit-stack
[09:39] which ppc says it does
[09:39] but it lies
[09:39] ah
[09:39] i think the same is for arm
[09:39] arm64
[09:40] i mean, you need gold to support it
[09:40] and gold isn't even installed on ppc
[09:40] rogpeppe: sorry, I don't understand why. if we remove a voting machine ID, then len(VotingMachineIds) < numStateServers. If we bring an available server back in, that count doesn't change, but we add a txn.Op to make the change in mongo
[09:40] davecheney: i'
[09:41] davecheney: i'm pretty sure i checked that the configure script does not think arm64 supports split stacks
[09:41] ok, that might be a different error
[09:41] but it looks like ppc is saying it does
[09:42] and so libgo configures itself accordingly
[09:42] and so each goroutine has a small stack and runs off the end easily
[09:44] checking whether -fsplit-stack is supported... no
[09:44] hmm, that is odd
[09:44] axw: i'm just saying that i don't think we can just do nothing at all if len(VotingMachineIds) == numStateServers
[09:44] axw: (which is the current behaviour)
[09:45] rogpeppe: ah, we should perform an assertion you mean?
[09:46] axw: we'll still need to potentially decommission unavailable machines and add new ones if so
[09:48] rogpeppe: yes, that's being done in my CL. possibly not very obviously
[09:48] axw: yup
[09:48] or maybe I just don't understand
[09:48] axw: perhaps when you said "as we do now" you meant "as we do in my branch" ?
[09:50] rogpeppe: "if the server count hasn't changed, i think it should be fine to just let a concurrent call assume that the first one has worked, and return with success" -- do we not do that on trunk? maybe you meant the server count hasn't changed in mongo, and availability hasn't changed
[09:51] anyway. I think we're roughly on the same page
[09:51] axw: yeah
[09:51] morning all
[09:51] natefinch: hiya
[09:52] axw: i suppose the question now in my mind is: is it possible for EnsureAvailability to do any actions without changing the server id counts?
[09:53] axw: hmm, i think it might be possible
[09:53] morning natefinch
[09:54] axw: so perhaps the best thing is to assert on txn_revno and assume that txn.ErrAborted means all's well
[09:54] rogpeppe: yes, it will change the counts definitely. the check there at the moment says that the counts didn't change externally
[09:54] rogpeppe: SGTM
[09:55] waigani: are you going to apply https://codereview.appspot.com/85710043/ to provider/common ?
[09:56] axw: what if we've got the following scenario (x means unavailable): machineids: 1 2 3x 4; votingids: 1 2x 4; then we'll end up with something like: machineids: 1 2 4 5; voting ids: 1 4 5
[09:56] axw: i think
[09:56] davecheney: what would be best? I can update that branch or create a new one?
[09:56] axw: where the voting counts haven't changed, but the contents have
[09:56] waigani: you need to create a new branch
[09:56] axw: (and 5 is a newly commissioned machine)
[09:56] this is the gift that lbox brings us
[09:57] davecheney: ah hehe okay
[09:57] nice timing, as I was just starting on the old branch
[09:57] you must have sensed it (or you've hacked into my computer)
[09:58] rogpeppe: each of the promote/demote ops assert that they weren't changed too
[09:58] rogpeppe: so if something else modified the contents concurrently, we'd still see asserts on the wantvote/hasvote fields
[09:59] axw: ah, good point
[10:09] axw: reviewed
[10:09] rogpeppe: thanks
[10:10] rogpeppe: ahhhh, now I see what you mean about the voting count not changing
[10:10] on ErrTxnAborted
[10:10] ErrAborted rather
[10:13] axw: cool
[10:13] davecheney: lboxing...
[10:16] kk
[10:17] davecheney: https://codereview.appspot.com/86800043
[10:17] waigani: reviewing
[10:17] night all
[10:17] it would be nice to get this one in this evening
[10:17] davecheney: cool, I'll hang around until it lands
[10:20] waigani: nah
[10:20] i can land it myself
[10:20] i have timezones on my side
[10:20] it looks like there might be another bug in cmd/juju
[10:20] tests
[10:20] simple ordering one
[10:21] davecheney: I'm hitting this on the vm: fatal error: bad spsize in __go_go
[10:21] eek
[10:22] yeah, I don't seem to be able to run my tests
[10:28] waigani: $ go test ./provider/common/
[10:28] ok launchpad.net/juju-core/provider/common 24.926s
[10:28] LGTM
[10:28] ship it
[10:28] davecheney: sweet
[10:34] davecheney: cmd/juju$ go test
[10:34] OK: 226 passed, 2 skipped
[10:34] PASS
[10:39] waigani: nice one
[10:39] give yourself the evening off
[10:43] davecheney: thanks. If you see any bugs you want me to look at, send me an email for Monday.
[10:43] davecheney: I'll be keen to see how close we are to getting juju working on ppc :)
[10:44] waigani: i think we're down to 3
[10:44] cmd/juju
[10:44] which looks easy
[10:45] cmd/jujud which looks like a timeout
[10:45] the joyent provider which is a timeout
[10:45] and worker tests
[10:45] which are a bug in the compiler i'm working on
[10:47] rogpeppe, dimitern: standup!
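[editor's note: the txn-revno approach rogpeppe and axw settle on above (09:54-10:13) looks roughly like this sketch, using the mgo/txn package juju's state layer is built on. The collection name, document shape, and field names are hypothetical; the point is only the shape — assert the document is unchanged since it was read, and treat txn.ErrAborted as "a concurrent caller got there first, all's well".]

    package example

    import (
        "gopkg.in/mgo.v2/bson"
        "gopkg.in/mgo.v2/txn"
    )

    type stateServersDoc struct {
        TxnRevno         int64    `bson:"txn-revno"`
        VotingMachineIds []string `bson:"votingmachineids"`
    }

    func ensureAvailability(runner *txn.Runner, doc stateServersDoc, newVoters []string) error {
        ops := []txn.Op{{
            C:  "stateServers",
            Id: "e",
            // Abort if anything touched the document since we read it.
            Assert: bson.D{{"txn-revno", doc.TxnRevno}},
            Update: bson.D{{"$set", bson.D{{"votingmachineids", newVoters}}}},
        }}
        err := runner.Run(ops, "", nil)
        if err == txn.ErrAborted {
            // A concurrent EnsureAvailability won the race; assume it
            // did the equivalent work rather than failing the caller.
            return nil
        }
        return err
    }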
[10:48] waigani: http://paste.ubuntu.com/7233131/
[10:48] from a few hours ago
[10:48] be suspicious of any tests that take 600 seconds
[10:48] the watchdog kills it
[10:57] https://bugs.launchpad.net/juju-core/+bug/1306536
[10:57] <_mup_> Bug #1306536: replicaset: mongodb crashes during test
[10:57] this is a real thing
[10:57] if mongo shits itself during our CI
[10:57] which it does, continually
[10:57] what is it going to do in the field ?
[11:02] natefinch: shall we hang out elsewhere?
[11:02] rogpeppe: sure
[11:03] natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-HA?authuser=1
[11:04] davecheney: we'll look into it. I've seen it occasionally. not sure what causes it yet
[11:05] natefinch: what version is the bot running
[11:05] is it still running our old crack version of mongo we made 2 years ago ?
[11:07] natefinch: bzr+ssh://bazaar.launchpad.net/~rogpeppe/juju-core/natefinch-041-moremongo/
[11:07] davecheney: I don't think so, but I'm not sure
[11:08] $ ~/bin/juju destroy-environment -y local
[11:08] ERROR failed verification of local provider prerequisites:
[11:08] juju-local must be installed to enable the local provider:
[11:08] umm ....
[11:08] so, i can't develop juju unless I have a conflicting juju binary installed ?
[11:10] https://bugs.launchpad.net/juju-core/+bug/1306544
[11:10] <_mup_> Bug #1306544: developing juju requires juju-local to be installed
[11:11] interesting way to put it, we just need you to have juju-mongodb
[11:11] this is kind of a problem
[11:11] it's been a huge problem for new developers.... but you're right, we should have a separate package to just deploy the db
[11:11] you have to be super careful `juju` is the juju you mean
[11:12] yeah, I think rog hit that
[11:12] interesting... actually:
[11:12] $ which juju
[11:12] /usr/bin/juju
[11:12] sad trombone
=== vladk is now known as vladk|lunch
=== vladk|lunch is now known as vladk
[12:43] does anyone know anything about statecmd.MachineConfig ?
[12:44] from the description, it looks like it just returns information, but AFAICS, it's actually responsible for setting up the initial password info on a machine too
[12:46] and it seems kinda weird that it's living inside statecmd too
[12:47] ah, i see, it's a 1.16 legacy
[12:51] evilnickveitch: do you want me to resubmit doc changes against github in order to get them landed?
[12:55] mgz, no, it's fine. I did a PR for the GH repo. Nobody has looked at it, so I will just merge it anyhow
[12:56] evilnickveitch: I can probably stamp it...
[12:56] evilnickveitch: too late, thanks!
[12:57] :)
[13:00] fwereade: hiya, saw your comment on instance type constraints. i haven't done any work on it for a few days except for merging trunk and resolving conflicts as i've been doing the arm stuff and other work for 1.18.1 etc. i guess i should mark it back as wip. i'll be able to get more done next week
[13:31] rogpeppe: back
[13:34] wallyworld__, no worries, I knew we'd talked about some of the stuff I mentioned, just wanted to make sure it was recorded
[13:34] sure, ok
[13:35] rogpeppe, IIRC it's primarily for the manual provider (and particularly for hazmat's convenience)
[13:36] fwereade: i'm just pondering the best place to add mongo password setup for new state servers
[13:36] natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-HA?authuser=1
[13:37] rogpeppe, preferably purely inside existing state servers, surely?
[13:38] fwereade: sure
[13:38] fwereade: i think it's best done along with the other password in the API call ProvisioningScript
[13:39] fwereade: (FWIW, that's another bad name - it doesn't sound like it actually sets up the machine too)
[13:39] rogpeppe, fair enough, so long as we don't put it in cloudinit
[13:39] fwereade: definitely not - it couldn't go in cloudinit anyway
[13:40] fwereade: BTW NewAPIAuthenticator seems to be fundamentally misguided - it only gets the state and API addresses once
[13:40] rogpeppe, yeah, I know, it sucks
[13:40] fwereade: i'm not sure whether to change it to fetch the addresses each time, or to add a watcher to the provisioner
[13:41] rogpeppe, midpoint: get the addresses once per batch of machines
[13:41] fwereade: i suppose AuthenticationProvider could do the watch too
[13:41] rogpeppe, the intent was always to do it per-batch but I forget why it didn't happen
[13:42] rogpeppe, I suspect there was some ugly interaction with container provisioners
[13:48] fwereade: ah, i see. we should get the address in provisionerTask.startMachines
[13:49] how cool would it be to have a special var that when something other than nil is assigned to it would automatically produce panic or return err (error handling induced day dreaming)
[13:49] fwereade: it's pretty awkward to refactor the code so it uses bulk API calls, BTW
[13:50] fwereade: although it's definitely a place that it's worth it
[13:51] fwereade: although actually just running all the startMachine calls concurrently would be a big win (and quite likely faster than using sequential bulk calls)
[13:51] perrito666: magic is pretty much anathema in Go. I prefer to have the code do exactly what it says it does, and no more. Magic is how you get bugs.
[13:51] natefinch: amen
[13:56] natefinch: I guess I was more in search of syntactic sugar than magic
[14:00] perrito666: also not something Go generally does :) you can do this, though:
[14:00] if err := foo(); err != nil {
[14:00] return err
[14:00] }
[14:02] rogpeppe, we do also want environ.StartInstances to mitigate rate limiting
[14:02] fwereade: yes.
[14:02] natefinch: yep, that is where I go whenever I can, although that is closer to syntactic saccharine than sugar
[14:02] rogpeppe, and I'm well aware of just how much hassle that will be, but we can only put it off so long ;)
[14:02] fwereade: although i'm yet sure if rate limiting should really be done in the Environ itself
[14:03] s/yet/not yet/
[14:03] perrito666, also it keeps the scope of err nice and tight, which is often a benefit of itself
[14:03] rogpeppe, I'm reasonably sure it should be, myself
[14:03] fwereade: the problem is that not everything that makes provider calls necessarily shares the same Environ
[14:03] rogpeppe, which is not to say that we shouldn't also have some stuff around instancepoller too
[14:04] rogpeppe, most of them should, though
[14:04] rogpeppe, and it's probably not that hard to arrange
[14:04] fwereade: doesn't every worker create its own Environ?
[14:04] fwereade: but, yeah, it's probably not too hard to pass an environ in to those workers that need one
[14:04] fwereade, when you have a moment...
[14:05] fwereade: except that it can change
[14:05] rogpeppe, yeah, but it doesn't have to -- pass one in, have another worker responsible for updating it
[14:05] mattyw, heyhey
[14:05] anyone understand this log about wordpress not installing? failed to fstat previous diversions file: No such file or directory — whole log: http://paste.ubuntu.com/7235032/
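[editor's note: a sketch of the provisioner idea fwereade and rogpeppe float above (13:51-14:02): run the startMachine calls concurrently, but bound how many are in flight so the provider's API rate limits aren't tripped. startMachine and the helper are hypothetical stand-ins, not the real provisioner task's API.]

    package example

    import (
        "fmt"
        "sync"
    )

    func startMachine(id string) error {
        // Placeholder for the real provider call.
        fmt.Println("starting machine", id)
        return nil
    }

    // startMachines starts every machine concurrently, at most
    // maxInFlight at a time, and returns the first error seen (if any).
    func startMachines(ids []string, maxInFlight int) error {
        var (
            wg       sync.WaitGroup
            sem      = make(chan struct{}, maxInFlight)
            mu       sync.Mutex
            firstErr error
        )
        for _, id := range ids {
            wg.Add(1)
            go func(id string) {
                defer wg.Done()
                sem <- struct{}{}        // acquire a slot
                defer func() { <-sem }() // release it
                if err := startMachine(id); err != nil {
                    mu.Lock()
                    if firstErr == nil {
                        firstErr = err
                    }
                    mu.Unlock()
                }
            }(id)
        }
        wg.Wait()
        return firstErr
    }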
[14:05] Hi jamespage: I have an idea to address bug 1304493 that I think was caused by republication of juju tools.
[14:05] <_mup_> Bug #1304493: Juju tools 1.18.0 streams.canonical.com checksum mismatch
[14:05] sinzui, I think so yes
[14:05] sinzui, that feels bad to me
[14:06] rogpeppe, fwereade, updated https://codereview.appspot.com/86010044/
[14:06] rogpeppe, fwereade, (networks stuff)
[14:11] jamespage, I am not convinced it is the right solution since I do not know why streams.canonical.com got a different file size for the amd64 precise tools. I think the only package it could find is the one from the juju stable ppa
[14:12] sinzui, one from the PPA, then one from the distro when I uploaded it I suspect
[14:12] jamespage, you made one for precise?
[14:12] sinzui, no
[14:12] I thought you were just making trusty
[14:12] oh - that really is odd then
[14:13] sinzui, I am
[14:13] sinzui, I don't understand then
[14:13] we need utlemming I think
[14:13] I will keep investigating
[14:13] thank you jamespage
[14:17] fwereade: can i run something by you?
[14:17] rogpeppe, sure
[14:18] fwereade: i *think* we can remove StateInfo from environs.MachineConfig and cloudinit.MachineConfig
[14:18] rogpeppe, w00t!
[14:18] rogpeppe, I've been wanting to do that for months :)
[14:19] fwereade: basically, any agent that needs to connect to State can dial localhost
[14:22] fwereade: that will need to change in the future if we want to have more API servers than mongo instances
[14:23] fwereade: but even then, i don't think it needs to be in MachineConfig
[14:24] rogpeppe, +100
[14:24] rogpeppe, anything starting up at bootstrap time can use localhost, and anything starting up late can just grab the info over the api, anyway
[14:25] fwereade: yup
[14:25] rogpeppe, excellent
[14:46] jamespage, I now think bug 1304493 is caused by different versions of gzip. I don't think this issue is about alternate packages.
[14:46] <_mup_> Bug #1304493: Juju tools 1.18.0 streams.canonical.com checksum mismatch
[14:47] sinzui, ah - yes - deterministic zip creation
[14:50] marcoceppi, hazmat: either of you recognize this error from installing a charm? failed to fstat previous diversions file: No such file or directory — full log: http://paste.ubuntu.com/7235032/
[14:50] fwereade: i think we'll remove state addresses from agent config too
[14:50] seems to only happen when deploying locally
[14:51] rogpeppe, sgtm I think
[14:51] fwereade: and i'm also changing the agent config to store hostports to avoid the current impedance mismatch.
[14:51] fwereade: then it's [][]instance.HostPort throughout
[14:52] rogpeppe, cool
[15:02] natefinch: that's an interesting error, I've not come across it before. What's the charm?
[15:05] marcoceppi: that's wordpress
[15:05] marcoceppi: different error for mysql, start fails
[15:06] marcoceppi: mysql gets this: http://paste.ubuntu.com/7235297/
[15:44] sinzui, are we likely to see a 1.18.1 release today? as it will be critical bug fixes I can push that pre-release
[15:44] and we're still in universe so meh
[15:45] jamespage, the gzip bug is blocking, but I hope to be able to start the release in 2 hours
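[editor's note: on the "deterministic zip creation" point behind bug 1304493 — gzip output embeds a modification time (and optionally a file name) in its header, so compressing identical bytes at different times yields different archives and thus different checksums; different gzip implementations can also differ in the compressed stream itself. A minimal Go sketch that pins the varying header fields (the CLI analogue is gzip -n); it removes the timestamp source of variation, though it cannot make two different gzip implementations agree byte-for-byte.]

    package main

    import (
        "bytes"
        "compress/gzip"
        "crypto/sha256"
        "fmt"
        "time"
    )

    func deterministicGzip(data []byte) ([]byte, error) {
        var buf bytes.Buffer
        zw := gzip.NewWriter(&buf)
        // Pin the header fields that embed environment details.
        zw.Name = ""             // no original file name
        zw.ModTime = time.Time{} // zero mtime instead of "now"
        zw.OS = 255              // "unknown", not the build host's OS
        if _, err := zw.Write(data); err != nil {
            return nil, err
        }
        if err := zw.Close(); err != nil {
            return nil, err
        }
        return buf.Bytes(), nil
    }

    func main() {
        a, _ := deterministicGzip([]byte("juju tools payload"))
        b, _ := deterministicGzip([]byte("juju tools payload"))
        // Identical input now yields identical archives and checksums.
        fmt.Println(sha256.Sum256(a) == sha256.Sum256(b))
    }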
[16:11] natefinch, fwereade ping
[16:13] alexisb: howdy
[16:15] hi natefinch happy friday
[16:15] can I lean on you to take a look at the 4 new critical bugs for 1.19.0 and see if you can easily assign them to someone across the juju team?
[16:16] looks like most are regressions
[16:16] alexisb: I can look, for sure. What's the timeline on getting them fixed? There's not a lot of working hours left for most juju devs
[16:17] as soon as we can, but in regular working hours
[16:17] alexisb: ok
[16:17] we just need to try and unblock sinzui from releasing
[16:18] sinzui, are there any bugs in 1.18 that need our assistance for the release jamespage needs?
[16:18] and natefinch thank you!
[16:18] alexisb: welcome
[16:18] sinzui, our == juju-core
[16:25] is hook output still labelled with HOOK in the logs?
[16:32] alexisb, I am working on the one bug that blocks the release of 1.18.1.
[16:32] alexisb, I am going to defer the backup bug because fixing it doesn't allow you to restore. The restore bug is targeted to 1.19.0
[16:33] sinzui, ack
[16:38] sinzui, marcoceppi, alexisb: Btw, looks like wordpress and mysql both fail to deploy on trunk, at least using the local provider on trusty. Getting some errors related to apparmor when installing wordpress and when starting mysql.
[16:38] natefinch, that would make sense given one of the critical bugs deals with failing to bootstrap a local provider
[16:40] alexisb: I can bootstrap ok, it's deploying that I have a problem with
[16:41] ah ok, shows my ignorance :)
[16:42] alexisb, heyhey, I'll take a look as well
[16:42] fwereade, thanks
[16:45] fwereade: check out my response to this bug at the bottom and let me know if you disagree: https://bugs.launchpad.net/juju-core/+bug/1208430
[16:46] <_mup_> Bug #1208430: mongodb runs as root user
[16:46] natefinch, concur
[16:46] natefinch, db access gives you the keys to the kingdom regardless
[16:46] natefinch, have we closed the mongod ports externally yet?
[16:46] natefinch, I'm guessing not, but I think we can now; right?
[16:47] natefinch, at least on providers where we can do firewalls
[16:48] natefinch, in fact it should be a matter of just not explicitly opening it any more
[16:48] fwereade: roger says he thinks we keep them open still... but I think we can close them (and definitely should close them)
[16:48] natefinch, close 'em :)
[16:49] natefinch, ideally we'd close them even to traffic from other machines in the environment, but that's not practical yet
[16:49] fwereade: yep
[16:49] natefinch, then this is the real bug 1305280
[16:49] <_mup_> Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32
[16:49] natefinch, The bug is in ubuntu, but I kept it open on juju-core to make it easy for me to track
[16:50] sinzui: ahh, ok, thanks
[16:50] natefinch, looking at those criticals I am suspicious that the other 3 are ha-/replicaset-related
[16:51] sinzui: the armhf in the title seems misleading, since I'm definitely not running arm here
[16:52] natefinch, yep. I thought it was a limited ubuntu ports/packaging issue
[16:52] fwereade: yeah
=== vladk is now known as vladk|offline
[16:54] fwereade: this one seems like it just requires an update to the backup script to find the juju-db binaries: https://bugs.launchpad.net/juju-core/+bug/1305780
[16:54] <_mup_> Bug #1305780: juju-backup command fails against trusty bootstrap node
[16:54] oh, i guess that one is 1.81.1
[16:54] er 1.18.1
[16:55] fwereade: certainly the other ones are HA related
[16:55] natefinch, I am going to move that bug to 1.19.0
[16:56] natefinch, if we fix the backup bug, the user still hits the two restore bugs in 1.19.0
[16:56] * sinzui moves the bug to be with its friends
[16:56] the more the merrier
[16:57] natefinch, So I see one bug in progress for 1.18.1. It's my job to solve the gzip compression problem between different machines.
[16:58] Juju-Core can focus on getting trunk releasable for Monday/Tuesday
[17:15] natefinch, yeah, I think they're not on the $PATH
[17:15] natefinch, sorry, evening is happening around me a bit ;)
[17:15] fwereade: seems like an easy fix at least
[17:15] natefinch, yeah
[17:15] btw, https://codereview.appspot.com/86910043 is up for review at long long last
[17:17] natefinch or fwereade any chance either of you could help debug an issue where trusty can't deploy precise instances using lxc?
[17:18] cjohnston, IIRC thumper has started work on that -- is this new?
[17:18] cjohnston, did you set default-series in your config...and is your failure the one some charms hit, bug 1305280?
[17:18] <_mup_> Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32
[17:19] https://bugs.launchpad.net/juju-core/+bug/1306537 is the bug filed for it, yes default_series is set
[17:19] <_mup_> Bug #1306537: LXC provider fails to provision precise instances from a trusty host
[17:20] I don't see (error: error executing "lxc-start": command get_cgroup failed) in juju status; the instances just sit at pending for hours
[17:20] however deploying cs:trusty/ubuntu works
[17:26] cjohnston, This issue may be fixed in trunk. The opposite scenario was recently fixed: bug 1302820
[17:26] <_mup_> Bug #1302820: juju deploy --to lxc:0 cs:trusty/ubuntu creates precise container
[17:27] cjohnston, 1.19.0 will be released next week, a day after trunk stabilises.
[17:34] i'm sure i used to be able to move my mouse when my machine was heavily loaded
[17:35] rogpeppe: I've noticed that recently, too
[17:36] and this IRC client has an amazing failure mode when i've been typing into it when the machine's heavily loaded
[17:36] (recently = last few months at least)
[17:36] every few keys typed, it ignores what i've typed and types something from the past instead
[17:37] so just then, if i typed "ddddddddd", i'd get "d abled abled"
[17:38] hmm, we've taken away support for 1.19 to work against 1.16 clients, right?
[17:38] time to get a new irc client
[17:42] rogpeppe, There is a commit in trunk that says that
[17:42] and I removed the ability to package 1.16. it's dead to me
[17:42] sinzui: so we don't care about upgrading from 1.16 then?
[17:42] sinzui: which upgrade transition has been failing on trunk?
[17:43] rogpeppe, we test stable to stable. 1.16 goes to 1.18. 1.18 can go to 1.19 or 1.20
[17:43] sinzui: ok, cool
[17:43] * sinzui notes that juju docs should make that very clear
[18:22] natefinch, dimitern, anyone else that's around: large but largely trivial: https://codereview.appspot.com/87010044
=== vladk|offline is now known as vladk
[18:41] review for whoever: https://codereview.appspot.com/86920043/
[18:44] rogpeppe: this is the machine 0 log for upgrading 1.18 to trunk
[18:44] http://pastebin.ubuntu.com/7236187/
[19:00] rogpeppe: ubuntu@ec2-174-129-121-255.compute-1.amazonaws.com
[20:40] how deep is a copy of an object in go?
[20:42] for instance I copy a document that has fields such as []string slices and arrays inside
[20:42] do all those get properly copied?
[20:42] perrito666: it's hard to answer that without being flippant. The obvious answer is, it copies everything. But that would be confusing
[20:43] perrito666: slices, channels, and maps are all implemented as pointers, so the pointers get copied but not what they point to
=== Ursinha is now known as Ursinha-afk
[20:43] interfaces, too
[20:43] arrays are actually consecutive memory like you'd expect, and when you copy them, you copy the whole dang thing
[20:44] but slices are just pointers to parts of arrays, so when you copy them, you're only copying the pointer
[20:44] perrito666: as I thought, if I want to "clone" an object, I need to create a sort of deepcopy
[20:45] perrito666: depends on the value, but yes, some values take some extra work
[20:46] perrito666: maps and slices are really the only thing you need to worry about (there's no real way to deep-copy an interface, and it doesn't really make sense to copy a channel)
[20:47] perrito666: I guess a good question would be - why do you need to clone something?
[20:49] natefinch: thank you. btw, how many hours a day do you work? you are here when I arrive and when I leave
[20:51] natefinch: refactoring a big piece of code in assignToMachine
[20:51] perrito666: haha... not that much, really. I have a lot of interruptions during the day due to having two small children. I start work around 5:30 or 6am and end at 5pm, but there's usually a few hours of non-work in there.
[20:56] ok, EOD for me
[20:56] natefinch: have a nice weekend
[20:56] perrito666: you too
=== Ursinha-afk is now known as Ursinha
=== hatch__ is now known as hatch]
=== hatch] is now known as hatch
=== vladk is now known as vladk|offline
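[editor's note: a worked example of the copy semantics natefinch explains above (20:42-20:46): assignment copies arrays in full but only copies the headers of slices and maps, so a "clone" shares their backing storage unless those fields are duplicated by hand. The doc type is invented for illustration.]

    package main

    import "fmt"

    type doc struct {
        ID    [2]int            // array: copied in full by assignment
        Tags  []string          // slice: header copied, elements shared
        Attrs map[string]string // map: pointer copied, contents shared
    }

    // cloneDoc makes an independent copy by also duplicating the
    // slice and map fields, which plain assignment would share.
    func cloneDoc(d doc) doc {
        c := d // copies ID outright; Tags and Attrs are still shared
        c.Tags = append([]string(nil), d.Tags...)
        c.Attrs = make(map[string]string, len(d.Attrs))
        for k, v := range d.Attrs {
            c.Attrs[k] = v
        }
        return c
    }

    func main() {
        orig := doc{
            ID:    [2]int{1, 2},
            Tags:  []string{"precise", "trusty"},
            Attrs: map[string]string{"provider": "local"},
        }

        shallow := orig // plain assignment
        shallow.Tags[0] = "mutated"
        fmt.Println(orig.Tags[0]) // "mutated": the slice was shared

        orig.Tags[0] = "precise"
        deep := cloneDoc(orig)
        deep.Tags[0] = "mutated"
        fmt.Println(orig.Tags[0]) // "precise": the deep copy is independent
    }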