[01:12] <axw> hey thumper, have a good week away?
[01:12] <thumper> axw: yeah, I did
[01:12] <thumper> quite relaxing
[01:13] <axw> cool
[01:13] <axw> good week for it ;)
[01:13] <axw> it was a little bit crazy last week, but all turned out well
[01:14] <thumper> cool
[01:15] <thumper> I'm trying to get the team's prioritized list
[01:15] <thumper> so we can focus on what to do next
[01:15] <thumper> I'm finishing off the kvm support
[01:17] <axw> thumper: I'm pretty close to having synchronous bootstrap done I think; I've got some updates to do per jam's suggestion of toning down the output
[01:17]  * thumper nods
[01:17] <thumper> cool
[01:17] <axw> thumper: waiting on some feedback from fwereade regarding state changes for API-based destroy-environment
[01:17] <axw> when that's done I can finish off destroy-env for the manual provider
[01:17] <thumper> that'll be good to finish off
[01:18] <thumper> axw: I think it would be great to create an "ask ubuntu" question around "how do I use juju with digital ocean", and answer yourself demonstrating the manual bootstrap / manual provisioning
[01:18] <axw> thumper: I also looked into manual provisioning into non-manual env. There's one particularly difficult bit, which is that agents/units/etc. address each other with private addresses, which isn't likely to work with an external machine
[01:19] <thumper> hmm...
[01:19]  * thumper nods
[01:19] <axw> thumper: SGTM, tho Mark Baker (I think it was him) wanted me to get in touch before publicising widely
[01:19] <thumper> axw: what about crazy shit like running a sshuttle
[01:19] <thumper> on the bootstrap node
[01:19] <axw> yeah something like that may be necessary
[01:20] <axw> tho I'm starting to think that it's a lost cause, and we should just handle it with cross-environment
[01:20] <axw> haven't gotten too deep into it yet
[01:24] <thumper> hmm...
[02:06] <axw> wallyworld_: you can just do "defer os.Chdir(cwd)" - no need to change it now though
[02:06] <wallyworld_> ah bollock
[02:06] <wallyworld_> s
[02:07] <wallyworld_> the tests didn't break regardless, just thought i'd be complete
[02:07] <wallyworld_> i'll fix as a driveby
[02:09] <axw> cool
[03:30] <wallyworld_> thumper: initial framework https://code.launchpad.net/~wallyworld/juju-core/ssh-keys-plugin/+merge/197310
[03:34] <thumper> wallyworld_: ack
[03:39]  * thumper nips to the supermarket for dinner stuff...
[04:19] <jam1> axw: "agents address each other with private addresses", in the case of AWS I *think* we use the DNS name, which appropriately resolves private/public based on where you make the request. But I'll agree that you don't guarantee routing. Assuming a flat network for agent communication probably isn't going to change this cycle, but we may get there eventually.
[04:21] <axw> jam1: it may not be a difficult problem to solve, but I first came up against this when the agents were deployed with stateserver/apiserver set to the private address
[04:21] <axw> that was on EC2
[04:26] <jam1> axw: yeah, code has been changing in that area. It would appear something is resolving that address and passing around the IP, rather than the DNS name.
[04:28] <axw> jam: I don't think it was the IP - it was an internal DNS name
[04:28] <axw> couldn't be resolved outside EC2
[04:28] <axw> bbs, getting lunch
[05:15] <axw> jam: still working out the kinks, but synchronous bootstrap will look something like this in the near future: http://paste.ubuntu.com/6507781/
[06:04] <jam> axw_: looks pretty good.
[06:06] <axw> the apt stuff should be at the beginning obviously :)
[07:11] <dimitern> morning
[08:43] <rogpeppe> mornin' all
[09:15] <jam> jamespage: poke about mongo/mongoexport/mongodump/etc
[09:15] <jamespage> jam: morning
[09:16] <jam> jamespage: I hope you had a good weekned
[09:16] <jam> weekend
[09:16] <jamespage> yes thanks - how was yours?
[09:16] <jam> pretty good. It was Thanksgiving into UAE National Day, and Expo 2020 celebration, so the weekend gaps were all a bit silly. My son will end up with a 5-day weekend
[09:18] <jamespage> jam: nice
[09:18] <jam> jamespage: so I responded to your questions about what tools we need to include. I'd like to know more from your end what the costs of it are so that we can find a good balance point.
[09:19] <jam> morning fwereade, I had a question for you when you're available
[09:19] <fwereade> jam, heyhey
[09:19] <jam> fwereade: namely, the 14.04 priorities/schedule stuff
[09:20] <fwereade> jam, I couldn't see anything by mramm, so I just copied them straight into the old schedule doc
[09:20] <fwereade> jam, https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0At5cjYKYHu9odDJTenFhOGE2OE16SERZajE5XzZlRVE&usp=drive_web#gid=2
[09:21] <jam> fwereade: yay, I was hoping they would end up there
[09:25] <jamespage> jam: just reading
[09:25] <jam> k
[09:26] <jamespage> jam: OK - I see what you are saying - I'll respond on list so everyone can see
[09:55] <axw> fwereade: re https://codereview.appspot.com/28880043/diff/20001/state/state_test.go#newcode567, there's code in AddService/AddMachine* to assert environment is alive
[09:55] <axw> and accompanying tests
[09:55] <axw> actually that is the test - maybe I misunderstood you
[09:56] <fwereade> axw, yeah -- but I didn't see a test for what happens if the env was alive as the txn was being calculated, but became not-alive before the txn was applied
[09:57] <fwereade> axw, look in export_test.go -- in particular, SetBeforeHook, I think -- for a mechanism that lets you change state at that point
[09:58] <fwereade> axw, there are a few other tests that use it... but not many, the capability is relatively recent
[09:58] <axw> fwereade: thanks. just so I understand- you mean to check what happens between the initial environment.Life() check, and when the transaction is executed?
[09:59] <fwereade> axw, yeah, exactly
[10:00] <axw> okey dokey, I will add that in
[10:00] <fwereade> axw, cheers
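[Editor's note: a hedged Go sketch of the race fwereade describes — the environment is alive when the transaction is calculated but dies before it is applied. All names here (`Env`, `AddService`, `beforeTxnHook`) are illustrative stand-ins, not juju-core's actual state API; the hook plays the role of state's SetBeforeHook test mechanism.]

```go
package main

import "errors"

// ErrNotAlive is returned when the environment dies between the
// initial liveness check and the txn being applied.
var ErrNotAlive = errors.New("environment is no longer alive")

// Env is a stand-in for the environment document; Alive flips to
// false when the environment is destroyed.
type Env struct{ Alive bool }

// beforeTxnHook mimics SetBeforeHook: a test can install a function
// that runs after the precondition check but before the txn assert,
// changing state at exactly the racy moment.
var beforeTxnHook func(*Env)

// AddService checks liveness, runs the hook, then re-checks liveness
// the way a txn assert would, so a mid-flight destroy is caught.
func AddService(e *Env) error {
	if !e.Alive { // initial check, like environment.Life()
		return ErrNotAlive
	}
	if beforeTxnHook != nil {
		beforeTxnHook(e) // test-injected state change
	}
	if !e.Alive { // txn assert fails: env died in between
		return ErrNotAlive
	}
	return nil
}
```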
[10:00] <axw> fwereade: and this one: https://codereview.appspot.com/28880043/diff/40001/state/environ.go#newcode84  -- I thought I was following your earlier advice ("I kinda hate this, but we're stuck with it without a schemachange for a total-service-refcount somewhere..."), but I suppose I've misunderstood you
[10:05] <axw> jam: I'd appreciate another look at https://codereview.appspot.com/30190043/ if you have some time to spare later
[10:05] <fwereade> axw, indeed, I was a bit worried I might have misunderstood myself there
[10:05] <axw> heh :)
[10:06] <fwereade> axw, I don't *think* I ever said that you'd have to remove everything before an env could be destroyed, but I might have said something else so unclearly that the distinction was minimal
[10:06] <fwereade> axw, it's always been manual machines that are the real problem
[10:07] <axw> indeed
[10:07] <axw> they are not much like what's preexisting
[10:07] <fwereade> axw, yeah
[10:07] <fwereade> axw, I think we're massaging them into shape though
[10:07] <axw> fwereade: how about I just drop that comment and stick with the cleanup. did you have any thoughts on cleaning up machine docs?
[10:08] <axw> i.e. destroying machines as environment->dying
[10:08] <fwereade> axw, I feel that bit should be reasonably short-circuitable
[10:09] <fwereade> axw, ie just kill all the instances and be done with it
[10:09] <fwereade> axw, the only reason not to is, again, the manual machines
[10:09] <axw> yeah, except for manual machines
[10:09] <fwereade> jinx ;p
[10:10] <axw> fwereade: so, one option is to have it schedule a new cleanup for machines that *doesn't* imply --force
[10:10] <axw> and that will wait for all units to clean up (as a result of services being cleaned up)
[10:10] <fwereade> axw, so yeah I could maybe be convinced that *if* there are manual machines we should destroy all the others
[10:10] <fwereade> axw, well, there's not actually much point waiting
[10:11] <fwereade> axw, I am starting to think that actually destroy-machine should always act like --force anyway
[10:11] <axw> fwereade: I was just thinking about leaving units in a weird state, but maybe we don't care?
[10:11] <axw> doesn't matter for non-manual of course
[10:11] <fwereade> axw, if you don't want a machine, you don't want it, and if the whole thing's being decommissioned there's no point caring about the units in a weird state
[10:12] <fwereade> axw, yeah, doesn't apply to manual machines ;p
[10:12] <axw> yeah... except if you want to reuse that machine
[10:12] <axw> heh
[10:12] <axw> ok, I'll play with that some more tomorrow
[10:13] <fwereade> axw, destroy-machine implies no reuse, I think
[10:13] <fwereade> axw, whereas manual... often implies reuse
[10:14] <axw> fwereade: at worst, users can destroy-machine the ones they care about
[10:14] <axw> then wait for things to clean up
[10:14] <axw> then do destroy-env
[10:14] <fwereade> jam, seeding a thought: can we get rudimentary versions into the API by 1.18? can chat about it more after standup
[10:14] <axw> assuming destroy-machine doesn't change to always --force
[10:14] <fwereade> axw, yeah, indeed
[10:15] <fwereade> axw, manual machines again are the reason not to always --force
[10:15] <fwereade> blasted things ;)
[10:15]  * fwereade has to pop out a mo, bbs
[10:15]  * axw has to go
[10:15] <axw> adios
[10:15] <jam> have a good evening axw
[10:17] <axw> thanks jam
[10:47] <jam> TheMue: rogpeppe, mgz, standup? https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig
[10:48] <jam> wallyworld_: ^^
[10:48] <TheMue> jam: i'm already in the qa timeout each day
[10:49] <jam> TheMue: absolutely, just didn't know if you were coming to ours, so you're welcome if you want
[10:49] <TheMue> jam: so i think it's better to take part on thursdays, there's more than status
[10:49] <TheMue> jam: right now i'm in an interesting testing, maybe tomorrow
[10:52] <jam> TheMue: have a good day, then.
[10:54] <TheMue> jam: thank, u2
[10:57] <TheMue> jam: short info about current status, built a "Tailer" for filtered tailing of any ReadSeeker and writing to a Writer. looks good, right now testing it.
[10:57] <TheMue> jam: it's part of the debug-log api
[11:22]  * TheMue => lunch
[13:00] <natefinch> jam: I can try reaching out to the github.com/canonical guy through LinkedIn... we're like 3rd degree relations. Seems like he keeps it up to date there.
[13:00] <jam> natefinch: go for it. I'll forward you the email I tried sending earlier
[13:00] <natefinch> jam: cool
[13:25]  * rogpeppe3 goes for lunch
[14:15] <sinzui> jam: I still see my blocking bug stuck in triaged: https://launchpad.net/juju-core/+milestone/1.16.4 Is bug 1253643 a duplicate of bug 1252469?
[14:15] <_mup_> Bug #1253643: juju destroy-machine is incompatible in trunk vs 1.16 <compatibility> <regression> <juju-core:Fix Committed by jameinel> <juju-core 1.16:Fix Committed by jameinel> <https://launchpad.net/bugs/1253643>
[14:15] <_mup_> Bug #1252469: API incompatability: ERROR no such request "DestroyMachines" on Client <terminate-machine> <juju-core:Triaged> <https://launchpad.net/bugs/1252469>
[14:16] <jam> sinzui: that gets bumped to 1.16.5 because we moved the DestroyMachines code out of 1.16.4
[14:16] <jam> I'll fix it
[14:16] <sinzui> jam, then the bug is closed :) I will start motions for the release of 1.16.4
[14:17] <jam> sinzui: right, the *key* thing is NEC is already using something (1.16.4.1) which we really want to make 1.16.4, and then make the next release 1.16.5
[14:17] <jam> I forgot to check the milestone page, as another of the bugs gets bumped as well
[14:17] <sinzui> uhg
[14:18] <jam> sinzui: did we end up re-landing bug #1227952 ?
[14:18] <_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:Fix Committed by dave-cheney> <juju-core 1.16:Fix Committed by sinzui> <https://launchpad.net/bugs/1227952>
[14:19] <sinzui> relanding? I don't know. I can check the version
[14:19] <sinzui> in deps
[14:20] <jam> sinzui: I pivoted to remove everything in lp:juju-core/1.16 back to 1.16.3 so that we could land the things that were critical to NEC, and then get the non-critical stuff out in the next one
[14:20] <jam> I think bug #1227952 needs to target 1.16.5
[14:20] <_mup_> Bug #1227952: juju get give a "panic: index out of range" error <regression> <goyaml:Fix Committed by dave-cheney> <juju-core:Fix Committed by dave-cheney> <juju-core 1.16:Fix Committed by sinzui> <https://launchpad.net/bugs/1227952>
[14:21] <sinzui> jam, ah, the missing info from the Wednesday conversation. okay. A fine strategy
[14:21] <jam> sinzui: as soon as you give me the go ahead I have lp:~jameinel/juju-core/preparation-for-1.16.5 to bring back all the stuff
[14:22] <sinzui> understood
[15:51] <benji> marcoceppi: It is a work-in-progress, but I would like your initial impressions of lp:~benji/+junk/prooflib-first-cut
[15:51] <marcoceppi> benji: I thought proof was already free of all sys.exit calls already?
[15:52] <benji> marcoceppi: I haven't looked for sys.exit specifically yet, so far I have concentrated on project structure
[15:52] <marcoceppi> benji: why separate this from charm-tools?
[15:53] <marcoceppi> I'm confused as to the goal
[15:53] <marcoceppi> benji: care to jump on a hangout?
[15:53] <benji> the goal is to create a library that third-parties can consume
[15:53] <benji> I have a call in a couple of minutes, but I can do it after that (say 30 minutes or so from now)
[15:54] <marcoceppi> benji: I'm confused as to why charm-tools can't be that library?
[15:54] <marcoceppi> sounds good!
[15:58] <mattyw> stupid question: I'm trying to query the mongodb that gets started when I run make check - but I can't work out where to get the right creds to start a mongo shell connection
[16:05] <rogpeppe3> mattyw: what's "make check"?
[16:05] <rogpeppe> mgz: ping
[16:14] <mattyw> rogpeppe, running all the tests
[17:10] <mgz> rogpeppe: pong sorry, not paying attention
[17:11] <rogpeppe> mgz: np, am currently in a call
[17:11] <mgz> yell after if you still need me
[17:11] <mgz> will be around at least another couple of hours
[17:49] <rogpeppe> niemeyer: ping
[17:49] <niemeyer> rogpeppe: pongus
[17:50] <rogpeppe> niemeyer: i wondered if you might have a moment to join us in a hangout - i'm trying to resolve some issues after restoring a mongodb
[17:51] <niemeyer> Sure thing
[17:51] <niemeyer> Link?
[17:51] <rogpeppe> niemeyer: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig?authuser=1
[18:03] <natefinch> Fixing tests with <-time.After(x)  ...bad or truly terrible?
[18:05] <hatch> Hey all, is there a whitelist of characters for service names in core? I need to set up proper validation in the GUI https://bugs.launchpad.net/juju-gui/+bug/1252578
[18:05] <_mup_> Bug #1252578: GUI shows invalid service name as valid when deploying ghost <juju-gui:Triaged> <https://launchpad.net/bugs/1252578>
[18:10] <hatch> found the regex
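[Editor's note: a sketch of the service-name validation hatch went looking for. The exact pattern is an assumption on my part — the authoritative regexp lives in the juju-core source — but the rule at the time was approximately: lowercase alphanumerics separated by hyphens, starting with a letter, with each hyphenated segment containing at least one letter (so "wordpress-2" reads as a unit name, not a service name).]

```go
package main

import "regexp"

// validService approximates juju-core's service-name rule. Treat the
// exact pattern as an assumption; check the juju-core source for the
// authoritative version.
var validService = regexp.MustCompile(
	`^[a-z][a-z0-9]*(-[a-z0-9]*[a-z][a-z0-9]*)*$`)

// IsValidService reports whether name is acceptable as a service name.
func IsValidService(name string) bool {
	return validService.MatchString(name)
}
```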
[19:18] <sinzui> abentley, i am having a bad day tearing down envs. Local required manual deletions of lxc dirs and symlinks. I got two timeouts destroying azure
[19:18] <abentley> sinzui: Ugh.
[19:19] <abentley> sinzui: I am starting a test run on the new instances. (juju-ci-2 env)
[19:19] <sinzui> fab
[19:45] <abentley> sinzui: Is it possible that you recently destroyed an aws environment?
[19:46] <sinzui> abentley, in the last 15 minutes I destroyed test-aws
[19:46] <sinzui> and test-hp
[19:46] <sinzui> abentley, I don't think these can collide
[19:47] <abentley> sinzui: This log is baffling: http://162.213.35.54:8080/job/aws-upgrade-deploy/71/console
[19:47] <abentley> sinzui: It bootstraps successfully, and later reports it's not bootstrapped.
[19:47] <sinzui> hmm, maybe the control-buckets are the same.
[19:47] <abentley> sinzui: Most likely explanation is it was torn down after bootstrapping by one of us or our pet machines.
[19:48] <abentley> sinzui: Ends with yakitori?
[19:48] <sinzui> abentley, yep
[19:49] <abentley> sinzui: I think that's the explanation, then.
[19:49] <sinzui> abentley, can you instrument CI destroying aws and undermining my acceptance test?
[19:49] <sinzui> well
[19:49] <sinzui> I can teardown now and find some more japanese food
[19:50] <abentley> I didn't understand the request.  But I can certainly change the control bucket.
[19:50] <sinzui> abentley, I was wondering if you wanted to make ci destroy aws and then I would see if the env went missing
[19:51] <abentley> sinzui: Okay.
[19:51] <abentley> sinzui: Done.
[19:52] <sinzui> abentley, that was it
[19:52] <abentley> sinzui: Okay, changing control bucket here.
[19:52] <sinzui> and I will change mine too
[19:56] <abentley> sinzui: 1.16.4 on the local provider no longer ooms, but the deploy goes funny: http://162.213.35.54:8080/job/local-upgrade-deploy/75/console
[19:57] <abentley> sinzui: So the upgrade went fine, but the deploy seems to use the 1.16.3 tools instead of the 1.16.4.
[19:58] <sinzui> abentley, we could use --show-log or --debug to see more info about what was selected
[19:58] <marcoceppi> Who knows the most about the openstack provider?
[19:58] <abentley> sinzui: Okay.  I can say from the existing log that 1.16.3.1 was selected on that run.
[20:00] <natefinch> marcoceppi: I wouldn't say I know a lot about the openstack provider, but maybe I can help?
[20:01] <sinzui> abentley, looking at before the destroy step, we can see that 1.16.4.1 was selected, but I don't see any agents upgraded during the wait phase
[20:02] <abentley> sinzui: As the agents reach the expected value, they disappear from the list.  So when 0 disappears from "<local> 1.16.3.1: 1, 2, mysql/0, wordpress/0", we can infer that it was upgraded.
[20:04] <abentley> sinzui: When /var/lib/jenkins/ci-cd-scripts2/wait_for_agent_update.py exits without an error, that indicates that all the agents have upgraded.  I can change it to print that out.
[20:04] <sinzui> hmm, why are the two upgrade commands different in this log
[20:05] <sinzui> abentley, this command assume that 1.16.4 is at the location specified in the tools url: juju upgrade-juju -e local --version 1.16.4.1
[20:05] <thumper> morning
[20:06] <sinzui> ^ I think that might assume the bucket was populated from the previous effort
[20:06] <abentley> sinzui: They are for different reasons.  the first tests upgrades.
[20:06] <thumper> mramm: back in the land of stars and stripes?
[20:06] <mramm> yes
[20:06] <abentley> sinzui: The second is to force the agents to match the client.  It was intended for the case where  the agent is newer than the client.
[20:07] <sinzui> abentley, sure, but --version requires a binary at the tools url
[20:08] <abentley> sinzui: sure, but shouldn't 1.16.4 deploy its binary to the tools url?
[20:09] <sinzui> abentley, only --upload-tools will put the tool in the location. I think --version just pulls from that location
[20:09] <natefinch> morning thumper
[20:09] <thumper> o/ natefinch
[20:09] <thumper> mramm: our 1:1 has moved from 9am to 11am for me due to summer time here and not there
[20:09] <thumper> we can move it earlier if you like
[20:09] <mramm> ok
[20:10] <mramm> right now I have an interview
[20:10] <abentley> sinzui: So how does 1.16.3 get into the tools url?  Remember we're talking local provider here.
[20:10] <mramm> thumper: but I can move it forward for next week
[20:10] <thumper> mramm: sure
[20:11] <sinzui> abentley, local-bucket > streams > aws
[20:12] <abentley> sinzui: So do we need to specify a testing tools-url for local-provider then?
[20:12] <sinzui> abentley, I think so, I have in the past with limited success.
[20:12]  * sinzui ponders trying lxc with aws testing
[20:14] <abentley> sinzui: In the old environment, I have http://10.0.3.1:8040/tools
[20:18] <thumper> sinzui: fyi, we are going to want to test the local provider with lxc and kvm shortly
[20:21] <sinzui> thumper, I was planning that for 1.18, but given the topic of jamespage's reply to the 1.16.4 announcement, I think we need to quickly increment the minor numbers and release 1.18 today
[20:21]  * thumper tilts his head
[20:21] <thumper> sinzui: must have been a personal reply
[20:21] <thumper> didn't see it
[20:21] <sinzui> thumper, do you have a reply to "Adding new modes to the provisioner and new plugins feels outside the scope of stable updates to me"
[20:21] <thumper> what is the rationale there?
[20:22] <thumper> I guess that makes sense
[20:22] <thumper> are we needing a release?
[20:23] <sinzui> I learned that jam had planned an alternate 1.16.4 to mine. I made the release with jam's plan, but james thinks we should be doing a 1.18 with these changes
[20:24] <sinzui> abentley, thumper I am tempted to create a stable series and branch from 1.16 and just release stables from it with selective merges
[20:24] <abentley> sinzui: Does this shed any light? http://162.213.35.54:8080/job/local-upgrade-deploy/76/console
[20:27] <abentley> sinzui: That sounds like it could work, but I think it would be better if the dev team was doing that.  Or better yet, landing fixes to stable and merging them into devel.
[20:28] <sinzui> abentley, I am sure that strategy would avoid the porting pain we have had in the last 30 days
[20:28] <abentley> sinzui: agreed.
[20:28] <thumper> sinzui: what is the push to get the plugins out?
[20:29] <thumper> is it external need?
[20:29] <thumper> if that is the driver,
[20:29] <thumper> then +1 on a 1.18 from me
[20:29] <thumper> if it is to just get testing
[20:29] <thumper> why not 1.17?
[20:29] <thumper> we haven't done 1.17 yet have we?
[20:30] <sinzui> thumper, no way, 1.17 is really broken. we haven't seen hp-cloud work in weeks
[20:30] <thumper> really?
[20:30] <thumper> WTF?
[20:30] <thumper> broken how?
[20:30] <sinzui> thumper, we can do a dev release with 2071 from Nov 4.
[20:31] <thumper> why is it so broken?
[20:31] <thumper> shouldn't that be a "stop the line" type issue?
[20:31] <sinzui> we can't tell exactly. Charms don't deploy on it for testing
[20:31]  * thumper shakes his head
[20:31] <thumper> what about canonistack?
[20:31] <thumper> is that working?
[20:31] <sinzui> 2071 always works, 2072 always fails. canonistack passes
[20:34] <sinzui> abentley, about the log. I am still not surprised. I think 1.16.3 was pulled from aws because --upload-tools was not used.
[20:34] <thumper> the only thing that leaps to mind with that has to do with installing stuff in cloud-init leaving apt in a weird state
[20:34] <thumper> that revision is nothing openstack specific
[20:34] <thumper> so we are left to look at the differences in the actual cloud images
[20:34] <sinzui> abentley, let me see if I can get control of my local and try with aws testing location
[20:34] <thumper> sinzui: got logs?
[20:35] <sinzui> sure do
[20:35] <sinzui> thumper, https://bugs.launchpad.net/juju-core/+bug/1255242
[20:35] <_mup_> Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>
[20:37] <sinzui> thumper, abentley ^ about this bug. I wonder if HP got too fast for juju. I see similar errors on HP when I am making juju go as fast as possible. When I wait a few minutes for the services to come up before add-relation, I get a clean deployment
[20:37] <abentley> sinzui: I can change the script to always supply --upload-tools for the local provider, if that's what we want.
[20:38] <thumper> sinzui: weird
[20:38] <sinzui> abentley, I think that might be the case. let me finish my test with local + aws testing
[20:39] <abentley> sinzui: Sure.  BTW, here's an example of our automatic downgrade to 1.16.3 to match a 1.16.3 client: http://162.213.35.54:8080/job/aws-upgrade-deploy/73/console
[20:47] <sinzui> abentley, use --upload-tools for the second deploy phase. my aws testing hacks didn't help
[20:48] <abentley> sinzui: Can I also use it for the first deploy phase?  I use the same script for both deploys.
[20:49] <sinzui> abentley, if the first juju is stable and the second is proposed, then it is okay
[20:50] <abentley> sinzui: Okay, I will use it for both deploys.
[21:09] <abentley> sinzui: Okay, I applied --upload-tools to bootstrap, but for some reason, 1.16.3.1 was selected again.  Did you mean I should apply it to upgrade-juju?
[21:10] <sinzui> abentley, for the second case this is a disaster. 1.16.4 should only upload tools for itself.
[21:11] <sinzui> abentley, the test is to verify proposed stable can bootstrap itself
[21:16] <sinzui> abentley, The local upgrade and bootstrap just works for me.
[21:16] <sinzui> abentley, http://pastebin.ubuntu.com/6511274/
[21:17] <abentley> sinzui: I am not installing the deb, so that I don't have to worry about version conflicts.
[21:17] <sinzui> abentley, but is this a case where we are using the extracted juju? Since we didn't install juju, the 1.16.3 tools are all that is available
[21:17] <abentley> sinzui: Instead, I am just using the binary directly.
[21:18] <sinzui> bingo
[21:19] <sinzui> this is tricky
[21:19] <abentley> sinzui: So there's a ./usr/lib/juju-1.16.4/bin/jujud in a directory in the workspace.  Any way to convince juju to use that?
[21:19] <sinzui> GOPATH?
[21:21] <abentley> I don't know enough about what GOPATH means.  Is it for resources as well as executables ?
[21:22] <sinzui> GOPATH=./usr/lib/juju-1.16.4/ indicates where to find bins and srcs
[21:22]  * sinzui can try this now in fact
[21:27] <sinzui> abentley, I think GOPATH works are in thunderbirds
[21:27] <abentley> in thunderbirds?
[21:28] <natefinch> that's the signal!  Delta team, go go!
[21:28] <natefinch> >.>
[21:28] <natefinch> <.<
[21:29] <sinzui> abentley, "thunders are go" http://pastebin.ubuntu.com/6511330/
[21:29] <thumper> :)
[21:30]  * sinzui likes supermarionation
[21:30] <natefinch> this does not surprise me.
[21:31]  * abentley liked Team America: World Police, but hasn't seen much of the original stuff.
[21:32] <sinzui> my son has a youtube subscription to follow Captain Scarlet
[21:42] <abentley> sinzui: Did I do it wrong? http://162.213.35.54:8080/job/local-upgrade-deploy/81/console
[21:44] <natefinch> thumper: my tests for mongo HA require two thread.Sleep() equivalents (<-time.After(time.Second))... it's because I'm starting and stopping mongo servers, and the code that does it is asynchronous... so sometimes mongo hasn't finished starting yet. Thoughts? I could spend some time making the mongo start function synchronous, but it's just a test mechanism, so I'm not sure how much time to put into it.
[21:45] <thumper> natefinch: I'd suggest making the start synchronous, it shouldn't be too hard no?
[21:45] <thumper> we do this now with the upstart start method
[21:45] <thumper> so we go: start, are you started?
[21:45] <sinzui> abentley, I explicitly call the juju bin instead of PATH resolution
[21:45] <thumper> wait a bit, and ask again
[21:45] <natefinch> thumper: yeah, that's what I was thinking of doing.  Fair enough.
[21:45] <thumper> we have short attempts
[21:46] <sinzui> abentley, but I see you put the bin as the first element in PATH
[21:46] <abentley> sinzui: Also, I run 'which' to make sure I have the right one.
[21:46] <thumper> but I think making it synchronous is the most obvious thing, hide the waiting from the caller, make the tests simple to read and understand
[21:46] <thumper> I don't think you ever regret making tests better and more understandable
[21:46] <sinzui> abentley, is it possible sudo got root's PATH?
[21:46] <thumper> within reason
[21:47] <abentley> sinzui: I use -E to preserve the environment.
[21:50] <sinzui> abentley, Doesn't work for me
[21:50] <sinzui> GOPATH=~/Work/juju-core_1.16.4 PATH=~/Work/juju-core_1.16.4/bin:$PATH sudo -E juju --version
[21:50] <sinzui> 1.16.3-trusty-amd64
[21:51] <sinzui> abentley, but this works because I removed all of the historic PATH:
[21:51] <sinzui> $ GOPATH=~/Work/juju-core_1.16.4 PATH=~/Work/juju-core_1.16.4/bin juju --version
[21:51] <sinzui> 1.16.4-trusty-amd64
[21:51] <abentley> sinzui: Weird.
[21:52] <sinzui> yeah
[21:52] <thumper> -E doesn't pass in PATH
[21:52] <thumper> IIRC
[21:53] <thumper> I use "sudo $(which juju) --version"
[22:01] <thumper> mramm: I'm in the hangout... just hanging out
[22:45] <bigjools> hello.  I have a dead machine but juju still thinks it's there.  How can I remove it?  terminate-service/destroy-machine all fail to work because they seem to want to contact the agent on the dead machine.
[22:59] <davecheney> bigjools: i don't think you can at the moment
[23:00] <bigjools> davecheney: so my env is fucked?
[23:00] <davecheney> you could try building 1.17-trunk from source
[23:00] <bigjools> ok
[23:00] <davecheney> bigjools: we don't tell the customers their environment is fucked
[23:00] <davecheney> but, yes
[23:00] <bigjools> :)
[23:01] <bigjools> I think that when you see a "hook failed" message it ought to suggest running "juju resolved"
[23:01] <bigjools> someone who will remain nameless decided to use "nova delete" instead
[23:01] <davecheney> that's a paddlin'
[23:48] <wallyworld_> thumper: wrt https://codereview.appspot.com/35800044/, william had some issues. i've responded. i feel like there's value in this work. do we need to discuss?