[00:03] SGTM [00:08] davecheney: does lbox submit also send pending comments? [00:09] yes [00:09] * thumper pulls fwereade_'s latest commit and will rerun the tests before submitting [00:10] thumper, ah, sorry [00:10] fwereade_: np [00:11] fwereade_: it is only a few minutes... [00:14] hey fwereade_ did you get anywhere with the review on the maas branch? [00:14] thumper, I have endured worse test times than this... I've endured build times worse than these test times [00:15] fwereade_: so have I [00:15] I remember one bank I was at where the compile time was two hours [00:16] bigjools, I have been most remiss there -- it all looks essentially fine, but I have yet to step up and convince myself it all fits together [00:16] bigjools, I will do another pass [00:16] * bigjools had a 4 hour build once [00:16] ah wat? [00:16] * thumper sighs [00:16] lbox submit expects the prereq to be committed first [00:17] fwereade_: ok we're still making some changes to it, some as requested in the initial pass and others to actually make it work. I deployed a charm on my microservers yesterday! [00:17] half the point of a prereq is to get work reviewed independently of landing [00:17] thumper: of course [00:17] that's bollocks [00:17] bigjools, sweet! [00:17] * thumper goes to follow lbox's rules [00:17] thumper: lbox expects quite a few things that are unreasonable IMO [00:17] if you propose the chain A -> B -> C [00:18] i don't think it is unresonable to expect A to land beore B, etc [00:18] I think it is unreasonable [00:18] if C has B and A, then C will implicitly land them [00:18] what if A isn't complete? [00:18] or doesn't make sense without B? [00:18] then that isn't a prereq in the sense that lbox undestands [00:18] in which case lbox is dumb [00:19] * davecheney is not fucking arguing about the tools today [00:19] :) [00:19] careful thumper, you don't want to be accused of bitching [00:19] :P [00:20] * thumper merges before heading into town for lunch and errands [00:23] thumper: Tools: newSimpleTools("1.2.3-linux-amd64"), [00:23] this will make you mad [00:23] what series are these tools ? [00:23] no wait, they are the linux series [00:23] linux of course [00:23] :) [00:23] larakin linux [00:24] * davecheney fixes that shit [00:24] * thumper waits for lbox to do its slow merge [00:24] * thumper bitches [00:24] and done [00:24] * thumper heads to lunch === thumper is now known as thumper-afk [01:03] bigjools, ok, I think I've finished the review [01:03] bigjools, I whine about lots of things but basically I want it to land [01:05] davecheney, I don't suppose you're just putting the last finishing touches to a https://codereview.appspot.com/8545043/ review? :) [01:08] fwereade_: sorry, i was not [01:08] it looked really large [01:08] so I was selfishly noodling with my own brach [01:08] branch [01:09] davecheney, no worries -- but it's a bit less bad than it looks, honest :) [01:09] fwereade_: you -> sleep, will review after lunch [01:09] * davecheney goes back to fighting with the cloudinit checkers [01:09] map[interface{}]interface{} can bite me [01:09] * fwereade_ goes off to sleep :) [01:10] * fwereade_ solemnly advocates killing it with fire [01:22] fwereade_: thanks [01:25] not sure I will ever get used to "if" preconditions [01:25] pre-statements even [01:28] constraints_test.go:283: // A failure here indicates that goyaml bug lp:1132537 is fixed; please // delete this test and uncomment the flagged constraintsRoundtripTests. c.Assert(val.Hello, IsNil) [01:28] ... value *string = (*string)(0xc20006b4d0) [01:28] whut ? [01:29] I guess I need to merge to trunk [01:29] are there plans to introduce library versioning into Go at any point? === thumper-afk is now known as thumper [01:30] bigjools: nothing has been discussed on the list [01:30] davecheney: there was a list thread about pulling the latest goyaml [01:31] yeah, i did that [01:31] but the failure above sounds like the opposite [01:31] i've just merged trunk [01:31] i think that has fixed it [01:32] cool [01:54] juju debug-log is awesome [01:55] i can watch everything happening in an environment [02:03] shit [02:03] Proposal: https://code.launchpad.net/~dave-cheney/juju-core/113-juju-bootstrap-raring-cloudinit-II/+merge/158263 [02:03] error: Failed to send patch set to codereview: can't upload base of environs/mongo.go: ERROR: Checksum mismatch. [02:04] ^ i deleted them added this file during the branch [02:04] now it is really upset [02:04] any suggestions [02:04] huh? [02:04] you removed a file? [02:04] and it is complaining? [02:04] then I added it back [02:04] as in copied it from another branch [02:05] oh... [02:05] not so good [02:05] fuckit [02:05] it will complain about file-ids [02:05] can't you just do a reverse merge of the revision that deleted it? [02:05] and discard everything else? [02:06] i'll figure it out [02:13] screw it, just importde the diff into a new branch [02:21] davecheney: you can delete merge poposals in LP [02:21] instead of rejecting them, (if you like) [02:22] thumper: too late now [03:48] davecheney: hey... gotta sec? [03:48] having an hp problem 'error: cannot log in to admin database: auth fails' [03:50] status -v (http://paste.ubuntu.com/5697341/) shows I'm connecting, but... [04:29] m_3: sorry that environment is stillborn [04:30] grab /var/log/cloud-init-output.log from the first machine [04:30] then destroy the environment [04:30] davecheney: ok, lemme look [04:32] davecheney: that log looks normal [04:32] davecheney: and I can ssh to the box [04:33] davecheney: ubuntu@15.185.162.247... your keys are there [04:34] if you have a sec to help, I'm trying to set up a scale-test environment [04:47] nm, I'm just using that instance to test from there [05:01] davecheney: what would your first thought be if I said I wanted "environment: whatever" as the first line of `juju status` ? [05:02] personally I think it is crazy we don't have this in status somewhere [05:02] yeah, it's not even in there huh [05:07] hmm... [05:07] * thumper takes it to the list [05:07] I would just do it [05:07] but it seems that some people get hung up with backward compatability [05:07] but it would be nice to see something while it waits for bootstrap to finish... [05:08] Ha! ok, that's funny, I figured for sure that'd be in the juju-0.6 status [05:08] most of the time you're sort of filtering status output anyways [05:09] the machine ids up top are just noise normally [05:09] but yes, it absolutely needs the current JUJU_ENV imo [05:10] m_3: feel free to comment on my email message then :) [05:16] m_3: checking [05:17] m_3: 2013/04/11 04:09:32 JUJU jujud machine command failed: state entity name not found in configuration [05:17] error: state entity name not found in configuration [05:17] ok, this broke because there is no 1.9.13 tools for hp [05:18] davecheney: np... I think I worked around it [05:44] jam gz: caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers%!(EXTRA string=Resource limit exeeded at URL %s., string=https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers) [05:44] ^ is there a bug about this format string snaffu ? [05:46] * thumper off swimming, back for the meeting === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: | Bugs: 0 Critical, 52 High - https://bugs.launchpad.net/juju-core/ [05:55] mornin' all [05:56] rogpeppe: moning [05:56] morning [05:56] davecheney: yo! [05:56] davecheney: how's tricks? [05:56] good! [05:57] davecheney: i could do with another review on https://codereview.appspot.com/8626044/ if you fancy it [05:57] wrap your peepers around this, https://codereview.appspot.com/8648043/ [05:57] davecheney: swap ya! [05:57] * davecheney looks [06:06] davecheney: nice! reviewed. [06:07] rogpeppe: and to explain myself in my review of your code [06:07] i'm concerned about the magic strings [06:07] "machine-password" [06:07] not becuse they are secrets [06:07] just appear to be test warts [06:07] davecheney: they're only magic in tests [06:07] as you said on the other CL yesterday [06:08] so why not change SetPassword(string) to be ResetPassword() -> string, error [06:08] then it tells you what the password is [06:08] and we don't have to add even more fixtures [06:08] davecheney: are you talking about state.User.SetPassword there? [06:08] yes [06:09] davecheney: no can do [06:09] why not [06:09] davecheney: the whole point is that we're setting the password from the admin-secret in the environment config [06:09] ok fine [06:10] objection withdrawn [06:10] davecheney: cool [06:11] davecheney: your point about possibly leaving a blank password on the admin user is a good one, i think, *because* we don't abort the bootstrap init process if juju bootstrap-state fails [06:11] davecheney: i think we should [06:11] davecheney: i've thought for a while now that we should do "set -e" at the start of the cloudinit script [06:11] davecheney: what do you think? [06:11] if you just did AddUser("admin", $random) that would solve the problem [06:11] the user is created, but nobody knows the password [06:11] would that work ? [06:12] davecheney: i'd prefer to solve the deeper problem [06:12] davecheney: why are we even running the machine agent if the initial set up has failed? [06:12] sounds like a reasonable question [06:12] none of those cloud init steps are optional [06:12] davecheney: exactly [06:14] davecheney: i might change cloudinit to make each command do foo || error foo failed >&2 [06:14] davecheney: that way you won't have to infer the failure in the logs from the command's stderr output [06:15] doesn't cloudinit send &1 and &2 to /var/log/cloud-init-output ? [06:17] davecheney: yeah, the &2 is probably unnecessary, just habit [06:18] davecheney: maybe just set -ex actually [06:23] davecheney: does that sound reasonable to you? [06:24] niemeyer: hiya! bit early for you, ain't it? [06:24] rogpeppe: Nope.. it's actually very late :) [06:24] niemeyer: :-) [06:24] niemeyer: squalling bairn? [06:25] rogpeppe: Have been cooking a release of mgo fixing a cluster resync bug, and spent some hours on it [06:25] niemeyer: cool [06:25] Just about done now [06:25] niemeyer: the resync code is quite subtle, i thought [06:28] rogpeppe: Yeah, real world is not so friendly [06:29] rogpeppe: Servers popping in and out with their own ideas of what they should be doing is fun [06:29] niemeyer: while you're about it, it would be nice to have the redial interval configurable. we're hacking around it currently by sleeping in Dial) but i think the right place is in mgo. [06:30] rogpeppe: What's the issue? [06:30] niemeyer: by default, mongo dials about 10 times a second. [06:31] niemeyer: that's not great when you're waiting for a server to come up. [06:32] rogpeppe: Where does it do that again? [06:32] niemeyer: the logic is scattered [06:33] niemeyer: the emergent behaviour is that it dials 3 (5?) times then sleeps for a bit (0.5s ?) then repeats [06:33] rogpeppe: Ah, right [06:33] niemeyer: i've been through the logic and understood it a few times, but i always forget immediately afterwards :-) [06:34] rogpeppe: It's that retry that I always forget to count [06:34] rogpeppe: I think we could just take that out.. [06:34] rogpeppe: But, not today [06:35] niemeyer: indeed [06:35] niemeyer: our hack works ok for the time being [06:35] rogpeppe: Will keep that in mind [06:35] niemeyer: some time, an bounded exponential backoff would probably be good. [06:35] niemeyer: so we don't suffer too badly from the herd effect after a network outage. [06:38] rogpeppe: This will probably never happen [06:38] niemeyer: ok. too hard? or just not the right thing to do? [06:38] rogpeppe: A database driver is something you want connected as soon as the network is bac [06:38] k [06:39] rogpeppe: But more reasonable timings is a good idea [06:39] niemeyer: if the max bound was only a small number of seconds, it would still be ok. [06:39] rogpeppe: Sure, but it would also be fairly irrelevant I suppose [06:40] niemeyer: i'm not sure. if the network goes down, we don't really want all the clients dialling in synchrony [06:40] niemeyer: the exponential backoff (with a random element) can allow a good spread over time, i think. [06:40] rogpeppe: Perhaps.. it depends on the scale and on the application behavior [06:41] niemeyer: indeed [06:41] rogpeppe: The random element is somewhat unnecessary as well.. they're not wall-clock synchronized [06:42] rogpeppe: This would only be really effective if we allowed the backoff to actually backoff [06:42] niemeyer: if they all see the net go down at the same moment, they all redial immediately and they're all sleeping for the same amount of time, i suspect they'll keep reasonably clustered. [06:42] rogpeppe: Like, spanning through several tens of seconds [06:42] niemeyer: the only random element i'd add would be the first sleep interval. [06:43] 2013/04/11 06:43:16 ERROR JUJU:jujud:machine cmd/jujud: upgrader loaded invalid initial environment configuration: required environment variable not set for credentials attribute: User [06:43] any ideas ? [06:44] rogpeppe: I guess I could put something intelligent based on the number of clients [06:44] rogpeppe: and have the backoff take that into account [06:44] niemeyer: that sounds like a great idea if you have that info [06:44] rogpeppe: Yeah, I think it's around in the server [06:45] davecheney: interesting [06:45] Either way, that's the kind of bug I'd love to get someone complaining about :-) [06:45] niemeyer: lol [06:45] rogpeppe: environment still appears to work [06:45] this is in HP [06:45] we don't need that attr in ec2 [06:45] davecheney: ah. [06:45] davecheney: yeah, i wondered [06:45] Most people that use mgo heavily do so on the basis of a few machines [06:45] millions of requests a days, but just a few servers [06:45] i wonder if this is part of our 'don't upload the secrets' logic [06:46] niemeyer: well, we are going to move in that direction [06:46] davecheney: Heya [06:46] davecheney: it's possible. is this with the openstack driver? [06:46] * davecheney waves [06:46] rogpeppe: Yeah, I have my fingers crossed to have a Critical filed! ;) [06:47] niemeyer: so far the API seems to be working quite well [06:47] niemeyer: although none of the agents are using yet [06:47] Alright, I really need to sleep now, or I won't wake up in time for the meeting in a few hours [06:47] it yet [06:47] niemeyer: sleeeeeeep.... [06:47] have a good Beginning Of Day folks [06:47] niemeyer: you too [06:47] niemeyer: well, a good sleep anyway :-) [06:47] Thanks :) [06:48] https://bugs.launchpad.net/juju-core/+bug/1167723 [06:48] <_mup_> Bug #1167723: environs/openstack: error relating to upgrader on bootstrap node < https://launchpad.net/bugs/1167723 > [07:38] jeeze, i'm not surprised some of the live tests fail. having started an instance, it didn't see that instance when asking for it, even when waiting a whole 10 seconds later. [07:38] how the f*!@# can we make this stuff really reliable? [07:41] rogpeppe: increase timeouts and wait more? [07:41] dimitern: i'm just trying with a 20s timeout [07:41] dimitern: but really, if 20s why not 5 minutes? [07:41] rogpeppe: :) seriously? [07:41] dimitern: i've no idea [07:42] dimitern: i don't know what's happening inside amazon [07:42] rogpeppe: would it affect local live/other tests? [07:42] dimitern: yeah. [07:43] rogpeppe: btw a short and quick review? https://codereview.appspot.com/8630043/ [07:43] dimitern: any time you're expecting an error, you'd need to time out the max amount [07:43] dimitern: looking [07:46] dimitern: i just saw the same thing happen even when waiting for 20 seconds [07:47] dimitern: i guess i should check our logic again. although it succeeds sometimes, it's possible we're getting something wrong somehow. [07:51] rogpeppe: is there any way to log the communication between the client and ec2? [07:51] rogpeppe: sgtm [07:51] TheMue: yeah, we can do that [07:53] rogpeppe: fine, maybe we just interpret the feedback wrong (or they changed a tiny bit in the protocol that's now fools our logic). [07:53] TheMue: it's possible, but slightly unlikely, as our logic does work... some of the time! [07:55] rogpeppe: that's what i meant, the difference between "working" and "done" if you only look at "done" but never interpret "working" (w/o knowing the protocol) [07:56] rogpeppe: me knowing the protocol [07:56] TheMue: i'm pretty sure it's sending exactly the same request each time [07:56] TheMue: (i'm just checking that) [07:57] rogpeppe: the client is sending the request, but i'm more interested in ec2s answer [07:57] TheMue: yeah [07:58] can someone send me the g+ link for the meeting? [07:58] https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.gdt9rkp5uspih9n3db6b95kccc [08:02] ha! found the bug! [08:02] TheMue: our own logic *was* screwed [08:08] mramm, ping [08:08] mramm: bleep bleep bleep :-) === ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: - | Bugs: 0 Critical, 53 High - https://bugs.launchpad.net/juju-core/ [08:23] rogpeppe: I have an issue with m.SetAgentAlive() - returns no error, I can see the pinger starting in the log, but pwatcher.Alive(m.globalKey()) returns false (that's what m.AgentAlive() returns), even after a timeout of 5s or using m.WaitAgentAlive(), still fails - any clues? [08:23] dimitern: you'll have to debug it i'm afraid [08:24] rogpeppe: bugger.. ok, I though I might be missing something obvious [08:24] dimitern: check out the pinger tests [08:48] rogpeppe: i'm getting this every time I run the machiner tests: [LOG] 20.79113 NOTICE juju: authorization error while connecting to state server; retrying [08:48] rogpeppe: and then it reconnects and continues [08:48] rogpeppe: could it be an indication why the pwatcher is not working right? [08:53] rogpeppe: I found it! so adding mr.st.StartSync() in the MA code (not tests) after calling SetAgentAlive() the tests pass [09:05] why did i ever have to call this? i though it was for tests only [09:10] hello all [09:10] I am very sorry I missed the meeting! [09:12] mramm: hiya [09:13] mramm: it went well, we even took notes: https://docs.google.com/a/canonical.com/document/d/1tKtAcrz9ADGF-o6lQtmGKYRf2ZHIR7lb-WctXvcLnfg/edit [09:13] I had the alarm set for the meeting, but it was set an hour late! [09:13] I am reading the notes now. [09:21] dimitern, it the machiner using the same *State as the tests? [09:22] dimitern, you shouldn't have to call StartSync usually, I don't think [09:22] fwereade_: probably not, because the tests use the jujuConn one [09:22] fwereade_: I had StartSync in both, but adding it the MA code and removing it from the test didn't cause failures, so I left it at that and submitted [09:23] dimitern, sorry, I don;t think that's ok [09:23] fwereade_: i'm open to suggestions [09:24] fwereade_: why do you think that? [09:24] dimitern, ok, does SetAgentAlive work in practice? it when you do juju status against a live environment, does it work? [09:25] dimitern, because Sync and StartSync are purely for the convenience of tests AFAI am aware [09:25] fwereade_: in order to test this, I need to finish the status CL, then I'll test it live [09:26] dimitern, hmm, is this the tests for the actual machiner, or for the machine agent? [09:26] fwereade_: machiner [09:27] So, from the notes it seems to me like we ought to call this next release 1.10.0 and that we should try to do a release as soon as it is possible to do the python+go thing [09:27] it will have some bugs, which we should focus on fixing for the next few days, and release 1.10.1 early next week [09:28] mramm, +1 [09:28] fwereade_: sorry, I keep mixing "task" and "agent", even though now I know the difference :) [09:28] dimitern, don't worry it still bugs me [09:28] dimitern, but ok I think I know what the problem is, just a mo [09:29] this weekend should be the aim for python+go release testing [09:30] the python code is in review for including in raring, and the go part should be ready to go after that [09:30] My long term proposal would be that we do a "stable" release at the end of the month every month, and then switch to an unstable number for dev versions for weekly releases within that month. [09:31] mgz: that sounds reasonable [09:31] mramm: +1 [09:32] we should then try to come up with some good ways to test that the 1.10.x to 1.12.x release goes smoothly. [09:33] dimitern, ok, the problem is that StartSync needs to be called after SetAgentAlive to get a timely response, but that the test doesn't have an obvious way of knowing when that's happened [09:33] From the notes: "John: It will be parallel installable and easy to remove; they can’t easily bootstrap" [09:33] what's the second part mean? [09:34] Also, can't we do this for them: "We need to make it clear that they need to set up a completely fresh directory structure for juju2" [09:34] dimitern, I think the right thing to do is to SS in the test every 50ms while waiting for WaitAgentAlive in another goroutine and sending the result out into the same select you're using for the SSs [09:37] fwereade_: I tried that as well, didn't work [09:38] fwereade_: oh, wait - I didn't try a separate goroutine [09:39] fwereade_: let me just finish the status CL and I'll test it live as is first [09:41] dimitern, ok cool [09:41] * dimitern machine needs a restart I think, bbiab [09:58] fwereade_: progressing steadily with our provider feature branch — thanks again for the reviews. We have some things that are unclear though. Could you comment on https://code.launchpad.net/~maas-maintainers/juju-core/maas-provider-skeleton/+merge/157025 and specifically the latest comments by Raphaël? [09:58] jtv, sure [09:58] Thanks [10:14] jtv, I think I've covered it, let me know if I missed something [10:14] jtv, awesome that you got a deploy working :) [10:15] fwereade_: thanks for your reviews. [10:15] fwereade_: looking now... Thank rvba for getting the deployment running! [10:15] rvba, thanks for the deployment! [10:15] ;) [10:16] rvba, let me know if anything needs clarification [10:16] interesting, i just saw a real-world failure caused by a temporary provisioning failure: [10:16] 2013/04/11 09:54:08 ERROR JUJU:jujud:machine worker/provisioner: cannot start instance for machine "1": cannot run instances: The service is unavailable. Please try again shortly. (Unavailable) [10:17] * rogpeppe has finally submitted those ec2 expenses [10:17] * rogpeppe hopes it's not too late [10:18] f@$%! time zone issues with both g calendar and thunderbird [10:19] fwereade_: so the problem is not only with tests [10:20] rogpeppe, cool [10:20] dimitern, oh yes? [10:20] fwereade_: w/o SS in M's loop(), I'm not even getting AgentAlive() -> true, nil right after calling SetAgentAlive() with no error [10:20] dimitern, that's expected [10:21] dimitern, the MA doesn't need to detect it's made itself alive [10:21] fwereade_: or, after using WaitAgentAlive for 5s [10:21] fwereade_: i'm confused now [10:21] fwereade_: i think that just having the pinger around should cause AgentAlive to return true, no? [10:21] dimitern, are you sure about the presence timeout being 5s? [10:22] rogpeppe, hmm, ok, that is an interesting point [10:22] fwereade_: I gave it 5s [10:22] dimitern, but where are you asking AgentAlive()? [10:22] dimitern, sorry, I mean the underlying pwatcher refresh frequency [10:23] fwereade_: I had the following (all combinations tried): SAA() -> no error, WAA(5s) -> timeout, AA() -> false, nil (after SAA()) - that's all without SS after SSA() [10:23] dimitern, when and where are you making these calls, in response to what? :) [10:23] fwereade_: with SSA() + SS() all works [10:23] dimitern, maybe we should g+ [10:24] * fwereade_ starts one [10:24] fwereade_: in the beginning of the loop,before settings status [10:24] fwereade_: sure [10:30] dimitern: if what you're saying is true, surely the presence tests would fail? [10:34] rogpeppe: I got it working, just a sec [10:47] davecheney: oh, almost missed it. happy birthday. [10:49] TheMue: thanks mate [10:51] davecheney: yw, 3h left. and hey, you should party, not being here online ;) [10:53] TheMue: party on the weekend [10:54] davecheney: i would like to join, sadly got no time. *cough* *cough* [10:57] * TheMue steps out for lunch, biab [11:03] fwereade_: while i'm on to this, how about doing the same with the "down" state for units: "status: info" (regardless of the status, not only on error, and include info only when != "")? [11:04] fwereade_: agent-state-info, that is [11:24] fwereade_: anyway, PTAL https://codereview.appspot.com/8561046/ [11:25] dimitern: what was the problem BTW? [11:25] rogpeppe: not really sure, but the difference between state.Sync() and StartSync() made the difference [11:25] rogpeppe: that a look, if you want ^^ [11:28] dimitern: are you saying it worked with StartSync but not with Sync? [11:31] rogpeppe: i'm saying the other way around [11:31] dimitern: but in the CL you're using StartSync, no? [11:32] mgz: poke for standup [11:32] dimitern: it shouldn't make any difference, given that you're doing WaitAgentAlive anyway [11:32] rogpeppe: in the machiner CL yes, but now in the one above I changed it to how it should be (not to use StartSync() in the code, just call Sync() in the test, and it passed) [11:33] dimitern: oh, you were calling Sync in the *code*... i see [11:33] rogpeppe: :) yeah [11:33] rogpeppe: i'm testing it live on ec2 now and it seems to work fine [11:33] dimitern: but i was just looking at the one above, and it calls StartSync not Sync [11:34] dimitern: but that's fine - it shouldn't make a difference [11:34] allenap: is mgz with you? [11:34] rogpeppe: are you sure you're looking at the right one? https://codereview.appspot.com/8561046/diff/11001/worker/machiner/machiner_test.go [11:35] dimitern: ah, i was looking at status_test [11:35] dimitern, LGTM [11:36] fwereade_: cheers, I'll make the changes and submit it soon [11:36] fwereade_: and then i'll need some direction on the deuglifying the logs [11:37] dimitern, it's in the card [11:37] dimitern, drop the JUJU: badge throughout, and drop all package name prefixes from "main" packages (leave others intact) [11:37] fwereade_: ah, ok - so only main packages [11:37] dimitern: so when you used StartSync in the machiner test, it failed? [11:37] fwereade_: I though it was to generally reduce the JUJU strings [11:38] rogpeppe: yeah [11:38] dimitern: hmm, that seems fragile. [11:38] dimitern, this gives us reasonably sensible output as feedback without sacrificing logging clarity, I think [11:38] rogpeppe, I forget why but AgentAlive tests all seem to use sync [11:38] rogpeppe: I reckon because Sync() is synchronous, while StartSync() is not [11:39] rogpeppe, possibly I never knew why, IIRC gustavo did the whole presence system [11:39] dimitern: i know that, but even synchronous just means "delivered *somewhere*" [11:39] dimitern: not necessarily acted on [11:39] dimitern: so if it was racy with StartSync, it's probably racy with Sync [11:40] dimitern: i'll have a quick check in the presence code [11:40] rogpeppe: cheers [11:44] it works live, just tested [11:46] fwereade_: what do you mean by "parenthesize the infos" ? [11:46] fwereade_: agent-state-info: "(somestatus: someinfo)" ? [11:47] dimitern, agent-state: down; agent-state-info: (started) [11:47] fwereade_: right [11:54] dimitern: i see why Sync is different from StartSync in this case. i think it's justifiable. [11:55] * rogpeppe doesn't like the way the watcher code doesn't behave synchronously on receipt of a message, even though it easily could. [11:55] in my opinion, this is a hack: [11:55] w.next = time.After(0) [11:57] rogpeppe: ha! so it returns a channel that's available to read on immediately [11:57] dimitern: yeah. or it sets the timer channel to that anyway. [12:00] fwereade_: i was thinking just yesterday that there really wasn't much value from having two status types [12:00] rogpeppe: it definitely seems like a hack [12:00] dimitern: it's just to save writing a function. [12:00] dimitern: that would be called from two places [12:02] rogpeppe, I'd be +1 on unifying them if you are [12:02] rogpeppe: nasty :) [12:02] rogpeppe, will simplify state a bit too [12:02] fwereade_: definitely. [12:03] dimitern, would you tack a branch to do that onto the end of your status work then please? or integrate it if you prefer [12:03] rogpeppe, fwereade_: just to be clear here, we're talking about having just the same status type, rather than separate MachineStatus and UnitStatus? [12:03] fwereade_: const StatusError Status = "error" ? [12:03] rogpeppe, dimitern: yep [12:04] fwereade_: it'll simplify some things, but not too much I think [12:04] why do those darn live tests keep passing when i want them to fail?! [12:04] dimitern: the Status functions become a little simpler [12:05] dimitern: no type conversion required [12:05] rogpeppe: and the status command as well [12:05] dimitern: definitely - that being the motivating example in this case [12:05] rogpeppe: we can have back the processStatus(status, info string) for both machines and units [12:05] dimitern: +1 [12:06] aren't we missing something? it's easy to do this now, since all the places that use them are easy to find and change, but won't we lose some value out of it? [12:10] fwereade_, rogpeppe ^^ ? [12:10] dimitern: i don't think so [12:10] dimitern: but YMMV [12:10] if not, I can add a card for it and do it later today [12:10] dimitern, describe the negative consequences from your POV [12:10] fwereade_: not sure I see any :) [12:10] dimitern: we're all agreed then :-) [12:10] dimitern, then let's do it :) [12:11] cool, card added and assigned to myself [12:12] * dimitern lunch [12:21] * fwereade_ lunch too [12:22] jam: sorry, lunch [12:24] mgz: sure sure, I hope that sandwhich was worth your position... dun dun duuuunnnn [12:25] :D [12:25] sausage roll special, so it may well have been [12:37] jam: Those sausage rolls really are quite special. [12:37] jamespage has just finished his [12:37] hmmmm [12:37] * jamespage yom yom yom [13:03] dimitern, fwereade_: trivial but important: https://codereview.appspot.com/8661043/ [13:16] * dimitern is back [13:17] rogpeppe: looking [13:21] rogpeppe: LGTM [13:21] dimitern: thanks [13:21] rogpeppe: I found a bug in rietveld - if you double click the send button it sends the mail twice :) [13:22] dimitern: nice === wedgwood_away is now known as wedgwood [13:47] fwereade_: could I have your opinion on this: https://code.launchpad.net/~rvb/juju-core/providers-import-fwreade-1/+merge/158369 ? Maybe you'll have a better idea on where to put the import statements. [13:49] rvba, hmm, why would maas be importing testing? [13:49] rvba, oh, wait, you have all the tests in-package [13:50] rvba, that would maybe be the problem? [13:50] fwereade_: hum, having the tests in the same package allows us to monkey patch things easily. [13:50] rvba, hmm, wait [13:50] rbva, why does cmd/ import environs/all? [13:51] rbva, individual commands can and should, but I don't think the command-utility package should know about them [13:51] fwereade_: well, if we want all the providers registered, it needs to happen in cmd/ right? [13:52] rvba, do it in the individual command packages -- cmd/juju, cmd/jujud [13:52] rvba, and only the ones that actually need it [13:52] * fwereade_ thinks that should do it [13:53] fwereade_: indeed, that will probably fix the import loop as well… [13:53] fwereade_: ta [13:53] rvba, fwiw I ardently support the detailed testing you have in place in maas [13:54] rvba, but I would really prefer those tests which can be external to be external [13:54] rvba, are you familiar with export_test? [13:54] fwereade_: we have a card for that, we will look in it. [13:54] fwereade_: no [13:54] rvba, ok, files ending in "_test.go" are only compiled when running tests [13:55] rvba, this means that you can take internal functions and define them in a file called export_test.go, but still in the package rather than the test package [13:55] rvba, and anything exported in there will be visible to the tests and only the tests [13:55] rvba, this is useful, and allows for monkey-patching just fine [13:56] rvba, but it has a sting in the tail [13:56] fwereade_: indeed, looks exactly like what we need. [13:56] rvba, go test ./... does not guarantee that the tested code will build when _test.go files are not compiled in [13:56] rvba, so you should always check go build ./... as well [13:57] fwereade_: all right [13:58] rogpeppe: was it a problem if state imports state/api/params? I want to make statusDoc have the status as params.Status, rather than string [13:58] can somebody send me a link to the kanban g+? [13:58] dimitern: no, that's the main point of params [13:59] dimitern, https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083 [13:59] dimitern: when the agents aren't using state directly, i think params should be able to be merged into state/api [13:59] rogpeppe: I can simplify the status ops then, getting the fields directly, instead of a doc [13:59] fwereade_: cheers [13:59] dimitern: i'd use a doc [14:00] rogpeppe: cool then [14:11] do we test juju-core with gccgo at all? [14:46] * dimitern wonders: params.Started or params.StatusStarted is better? [14:46] * TheMue has to find out why his microphone doesn't work. Already changed the browsers, but that doesn't seem to be the reason. [14:47] fwereade_, rogpeppe: thoughts? ^^ [14:47] dimitern, StatusStarted please [14:47] +1 [14:47] fwereade_: yeah, I was leaning towards that [14:48] mgz, not that I'm aware [14:48] TheMue: i wonder if all the noise we were hearing from you was from your mike going wrong [14:48] mgz: i've never used gccgo [14:48] mgz: it would be great to have a try [14:48] mgz: i'm not sure how well cgo works, but i suppose it *should* work even better... [14:49] rogpeppe: yeah, maybe, i'm checking right now [14:50] TheMue: if you're using an hdmi external monitor it's worth checking the volume control to see it's using the correct input - i have this problem with skype after g+ every time (not the other way round though) [14:52] dimitern: i'm just using my macbook and using the mic for dictation works fine [14:53] dimitern: would you pass me your skype address for a test? [14:55] TheMue: sure: ultramarine_crystal [14:57] dimitern: Thx, contacted you. [15:10] rogpeppe, TheMue: btw, did my defence of inline test arrays find any favour? [15:10] fwereade_: I can live with it :) [15:11] fwereade_: i'm still not keen [15:11] fwereade_: i don't think the naming is a problem (just name it after the test) [15:11] rogpeppe, apart from the level of indentation [15:11] rogpeppe, the naming really is a problem that exists and has the drawbacks I listed [15:11] fwereade_: and i like knowing that all the data is uninfluenced by the stuff that comes into the function [15:12] fwereade_: i know that it's got to pass through the logic in the code, but i can scan that easily [15:12] rogpeppe, the global test arrays have way worse sanity/stability guarantees than the scoped ones [15:12] fwereade_: really? [15:12] rogpeppe, yes [15:13] rogpeppe, they're modifiable from anywhere, and they can be used in multiple test methods [15:13] rogpeppe, if they're not I cannot think of any reason to use a wider scope than necessary [15:13] rogpeppe, apart from, as you, said, indentation ;) [15:14] rogpeppe, I'm willing to pay a tab to banish spooky action at a distance [15:14] fwereade_: due to consistency i personally prefer the tests outside the function, but it's no big reason [15:14] fwereade_: i don't really mind if they're modifiable from anywhere - it's easy to see if anywhere else has a reference to them [15:15] rogpeppe, abuse of global test arrays tripped danilos up yesterday :) [15:15] fwereade_: he won't do it again :-) [15:15] rogpeppe, he didn;t do it [15:15] rogpeppe, someone left a landmine for him [15:15] fwereade_: honestly, i like to see the name of the test next to the logic of the test [15:15] fwereade_: not separated by 200 lines [15:16] fwereade_: or 500 in at least one case [15:16] rogpeppe, but the names only sometimes match the tests [15:17] fwereade_: we should make them match always [15:17] fwereade_: that's the convention i've been trying to follow [15:17] rogpeppe, if there are two suites that have TestFoo, you can't have two fooTests~s [15:17] rogpeppe, so you either relax that or you say, ok, let's have really redundant names everywhere [15:17] fwereade_: the moment you have a collision, you get told [15:18] fwereade_: i don't really see the issue [15:18] fwereade_: i'm interested to know about the issue danilos had yesterday though [15:18] rogpeppe, it's a constant low-level irritation and a source of nothing but hassle [15:19] * rogpeppe doesn't find it so [15:19] rogpeppe, someone had made some test depend on otherTestArray[12] [15:19] fwereade_: oh jeeze [15:19] fwereade_: that was straight-up crack [15:19] rogpeppe, clearly we can;t be trusted with sharp tools ;p [15:19] fwereade_: but shared test data can be very useful [15:19] rogpeppe, bzr diff -r1133..1134 :) [15:19] rogpeppe, sure -- but it should be differentiated from that which is tightly scoped [15:20] rogpeppe, globals are bad, mmkay? [15:20] rogpeppe, basically, there was an array of test configs, and then one of them was overridden in a couple of tests later, and it was referenced as array[12] [15:20] danilos: yeah, i remember now [15:20] fwereade_: i don't think we've got anywhere else like that though [15:20] fwereade_: if i'd seen it i'd have called it out [15:20] i don't agree that globals are bad [15:21] rogpeppe, every global carries a cognitive load [15:21] fwereade_: i like static data [15:21] rogpeppe, you may be better at bearing that load than I am [15:21] fwereade_: i think that a large potentially dynamic array carries its own cognitive load [15:21] rogpeppe, how are the global vars not dynamic? [15:21] fwereade_: "is there somewhere in this data that changes at runtime?" [15:22] rogpeppe, yeah, how do you guarantee that for the global ones? [15:22] fwereade_: it's trivially easy to verify that they're only referred to once [15:22] rogpeppe, instead of reading the test, you have to read the whole package [15:22] fwereade_: grep fooTests *.go [15:22] rogpeppe, every time you see one, you should check? [15:22] rogpeppe, or maybe you shouldn;t *have* to [15:23] fwereade_, +1 [15:23] fwereade_: i think it's easier to check for global references than dynamic data [15:23] rogpeppe, perhaps there's some case you're concerned about that I'm not seeing [15:23] fwereade_, rogpeppe: I'd definitely agree with the notion that test input and expectations should be collocated, ideally in their respective tests [15:24] fwereade_: i can't think of any other issues we've had that relate to this issue, other than occasionally needing to rename a global variable [15:24] fwereade_: and i really think that being able to see the test function as a whole is important [15:24] rogpeppe, the test code is the last block [15:25] rogpeppe, the definition is the first block [15:25] rogpeppe, the tests separate the two [15:26] rogpeppe, putting them all in one places tightens the scope, and the only thing you lose is having the name of the test next to the logic... with the benefit of no longer having an extra global var to worry about [15:27] * rogpeppe tries to find the CL where this came up [15:30] fwereade_: still not keen. i think the second version here is generally easier to read. https://codereview.appspot.com/8604043/diff2/2001:18001/environs/tools/list_test.go [15:30] fwereade_: it emphasises the difference between the static and moving parts [15:31] rogpeppe, I dunno, to me the version I landed is strictly worse [15:32] fwereade_: in the second version, i can read the logic of TestMatch in a glance in the new version. in the old one, the bulky test data interferes. [15:34] rogpeppe, but the actual context you need is still way off up at the top of the definition, it seems like a straight tradeoff readability-wise... one use becomes easier, another harder [15:34] fwereade_: in a way, both the test data and the function can be read independently, and i like it that way [15:35] rogpeppe, if it were a simple matter of readability I wouldn't care, it's just the wanton spamming of globals to no clear benefit... you could separate the test definitions by creating a local var, if you really wanted to [15:36] fwereade_: the test method definition is already global for most intents [15:37] * rogpeppe guesses he perhaps has less of a problem with globals than other people. [15:37] rogpeppe, it's a method on a type... that's not global in the "global vars are bad" sense [15:38] rogpeppe: can you help me with an obscure failing megawatcher_internal_test ? [15:38] fwereade_: i really don't see the problem. you read it once. you don't change it. i have *never* had an issue with this. [15:38] dimitern: sure [15:39] fwereade_: and i don't wanna lose that indent either - test data often struggles with line length. [15:39] rogpeppe, every global variable is a vector for spooky action at a distance within your code [15:39] fwereade_: only if someone else has a reference to it [15:39] rogpeppe: this is the error http://paste.ubuntu.com/5698822/, and the only changes to the code from trunk are params.Unit/Machine* (status) -> params.Status* [15:39] fwereade_: and in this kind of case, it's trivial to check that noone does. [15:40] rogpeppe, every single time? [15:40] rogpeppe: it seems fishy somehow - in setUp the unit is added and set to started, but it's seen as in error state [15:41] rogpeppe: I tried getting the status right after the set, asserting it's the same, calling Sync() or StartSync() before or after SetStatus(), calling SetStatus() twice (!) - all without problems, and the same failure [15:41] rogpeppe, every global variable costs a grep per developer per change to the source code ;p [15:42] rogpeppe, assuming nobody ever abuses them and causes the devlopers to actually think about it [15:42] rogpeppe: it seems the change it sees is for a different unit [15:42] fwereade_: you only write the tests once. they're read 100 times. [15:42] so, here is my summary of the versioning situation: [15:42] https://docs.google.com/a/canonical.com/document/d/1XvVvVls3ka0z_JahAKl9CHVvT1WAMvE1BTmpKdezlQc/edit [15:42] dimitern: is this trunk, or your branch? [15:43] rogpeppe, no argument there [15:43] rogpeppe, anyway, I'll take it to the lists [15:43] I will send something like that to the list in an hour or so, so if you have feedback let me know between now and then [15:44] rogpeppe: my branch, but as i said the only changes are replacing params.(Unit|Machine)(\w+) with params.Status\1 [15:44] dimitern: ah, i know the problem [15:44] dimitern: you've changed the statusDoc definition, yes? [15:44] rogpeppe: and the types of the Status field in Unit/Machine info to params.Status, instead of params.Unit/MachineStatus [15:45] rogpeppe: yes? [15:45] dimitern: hmm, actually, no. [15:45] dimitern: could you push your branch and i'll take a look [15:46] rogpeppe: sure, that's the only thing left before proposing [15:47] rogpeppe: lp:~dimitern/juju-core/031-unify-machine-unit-status-types [15:48] dimitern: looking [15:51] dimitern: ha, i see your problem! [15:52] dimitern: you broke backingStatus.updated [15:52] mramm: great summary, thanks - only two comments: 1) "...the command line interface, the mongo..." -> "...the command line interface or output, mongo..."; 2) "...of these three api's..." -> "...of these three things(?)..." (only the last one is an API) [15:52] rogpeppe: oh? how? [15:52] dimitern: case doesn't work like in C [15:52] well they are effectively API's [15:52] since people program against them [15:53] mramm: just suggestions :) [15:53] sure [15:53] dimitern: you need to revert to the earlier version and just remove the type conversions. [15:53] dimitern: (earlier version of that function, that is) [15:54] rogpeppe: can you point me to the location please? [15:54] dimitern: line 216 [15:54] dimitern: what is that case supposed to be doing? [15:55] dimitern: I will clarify the API bit [15:55] rogpeppe: line 216 is var allWatcherChangedTests [15:55] dimitern: megawatcher.go:216 [15:55] dimitern: the tests are still fine [15:55] dimitern: you just broke the code itself :-) [15:56] dimitern: so it was right that the tests failed... [15:56] rogpeppe: ah, well, since they're essentially the same I wanted to unify them [15:56] dimitern: you can't [15:56] dimitern: unless... [15:56] rogpeppe: ewww.. ok, I'll revert it back [15:56] dimitern: you kinda could, but it wouldn't be worth it [15:57] rogpeppe: isn't that supposed to handle both cases like this, or I'm missing something? [15:57] dimitern: you can't have a variable that's two types at once unless it's an interface [15:57] dimitern: no [15:57] dimitern: cases don't fall through in Go [15:57] dimitern: (that's why we don't put a "break" after each one [15:57] rogpeppe: oh! they're defined there, right! [15:57] dimitern: you can separate possible values with commas though [15:58] rogpeppe: how about case *params.UnitInfo, *params.MachineInfo: ? [15:58] dimitern: that's valid syntax [15:58] dimitern: but wouldn't work [15:58] dimitern: because then the type of info would be the type of info0 [15:58] rogpeppe: I see now, ok, sorry about the noise - i'll revert them [15:58] [16:57:17] dimitern: you can't have a variable that's two types at once unless it's an interface [15:59] dimitern: if you defined a setStatus function on the two types, then you could unify the cases [15:59] dimitern: but you'd need a generic way of making a copy too [16:00] rogpeppe: I don't want to make this otherwise almost trivial CL into a more complicated one [16:00] rogpeppe: agreed? [16:00] dimitern: indeed. i don't believe it's worth it. [16:01] dimitern: i had two cases there before. you can have two cases there still... [16:02] rogpeppe: sure [16:02] rogpeppe: although before they looked a bit more different [16:02] dimitern: agreed [16:02] dimitern: but they're still generating quite different assembly code [16:02] rogpeppe: after a fresh bootstrap I'm getting "error: no reachable servers" when trying "juju status" [16:03] benji: your bootstrap has probably failed for some reason [16:03] benji: try juju status --debug; that will show you the bootstrap node's address; then ssh to that node and see what /var/log/cloud-init-output.log has in it [16:04] rogpeppe: indeed - easy to overlook that though [16:04] benji: i always like to know what failure modes are around [16:05] rogpeppe: will do (it will take a second because I have destroyed the environment to test something, I'll re-bootstrap and dig around) [16:05] fwereade_, rogpeppe: unified status for machines/units: https://codereview.appspot.com/8667043/ - mostly mechanical replacing [16:05] benji: does it happen every time? [16:05] rogpeppe: for the last 3 or so times it has [16:05] benji: ah, ok, that's "good" :-) [16:06] ooh, a new error when bootstrapping: error: cannot save state: cannot write file "provider-state" to control bucket: remote error: handshake failure [16:06] benji: i have a recommendation for that one [16:07] benji: if you cd to $GOPATH/src/launchpad.net/goamz, what does bzr revno tell you? [16:07] rogpeppe: 35 [16:08] benji: hmm, darn [16:09] benji: oh i see [16:09] benji: goamz/s3 has auto retry logic for Get but not Put [16:09] benji: that's an unfortunate transient error [16:09] so I should rebootstrap [16:09] benji: yeah [16:09] k [16:10] benji: i'll try to propose a change to goamz/s3 to help [16:17] rogpeppe: error: cannot save state: cannot write file "provider-state" to control bucket: read tcp 207.171.189.80:443: connection reset by peer [16:18] benji: sounds like a similar issue [16:18] benji: s3 is horribly flaky [16:18] benji: try again... [16:18] will do [16:21] niemeyer: ping [16:21] rogpeppe: pong [16:21] niemeyer: i'd like to fix s3 so it's resilient in the face of transient Put errors [16:21] niemeyer: i have a proposal i'd like to run by you [16:21] niemeyer: it goes something like this: http://paste.ubuntu.com/5698946/ [16:21] rogpeppe: I have some ideas on how to do that already, but they require changing the API, which I think it's necessary either way [16:22] rogpeppe: There are other things to be fixed in that API, and I'd like to do that all at once [16:22] niemeyer: ok. could we apply this band-aid in the meantime. (see benji's woes above) [16:22] ? [16:23] rogpeppe: Can't we apply a band-aid in juju-core instead? [16:23] niemeyer: ok. i'd hoped to avoid duplicating the shouldRetry logic, but i guess not. [16:23] rogpeppe: I'd rather not introduce yet another way to put a file just to break the API soon [16:23] niemeyer: i haven't introduced anything new [16:24] niemeyer: just changed PutReader so if you *happen* to give it a ReadSeeker, it'll retry [16:24] rogpeppe: Even worse.. that's breaking the API [16:25] niemeyer: not badly. there's no guarantee of how much data PutReader will read. [16:25] niemeyer: though i suppose you might give it a reader already half-way through. [16:25] niemeyer: hmm, yeah [16:25] niemeyer: will band-aid in juju [16:26] rogpeppe: Thanks.. I'm hoping it won't be so long until we fix this [16:26] rogpeppe: The new API will look a lot better, and enable other things that are simply impossible ATM [16:26] * rogpeppe lives in ope [16:26] hope [16:26] niemeyer: great [16:27] niemeyer: it would be nice to see CopyObject too, i just wanted it today [16:27] rogpeppe: Hah, yeah, I thought about that after your message [16:27] rogpeppe: Actually, I should answer it [16:27] niemeyer: which message? [16:28] Done [16:28] rogpeppe: "A few issues" [16:29] niemeyer: ah yes! [16:29] niemeyer: that must've been what prompted me to go looking for it - i'd forgotten! [16:29] rogpeppe: I thougth about the copying too, but scratched the reply because it's S3-specific [16:30] rogpeppe: The suggestion I just gave isn't, though [16:30] niemeyer: we've already unified the binaries [16:31] rogpeppe: So it's a single one? [16:31] niemeyer: yup [16:31] rogpeppe: Brilliant! [16:31] niemeyer: still takes a minute to upload though [16:31] rogpeppe: Wow.. really? I don't think it takes a minute even for me [16:31] niemeyer: shitty rural internet bandwidth [16:32] rogpeppe: Ah, okay [16:32] niemeyer: the ADSL is very "A" [16:33] :-) [16:38] fwereade_: reviewed [16:39] rogpeppe: https://codereview.appspot.com/8667043/? [16:39] dimitern: looking [16:43] rogpeppe: https://pastebin.canonical.com/88970/ [16:44] dimitern: reviewed [16:44] benji: looking [16:44] thanks [16:44] rogpeppe: cheers [16:45] benji: ha, looks like a problem with dave cheney's new mongo db packaging branch [16:46] who here knows about dpkg issues? [16:46] mgz: ^ [16:47] see line 308 and after on benji's paste above: https://pastebin.canonical.com/88970/ [16:47] looking [16:48] benji: is this in ec2? [16:48] yep [16:50] benji: this was with --upload-tools? [16:50] rogpeppe: yep; this is the command I ran: juju bootstrap --fake-series precise --upload-tools [16:50] benji: and you're running on quantal? [16:51] probably borked in quantal [16:51] I'd expect that to work in raring [16:51] mgz: that what i'm thinking [16:51] oh rats [16:51] back to the tarball we go [16:51] I don't think I have a quantal vm up right now, but can spin one up quickly [16:53] benji: there's a (sigh) easy fix for you just to get things working. change the CurrentSeries function in the version package to just return "precise" [16:53] has any of the backport work for mongo actually happened yet? [16:53] mgz: what backport work? [16:53] rogpeppe: ok, let me give that a try [16:55] well, this is never going to work till we have mongo 2.2 with ssl in something usable by the pre-raring series [16:56] mgz: it works in precise, i think [16:57] we have a the ppa... [17:00] I guess the issue i the boost versioning somehow [17:02] * rogpeppe has never used boost but loves it anyway [17:02] rogpeppe: I get this error when bootstrapping with the hard-coded precise: error: cannot find tools: no compatible tools found [17:02] should I still be using --fake-series precise and --upload-tools [17:03] benji: what does juju version print? [17:03] rogpeppe: 1.9.14-quantal-amd64 [17:03] benji: looks like you're not using the version you just changed [17:04] hmm, I did a "go build -a launchpad.net/juju-core/..." is there some other rebuild command I should have used? [17:04] benji: go install launchpad.net/juju-core/... [17:04] benji: go build just checks; it doesn't affect anything [17:05] running [17:05] ok, we have "1.9.14-precise-amd64" now, re-bootstrapping [17:07] rogpeppe: "juju bootstrap --fake-series precise --upload-tools" still gives me "error: cannot find tools: no compatible tools found" [17:07] benji: darn [17:08] benji: try adding --debug to that. what does it print? [17:09] rogpeppe: http://paste.ubuntu.com/5699072/ [17:10] benji: line 6 is weird [17:10] yep [17:10] benji: could you paste your changed version of version.go, please? [17:11] rogpeppe: here's the diff http://paste.ubuntu.com/5699080/ Do you want the whole file too? [17:12] benji: no, that's fine [17:12] benji: erm [17:12] * rogpeppe is slightly baffled [17:14] benji: ah! [17:14] benji: you need to change default-series in your environments.yaml too [17:15] rogpeppe: trying [17:15] fwereade_: looks like the upload-tools logic has been broken [17:16] fwereade_: upload-tools should override default-series, i think [17:18] it is looking better [17:24] rogpeppe: all working; much thanks for your help [17:25] benji: np. it illuminated a bug, so it's all for the best. [17:25] pwd [17:53] last CL for today, trivial: https://codereview.appspot.com/8674043/ [17:53] niemeyer: you'll like that :) ^^ [17:54] dimitern: Woohay! [17:56] rogpeppe, fwereade_: any one of you wanna take a look as well? ^^ [17:58] i just took a look at a single machine-0.log file [17:58] the environment had been booted for a couple of hours [17:58] it's 23MB [17:58] 148643 lines [17:58] rogpeppe: that's with debug on? [17:58] dimitern: yeah [17:59] rogpeppe: yeah.. not that surprising [17:59] dimitern: 4.3MB even with debug off though [17:59] dimitern: (53098 lines) [17:59] dimitern: that's not really that sustainable [18:00] rogpeppe: we need logrotate + zipping [18:00] dimitern: total time elapsed since log file started: 2h23m [18:01] rogpeppe: the logs are full of duplicated stuff, so should compress well [18:01] dimitern: with debug on, 1.2MB compressed; with debug off, 177K [18:02] rogpeppe: that's not that bad [18:02] rogpeppe: although... for 24h that's like 2MB compressed [18:02] dimitern: exactly [18:03] rogpeppe: we can implement some smart off loading to the bucket + logrotate and compression [18:04] * rogpeppe is glad he has a command line interface that can easily cope with a 23MB history [18:04] :) [18:05] rogpeppe: btw have a look at the log deuglifying CL when you have 2m [18:06] dimitern: reviewed [18:06] rogpeppe: the card said only do the main packages for now [18:06] rogpeppe: thanks [18:06] dimitern: i'm pretty sure it was intended to take JUJU out too [18:06] fwereade_: ping [18:06] dimitern: let me check [18:08] dimitern: from william's email: [18:08] > 1) Drop package badging from log calls in "main" packages 2) Drop [18:08] > the JUJU:... badging across the board [18:08] rogpeppe: ah, you're right the card says both, but in an earlier talk with fwereade_ he said not to do the JUJU stuff for now, IIRC [18:09] i'll leave it hanging for now then, until we resolve what to do exactly [18:09] dimitern: oh go on, please do :-) [18:09] rogpeppe: I want to, but not today :) [18:09] dimitern: ok then [18:09] rogpeppe: i have some meatballs and beer waiting [18:10] dimitern: time for one more? https://codereview.appspot.com/8545045 [18:10] rogpeppe: will look in a bit [18:14] dimitern: hmm, the SetProvisioned logic has broken this environment [18:14] rogpeppe: oh? [18:15] dimitern: the machine agent is repeatedly doing this: http://paste.ubuntu.com/5699268/ [18:16] rogpeppe: do you have more context where the instance was started? [18:18] dimitern: it was started just there [18:18] rogpeppe: oh, sorry, yes [18:18] rogpeppe: but how about the machiner log? [18:18] dimitern: here's the first stuff we hear from the provisioner: http://paste.ubuntu.com/5699287/ [18:19] rogpeppe: what does status report? [18:19] rogpeppe: reviewed btw [18:20] dimitern: http://paste.ubuntu.com/5699292/ [18:20] rogpeppe: yeah.. as expected [18:20] dimitern: there are two machines because when the first one failed, i removed the service and tried to remove the machine [18:20] dimitern: why can't it set the instance id when there's nothing reported by status for inst id ? [18:20] rogpeppe: really weird though.. I tested this both with tests and with live instances, several times - no problems [18:21] dimitern: how does it judge "already set"? [18:22] rogpeppe: Assert: append(isAliveDoc, notSetYet...), [18:22] rogpeppe: notSetYet := D{{"instanceid", ""}, {"nonce", ""}} [18:23] dimitern, I did not intend to say we should keep the JUJU badging [18:24] dimitern, nobody in the whole world likes the JUJU badging AFAIK ;) [18:24] I believe the characters JUJU should appear whereever JUJU possible [18:25] mgz, that's JUJU crazy talk [18:27] rogpeppe: are you sure the tools the env was bootstrapped with include all my latest CLs? [18:28] dimitern, I would hope that each one of them would work in order ;p [18:28] rogpeppe: I can't see the agent-state being set in status, and where set it says "running", which is wrong, it should be started [18:28] rogpeppe: I removed that case [18:28] dimitern: i'm not sure [18:28] fwereade_: so is it good like this? [18:28] dimitern: but would that impact this bug? [18:28] rogpeppe: checking.. [18:29] dimitern: i just retrieved the value of the Machine: [18:29] &state.Machine{st:(*state.State)(0xc200335d10), doc:state.machineDoc{Id:"1", Nonce:"", Series:"precise", InstanceId:"", Principals:[]string{}, Life:1, Tools:(*state.Tools)(nil), TxnRevno:4, Jobs:[]state.MachineJob{1}, PasswordHash:""}, annotator:state.annotator{globalKey:"m#1", tag:"machine-1", st:(*state.State)(0xc200335d10)}} [18:30] rogpeppe, it's not possible the cli tools have a funny version, is it? [18:30] dimitern, we need to lose the JUJU badging [18:30] dimitern, I'm sorry, I clearly hideously miscommunicated [18:30] fwereade_: ok, so I leave it WIP for now and finish it tomorrow [18:31] dimitern, sgtm [18:31] dimitern, wipped [18:31] rogpeppe: this confirms it - it's using tools from before the nonce was generated [18:31] rogpeppe: probably even before startinstance was respecting the passed nonce [18:32] dimitern: but... [18:32] dimitern: i just tried calling SetProvisioned directly from my client connection [18:32] dimitern: and it failed saying "already set" [18:32] dimitern: even though InstanceId and Nonce are both empty [18:32] rogpeppe: there was a lurking bug in there, initially, which I fixed afterwards [18:34] dimitern: i still see that issue with the latest version of trunk [18:34] dimitern: that is, this code: http://paste.ubuntu.com/5699342/ [18:34] rogpeppe: ok then, that's good, because I don't [18:34] dimitern: produces this output: [18:35] &state.Machine{st:(*state.State)(0xc20032cdc0), doc:state.machineDoc{Id:"2", Nonce:"", Series:"precise", InstanceId:"", Principals:[]string{"buildbot-master/1"}, Life:0, Tools:(*state.Tools)(nil), TxnRevno:2, Jobs:[]state.MachineJob{1}, PasswordHash:""}, annotator:state.annotator{globalKey:"m#2", tag:"machine-2", st:(*state.State)(0xc20032cdc0)}} [18:35] 2013/04/11 19:33:57 set prov: cannot set instance id of machine "2": already set [18:35] dimitern: i'm not sure how that could happen, regardless of what's out there in the cloud [18:36] rogpeppe: file a bug then please, I'll dig into it tomorrow, if I can reproduce it [18:36] dimitern: i have the environment online now... [18:37] dimitern: i can leave it until the morning if you like [18:37] rogpeppe: can I access it? [18:37] rogpeppe: so I can debug the code in place? [18:37] dimitern: hmm, let me think [18:40] rogpeppe: actually, can you try adding some logging into SetProvisioned [18:40] dimitern: sure [18:40] dimitern: i just sent you a PM on canonical IRC [18:40] rogpeppe: log the exact error on Run [18:41] rogpeppe: ok, I'll try it now [18:41] dimitern: it must be ErrAborted [18:41] rogpeppe: ok, let's think aloud [18:41] rogpeppe: indeed it has to be, otherwise it'll be caught and reported earlier [18:42] dimitern: yup [18:42] rogpeppe: this means either assert failed, and since we're checking for alive before that, it has to be the other assert, right? [18:43] rogpeppe: no other case that I can see, AFAIU state/mgo transactions [18:43] dimitern: i assume the composition for AND conjunction works, but i don't *know* [18:44] niemeyer: ping [18:44] dimitern: Heya [18:44] niemeyer: hey, can you please take a look at this code: http://paste.ubuntu.com/5699363/ [18:44] dimitern: i've gotta go in a few moments [18:45] dimitern: Sure.. what should I be looking for? [18:45] niemeyer: and reading a bit further up the log, tell me if i'm correct [18:45] dimitern: Can you be a bit more specific? [18:45] niemeyer: so we're seeing "already set" error being reported from this method [18:45] niemeyer: and in state both instanceid and nonce are empty for that machine [18:46] dimitern: ok, so without the asserts it did succeed. [18:46] niemeyer: so the asserts should work fine [18:46] niemeyer: but somehow it aborts - can it abort for something other than a failed assert? [18:46] dimitern: No [18:47] dimitern: ah... [18:47] dimitern: Are you sure these values are present and empty? [18:47] dimitern: i think i know what's going on [18:47] rogpeppe: without both or without only notSetYet? [18:47] niemeyer: that's the issue [18:47] rogpeppe: Cool, bingo [18:47] dimitern: "not set" != "empty" [18:48] niemeyer: rogpeppe connected to the state server and extracted the machine: &state.Machine{st*state.State)(0xc20032cdc0), doctate.machineDoc{Id:"2", Nonce:"", Series:"precise", InstanceId:"", Principals]string{"buildbot-master/1"}, Life:0, Tools*state.Tools)(nil), TxnRevno:2, Jobs]state.MachineJob{1}, PasswordHash:""}, annotatortate.annotator{globalKey:"m#2", tag:"machine-2", st*state.State)(0xc20032cdc0)}} [18:48] what? [18:48] dimitern: so it is a compat issue after all [18:48] dimitern: i haveta go [18:48] see you tomorrow [18:48] rogpeppe: see you [18:48] niemeyer: can you explain please, because i didn't get it [18:49] niemeyer: the doc is there, so they should be set to empty, right? [18:49] dimitern: MongoDB documents may have an empty field ({"nonce": ""}), and it may also have a non-existent field ({}). Those aren't the same thing. [18:50] niemeyer: is mongo ignoring empty string fields when you insert a doc? [18:50] dimitern: No [18:50] dimitern: But you can do that in several ways from code [18:50] niemeyer: because the code that adds the machine is .. [18:51] niemeyer: state.addMachine, which inserts a machine doc, setting only Id and Life [18:52] niemeyer: others, by the virtue of being uninitialized string fields, should be set to empty string, no? [18:53] niemeyer: no, actually the code is like this: http://paste.ubuntu.com/5699387/ and they should be set explicitly [18:54] dimitern: Just load the document from the database before running code that is failing [18:54] dimitern: and print it [18:54] dimitern: Into a map [18:55] niemeyer: good idea, but how? [18:55] var m map[string]interface{} [18:55] err := collection.FindId(id).One(&m) [18:56] niemeyer: no, I mean I just do st.machines.FindId(m.Id()).One(&map) ? [18:56] if err != nil { return err } [18:56] ah, ok, 10x [18:56] dimitern: I think that's what I've just said, yeah :) [18:56] niemeyer: indeed, thanks for the help [18:56] dimitern: np, let me know what it shows [19:06] niemeyer: unfortunately, I cannot access it, I tried, but it's rogpeppe's environment and despite his comment the aws keys shouldn't matter, they do - i cannot use mine [19:07] dimitern: Well, they don't matter much at least [19:07] dimitern: Do you have ssh access to it? [19:07] niemeyer: ah, let me try that [19:08] niemeyer: same problem - perm denied [19:08] dimitern: Sure, but you have the whole data at your hand [19:08] niemeyer: it's not a shared account or anything [19:08] dimitern: Just connect to the database with mongo and do the same query [19:08] dimitern: mongo localhost: [19:08] dimitern: use juju [19:09] niemeyer: I can't access the mongo there in rog's environment [19:09] dimitern: db.machines.find({_id: }) [19:09] dimitern: Hmm.. why? [19:09] niemeyer: ssh is not working (my key is different) [19:09] dimitern: Oh, okay.. huh [19:09] dimitern: How come Roger assumed you could access it? [19:09] niemeyer: :) probably he's tired [19:10] dimitern: It's a bit of a weird idea if you have no keys whatsoever :-) [19:10] exactly :) [19:10] anyway, I'm tired too, so have a good evening all! [19:10] dimitern: Either way.. Roger said "that's the issue" [19:11] dimitern: So I assume he checked it [19:11] niemeyer: yeah, hope he remembers :) [19:12] niemeyer: thanks again, if the issue is reproducible tomorrow, I'll try what you suggested [19:12] dimitern: indeed :) [19:12] dimitern: np.. I'm pretty sure it's an issue with the document [19:12] dimitern: the code path for such a trivial assertion was exercised enough, I'd hope [19:12] niemeyer: yeah, mongo keeps surprising me here and there [19:13] dimitern: What kind of surprise did you have so far? [19:13] niemeyer: syntax mostly - it's not always trivial to translate from mongo docs into D{{}} things [19:14] dimitern: Hmm [19:14] dimitern: It's actually 1-to-1.. !? [19:14] niemeyer: probably, but haven't got the hang of it yet - still try to find similar examples in the code and adapt [19:15] dimitern: It's really 1-to-1 [19:15] niemeyer: the nested {{}} and sometimes []D{{}} are not helping :) but i'm learning [19:15] dimitern: yeah, the visuals may get confusing [19:15] dimitern: Note that this is an optimization [19:15] dimitern: For non-important code paths, you can use maps, which look a lot better [19:15] niemeyer: how? [19:16] dimitern: {"foo": "bar"} in the mongo shell is M{"foo": "bar"} [19:16] dimitern: So the overhead is a single char :) [19:16] niemeyer: and M is map[string]interface{} ? [19:16] dimitern: assuming a M is bson.M or you own map[string]interface{} [19:16] dimitern: yeah [19:16] dimitern: You can define your local type whenever you feel like it [19:17] niemeyer: this sheds more light on it, actually [19:17] dimitern: type m map[string]interface{}.. m{"foo": "bar"} [19:17] niemeyer: yeah, i did that in some places, esp. nested maps like config attrs [19:17] dimitern: there's zero support for the bson.M type, specifically [19:17] dimitern: It's just a map [19:18] niemeyer: it's not bad, it's just confusing at first to see the go equivalent and parse it visually [19:19] niemeyer: but I agree it's the shortest workaround possible in go probably [19:20] dimitern: Right.. we need at least one char there [19:20] dimitern: Which doesn't feel so bad :) [19:22] niemeyer: indeed [21:17] morning [21:18] hey thumper [22:16] thumper, heyhey [22:16] hi fwereade_ [22:16] thumper, can you think of any reason to upgrade juju with --upload-tools *without* bumping the build number? [22:17] fwereade_: I've not looked into it too deeply, but my first thought was, no, bumping the build number sounds essential with upload-tools [22:17] the only time you wouldn't [22:17] is if you have updated the version number yourself since your last upload [22:18] thumper, *and* there are no tools in the bucket with a matching m.m.p that need to be superseded [22:19] thumper, cool, thanks [22:19] np === wedgwood is now known as wedgwood_away [23:58] mramm: ping