thumperdavecheney: does lbox submit also send pending comments?00:08
* thumper pulls fwereade_'s latest commit and will rerun the tests before submitting00:09
fwereade_thumper, ah, sorry00:10
thumperfwereade_: np00:10
thumperfwereade_: it is only a few minutes...00:11
bigjoolshey fwereade_ did you get anywhere with the review on the maas branch?00:14
fwereade_thumper, I have endured worse test times than this... I've endured build times worse than these test times00:14
thumperfwereade_: so have I00:15
thumperI remember one bank I was at where the compile time was two hours00:15
fwereade_bigjools, I have been most remiss there -- it all looks essentially fine, but I have yet to step up and convince myself it all fits together00:16
fwereade_bigjools, I will do another pass00:16
* bigjools had a 4 hour build once00:16
thumperah wat?00:16
* thumper sighs00:16
thumperlbox submit expects the prereq to be committed first00:16
bigjoolsfwereade_: ok we're still making some changes to it, some as requested in the initial pass and others to actually make it work. I deployed a charm on my microservers yesterday!00:17
thumperhalf the point of a prereq is to get work reviewed independently of landing00:17
davecheneythumper: of course00:17
thumperthat's bollocks00:17
fwereade_bigjools, sweet!00:17
* thumper goes to follow lbox's rules00:17
bigjoolsthumper: lbox expects quite a few things that are unreasonable IMO00:17
davecheneyif you propose the chain A -> B -> C00:17
davecheneyi don't think it is unresonable to expect A to land beore B, etc00:18
thumperI think it is unreasonable00:18
bigjoolsif C has B and A, then C will implicitly land them00:18
thumperwhat if A isn't complete?00:18
thumperor doesn't make sense without B?00:18
davecheneythen that isn't a prereq in the sense that lbox undestands00:18
thumperin which case lbox is dumb00:18
* davecheney is not fucking arguing about the tools today00:19
bigjoolscareful  thumper, you don't want to be accused of bitching00:19
* thumper merges before heading into town for lunch and errands00:20
davecheneythumper:                 Tools:           newSimpleTools("1.2.3-linux-amd64"),00:23
davecheneythis will make you mad00:23
davecheneywhat series are these tools ?00:23
davecheneyno wait, they are the linux series00:23
thumperlinux of course00:23
davecheneylarakin linux00:23
* davecheney fixes that shit00:24
* thumper waits for lbox to do its slow merge00:24
* thumper bitches00:24
thumperand done00:24
* thumper heads to lunch00:24
=== thumper is now known as thumper-afk
fwereade_bigjools, ok, I think I've finished the review01:03
fwereade_bigjools, I whine about lots of things but basically I want it to land01:03
fwereade_davecheney, I don't suppose you're just putting the last finishing touches to a https://codereview.appspot.com/8545043/ review? :)01:05
davecheneyfwereade_: sorry, i was not01:08
davecheneyit looked really large01:08
davecheneyso I was selfishly noodling with my own brach01:08
fwereade_davecheney, no worries -- but it's a bit less bad than it looks, honest :)01:09
davecheneyfwereade_: you -> sleep, will review after lunch01:09
* davecheney goes back to fighting with the cloudinit checkers01:09
davecheneymap[interface{}]interface{} can bite me01:09
* fwereade_ goes off to sleep :)01:09
* fwereade_ solemnly advocates killing it with fire01:10
bigjoolsfwereade_: thanks01:22
bigjoolsnot sure I will ever get used to "if" preconditions01:25
bigjoolspre-statements even01:25
davecheneyconstraints_test.go:283: // A failure here indicates that goyaml bug lp:1132537 is fixed; please // delete this test and uncomment the flagged constraintsRoundtripTests. c.Assert(val.Hello, IsNil)01:28
davecheney... value *string = (*string)(0xc20006b4d0)01:28
davecheneywhut ?01:28
davecheneyI guess I need to merge to trunk01:29
bigjoolsare there plans to introduce library versioning into Go at any point?01:29
=== thumper-afk is now known as thumper
davecheneybigjools: nothing has been discussed on the list01:30
thumperdavecheney: there was a list thread about pulling the latest goyaml01:30
davecheneyyeah, i did that01:31
davecheneybut the failure above sounds like the opposite01:31
davecheneyi've just merged trunk01:31
davecheneyi think that has fixed it01:31
davecheneyjuju debug-log is awesome01:54
davecheneyi can watch everything happening in an environment01:55
davecheneyProposal: https://code.launchpad.net/~dave-cheney/juju-core/113-juju-bootstrap-raring-cloudinit-II/+merge/15826302:03
davecheneyerror: Failed to send patch set to codereview: can't upload base of environs/mongo.go: ERROR: Checksum mismatch.02:03
davecheney^ i deleted them added this file during the branch02:04
davecheneynow it is really upset02:04
davecheneyany suggestions02:04
thumperyou removed a file?02:04
thumperand it is complaining?02:04
davecheneythen I added it back02:04
davecheneyas in copied it from another branch02:04
thumpernot so good02:05
thumperit will complain about file-ids02:05
thumpercan't you just do a reverse merge of the revision that deleted it?02:05
thumperand discard everything else?02:05
davecheneyi'll figure it out02:06
davecheneyscrew it, just importde the diff into a new branch02:13
thumperdavecheney: you can delete merge poposals in LP02:21
thumperinstead of rejecting them, (if you like)02:21
davecheneythumper: too late now02:22
m_3davecheney: hey... gotta sec?03:48
m_3having an hp problem 'error: cannot log in to admin database: auth fails'03:48
m_3status -v (http://paste.ubuntu.com/5697341/) shows I'm connecting, but...03:50
davecheneym_3: sorry that environment is stillborn04:29
davecheneygrab /var/log/cloud-init-output.log from the first machine04:30
davecheneythen destroy the environment04:30
m_3davecheney: ok, lemme look04:30
m_3davecheney: that log looks normal04:32
m_3davecheney: and I can ssh to the box04:32
m_3davecheney: ubuntu@ your keys are there04:33
m_3if you have a sec to help, I'm trying to set up a scale-test environment04:34
m_3nm, I'm just using that instance to test from there04:47
thumperdavecheney: what would your first thought be if I said I wanted "environment: whatever" as the first line of `juju status` ?05:01
thumperpersonally I think it is crazy we don't have this in status somewhere05:02
m_3yeah, it's not even in there huh05:02
* thumper takes it to the list05:07
thumperI would just do it05:07
thumperbut it seems that some people get hung up with backward compatability05:07
thumperbut it would be nice to see something while it waits for bootstrap to finish...05:07
m_3Ha! ok, that's funny, I figured for sure that'd be in the juju-0.6 status05:08
m_3most of the time you're sort of filtering status output anyways05:08
m_3the machine ids up top are just noise normally05:09
m_3but yes, it absolutely needs the current JUJU_ENV imo05:09
thumperm_3: feel free to comment on my email message then :)05:10
davecheneym_3: checking05:16
davecheneym_3: 2013/04/11 04:09:32 JUJU jujud machine command failed: state entity name not found in configuration05:17
davecheneyerror: state entity name not found in configuration05:17
davecheneyok, this broke because there is no 1.9.13 tools for hp05:17
m_3davecheney: np... I think I worked around it05:18
davecheneyjam gz: caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers%!(EXTRA string=Resource limit exeeded at URL %s., string=https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers)05:44
davecheney^ is there a bug about this format string snaffu ?05:44
* thumper off swimming, back for the meeting05:46
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: <noone> | Bugs: 0 Critical, 52 High - https://bugs.launchpad.net/juju-core/
rogpeppemornin' all05:55
davecheneyrogpeppe: moning05:56
rogpeppedavecheney: yo!05:56
rogpeppedavecheney: how's tricks?05:56
rogpeppedavecheney: i could do with another review on https://codereview.appspot.com/8626044/ if you fancy it05:57
davecheneywrap your peepers around this, https://codereview.appspot.com/8648043/05:57
rogpeppedavecheney: swap ya!05:57
* davecheney looks05:57
rogpeppedavecheney: nice! reviewed.06:06
davecheneyrogpeppe: and to explain myself in my review of your code06:07
davecheneyi'm concerned about the magic strings06:07
davecheneynot becuse they are secrets06:07
davecheneyjust appear to be test warts06:07
rogpeppedavecheney: they're only magic in tests06:07
davecheneyas you said on the other CL yesterday06:07
davecheneyso why not change SetPassword(string) to be ResetPassword() -> string, error06:08
davecheneythen it tells you what the password is06:08
davecheneyand we don't have to add even more fixtures06:08
rogpeppedavecheney: are you talking about state.User.SetPassword there?06:08
rogpeppedavecheney: no can do06:09
davecheneywhy not06:09
rogpeppedavecheney: the whole point is that we're setting the password from the admin-secret in the environment config06:09
davecheneyok fine06:09
davecheneyobjection withdrawn06:10
rogpeppedavecheney: cool06:10
rogpeppedavecheney: your point about possibly leaving a blank password on the admin user is a good one, i think, *because* we don't abort the bootstrap init process if juju bootstrap-state fails06:11
rogpeppedavecheney: i think we should06:11
rogpeppedavecheney: i've thought for a while now that we should do "set -e" at the start of the cloudinit script06:11
rogpeppedavecheney: what do you think?06:11
davecheneyif you just did AddUser("admin", $random) that would solve the problem06:11
davecheneythe user is created, but nobody knows the password06:11
davecheneywould that work ?06:11
rogpeppedavecheney: i'd prefer to solve the deeper problem06:12
rogpeppedavecheney: why are we even running the machine agent if the initial set up has failed?06:12
davecheneysounds like a reasonable question06:12
davecheneynone of those cloud init steps are optional06:12
rogpeppedavecheney: exactly06:12
rogpeppedavecheney: i might change cloudinit to make each command do foo || error foo failed >&206:14
rogpeppedavecheney: that way you won't have to infer the failure in the logs from the command's stderr output06:14
davecheneydoesn't cloudinit send &1 and &2 to /var/log/cloud-init-output ?06:15
rogpeppedavecheney: yeah, the &2 is probably unnecessary, just habit06:17
rogpeppedavecheney: maybe just set -ex actually06:18
rogpeppedavecheney: does that sound reasonable to you?06:23
rogpeppeniemeyer: hiya! bit early for you, ain't it?06:24
niemeyerrogpeppe: Nope.. it's actually very late :)06:24
rogpeppeniemeyer: :-)06:24
rogpeppeniemeyer: squalling bairn?06:24
niemeyerrogpeppe: Have been cooking a release of mgo fixing a cluster resync bug, and spent some hours on it06:25
rogpeppeniemeyer: cool06:25
niemeyerJust about done now06:25
rogpeppeniemeyer: the resync code is quite subtle, i thought06:25
niemeyerrogpeppe: Yeah, real world is not so friendly06:28
niemeyerrogpeppe: Servers popping in and out with their own ideas of what they should be doing is fun06:29
rogpeppeniemeyer: while you're about it, it would be nice to have the redial interval configurable. we're hacking around it currently by sleeping in Dial) but i think the right place is in mgo.06:29
niemeyerrogpeppe: What's the issue?06:30
rogpeppeniemeyer: by default, mongo dials about 10 times a second.06:30
rogpeppeniemeyer: that's not great when you're waiting for a server to come up.06:31
niemeyerrogpeppe: Where does it do that again?06:32
rogpeppeniemeyer: the logic is scattered06:32
rogpeppeniemeyer: the emergent behaviour is that it dials 3 (5?) times then sleeps for a bit (0.5s ?) then repeats06:33
niemeyerrogpeppe: Ah, right06:33
rogpeppeniemeyer: i've been through the logic and understood it a few times, but i always forget immediately afterwards :-)06:33
niemeyerrogpeppe: It's that retry that I always forget to count06:34
niemeyerrogpeppe: I think we could just take that out..06:34
niemeyerrogpeppe: But, not today06:34
rogpeppeniemeyer: indeed06:35
rogpeppeniemeyer: our hack works ok for the time being06:35
niemeyerrogpeppe: Will keep that in mind06:35
rogpeppeniemeyer: some time, an bounded exponential backoff would probably be good.06:35
rogpeppeniemeyer: so we don't suffer too badly from the herd effect after a network outage.06:35
niemeyerrogpeppe: This will probably never happen06:38
rogpeppeniemeyer: ok. too hard? or just not the right thing to do?06:38
niemeyerrogpeppe: A database driver is something you want connected as soon as the network is bac06:38
niemeyerrogpeppe: But more reasonable timings is a good idea06:39
rogpeppeniemeyer: if the max bound was only a small number of seconds, it would still be ok.06:39
niemeyerrogpeppe: Sure, but it would also be fairly irrelevant I suppose06:39
rogpeppeniemeyer: i'm not sure. if the network goes down, we don't really want all the clients dialling in synchrony06:40
rogpeppeniemeyer: the exponential backoff (with a random element) can allow a good spread over time, i think.06:40
niemeyerrogpeppe: Perhaps.. it depends on the scale and on the application behavior06:40
rogpeppeniemeyer: indeed06:41
niemeyerrogpeppe: The random element is somewhat unnecessary as well.. they're not wall-clock synchronized06:41
niemeyerrogpeppe: This would only be really effective if we allowed the backoff to actually backoff06:42
rogpeppeniemeyer: if they all see the net go down at the same moment, they all redial immediately and they're all sleeping for the same amount of time, i suspect they'll keep reasonably clustered.06:42
niemeyerrogpeppe: Like, spanning through several tens of seconds06:42
rogpeppeniemeyer: the only random element i'd add would be the first sleep interval.06:42
davecheney2013/04/11 06:43:16 ERROR JUJU:jujud:machine cmd/jujud: upgrader loaded invalid initial environment configuration: required environment variable not set for credentials attribute: User06:43
davecheneyany ideas ?06:43
niemeyerrogpeppe: I guess I could put something intelligent based on the number of clients06:44
niemeyerrogpeppe: and have the backoff take that into account06:44
rogpeppeniemeyer: that sounds like a great idea if you have that info06:44
niemeyerrogpeppe: Yeah, I think it's around in the server06:44
rogpeppedavecheney: interesting06:45
niemeyerEither way, that's the kind of bug I'd love to get someone complaining about :-)06:45
rogpeppeniemeyer: lol06:45
davecheneyrogpeppe: environment still appears to work06:45
davecheneythis is in HP06:45
davecheneywe don't need that attr in ec206:45
rogpeppedavecheney: ah.06:45
rogpeppedavecheney: yeah, i wondered06:45
niemeyerMost people that use mgo heavily do so on the basis of a few machines06:45
niemeyermillions of requests a days, but just a few servers06:45
davecheneyi wonder if this is part of our 'don't upload the secrets' logic06:45
rogpeppeniemeyer: well, we are going to move in that direction06:46
niemeyerdavecheney: Heya06:46
rogpeppedavecheney: it's possible. is this with the openstack driver?06:46
* davecheney waves06:46
niemeyerrogpeppe: Yeah, I have my fingers crossed to have a Critical filed! ;)06:46
rogpeppeniemeyer: so far the API seems to be working quite well06:47
rogpeppeniemeyer: although none of the agents are using yet06:47
niemeyerAlright, I really need to sleep now, or I won't wake up in time for the meeting in a few hours06:47
rogpeppeit yet06:47
rogpeppeniemeyer: sleeeeeeep....06:47
niemeyerhave a good Beginning Of Day folks06:47
rogpeppeniemeyer: you too06:47
rogpeppeniemeyer: well, a good sleep anyway :-)06:47
niemeyerThanks :)06:47
_mup_Bug #1167723: environs/openstack: error relating to upgrader on bootstrap node <juju-core:New for dimitern> < https://launchpad.net/bugs/1167723 >06:48
rogpeppejeeze, i'm not surprised some of the live tests fail. having started an instance, it didn't see that instance when asking for it, even when waiting a whole 10 seconds later.07:38
rogpeppehow the f*!@# can we make this stuff really reliable?07:38
dimiternrogpeppe: increase timeouts and wait more?07:41
rogpeppedimitern: i'm just trying with a 20s timeout07:41
rogpeppedimitern: but really, if 20s why not 5 minutes?07:41
dimiternrogpeppe: :) seriously?07:41
rogpeppedimitern: i've no idea07:41
rogpeppedimitern: i don't know what's happening inside amazon07:42
dimiternrogpeppe: would it affect local live/other tests?07:42
rogpeppedimitern: yeah.07:42
dimiternrogpeppe: btw a short and quick review? https://codereview.appspot.com/8630043/07:43
rogpeppedimitern: any time you're expecting an error, you'd need to time out the max amount07:43
rogpeppedimitern: looking07:43
rogpeppedimitern: i just saw the same thing happen even when waiting for 20 seconds07:46
rogpeppedimitern: i guess i should check our logic again. although it succeeds sometimes, it's possible we're getting something wrong somehow.07:47
TheMuerogpeppe: is there any way to log the communication between the client and ec2?07:51
dimiternrogpeppe: sgtm07:51
rogpeppeTheMue: yeah, we can do that07:51
TheMuerogpeppe: fine, maybe we just interpret the feedback wrong (or they changed a tiny bit in the protocol that's now fools our logic).07:53
rogpeppeTheMue: it's possible, but slightly unlikely, as our logic does work... some of the time!07:53
TheMuerogpeppe: that's what i meant, the difference between "working" and "done" if you only look at "done" but never interpret "working" (w/o knowing the protocol)07:55
TheMuerogpeppe: me knowing the protocol07:56
rogpeppeTheMue: i'm pretty sure it's sending exactly the same request each time07:56
rogpeppeTheMue: (i'm just checking that)07:56
TheMuerogpeppe: the client is sending the request, but i'm more interested in ec2s answer07:57
rogpeppeTheMue: yeah07:57
dimiterncan someone send me the g+ link for the meeting?07:58
rogpeppeha! found the bug!08:02
rogpeppeTheMue: our own logic *was* screwed08:02
fwereade_mramm, ping08:08
rogpeppemramm: bleep bleep bleep :-)08:08
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: - | Bugs: 0 Critical, 53 High - https://bugs.launchpad.net/juju-core/
dimiternrogpeppe: I have an issue with m.SetAgentAlive() - returns no error, I can see the pinger starting in the log, but pwatcher.Alive(m.globalKey()) returns false (that's what m.AgentAlive() returns), even after a timeout of 5s or using m.WaitAgentAlive(), still fails - any clues?08:23
rogpeppedimitern: you'll have to debug it i'm afraid08:23
dimiternrogpeppe: bugger.. ok, I though I might be missing something obvious08:24
rogpeppedimitern: check out the pinger tests08:24
dimiternrogpeppe: i'm getting this every time I run the machiner tests: [LOG] 20.79113 NOTICE juju: authorization error while connecting to state server; retrying08:48
dimiternrogpeppe: and then it reconnects and continues08:48
dimiternrogpeppe: could it be an indication why the pwatcher is not working right?08:48
dimiternrogpeppe: I found it! so adding mr.st.StartSync() in the MA code (not tests) after calling SetAgentAlive() the tests pass08:53
dimiternwhy did i ever have to call this? i though it was for tests only09:05
mrammhello all09:10
mrammI am very sorry I missed the meeting!09:10
dimiternmramm: hiya09:12
dimiternmramm: it went well, we even took notes: https://docs.google.com/a/canonical.com/document/d/1tKtAcrz9ADGF-o6lQtmGKYRf2ZHIR7lb-WctXvcLnfg/edit09:13
mrammI had the alarm set for the meeting, but it was set an hour late!09:13
mrammI am reading the notes now.09:13
fwereade_dimitern, it the machiner using the same *State as the tests?09:21
fwereade_dimitern, you shouldn't have to call StartSync usually, I don't think09:22
dimiternfwereade_: probably not, because the tests use the jujuConn one09:22
dimiternfwereade_: I had StartSync in both, but adding it the MA code and removing it from the test didn't cause failures, so I left it at that and submitted09:22
fwereade_dimitern, sorry, I don;t think that's ok09:23
dimiternfwereade_: i'm open to suggestions09:23
dimiternfwereade_: why do you think that?09:24
fwereade_dimitern, ok, does SetAgentAlive work in practice? it when you do juju status against a live environment, does it work?09:24
fwereade_dimitern, because Sync and StartSync are purely for the convenience of tests AFAI am aware09:25
dimiternfwereade_: in order to test this, I need to finish the status CL, then I'll test it live09:25
fwereade_dimitern, hmm, is this the tests for the actual machiner, or for the machine agent?09:26
dimiternfwereade_: machiner09:26
mrammSo, from the notes it seems to me like we ought to call this next release 1.10.0 and that we should try to do a release as soon as it is possible to do the python+go thing09:27
mrammit will have some bugs, which we should focus on fixing for the next few days, and release 1.10.1 early next week09:27
fwereade_mramm, +109:28
dimiternfwereade_: sorry, I keep mixing "task" and "agent", even though now I know the difference :)09:28
fwereade_dimitern, don't worry it still bugs me09:28
fwereade_dimitern, but ok I think I know what the problem is, just a mo09:28
mgzthis weekend should be the aim for python+go release testing09:29
mgzthe python code is in review for including in raring, and the go part should be ready to go after that09:30
mrammMy long term proposal would be that we do a "stable" release at the end of the month every month, and then switch to an unstable number for dev versions for weekly releases within that month.09:30
mrammmgz: that sounds reasonable09:31
dimiternmramm: +109:31
mrammwe should then try to come up with some good ways to test that the 1.10.x to 1.12.x release goes smoothly.09:32
fwereade_dimitern, ok, the problem is that StartSync needs to be called after SetAgentAlive to get a timely response, but that the test doesn't have an obvious way of knowing when that's happened09:33
mrammFrom the notes: "John: It will be parallel installable and easy to remove; they can’t easily bootstrap"09:33
mrammwhat's the second part mean?09:33
mrammAlso, can't we do this for them: "We need to make it clear that they need to set up a completely fresh directory structure for juju2"09:34
fwereade_dimitern, I think the right thing to do is to SS in the test every 50ms while waiting for WaitAgentAlive in another goroutine and sending the result out into the same select you're using for the SSs09:34
dimiternfwereade_: I tried that as well, didn't work09:37
dimiternfwereade_: oh, wait - I didn't try a separate goroutine09:38
dimiternfwereade_: let me just finish the status CL and I'll test it live as is first09:39
fwereade_dimitern, ok cool09:41
* dimitern machine needs a restart I think, bbiab09:41
jtvfwereade_: progressing steadily with our provider feature branch — thanks again for the reviews.  We have some things that are unclear though.  Could you comment on https://code.launchpad.net/~maas-maintainers/juju-core/maas-provider-skeleton/+merge/157025 and specifically the latest comments by Raphaël?09:58
fwereade_jtv, sure09:58
fwereade_jtv, I think I've covered it, let me know if I missed something10:14
fwereade_jtv, awesome that you got a deploy working :)10:14
rvbafwereade_: thanks for your reviews.10:15
jtvfwereade_: looking now...  Thank rvba for getting the deployment running!10:15
fwereade_rvba, thanks for the deployment!10:15
fwereade_rvba, let me know if anything needs clarification10:16
rogpeppeinteresting, i just saw a real-world failure caused by a temporary provisioning failure:10:16
rogpeppe2013/04/11 09:54:08 ERROR JUJU:jujud:machine worker/provisioner: cannot start instance for machine "1": cannot run instances: The service is unavailable. Please try again shortly. (Unavailable)10:16
* rogpeppe has finally submitted those ec2 expenses10:17
* rogpeppe hopes it's not too late10:17
dimiternf@$%! time zone issues with both g calendar and thunderbird10:18
dimiternfwereade_: so the problem is not only with tests10:19
fwereade_rogpeppe, cool10:20
fwereade_dimitern, oh yes?10:20
dimiternfwereade_: w/o SS in M's loop(), I'm not even getting AgentAlive() -> true, nil right after calling SetAgentAlive() with no error10:20
fwereade_dimitern, that's expected10:20
fwereade_dimitern, the MA doesn't need to detect it's made itself alive10:21
dimiternfwereade_: or, after using WaitAgentAlive for 5s10:21
dimiternfwereade_: i'm confused now10:21
rogpeppefwereade_: i think that just having the pinger around should cause AgentAlive to return true, no?10:21
fwereade_dimitern, are you sure about the presence timeout being 5s?10:21
fwereade_rogpeppe, hmm, ok, that is an interesting point10:22
dimiternfwereade_: I gave it 5s10:22
fwereade_dimitern, but where are you asking AgentAlive()?10:22
fwereade_dimitern, sorry, I mean the underlying pwatcher refresh frequency10:22
dimiternfwereade_: I had the following (all combinations tried): SAA() -> no error, WAA(5s) -> timeout, AA() -> false, nil (after SAA()) - that's all without SS after SSA()10:23
fwereade_dimitern, when and where are you making these calls, in response to what? :)10:23
dimiternfwereade_: with SSA() + SS() all works10:23
fwereade_dimitern, maybe we should g+10:23
* fwereade_ starts one10:24
dimiternfwereade_: in the beginning of the loop,before settings status10:24
dimiternfwereade_: sure10:24
rogpeppedimitern: if what you're saying is true, surely the presence tests would fail?10:30
dimiternrogpeppe: I got it working, just a sec10:34
TheMuedavecheney: oh, almost missed it. happy birthday.10:47
davecheneyTheMue: thanks mate10:49
TheMuedavecheney: yw, 3h left. and hey, you should party, not being here online ;)10:51
davecheneyTheMue: party on the weekend10:53
TheMuedavecheney: i would like to join, sadly got no time. *cough* *cough*10:54
* TheMue steps out for lunch, biab10:57
dimiternfwereade_: while i'm on to this, how about doing the same with the "down" state for units: "status: info" (regardless of the status, not only on error, and include info only when != "")?11:03
dimiternfwereade_: agent-state-info, that is11:04
dimiternfwereade_: anyway, PTAL https://codereview.appspot.com/8561046/11:24
rogpeppedimitern: what was the problem BTW?11:25
dimiternrogpeppe: not really sure, but the difference between state.Sync() and StartSync() made the difference11:25
dimiternrogpeppe: that a look, if you want ^^11:25
rogpeppedimitern: are you saying it worked with StartSync but not with Sync?11:28
dimiternrogpeppe: i'm saying the other way around11:31
rogpeppedimitern: but in the CL you're using StartSync, no?11:31
jammgz: poke for standup11:32
rogpeppedimitern: it shouldn't make any difference, given that you're doing WaitAgentAlive anyway11:32
dimiternrogpeppe: in the machiner CL yes, but now in the one above I changed it to how it should be (not to use StartSync() in the code, just call Sync() in the test, and it passed)11:32
rogpeppedimitern: oh, you were calling Sync in the *code*... i see11:33
dimiternrogpeppe: :) yeah11:33
dimiternrogpeppe: i'm testing it live on ec2 now and it seems to work fine11:33
rogpeppedimitern: but i was just looking at the one above, and it calls StartSync not Sync11:33
rogpeppedimitern: but that's fine - it shouldn't make a difference11:34
jamallenap: is mgz with you?11:34
dimiternrogpeppe: are you sure you're looking at the right one? https://codereview.appspot.com/8561046/diff/11001/worker/machiner/machiner_test.go11:34
rogpeppedimitern: ah, i was looking at status_test11:35
fwereade_dimitern, LGTM11:35
dimiternfwereade_: cheers, I'll make the changes and submit it soon11:36
dimiternfwereade_: and then i'll need some direction on the deuglifying the logs11:36
fwereade_dimitern, it's in the card11:37
fwereade_dimitern, drop the JUJU: badge throughout, and drop all package name prefixes from "main" packages (leave others intact)11:37
dimiternfwereade_: ah, ok - so only main packages11:37
rogpeppedimitern: so when you used StartSync in the machiner test, it failed?11:37
dimiternfwereade_: I though it was to generally reduce the JUJU strings11:37
dimiternrogpeppe: yeah11:38
rogpeppedimitern: hmm, that seems fragile.11:38
fwereade_dimitern, this gives us reasonably sensible output as feedback without sacrificing logging clarity, I think11:38
fwereade_rogpeppe, I forget why but AgentAlive tests all seem to use sync11:38
dimiternrogpeppe: I reckon because Sync() is synchronous, while StartSync() is not11:38
fwereade_rogpeppe, possibly I never knew why, IIRC gustavo did the whole presence system11:39
rogpeppedimitern: i know that, but even synchronous just means "delivered *somewhere*"11:39
rogpeppedimitern: not necessarily acted on11:39
rogpeppedimitern: so if it was racy with StartSync, it's probably racy with Sync11:39
rogpeppedimitern: i'll have a quick check in the presence code11:40
dimiternrogpeppe: cheers11:40
dimiternit works live, just tested11:44
dimiternfwereade_: what do you mean by "parenthesize the infos" ?11:46
dimiternfwereade_: agent-state-info: "(somestatus: someinfo)" ?11:46
fwereade_dimitern, agent-state: down; agent-state-info: (started)11:47
dimiternfwereade_: right11:47
rogpeppedimitern: i see why Sync is different from StartSync in this case. i think it's justifiable.11:54
* rogpeppe doesn't like the way the watcher code doesn't behave synchronously on receipt of a message, even though it easily could.11:55
rogpeppein my opinion, this is a hack:11:55
rogpeppew.next = time.After(0)11:55
dimiternrogpeppe: ha! so it returns a channel that's available to read on immediately11:57
rogpeppedimitern: yeah. or it sets the timer channel to that anyway.11:57
rogpeppefwereade_: i was thinking just yesterday that there really wasn't much value from having two status types12:00
dimiternrogpeppe: it definitely seems like a hack12:00
rogpeppedimitern: it's just to save writing a function.12:00
rogpeppedimitern: that would be called from two places12:00
fwereade_rogpeppe, I'd be +1 on unifying them if you are12:02
dimiternrogpeppe: nasty :)12:02
fwereade_rogpeppe, will simplify state a bit too12:02
rogpeppefwereade_: definitely.12:02
fwereade_dimitern, would you tack a branch to do that onto the end of your status work then please? or integrate it if you prefer12:03
dimiternrogpeppe, fwereade_: just to be clear here, we're talking about having just the same status type, rather than separate MachineStatus and UnitStatus?12:03
rogpeppefwereade_: const StatusError Status = "error" ?12:03
fwereade_rogpeppe, dimitern: yep12:03
dimiternfwereade_: it'll simplify some things, but not too much I think12:04
rogpeppewhy do those darn live tests keep passing when i want them to fail?!12:04
rogpeppedimitern: the Status functions become a little simpler12:04
rogpeppedimitern: no type conversion required12:05
dimiternrogpeppe: and the status command as well12:05
rogpeppedimitern: definitely - that being the motivating example in this case12:05
dimiternrogpeppe: we can have back the processStatus(status, info string) for both machines and units12:05
rogpeppedimitern: +112:05
dimiternaren't we missing something? it's easy to do this now, since all the places that use them are easy to find and change, but won't we lose some value out of it?12:06
dimiternfwereade_, rogpeppe ^^ ?12:10
rogpeppedimitern: i don't think so12:10
rogpeppedimitern: but YMMV12:10
dimiternif not, I can add a card for it and do it later today12:10
fwereade_dimitern, describe the negative consequences from your POV12:10
dimiternfwereade_: not sure I see any :)12:10
rogpeppedimitern: we're all agreed then :-)12:10
fwereade_dimitern, then let's do it :)12:10
dimiterncool, card added and assigned to myself12:11
* dimitern lunch12:12
* fwereade_ lunch too12:21
mgzjam: sorry, lunch12:22
jammgz: sure sure, I hope that sandwhich was worth your position... dun dun duuuunnnn12:24
mgzsausage roll special, so it may well have been12:25
allenapjam: Those sausage rolls really are quite special.12:37
mgzjamespage has just finished his12:37
* jamespage yom yom yom12:37
rogpeppedimitern, fwereade_: trivial but important: https://codereview.appspot.com/8661043/13:03
* dimitern is back13:16
dimiternrogpeppe: looking13:17
dimiternrogpeppe: LGTM13:21
rogpeppedimitern: thanks13:21
dimiternrogpeppe: I found a bug in rietveld - if you double click the send button it sends the mail twice :)13:21
rogpeppedimitern: nice13:22
=== wedgwood_away is now known as wedgwood
rvbafwereade_: could I have your opinion on this: https://code.launchpad.net/~rvb/juju-core/providers-import-fwreade-1/+merge/158369 ?  Maybe you'll have a better idea on where to put the import statements.13:47
fwereade_rvba, hmm, why would maas be importing testing?13:49
fwereade_rvba, oh, wait, you have all the tests in-package13:49
fwereade_rvba, that would maybe be the problem?13:50
rvbafwereade_: hum, having the tests in the same package allows us to monkey patch things easily.13:50
fwereade_rvba, hmm, wait13:50
fwereade_rbva, why does cmd/ import environs/all?13:50
fwereade_rbva, individual commands can and should, but I don't think the command-utility package should know about them13:51
rvbafwereade_: well, if we want all the providers registered, it needs to happen in cmd/ right?13:51
fwereade_rvba, do it in the individual command packages -- cmd/juju, cmd/jujud13:52
fwereade_rvba, and only the ones that actually need it13:52
* fwereade_ thinks that should do it13:52
rvbafwereade_: indeed, that will probably fix the import loop as well…13:53
rvbafwereade_: ta13:53
fwereade_rvba, fwiw I ardently support the detailed testing you have in place in maas13:53
fwereade_rvba, but I would really prefer those tests which can be external to be external13:54
fwereade_rvba, are you familiar with export_test?13:54
rvbafwereade_: we have a card for that, we will look in it.13:54
rvbafwereade_: no13:54
fwereade_rvba, ok, files ending in "_test.go" are only compiled when running tests13:54
fwereade_rvba, this means that you can take internal functions and define them in a file called export_test.go, but still in the package rather than the test package13:55
fwereade_rvba, and anything exported in there will be visible to the tests and only the tests13:55
fwereade_rvba, this is useful, and allows for monkey-patching just fine13:55
fwereade_rvba, but it has a sting in the tail13:56
rvbafwereade_: indeed, looks exactly like what we need.13:56
fwereade_rvba, go test ./... does not guarantee that the tested code will build when _test.go files are not compiled in13:56
fwereade_rvba, so you should always check go build ./... as well13:56
rvbafwereade_: all right13:57
dimiternrogpeppe: was it a problem if state imports state/api/params? I want to make statusDoc have the status as params.Status, rather than string13:58
dimiterncan somebody send me a link to the kanban g+?13:58
rogpeppedimitern: no, that's the main point of params13:58
fwereade_dimitern, https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc908313:59
rogpeppedimitern: when the agents aren't using state directly, i think params should be able to be merged into state/api13:59
dimiternrogpeppe: I can simplify the status ops then, getting the fields directly, instead of a doc13:59
dimiternfwereade_: cheers13:59
rogpeppedimitern: i'd use a doc13:59
dimiternrogpeppe: cool then14:00
mgzdo we test juju-core with gccgo at all?14:11
* dimitern wonders: params.Started or params.StatusStarted is better?14:46
* TheMue has to find out why his microphone doesn't work. Already changed the browsers, but that doesn't seem to be the reason.14:46
dimiternfwereade_, rogpeppe: thoughts? ^^14:47
fwereade_dimitern, StatusStarted please14:47
dimiternfwereade_: yeah, I was leaning towards that14:47
fwereade_mgz, not that I'm aware14:48
rogpeppeTheMue: i wonder if all the noise we were hearing from you was from your mike going wrong14:48
rogpeppemgz: i've never used gccgo14:48
rogpeppemgz: it would be great to have a try14:48
rogpeppemgz: i'm not sure how well cgo works, but i suppose it *should* work even better...14:48
TheMuerogpeppe: yeah, maybe, i'm checking right now14:49
dimiternTheMue: if you're using an hdmi external monitor it's worth checking the volume control to see it's using the correct input - i have this problem with skype after g+ every time (not the other way round though)14:50
TheMuedimitern: i'm just using my macbook and using the mic for dictation works fine14:52
TheMuedimitern: would you pass me your skype address for a test?14:53
dimiternTheMue: sure: ultramarine_crystal14:55
TheMuedimitern: Thx, contacted you.14:57
fwereade_rogpeppe, TheMue: btw, did my defence of inline test arrays find any favour?15:10
dimiternfwereade_: I can live with it :)15:10
rogpeppefwereade_: i'm still not keen15:11
rogpeppefwereade_: i don't think the naming is a problem (just name it after the test)15:11
fwereade_rogpeppe, apart from the level of indentation15:11
fwereade_rogpeppe, the naming really is a problem that exists and has the drawbacks I listed15:11
rogpeppefwereade_: and i like knowing that all the data is uninfluenced by the stuff that comes into the function15:11
rogpeppefwereade_: i know that it's got to pass through the logic in the code, but i can scan that easily15:12
fwereade_rogpeppe, the global test arrays have way worse sanity/stability guarantees than the scoped ones15:12
rogpeppefwereade_: really?15:12
fwereade_rogpeppe, yes15:12
fwereade_rogpeppe, they're modifiable from anywhere, and they can be used in multiple test methods15:13
fwereade_rogpeppe, if they're not I cannot think of any reason to use a wider scope than necessary15:13
fwereade_rogpeppe, apart from, as you, said, indentation ;)15:13
fwereade_rogpeppe, I'm willing to pay a tab to banish spooky action at a distance15:14
TheMuefwereade_: due to consistency i personally prefer the tests outside the function, but it's no big reason15:14
rogpeppefwereade_: i don't really mind if they're modifiable from anywhere - it's easy to see if anywhere else has a reference to them15:14
fwereade_rogpeppe, abuse of global test arrays tripped danilos up yesterday :)15:15
rogpeppefwereade_: he won't do it again :-)15:15
fwereade_rogpeppe, he didn;t do it15:15
fwereade_rogpeppe, someone left a landmine for him15:15
rogpeppefwereade_: honestly, i like to see the name of the test next to the logic of the test15:15
rogpeppefwereade_: not separated by 200 lines15:15
rogpeppefwereade_: or 500 in at least one case15:16
fwereade_rogpeppe, but the names only sometimes match the tests15:16
rogpeppefwereade_: we should make them match always15:17
rogpeppefwereade_: that's the convention i've been trying to follow15:17
fwereade_rogpeppe, if there are two suites that have TestFoo, you can't have two fooTests~s15:17
fwereade_rogpeppe, so you either relax that or you say, ok, let's have really redundant names everywhere15:17
rogpeppefwereade_: the moment you have a collision, you get told15:17
rogpeppefwereade_: i don't really see the issue15:18
rogpeppefwereade_: i'm interested to know about the issue danilos had yesterday though15:18
fwereade_rogpeppe, it's a constant low-level irritation and a source of nothing but hassle15:18
* rogpeppe doesn't find it so15:19
fwereade_rogpeppe, someone had made some test depend on otherTestArray[12]15:19
rogpeppefwereade_: oh jeeze15:19
rogpeppefwereade_: that was straight-up crack15:19
fwereade_rogpeppe, clearly we can;t be trusted with sharp tools ;p15:19
rogpeppefwereade_: but shared test data can be very useful15:19
danilosrogpeppe, bzr diff -r1133..1134 :)15:19
fwereade_rogpeppe, sure -- but it should be differentiated from that which is tightly scoped15:19
fwereade_rogpeppe, globals are bad, mmkay?15:20
danilosrogpeppe, basically, there was an array of test configs, and then one of them was overridden in a couple of tests later, and it was referenced as array[12]15:20
rogpeppedanilos: yeah, i remember now15:20
rogpeppefwereade_: i don't think we've got anywhere else like that though15:20
rogpeppefwereade_: if i'd seen it i'd have called it out15:20
rogpeppei don't agree that globals are bad15:20
fwereade_rogpeppe, every global carries a cognitive load15:21
rogpeppefwereade_: i like static data15:21
fwereade_rogpeppe, you may be better at bearing that load than I am15:21
rogpeppefwereade_: i think that a large potentially dynamic array carries its own cognitive load15:21
fwereade_rogpeppe, how are the global vars not dynamic?15:21
rogpeppefwereade_: "is there somewhere in this data that changes at runtime?"15:21
fwereade_rogpeppe, yeah, how do you guarantee that for the global ones?15:22
rogpeppefwereade_: it's trivially easy to verify that they're only referred to once15:22
fwereade_rogpeppe, instead of reading the test, you have to read the whole package15:22
rogpeppefwereade_: grep fooTests *.go15:22
fwereade_rogpeppe, every time you see one, you should check?15:22
fwereade_rogpeppe, or maybe you shouldn;t *have* to15:22
danilosfwereade_, +115:23
rogpeppefwereade_: i think it's easier to check for global references than dynamic data15:23
fwereade_rogpeppe, perhaps there's some case you're concerned about that I'm not seeing15:23
danilosfwereade_, rogpeppe: I'd definitely agree with the notion that test input and expectations should be collocated, ideally in their respective tests15:23
rogpeppefwereade_: i can't think of any other issues we've had that relate to this issue, other than occasionally needing to rename a global variable15:24
rogpeppefwereade_: and i really think that being able to see the test function as a whole is important15:24
fwereade_rogpeppe, the test code is the last block15:24
fwereade_rogpeppe, the definition is the first block15:25
fwereade_rogpeppe, the tests separate the two15:25
fwereade_rogpeppe, putting them all in one places tightens the scope, and the only thing you lose is having the name of the test next to the logic... with the benefit of no longer having an extra global var to worry about15:26
* rogpeppe tries to find the CL where this came up15:27
rogpeppefwereade_: still not keen. i think the second version here is generally easier to read. https://codereview.appspot.com/8604043/diff2/2001:18001/environs/tools/list_test.go15:30
rogpeppefwereade_: it emphasises the difference between the static and moving parts15:30
fwereade_rogpeppe, I dunno, to me the version I landed is strictly worse15:31
rogpeppefwereade_: in the second version, i can read the logic of TestMatch in a glance in the new version. in the old one, the bulky test data interferes.15:32
fwereade_rogpeppe, but the actual context you need is still way off up at the top of the definition, it seems like a straight tradeoff readability-wise... one use becomes easier, another harder15:34
rogpeppefwereade_: in a way, both the test data and the function can be read independently, and i like it that way15:34
fwereade_rogpeppe, if it were a simple matter of readability I wouldn't care, it's just the wanton spamming of globals to no clear benefit... you could separate the test definitions by creating a local var, if you really wanted to15:35
rogpeppefwereade_: the test method definition is already global for most intents15:36
* rogpeppe guesses he perhaps has less of a problem with globals than other people.15:37
fwereade_rogpeppe, it's a method on a type... that's not global in the "global vars are bad" sense15:37
dimiternrogpeppe: can you help me with an obscure failing megawatcher_internal_test ?15:38
rogpeppefwereade_: i really don't see the problem. you read it once. you don't change it. i have *never* had an issue with this.15:38
rogpeppedimitern: sure15:38
rogpeppefwereade_: and i don't wanna lose that indent either - test data often struggles with line length.15:39
fwereade_rogpeppe, every global variable is a vector for spooky action at a distance within your code15:39
rogpeppefwereade_: only if someone else has a reference to it15:39
dimiternrogpeppe: this is the error http://paste.ubuntu.com/5698822/, and the only changes to the code from trunk are params.Unit/Machine* (status) -> params.Status*15:39
rogpeppefwereade_: and in this kind of case, it's trivial to check that noone does.15:39
fwereade_rogpeppe, every single time?15:40
dimiternrogpeppe: it seems fishy somehow - in setUp the unit is added and set to started, but it's seen as in error state15:40
dimiternrogpeppe: I tried getting the status right after the set, asserting it's the same, calling Sync() or StartSync() before or after SetStatus(), calling SetStatus() twice (!) - all without problems, and the same failure15:41
fwereade_rogpeppe, every global variable costs a grep per developer per change to the source code ;p15:41
fwereade_rogpeppe, assuming nobody ever abuses them and causes the devlopers to actually think about it15:42
dimiternrogpeppe: it seems the change it sees is for a different unit15:42
rogpeppefwereade_: you only write the tests once. they're read 100 times.15:42
mrammso, here is my summary of the versioning situation:15:42
rogpeppedimitern: is this trunk, or your branch?15:42
fwereade_rogpeppe, no argument there15:43
fwereade_rogpeppe, anyway, I'll take it to the lists15:43
mrammI will send something like that to the list in an hour or so, so if you have feedback let me know between now and then15:43
dimiternrogpeppe: my branch, but as i said the only changes are replacing params.(Unit|Machine)(\w+) with params.Status\115:44
rogpeppedimitern: ah, i know the problem15:44
rogpeppedimitern: you've changed the statusDoc definition, yes?15:44
dimiternrogpeppe: and the types of the Status field in Unit/Machine info to params.Status, instead of params.Unit/MachineStatus15:44
dimiternrogpeppe: yes?15:45
rogpeppedimitern: hmm, actually, no.15:45
rogpeppedimitern: could you push your branch and i'll take a look15:45
dimiternrogpeppe: sure, that's the only thing left before proposing15:46
dimiternrogpeppe: lp:~dimitern/juju-core/031-unify-machine-unit-status-types15:47
rogpeppedimitern: looking15:48
rogpeppedimitern: ha, i see your problem!15:51
rogpeppedimitern: you broke backingStatus.updated15:52
dimiternmramm: great summary, thanks - only two comments: 1) "...the command line interface, the mongo..." -> "...the command line interface or output, mongo..."; 2) "...of these three api's..." -> "...of these three things(?)..." (only the last one is an API)15:52
dimiternrogpeppe: oh? how?15:52
rogpeppedimitern: case doesn't work like in C15:52
mrammwell they are effectively API's15:52
mrammsince people program against them15:52
dimiternmramm: just suggestions :)15:53
rogpeppedimitern: you need to revert to the earlier version and just remove the type conversions.15:53
rogpeppedimitern: (earlier version of that function, that is)15:53
dimiternrogpeppe: can you point me to the location please?15:54
rogpeppedimitern: line 21615:54
rogpeppedimitern: what is that case supposed to be doing?15:54
mrammdimitern: I will clarify the API bit15:55
dimiternrogpeppe: line 216 is var allWatcherChangedTests15:55
rogpeppedimitern: megawatcher.go:21615:55
rogpeppedimitern: the tests are still fine15:55
rogpeppedimitern: you just broke the code itself :-)15:55
rogpeppedimitern: so it was right that the tests failed...15:56
dimiternrogpeppe: ah, well, since they're essentially the same I wanted to unify them15:56
rogpeppedimitern: you can't15:56
rogpeppedimitern: unless...15:56
dimiternrogpeppe: ewww.. ok, I'll revert it back15:56
rogpeppedimitern: you kinda could, but it wouldn't be worth it15:56
dimiternrogpeppe: isn't that supposed to handle both cases like this, or I'm missing something?15:57
rogpeppedimitern: you can't have a variable that's two types at once unless it's an interface15:57
rogpeppedimitern: no15:57
rogpeppedimitern: cases don't fall through in Go15:57
rogpeppedimitern: (that's why we don't put a "break" after each one15:57
dimiternrogpeppe: oh! they're defined there, right!15:57
rogpeppedimitern: you can separate possible values with commas though15:57
dimiternrogpeppe: how about case *params.UnitInfo, *params.MachineInfo: ?15:58
rogpeppedimitern: that's valid syntax15:58
rogpeppedimitern: but wouldn't work15:58
rogpeppedimitern: because then the type of info would be the type of info015:58
dimiternrogpeppe: I see now, ok, sorry about the noise - i'll revert them15:58
rogpeppe[16:57:17] <rogpeppe> dimitern: you can't have a variable that's two types at once unless it's an interface15:58
rogpeppedimitern: if you defined a setStatus function on the two types, then you could unify the cases15:59
rogpeppedimitern: but you'd need a generic way of making a copy too15:59
dimiternrogpeppe: I don't want to make this otherwise almost trivial CL into a more complicated one16:00
dimiternrogpeppe: agreed?16:00
rogpeppedimitern: indeed. i don't believe it's worth it.16:00
rogpeppedimitern: i had two cases there before. you can have two cases there still...16:01
dimiternrogpeppe: sure16:02
dimiternrogpeppe: although before they looked a bit more different16:02
rogpeppedimitern: agreed16:02
rogpeppedimitern: but they're still generating quite different assembly code16:02
benjirogpeppe: after a fresh bootstrap I'm getting "error: no reachable servers" when trying "juju status"16:02
rogpeppebenji: your bootstrap has probably failed for some reason16:03
rogpeppebenji: try juju status --debug; that will show you the bootstrap node's address; then ssh to that node and see what /var/log/cloud-init-output.log has in it16:03
dimiternrogpeppe: indeed - easy to overlook that though16:04
rogpeppebenji: i always like to know what failure modes are around16:04
benjirogpeppe: will do (it will take a second because I have destroyed the environment to test something, I'll re-bootstrap and dig around)16:05
dimiternfwereade_, rogpeppe: unified status for machines/units: https://codereview.appspot.com/8667043/ - mostly mechanical replacing16:05
rogpeppebenji: does it happen every time?16:05
benjirogpeppe: for the last 3 or so times it has16:05
rogpeppebenji: ah, ok, that's "good" :-)16:05
benjiooh, a new error when bootstrapping: error: cannot save state: cannot write file "provider-state" to control bucket: remote error: handshake failure16:06
rogpeppebenji: i have a recommendation for that one16:06
rogpeppebenji: if you cd to $GOPATH/src/launchpad.net/goamz, what does bzr revno tell you?16:07
benjirogpeppe: 3516:07
rogpeppebenji: hmm, darn16:08
rogpeppebenji: oh i see16:09
rogpeppebenji: goamz/s3 has auto retry logic for Get but not Put16:09
rogpeppebenji: that's an unfortunate transient error16:09
benjiso I should rebootstrap16:09
rogpeppebenji: yeah16:09
rogpeppebenji: i'll try to propose a change to goamz/s3 to help16:10
benjirogpeppe: error: cannot save state: cannot write file "provider-state" to control bucket: read tcp connection reset by peer16:17
rogpeppebenji: sounds like a similar issue16:18
rogpeppebenji: s3 is horribly flaky16:18
rogpeppebenji: try again...16:18
benjiwill do16:18
rogpeppeniemeyer: ping16:21
niemeyerrogpeppe: pong16:21
rogpeppeniemeyer: i'd like to fix s3 so it's resilient in the face of transient Put errors16:21
rogpeppeniemeyer: i have a proposal i'd like to run by you16:21
rogpeppeniemeyer: it goes something like this: http://paste.ubuntu.com/5698946/16:21
niemeyerrogpeppe: I have some ideas on how to do that already, but they require changing the API, which I think it's necessary either way16:21
niemeyerrogpeppe: There are other things to be fixed in that API, and I'd like to do that all at once16:22
rogpeppeniemeyer: ok. could we apply this band-aid in the meantime. (see benji's woes above)16:22
niemeyerrogpeppe: Can't we apply a band-aid in juju-core instead?16:23
rogpeppeniemeyer: ok. i'd hoped to avoid duplicating the shouldRetry logic, but i guess not.16:23
niemeyerrogpeppe: I'd rather not introduce yet another way to put a file just to break the API soon16:23
rogpeppeniemeyer: i haven't introduced anything new16:23
rogpeppeniemeyer: just changed PutReader so if you *happen* to give it a ReadSeeker, it'll retry16:24
niemeyerrogpeppe: Even worse.. that's breaking the API16:24
rogpeppeniemeyer: not badly. there's no guarantee of how much data PutReader will read.16:25
rogpeppeniemeyer: though i suppose you might give it a reader already half-way through.16:25
rogpeppeniemeyer: hmm, yeah16:25
rogpeppeniemeyer: will band-aid in juju16:25
niemeyerrogpeppe: Thanks.. I'm hoping it won't be so long until we fix this16:26
niemeyerrogpeppe: The new API will look a lot better, and enable other things that are simply impossible ATM16:26
* rogpeppe lives in ope16:26
rogpeppeniemeyer: great16:26
rogpeppeniemeyer: it would be nice to see CopyObject too, i just wanted it today16:27
niemeyerrogpeppe: Hah, yeah, I thought about that after your message16:27
niemeyerrogpeppe: Actually, I should answer it16:27
rogpeppeniemeyer: which message?16:27
niemeyerrogpeppe: "A few issues"16:28
rogpeppeniemeyer: ah yes!16:29
rogpeppeniemeyer: that must've been what prompted me to go looking for it - i'd forgotten!16:29
niemeyerrogpeppe: I thougth about the copying too, but scratched the reply because it's S3-specific16:29
niemeyerrogpeppe: The suggestion I just gave isn't, though16:30
rogpeppeniemeyer: we've already unified the binaries16:30
niemeyerrogpeppe: So it's a single one?16:31
rogpeppeniemeyer: yup16:31
niemeyerrogpeppe: Brilliant!16:31
rogpeppeniemeyer: still takes a minute to upload though16:31
niemeyerrogpeppe: Wow.. really? I don't think it takes a minute even for me16:31
rogpeppeniemeyer: shitty rural internet bandwidth16:31
niemeyerrogpeppe: Ah, okay16:32
rogpeppeniemeyer: the ADSL is very "A"16:32
dimiternfwereade_: reviewed16:38
dimiternrogpeppe: https://codereview.appspot.com/8667043/?16:39
rogpeppedimitern: looking16:39
benjirogpeppe: https://pastebin.canonical.com/88970/16:43
rogpeppedimitern: reviewed16:44
rogpeppebenji: looking16:44
dimiternrogpeppe: cheers16:44
rogpeppebenji: ha, looks like a problem with dave cheney's new mongo db packaging branch16:45
rogpeppewho here knows about dpkg issues?16:46
rogpeppemgz: ^16:46
rogpeppesee line 308 and after on benji's paste above: https://pastebin.canonical.com/88970/16:47
rogpeppebenji: is this in ec2?16:48
rogpeppebenji: this was with --upload-tools?16:50
benjirogpeppe: yep; this is the command I ran: juju bootstrap --fake-series precise --upload-tools16:50
rogpeppebenji: and you're running on quantal?16:50
mgzprobably borked in quantal16:51
mgzI'd expect that to work in raring16:51
rogpeppemgz: that what i'm thinking16:51
rogpeppeoh rats16:51
rogpeppeback to the tarball we go16:51
mgzI don't think I have a quantal vm up right now, but can spin one up quickly16:51
rogpeppebenji: there's a (sigh) easy fix for you just to get things working. change the CurrentSeries function in the version package to just return "precise"16:53
mgzhas any of the backport work for mongo actually happened yet?16:53
rogpeppemgz: what backport work?16:53
benjirogpeppe: ok, let me give that a try16:53
mgzwell, this is never going to work till we have mongo 2.2 with ssl in something usable by the pre-raring series16:55
rogpeppemgz: it works in precise, i think16:56
mgzwe have a the ppa...16:57
mgzI guess the issue i the boost versioning somehow17:00
* rogpeppe has never used boost but loves it anyway17:02
benjirogpeppe: I get this error when bootstrapping with the hard-coded precise: error: cannot find tools: no compatible tools found17:02
benjishould I still be using --fake-series precise and --upload-tools17:02
rogpeppebenji: what does juju version print?17:03
benjirogpeppe: 1.9.14-quantal-amd6417:03
rogpeppebenji: looks like you're not using the version you just changed17:03
benjihmm, I did a "go build -a launchpad.net/juju-core/..." is there some other rebuild command I should have used?17:04
rogpeppebenji: go install launchpad.net/juju-core/...17:04
rogpeppebenji: go build just checks; it doesn't affect anything17:04
benjiok, we have "1.9.14-precise-amd64" now, re-bootstrapping17:05
benjirogpeppe: "juju bootstrap --fake-series precise --upload-tools" still gives me "error: cannot find tools: no compatible tools found"17:07
rogpeppebenji: darn17:07
rogpeppebenji: try adding --debug to that. what does it print?17:08
benjirogpeppe: http://paste.ubuntu.com/5699072/17:09
rogpeppebenji: line 6 is weird17:10
rogpeppebenji: could you paste your changed version of version.go, please?17:10
benjirogpeppe: here's the diff http://paste.ubuntu.com/5699080/ Do you want the whole file too?17:11
rogpeppebenji: no, that's fine17:12
rogpeppebenji: erm17:12
* rogpeppe is slightly baffled17:12
rogpeppebenji: ah!17:14
rogpeppebenji: you need to change default-series in your environments.yaml too17:14
benjirogpeppe: trying17:15
rogpeppefwereade_: looks like the upload-tools logic has been broken17:15
rogpeppefwereade_: upload-tools should override default-series, i think17:16
benjiit is looking better17:18
benjirogpeppe: all working; much thanks for your help17:24
rogpeppebenji: np. it illuminated a bug, so it's all for the best.17:25
dimiternlast CL for today, trivial: https://codereview.appspot.com/8674043/17:53
dimiternniemeyer: you'll like that :) ^^17:53
niemeyerdimitern: Woohay!17:54
dimiternrogpeppe, fwereade_: any one of you wanna take a look as well? ^^17:56
rogpeppei just took a look at a single machine-0.log file17:58
rogpeppethe environment had been booted for a couple of hours17:58
rogpeppeit's 23MB17:58
rogpeppe148643 lines17:58
dimiternrogpeppe: that's with debug on?17:58
rogpeppedimitern: yeah17:58
dimiternrogpeppe: yeah.. not that surprising17:59
rogpeppedimitern: 4.3MB even with debug off though17:59
rogpeppedimitern: (53098 lines)17:59
rogpeppedimitern: that's not really that sustainable17:59
dimiternrogpeppe: we need logrotate + zipping18:00
rogpeppedimitern: total time elapsed since log file started: 2h23m18:00
dimiternrogpeppe: the logs are full of duplicated stuff, so should compress well18:01
rogpeppedimitern: with debug on, 1.2MB compressed; with debug off, 177K18:01
dimiternrogpeppe: that's not that bad18:02
dimiternrogpeppe: although... for 24h that's like 2MB compressed18:02
rogpeppedimitern: exactly18:02
dimiternrogpeppe: we can implement some smart off loading to the bucket + logrotate and compression18:03
* rogpeppe is glad he has a command line interface that can easily cope with a 23MB history18:04
dimiternrogpeppe: btw have a look at the log deuglifying CL when you have 2m18:05
rogpeppedimitern: reviewed18:06
dimiternrogpeppe: the card said only do the main packages for now18:06
dimiternrogpeppe: thanks18:06
rogpeppedimitern: i'm pretty sure it was intended to take JUJU out too18:06
dimiternfwereade_: ping18:06
rogpeppedimitern: let me check18:06
rogpeppedimitern: from william's email:18:08
rogpeppe> 1) Drop package badging from log calls in "main" packages 2) Drop18:08
rogpeppe> the JUJU:... badging across the board18:08
dimiternrogpeppe: ah, you're right the card says both, but in an earlier talk with fwereade_ he said not to do the JUJU stuff for now, IIRC18:08
dimiterni'll leave it hanging for now then, until we resolve what to do exactly18:09
rogpeppedimitern: oh go on, please do :-)18:09
dimiternrogpeppe: I want to, but not today :)18:09
rogpeppedimitern: ok then18:09
dimiternrogpeppe: i have some meatballs and beer waiting18:09
rogpeppedimitern: time for one more? https://codereview.appspot.com/854504518:10
dimiternrogpeppe: will look in a bit18:10
rogpeppedimitern: hmm, the SetProvisioned logic has broken this environment18:14
dimiternrogpeppe: oh?18:14
rogpeppedimitern: the machine agent is repeatedly doing this: http://paste.ubuntu.com/5699268/18:15
dimiternrogpeppe: do you have more context where the instance was started?18:16
rogpeppedimitern: it was started just there18:18
dimiternrogpeppe: oh, sorry, yes18:18
dimiternrogpeppe: but how about the machiner log?18:18
rogpeppedimitern: here's the first stuff we hear from the provisioner: http://paste.ubuntu.com/5699287/18:18
dimiternrogpeppe: what does status report?18:19
dimiternrogpeppe: reviewed btw18:19
rogpeppedimitern: http://paste.ubuntu.com/5699292/18:20
dimiternrogpeppe: yeah.. as expected18:20
rogpeppedimitern: there are two machines because when the first one failed, i removed the service and tried to remove the machine18:20
rogpeppedimitern: why can't it set the instance id when there's nothing reported by status for inst id ?18:20
dimiternrogpeppe: really weird though.. I tested this both with tests and with live instances, several times - no problems18:20
rogpeppedimitern: how does it judge "already set"?18:21
dimiternrogpeppe: Assert: append(isAliveDoc, notSetYet...),18:22
dimiternrogpeppe: notSetYet := D{{"instanceid", ""}, {"nonce", ""}}18:22
fwereade_dimitern, I did not intend to say we should keep the JUJU badging18:23
fwereade_dimitern, nobody in the whole world likes the JUJU badging AFAIK ;)18:24
mgzI believe the characters JUJU should appear whereever JUJU possible18:24
fwereade_mgz, that's JUJU crazy talk18:25
dimiternrogpeppe: are you sure the tools the env was bootstrapped with include all my latest CLs?18:27
fwereade_dimitern, I would hope that each one of them would work in order ;p18:28
dimiternrogpeppe: I can't see the agent-state being set in status, and where set it says "running", which is wrong, it should be started18:28
dimiternrogpeppe: I removed that case18:28
rogpeppedimitern: i'm not sure18:28
dimiternfwereade_: so is it good like this?18:28
rogpeppedimitern: but would that impact this bug?18:28
dimiternrogpeppe: checking..18:28
rogpeppedimitern: i just retrieved the value of the Machine:18:29
rogpeppe&state.Machine{st:(*state.State)(0xc200335d10), doc:state.machineDoc{Id:"1", Nonce:"", Series:"precise", InstanceId:"", Principals:[]string{}, Life:1, Tools:(*state.Tools)(nil), TxnRevno:4, Jobs:[]state.MachineJob{1}, PasswordHash:""}, annotator:state.annotator{globalKey:"m#1", tag:"machine-1", st:(*state.State)(0xc200335d10)}}18:29
fwereade_rogpeppe, it's not possible the cli tools have a funny version, is it?18:30
fwereade_dimitern, we need to lose the JUJU badging18:30
fwereade_dimitern, I'm sorry, I clearly hideously miscommunicated18:30
dimiternfwereade_: ok, so I leave it WIP for now and finish it tomorrow18:30
fwereade_dimitern, sgtm18:31
fwereade_dimitern, wipped18:31
dimiternrogpeppe: this confirms it - it's using tools from before the nonce was generated18:31
dimiternrogpeppe: probably even before startinstance was respecting the passed nonce18:31
rogpeppedimitern: but...18:32
rogpeppedimitern: i just tried calling SetProvisioned directly from my client connection18:32
rogpeppedimitern: and it failed saying "already set"18:32
rogpeppedimitern: even though InstanceId and Nonce are both empty18:32
dimiternrogpeppe: there was a lurking bug in there, initially, which I fixed afterwards18:32
rogpeppedimitern: i still see that issue with the latest version of trunk18:34
rogpeppedimitern: that is, this code: http://paste.ubuntu.com/5699342/18:34
dimiternrogpeppe: ok then, that's good, because I don't18:34
rogpeppedimitern: produces this output:18:34
rogpeppe&state.Machine{st:(*state.State)(0xc20032cdc0), doc:state.machineDoc{Id:"2", Nonce:"", Series:"precise", InstanceId:"", Principals:[]string{"buildbot-master/1"}, Life:0, Tools:(*state.Tools)(nil), TxnRevno:2, Jobs:[]state.MachineJob{1}, PasswordHash:""}, annotator:state.annotator{globalKey:"m#2", tag:"machine-2", st:(*state.State)(0xc20032cdc0)}}18:35
rogpeppe2013/04/11 19:33:57 set prov: cannot set instance id of machine "2": already set18:35
rogpeppedimitern: i'm not sure how that could happen, regardless of what's out there in the cloud18:35
dimiternrogpeppe: file a bug then please, I'll dig into it tomorrow, if I can reproduce it18:36
rogpeppedimitern: i have the environment online now...18:36
rogpeppedimitern: i can leave it until the morning if you like18:37
dimiternrogpeppe: can I access it?18:37
dimiternrogpeppe: so I can debug the code in place?18:37
rogpeppedimitern: hmm, let me think18:37
dimiternrogpeppe: actually, can you try adding some logging into SetProvisioned18:40
rogpeppedimitern: sure18:40
rogpeppedimitern: i just sent you a PM on canonical IRC18:40
dimiternrogpeppe: log the exact error on Run18:40
dimiternrogpeppe: ok, I'll try it now18:41
rogpeppedimitern: it must be ErrAborted18:41
dimiternrogpeppe: ok, let's think aloud18:41
dimiternrogpeppe: indeed it has to be, otherwise it'll be caught and reported earlier18:41
rogpeppedimitern: yup18:42
dimiternrogpeppe: this means either assert failed, and since we're checking for alive before that, it has to be the other assert, right?18:42
dimiternrogpeppe: no other case that I can see, AFAIU state/mgo transactions18:43
rogpeppedimitern: i assume the composition for AND conjunction works, but i don't *know*18:43
dimiternniemeyer: ping18:44
niemeyerdimitern: Heya18:44
dimiternniemeyer: hey, can you please take a look at this code: http://paste.ubuntu.com/5699363/18:44
rogpeppedimitern: i've gotta go in a few moments18:44
niemeyerdimitern: Sure.. what should I be looking for?18:45
dimiternniemeyer: and reading a bit further up the log, tell me if i'm correct18:45
niemeyerdimitern: Can you be a bit more specific?18:45
dimiternniemeyer: so we're seeing "already set" error being reported from this method18:45
dimiternniemeyer: and in state both instanceid and nonce are empty for that machine18:45
rogpeppedimitern: ok, so without the asserts it did succeed.18:46
dimiternniemeyer: so the asserts should work fine18:46
dimiternniemeyer: but somehow it aborts - can it abort for something other than a failed assert?18:46
niemeyerdimitern: No18:46
rogpeppedimitern: ah...18:47
niemeyerdimitern: Are you sure these values are present and empty?18:47
rogpeppedimitern: i think i know what's going on18:47
dimiternrogpeppe: without both or without only notSetYet?18:47
rogpeppeniemeyer: that's the issue18:47
niemeyerrogpeppe: Cool, bingo18:47
niemeyerdimitern: "not set" != "empty"18:47
dimiternniemeyer: rogpeppe connected to the state server and extracted the machine: &state.Machine{st*state.State)(0xc20032cdc0), doctate.machineDoc{Id:"2", Nonce:"", Series:"precise", InstanceId:"", Principals]string{"buildbot-master/1"}, Life:0, Tools*state.Tools)(nil), TxnRevno:2, Jobs]state.MachineJob{1}, PasswordHash:""}, annotatortate.annotator{globalKey:"m#2", tag:"machine-2", st*state.State)(0xc20032cdc0)}}18:48
rogpeppedimitern: so it is a compat issue after all18:48
rogpeppedimitern: i haveta go18:48
rogpeppesee you tomorrow18:48
dimiternrogpeppe: see you18:48
dimiternniemeyer: can you explain please, because i didn't get it18:48
dimiternniemeyer: the doc is there, so they should be set to empty, right?18:49
niemeyerdimitern: MongoDB documents may have an empty field ({"nonce": ""}), and it may also have a non-existent field ({}). Those aren't the same thing.18:49
dimiternniemeyer: is mongo ignoring empty string fields when you insert a doc?18:50
niemeyerdimitern: No18:50
niemeyerdimitern: But you can do that in several ways from code18:50
dimiternniemeyer: because the code that adds the machine is ..18:50
dimiternniemeyer: state.addMachine, which inserts a machine doc, setting only Id and Life18:51
dimiternniemeyer: others, by the virtue of being uninitialized string fields, should be set to empty string, no?18:52
dimiternniemeyer: no, actually the code is like this: http://paste.ubuntu.com/5699387/ and they should be set explicitly18:53
niemeyerdimitern: Just load the document from the database before running code that is failing18:54
niemeyerdimitern: and print it18:54
niemeyerdimitern: Into a map18:54
dimiternniemeyer: good idea, but how?18:55
niemeyervar m map[string]interface{}18:55
niemeyererr := collection.FindId(id).One(&m)18:55
dimiternniemeyer: no, I mean I just do st.machines.FindId(m.Id()).One(&map) ?18:56
niemeyerif err != nil { return err }18:56
dimiternah, ok, 10x18:56
niemeyerdimitern: I think that's what I've just said, yeah :)18:56
dimiternniemeyer: indeed, thanks for the help18:56
niemeyerdimitern: np, let me know what it shows18:56
dimiternniemeyer: unfortunately, I cannot access it, I tried, but it's rogpeppe's environment and despite his comment the aws keys shouldn't matter, they do - i cannot use mine19:06
niemeyerdimitern: Well, they don't matter much at least19:07
niemeyerdimitern: Do you have ssh access to it?19:07
dimiternniemeyer: ah, let me try that19:07
dimiternniemeyer: same problem - perm denied19:08
niemeyerdimitern: Sure, but you have the whole data at your hand19:08
dimiternniemeyer: it's not a shared account or anything19:08
niemeyerdimitern: Just connect to the database with mongo and do the same query19:08
niemeyerdimitern: mongo localhost:<whatever port>19:08
niemeyerdimitern: use juju19:08
dimiternniemeyer: I can't access the mongo there in rog's environment19:09
niemeyerdimitern: db.machines.find({_id: <the id>})19:09
niemeyerdimitern: Hmm.. why?19:09
dimiternniemeyer: ssh is not working (my key is different)19:09
niemeyerdimitern: Oh, okay.. huh19:09
niemeyerdimitern: How come Roger assumed you could access it?19:09
dimiternniemeyer: :) probably he's tired19:09
niemeyerdimitern: It's a bit of a weird idea if you have no keys whatsoever :-)19:10
dimiternexactly :)19:10
dimiternanyway, I'm tired too, so have a good evening all!19:10
niemeyerdimitern: Either way.. Roger said "that's the issue"19:10
niemeyerdimitern: So I assume he checked it19:11
dimiternniemeyer: yeah, hope he remembers :)19:11
dimiternniemeyer: thanks again, if the issue is reproducible tomorrow, I'll try what you suggested19:12
niemeyerdimitern: indeed :)19:12
niemeyerdimitern: np.. I'm pretty sure it's an issue with the document19:12
niemeyerdimitern: the code path for such a trivial assertion was exercised enough, I'd hope19:12
dimiternniemeyer: yeah, mongo keeps surprising me here and there19:12
niemeyerdimitern: What kind of surprise did you have so far?19:13
dimiternniemeyer: syntax mostly - it's not always trivial to translate from mongo docs into D{{}} things19:13
niemeyerdimitern: Hmm19:14
niemeyerdimitern: It's actually 1-to-1.. !?19:14
dimiternniemeyer: probably, but haven't got the hang of it yet - still try to find similar examples in the code and adapt19:14
niemeyerdimitern: It's really 1-to-119:15
dimiternniemeyer: the nested {{}} and sometimes []D{{}} are not helping :) but i'm learning19:15
niemeyerdimitern: yeah, the visuals may get confusing19:15
niemeyerdimitern: Note that this is an optimization19:15
niemeyerdimitern: For non-important code paths, you can use maps, which look a lot better19:15
dimiternniemeyer: how?19:15
niemeyerdimitern: {"foo": "bar"} in the mongo shell is M{"foo": "bar"}19:16
niemeyerdimitern: So the overhead is a single char :)19:16
dimiternniemeyer: and M is map[string]interface{} ?19:16
niemeyerdimitern: assuming a M is bson.M or you own map[string]interface{}19:16
niemeyerdimitern: yeah19:16
niemeyerdimitern: You can define your local type whenever you feel like it19:16
dimiternniemeyer: this sheds more light on it, actually19:17
niemeyerdimitern: type m map[string]interface{}.. m{"foo": "bar"}19:17
dimiternniemeyer: yeah, i did that in some places, esp. nested maps like config attrs19:17
niemeyerdimitern: there's zero support for the bson.M type, specifically19:17
niemeyerdimitern: It's just a map19:17
dimiternniemeyer: it's not bad, it's just confusing at first to see the go equivalent and parse it visually19:18
dimiternniemeyer: but I agree it's the shortest workaround possible in go probably19:19
niemeyerdimitern: Right.. we need at least one char there19:20
niemeyerdimitern: Which doesn't feel so bad :)19:20
dimiternniemeyer: indeed19:22
mgzhey thumper21:18
fwereade_thumper, heyhey22:16
thumperhi fwereade_22:16
fwereade_thumper, can you think of any reason to upgrade juju with --upload-tools *without* bumping the build number?22:16
thumperfwereade_: I've not looked into it too deeply, but my first thought was, no, bumping the build number sounds essential with upload-tools22:17
thumperthe only time you wouldn't22:17
thumperis if you have updated the version number yourself since your last upload22:17
fwereade_thumper, *and* there are no tools in the bucket with a matching m.m.p that need to be superseded22:18
fwereade_thumper, cool, thanks22:19
=== wedgwood is now known as wedgwood_away
davecheneymramm: ping23:58

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!