/srv/irclogs.ubuntu.com/2013/11/26/#juju-dev.txt

wallyworld_davecheney: question for you01:33
davecheneywallyworld_: shoot01:33
wallyworld_on trunk, i try this: juju set-env some-bool-field=true01:34
wallyworld_it fails01:34
wallyworld_expected bool, got string("true")01:34
davecheneyo_o01:34
wallyworld_have you seen that?01:34
davecheneyi haven't used that command01:35
davecheneycertainly never with bool fields01:35
davecheneydo we even support them ?01:35
davecheneywhich charm ?01:35
wallyworld_there's code there to parse a string to bool, but it appears to not be called at the right place01:35
wallyworld_this is setting an env config value01:35
davecheneyahh01:35
davecheneyi bet nobody ever tried01:35
davecheneycf. the horror show that is the environment config01:35
davecheneyand updating it after the fact01:35
wallyworld_yeah, appears so :-(01:36
davecheneytime for a bug report01:36
wallyworld_or it could be fallout from moving to api01:36
davecheneycould be01:36
davecheneythe only bool env field I know of is01:36
davecheneyuse-ssl01:36
wallyworld_thanks, just wanted to check before raising a bug01:36
davecheneyor the use-insecure-ssl01:36
davecheneyi think you've got a live one01:36
wallyworld_there's also development01:36
wallyworld_and a new one i am doing01:36
wallyworld_provisioner-safe-mode01:36
wallyworld_which will tell provisioner not to kill unknown instances01:37
davecheneywallyworld_: i think nobody has ever tried to change a boolean env field after deployment01:37
wallyworld_:-(01:37
davecheneywe've only ever had that use insecure ssl one and you need that to be set for bootstrapping your openstack env01:38
wallyworld_ok, ta. bug time then01:38
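The bug discussed above comes down to a missing coercion step: CLI arguments arrive as strings, so a value like "true" has to be parsed into a bool before it is validated against the config schema. A minimal Go sketch of that step (an illustration only, not juju-core's actual schema code):

```go
package main

import (
	"fmt"
	"strconv"
)

// coerceBool shows the kind of string-to-bool coercion set-env needs
// before writing a value into environment config. Without it, the
// raw string "true" fails validation with exactly the error seen
// above: expected bool, got string("true").
func coerceBool(raw string) (bool, error) {
	v, err := strconv.ParseBool(raw) // accepts "true", "false", "1", "0", ...
	if err != nil {
		return false, fmt.Errorf("expected bool, got string(%q)", raw)
	}
	return v, nil
}

func main() {
	v, err := coerceBool("true")
	fmt.Println(v, err) // prints: true <nil>
}
```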
davecheneysinzui: any word on 1.16.5 / 1.17.0 ?02:26
jamwallyworld_, axw: if you're unable to get into garage MaaS, you could probably ask bigjools nicely if you can use his equipment.02:47
wallyworld_jam: he's been busy supporting site02:48
wallyworld_and only has small micro servers02:48
jamaxw: from what I inferred when Nate got access, it was essentially smoser just "ssh-import-id nate.finch" as the shared user on that machine.02:48
jamwallyworld_: sure, but I don't think we're testing scaling, just that the backup restore we've put together works02:49
jamw/ MaaS02:49
wallyworld_sure, but we need at least 2 virtual instances, not sure how well that will be handled02:49
jamwallyworld_: well, I wasn't suggesting using VMs on top of his MaaS, just using the MaaS02:50
bigjoolswallyworld_ you already have access02:51
wallyworld_yes i do. i was waiting for your on site support efforts to wind down02:51
bigjoolsconsider it down02:51
wallyworld_you seemed stressed enough, didn't want to add to it02:51
jamhi bigjools02:51
bigjoolswhen the guy you're helping f*cks off mid-help, I consider it done.02:51
jamouch02:51
axw:/02:52
bigjoolswallyworld_: you could come round as well if you want direct access02:52
wallyworld_so i'm currently working on one of the critical bugs02:52
jambigjools: so do you know someone who already has Garage MaaS access to the shared user? From what I can tell the actual way you get added is by adding your ssh key to the "shared" account02:52
wallyworld_was hoping to get that done before i looked at the restore doc02:52
jam"needing to be in the group" seems like a red herring02:53
bigjoolsjam: I have access, want me to add anyone?02:53
wallyworld_me and axw :-)02:53
axwme please02:53
jambigjools: axw, wallyworld_, and ?02:53
bigjoolsheh02:53
bigjoolslp ids please02:53
wallyworld_wallyworld02:53
axw~axwalk02:53
jambigjools: I haven't done the other steps, but ~jameinel is probably good for my long term health02:53
bigjoolsok you're all in02:55
bigjoolsI am having a lunch break, if you need me wallyworld_ can just call me02:56
axwhooray. thanks bigjools02:56
bigjoolsnp02:56
jamaxw: the other bit that I've seen, is that you might have a *.mallards.com line in your .ssh/config with your normal user, but you still need to use the shared User for those hosts02:57
jamif the *.mallards line comes first, it overrides the individual stanza02:58
axwjam: I explicitly tried logging in as shared@02:58
wallyworld_i can ssh in now02:58
axwit works for me now too02:58
jamaxw: so I think you have to be in the iom-maas to get into loquat.canonical.com, but to get into maas.mallards you just get added to the shared account02:59
axwthat would seem to be the case03:00
jamaxw: as in, I'm trying and can't get to loquat03:00
jamaxw: can you update the wiki?03:00
axwjam: right, you need to get IS to do that03:00
jamI would get rid of the "host maas.mallards" line in favor of the *.mallards line03:00
axwjam: sure - "step 3: ask bigjools to add you to the shared account"? ;)03:00
jamaxw: ask someone who has access to run "ssh-import-id $LPUSERNAME"03:01
jamas the shared user03:01
jamaxw: Hopefully we can make it a big enough warning for IS people to realize they aren't managing that account03:01
jamthanks for setting them up bigjools03:03
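The stanza-ordering gotcha jam describes works because ssh uses the first value obtained for each option, so a wildcard stanza placed first shadows a more specific one below it. A sketch of a config that keeps the shared account working (hostnames are the ones from the discussion; the usernames are illustrative):

```
# ~/.ssh/config -- ssh takes the FIRST value found for each option,
# so the specific stanza must come before the wildcard.
Host maas.mallards
    User shared

Host *.mallards
    User youruser
```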
* jam is off to take my son to school03:04
davecheneyping -> https://code.launchpad.net/~dave-cheney/goose/001-move-gccgo-specific-code-to-individual/+merge/19664303:50
davecheneyaxw: thanks for the review03:55
davecheneynow I can close this issue03:55
axwnps03:55
jamdavecheney: you forgot to set a commit message on lp:~dave-cheney/goose/goose04:04
jamhttps://code.launchpad.net/~dave-cheney/goose/goose/+merge/19647104:04
jamI'll put one in there04:04
jamit should get picked up in 1 min04:05
jamand then you can approve your above branch04:05
davecheneyah04:06
davecheneythanks04:06
davecheneyi was wondering what was going on04:06
davecheneyi didn't realise you added the commit message for me04:06
* axw froths at the mouth a little bit04:15
axwwtf is going on with garage maas04:15
=== philipballew is now known as philip
jamaxw: isn't maas server supposed to be localhost given Nate's instructions?05:30
jamYou're generally supposed to be running a KVM (virtual MaaS) system just on one of the nodes05:30
jamin Garage Maas05:31
jamthe main reason we use g-MaaS is because the nodes there have KVM extensions and are set up for it05:31
jamin theory you could do it on your personal machine05:31
axwjam: maas-server gets inherited by the nodes05:31
axwthey'll just try to contact whatever you put in there05:32
axw(e.g. localhost)05:32
axwyou need to put in an absolute address05:32
jamaxw: ah, sure. So 10.* whatever, but not 'localhost'05:33
axwyup05:33
jamaxw: I can imagine that maybe bootstrap works, or some small set of things, but then it doesn't actually work together05:33
axwseems like the provider should be able to figure it out itself, but I dunno the specifics05:33
axwjam: bootstrap doesn't even work - the node comes up, but the cloud-init script tries to grab tools from localhost05:34
jamaxw: well "juju bootstrap" pre-synchronous works, right? Just nothing else does :)05:35
jam"the command runs and exits cleanly"05:35
axwyes :)05:35
jamwallyworld_: how's bug #1254729 coming?06:02
_mup_Bug #1254729: Update Juju to make a "safe mode" for the provisioner <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1254729>06:02
davecheneyjam: we hit a small bug where juju set-env something-boolean={true,false}06:03
davecheneydidn't work as expected06:03
jamI saw that part, didn't know you were working on it with him06:03
davecheneyi think wallyworld_ is in that rabbit hole atm06:03
wallyworld_jam: been stuck on some stuff inside the provisioner task. i think i've got a handle on it. issues with knowing about dead vs missing machines06:03
davecheneywhen I saw,06:03
jamYou could cheat and make it an int06:03
davecheneyi mean wallyworld_06:03
davecheneyand when i say we, i mean ian06:04
wallyworld_yeah me06:04
jam:)06:04
* davecheney ceases to 'help'06:04
jamwallyworld_: so you mean we "should kill machines that are marked dead" but not "machines which are missing" ?06:04
jamdavecheney: thanks for being supportive06:04
wallyworld_yeah06:04
wallyworld_sort of06:05
wallyworld_we have a list of instance ids06:05
jamwallyworld_: I'm guessing that's "we asked to shutdown a machine, wait for the agent to indicate it is dead, and then Terminate it"06:05
wallyworld_and knowing which of those are dead vs missing is the issue, due to how the code is constructed06:05
jambut we were detecting that via a mechanism that wasn't distinguishing an instance-id we don't know about from one that we asked to die06:05
jamwallyworld_: I don't think you mean "missing", I think you mean "extraneous"06:06
wallyworld_yeah06:06
wallyworld_the code was destroying the known instance id too soon06:06
jamagent for $INSTANCE-ID is now Dead => kill machine, unknown INSTANCEID => do nothing.06:06
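The split jam summarizes (dead agents get their instances killed, unknown instance ids are left alone in safe mode) can be sketched as a simple partition. Types and names here are hypothetical stand-ins, not the real provisioner task's API:

```go
package main

import (
	"fmt"
	"sort"
)

type life int

const (
	alive life = iota
	dead
)

// partition takes the instance ids the provider reports as running
// and the provisioner's map of known machines, and splits them into
// instances to stop (known machines whose agents are Dead) and
// unknown ("extraneous") instances, which safe mode would not kill.
// As discussed above, dead machines must stay in the known map until
// this comparison runs, or they get misclassified as unknown.
func partition(running []string, known map[string]life) (toStop, unknown []string) {
	for _, id := range running {
		l, ok := known[id]
		switch {
		case !ok:
			unknown = append(unknown, id)
		case l == dead:
			toStop = append(toStop, id)
		}
	}
	sort.Strings(toStop)
	sort.Strings(unknown)
	return toStop, unknown
}

func main() {
	known := map[string]life{"i-0": alive, "i-1": dead}
	stop, unk := partition([]string{"i-0", "i-1", "i-2"}, known)
	fmt.Println(stop, unk) // prints: [i-1] [i-2]
}
```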
axwjam: I've just started a new instance in MAAS manually - shouldn't machine-0 be killing it?06:26
axwit's been there for a little while now, still living06:26
jamaxw: you're using 1.16.2+ ?06:26
axwjam: 1.16.306:26
jamaxw: did you start it manually using the same "agent_name" ?06:26
axwjam: yeah, I used my juju-provision plugin06:27
axwjam: do you know how I can confirm that it's got the same agent_name?06:27
jamaxw: some form of maascli node list06:27
jamaxw: it has been a while for me, might want to ask in #maas06:28
jamjtv and bigjools should be up around now06:28
axwnodes list doesn't seem to show it06:28
axwok06:28
jamaxw: if nodes list doesn't list it, it sure sounds like it isn't running06:28
axwjam: no I mean it doesn't show agent_name06:29
axwthe node is there in the list06:29
jamaxw: try "maascli node list agent_name=XXXXX"06:30
jamit looks like it isn't rendered, but if supplied it will be used as a filter06:30
axwthat worked06:32
axwjam: the new one does have the same agent_name06:33
jamaxw: so my understanding is that we only run the Provisioner loop when we try to start a new unit. You might try add-unit or something and see if it tries to kill off the one you added06:33
axwah ok06:33
axwthanks06:33
jamaxw: did it work?06:38
axwjam: not exactly; I tried to deploy to an existing machine. it only triggers if a machine is added or removed06:38
axwmakes sense06:38
axwanyway, it was removed06:39
axwso I'll go through the rest of the steps now06:39
jamaxw: so I've heard talk about us polling and noticing these things earlier, but with what ian mentioned it actually makes sense06:39
jamthe code exists there to kill machines that were in the environment but whose machine agents were terminated06:39
axwyup06:40
jamand it had the side effect of killing machines it never knew about06:40
jamwhich we decided to go with06:40
* axw watches paint dry06:50
jamaxw: ?06:54
axwprovisioning nodes does not seem to be the quickest thing06:54
jamaxw: provisioning in vmaas I would think would be reasonably quick, no?06:55
axwjam: it's likely the apt-get bit that's slow, but *shrug*06:55
axwit's definitely not quick06:56
axwI will investigate later06:56
davecheneyaxw: the fix for that is to smuggle the apt-cache details into your environment06:58
davecheneyhowever when you're on one side of the world06:58
davecheneyand the env is on the other06:58
davecheneyit's unlikely that there is a good proxy value that will work for both you and your environment06:58
jamdavecheney: garage maas is in Mark S's garage, so I think it would be both reasonably close and have decent bandwidth to the datacenter (I could be completely wrong on that)07:19
* jam heads to the grocery store for a bit07:31
fwereademgz, rogpeppe: any updates re agent-fixing scripts?07:58
rogpeppefwereade: i've got a script that works, but i don't know whether mgz wanted to use it or not08:23
rogpeppefwereade: i phrased it as a standalone program rather than a plugin, but that wouldn't be too hard to change08:23
fwereaderogpeppe, I don't see updates to the procedure doc explaining exactly how to fix the agent and rsyslog configs08:23
fwereaderogpeppe, documenting exactly how to fix is the most important thing08:24
fwereaderogpeppe, scripting comes afterwards08:24
fwereaderogpeppe, sorry if that wasn't clear08:24
rogpeppefwereade: ah, ok, i'll paste the shell scripty bits into the doc08:24
fwereaderogpeppe, <308:24
axwfwereade: I just finished running the process (manually) on garage MAAS08:25
axwI keep writing garaage08:25
axwanyway08:25
axwall seems to be fine08:25
axwI missed rsyslog, now that I think of it08:25
fwereadeaxw, ok, great08:25
axwfwereade: sent out an email with the steps I took08:26
fwereadeaxw, if you can be around for a little bit, would you follow rog's instructions for fixing those please, just for independent verification?08:26
axwfwereade: sure thing08:26
fwereadeaxw, so did the addressupdater code not work?08:27
axwfwereade: the what?08:27
fwereadeaxw, you said you fixed addresses in mongo08:27
axwah, maybe I didn't need to do that bit?08:27
fwereadeaxw, rogpeppe: addresses should update automatically once we're running08:27
axwok08:27
fwereaderogpeppe, can you confirm?08:28
rogpeppefwereade, axw: it seemed to work for me08:28
axwrogpeppe: no worries, I was just poking in the database and thought I'd have to update - I'll put a comment in the doc that it was unnecessary08:29
rogpeppefwereade: hmm, i realised i fixed up the rsyslog file, but didn't do anything about restarting rsyslog...08:30
fwereadeaxw, well, technically, we don't know it was unnecessary08:31
fwereadeaxw, rogpeppe: I am a little bit baffled that the "one approach" notes seem to have been used instead of the main doc08:32
rogpeppefwereade: i didn't suggest that08:32
axwfwereade: my mistake, I just picked up the wrong thing08:32
fwereaderogpeppe, I know you didn't suggest that bit08:32
rogpeppefwereade: i thought dimitern had some notes somewhere, but i haven't seen them08:33
fwereaderogpeppe, they're linked in the main document08:33
fwereaderogpeppe, axw, dimitern: fwiw I have no objection to writing your own notes for things, this is good08:33
axwfwereade: just trying to fill in the hand wavy "do X in MAAS" bits :)08:34
fwereaderogpeppe, axw, dimitern: but if they don't filter back into updates to the main doc -- and if they're left lying around without a big link to the canonical one -- we end up with contradictory information smeared around everywhere08:34
axwsure08:34
fwereaderogpeppe, axw, dimitern: eg axw trying to use rogpeppe's incorrect mongo syntax08:35
rogpeppefwereade: tbh dimitern's isn't quite right either, currently08:36
rogpeppedimitern: shall i update it to use $set ?08:36
fwereaderogpeppe, dimitern: fixing your notes is fine if you want08:36
rogpeppefwereade: my notes were fixed when you mentioned the problem FWIW08:36
axwfwereade: I'll run through the main doc and see if I can spot any problems08:37
rogpeppefwereade: it was just a copy/paste failure08:37
fwereaderogpeppe, dimitern, axw: but the artifact we're meant to have *perfect* by now is the main one08:37
fwereaderogpeppe, I don't mind what notes you make, so long as it's 100% clear that they're not meant to be used by anyone else, and they link to the canonical document08:37
fwereaderogpeppe, and I'm pretty sure mramm and I were explicit about using something that understands yaml to read/write yaml files08:38
fwereaderogpeppe, sed, for all its joys, is not aware of the structure of the document;)08:39
rogpeppefwereade: does it actually matter in this case? we know what they look like and how they're marshalled, and the procedure leaves everything else unaffected - it's pretty much what you'd do using a text editor08:39
rogpeppefwereade: i wanted to use something that didn't need anything new installed on the nodes08:40
dimiternfwereade, sorry, just catching up on emails08:40
dimiternfwereade, yes, the $set syntax should work08:40
rogpeppefwereade: and i'm not sure that there's anything yaml-savvy there by default08:40
fwereaderogpeppe, crikey08:41
fwereaderogpeppe, well if that's the case I withdraw my objections08:41
axwrogpeppe: pyyaml is required by cloud-init, so it's on there08:42
fwereaderogpeppe, objections back in force08:42
axwbut... IMHO sed is fine here08:42
* axw makes everyone hate him at the same time08:43
* rogpeppe leaves it to someone with less rusty py skills to do the requisite yaml juggling08:43
* fwereade flings stones indiscriminately08:43
jamaxw: did you check with anyone in #maas if maas-cli still doesn't support uploading? The post from allenap was from June (could be true, and you could have experienced it first hand)08:43
fwereaderogpeppe, did you hear from mgz at all yesterday?08:43
axwjam: the bug is still open, so I didn't08:43
axwbut08:43
axwI couldn't get it to work08:44
rogpeppefwereade: briefly - he'd been offline, but i didn't see his stuff08:44
jamfwereade: mgz posted his plugin to the review queue08:45
axwfwereade: I'll just update the address in mongo back to something crap and make sure the addressupdater does its job; so far the main doc is fine, tho I had to add the quotes into the mongo _id value filters08:46
jamaxw: as long as its "I tried and couldn't, then I found the bug" I'm happy. vs if it was "I found the bug, so I didn't try"08:46
axwjam: definitely the former :)08:47
fwereadeaxw, thanks for fixing the main doc :)08:48
axwnp08:48
fwereadeaxw, and let me know if the address-updating works as expected08:48
axwwill do08:48
bigjoolsjam, axw: it doesn't support uploading still08:49
axwbigjools: thanks for confirming08:50
jamthanks bigjools08:50
axwfwereade: confirmed, addressupdater does its job08:51
axwsorry for the confusion08:51
jamaxw: did you have to set LC or LC_ALL when doing mongodump ?08:52
jamaxw: or is it (possibly) set when you ssh into things08:53
axwjam: I did not, but I didn't check if it was there already; I'll check now08:53
jamthx08:54
axwnot set to anything08:54
axwdunno why it didn't affect me08:54
jamaxw: one thought is that you only have to set it if you don't have the current lang pack installed (which a cloud install may not have) ? not really sure08:55
fwereadejam, rogpeppe: hey, I just thought of something08:58
rogpeppefwereade: oh yes?08:58
jam?08:58
fwereadejam, rogpeppe: we should probably be setting *all* the unit-local settings revnos to 008:59
rogpeppefwereade: i thought of something similar yesterday actually, but not so nice08:59
rogpeppefwereade: that would be a good thing to do08:59
fwereaderogpeppe, yeah, it was inspired by your comments yesterday, it just took a day for it to filter through09:00
jamfwereade: I don't actually know what revnos you are talking about. Mongo txn ids?09:00
rogpeppefwereade: that gets you unit settings, but what about join/leave?09:00
rogpeppejam: the unit agent stores some state locally09:00
rogpeppejam: so that it can be sure to execute the right hooks, even after a restart09:00
fwereaderogpeppe, join/leave should be good, the hook queues reconcile local state against remote09:01
rogpeppefwereade: great09:01
rogpeppefwereade: do config settings need anything special?09:01
fwereaderogpeppe, config settings should also be fine thanks to the somewhat annoying always-run-config-changed behaviour09:01
fwereaderogpeppe, we have a bug for that09:01
rogpeppefwereade: currently we can treat it as a useful feature :-)09:02
fwereaderogpeppe, indeed :)09:02
jamaxw: when you did your testing, did you start machine-0 before updating the agent address in the various units?09:03
rogpeppefwereade: it would be interesting to try to characterise the system behaviour when restoring at various intervals after a backup09:03
axwjam: no, I started it last09:03
rogpeppefwereade: e.g. when the unit/service was created but is not restored09:04
axwjam: sorry, I'll add that step in :)09:04
axwjam: actually09:04
axwI lie09:04
axwI did start it first09:04
rogpeppefwereade: i suspect that's another case where we really don't want to randomly kill unknown instances09:04
fwereadeaxw, dimitern, rogpeppe, mgz, *everyone* -- *please* be *doubly* sure that you test the canonical procedure09:04
jamaxw: actually we *wanted* to do it last09:05
fwereaderogpeppe, well, there's no way to restore those things at the moment anyway09:05
fwereadejam, why?09:05
jamaxw: so update machine-0 config, start it, then go around and fix the agent.conf09:05
dimiternfwereade, ok, i'm starting a fresh test with the canonical procedure now09:05
jamfwereade: didn't you want to split "fixing up mongo + machine-0" from "fixing up all other agents" ?09:05
rogpeppefwereade: agreed, but the user might have important data on those nodes09:05
axwjam: yeah that's what I did, sorry09:06
jamaxw: sorry, "when I say do it last" it was confusing what thing "it" is09:06
jamaxw: start jujud-machine-0  should come before updating agent.conf09:06
fwereadejam, I think we suffered a communication failure -- you seemed to be suggesting he should fix agent confs before starting the machine 0 agent09:06
jamthanks09:06
jamfwereade: yes. I think we all agree on what should be done :)09:07
axwjam: I fixed mongo, started machine-0, fixed provider-state, fixed agent.conf09:07
jamaxw: I'm copying some of your maas specific steps into the doc09:07
axwcool09:07
fwereaderogpeppe, this is true, hence https://codereview.appspot.com/32710043/ -- would you cast your eyes over that please?09:07
rogpeppefwereade: looking09:08
fwereaderogpeppe, there's not much opportunity to fix them, it's true09:08
axwfwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?09:08
fwereaderogpeppe, and the rest of the system should anneal so as to effectively freeze them out09:08
jamrogpeppe: fwereade: are we actually suggesting run "initctl stop" rather than just "stop foo" ?09:09
fwereadejam, I don't think so09:09
jamwe do it differently at different points in the file09:09
rogpeppefwereade: yeah09:09
fwereadejam, where did initctl come from?09:09
jam"sudo start jujud-machine-0" but09:09
jam"for agent in *; do initctl stop juju-$agent"09:09
jamfwereade: in the main doc, I think rogpeppe put it09:09
jamI'll switch it09:09
rogpeppejam: i generally prefer "initctl stop" rather than "stop" as i think it's more obvious, but that's probably just me09:09
rogpeppejam: the two forms are exactly equivalent i believe09:10
dimiternrogpeppe, it's just you :)09:10
dimiternrogpeppe, i preferred service stop xyz before, but now i find stop xyz or start xyz pretty useful09:11
dimiternrogpeppe, and i don't think they are quite equivalent09:11
* rogpeppe thinks it was rather unnecessary for upstart to take control of all those useful verbs09:11
jamrogpeppe: honestly, I think they are at least roughly equivalent, but we should be consistent in the doc09:11
rogpeppedimitern: no?09:11
rogpeppedimitern:09:12
rogpeppe% file /sbin/stop09:12
rogpeppe/sbin/stop: symbolic link to `initctl'09:12
jammain problem *I* had with "service stop" is I always wanted to type it wrong "service stop mysql" vs "service mysql stop"09:12
jamI still am not sure which is correct :)09:12
dimiternrogpeppe, initctl is the same as calling the script in /etc/init.d/xyz {start|stop|etc..}09:12
rogpeppejam: initctl stop mysql09:12
dimiternrogpeppe, whereas start/stop and service are provided by upstart09:12
jamdimitern: yeah, I confirmed rogpeppe is right that stop is a symlink to initctl09:13
jamat least on Precise09:13
rogpeppedimitern: i believe that stop is *exactly* equivalent to initctl stop09:13
rogpeppedimitern: try man 8 stop09:14
dimiternrogpeppe, hmm.. seems right09:14
rogpeppedimitern: (it doesn't even mention the aliases)09:14
rogpeppedimitern: that's why i like using initctl, as it's in some sense the canonical form09:14
jamrogpeppe: can you double check the main doc again. I reformatted the text, and reformatting regexes is scary :)09:14
dimiternrogpeppe, but again, I usually am too lazy to type more, if I can type less :)09:14
jamhttps://docs.google.com/a/canonical.com/document/d/1c1XpjIoj9ob_06fvvGJz7Jm4qS127Wtwd5vw_Jeyebo/edit#09:14
rogpeppedimitern: this is a script :-)09:15
rogpeppejam: looking09:15
jamrogpeppe: actually, it is a document describing what we want other people to type09:15
jamagain, it doesn't matter terribly, but we should be consistent09:15
rogpeppejam: i don't expect anyone to actually type that09:15
jamrogpeppe: that is what this doc *is about* actually09:16
jamrogpeppe: write down what the manual steps are to get things working09:16
jamand then maybe we'll script it later09:16
rogpeppejam: i realise that, but surely anyone that's doing it will copy/paste?09:16
rogpeppejam: rather than manually (and probably wrongly) type it all out by hand09:16
jamrogpeppe: well, C&P except they have to edit bits, and it's actually small, so they'll just type it, and ...09:16
rogpeppejam: i wouldn't trust anyone (including myself) to type out that script by hand09:16
jamlike "8.3 ADDR=<...>"09:17
jamthey *can't* just C&P09:17
rogpeppejam: i deliberately changed it so that the only bit to edit was that bit09:17
dimiternrogpeppe, btw for the copy/paste to work we need to use the correct arguments, like --ssl instead of -ssl ;)09:18
rogpeppedimitern: good catch, done09:18
jamso... have we stopped the "stay on the hangout" bit of the day ?09:23
axwfwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?09:25
dimiternjam, i for one find it a bit distracting tbh09:25
axw(just wondering if I should proceed to fix it or not)09:25
jamaxw: maascli acquire agent_name=XXXXX09:26
axwjam: ah :)09:27
axwthen I shall just let that code sit there for now09:27
axwjam: do you think it's worth putting that in the doc?09:28
jamaxw: well, if you don't mind testing it and finding the exact right syntax, then I'd like it in the doc09:34
axwjam: I'll see what I can do before the family gets home09:35
rogpeppejam, wallyworld_: reviewed https://codereview.appspot.com/32710043/09:36
rogpeppefwereade: ^09:36
wallyworld_rogpeppe: i'll read your comments in detail - the changes i made were what i found i had to do to make the tests pass09:38
rogpeppewallyworld_: what was failing?09:39
wallyworld_otherwise it had issues distinguishing between dead vs extra instances09:39
wallyworld_a number of provisioner tests09:39
wallyworld_concerning seeing which instances were stopped09:39
wallyworld_your proposed code may well work also09:40
rogpeppewallyworld_: so that original variable "unknown" didn't actually contain the unknown instances?09:40
axwjam: there are other things that StartInstance does for MAAS too, like creating the bridge interface09:40
rogpeppewallyworld_: i would very much prefer to change as little logic as possible here09:41
wallyworld_rogpeppe: it also contained dead ones i think from memory09:41
wallyworld_cause the dead ones were removed early from machines map09:41
axwjam: tho I guess this is moot if they're just doing a bare-metal backup/restore09:41
jamaxw: so... we should know if this stuff works by going through the checklist we've created. If we really do need something like juju-provision, then we should document it as such.09:43
axwjam: the problem is that step 1 is vague as to how to achieve the goal09:44
jamaxw: so 1.1 in the main doc is about "provision an instance matching the existing as much as possible"09:45
axwjam: yeah, how? maybe it's obvious to people seasoned in maas, I don't know09:45
jamaxw: as in *we need to put it in there* to help people09:46
jamit may be your juju-provision09:46
jamit may be "maascli do stuff"09:46
jamit may be ?09:46
jambut we shouldn't have ? in that doc :)09:46
axwjam: ok, we're on the same page now: that is what my question was before09:46
axwi.e. is there some other way to do this, or do we still need juju-provision09:46
jamaxw: so we are focused on "manual steps you can do today" in that document, though referencing "there is a script over here you can use"09:48
axwjam: ok. well, fwiw that plugin works fine now, so if we can't figure out something better, there's that09:50
wallyworld_rogpeppe: so i needed to leave the dead machines in the machine map until the allinstances had been checked, so that the difference between machine map and allinstances really represented unknown machines. after that the dead ones could be processed09:51
axwjam: didn't get anywhere with maas-cli; I need to head off now, I'll check in later10:07
jamaxw: np10:07
jamaxw: have a good afternoo10:07
jamafternoon10:07
rogpeppewallyworld_: ok - i'd assumed that unknown really was unknown. i will have a better look at your CL in that light now10:20
rogpeppefwereade: i've added a script to change the relation revnos10:20
fwereaderogpeppe, cool, thanks10:21
dimiternfwereade, the procedure as described checks out10:22
fwereadedimitern, awesomesauce10:24
dimiternfwereade, for ec2 ofc, haven't tried the maas parts10:24
fwereaderogpeppe, great, thanks10:27
fwereadedimitern, would you run rog's new change-version tweak against your env too please?10:27
dimiternfwereade, what's that tweak?10:28
fwereadedimitern, in the doc: if [[ $agent = unit-* ]]10:28
fwereadethen10:28
fwereadesed -i -r 's/change-version: [0-9]+$/change-version: 0/' $agent/state/relations/*/*10:28
fwereadefi10:28
fwereadedimitern, to be run while the unit agent's stopped10:28
fwereadedimitern, it'll trigger a whole round of relation-changed hooks10:28
fwereadedimitern, should be sufficient to bring the environment back into sync with itself even if it was backed up while not in a steady state10:29
dimiternfwereade, i'll try that10:29
dimiternfwereade, wait, which doc? machine doc?10:30
fwereadedimitern, in the canonical source-of-truth doc, in section 8, with the scripts rog wrote10:31
dimiternfwereade, ah, ok10:31
dimiternfwereade, i can see the hooks, seems fine10:34
fwereadedimitern, sweet10:34
dimiternfwereade, rogpeppe, mgz, jam, TheMue, natefinch, standup time10:46
dimiternmgz, jam, TheMue: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig10:50
jamTheMue: ^^ ? if you want to join10:53
jammgz: ^^10:53
wallyworld_fwereade: pushed some changes. wrt the question - can we call processMachines when setting safe mode - what machine ids would i use in that case?11:48
wallyworld_cause normally the ids come from the changes pushed out by the watcher11:48
rogpeppewallyworld_: i think you could probably get all environ machines and use their ids11:59
wallyworld_rogpeppe: i considered that but in a large environment the performance could be an issue12:00
rogpeppewallyworld_: no worse than the provisioner bouncing12:00
rogpeppewallyworld_: and this is something that won't happen very often at all, i'd hope12:00
wallyworld_hmmm ok12:01
rogpeppewallyworld_: um, actually...12:01
wallyworld_i'll look into it12:01
rogpeppewallyworld_: perhaps you could pass in an empty slice12:01
wallyworld_then it wouldn't pick up any dead machines, but may not matter12:02
rogpeppewallyworld_: i don't think we'll do anything differently with dead machines between safe and unsafe mode12:02
rogpeppewallyworld_: the thing that changes is how we treat instances that aren't in state at all, i think12:03
wallyworld_i thought about using a nil slice and thought it may be an issue but i can't recall why now. i'll look again12:03
rogpeppewallyworld_: BTW you probably only need to call processMachines when provisioner-safe-mode has been turned off12:04
wallyworld_yep, figured that :-)12:05
* TheMue => lunch12:15
rogpeppewallyworld_: reviewed. sorry for the length of time it took.12:28
wallyworld_np, thanks. i'll take a look12:28
wallyworld_rogpeppe: with the life == Dead check - if i remove it, won't we encounter this line  else if !params.IsCodeNotProvisioned(err) {12:32
wallyworld_and exit with an error12:32
rogpeppewallyworld_: i don't *think* it's an error to call InstanceId on a dead machine12:33
wallyworld_well, it will try and find an instance record in the db and fail12:33
wallyworld_or maybe not12:33
wallyworld_i think it will only fail once the machine is removed12:34
wallyworld_i just don't see the point of a rpc round trip12:34
rogpeppewallyworld_: i think it will probably work even then12:34
wallyworld_when it is not needed12:34
rogpeppewallyworld_: it is strictly speaking not necessary, yes, but your comment is only necessary because the context that makes the code correct as written is not inside that function12:35
rogpeppewallyworld_: it only works if we *know* that stopping contains all dead machines12:36
wallyworld_yeah it sorta is - the population of stopping and the processing of that12:36
wallyworld_ok,i see your point12:36
wallyworld_but12:36
wallyworld_the comment clears up any confusion12:37
rogpeppewallyworld_: i'd prefer robust code to a comment, tbh12:37
wallyworld_and i hate invoking rpc unless necessary, and we are trusting that we either get an instance id or that specific error always and we are not sure12:37
wallyworld_calling rpc  unnecessarily can be unrobust also12:38
rogpeppewallyworld_: i believe it's premature optimisation12:38
rogpeppewallyworld_: correctness is much more important here12:38
wallyworld_eliminating rpc is never premature optimisation12:39
wallyworld_especially when we can have 1000s of machines12:39
rogpeppewallyworld_: *any* optimisation is premature optimisation unless you've measured it12:39
wallyworld_except for networking calls12:39
rogpeppewallyworld_: none of this is on a critical time scale12:39
wallyworld_they can be indeterminately long12:39
rogpeppewallyworld_: it's all happening at leisure12:39
wallyworld_but, it is a closed system and errors/delays add up12:40
rogpeppewallyworld_: look, we're getting the instance ids of every single machine in the environment12:40
rogpeppewallyworld_: saving calls for just the dead ones seems like it won't save much at all12:40
rogpeppewallyworld_: if we wanted to save time there, we should issue those rpc's concurrently12:41
jamfwereade: for the fix for "destroy machines". I'd like to warn if you supply --force but it won't be supported, should that go via logger.Warning or is there something in command.Context we would use?12:41
jamthere is a Context.Stderr12:41
wallyworld_rogpeppe: can we absolutely guarantee that for all dead/removed machines, instanceid() will return a value or a not provisioned error?12:42
fwereadejam, I'd write it to context.Stderr, yeah12:42
rogpeppewallyworld_: assuming the api server is up, yes12:42
fwereaderogpeppe, wallyworld_: NotFound?12:43
jamrogpeppe: we don't have any way to make our RPC server pretend an API doesn't actually exist, right?12:43
rogpeppefwereade: can't happen12:43
jamit would be nice for testing backwards compat12:43
rogpeppefwereade: look at the InstanceId implementation12:43
rogpeppefwereade: i wouldn't mind an explicit IsNotFound check too though, for extra resilience12:44
fwereaderogpeppe, looks possible to me12:44
rogpeppefwereade: if (err == nil && instData.InstanceId == "") || (err != nil && errors.IsNotFoundError(err)) {12:44
rogpeppefwereade: err = NotProvisionedError(m.Id())12:44
fwereaderogpeppe, I'm looking at apiserver12:44
wallyworld_looks like it will return not found12:44
wallyworld_looking at api server12:45
rogpeppefwereade: ah, it'll fetch the machine first12:45
wallyworld_that's my issue12:45
rogpeppewallyworld_: in which case, check for notfound too12:45
wallyworld_hence the == dead check12:45
wallyworld_seems rather fragile12:45
rogpeppewallyworld_: will the == dead check help you?12:45
wallyworld_yes, because that short circuits the need for getting instance if12:45
wallyworld_id12:46
wallyworld_so we don't need to guess error codes12:46
jamfwereade: I can give a warning, or I can make it an error, thoughts? (juju destroy-machine --force when not supported should try just plain destroy-machine, or just abort ?)12:46
rogpeppewallyworld_: can't the machine be removed anyway, even if the machine is not dead? it could become dead and then be removed12:46
fwereadejam, I'd be inclined to error, myself, tbh12:46
wallyworld_rogpeppe: if it is not dead, there is also processing for that elsewhere12:47
rogpeppewallyworld_: i think this code is preventing you from calling processMachines with a nil slice12:48
wallyworld_which code specifically?12:49
rogpeppewallyworld_: the "if m.Life() == params.Dead {" code12:49
wallyworld_save me looking, how?12:50
rogpeppewallyworld_: because stopping doesn't contain *all* stopping machines (your comment there is wrong, i think)12:50
rogpeppewallyworld_: it (i *think*) contains all dead machines that we've just been told had their lifecycle change12:51
wallyworld_yes12:51
rogpeppewallyworld_: and this is what makes me think that the code is not robust12:51
wallyworld_but that's what the current processing does12:51
wallyworld_looks at changed machines12:51
rogpeppewallyworld_: no12:51
rogpeppewallyworld_: task.machines contains every machine, i think, doesn't it?12:51
wallyworld_yes, i meant the ids12:52
wallyworld_stopping is populated from the ids12:52
rogpeppewallyworld_: so, if there's a dead machine that's not in the ids passed to processMachines, its instance id will be processed as unknown, right?12:53
wallyworld_i think so12:53
wallyworld_but it would have previously triggered12:53
rogpeppewallyworld_: so this code will be wrong if you pass an empty slice to processMachines, yes?12:53
wallyworld_i'd have to trace it through12:54
rogpeppewallyworld_: (which is something that would be good to do)12:54
rogpeppewallyworld_: please write the code in such a way that it's obviously correct12:54
rogpeppewallyworld_: (which the current code is not, IMHO)12:54
wallyworld_obviously is subjective12:54
rogpeppewallyworld_: ok, *more* obviously :-)12:55
rogpeppewallyworld_: "If a machine is dead, it is already in stopping" is an incorrect statement, I believe. Or only coincidentally correct. And thus it seems wrong to me to base the logic around it.12:55
wallyworld_if a changing machine is dead it is in stopping12:56
wallyworld_that assumption still needs to be true12:56
wallyworld_regardless of if i take out the == dead check12:56
rogpeppewallyworld_: thanks12:57
rogpeppewallyworld_: "if a changing machine is dead it is in stopping" is not the invariant you asserted12:57
wallyworld_what for?12:57
rogpeppewallyworld_: making the change12:57
wallyworld_i haven't yet12:57
rogpeppewallyworld_: oh, sorry, i misread12:57
wallyworld_still trying to see if i can rework it12:57
rogpeppewallyworld_: i think this code should be robust even in the case that there are dead machines that were not in the latest change event12:58
rogpeppeoh dammit12:59
wallyworld_yes. i wonder what the code used to do, i'll look at the old code12:59
rogpeppehmm, this code is the only code that removes machines, right?13:00
wallyworld_i think so13:00
wallyworld_at first glance, i'm not sure if the old code was immune to the issue of ids not containing all dead machines13:01
wallyworld_the old code looks like it used to rely on dead machines being notified via incoming ids13:02
rogpeppewallyworld_: i *think* it was13:02
rogpeppewallyworld_: it certainly relied on that13:02
wallyworld_so i'm doing something similar here then13:03
rogpeppewallyworld_: but the unknown-machine logic didn't rely on the fact that all dead machines were in stopping13:03
rogpeppewallyworld_: which your code does13:03
wallyworld_hmmm.13:03
fwereadewallyworld_, rogpeppe: I'd really prefer to avoid further dependencies on machine status, the pending/error stuff is bad enough as it is13:03
rogpeppewallyworld_, fwereade: BTW i can't see any way that a machine that has not been removed could return a not-found error from the api InstanceId call, can you?13:03
rogpeppefwereade: i'm not quite sure what you mean there13:04
fwereaderogpeppe, wallyworld_: the "stopping" sounded like a reference to the status -- as in SetStatus13:04
rogpeppefwereade: nope13:04
fwereaderogpeppe, wallyworld_: ok sorry :)13:05
rogpeppefwereade: i'm talking about the stopping slice in provisioner_task.go13:05
rogpeppefwereade: and in particular to the comment at line 288 of the proposal:13:05
wallyworld_what he said13:05
rogpeppe// If a machine is dead, it is already in stopping and13:05
rogpeppe// will be deleted from instances below. There's no need to13:05
rogpeppe// look at instance id.13:05
fwereaderogpeppe, wallyworld_: wrt machine removal: destroy-machine --force *will* remove from state, but I'd be fine just dropping that last line in the cleanup method and leaving the provisioner to finally remove it13:06
rogpeppefwereade: this discussion is stemming from my remark on that comment13:06
rogpeppefwereade: that would be much better13:06
wallyworld_one place to remove is best13:06
rogpeppefwereade: otherwise we can leak that machine's instance id13:06
rogpeppefwereade: if we're in safe mode13:06
=== gary_poster|away is now known as gary_poster
fwereaderogpeppe, wallyworld_: I saw I'd done that the other day and thought "you idiot", for I think exactly the same reasons, consider a fix for that pre-blessed13:07
wallyworld_rogpeppe: i think i can see your point13:07
rogpeppewallyworld_: phew :-)13:08
wallyworld_that stopping won't contain all dead machines13:08
wallyworld_sorry, it's late here, i'm tired, that's my excuse :-)13:08
rogpeppewallyworld_: np13:08
rogpeppewallyworld_: thing is, it *probably* does, but i don't think it's an invariant we want to rely on implicitly13:08
wallyworld_i was originally worried about the error fragility13:09
rogpeppewallyworld_: especially because we can usefully break that invariant to good effect (by passing an empty slice to processMachines)13:09
wallyworld_i'm still quite concerned about all the rpc calls we make (in general)13:09
rogpeppewallyworld_: an extra piece of code explicitly ignoring a not-found error too would probably be a good thing to add13:10
wallyworld_ok13:10
rogpeppewallyworld_: well, me too, but you'll only be saving a tiny fraction of them here13:10
wallyworld_yeah, we really need a bulk instance id call - i thought all our apis were supposed to be bulk13:10
wallyworld_putting remote interfaces on domain objects eg machine is also wrong, but thats another discussion13:11
wallyworld_imagine a telco with 10000 or more machines13:12
rogpeppewallyworld_: they are, kinda, but a) we don't make them available to the client and b) we don't implement any server-side optimisation that would make it significantly more efficient13:12
wallyworld_well, here the provisioner is a client13:12
wallyworld_task13:12
rogpeppewallyworld_: if we had 10000 or more machines, we would not want to process them all in a single bulk api call anyway13:12
rogpeppewallyworld_: indeed13:13
wallyworld_sure, but that optimisation can be done under the covers13:13
wallyworld_the bulk api can batch13:13
wallyworld_so bottom line - we can't claim to scale well just yet13:13
wallyworld_more work to do13:13
rogpeppewallyworld_: to be honest, just making concurrent API calls here would yield a perfectly sufficient amount of speedup, even in the 10000 machine case, i think13:14
rogpeppewallyworld_: without any need for more mechanism13:14
wallyworld_you mean using go routines?13:14
rogpeppewallyworld_: yeah13:15
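The concurrent-call approach rogpeppe suggests can be sketched roughly as below. This is an illustration only, not juju-core code: `fetchInstanceIds`, the `lookup` callback, and the concurrency bound are all hypothetical stand-ins for the real API round trips.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchInstanceIds issues one lookup per machine id concurrently,
// bounded by maxConcurrent, and collects results in input order.
// lookup stands in for a per-machine RPC such as Machine.InstanceId.
func fetchInstanceIds(ids []string, maxConcurrent int, lookup func(string) (string, error)) ([]string, error) {
	results := make([]string, len(ids))
	errs := make([]error, len(ids))
	sem := make(chan struct{}, maxConcurrent) // limits in-flight calls
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i], errs[i] = lookup(id)
		}(i, id)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	lookup := func(id string) (string, error) {
		return "i-" + id, nil // stand-in for a network call
	}
	out, err := fetchInstanceIds([]string{"0", "1", "2"}, 2, lookup)
	fmt.Println(out, err) // [i-0 i-1 i-2] <nil>
}
```

With a bounded semaphore like this, total wall-clock time approaches O(n/maxConcurrent) rather than O(n) sequential round trips, which is the speedup being discussed.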
wallyworld_well, that could happen under the covers13:15
wallyworld_but we need to expose a bulk api to callers13:15
rogpeppewallyworld_: i'm not entirely convinced.13:15
wallyworld_and then the implementation can decide how best to do it13:15
rogpeppewallyworld_: the caller may well want to do many kinds of operation at the same time. bulk calls are like vector ops - they only allow a single kind of op to be processed many times13:16
rogpeppewallyworld_: that may not map well to the caller's requirements13:16
wallyworld_yes, which is why remote apis need to be designed to match the workflow13:16
rogpeppewallyworld_: agreed13:16
wallyworld_ours are just a remoting layer on top of server methods13:16
wallyworld_which is kinda sad13:17
rogpeppewallyworld_: which is why i think that one-size-fits all is not a good fit for bulk methods13:17
rogpeppewallyworld_: actually, it's perfectly sufficient, even for implementing bulk calls13:17
wallyworld_all remote methods should be bulk, but how stuff is accumulated up for the call is workflow dependent13:17
rogpeppewallyworld_: it's just a name space mechanism13:18
wallyworld_any time a remote workflow needs O(N) method calls it's bad13:18
rogpeppewallyworld_: there are many calls where a bulk version of the call is inevitably O(n)13:18
wallyworld_it shouldn't be if designed right13:18
wallyworld_to match the workflow13:19
rogpeppewallyworld_: if i'm adding n services, how can that not be O(n) ?13:19
wallyworld_what i mean is - if you have N objects, you don't make N remote calls to get info on each one13:19
wallyworld_i don't mean the size of the api13:19
wallyworld_but the call frequency13:19
wallyworld_to get stuff done13:20
rogpeppewallyworld_: if calls can be made concurrently (which they can), then the overall time can still be O(1)13:20
wallyworld_the client should not have to manually do that boilerplate13:20
rogpeppewallyworld_: assuming perfect concurrency at the server side of course :-)13:20
rogpeppewallyworld_: now that's a different argument, one of convenience13:20
wallyworld_so imagine if you downloaded a file and the networking stack made you as a client figure out how to chunk it13:21
rogpeppewallyworld_: personally, i think it's reasonable that API calls are exactly as easy to make concurrent as calling any other function in Go13:21
wallyworld_no - rpc calls should never be treated like normal calls13:21
rogpeppewallyworld_: it does13:21
wallyworld_networked calls are always different13:22
rogpeppewallyworld_: i disagree totally13:22
wallyworld_so, you've never read the 7 fallacies of networked code or whatever that paper is called?13:22
rogpeppewallyworld_: any time you call http.Get, it looks like a normal call but is networking under the hood.13:22
rogpeppewallyworld_: we should not assume that it cannot fail, of course13:23
rogpeppewallyworld_: and that's probably one of the central fallacies13:23
wallyworld_people know http get is networked and do tend to program around it accordingly13:23
rogpeppewallyworld_: but a function works well to encapsulate arbitrary network logic13:23
rogpeppewallyworld_: sure, you should probably *know* that it's interacting with the network, but that doesn't mean that calling a function that interacts with the network in some way is totally different from calling any other function that interacts in some way with global state13:24
rogpeppewallyworld_: in a way that can potentially fail13:24
wallyworld_it is different - networks can disappear, have arbitrary lag, different failure modes etc etc13:25
wallyworld_the programming model is different13:25
rogpeppewallyworld_: not really - the function returns an error - you deal with that error13:25
wallyworld_it is different at a higher level than that13:26
rogpeppewallyworld_: i don't believe that any network interaction breaks all encapsulation13:26
wallyworld_see http://www.rgoarchitects.com/files/fallacies.pdf13:26
rogpeppewallyworld_: which is what i think you're saying13:26
rogpeppewallyworld_: i have seen that13:27
rogpeppewallyworld_: i'm not sure how encapsulating a networking operation in a function that returns an error goes against any of that13:27
wallyworld_the apis design, error handling and all sorts of other things are different when dealing with networked apis13:27
wallyworld_the encapsulation isn't the issue13:28
wallyworld_it's the whole api design13:28
wallyworld_and underlying assumptions about how such apis can be called13:28
rogpeppewallyworld_: i don't understand13:28
wallyworld_case in point - it might make sense to call instanceId() once per machine for 10000 machines when inside a service where a machine domain object is colocated, but it is madness to do that over a network13:29
wallyworld_the whole api decomposition, assumptions about errors, retries etc needs to be different for networked apis13:30
rogpeppewallyworld_: so, there's no reason that where we need it, we couldn't have State.InstanceIds(machineIds ...string) as well as Machine.InstanceId13:30
wallyworld_we should never have machine.InstanceId() - networked calls do not belong on domain objects but services13:31
rogpeppewallyworld_: well, it's certainly true that some designs can make that necessary; eventual consistency for one breaks a lot of encapsulation13:31
wallyworld_that's the big mistake java made with EJB 1.013:31
wallyworld_and it took a decade to recover13:31
rogpeppewallyworld_: what's the difference between machine.InstanceId() and InstanceId(machine) ?13:32
wallyworld_domain objects encapsulate state; they shouldn't call out to services13:32
jamdimitern: trivial review of backporting your rpc.IsNoSuchRPC to 1.16: https://codereview.appspot.com/3285004313:33
wallyworld_the first example above promotes single api calls13:33
wallyworld_which is bad13:33
rogpeppewallyworld_: and the second one doesn't?13:33
dimiternwallyworld_, looking13:33
wallyworld_the second should be a bulk call on a service13:33
rogpeppewallyworld_: even if it doesn't make sense to be a bulk call?13:34
dimiternwallyworld_, the diff is messy13:34
rogpeppewallyworld_: anyway, i think this is somewhat of a religious argument :-)13:34
jamdimitern: did you mean jam ?13:34
rogpeppewallyworld_: we should continue at some future point, over a beer.13:34
wallyworld_rogpeppe: it always makes sense to provide bulk calls, and if there happens to be only one, just pass that in as a single element array13:35
wallyworld_yes13:35
dimiternjam, oops yes13:35
rogpeppewallyworld_: i'm distracting you :-)13:35
wallyworld_yes13:35
wallyworld_:-)13:35
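The bulk-call shape wallyworld_ argues for might look like the sketch below. All the names here (`MachineService`, `InstanceInfo`, `fakeService`) are hypothetical, chosen only to illustrate the one-round-trip-for-many-machines design; the single-machine case is just a one-element slice.

```go
package main

import "fmt"

// InstanceInfo carries one per-machine result; a bulk call returns one
// entry per requested machine id, in order, so individual failures don't
// fail the whole request.
type InstanceInfo struct {
	InstanceId string
	Err        error
}

// MachineService is the bulk-style facade: many machines, one round trip.
type MachineService interface {
	InstanceIds(machineIds []string) []InstanceInfo
}

// fakeService is an in-memory stand-in for the real remote facade.
type fakeService map[string]string

func (s fakeService) InstanceIds(machineIds []string) []InstanceInfo {
	out := make([]InstanceInfo, len(machineIds))
	for i, id := range machineIds {
		if inst, ok := s[id]; ok {
			out[i] = InstanceInfo{InstanceId: inst}
		} else {
			out[i] = InstanceInfo{Err: fmt.Errorf("machine %s not provisioned", id)}
		}
	}
	return out
}

func main() {
	svc := fakeService{"0": "i-abc", "1": "i-def"}
	// Bulk: many machines, one call.
	fmt.Println(svc.InstanceIds([]string{"0", "1", "2"}))
	// Single machine: same API, one-element slice.
	fmt.Println(svc.InstanceIds([]string{"1"}))
}
```

The per-entry error field is what lets a caller with 10000 machines make one request and still see exactly which machines were unprovisioned.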
jamdimitern: the diff looks clean here, is it because of unified vs side-by-side?13:35
wallyworld_i've seen too many systems fall over due to the issues i am highlighting13:35
jamI have "old chunk mismatch" in side-by-side but it looks good in unified, I think13:36
jamugh, it is targetting trunk13:36
dimiternjam, yeah, the s-by-s diff is missing13:36
jamI thought I stopped it in time13:36
jamdimitern: so I'll repropose, lbox broke stuff13:36
jamyou can look at the unified diff, and that will tell you what you'll see in a minute or so13:36
dimiternjam, cheers13:36
jamdimitern: https://codereview.appspot.com/32860043/ updated13:40
dimiternjam, lgtm, thanks13:41
jamdimitern, fwereade: if you want to give it a review, this is the "compat with 1.16.3" for 1.16.4 destroy-machines, on the plus side, we *don't* have to fix DestroyUnit because that API *did* exist. (GUI didn't think about Machine or Environment, but it *did* think about Units)13:47
jamhttps://codereview.appspot.com/3288004313:47
dimiternjam, looking13:47
dimiternjam, lgtm13:51
jamfwereade: do you want to give an eyeball if that seems to be a reasonable way to do compat code? We'll be using it as a template for future compat13:51
fwereadejam,will do, we have that meeting in a sec13:52
jamfwereade: sure, but it is 1hr past my EOD, and my son needs me to take him to McDonalds :)13:52
fwereadejam, ok then, I will look as soon as I can, thanks13:53
jamfwereade: no rush on your end13:53
jamI think it is ~ok, though I'd *love* to actually have tests that compat is working13:53
wallyworld_rogpeppe: more changes pushed. but calling processMachines(nil) hangs the tests so that bit is not there yet13:53
jamsinzui: maybe we could do cross version compat testing in CI for stuff we know changed?13:54
jamI could help write those tests13:54
fwereadewallyworld_, might processMachines(nil) be a problem if the machines map is empty?13:54
rogpeppewallyworld_: looking13:54
rogpeppewallyworld_: could you propose again? i'm getting chunk mismatch13:55
wallyworld_fwereade: could be, i haven't traced through the issue yet fully. not sure how much further i'll get tonight, it's almost midnight and i'm having trouble staying awake13:55
fwereadewallyworld_, ok, stop now :)13:55
fwereadewallyworld_, tired code sucks13:55
fwereadewallyworld_, landing it now will not make the world of difference13:56
wallyworld_yep. i don't have to be tired to write sucky code :-)13:56
rogpeppewallyworld_, fwereade: i could try to take it forward. mgz is now online so can probably take the bootstrap-update stuff forward13:56
fwereadewallyworld_, ;p13:56
fwereaderogpeppe, wallyworld_, mgz: if that works for you all, go for it13:56
rogpeppeor, it probably doesn't make much difference, as fwereade says13:56
wallyworld_rogpeppe: i pushed again13:56
rogpeppewallyworld_: thanks13:56
rogpeppewallyworld_: you need to lbox propose again.13:58
rogpeppewallyworld_: oh, hold on!13:58
wallyworld_a third time?13:58
rogpeppewallyworld_: page reload doesn't work, i now remember13:58
* wallyworld_ hates Rietveld13:58
rogpeppewallyworld_: ah, it works, thanks!13:59
rogpeppewallyworld_: that bit is really shite, it's true13:59
rogpeppewallyworld_: i saw a proposal recently to fix the upload logic13:59
wallyworld_hope they land it soon13:59
rogpeppewallyworld_: it would be nice if the whole thing was a little more web 2.0, so you didn't have to roundtrip to the server all the time.14:00
wallyworld_yeah14:00
wallyworld_that also messes up browser history14:01
sinzuijam, I had the same idea. I added it to my proposal of what we want to see about a commit in CI https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoY1kjOB7rrcdEl3dWl0NUM3RzE2dXFxcGxwbVZtUFE&usp=drive_web#gid=014:03
rogpeppewallyworld_: i think i know why your processMachines(nil) call might be failing14:06
wallyworld_ok14:06
rogpeppewallyworld_: were you calling it from inside SetSafeMode?14:07
wallyworld_yeah14:07
rogpeppewallyworld_: thought so. that's not good - it needs to be called within the main provisioner task look14:07
rogpeppes/look/loop/14:07
wallyworld_ok14:08
rogpeppewallyworld_: so i think the best way to do that is with a channel rather than using a mutex14:08
wallyworld_rogpeppe: but setsafemode is called from the loop14:08
rogpeppewallyworld_: it is?14:08
wallyworld_ah, provisioner loop14:09
rogpeppewallyworld_: yup14:09
wallyworld_not provisioner task14:09
rogpeppewallyworld_: indeed14:09
wallyworld_save me tracing through the code, why does it matter?14:09
rogpeppewallyworld_: because there is lots of logic in the provisioner task that relies on single-threaded access (all the state variables in environProvisioner)14:10
rogpeppewallyworld_: that's why we didn't need a mutex there14:10
wallyworld_makes sense14:11
rogpeppewallyworld_: you'll have to be a bit careful with the channel (you probably don't want the provisioner main loop to block if the provisioner task isn't ready to receive)14:13
wallyworld_yeah, channels can be tricky like that14:14
hazmatif anyone has a moment, i would appreciate a review of this trivial that resolves two issues with manual provider, https://code.launchpad.net/~hazmat/juju-core/manual-provider-fixes14:14
rogpeppewallyworld_: this kind of idiom can be helpful: http://paste.ubuntu.com/6479150/14:15
wallyworld_rogpeppe: thanks, i'll look to use something like that14:16
rogpeppewallyworld_: it works well when there's a single producer and consumer14:16
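The paste itself has since expired, but a common form of the idiom rogpeppe describes (single producer, single consumer, non-blocking signal) looks roughly like this. The function names are illustrative, not from juju-core:

```go
package main

import "fmt"

// requestWork asks the consumer to run once. The send is non-blocking:
// if a signal is already pending on the one-slot buffered channel, the
// new request coalesces with it, so the producer (e.g. the provisioner
// main loop noticing a safe-mode change) never blocks waiting on the
// consumer (e.g. the provisioner task loop).
func requestWork(kick chan struct{}) {
	select {
	case kick <- struct{}{}:
	default: // a signal is already pending; coalesce
	}
}

// drain consumes all pending signals and returns how many were delivered;
// in the real task loop this is where the work (e.g. processMachines)
// would run.
func drain(kick chan struct{}) int {
	n := 0
	for {
		select {
		case <-kick:
			n++
		default:
			return n
		}
	}
}

func main() {
	kick := make(chan struct{}, 1) // capacity 1: at most one pending signal
	requestWork(kick)
	requestWork(kick) // coalesced with the first
	fmt.Println("signals delivered:", drain(kick)) // prints 1, not 2
}
```

The buffer size of 1 is the whole trick: it guarantees the producer never blocks while still guaranteeing the consumer wakes at least once after any request.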
rogpeppehazmat: i'll look when the diffs are available. codereview would be more conventional.14:17
hazmatrogpeppe, doh.14:18
hazmatrogpeppe, its a 6 line diff fwiw14:18
rogpeppehazmat: lp says "An updated diff will be available in a few minutes. Reload to see the changes."14:18
hazmathttp://bazaar.launchpad.net/~hazmat/juju-core/manual-provider-fixes/revision/209514:19
* hazmat lboxes14:19
hazmatrogpeppe, https://codereview.appspot.com/3289004314:22
rogpeppehazmat: axw_ might have some comments on the LookupAddr change.14:24
hazmatrogpeppe, what it was doing previously was broken14:24
rogpeppehazmat: it looks like it was done like that deliberately.14:24
rogpeppehazmat: agreed.14:24
hazmatrogpeppe, yes deliberately broken, i've already discussed with axw14:25
rogpeppehazmat: it should at the least fall back to the original address14:25
hazmatrogpeppe, it hangs indefinitely14:25
rogpeppehazmat: ok, if you've already discussed, that's fune14:25
rogpeppefine14:25
hazmatrogpeppe, and there's no reason for requiring dns name14:25
rogpeppehazmat: hmm, hangs indefinitely?14:25
rogpeppehazmat: ah, if it doesn't resolve, then WaitDNSName will loop14:26
rogpeppehazmat: yeah, i think that's fair enough. the only thing i was wondering was if something in the manual provider used the address to name the instance14:26
rogpeppehazmat: but even then, a numeric address should be fine14:26
hazmatyes.. slavish adherence to name-is-name, when the name is actually an address, and the api should get renamed.14:27
hazmattying address to name is the issue14:27
rogpeppehazmat: yeah.14:27
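The fallback rogpeppe suggests can be sketched as below; `hostName` is a hypothetical helper, not a juju-core function, showing a reverse lookup that falls back to the address itself instead of failing (or letting WaitDNSName loop forever) when there is no PTR record.

```go
package main

import (
	"fmt"
	"net"
)

// hostName attempts a reverse DNS lookup of addr and falls back to the
// address itself when the lookup fails or returns nothing, so a numeric
// address is always an acceptable "name".
func hostName(addr string) string {
	names, err := net.LookupAddr(addr)
	if err != nil || len(names) == 0 {
		return addr // no reverse DNS: use the address as the name
	}
	return names[0]
}

func main() {
	// 192.0.2.1 is TEST-NET-1 (documentation range), which has no PTR
	// record, so the input comes back unchanged.
	fmt.Println(hostName("192.0.2.1"))
}
```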
* hazmat grabs a cup of coffee14:28
rogpeppehazmat: i think the api was originally named after the ec2 name14:28
jamsinzui: * vs x is ?14:37
jamstuff that is done, vs proposed ?14:37
jamor stuff that is done but failing tests14:37
jamsinzui: if you can give me a template or some sort of process to write tests for you, I can do a couple14:39
sinzuijam, in15 minutes I can14:40
hazmatrogpeppe, thanks for the review, replied and pushed.14:52
rogpeppehazmat: looking14:53
rogpeppehazmat: LGTM14:54
jamsinzui: no rush on my end. I'm EOD and just stopping by IRC from time to time15:06
sinzuijam, okay, I will send an email to the juju-dev list so that the knowledge is documented somewhere15:07
natefinchis there a way to move a window that's off the screen back onto the screen?  I know windows tricks to do it, but not linux. (and I know about workspaces, I'm not using them)15:10
rogpeppenatefinch: i enabled workspaces for that reason only15:11
natefinchrogpeppe: heh, well, maybe I should turn them back on15:11
TheMuenatefinch: if you click on the workspace icon in the bar you'll get all four and can move windows15:15
natefinchTheMue: I had workspaces off.... I think Ubuntu just gets confused when I go from one monitor to multiple monitors and back again15:16
=== teknico_ is now known as teknico
TheMuenatefinch: computers don't have to have more than one monitor *tryingToSoundPowerful*15:17
TheMue;)15:17
natefinchhaha15:17
natefinchAnd I turned off workspaces because the keyboard shortcuts don't work :/15:18
* rogpeppe goes for a bite to eat15:52
hazmatare we doing 2 LGTM for branches or one?16:03
natefinchone16:04
hazmatnatefinch, thanks16:05
hazmatis there a known failing test in trunk?16:50
hazmatie cd juju-core/juju && go test -> http://pastebin.ubuntu.com/6479834/16:50
dimiternhazmat, which one?16:50
natefinchhazmat: thats a pretty common sporadic failure, yes.16:51
dimiternhazmat, yeah, that's known16:51
dimiternhazmat, it's pretty random to reproduce16:51
rogpeppehazmat: if you have a way of reliably reproducing it, i want to know16:51
hazmatk, it seems to happen fairly regularly for me16:51
hazmatrogpeppe, atm on my local laptop i can reproduce every time.. generating verbose logs atm16:52
rogpeppehazmat: do you get it when running the juju package tests on their own?16:53
hazmatrogpeppe, here's verbose logs on the same http://paste.ubuntu.com/6479841/16:53
hazmatrogpeppe, yes i do16:53
rogpeppehazmat: and this is on trunk?16:53
dimiternhazmat, can you check your /tmp folder to see and suspicious things - like too many mongo dirs or gocheck dirs?16:53
hazmatrogpeppe, if i just run -gocheck.f "DeployTest*" i don't get failure16:53
hazmatdimitern, not much in /tmp  three go-build* dirs16:54
hazmatrogpeppe, yes on trunk16:54
dimiternhazmat, ok, so it's not related then16:54
dimiternhazmat, running a netstat dump of open/closing/pending sockets to mongo might help16:55
rogpeppehazmat: is it always TestDeployForceMachineIdWithContainer that fails?16:55
hazmatrogpeppe, checking.. it's failed a few times on that one.. every time.. not sure16:56
hazmatrogpeppe, yeah.. it does seem to happen primarily on that one16:56
rogpeppehazmat: how about: go test -gocheck.f DeploySuite ?16:56
hazmatrogpeppe, i think that works fine.. it's just testing the whole package that fails16:56
hazmatyeah. that works fine16:56
hazmathmm16:57
rogpeppehazmat: i'd quite like to try bisecting to see which other tests cause it to fail16:57
hazmatrogpeppe, hold on  a sec.. your cli for gocheck.f  results in zero tests16:57
rogpeppehazmat: oops, sorry, DeployLocalSuite16:57
rogpeppehazmat: go test -gocheck.list will give you a list of all the tests it's running16:58
hazmatyeah.. all tests pass16:58
hazmatif running just that suite16:58
rogpeppehazmat: ok...16:58
rogpeppehazmat: how about go test -gocheck.f 'DeploySuite|ConnSuite' ?16:58
hazmatrogpeppe, thanks for the tip re -gocheck.list16:59
hazmatrogpeppe, that fails running both different test failure DeployLocalSuite.TestDeploySettingsError16:59
hazmatsame error16:59
rogpeppehazmat: good17:00
rogpeppehazmat: now how about go test -gocheck.f 'DeploySuite|^ConnSuite' ?17:00
hazmatrogpeppe, fwiw re Deploy|Conn -> http://paste.ubuntu.com/6479877/17:00
rogpeppehazmat: oops, that doesn't match what i thought it would17:01
hazmatrogpeppe, yeah.. it runs both still17:02
hazmatrogpeppe,  you meant this ? go test -v -gocheck.vv -gocheck.f 'DeployLocalSuite|!NewConnSuite'17:02
rogpeppehazmat: ok, instead of juggling regexps, how about putting c.Skip("something") in the SetUpSuite of all the suites except NewConnSuite, ConnSuite and DeployLocalSuite?17:02
rogpeppehazmat: no, i was trying to specifically exclude ConnSuite17:03
hazmatrogpeppe, that's what it does17:03
hazmatrogpeppe, that cli only runs deploy local suite tests17:03
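The regexp juggling above has a simple explanation: the `-gocheck.f` flag is a plain Go regexp matched against test names, and Go's RE2 syntax has no negation operator, so alternation can only ever *add* suites, never exclude them (and a branch like `!NewConnSuite` just looks for a literal `!`, matching nothing). A quick check, with illustrative names rather than the real juju-core ones:

```go
package main

import (
	"fmt"
	"regexp"
)

// filter mimics passing -gocheck.f "DeployLocalSuite|^ConnSuite":
// the second branch was intended to exclude ConnSuite, but as a plain
// RE2 alternation it includes it instead.
var filter = regexp.MustCompile("DeployLocalSuite|^ConnSuite")

// selected reports whether a "Suite.TestMethod" name matches the filter.
func selected(name string) bool {
	return filter.MatchString(name)
}

func main() {
	for _, name := range []string{
		"ConnSuite.TestNewConn",       // matched by ^ConnSuite
		"NewConnSuite.TestBadURL",     // matched by neither branch
		"DeployLocalSuite.TestDeploy", // matched by the first branch
	} {
		fmt.Printf("%s: %v\n", name, selected(name))
	}
}
```

To actually narrow the run, you have to enumerate what you want (or skip suites in SetUpSuite, as rogpeppe suggests next); there is no "everything except X" pattern.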
rogpeppehazmat: hopefully you can then run go test and it'll still fail17:07
hazmatrogpeppe, so it passes with 'NewConnSuite|ConnSuite' and fails if i add |DeployLocalSuite17:08
rogpeppehazmat: then we can try skipping NewConnSuite17:08
hazmatk17:08
hazmatrogpeppe, fails with ConnSuite|DeployLocalSuite17:08
rogpeppehazmat: woo17:09
rogpeppehazmat: does anything change if you comment out the "if s.conn == nil { return }" line in ConnSuite.TearDownTest ?17:10
hazmatrogpeppe, no.. still fails with ConnSuite|DeployLocalSuite and that part commented out17:13
rogpeppehazmat: ok, that was a long shot :-)17:13
rogpeppehazmat: could you skip all the tests in connsuite, then gradually reenable and see when things start failing again?17:14
hazmatrogpeppe, sure17:15
rogpeppehazmat: hold on, i might see it17:15
rogpeppehazmat: try skipping just TestNewConnFromState first17:15
rogpeppehazmat: oh, no, that's rubbish17:16
rogpeppehazmat: ignore17:16
rogpeppehazmat: but ConnSuite does seem to be an enabler for the DeployLocalSuite failure, so i'd like to know what it is that's the trigger17:16
hazmatrogpeppe, lunch break, back in 2017:17
rogpeppehazmat: k17:17
hazmatback, and walking through the tests17:20
hazmatrogpeppe, interesting.. i added a skip to the top of every test method in ConnSuite, and it still fails when doing ConnSuite|DeployLocalSuite17:23
rogpeppehazmat: ah ha! i wondered if that might happen17:24
rogpeppehazmat: what happens if you actually comment out (or rename as something not starting with "Test") the test methods in ConnSuite?17:25
dimiternrogpeppe, what i'm seeing when it happens on my machine, is that the SetUpTest (or SetUpSuite - can't remember exactly) is the thing that fails17:27
rogpeppedimitern: which SetUpTest?17:27
dimiternwhich causes one of a few tests to fail17:27
hazmatrogpeppe, odd.. that gets a failure (deploymachineforceid), but effectively renaming all the tests negates the suite so... it should be equivalent to running DeployLocalSuite by itself.. which still works for me.17:27
dimiternrogpeppe, DeployLocalSuite - always17:27
hazmathmm.. rerunning gets failure on DeployLocalSuite.TestDeployWithForceMachineRejectsTooManyUnits17:27
rogpeppedimitern: i'm very surprised it's SetUpTest, because i don't think that checks for state connection closing17:27
rogpeppehazmat: that's which which tests commented out?17:28
hazmatTearDownTest that fails for me17:28
rogpeppes/which/with/17:28
rogpeppedimitern: i think it's usually TearDownTest because that calls MgoSuite.TearDownSuite17:28
hazmatrogpeppe, yes that's with tests prefixed with XTest, the suite doesn't show up at all in -gocheck.list17:28
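The renaming trick works because gocheck-style runners discover tests by reflecting over a suite's exported methods and keeping those whose names start with "Test"; prefixing a method with X removes it from that set without deleting the code. A minimal stdlib sketch of that discovery mechanism (the ConnSuite type here is a stand-in for illustration, not the real juju-core suite, and gocheck's actual implementation differs in detail):

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ConnSuite stands in for a gocheck test suite. Renaming a method
// from TestFoo to XTestFoo hides it from prefix-based discovery.
type ConnSuite struct{}

func (s *ConnSuite) TestNewConnFromState() {} // discovered
func (s *ConnSuite) XTestDeploy()          {} // effectively skipped
func (s *ConnSuite) SetUpSuite()           {} // fixture, not a test

// testMethods lists the methods of v whose names begin with "Test",
// mimicking how gocheck-style runners find test methods.
func testMethods(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumMethod(); i++ {
		if name := t.Method(i).Name; strings.HasPrefix(name, "Test") {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	fmt.Println(testMethods(&ConnSuite{})) // only TestNewConnFromState
}
```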
rogpeppeTearDownTest, of course17:28
dimiternhazmat, rogpeppe, ha, yes - it was TearDownTest in fact with me as well17:29
rogpeppehazmat: interesting17:29
rogpeppehazmat: so just to sanity check, you still see failures if you comment out or delete all except SetUpSuite and TearDownSuite in ConnSuite?17:29
hazmatk17:30
dimiternbut can't reproduce it consistently - maybe one in 10 runs, but maybe not, and only when I run all the tests from the root dir17:30
rogpeppedimitern: i can't reproduce it even that reliably17:30
rogpeppedimitern: which why i get excited when someone can :-)17:30
rogpeppewhich is why...17:30
hazmatrogpeppe, yeah.. still fails17:31
hazmatrogpeppe, even with everything commented but the suite setup/teardown17:31
rogpeppehazmat: now we're starting to get suitably weird17:31
hazmatrogpeppe, and still passes if i run DeployLocalSuite in isolation17:31
dimiternhazmat, version of go?17:31
hazmat1.1.217:31
dimiternmaybe it's something related to parallelizing tests gocheck does?17:32
rogpeppehazmat: again to sanity check, does it pass if you comment out the MgoSuite.(SetUp|TearDown)Suite calls in ConnSuite?17:32
hazmati can switch versions of go if that helps.. i was running trunk of go for a little while, but its pretty broken with juju (and go trunk)17:32
dimiternhazmat, no, i'm on 1.1.2 as well17:32
rogpeppehazmat: please don't switch now!17:32
hazmat:-) ok17:32
dimitern:)17:33
* dimitern brb17:33
rogpeppehazmat: (though FWIW i'm using go 1.2rc2)17:33
hazmatrogpeppe, i had lots of issues with ec2/s3 and trunk.. (roughly close to 1.2rc2)  couldn't even bootstrap17:33
hazmatwhich is why i walked back to 1.1.217:34
rogpeppehazmat: weird. i've had no probs.17:34
rogpeppehazmat: i hope you filed bug reports17:34
hazmatrogpeppe, something for another time.. no i didn't.. i've fallen out of the habit of filing bug reports.. i should get back into it17:34
hazmatrogpeppe, so that still fails with mgoSuite teardown/setup calls commented in ConnSuite17:34
rogpeppehazmat: oh damn17:35
rogpeppehazmat: now that's even weirder17:35
rogpeppehazmat: what if you comment out the LoggingSuite calls?17:35
rogpeppehazmat: (leaving ConnSuite as a do-nothing-at-all test suite)17:36
hazmatrogpeppe, sorry i think i missed something on the mgo teardown, revisiting17:36
hazmati had commented it out in setup/teardown on test not suite17:36
hazmatcommenting out setup/teardown on suite first17:37
hazmater.. on test17:37
hazmatsinzui, re this bug, its reproducable for me with JUJU_ENV set.. currently marked incomplete https://bugs.launchpad.net/juju-core/+bug/125028517:38
_mup_Bug #1250285: juju switch -l does not return list of env names <docs> <switch> <ui> <juju-core:Incomplete> <https://launchpad.net/bugs/1250285>17:38
hazmatokay.. still fails with test tear/setup commented.. moving on to mgo comments in suite tear/setup17:38
hazmatand still fails with mgo commented in connsuite tear/setup17:39
rogpeppehazmat: given that there are no tests in that suite, i wouldn't expect test setup/teardown to make a difference17:39
rogpeppehazmat: in connsuite suite setup/teardown?17:39
hazmatrogpeppe, yeah.. i suspect it's actually an issue in DeployLocalSuite, and running it alongside any additional suite catches it.17:39
sinzuihazmat, I will test that bug again, oh and I think you and rogpeppe are looking at the mgo test teardown that affects me17:40
rogpeppehazmat: i think so too, but i can't see how running LoggingSuite.SetUpTest and TearDownTest could affect anything17:40
hazmatrogpeppe, for ref here's my current connsuite http://paste.ubuntu.com/6480040/17:40
hazmatConnSuite is basically empty with only suite tear/setup methods that do nothing17:41
rogpeppehazmat: oh, i thought you were skipping NewConnSuite (and the other suites)17:41
hazmatrogpeppe, i'm only running go test -v -gocheck.vv -gocheck.f 'ConnSuite|DeployLocalSuite'17:41
rogpeppehazmat: that will still run NewConnSuite17:41
rogpeppehazmat: could you comment out or delete or skip NewConnSuite?17:42
rogpeppehazmat: or just comment out line 4617:42
hazmatoh..17:42
hazmatrogpeppe, sorry for the confusion then.. okay back tracking17:42
rogpeppehazmat: np, it's so easy to do when trying to search for bugs blindly like this.17:43
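The confusion above stems from -gocheck.f treating its argument as an unanchored regular expression, so the pattern `ConnSuite|DeployLocalSuite` also selects NewConnSuite by substring match. A small stdlib sketch of the difference (the anchored pattern is the fix one would apply by hand; this is standard Go regexp semantics, not anything gocheck adds):

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether name matches the filter pattern pat, using
// the same unanchored semantics as gocheck's -gocheck.f flag.
func matches(pat, name string) bool {
	return regexp.MustCompile(pat).MatchString(name)
}

func main() {
	for _, name := range []string{"ConnSuite", "NewConnSuite", "DeployLocalSuite"} {
		fmt.Printf("%-17s unanchored=%-5v anchored=%v\n",
			name,
			matches(`ConnSuite|DeployLocalSuite`, name),          // substring match
			matches(`^(ConnSuite|DeployLocalSuite)$`, name))      // exact match only
	}
}
```

The unanchored pattern reports true for all three names, including NewConnSuite; anchoring with `^` and `$` restricts the run to exactly the two intended suites.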
hazmatrogpeppe, so correctly running just ConnTestSuite and DeployLocalSuite works17:46
rogpeppehazmat: ok, so... you know what to do :-)17:47
hazmatindeed17:47
rogpeppehazmat: thanks a lot for going at this BTW17:47
rogpeppehazmat: it's much appreciated17:47
hazmatrogpeppe, np.. it's annoying having intermittent test failures, esp with async ci merges17:48
rogpeppehazmat: absolutely17:48
mgznatefinch, fwereade: I have pushed juju tagged 1.16.2 plus the juju-update-bootstrap command to lp:~juju/juju-core/1.16.2+update17:51
fwereademgz, great, thanks -- I've got to be off, I'm afraid, would you please reply to the mail so ian knows where to go? and nate, please test when you get a mo17:53
fwereadenatefinch, I'll try to be back on to hand over to ian at least17:53
natefinchfwereade: no problem17:53
mgzfwereade: replying to your hotfix branch email now17:54
hazmatrogpeppe, so it's not one exact test failure, it's some subset of the newconnsuite tests.. still playing with it, but this is the current minimal set of tests that fails http://pastebin.ubuntu.com/6480107/...17:56
rogpeppehazmat: if you could get to a stage where you can't remove any more tests without it passing, that would be great17:57
rogpeppehazmat: actually, i have a glimmer of suspicion. each time you run the tests, could you pipe the output through timestamp (go get code.google.com/p/rog-go/cmd/timestamp). i'm wondering if there's something time related going on in the background.18:00
rogpeppehazmat: it's probably nothing though18:01
hazmatthere's a certain amount of randomness to it.. so it's quite possible18:02
hazmatrogpeppe, so i think i have some progress. i can get both suites running reliably minus one test..  TestConnStateSecretsSideEffect18:21
rogpeppehazmat: cool18:22
rogpeppehazmat: so if you skip that test and revert everything else, everything passes reliably for you?18:23
hazmatjust leaving that one test commented out the entire package test suite succeeds (running everything 5 times to account for intermittent)18:25
hazmatyeah.. reliably passes minus that test18:25
rogpeppehazmat: great18:25
* hazmat files a bug to capture18:26
rogpeppehazmat: out of interest, what happens if you comment out the SetAdminMongoPassword line?18:26
hazmatfwiw filed as https://bugs.launchpad.net/juju-core/+bug/125520718:27
_mup_Bug #1255207: intermittent test failures on package juju-core/juju <juju-core:New> <https://launchpad.net/bugs/1255207>18:27
hazmatrogpeppe, that seems to do the trick.. still verifying.. found a random panic.. on Panic: local error: bad record MAC (PC=0x414311) but unrelated i think18:29
rogpeppehazmat: i *think* that's unrelated, but i have also seen that.18:29
hazmatrogpeppe, yeah.. passed 20 runs with that one liner fix18:31
rogpeppehazmat: could you paste the output of go test -gocheck.vv with that fix please?18:31
rogpeppepwd18:31
hazmatalso verified i can still get the error with the line back in.. output coming up18:32
hazmatrogpeppe, http://paste.ubuntu.com/6480306/18:33
rogpeppehazmat: ok, line 667 is what i was expecting18:34
rogpeppehazmat: there's something odd going on with the mongo password logic18:35
rogpeppehazmat: what version of mongod are you using, BTW?18:35
hazmat2.4.618:35
rogpeppehazmat: ahhh, maybe that's the difference18:35
rogpeppehazmat: where did you get it from?18:36
rogpeppehazmat: i'm using 2.2.4 BTW18:36
hazmatrogpeppe, 2.4.6 is everywhere i think..18:36
hazmatrogpeppe, its the package in saucy and its in cloud-archive18:36
hazmattools pocket18:36
rogpeppehazmat: ah, i'm still on raring18:36
hazmatcloud-archive tools pocket means that's what we use in prod setups on precise..18:37
fwereaderogpeppe, driveby: it's what we install everywhere and should be using ourselves18:37
rogpeppefwereade: i know, but i had an awful time upgrading to raring (took me weeks to recover) and i've heard that saucy has terrible battery life probs18:38
rogpeppefwereade: and i really rely on my battery a lot18:38
hazmatnot really noticed anything bad18:38
hazmatthe btrfs improvements are very nice18:38
hazmatwith the new kernel18:38
hazmatbattery life impact seems pretty minimal but maybe a few percent18:39
hazmatrogpeppe, alternatively you can just install latest mongodb18:39
rogpeppehazmat: for the moment, i'd like to do that.18:40
rogpeppehazmat: i can't quite bring myself to jump off the high board into the usual world of partially completed and broken OS installs18:40
natefinchrogpeppe: for one data point - my battery life isn't terrible.... it's hard for me to judge on the new laptop, but it seems within range of what is expected.  perhaps slightly lower than what people were seeing on windows for my laptop, but not drastically so.18:42
rogpeppenatefinch: that's useful to know. i currently get about 10 hours, and a little more usage can end up as a drastic reduction in life18:42
rogpeppenatefinch: and certainly at one point in the past (in quantal, i think) i only got about 2 hours, and i really wouldn't like to go back there18:43
rogpeppenatefinch: still, my machine has been horribly flaky recently18:43
hazmatrogpeppe, understood, i used to feel that way.. atm. i tend to jump onto the new version during the beta cycle.. the qa process around the distro has gotten *much* better, things are generally pretty stable during the beta/rc cycles.... i don't generally tolerate losing work due to desktop flakiness.18:43
rogpeppenatefinch: perhaps saucy might improve that18:43
hazmatrogpeppe, what's your battery info like?18:43
rogpeppehazmat: battery info?18:43
hazmatrogpeppe, upower -d18:45
hazmatit will show design capacity vs current capacity on your battery if your battery reports it through acpi18:45
rogpeppehazmat: cool, didn't know about that18:46
rogpeppehazmat: http://paste.ubuntu.com/6480352/18:46
hazmatummm.. you should be getting way more than 2hrs18:46
rogpeppehazmat: i do, currently18:47
hazmatrogpeppe, i use powertop to get a gauge of where my battery usage is going18:47
rogpeppehazmat: but some time in the past i didn't18:47
rogpeppehazmat: currently i get about 10h18:47
hazmatand i have some script i use when i unplug to get extra battery life by shutting down extraneous things.18:47
rogpeppehazmat: which means i can hack across the atlantic, for example18:47
rogpeppehazmat: usually i shut down everything and dim the screen, which gets me a couple more hours18:48
hazmatyeah.. getting off topic.. but switching out to saucy really shouldn't do much harm to battery life, i havent really noticed anything significant (intel graphics / x220)18:50
rogpeppehazmat: do you use a second monitor?18:54
natefinchrogpeppe: multi monitor support is not ubuntu's strong suit.  I just had to put my laptop to sleep and then open it back up after unplugging two monitors, otherwise my laptop screen was blank :/18:56
natefinchrogpeppe: or at least, it's not a strong suit on the two recent laptops I've had18:56
rogpeppenatefinch: it works ok for me usually, except the graphics driver acceleration goes kaput about once a day18:57
natefinchrogpeppe: it's only really a problem for me when I add or remove monitors.  Steady state works fine for me.18:57
rogpeppenatefinch: adding and removing works ok usually. i was really interested to see if hazmat had the same issue as me, 'cos his hardware is pretty similar18:58
natefinchrogpeppe: ahh18:59
natefinchrogpeppe: what laptop do you have, anyway?  10 hours is impressive18:59
rogpeppenatefinch: lenovo x22018:59
natefinchvery nice.  I get about 4-5 hours on battery... I probably should have gone for the bigger battery in this thing that would have given me 6-8.19:01
rogpeppenatefinch: you've got a much bigger display, i think19:01
hazmatrogpeppe, i do use  a second monitor19:01
natefinchrogpeppe: yeah, mine's 15.6" and hi res19:02
hazmatrogpeppe, i typically only use one external screen and turn off internal.. i used to do two internal screens (with docking station)19:02
hazmater.. two external19:02
hazmatworks pretty well for me19:02
natefinchone screen, wow, I wouldn't be able to do it :)19:02
rogpeppehazmat: hmm, i think i'm the only person that ever sees the issue19:02
hazmatnatefinch, one .. 24 inch screen works well enough for me.19:03
rogpeppehazmat: i reported the bug ages ago,  but i probably reported it to the wrong place. never saw any feedback.19:03
hazmatnatefinch, i've had that issue, the screen is still there, though.. i just enter a password to get past the unrendered screen saver password, and i'm back to the desktop.. its basically a wake from monitor shutdown..19:04
hazmater.. monitor power saving mode19:04
natefinchhazmat: yeah, if I close the laptop lid and reopen it, it seems to sort itself out.  Just kind of annoying.19:04
hazmatnot very common anymore, but still annoying.. and led to me accidentally typing my password into the active window (irc) a few weeks ago.19:04
hazmatrogpeppe, the x220 tricks out quite nicely.. i added an msata card for lxc containers and 16gb of ram as upgrades this year.. also picked up the slice battery, but not clear that was as useful.. but with it roughly 16hrs of battery life (mine is a bit more degraded than yours on capacity)19:06
natefinchhazmat: haha, I did the same thing, into an IT-specific facebook group, no less19:06
hazmata bit annoyed they're moving to a max of 12gb of ram on the x240 and x44019:06
hazmatnatefinch, the m3800 / xps looks pretty nice, just not sure about that screen res issue on the os level. i assume you're just playing around with the scaling to make things usable?19:07
natefinchhazmat: yeah, I set the OS font 150%, set the cursor to be like triple normal size, and zoom in on web pages.... it's actually not terrible19:08
natefinchhazmat: and it is a really really sharp display19:09
natefinchhazmat: and the build quality overall is exceedingly nice. It feels really sturdy, but surprisingly thin and light for being a pretty beefy machine19:10
natefinchbtw, is there a way to get ubuntu to turn off the touchpad while I'm typing?  I palm-click constantly19:10
rogpeppenatefinch: msata is just a solid state drive, right?19:10
natefinchrogpeppe: msata is just the interface type and size, but yes, there's no spinning msatas that I know of.19:11
hazmatyeah.. too small for spinning rust19:11
natefinchrogpeppe: electrically, it's just a different shaped plug from regular sata.... exact same specs etc, you can mount an msata in a regular sata drive by just hooking up the wires correctly19:12
rogpeppehazmat: so there's room in an x220 for one of those in addition to the usual drive?19:12
hazmatrogpeppe, yes19:12
natefinchahh, cool, yeah, my xps15 has that too19:12
natefinchthough at the expense of the larger battery19:12
hazmatrogpeppe, i dropped a 128gb plextor m5 in.. needs a keyboard removal though, but its pretty straightforward, youtube videos cover it19:12
natefincher rather, the 2.5" drive is at the expense19:12
rogpeppehazmat: cool. i'm a little surprised there's space in there!19:13
hazmatrogpeppe, there's some additional battery draw, in terms of finding a perf compromise.. the msatas are super tiny19:13
hazmatrogpeppe, http://www.google.com/imgres?imgurl=http://www9.pcmag.com/media/images/357982-will-ngff-replace-msata.jpg%3Fthumb%3Dy&imgrefurl=http://www.pcmag.com/article2/0,2817,2409710,00.asp&h=275&w=275&sz=64&tbnid=D6nAHdfDO9YioM:&tbnh=127&tbnw=127&zoom=1&usg=__fRuk3l4RfCrNCEY6gQ32RZaHaA8=&docid=uliVfmMKZbEonM&sa=X&ei=3fKUUrXUDaiusASxiYCYDw&ved=0CDwQ9QEwAw19:13
hazmatugh.. google links19:13
natefinchheh19:14
sinzuiI reported Bug #1255242 about a CI failure that relates to an old revision. Upgrading juju on hp cloud consistently breaks mysql19:24
_mup_Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>19:24
natefinchdammit, my mouse cursor disappeared.19:43
jamsinzui: a comment posted to bug #125524219:47
_mup_Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>19:47
jamI need to go to bed now19:47
jamsinzui: I don't doubt we have a problem, but from all indications this isn't an *upgrade* bug, because Upgrade is never triggered in that log file19:48
sinzuijam, yes, the issue is confusing, which is why we spent so long looking into it ourselves19:49
jamLine 50 is: 50:juju-test-release-hp-machine-0:2013-11-26 15:06:39 DEBUG juju.state.apiserver apiserver.go:102 <- [1] machine-0 {"RequestId":6,"Type":"Upgrader","Request":"SetTools","Params":{"AgentTools":[{"Tag":"machine-0","Tools":{"Version":"1.17.0-precise-amd64"}}]}}19:50
jamwhich is machine-0 telling itself that its version is 1.17.019:50
jamsinzui: ERROR juju runner.go:220 worker: exited "environ-provisioner": no state server machines with addresses found19:51
jamis probably a red herring19:52
jamI think it is the environ-provisioner waking up before the addresser19:52
sinzuijam, thank you for the comment. I think I see a clue. The bucket has a date str in it and we increment it because I think it can contain cruft. That date is not  even close to now. So our HP tests might be dirty. It also relates to our concern that we want juju clients to bootstrap matching servers.19:52
jamso it tries to see what API servers to connect to, but the addresser hasn't set up the IP address yet19:52
* sinzui arranges for a test with a new bucket19:52
jamsinzui: 2013-10-10 does look a bit old19:53
* jam goes to bed19:53
jamsinzui: ok, I thought I was going.... I'm all for being able to specify what version you want to bootstrap "juju bootstrap --agent-version=1.16.3" or something like that. I don't think users benefit from it over getting the latest patch (1.16.4) when their client is out of date.19:55
sinzuijam, fab. I will arrange another play of the test with a clean bucket19:56
rogpeppewallyworld_, fwereade: i've sent an email containing a branch and some comment on my progress19:59
* rogpeppe is done for the day20:06
rogpeppeg'night all20:06
hazmatwoot just got 666666 otp 2fa20:36
rick_h_sinzui: abentley do you guys have a good jenkins backup/restore config setup in place?20:49
rick_h_hazmat: lol, now if only it was fri-13th20:49
abentleyrick_h_: No.20:50
rick_h_abentley: ok so much for cribbing :P20:50
abentleyjam: We never released 1.16.4 because it would have introduced an API incompatibility.  It's not safe to assume that agent 1.16.4 is compatible with client 1.16.3.  This is not a theoretical risk.  It very nearly happened.21:02
natefinchmgz: you around?21:48

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!