/srv/irclogs.ubuntu.com/2014/08/12/#juju-dev.txt

perrito666	sinzui: news?	00:09
perrito666	ok, something is very very broken with aws	01:21
=== Ursinha is now known as Ursinha-afk
waigani	axw: do you mean just setting state-server: true in tests where it is set to false?	01:45
axw	waigani: yes, the ones where it matters anyway	01:45
axw	leaking details of the dummy provider into the command is not great	01:45
axw	AFAIK, the state-server thing is just there to speed up the tests	01:46
axw	but if we need it, then the tests should change	01:46
waigani	axw: yeah for sure, it's ugly. I assumed the state-server was set to false in tests for a reason	01:46
axw	waigani: there's lots of wrong in the code, don't be afraid to question it ;)	01:47
waigani	axw: so you're saying the state-server setting should be deprecated?	01:47
axw	no, just change it to true for the tests where we run the bootstrap command	01:48
=== Ursinha-afk is now known as Ursinha
axw	alternatively, do the TODO now... not sure which one is less work really	01:48
waigani	sigh	01:49
waigani	kludges are less work ;)	01:49
waigani	I had to look that up	01:49
axw	yes, often less work in the short term	01:49
waigani	and found this image: http://en.wikipedia.org/wiki/Kludge#mediaviewer/File:Miles_Glacier_Bridge,_damage_and_kludge,_1984.jpg	01:49
axw	lol	01:50
waigani	sums it up pretty nicely haha	01:50
waigani	I should put that on my coding profile ;)	01:50
* wwitzel3 flips his desk		02:43
axw	wwitzel3: what'cha doing?	02:48
wwitzel3	trying to get rsyslog rotation for all-machines.log working	02:49
axw	ah, fun times	02:49
wwitzel3	since all day	02:49
wwitzel3	:/	02:49
wwitzel3	I was in bed attempting to go to sleep and I was having weird dreams about rsyslog in human for bullying and taunting me.	02:50
wwitzel3	s/for/form	02:50
wwitzel3	lol	02:50
axw	well, that's not cool... I don't recall having personified a software package before	02:51
wwitzel3	haha, happens to me all the time	02:51
axw	did it look like this guy? https://plus.google.com/+RainerGerhards	02:52
axw	:)	02:52
wwitzel3	though my favorite dream was one where I was traveling through code and I kept hitting this unhandled exception (which was the same one that was happening in production). And when I woke up I had an instant thought of how to fix it and got it resolved in 5 minutes.	02:53
menn0_	axw: bad news... the "State remove setmongopassword" change is what's causing this CI blocker: bug #1355320	02:53
axw	god damnit	02:53
menn0_	axw: I've been bisecting my way through recent changes and that's the one	02:53
axw	thanks menn0_	02:53
axw	I'll take it over if you like	02:54
menn0_	sure	02:54
menn0_	I'll update the ticket with the best repro details I have	02:54
menn0_	give me a few minutes to see if I can simplify them a bit further	02:54
axw	menn0_: thanks	02:55
axw	weird, I'm certain I tested this...	03:09
waigani	menn0_: how do you bisect your way through the changes?	03:10
menn0_	axw: after issuing ensure-availability did you wait for the new state servers to hit "started"?	03:11
axw	menn0_: pretty sure I did, but it was a little while ago and my memory isn't great	03:12
menn0_	I'm seeing the new machine agents getting stuck in pending because they can't connect to the API	03:12
axw	yeah, I see the same thing now	03:12
menn0_	axw: I've updated the ticket and assigned it to you. All yours!	03:15
axw	menn0_: thanks	03:16
menn0_	waigani: the process I used is:	03:16
menn0_	reproduce the problem with master	03:16
menn0_	use git log to see the changes between 1.20.1 and master	03:16
menn0_	git checkout <some rev in between>	03:17
menn0_	go install ./...	03:17
menn0_	try to reproduce the problem	03:17
menn0_	does it exist?	03:17
menn0_	no: the problem revision is after this one	03:17
menn0_	yes: the problem revision is before this one (or it IS this one)	03:17
menn0_	you can do a straight binary search	03:17
menn0_	but if you have some idea of where the problem lies, like in this case, you can be a bit smarter about picking revs to narrow it down more quickly.	03:18
menn0_	keeping good notes on what you've done helps a lot too	03:19
waigani	menn0_: ah right, thanks for the explanation	03:19
menn0_	if the reproduction steps were completely automated (which they can be but I didn't bother) then you could get "git bisect" to do all the work for you.	03:20
menn0_	what i've described is exactly what it does	03:21
menn0_	actually, looking at the docs "git bisect" can also be used to assist with manual searches. I should have used it.	03:23
menn0_	previously I've only ever used "git bisect run" which does automated searching	03:23
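
The manual search menn0_ describes above can be sketched as a plain binary search. This is only an illustration in Python, not what was actually run: `first_bad_revision` and `is_bad` are hypothetical names, and `is_bad` stands in for the "check out the rev, rebuild, try to reproduce" step. It assumes the revisions are ordered oldest to newest, the first is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad_revision(revs: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revs[0] is known good and revs[-1] is known bad.
    """
    lo, hi = 0, len(revs) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):
            hi = mid  # the problem revision is this one or an earlier one
        else:
            lo = mid  # the problem revision is after this one
    return revs[hi]


if __name__ == "__main__":
    # Hypothetical history in which the bug was introduced in "r5".
    history = [f"r{i}" for i in range(9)]
    tested = []

    def is_bad(rev: str) -> bool:
        tested.append(rev)  # keep notes on what was tried
        return int(rev[1:]) >= 5

    print(first_bad_revision(history, is_bad))
    print(tested)
```

With nine revisions only three need to be built and tested, which is why the approach stays manageable even when each reproduction involves a bootstrap. "git bisect start", "git bisect good" and "git bisect bad" do the same bookkeeping for a manual search.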
waigani	menn0_: yeah I was thinking that should be able to be automated	03:24
menn0_	waigani: the fiddly part in this case is writing some code to wait at the right times and parse the status output	03:26
menn0_	waigani: all doable though and I'm sure the helper functionality in juju-ci-tools would help too.	03:27
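
A rough sketch of the "wait at the right times" part, in Python. Everything here is an assumption for illustration: `wait_until` and `all_started` are hypothetical helpers (not taken from juju-ci-tools), and the status is modelled as a simple mapping from machine id to agent state rather than real "juju status" output.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def all_started(status):
    """status: hypothetical mapping of machine id -> agent state string."""
    return all(state == "started" for state in status.values())


if __name__ == "__main__":
    # Fake sequence of status snapshots; a real script would parse them
    # from the status command's output instead.
    snapshots = iter([
        {"0": "started", "1": "pending", "2": "pending"},
        {"0": "started", "1": "started", "2": "started"},
    ])
    ok = wait_until(lambda: all_started(next(snapshots)),
                    timeout=5, interval=0)
    print("all machines started" if ok else "timed out")
```

Returning False on timeout rather than raising lets a bisect driver treat "agents stuck in pending" as the bad outcome for that revision.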
waigani	menn0_: setting up a process that could automatically find the offending commit for a CI bug would save us a lot of time	03:28
waigani	potentially reverting it too	03:28
menn0_	I think a semi-automatic process is probably best	03:29
menn0_	the CI tests often fail for infrastructure related reasons	03:29
waigani	right, well it could add suspicious commits to the bug comments	03:30
menn0_	you wouldn't want these to trigger reverts or long running investigations	03:30
menn0_	yes	03:30
stokachu	are there any examples where juju actions are being used in practice?	03:36
menn0_	axw: you associated bug 1347715 with the "don't enter upgrade mode unnecessarily" PR	03:37
axw	menn0_: yep	03:37
menn0_	that PR wasn't intended to solve that bug :)	03:37
axw	menn0_: see the latest error message at the bottom of the bug	03:38
axw	that LP bug has morphed	03:38
axw	the latest one was due to "upgrade in progress"	03:38
menn0_	all that PR does is allow the machine agent workers to start up a little faster, and avoid some unnecessary log noise	03:38
menn0_	I can see how it might help with that bug	03:39
menn0_	but I wonder if that's the whole story	03:39
menn0_	if it is then, great	03:39
stokachu	there doesn't seem to be an action-set either; is this feature not complete yet?	03:39
menn0_	but I'm not sure	03:39
axw	stokachu: still a work in progress AFAIK	03:40
stokachu	axw, ok cool	03:40
menn0_	axw: is it worth reverting 3b6da1d429bff627a636ca4512c1ae8230f26539 or do you think you'll have this sorted quickly	03:42
* menn0_ would like to get CI unblocked		03:42
axw	menn0_: I guess it's no big deal to revert it in the mean time	03:44
axw	I'll do that...	03:44
jcw4	stokachu: yes, it's still in-progress	03:45
jcw4	stokachu: bodie_ is primarily working on action-set and action-get	03:45
=== menn0_ is now known as menn0
=== uru_ is now known as urulama
axw	menn0: reverted	04:15
menn0	axw: yep, saw that. cheers	04:15
menn0	axw: does the bug get marked as "Fix committed" now?	04:15
axw	menn0: yeah I've done that	04:16
axw	I'll keep looking into it tho	04:16
menn0	cool	04:16
menn0	of course	04:16
axw	this is one time where I really wish I hadn't rebased	04:16
axw	pretty sure I tested it and it all worked, then I rebased and it's broken since then	04:17
menn0	axw: so you think it's some interaction with this change and some other change that got rebased in?	04:19
axw	I think so	04:20
menn0	axw: in other news, that manual provider problem is still there despite the "don't enter upgrade mode unnecessarily" change	04:20
axw	sigh	04:20
axw	all good news today :)	04:20
* menn0 reopens the bug		04:21
axw	hold up	04:21
axw	it's still publishing?	04:21
axw	menn0: ^^	04:21
axw	or did you test it manually?	04:21
axw	eh, never mind	04:21
axw	hadn't refreshed	04:21
menn0	http://juju-ci.vapour.ws:8080/job/manual-deploy-trusty-ppc64/523/	04:22
menn0	axw: shall I take 1347715 for a bit?	04:26
axw	menn0: that'd be great, thank you	04:27
menn0	axw: cool	04:27
menn0	I need to go out and pick up some furniture but I'll keep going with it once I'm back	04:27
=== uru_ is now known as urulama
menn0	axw: I'm back but need to deal with the kids for a bit	05:26
axw	menn0: no worries	05:26
menn0	I'll continue with this manual provider issue later this evening	05:26
* axw hopes it isn't something else he's broken		05:27
axw	menn0: I'm getting "auth fails" even without my change	06:25
axw	:/	06:25
axw	hrm, 2nd shot worked	06:41
* axw scratches head		06:41
mattyw	morning all	08:16
TheMue	heya	08:26
dimitern	morning	08:26
voidspace	TheMue: morning	08:28
voidspace	dimitern: morning	08:28
menn0	axw: I'm back again (kids were a nightmare to get to bed)	08:55
axw	menn0: heya. fun :(	08:56
axw	menn0: so I've reworked my previous PR and got something working now	08:56
axw	ignore my comment from before, I think I hadn't uploaded the right tools	08:56
menn0	axw: ok that's good to hear	08:56
menn0	axw: let me know if you need a review	08:57
axw	the problem with the old one was that mongo.SetAdminMongoPassword was also doing Login straight after	08:57
dimitern	so is anyone working on the blocker bug https://bugs.launchpad.net/juju-core/+bug/1355324 ?	08:57
axw	menn0: it can wait, it's well after EOD there...	08:57
axw	not I	08:57
menn0	dimitern: no I don't think so	08:58
menn0	other blockers have been keeping us busy	08:58
dimitern	menn0, how about the other one assigned to you?	08:58
voidspace	launchpad won't even show me the bug	08:58
dimitern	voidspace, oh, which one?	08:59
voidspace	I don't think it's my internet being rubbish this time	08:59
voidspace	1355324	08:59
voidspace	page won't load	08:59
dimitern	voidspace, :( aw	08:59
voidspace	just launchpad being slow I think	08:59
menn0	voidspace: weird. it's working for me	08:59
voidspace	odd	08:59
menn0	I'm going to look at #1347715 for a bit longer	09:00
menn0	but someone else might have to take over depending how far I get with it	09:00
mattyw	voidspace, I'm having trouble with lp, canonical.com and ubuntu.com this morning	09:03
mattyw	voidspace, you on bt?	09:03
voidspace	mattyw: over here in Romania? I doubt it :-)	09:03
voidspace	mattyw: don't know who the isp is to be honest	09:03
voidspace	but today most of the internet works well, which is better than previous days, except for launchpad	09:04
mattyw	voidspace, cool - how long you there for?	09:04
voidspace	I'll try canonical.com and ubuntu.com as well	09:04
voidspace	mattyw: till the end of August	09:04
voidspace	mattyw: visiting wife's family	09:04
mattyw	voidspace, nice, hope you have a great time	09:04
voidspace	canonical.com loads fine	09:04
voidspace	mattyw: thanks	09:04
voidspace	mattyw: I'm mostly working	09:05
voidspace	mattyw: we're taking a week at the end of August to visit seaside and mountains though	09:05
voidspace	should be good	09:05
voidspace	yep, launchpad not loading for me at all at the moment	09:05
voidspace	tried two browsers	09:05
voidspace	I wonder if it's a dns issue, I can't ping it either	09:11
mattyw	voidspace, I was able to get to canonical.com by changing dns servers but everything else is down for me	09:13
menn0	axw: well this looks like a problem. using the manual provider:	09:43
menn0	2014-08-12 09:36:54 INFO juju.mongo open.go:104 dialled mongo successfully on address "127.0.0.1:37017"	09:43
menn0	2014-08-12 09:36:54 DEBUG juju.worker.logger logger.go:45 reconfiguring logging from "<root>=DEBUG" to "<root>=WARNING;unit=DEBUG"	09:43
menn0	2014-08-12 09:36:55 ERROR juju.worker runner.go:219 exited "machiner": machine-0 failed to set status started: cannot get machine 0: EOF	09:43
axw	menn0: yep, but what would be causing the EOF?	09:43
menn0	that's the last thing in the machine-0 log following bootstrap	09:44
axw	I saw it, but it's not a very enlightening error message ;)	09:44
menn0	I'm going to crank up the log level during bootstrap and see what I can see	09:44
axw	cool	09:44
menn0	the error happens after the root logger goes to WARNING	09:44
axw	ah right	09:44
axw	yeah, I just set my environments' logging-config="<root>=DEBUG"	09:45
menn0	and I might have to do the bisect thing again to see if I can track down the rev	09:45
menn0	this one has been broken for a while so there could be a lot of revs...	09:46
mattyw	folks - is someone able to help me out with a couple of things around the unit type?	10:00
mattyw	^^ it appears (at least in some tests) that the CharmURL in a unit doesn't have a value (== nil)	10:00
menn0	axw: alright	10:01
menn0	now I have a nice long panic	10:01
axw	menn0: cool	10:01
axw	I have to run	10:02
perrito666	morning	10:02
menn0	axw: no problems	10:02
menn0	I need to stop soon too	10:02
axw	menn0: if you get sick of looking at that, can you please attach the panic to the bug and I'll take a look tomorrow	10:02
menn0	axw: I'll try to get someone else to look in the mean time too	10:02
menn0	but I will add everything I have to the bug either way	10:03
axw	cool	10:03
TheMue	dimitern: a cable technician just came, could be that I’m a bit later at our hangout	10:38
dimitern	TheMue, ok, np	10:38
voidspace	dimitern: are we postponing?	10:50
dimitern	voidspace, let's wait for TheMue a few minutes	10:50
voidspace	dimitern: sure	10:50
TheMue	dimitern: I’m there, are you done?	10:56
voidspace	TheMue: we didn't start	10:57
menn0	right... I'm done	11:24
menn0	who wants to take over this CI blocker: 1347715	11:24
menn0	I've done a lot of the legwork. There's repro instructions and other findings attached to the ticket.	11:24
menn0	I'm pretty close but I haven't quite nailed the actual bug yet.	11:25
menn0	no takers regarding the above?	11:43
tasdomas	could somebody take a look at https://github.com/juju/juju/pull/490 ?	11:51
mgz	menn0: I read through the bug, did you narrow down when it was introduced, or not get that far yet?	11:52
menn0	mgz: well as per my last comment, it's definitely since 1.20.2	11:52
menn0	but I haven't narrowed it further than that	11:52
menn0	given that we have a fairly simple repro, git bisect could help?	11:53
mgz	yeah, just a bit slow, right, as it involves reprovisioning a machine?	11:54
mgz	or can you do it on a dirty one?	11:54
menn0	focussing on changes that involved the manual provider might also help narrow things down more quickly	11:54
menn0	no you don't need to reprovision	11:54
menn0	because it's the manual provider	11:54
menn0	just make sure you do a juju destroy-environment --force --yes after each run	11:54
menn0	that's pretty good at getting rid of most of the previous run	11:55
menn0	it's not exactly fast (the bootstrap still takes a while) but it's manageable	11:55
menn0	limiting the upload series to just trusty also helps a bit	11:56
menn0	it's midnight. I really need to go to bed	11:56
mgz	menn0: okay, good night	11:57
menn0	have a good one	11:57
fwereade	tasdomas, LGTM	11:59
tasdomas	fwereade, thanks	12:20
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs 1347715, 1355324
fwereade	natefinch, ping -- when you have a moment I'd like to chat about transactions	13:41
natefinch	fwereade: how about now? :)	13:41
fwereade	natefinch, perfect :)	13:42
fwereade	natefinch, in irc for a bit in case anyone else sees something interesting go past?	13:42
natefinch	fwereade: sure	13:42
fwereade	natefinch, basically I'm worried about interactions between the two systems as we switch	13:42
natefinch	fwereade: yep	13:43
fwereade	natefinch, (oh wait, it might actually not be a problem, let's keep going for a bit)	13:43
fwereade	natefinch, in particular, assertions	13:43
fwereade	natefinch, now I think that we will write to every affected document --including those we just assert on -- as part of a transaction	13:44
fwereade	natefinch, if that's the case I think we're good	13:44
fwereade	natefinch, the problem isn't at crossover time	13:44
fwereade	natefinch, it's with using toku on its own	13:44
fwereade	natefinch, in particular, assertions about particular document states	13:45
fwereade	natefinch, I think that within any transaction we need to write to any document on whose state we depend	13:45
natefinch	fwereade: I haven't actually done a lot of research into how toku transactions work.... do you not get a consistent view of the database once you enter a transaction? Or only the ones that you write to?	13:47
fwereade	natefinch, and that, best-case, this is going to lead to frequent document deadlocks	13:47
fwereade	natefinch, consistent state doesn't help though, I think	13:48
natefinch	ahh right, if someone else changes the doc out from under you	13:48
fwereade	natefinch, it's about needing all that state to still satisfy certain conditions when the txn lands	13:48
natefinch	right	13:48
fwereade	natefinch, writing to the document will help there, it pulls it into the transaction if you like	13:49
fwereade	natefinch, but I think it seriously increases the already-present risks of deadlock in the DB	13:49
fwereade	natefinch, I read docs X and Y, you read Y and X	13:50
natefinch	right	13:50
natefinch	classic deadlock	13:50
natefinch	fwereade: have you read up on their transactions? I don't feel qualified to talk about them without doing some reading. I'd just been doing some testing to make sure it was even possible to use their DB.... but hadn't actually researched the way their transactions worked. hazmat had done the work there, I believe.	13:51
fwereade	natefinch, I've read like 1 small white paper	13:52
* natefinch is reading the white paper now		13:52
fwereade	natefinch, so anyway -- I think it will work fine with mgo txns underneath, for a given value of fine, if not actually very performant	13:54
fwereade	natefinch, although	13:54
hazmat	fwereade, it's mvcc fwiw	13:55
hazmat	consistent reads, so a simple query satisfies mgo conditions	13:55
hazmat	fwereade, more of concern is the use of write hotspots in the current codebase	13:56
fwereade	hazmat, are you thinking more of "presence" or of things like service documents?	13:56
hazmat	fwereade, service docs	13:56
fwereade	hazmat, right, I think those things are a nexus of fuckery in either txn system	13:57
hazmat	fwereade, they exist due to async gc ? or..	13:57
hazmat	fwereade, latent gc can use simple queries afaics.. i tended to regard them as offshoots of the extant txn sys in conjunction w/ lifecycle management	13:58
fwereade	hazmat, well service documents get cleaned up with the last unit to be removed from a dying service	13:58
fwereade	hazmat, but that's managed by a refcount on the service document	13:58
hazmat	fwereade, why not just an async gc cleaner job in the state server?	13:59
Beret	wallyworld, any update on that flag to prevent a failed bootstrap from tearing itself down?	13:59
hazmat	fwereade, doesn't that mean machine loss of last unit leads to a stuck service otherwise?	14:00
hazmat	minus using destroy-machine --force to try and post clean up after the fact from the api server	14:01
fwereade	hazmat, no? when a unit gets removed either by the machine agent or by forced removal of the machine, the last one cleans up anything that depends on it	14:01
fwereade	hazmat, or indeed "yes" depending on perspective	14:01
natefinch	Beret: we've been fighting some critical bugs lately, and most of the team leads were on a sprint last week, so it hasn't gotten done yet, AFAIK. But I think it could be done this week if we have someone free to work on it.	14:02
Beret	natefinch, ok, thanks	14:03
natefinch	Beret: we'll try to get it in... it's something we want for ourselves, too.	14:03
Beret	natefinch, it's not more important than real bugs, I just wanted to make sure it hadn't gotten in and we just didn't know it	14:03
fwereade	hazmat, to step back a moment	14:03
* hazmat nods		14:03
hazmat	meeting.. bbiab	14:03
fwereade	hazmat, in either txn system, our performance is largely dependent on the areas of document space touched by a given txn -- thus designing db operations that minimise overlap is important either way	14:04
hazmat	fwereade, agreed	14:04
fwereade	hazmat, we have occasionally done a really poor job of designing these things in the past	14:05
fwereade	hazmat, but now we have schema changes we have a path to mitigate these, and we can expect to see benefits from doing this in either system	14:05
fwereade	hazmat, however, we won't be able to eliminate overlaps	14:06
fwereade	hazmat, our job is structuring things such that overlaps are (1) rare and (2) not too disruptive	14:07
fwereade	hazmat, and I am at the moment mainly thinking about the impact on how we actually have to write code	14:09
fwereade	hazmat, natefinch: so if we wrap a mgo txn in a toku txn...	14:10
fwereade	hazmat, natefinch: let's pretend for a moment we get all our info from a consistent snapshot, and craft a txn assuming the truth of that stuff, and then we execute that txn	14:12
fwereade	hazmat, natefinch: only when we execute that txn do we hit the docs and grab the locks, and if we deadlock at that point we may have trouble	14:13
fwereade	hazmat, natefinch: if the txn runner fails out for this reason, what exactly is the impact on the code trying to craft the txn and retry in the face of contention?	14:14
fwereade	hazmat, go back to the beginning of the whole set of operations and start crafting txns again from scratch?	14:16
fwereade	hazmat, in particular I fear that is going to be a major source of gray goo transactions that end up locking large chunks of the DB, deadlocking, failing out, and doing the same thing over and over again	14:17
fwereade	natefinch, sorry, I am also interested in your thoughts on the above	14:18
natefinch	hah ok	14:18
natefinch	I was going to give my thoughts anyway ;)	14:18
fwereade	natefinch, just realised I'd stopped badging you at some point	14:18
fwereade	(but that doesn't mean I'm going to stop badgering you, ho ho)	14:20
* fwereade looks shamefaced		14:20
=== jog_ is now known as jog
natefinch	haha	14:20
natefinch	fwereade: I gotta run in a couple minutes unfortunately..... but... do we need mgo transactions if we have toku transactions? Can we just stop using mgo transactions?	14:21
fwereade	natefinch, yes, but that's a lot of stuff to rewrite rather than wrap	14:22
natefinch	right	14:22
fwereade	natefinch, there will certainly be things that are easier to do in a pure-toku world	14:22
natefinch	yep	14:22
fwereade	natefinch, I'm just worried that mixing the two will actually have a pathologically awful impact	14:23
natefinch	I gotta run, sorry, I'll keep thinking and will read the channel history.. Back in an hour-ish	14:23
fwereade	natefinch, cheers, take care	14:23
hazmat	fwereade, per the acid doc, we could divert mgo transactions to toku	14:24
fwereade	hazmat, I can sort of see it but it's a bit fuzzy in my mind	14:27
fwereade	hazmat, we still have to do what I said, I think?	14:27
fwereade	hazmat, we can be a bit more proactive about writing to docs to take their locks a bit earlier than we could otherwise	14:27
fwereade	hazmat, and I daresay we can convert failure to do so into ErrRefresh, and make all the transactions handle it?	14:28
fwereade	hazmat, but wouldn't we have to actually start a fresh toku txn anyway?	14:29
fwereade	hazmat, so in a multi-step toku operation, if the Nth step fails we back everything out and start again?	14:30
fwereade	hazmat, heh, maybe even track and try to grab locks on everything we thought we needed last time through?	14:33
fwereade	hazmat, it all feels potentially really yucky	14:34
=== Ursinha is now known as Ursinha-afk
fwereade	natefinch, more blithering above ^^	14:40
hazmat	fwereade, sorry still in meeting, bbiab	14:44
fwereade	hazmat, np, I'm around for a bit I think	14:44
=== Ursinha-afk is now known as Ursinha
natefinch	fwereade: back	16:01
fwereade	natefinch, I need to be off soon, but read back and see if anything resonates or inspires a response	16:23
fwereade	natefinch, just type at me in here if I'm gone, I'll see it soon	16:24
hazmat	fwereade, sorry my meeting ran over time.. and into another meeting.. i'll capture notes for discussion tomorrow if you're not around in a bit	16:26
hazmat	fwereade, nutshell: locks are implicit with mods to a doc in a txn.. basically we just take an op runner and do it in a toku txn (begin, end txn) with catch on lock and retry behavior.	16:29
natefinch	fwereade: no problem. Trying to do a few things at once here. I don't know that grabbing locks early has any effect in toku... if two threads are both trying to grab locks early, it doesn't really buy us much. My preference would be to just make the code straightforward and do what it's supposed to do, and if something else gets into the DB first, the TX will have to retry.	16:30
hazmat	fwereade, re backout, it's implicit at that commit, failed lock is rollback, and error to app to handle, at which point we retry similar to runner loop now	16:30
hazmat	there is no grabbing a lock, it's implied by writing to a doc during a txn.	16:30
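
The "begin, run the ops, roll back and retry on a failed lock" loop hazmat outlines might look roughly like the Python sketch below. It is a toy model under stated assumptions, not the TokuMX or mgo/txn API: `LockConflict`, `FlakyStore` and `run_in_txn` are hypothetical names, and the store simply fails its first few writes to simulate lock contention.

```python
class LockConflict(Exception):
    """Raised by the toy store when a write cannot take its implicit lock."""


class FlakyStore:
    """Stand-in for a database: the first `conflicts` writes fail."""

    def __init__(self, conflicts):
        self.conflicts = conflicts
        self.committed = {}
        self.pending = None

    def begin(self):
        self.pending = {}

    def write(self, key, value):
        # Writing is what takes the lock; there is no separate lock call.
        if self.conflicts > 0:
            self.conflicts -= 1
            raise LockConflict(key)
        self.pending[key] = value

    def commit(self):
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def run_in_txn(store, apply_ops, max_attempts=5):
    """Run apply_ops(store) in a transaction, retrying on lock conflicts.

    Returns the number of attempts it took to commit.
    """
    for attempt in range(1, max_attempts + 1):
        store.begin()
        try:
            apply_ops(store)  # rebuilt from scratch on every attempt
            store.commit()
            return attempt
        except LockConflict:
            store.rollback()  # back everything out, then try again
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    store = FlakyStore(conflicts=2)
    attempts = run_in_txn(store, lambda s: s.write("unit/0", "dying"))
    print(attempts, store.committed)
```

The bounded attempt count is one guard against the endless lock, deadlock, fail, repeat cycle fwereade worried about earlier; a real runner would probably also want some backoff between attempts.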
fwereade	hazmat, natefinch: ok, this sounds broadly sane, I am worried that we have some funky layering issues to work around wrt actually managing rollbacks	16:31
natefinch	that's certainly possible	16:31
fwereade	hazmat, natefinch: well I am contending that we do need document locks for those docs that in mgo/txn we would assert on	16:31
fwereade	hazmat, natefinch: but that we will have to lock them ourselves by, as you say, writing to them in a transaction	16:31
hazmat	fwereade, we don't because we're operating in a mvcc world with implicit write level locks.	16:32
fwereade	hazmat, right	16:32
hazmat	ie read committed, or serialized if preferred (both options avail)	16:32
fwereade	hazmat, which is great for ensuring that conditions hold for the documents we're writing	16:32
hazmat	fwereade, right.. toku should be a nice runner for the extant mgo transactions since conditions are explicitly specified and re-runnable	16:32
natefinch	fwereade: right... my reading of MVCC is... you don't have to worry about basing your behavior on reads that get out of date, because if they get out of date, one of the two transactions aborts and retries	16:32
fwereade	hazmat, natefinch: so once you start a transaction, toku tracks everything you read? and aborts your txn if someone writes to one of those docs?	16:34
fwereade	hazmat, natefinch: that's the only way I could see it aborting a transaction in that situation	16:34
hazmat	fwereade, again depends on isolation level.. read serialized you'll read the copy that was current at the time of txn begin	16:35
hazmat	fwereade, read committed, means you read other docs current as of the time of read	16:35
hazmat	as opposed to our read dirty model now.. ie. read partially committed state	16:35
fwereade	hazmat, natefinch: I don't see how either read serialized or read committed actually helps us -- I'm not interested in either of those two states	16:36
hazmat	rephrasing .. current == latest committed revision of the doc	16:36
hazmat	fwereade, they're rather important .. but perhaps we should take a step back.. what's the concern?	16:37
fwereade	hazmat, if I have a txn that should only go ahead if a property continues to hold for some doc not in the txn	16:37
fwereade	hazmat, then it's philosophically impossible for me to read a document in either of those situations and be certain that it can't change under me	16:38
fwereade	hazmat, but if I write to that document I can	16:38
fwereade	hazmat, because I take a lock	16:38
fwereade	hazmat, and guarantee the failure of either myself or the other txn that wanted to write it	16:39
fwereade	hazmat, (sucks if that one just wanted to read it too, but anyway)	16:39
fwereade	hazmat, making sense?	16:39
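
fwereade's point (a plain read of a document gives no guarantee at commit time, while writing to it pulls it into the transaction) can be illustrated with a toy model. The Python sketch below uses per-document version checks at commit instead of real locks, which is a simplification chosen for brevity; `Store`, `Txn`, `touch` and `Conflict` are hypothetical names and none of this is TokuMX's actual behaviour or API.

```python
class Conflict(Exception):
    """Raised at commit when a document this txn depends on has changed."""


class Store:
    def __init__(self, docs):
        # doc id -> (version, value)
        self.docs = {key: (0, value) for key, value in docs.items()}

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.seen = {}    # doc id -> version this txn depends on
        self.writes = {}  # doc id -> new value

    def read(self, key):
        # Plain read: no dependency is recorded.
        return self.store.docs[key][1]

    def write(self, key, value):
        self.seen.setdefault(key, self.store.docs[key][0])
        self.writes[key] = value

    def touch(self, key):
        # Rewrite the doc unchanged so that it joins the transaction.
        value = self.read(key)
        self.write(key, value)
        return value

    def commit(self):
        for key, version in self.seen.items():
            if self.store.docs[key][0] != version:
                raise Conflict(key)
        for key, value in self.writes.items():
            self.store.docs[key] = (self.store.docs[key][0] + 1, value)


def start_unit(use_touch):
    """Start a unit only if its service is alive; return True if committed."""
    store = Store({"service": "alive", "unit": "pending"})
    mine = store.begin()
    state = mine.touch("service") if use_touch else mine.read("service")
    assert state == "alive"
    mine.write("unit", "started")

    other = store.begin()  # a concurrent txn kills the service first
    other.write("service", "dying")
    other.commit()

    try:
        mine.commit()
        return True
    except Conflict:
        return False


if __name__ == "__main__":
    print(start_unit(use_touch=False))  # commits despite the dying service
    print(start_unit(use_touch=True))   # conflict is detected
```

In this model the read-only variant commits even though the service changed underneath it, while the variant that writes to the service document fails and can be retried.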
hazmat	fwereade, serializable would give you that semantic	16:39
fwereade	hazmat, ok, so that does take locks on read?	16:40
fwereade	hazmat, (hopefully happy smart read locks not nasty actually-triggering-serialization write locks?)	16:40
hazmat	fwereade, it does.. but it may not cover every use case.. ie inserting new doc in collection.. since you're asserting a negative read	16:40
fwereade	hazmat, yeah, we depend on that a bit	16:41
* hazmat attention is gripped by tosca meeting		16:41
fwereade	hazmat, natefinch: ok, modulo d- asserts, we sound like we'll be fine with the serializable level	16:42
* fwereade needs to go out anyway		16:43
fwereade	thanks for clearing that up, I didn't get that impression from the descriptions	16:43
fwereade	I guess we need to be careful about what we do read when we're in a txn then..?	16:43
natefinch	fwereade: well, generally you don't just read stuff for no reason. You read stuff because the logic depends on the contents	16:44
fwereade	natefinch, ok, but sometimes you read an easy-to-express superset of what you need and extract in code	16:44
hazmat	fwereade, it may be we need to do separate collection level locks for insert	16:44
natefinch	it's funny, Toku has an office in Lexington.. I could like, drop down there and say hi	16:44
fwereade	natefinch, if that's going to quietly take a bunch of locks we should be aware of it	16:45
natefinch	fwereade: that's true	16:45
hazmat	fwereade, i'd like to define the common scenarios and patterns we have	16:45
fwereade	hazmat, I can live with that, I'm just fretting because I want to be sure we've got answers for all these things ;)	16:46
hazmat	and then verify solutions.. common idioms for them with tokumx	16:46
fwereade	hazmat, that would probably be the smart thing to do, indeed -- natefinch, are you ok to try to build those up?	16:46
fwereade	natefinch, would be happy to discuss in more detail	16:46
fwereade	natefinch, ...but not now	16:46
* fwereade disappears in a puff of smoke		16:47
hazmat	fwereade, natefinch should i setup a call for tomorrow?	16:47
natefinch	fwereade: see ya	16:47
natefinch	fwereade: yeah	16:47
hazmat	fwereade, cheers	16:47
hazmat	done	16:48
jcw4	mgz, cmars fix for lp-1355521 https://github.com/juju/juju/pull/500 , ptal	17:01
mgz	jcw4: you can jfdi that through if you want	17:03
ericsnow	natefinch: one-on-one?	17:03
jcw4	mgz: tx	17:03
jcw4	mgz tx again :)	17:06
mgz	:)	17:07
natefinch	ericsnow: sorry, coming	17:12
wwitzel3	woohoo! I'm down to permission issues	17:31
wwitzel3	victory is mine!	17:41
natefinch	wwitzel3: nice!	17:50
perrito666	sinzui: I would pretty much guess that the cause for restore to no longer finish is https://github.com/juju/juju/commit/55a9507924dea63658598361797ec864b9879e84#diff-d41d8cd98f00b204e9800998ecf8427e	17:53
wwitzel3	natefinch: yeah, the outchannel command cannot take arguments, so you have to do it as a script. also if outchannel encounters a non-zero exit code from the script, it assumes the channel is bad and stops sending to it. Until you restart/reload.	17:54
natefinch	huh ok, so your script wasn't returning 0 I guess?	17:54
wwitzel3	natefinch: and lastly, the logrotate conf itself, must have the right set of permissions.	17:54
wwitzel3	natefinch: right, it wasn't because of a permission issue, which was just being thrown away	17:55
natefinch	ahh	17:55
sinzui	perrito666, I just got permission to merge my log recovery changes. The next runs of the recovery tests will try to get logs from that machine that restore created	17:55
sinzui	perrito666, yes, the commit does look like it relates to the vague error in the jenkins console	17:56
wwitzel3	natefinch: so it had 3 issues, but I just had to discover them in the right order.	17:56
wwitzel3	natefinch: for example, at first I was calling logrotate as my command and passing it arguments ... logrotate was being called and returning 0 so logging continued, but the actual conf wasn't being passed.	17:57
perrito666	voidspace: care to shed some light on the commit?	17:58
perrito666	it has your name	17:58
wwitzel3	natefinch: then finally buried in the documentation for outchannel I found a small italic note about the fact that the outchannel command can't take any arguments.	17:58
wwitzel3	oh and the issue with logrotate's default state file path, that was easy though once I was actually able to start getting debug output from the rsyslog outchannel executing the command.	17:59
natefinch	yeesh	17:59
natefinch	I think you should write all that up and put it on juju-dev, so that someone else has a chance of understanding everything later.... and actually, a big comment in the code about it wouldn't hurt either.	18:00
wwitzel3	natefinch: so the rsyslog cert and key, live in the log folder even though they aren't logs.	18:01
natefinch	or in the docs somewhere... something	18:01
natefinch	wwitzel3: nice	18:01
wwitzel3	natefinch: so I assume it is ok for the logrotate.conf to live there too?	18:01
wwitzel3	natefinch: also the logrotate helper script that runs logrotate with the juju conf.	18:02
wwitzel3	natefinch: yeah I was planning to document the behavior of outchannel and ref the docs as well as the permission requirements for the logrotate.conf and state file.	18:03
wwitzel3	natefinch: since I also need to document the existence of all-machines.log.1 and the max size, etc..	18:04
natefinch	ahh yep, definitely	18:04
natefinch	and yeah, we can put more stuff in there if there's already non-logs in there	18:04
tych0	h	18:05
hazmat	wwitzel3, you get your rsyslog issues resolved?	18:23
wwitzel3	hazmat: yep, thanks :)	18:53
=== urulama is now known as urulama-afk
* sinzui kicks CI to test ha and restore now		19:13
natefinch	sinzui: I have a possible fix for the other bug too	19:15
natefinch	sinzui: https://github.com/juju/juju/pull/501	19:15
natefinch	perrito666, ericsnow, wwitzel3: one of you want to review? ^^ really simple change with a test and everything. I don't actually know why the code worked before and then suddenly stopped working, but at the very least this is a change that won't break anything and has a test to make sure it continues working.	19:18
natefinch	super simple change	19:19
ericsnow	natefinch: sure	19:19
natefinch	test verified to panic on the old code, just like in comment #18 here: https://bugs.launchpad.net/juju-core/+bug/1347715	19:20
natefinch	(and test verified not to panic with the new code)	19:20
natefinch	aside: I wish the comments on launchpad bugs were anchors, so I could do <bug-url>#comment-18 and have it jump directly to the comment. Instead the comment numbers just open up a separate window with no context which is completely useless.	19:21
=== urulama-afk is now known as urulama
=== urulama is now known as uru__
thumper	morning folks	20:48
thumper	sinzui: you around?	20:53
sinzui	I am	20:54
thumper	sinzui: what's the tl;dr on CI status?	20:56
sinzui	thedac, 2 blockers, natefinch's fix for 1 is queued to play in the next hour	20:57
natefinch	thumper: here's the fix... though I admittedly don't know why it worked before: https://github.com/juju/juju/pull/501	20:58
sinzui	thumper, my effort to get more logs from the failed restore tests also failed. no new data	20:58
* thumper takes a quick look		20:58
* sinzui wishes --debug didn't leak certs, keys, users, and passwords		20:58
natefinch	sinzui: me too	20:59
natefinch	I gotta run	20:59
natefinch	good luck everyone, and good night	20:59
thumper	sinzui: so the remaining CI failure is a restore one?	21:00
sinzui	thumper, yes https://bugs.launchpad.net/juju-core/+bug/1355324	21:01
sinzui	thumper, the test is playing right now. I hope to ssh in at the right moment and cat the logs	21:01
thumper	kk	21:01
waigani	thumper: welcome back!	21:13
menn0	thumper, waigani: good morning	21:13
thumper	o/	21:13
waigani	morning :)	21:13
* thumper headdesks		21:16
sinzui	perrito666, I added logs to bug 1355324. Tomorrow I am going to investigate redirecting all stderr to a private location on disk to try --debug	21:51
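[Editor's sketch] One way to approach the idea above of sending all stderr to a private location, so that --debug output containing certs and passwords never reaches the CI console. This is a minimal sketch under assumptions: the function name is hypothetical, and it says nothing about how the actual CI scripts are structured.

```python
import os
import subprocess


def run_with_private_stderr(argv, log_path):
    """Run argv, sending stderr to a file only the owner can read.

    Returns the command's exit code.
    """
    # O_EXCL refuses to reuse an existing file whose mode might be looser.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as log:
        return subprocess.call(argv, stderr=log)
```

Creating the file with mode 0600 before the command starts means there is no window in which the debug output is world-readable.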
perrito666	sinzui: the rev I pointed to is the culprit; it appears that we no longer have mongo listening on StatePort and therefore restore cannot connect to it. I need to get voidspace or fwereade to find out more about it	21:59
thumper	perrito666: why isn't mongo listening?	22:00
sinzui	perrito666, okay. that's right, I have too much going on to keep my work list clear	22:00
perrito666	thumper: I don't think it's not listening; we are no longer publicizing the port, I believe	22:00
perrito666	thumper: https://github.com/juju/juju/pull/449/files	22:01
sinzui	thumper, we agreed some months ago that nothing is allowed to talk to mongo directly...but restore does	22:01
ericsnow	perrito666: from the PR it sounds like it's not even listening on external ports	22:01
perrito666	ericsnow: I really need to dig more into this otherwise I am talking in educated guesses	22:02
thumper	ah...	22:03
thumper	didn't I see a change where mongo only listened internally?	22:03
ericsnow	perrito666: https://github.com/juju/juju/pull/449#issuecomment-50825367	22:03
thumper	like only on localhost?	22:03
ericsnow	perrito666: "I've also confirmed that with current master I can telnet to port 37017, and that with this branch I can't."	22:03
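[Editor's sketch] The telnet check quoted above can be scripted. A minimal sketch, assuming only that the state server's address is known; the function name is hypothetical, and 37017 is the port mentioned in the quoted PR comment.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Running `port_is_open(state_server_address, 37017)` from another machine against builds before and after the change would reproduce the comparison described in the PR comment.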
perrito666	not even a day running ubuntu on this machine and I already have unity behaving oddly.. that must be a record for a fresh install	22:03
thumper	ericsnow: that's the one	22:03
perrito666	thumper: sounds to me that not even, which puzzles me a bit unlessss	22:04
perrito666	api is using direct connection	22:04
perrito666	I dont know if direct is the right name	22:04
ericsnow	perrito666: yeah, that's what I find weird	22:04
ericsnow	thumper: yep, that's the PR for the changeset that broke restore	22:04
perrito666	thinkpads ability to swap fn/ctrl on the bios is marvelous	22:05
thumper	ericsnow, perrito666: simplest thing IMO is to revert PR 449	22:06
thumper	and take a fresh look later at closing the port	22:07
thumper	better than trying to work out now how to selectively open it	22:07
perrito666	I guess there is no other option, also by doing this I can get the original author to grep again for the use of StatePort :p	22:09
ericsnow	perrito666: yeah, tell him about that grep thing ;)	22:10
ericsnow	thumper, perrito666: +1 on reverting	22:12
ericsnow	(too bad another blocker will show up before I have a chance to merge anything tomorrow <wink>)	22:12
perrito666	ok, let me have a late evening snack and I'll propose the revert	22:15
perrito666	btw, did I mention I have the eating habits of a hobbit?	22:15
perrito666	so, to revert we do a reverse pr or use the "sorry I screwed up, please undo" feature from github	22:40
thumper	perrito666: um... there is a github feature?	22:44
thumper	perrito666: I'd have just done a reverse PR	22:44
* thumper goes to make a coffee		22:44
perrito666	thumper: I saw github offering me to undo with a button the other day	22:44
thumper	waigani: standup hangout time	23:02
perrito666	how do I reference a ticket/pr on a pr description	23:11
perrito666	?	23:11
perrito666	sinzui: or anyone	23:16
perrito666	the $$fixes tag is to be added into the $$merge or in the pr description?	23:16
sinzui	perrito666, fixes-nnnnn	23:17
sinzui	perrito666, in a comment by itself or with other text	23:17
perrito666	sinzui: yup, but, that is to be added into the pr body or into the merge comment?	23:17
perrito666	my question is, will that trigger the merge?	23:17
sinzui	merge comment perrito666	23:17
sinzui	perrito666, it doesn't trigger a merge.	23:18
sinzui	perrito666, $$merge$$ to trigger the merge and fixes-nnnnnn to explain why the merge is valid	23:18
perrito666	tx sinzui	23:19
perrito666	ok anyone ptal https://github.com/juju/juju/pull/503	23:19
ericsnow	I'm signing off, but if anyone is able, ptal @ https://github.com/juju/utils/pull/16 https://github.com/juju/utils/pull/19 https://github.com/juju/juju/pull/462 https://github.com/juju/juju/pull/453	23:29
sinzui	perrito666, I just discovered the first comment is not a comment according to github api. I can see ericsnow is the first comment, and my test comment is the second	23:29
perrito666	sinzui: that is why I did not add the $$ fixes part, I will add it in the merge comment	23:30
sinzui	perrito666, okay. I am a little disappointed because I think it is nice to state in the first comment why the branch should be merged :/	23:40
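[Editor's sketch] The convention discussed above ($$merge$$ triggers the merge, fixes-nnnnnn names the bug that justifies it, and the PR description does not count as a comment) could be checked along these lines. This is an illustration only, not the actual CI bot logic: the function name and the exact matching rules are assumptions, and the input is expected to be the comment bodies alone, without the PR description.

```python
import re

MERGE_TRIGGER = "$$merge$$"
FIXES_TAG = re.compile(r"\bfixes-(\d+)\b")


def merge_request(comments):
    """Inspect PR comments (not the PR body) for a merge trigger.

    Returns (should_merge, bug_ids).
    """
    should_merge = False
    bugs = []
    for text in comments:
        if MERGE_TRIGGER in text:
            should_merge = True
        # A fixes tag alone never triggers a merge; it only records the bug.
        bugs.extend(int(n) for n in FIXES_TAG.findall(text))
    return should_merge, bugs
```

Whether the real bot requires both markers in the same comment is not stated in the discussion, so this sketch accepts them in any comment.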
menn0	testing... please ignore: bug 1347715	23:52
mup	Bug #1347715: Manual provider does not respond after bootstrap <bootstrap> <ci> <regression> <juju-core:In Progress by natefinch> <https://launchpad.net/bugs/1347715>	23:52
menn0	\o/ (fixed!)	23:52
perrito666	thumper: that is not going to work	23:53
perrito666	you did not add the $$fixes-###$$	23:53
