/srv/irclogs.ubuntu.com/2014/08/12/#juju-dev.txt

perrito666sinzui: news?00:09
perrito666ok, something is very very broken with aws01:21
=== Ursinha is now known as Ursinha-afk
waiganiaxw: do you mean just setting state-server: true in tests where it is set to false?01:45
axwwaigani: yes, the ones where it matters anyway01:45
axwleaking details of the dummy provider into the command is not great01:45
axwAFAIK, the state-server thing is just there to speed up the tests01:46
axwbut if we need it, then the tests should change01:46
waiganiaxw: yeah for sure, it's ugly. I assumed the state-server was set to false in tests for a reason01:46
axwwaigani: there's lots of wrong in the code, don't be afraid to question it ;)01:47
waiganiaxw: so you're saying the state-server setting should be deprecated?01:47
axwno, just change it to true for the tests where we run the bootstrap command01:48
=== Ursinha-afk is now known as Ursinha
axwalternatively, do the TODO now... not sure which one is less work really01:48
waiganisigh01:49
waiganikludges are less work ;)01:49
waiganiI had to look that up01:49
axwyes, often less work in the short term01:49
waiganiand found this image: http://en.wikipedia.org/wiki/Kludge#mediaviewer/File:Miles_Glacier_Bridge,_damage_and_kludge,_1984.jpg01:49
axwlol01:50
waiganisums it up pretty nicely haha01:50
waiganiI should put that on my coding profile ;)01:50
* wwitzel3 flips his desk02:43
axwwwitzel3: what'cha doing?02:48
wwitzel3trying to get rsyslog rotation for all-machines.log working02:49
axwah, fun times02:49
wwitzel3since all day02:49
wwitzel3:/02:49
wwitzel3I was in bed attempting to go to sleep and I was having weird dreams about rsyslog in human for bullying and taunting me.02:50
wwitzel3s/for/form02:50
wwitzel3lol02:50
axwwell, that's not cool...  I don't recall having personified a software package before02:51
wwitzel3haha, happens to me all the time02:51
axwdid it look like this guy? https://plus.google.com/+RainerGerhards02:52
axw:)02:52
wwitzel3though my favorite dream was I was traveling through code and I kept hitting this unhandled exception (which was the same one that was happening in production). And when I woke up I had an instant thought of how to fix it and got it resolved in 5 minutes.02:53
menn0_axw: bad news... the "State remove setmongopassword" change is what's causing this CI blocker: bug #135532002:53
axwgod damnit02:53
menn0_axw: I've been bisecting my way through recent changes and that's the one02:53
axwthanks menn0_02:53
axwI'll take it over if you like02:54
menn0_sure02:54
menn0_I'll update the ticket with the best repro details I have02:54
menn0_give me a few minutes to see if I can simplify them a bit further02:54
axwmenn0_: thanks02:55
axwweird, I'm certain I tested this...03:09
waiganimenn0_: how do you bisect your way through the changes?03:10
menn0_axw: after issuing ensure-availability did you wait for the new state servers to hit "started"?03:11
axwmenn0_: pretty sure I did, but it was a little while ago and my memory isn't great03:12
menn0_I'm seeing the new machine agents getting stuck in pending because they can't connect to the API03:12
axwyeah, I see the same thing now03:12
menn0_axw: I've updated the ticket and assigned it to you. All yours!03:15
axwmenn0_: thanks03:16
menn0_waigani: the process I used is:03:16
menn0_reproduce the problem with master03:16
menn0_use git log to see the changes between 1.20.1 and master03:16
menn0_git checkout <some rev in between>03:17
menn0_go install ./...03:17
menn0_try to reproduce the problem03:17
menn0_does it exist?03:17
menn0_no: the problem revision is after this one03:17
menn0_yes: the problem revision is before this one (or it IS this one)03:17
menn0_you can do a straight binary search03:17
menn0_but if you have some idea of where the problem lies, like in this case, you can be a bit smarter about picking revs to narrow it down more quickly.03:18
menn0_keeping good notes on what you've done helps a lot too03:19
waiganimenn0_: ah right, thanks for the explanation03:19
menn0_if the reproduction steps were completely automated (which they can be but I didn't bother) then you could get "git bisect" to do all the work for you.03:20
menn0_what i've described is exactly what it does03:21
menn0_actually, looking at the docs "git bisect" can also be used to assist with manual searches. I should have used it.03:23
menn0_previously I've only ever used "git bisect run" which does automated searching03:23
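The manual process described above is a binary search over the revision history. A minimal sketch of that search follows, assuming the history is a simple ordered list, the first entry is known good, the last is known bad, and every revision after the culprit is also bad. The names `first_bad`, `is_bad` and the `r0`..`r9` revisions are hypothetical and only stand in for "check out, build, try to reproduce".

```python
def first_bad(revisions, is_bad):
    """Return the index of the first bad revision.

    Assumes revisions[0] is known good and revisions[-1] is known bad,
    and that once a revision is bad every later revision is bad too.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the problem revision is this one or an earlier one
        else:
            good = mid  # the problem revision is after this one
    return bad


# Hypothetical history in which the regression landed in "r5".
revs = ["r%d" % i for i in range(10)]
tested = []  # notes on which revisions were actually tried


def is_bad(rev):
    tested.append(rev)
    return int(rev[1:]) >= 5


culprit = revs[first_bad(revs, is_bad)]
print(culprit, tested)
```

Each test halves the remaining range, so ten revisions need only three or four reproduction attempts. Having some idea of where the problem lies amounts to choosing a better starting `good`/`bad` pair.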
waiganimenn0_: yeah I was thinking that should be able to be automated03:24
menn0_waigani: the fiddly part in this case is writing some code to wait at the right times and parse the status output03:26
menn0_waigani: all doable though and I'm sure the helper functionality in juju-ci-tools would help too.03:27
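The "wait at the right times" part could be sketched as a small polling helper. This is only an illustration: `fetch_status` is a hypothetical callable supplied by the caller, and the dictionary shape with `machines` and `agent-state` keys is an assumption made for the demo, not a statement about what juju or juju-ci-tools actually return.

```python
import time


def wait_for(fetch_status, is_ready, timeout=60.0, interval=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until is_ready(status) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if is_ready(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        sleep(interval)


# Demo with a fake status source (hypothetical data shape).
states = iter(["pending", "pending", "started"])


def fake_fetch():
    return {"machines": {"1": {"agent-state": next(states)}}}


def all_started(status):
    return all(m["agent-state"] == "started"
               for m in status["machines"].values())


final = wait_for(fake_fetch, all_started, timeout=5, interval=0,
                 sleep=lambda seconds: None)
print(final)
```

Injecting `sleep` and `clock` keeps the helper testable without real delays, which matters if it ends up inside an automated bisect script.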
waiganimenn0_: setting up a process that could automatically find the offending commit for a CI bug would save us a lot of time03:28
waiganipotentially reverting it too03:28
menn0_I think a semi-automatic process is probably best03:29
menn0_the CI tests often fail for infrastructure related reasons03:29
waiganiright, well it could add suspicious commits to the bug comments03:30
menn0_you wouldn't want these to trigger reverts or long running investigations03:30
menn0_yes03:30
stokachuare there any examples where juju actions are being used in practice?03:36
menn0_axw: you associated bug 1347715 with the "don't enter upgrade mode unnecessarily" PR03:37
axwmenn0_: yep03:37
menn0_that PR wasn't intended to solve that bug :)03:37
axwmenn0_: see the latest error message at the bottom of the bug03:38
axwthat LP bug has morphed03:38
axwthe latest one was due to "upgrade in progress"03:38
menn0_all that PR does is allow the machine agent workers to start up a little faster, and avoid some unnecessary log noise03:38
menn0_I can see how it might help with that bug03:39
menn0_but I wonder if that's the whole story03:39
menn0_if it is then, great03:39
stokachuthere doesn't seem to be an action-set either; is this feature not complete yet?03:39
menn0_but I'm not sure03:39
axwstokachu: still a work in progress AFAIK03:40
stokachuaxw, ok cool03:40
menn0_axw: is it worth reverting 3b6da1d429bff627a636ca4512c1ae8230f26539 or do you think you'll have this sorted quickly03:42
* menn0_ would like to get CI unblocked03:42
axwmenn0_: I guess it's no big deal to revert it in the mean time03:44
axwI'll do that...03:44
jcw4stokachu: yes, it's still in-progress03:45
jcw4stokachu: bodie_ is primarily working on action-set and action-get03:45
=== menn0_ is now known as menn0
=== uru_ is now known as urulama
axwmenn0: reverted04:15
menn0axw: yep, saw that. cheers04:15
menn0axw: does the bug get marked as "Fix committed" now?04:15
axwmenn0: yeah I've done that04:16
axwI'll keep looking into it tho04:16
menn0cool04:16
menn0of course04:16
axwthis is one time where I really wish I hadn't rebased04:16
axwpretty sure I tested it and it all worked, then I rebased and it's broken since then04:17
menn0axw: so you think it's some interaction with this change and some other change that got rebased in?04:19
axwI think so04:20
menn0axw: in other news, that manual provider problem is still there despite the "don't enter upgrade mode unnecessarily" change04:20
axwsigh04:20
axwall good news today :)04:20
* menn0 reopens the bug04:21
axwhold up04:21
axwit's still publishing?04:21
axwmenn0: ^^04:21
axwor did you test it manually?04:21
axweh, never mind04:21
axwhadn't refreshed04:21
menn0http://juju-ci.vapour.ws:8080/job/manual-deploy-trusty-ppc64/523/04:22
menn0axw: shall I take 1347715 for a bit?04:26
axwmenn0: that'd be great, thank you04:27
menn0axw: cool04:27
menn0I need to go out and pick up some furniture but I'll keep going with it once I'm back04:27
=== uru_ is now known as urulama
menn0axw: I'm back but need to deal with the kids for a bit05:26
axwmenn0: no worries05:26
menn0I'll continue with this manual provider issue later this evening05:26
* axw hopes it isn't something else he's broken05:27
axwmenn0: I'm getting "auth fails" even without my change06:25
axw:/06:25
axwhrm, 2nd shot worked06:41
* axw scratches head06:41
mattywmorning all08:16
TheMueheya08:26
dimiternmorning08:26
voidspaceTheMue: morning08:28
voidspacedimitern: morning08:28
menn0axw: I'm back again (kids were a nightmare to get to bed)08:55
axwmenn0: heya. fun :(08:56
axwmenn0: so I've reworked my previous PR and got something working now08:56
axwignore my comment from before, I think I hadn't uploaded the right tools08:56
menn0axw: ok that's good to hear08:56
menn0axw: let me know if you need a review08:57
axwthe problem with the old one was that mongo.SetAdminMongoPassword was also doing Login straight after08:57
dimiternso is anyone working on the blocker bug https://bugs.launchpad.net/juju-core/+bug/1355324 ?08:57
axwmenn0: it can wait, it's well after EOD there...08:57
axwnot I08:57
menn0dimitern: no I don't think so08:58
menn0other blockers have been keeping us busy08:58
dimiternmenn0, how about the other one assigned to you?08:58
voidspacelaunchpad won't even show me the bug08:58
dimiternvoidspace, oh, which one?08:59
voidspaceI don't think it's my internet  being rubbish this time08:59
voidspace135532408:59
voidspacepage won't load08:59
dimiternvoidspace, :( aw08:59
voidspacejust launchpad being slow I think08:59
menn0voidspace: weird. it's working for me08:59
voidspaceodd08:59
menn0I'm going to look at #1347715 for a bit longer09:00
menn0but someone else might have to take over depending how far I get with it09:00
mattywvoidspace, I'm having trouble with lp, canonical.com and ubuntu.com this morning09:03
mattywvoidspace, you on bt?09:03
voidspacemattyw: over here in Romania? I doubt it :-)09:03
voidspacemattyw: don't know who the isp is to be honest09:03
voidspacebut today most of the internet works well, which is better than previous days, *except* for launchpad09:04
mattywvoidspace, cool - how long you there for?09:04
voidspaceI'll try canonical.com and ubuntu.com as well09:04
voidspacemattyw: till the end of August09:04
voidspacemattyw: visiting wife's family09:04
mattywvoidspace, nice, hope you have a great time09:04
voidspacecanonical.com loads fine09:04
voidspacemattyw: thanks09:04
voidspacemattyw: I'm mostly working09:05
voidspacemattyw: we're taking a week at the end of August to visit seaside and mountains though09:05
voidspaceshould be good09:05
voidspaceyep, launchpad not loading for me at all at the moment09:05
voidspacetried two browsers09:05
voidspaceI wonder if it's a dns issue, I can't ping it either09:11
mattywvoidspace, I was able to get to canonical.com by changing dns servers but everything else is down for me09:13
menn0axw: well this looks like a problem. using the manual provider:09:43
menn02014-08-12 09:36:54 INFO juju.mongo open.go:104 dialled mongo successfully on address "127.0.0.1:37017"09:43
menn02014-08-12 09:36:54 DEBUG juju.worker.logger logger.go:45 reconfiguring logging from "<root>=DEBUG" to "<root>=WARNING;unit=DEBUG"09:43
menn02014-08-12 09:36:55 ERROR juju.worker runner.go:219 exited "machiner": machine-0 failed to set status started: cannot get machine 0: EOF09:43
axwmenn0: yep, but what would be causing the EOF?09:43
menn0that's the last thing in the machine-0 log following bootstrap09:44
axwI saw it, but it's not a very enlightening error message ;)09:44
menn0I'm going to crank up the log level during bootstrap and see what I can see09:44
axwcool09:44
menn0the error happens after the root logger goes to WARNING09:44
axwah right09:44
axwyeah, I just set my environments' logging-config="<root>=DEBUG"09:45
menn0and I might have to do the bisect thing again to see if I can track down the rev09:45
menn0this one has been broken for a while so there could be a lot of revs...09:46
mattywfolks - is someone able to help me out with a couple of things around the unit type?10:00
mattyw^^ it appears (at least in some tests) that the CharmURL in a unit doesn't have a value (== nil)10:00
menn0axw: alright10:01
menn0now I have a nice long panic10:01
axwmenn0: cool10:01
axwI have to run10:02
perrito666morning10:02
menn0axw: no problems10:02
menn0I need to stop soon too10:02
axwmenn0: if you get sick of looking at that, can you please attach the panic to the bug and I'll take a look tomorrow10:02
menn0axw: I'll try to get someone else to look in the mean time too10:02
menn0but I will add everything I have to the bug either way10:03
axwcool10:03
TheMuedimitern: a cable technician just came, could be that I’m a bit later at our hangout10:38
dimiternTheMue, ok, np10:38
voidspacedimitern: are we postponing?10:50
dimiternvoidspace, let's wait for TheMue a few minutes10:50
voidspacedimitern: sure10:50
TheMuedimitern: I’m there, are you done?10:56
voidspaceTheMue: we didn't start10:57
menn0right... I'm done11:24
menn0who wants to take over this CI blocker: 134771511:24
menn0I've done a lot of the legwork. There's repro instructions and other findings attached to the ticket.11:24
menn0I'm pretty close but I haven't quite nailed the actual bug yet.11:25
menn0no takers regarding the above?11:43
tasdomascould somebody take a look at https://github.com/juju/juju/pull/490 ?11:51
mgzmenn0: I read through the bug, did you narrow down when it was introduced, or not get that far yet?11:52
menn0mgz: well as per my last comment, it's definitely since 1.20.211:52
menn0but I haven't narrowed it further than that11:52
menn0given that we have a fairly simple repro, git bisect could help?11:53
mgzyeah, just a bit slow, right, as it involves reprovisioning a machine?11:54
mgzor can you do it on a dirty one?11:54
menn0focussing on changes that involved the manual provider might also help narrow things down more quickly11:54
menn0no you don't need to reprovision11:54
menn0because it's the manual provider11:54
menn0just make sure you do a juju destroy-environment --force --yes after each run11:54
menn0that's pretty good at getting rid of most of the previous run11:55
menn0it's not exactly fast (the bootstrap still takes a while) but it's manageable11:55
menn0limiting the upload series to just trusty also helps a bit11:56
menn0it's midnight. I really need to go to bed11:56
mgzmenn0: okay, good night11:57
menn0have a good one11:57
fwereadetasdomas, LGTM11:59
tasdomasfwereade, thanks12:20
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs 1347715, 1355324
fwereadenatefinch, ping -- when you have a moment I'd like to chat about transactions13:41
natefinchfwereade: how about now? :)13:41
fwereadenatefinch, perfect :)13:42
fwereadenatefinch, in irc for a bit in case anyone else sees something interesting go past?13:42
natefinchfwereade:  sure13:42
fwereadenatefinch, basically I'm worried about interactions between the two systems as we switch13:42
natefinchfwereade: yep13:43
fwereadenatefinch, (oh wait, it might actually not be a problem, let's keep going for a bit)13:43
fwereadenatefinch, in particular, assertions13:43
fwereadenatefinch, now I think that we will write to every affected document --including those we just assert on -- as part of a transaction13:44
fwereadenatefinch, if that's the case I think we're good13:44
fwereadenatefinch, the problem isn't at crossover time13:44
fwereadenatefinch, it's with using toku on its own13:44
fwereadenatefinch, in particular, assertions about particular document states13:45
fwereadenatefinch, I think that within any transaction we need to write to any document on whose state we depend13:45
natefinchfwereade: I haven't actually done a lot of research into how toku transactions work.... do you not get a consistent view of the database once you enter a transaction?  Or only the ones that you write to?13:47
fwereadenatefinch, and that, best-case, this is going to lead to frequent document deadlocks13:47
fwereadenatefinch, consistent state doesn't help though, I think13:48
natefinchahh right, if someone else changes the doc out from under you13:48
fwereadenatefinch, it's about needing all that state to still satisfy certain conditions when the txn lands13:48
natefinchright13:48
fwereadenatefinch, writing to the document will help there, it pulls it into the transaction if you like13:49
fwereadenatefinch, but I think it *seriously* increases the already-present risks of deadlock in the DB13:49
fwereadenatefinch, I read docs X and Y, you read Y and X13:50
natefinchright13:50
natefinchclassic deadlock13:50
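The "X and Y versus Y and X" pattern can be illustrated with ordinary in-process locks. The sketch below is not TokuMX or mgo/txn behaviour. It is a generic example with hypothetical names that shows one standard mitigation, acquiring locks in a single agreed order, so that two workers asking for the same documents in opposite orders cannot block each other forever.

```python
import threading

# One lock per "document" (hypothetical stand-ins for docs X and Y).
locks = {"X": threading.Lock(), "Y": threading.Lock()}
log = []


def update(worker, doc_names):
    # Sort the names so every worker takes the locks in the same order,
    # regardless of the order in which it asked for the documents.
    ordered = sorted(doc_names)
    for name in ordered:
        locks[name].acquire()
    try:
        log.append((worker, tuple(ordered)))
    finally:
        for name in reversed(ordered):
            locks[name].release()


t1 = threading.Thread(target=update, args=("me", ["X", "Y"]))
t2 = threading.Thread(target=update, args=("you", ["Y", "X"]))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
print(log)
```

Without the `sorted` call, one worker could hold X while waiting for Y and the other hold Y while waiting for X, which is the deadlock being discussed.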
natefinchfwereade: have you read up on their transactions?  I don't feel qualified to talk about them without doing some reading.  I'd just been doing some testing to make sure it was even possible to use their DB.... but hadn't actually researched the way their transactions worked.  hazmat had done the work there, I believe.13:51
fwereadenatefinch, I've read like 1 small white paper13:52
* natefinch is reading the white paper now13:52
fwereadenatefinch, so anyway -- I think it will work fine with mgo txns underneath, for a given value of fine, if not actually very performant13:54
fwereadenatefinch, although13:54
hazmatfwereade, its mvcc fwiw13:55
hazmatconsistent reads, so a simple query satisfies mgo conditions13:55
hazmatfwereade, more of concern is the use of write hotspots in the current codebase13:56
fwereadehazmat, are you thinking more of "presence" or of things like service documents?13:56
hazmatfwereade, service docs13:56
fwereadehazmat, right, I think those things are a nexus of fuckery in either txn system13:57
hazmatfwereade, they exist due to async gc ? or..13:57
hazmatfwereade, latent gc can use simple queries afaics.. i tended to regard them as offshoots of the extant txn sys in conjunction w/  lifecycle management13:58
fwereadehazmat, well service documents get cleaned up with the last unit to be removed from a dying service13:58
fwereadehazmat, but that's managed by a refcount on the service document13:58
hazmatfwereade, why not just a async gc cleaner job in the state server?13:59
Beretwallyworld, any update on that flag to prevent a failed bootstrap from tearing itself down?13:59
hazmatfwereade, doesn't that mean machine loss of last unit leads to a stuck service otherwise?14:00
hazmatminus using destroy-machine --force to try and post clean up after the fact from the api server14:01
fwereadehazmat, no? when a unit gets removed either by the machine agent or by forced removal of the machine, the last one cleans up anything that depends on it14:01
fwereadehazmat, or indeed "yes" depending on perspective14:01
natefinchBeret: we've been fighting some critical bugs lately, and most of the team leads were on a sprint last week, so it hasn't gotten done yet, AFAIK.  But I think it could be done this week if we have someone free to work on it.14:02
Beretnatefinch, ok, thanks14:03
natefinchBeret: we'll try to get it in... it's something we want for ourselves, too.14:03
Beretnatefinch, it's not more important than real bugs, I just wanted to make sure it hadn't gotten in and we just didn't know it14:03
fwereadehazmat, to step back a moment14:03
* hazmat nods14:03
hazmatmeeting.. bbiab14:03
fwereadehazmat, in either txn system, our performance is largely dependent on the areas of document space touched by a given txn -- thus designing db operations that minimise overlap is important either way14:04
hazmatfwereade, agreed14:04
fwereadehazmat, we have occasionally done a really poor job of designing these things in the past14:05
fwereadehazmat, but now we have schema changes we have a path to mitigate these, and we can expect to see benefits from doing this in *either* system14:05
fwereadehazmat, however, we won't be able to eliminate overlaps14:06
fwereadehazmat, our job is structuring things such that overlaps are (1) rare and (2) not too disruptive14:07
fwereadehazmat, and I am at the moment mainly thinking about the impact on how we actually have to write code14:09
fwereadehazmat, natefinch: so if we wrap a mgo txn in a toku txn...14:10
fwereadehazmat, natefinch: let's pretend for a moment we get all our info from a consistent snapshot, and craft a txn assuming the truth of that stuff, and then we execute that txn14:12
fwereadehazmat, natefinch: only when we execute that txn do we hit the docs and grab the locks, and if we deadlock at that point we may have trouble14:13
fwereadehazmat, natefinch: if the txn runner fails out for this reason, what exactly is the impact on the code trying to craft the txn and retry in the face of contention?14:14
fwereadehazmat, go back to the beginning of the whole set of operations and start crafting txns again from scratch?14:16
fwereadehazmat, in particular I fear *that* is going to be a major source of gray goo transactions that end up locking large chunks of the DB, deadlocking, failing out, and doing the same thing over and over again14:17
fwereadenatefinch, sorry, I am also interested in your thoughts on the above14:18
natefinchhah ok14:18
natefinchI was going to give my thoughts anyway ;)14:18
fwereadenatefinch, just realised I'd stopped badging you at some point14:18
fwereade(but that doesn't mean I'm going to stop badgering you, ho ho)14:20
* fwereade looks shamefaced14:20
=== jog_ is now known as jog
natefinchhaha14:20
natefinchfwereade: I gotta run in a couple minutes unfortunately..... but... do we need mgo transactions if we have toku transactions?  Can we just stop using mgo transactions?14:21
fwereadenatefinch, yes, *but* that's a lot of stuff to rewrite rather than wrap14:22
natefinchright14:22
fwereadenatefinch, there will certainly be things that are easier to do in a pure-toku world14:22
natefinchyep14:22
fwereadenatefinch, I'm just worried that mixing the two will actually have a pathologically awful impact14:23
natefinchI gotta run, sorry, I'll keep thinking and will read the channel history.. Back in an hour-ish14:23
fwereadenatefinch, cheers, take care14:23
hazmatfwereade, per the acid doc, we could divert mgo transactions to toku14:24
fwereadehazmat, I can sort of see it but it's a bit fuzzy in my mind14:27
fwereadehazmat, we still have to do what I said, I think?14:27
fwereadehazmat, we can be a bit more proactive about writing to docs to take their locks a bit earlier than we could otherwise14:27
fwereadehazmat, and I daresay we can convert failure to do so into ErrRefresh, and make all the transactions handle it?14:28
fwereadehazmat, but wouldn't we have to actually start a fresh toku txn *anyway*?14:29
fwereadehazmat, so in a multi-step toku operation, if the Nth step fails we back *everything* out and start again?14:30
fwereadehazmat, heh, maybe even track and try to grab locks on everything we thought we needed last time through?14:33
fwereadehazmat, it all feels potentially *really* yucky14:34
=== Ursinha is now known as Ursinha-afk
fwereadenatefinch, more blithering above ^^14:40
hazmatfwereade, sorry still in meeting, bbiab14:44
fwereadehazmat, np, I'm around for a bit I think14:44
=== Ursinha-afk is now known as Ursinha
natefinchfwereade: back16:01
fwereadenatefinch, I need to be off soon, but read back and see if anything resonates or inspires a response16:23
fwereadenatefinch, just type at me in here if I'm gone, I'll see it soon16:24
hazmatfwereade, sorry my meeting ran over time.. and into another meeting.. i'll capture notes for discussion tomorrow if you're not around in a bit16:26
hazmatfwereade, nutshell locks are implicit with mods to a doc in a txn.. basically we just take an op runner and do it in toku txn (begin, end txn) with catch on lock and retry behavior.16:29
natefinchfwereade: no problem.  Trying to do a few things at once here.   I don't know that grabbing locks early has any effect in toku... if two threads are both trying to grab locks early, it doesn't really buy us much.  My preference would be to just make the code straightforward and do what it's supposed to do, and if something else gets into the DB first, the TX will have to retry.16:30
hazmatfwereade, re backout, it's implicit at that commit, failed lock is rollback, and error to app to handle, at which point we retry similar to runner loop now16:30
hazmatthere is no grabbing a lock, its implied by writing to a doc during a txn.16:30
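The "catch on lock and retry" shape might look roughly like the sketch below. It assumes a failed transaction is rolled back as a whole and surfaces as an error, and that the caller then rebuilds its operations from fresh reads. `LockConflict`, `run_with_retry` and `build_and_apply` are hypothetical names for illustration and are not part of mgo, mgo/txn or TokuMX.

```python
class LockConflict(Exception):
    """Hypothetical error: a write hit a document locked by another transaction."""


def run_with_retry(build_and_apply, attempts=3):
    """Call build_and_apply(attempt) until it commits or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            # The callback re-reads state and rebuilds its operations each time.
            return build_and_apply(attempt)
        except LockConflict:
            # The failed transaction is assumed to be rolled back already,
            # so the only thing left to do is start again from scratch.
            continue
    raise RuntimeError("transaction still contended after %d attempts" % attempts)


# Demo: the first two attempts lose the race, the third commits.
calls = []


def flaky(attempt):
    calls.append(attempt)
    if attempt < 3:
        raise LockConflict()
    return "committed"


result = run_with_retry(flaky)
print(result, calls)
```

The bounded attempt count is a design choice in this sketch: it turns the repeated lock, deadlock and fail cycle fwereade worries about into a visible error instead of an endless loop.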
fwereadehazmat, natefinch: ok, this sounds broadly sane, I am worried that we have some funky layering issues to work around wrt actually managing rollbacks16:31
natefinchthat's certainly possible16:31
fwereadehazmat, natefinch: well I am contending that we do need document locks for those docs that in mgo/txn we would assert on16:31
fwereadehazmat, natefinch: but that we will have to lock them ourselves by, as you say, writing to them in a transaction16:31
hazmatfwereade, we don't because we're operating in a mvcc world with implicit write level locks.16:32
fwereadehazmat, right16:32
hazmatie read committed, or serialized if preferred (both options avail)16:32
fwereadehazmat, which is great for ensuring that conditions hold for the documents we're writing16:32
hazmatfwereade, right.. toku should be a nice runner for the extant mgo transactions since conditions are explicitly specified and re-runnable16:32
natefinchfwereade: right... my reading of MVCC is... you don't have to worry about basing your behavior on reads that get out of date, because if they get out of date, one of the two transactions aborts and retries16:32
fwereadehazmat, natefinch: so once you start a transaction, toku tracks everything you read? and aborts your txn if someone writes to one of those docs?16:34
fwereadehazmat, natefinch: that's the only way I could see it aborting a transaction in that situation16:34
hazmatfwereade, again depends on isolation level.. read serialized you'll read the copy that was current at the time of txn begin16:35
hazmatfwereade, read committed, means you read other docs current as of the time of read16:35
hazmatas opposed to our read dirty model now.. ie. read partially committed state16:35
fwereadehazmat, natefinch: I don't see how either read serialized *or* read committed actually helps us -- I'm not interested in either of those two states16:36
hazmatrephrasing .. current == latest committed revision of the doc16:36
hazmatfwereade, they're rather important .. but perhaps we should take a step back.. what's the concern?16:37
fwereadehazmat, if I have a txn that should only go ahead if a property continues to hold for some doc not in the txn16:37
fwereadehazmat, then it's philosophically impossible for me to read a document in either of those situations and be certain that it can't change under me16:38
fwereadehazmat, but if I *write* to that document I can16:38
fwereadehazmat, because I take a lock16:38
fwereadehazmat, and guarantee the failure of either myself or the other txn that wanted to write it16:39
fwereadehazmat, (sucks if that one just wanted to read it too, but anyway)16:39
fwereadehazmat, making sense?16:39
hazmatfwereade, serializable would give you that semantic16:39
fwereadehazmat, ok, so that does take locks on read?16:40
fwereadehazmat, (hopefully happy smart read locks not nasty actually-triggering-serialization write locks?)16:40
hazmatfwereade, it does.. but it may not cover every use case.. ie inserting new doc in collection.. since you're asserting a negative read16:40
fwereadehazmat, yeah, we depend on that a bit16:41
* hazmat attention is gripped by tosca meeting16:41
fwereadehazmat, natefinch: ok, modulo d- asserts, we sound like we'll be fine with the serializable level16:42
* fwereade needs to go out anyway16:43
fwereadethanks for clearing that up, I didn't get that impression from the descriptions16:43
fwereadeI guess we need to be careful about what we *do* read when we're in a txn then..?16:43
natefinchfwereade: well, generally you don't just read stuff for no reason.  You read stuff because the logic depends on the contents16:44
fwereadenatefinch, ok, but sometimes you read an easy-to-express superset of what you need and extract in code16:44
hazmatfwereade, it may be we need to do separate collection level locks for insert16:44
natefinchit's funny, Toku has an office in Lexington.. I could like, drop down there and say hi16:44
fwereadenatefinch, if that's going to quietly take a bunch of locks we should be aware of it16:45
natefinchfwereade: that's true16:45
hazmatfwereade, i'd like to define the common scenarios and patterns we have16:45
fwereadehazmat, I can live with that, I'm just fretting because I want to be sure we've got answers for all these things ;)16:46
hazmatand then verify solutions.. common idioms for them with tokumx16:46
fwereadehazmat, that would probably be the smart thing to do, indeed -- natefinch, are you ok to try to build those up?16:46
fwereadenatefinch, would be happy to discuss in more detail16:46
fwereadenatefinch, ...but not now16:46
* fwereade disappears in a puff of smoke16:47
hazmatfwereade, natefinch should i setup a call for tomorrow?16:47
natefinchfwereade: see ya16:47
natefinchfwereade: yeah16:47
hazmatfwereade, cheers16:47
hazmatdone16:48
jcw4mgz, cmars fix for lp-1355521 https://github.com/juju/juju/pull/500 , ptal17:01
mgzjcw4: you can jfdi that through if you want17:03
ericsnownatefinch: one-on-one?17:03
jcw4mgz: tx17:03
jcw4mgz tx again :)17:06
mgz:)17:07
natefinchericsnow: sorry, coming17:12
wwitzel3woohoo! I'm down to permission issues17:31
wwitzel3victory is mine!17:41
natefinchwwitzel3: nice!17:50
perrito666sinzui: I would pretty much guess that the cause for restore to no longer finish is https://github.com/juju/juju/commit/55a9507924dea63658598361797ec864b9879e84#diff-d41d8cd98f00b204e9800998ecf8427e17:53
wwitzel3natefinch: yeah, the outchannel command cannot take arguments, so you have to do it as a script. also if outchannel encounters a non-zero exit code from the script, it assumes the channel is bad and stops sending to it. Until you restart/reload.17:54
natefinchhuh ok, so your script wasn't returning 0 I guess?17:54
wwitzel3natefinch: and lastly, the logrotate conf itself, must have the right set of permissions.17:54
wwitzel3natefinch: right, it wasn't because of a permission issue, which was just being thrown away17:55
natefinchahh17:55
sinzuiperrito666, I just got permission to merge my log recovery changes. The next runs of the recovery tests will try to get logs from that machine that restore created17:55
sinzuiperrito666, yes, the commit does look like it relates to the vague error in the jenkins console17:56
wwitzel3natefinch: so it had 3 issues, but I just had to discover them in the right order.17:56
wwitzel3natefinch: for example, at first I was calling logrotate as my command and passing it arguments ... logrotate was being called and returning 0 so logging continued, but the actual conf wasn't being passed.17:57
perrito666voidspace: care to shed some light on the commit?17:58
perrito666it has your name17:58
wwitzel3natefinch: then finally buried in the documentation for outchannel I found a small italic note about the fact outchannel command can't take any arguments.17:58
wwitzel3oh and the issue with logrotate's default state file path, that was easy though once I was actually able to start getting debug output from the rsyslog outchannel executing the command.17:59
natefinchyeesh17:59
natefinchI think you should write all that up and put it on juju-dev, so that someone else has a chance of understanding everything later.... and actually, a big comment in the code about it wouldn't hurt either.18:00
wwitzel3natefinch: so the rsyslog cert and key, live in the log folder even though they aren't logs.18:01
natefinchor in the docs somewhere... something18:01
natefinchwwitzel3: nice18:01
wwitzel3natefinch: so I assume it is ok for the logrotate.conf to live there too?18:01
wwitzel3natefinch: also the logrotate helper script that runs logrotate with the juju conf.18:02
wwitzel3natefinch: yeah I was planning to document the behavior of outchannel and ref the docs as well as the permission requirements for the logrotate.conf and state file.18:03
wwitzel3natefinch: since I also need to document the existence of all-machine.log.1 and the max size, etc..18:04
natefinchahh yep, definitely18:04
natefinchand yeah, we can put more stuff in there if there's already non-logs in there18:04
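The workaround wwitzel3 describes above (an outchannel action can't take arguments, so a helper script runs logrotate with the juju conf and an explicit state file) could be sketched roughly like this. It is only an illustration, not the juju code: the function names, paths and size limit are hypothetical, and it assumes rsyslog's legacy `$outchannel name,file,max-size,command` directive and logrotate's `-s` state-file option.

```python
import shlex


def render_helper(conf_path, state_path):
    """Shell wrapper that carries the arguments the outchannel action cannot."""
    return "#!/bin/sh\nexec logrotate -s {} {}\n".format(
        shlex.quote(state_path), shlex.quote(conf_path)
    )


def render_outchannel(name, log_path, max_bytes, command):
    """Render a legacy $outchannel line plus the rule that writes through it."""
    # The action must be a bare executable path; anything with arguments
    # is rejected here instead of being silently mis-run by rsyslog.
    if len(command.split()) != 1:
        raise ValueError("outchannel command must be a bare path, no arguments")
    return "$outchannel {0},{1},{2},{3}\n*.* :omfile:${0}\n".format(
        name, log_path, max_bytes, command
    )


if __name__ == "__main__":
    log_dir = "/var/log/juju"  # hypothetical location for conf, state and helper
    print(render_helper(log_dir + "/logrotate.conf", log_dir + "/logrotate.state"))
    print(render_outchannel(
        "logrotation",
        log_dir + "/all-machines.log",
        512 * 1024 * 1024,  # hypothetical max size in bytes
        log_dir + "/logrotate.run",
    ))
```

Keeping the state file path explicit in the helper sidesteps logrotate's default state file location, which was one of the three issues mentioned.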
tych0h18:05
hazmatwwitzel3, you get your rsyslog issues resolved?18:23
wwitzel3hazmat: yep, thanks :)18:53
=== urulama is now known as urulama-afk
* sinzui kicks CI to test ha and restore now19:13
natefinchsinzui: I have a possible fix for the other bug too19:15
natefinchsinzui: https://github.com/juju/juju/pull/50119:15
natefinchperrito666, ericsnow, wwitzel3: one of you want to review? ^^  really simple change with a test and everything.  I don't actually know why the code worked before and then suddenly stopped working, but at the very least this is a change that won't break anything and has a test to make sure it continues working.19:18
natefinchsuper simple change19:19
ericsnownatefinch: sure19:19
natefinchtest verified to panic on the old code, just like in comment #18 here: https://bugs.launchpad.net/juju-core/+bug/134771519:20
natefinch(and test verified not to panic with the new code)19:20
natefinchaside:  I wish the comments on launchpad bugs were anchors, so I could do <bug-url>#comment-18 and have it jump directly to the comment. Instead the comment numbers just open up a separate window with no context which is completely useless.19:21
=== urulama-afk is now known as urulama
=== urulama is now known as uru__
thumpermorning folks20:48
thumpersinzui: you around?20:53
sinzuiI am20:54
thumpersinzui: what's the tl;dr on CI status?20:56
sinzuithedac, 2 blockers, natefinch's fix for 1 is queued to play in the next hour20:57
natefinchthumper: here's the fix... though I admittedly don't know why it worked before: https://github.com/juju/juju/pull/50120:58
sinzuithumper, my effort to get more logs from the failed restore tests also failed. no new data20:58
* thumper takes a quick look20:58
* sinzui wishes --debug didn't leak certs, keys, users, and passwords20:58
natefinchsinzui: me too20:59
natefinchI gotta run20:59
natefinchgood luck everyone, and good night20:59
thumpersinzui: so the remaining CI failure is a restore one?21:00
sinzuithumper, yes https://bugs.launchpad.net/juju-core/+bug/135532421:01
sinzuithumper, the test is playing right now. I hope to ssh in at the right moment and cat the logs21:01
thumperkk21:01
waiganithumper: welcome back!21:13
menn0thumper, waigani: good morning21:13
thumpero/21:13
waiganimorning :)21:13
* thumper headdesks21:16
sinzuiperrito666, I added logs to bug 1355324. Tomorrow I am going to investigate redirecting all stderr to a private location on disk to try --debug21:51
perrito666sinzui: the rev I pointed to is the culprit; it appears that we no longer have mongo listening on StatePort and therefore restore cannot connect to it. I need to get voidspace or fwereade to find out more about it21:59
thumperperrito666: why isn't mongo listening?22:00
sinzuiperrito666, okay. that right, I have too much going on to keep my work list clear22:00
perrito666thumper: I don't think it's not listening; we are no longer publicizing the port, I believe22:00
perrito666thumper: https://github.com/juju/juju/pull/449/files22:01
sinzuithumper, we agreed some months ago that nothing is allowed to talk to mongo directly...but restore does22:01
ericsnowperrito666: from the PR it sounds like it's not even listening on external ports22:01
perrito666ericsnow: I really need to dig more into this otherwise I am talking in educated guesses22:02
thumperah...22:03
thumperdidn't I see a change where mongo only listened internally?22:03
ericsnowperrito666: https://github.com/juju/juju/pull/449#issuecomment-5082536722:03
thumperlike only on localhost?22:03
ericsnowperrito666: "I've also confirmed that with current master I can telnet to port 37017, and that with this branch I can't."22:03
perrito666not even a day running ubuntu on this machine and I already have unity behaving oddly.. that must be a record for a fresh install22:03
thumperericsnow: that's the one22:03
perrito666thumper: sounds to me that not even, which puzzles me a bit unlessss22:04
perrito666api is using direct connection22:04
perrito666I dont know if direct is the right name22:04
ericsnowperrito666: yeah, that's what I find weird22:04
ericsnowthumper: yep, that's the PR for the changeset that broke restore22:04
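The telnet check quoted from the PR comment can be reproduced with a few lines of Python. This is a rough sketch for poking at the problem, not part of juju: `port_open` is a hypothetical helper, the host is a placeholder, and 37017 is simply the port named in the quote.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder host; point it at the state server's external address.
    print(port_open("127.0.0.1", 37017))
```

Running it once from the state server itself and once from another machine would show whether mongo is closed entirely or only reachable on localhost.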
perrito666thinkpads ability to swap fn/ctrl on the bios is marvelous22:05
thumperericsnow, perrito666: simplest thing IMO is to revert PR 44922:06
thumperand take a fresh look later at closing the port22:07
thumperbetter than trying to work out now how to selectively open it22:07
perrito666I guess there is no other option, also by doing this I can get the original author to grep again for the use of StatePort :p22:09
ericsnowperrito666: yeah, tell him about that grep thing ;)22:10
ericsnowthumper, perrito666: +1 on reverting22:12
ericsnow(too bad another blocker will show up before I have a chance to merge anything tomorrow <wink>)22:12
perrito666ok, let me have a late evening snack and I'll propose the revert22:15
perrito666btw, did I mention I have the eating habits of a hobbit?22:15
perrito666so, to revert we do a reverse pr or use the "sorry I screwed up, please undo" feature from github22:40
thumperperrito666: um... there is a github feature?22:44
thumperperrito666: I'd have just done a reverse PR22:44
* thumper goes to make a coffee22:44
perrito666thumper: I saw github offering me to undo with a button the other day22:44
thumperwaigani: standup hangout time23:02
perrito666how do I reference a ticket/pr on a pr description23:11
perrito666?23:11
perrito666sinzui: or anyone23:16
perrito666the $$fixes tag is to be added into the $$merge or in the pr description?23:16
sinzuiperrito666, fixes-nnnnn23:17
sinzuiperrito666, in a comment by itself or with other text23:17
perrito666sinzui: yup, but, that is to be added into the pr body or into the merge comment?23:17
perrito666my question is, will that trigger the merge?23:17
sinzuimerge comment perrito66623:17
sinzuiperrito666, it doesn't trigger a merge.23:18
sinzuiperrito666, $$merge$$ to trigger the merge and fixes-nnnnnn to explain why the merge is valid23:18
perrito666tx sinzui23:19
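The convention sinzui describes ($$merge$$ triggers the merge, fixes-nnnnnn says which bug justifies it) could be checked with something like the sketch below. The merge bot's real matching rules are not shown in this log, so the regular expressions and the `parse_comment` name are assumptions made for illustration only.

```python
import re

# Assumed patterns; the actual bot may be stricter or looser than this.
MERGE_RE = re.compile(r"\$\$merge\$\$")
FIXES_RE = re.compile(r"\bfixes-(\d+)\b")


def parse_comment(text):
    """Return (wants_merge, bug_numbers) for a single PR comment."""
    wants_merge = MERGE_RE.search(text) is not None
    bugs = [int(n) for n in FIXES_RE.findall(text)]
    return wants_merge, bugs


if __name__ == "__main__":
    print(parse_comment("$$merge$$ fixes-1355324"))
    print(parse_comment("fixes-1355324"))
```

Under these assumptions a fixes tag on its own never reports a merge request, which matches "it doesn't trigger a merge".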
perrito666ok anyone ptal https://github.com/juju/juju/pull/50323:19
ericsnowI'm signing off, but if anyone is able, ptal @ https://github.com/juju/utils/pull/16 https://github.com/juju/utils/pull/19 https://github.com/juju/juju/pull/462 https://github.com/juju/juju/pull/45323:29
sinzuiperrito666, I just discovered the first comment is not a comment according to github api. I can see ericsnow is the first comment, and my test comment is the second23:29
perrito666sinzui: that is why I did not add the $$ fixes part, I will add it in the merge comment23:30
sinzuiperrito666, okay. I am a little disappointed because I think it is nice to state in the first comment why the branch should be merged :/23:40
menn0testing... please ignore: bug 134771523:52
mupBug #1347715: Manual provider does not respond after bootstrap <bootstrap> <ci> <regression> <juju-core:In Progress by natefinch> <https://launchpad.net/bugs/1347715>23:52
menn0\o/ (fixed!)23:52
perrito666thumper: that is not going to work23:53
perrito666you did not add the $$fixes-###$$23:53
