[00:43] <lifeless> desperately seeking susan^Wreviewer
[00:54] <lifeless> checkwatches.base needs expurgation of its oops stuff.
[01:06] <lifeless> wgrant: StevenK: I need a hand, make jsbuild is failing, and I have no idea why ;)
[01:06] <lifeless> I'm guessing one of you ran into that recently
[01:06] <lifeless> wallyworld: or ^
[01:07] <wallyworld> lifeless: what's the issue?
[01:07] <lifeless> no rule to make target 'lib/canonical/launchpad/icing/yui/yui/yui.js' needed by .../launchpad.js
[01:08] <wallyworld> hmmm. haven't seen anything like that in a while. last time that sort of thing happened, a make clean got it going again
[01:08] <lifeless> trying that, thanks
[01:08] <wallyworld> it sounds like the yui symlinks got messed up
[01:08] <lifeless> mwhudson: can we nuke vostok-archive ?
[01:09] <mwhudson> lifeless: yeah, i reckon so
[01:09] <mwhudson> nuke all of vostok if it's getting in the way any i guess
[01:09] <lifeless> wallyworld: thanks, that works (which means we have a bug in our dep rules)
[01:09] <lifeless> mwhudson: it just seems stubby to me
[01:10] <lifeless> mwhudson: as in, an abandoned experiment
[01:10] <mwhudson> yep
[01:10] <mwhudson> (hey at least i managed to clean up the publisher code in the rest of lp a bit when i added it ...)
[01:10] <lifeless> \o/
[01:11] <lifeless> and I'm back to nuking getLastOops calls
[01:11] <wallyworld> what are those calls being replaced with?
[01:11] <lifeless> self.oopses[-1]
[01:11] <wallyworld> ok
[01:12] <lifeless> or direct subscription in doctests
[01:12] <lifeless> why ?
[01:12] <lifeless> I mean, if you're hacking in that area, we can collaborate
[01:13] <lifeless> I will broadcast to the list once I've sorted it all out
[01:13] <wallyworld> i was just curious
[01:13] <lifeless> kk :)
[01:13] <lifeless> so getLastOopsReport is not isolated between tests
[01:13] <lifeless> so test A can write an oops test B sees
[01:13] <lifeless> and we've had this happening
[01:14] <lifeless> its also not threadsafe
[01:14] <lifeless> (test thread A can write a report test thread B thinks is its)
[01:14] <wallyworld> yeah, i recall some issues from a while ago in this area, can't remember the details
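[aside] The isolation problem described above can be sketched in a few lines (hypothetical class names; not the LP test infrastructure): a shared "last oops" store leaks reports across tests and threads, while a per-test `self.oopses` list — the replacement lifeless mentions — cannot.

```python
# Toy illustration of per-test oops capture: each test case owns its own
# list, so test B never sees a report written by test A, and the "last
# oops" is just self.oopses[-1].
class IsolatedTestCase:
    def __init__(self):
        self.oopses = []  # reports published while this test runs

    def publish_oops(self, report):
        self.oopses.append(report)

test_a = IsolatedTestCase()
test_b = IsolatedTestCase()
test_a.publish_oops({"id": "OOPS-1"})
assert test_b.oopses == []                  # no cross-test leakage
assert test_a.oopses[-1]["id"] == "OOPS-1"  # the getLastOops replacement
```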
[01:48] <rsalveti> hey, don't know if someone reported this already, but it seems launchpad is blocking new packages to be both published and built today
[01:48] <rsalveti> https://launchpad.net/~linaro-maintainers/+archive/overlay/+packages
[01:49] <rsalveti> I copied a few packages from other series today, and also pushed new ones, but they are all waiting for hours already
[01:49] <StevenK> Yes, we're debugging an issue with the PPA publisher
[01:50] <rsalveti> StevenK: ok, great
[01:50] <rsalveti> yeah, the issue is just at the publisher, just saw that the packages I pushed today were all built fine
[01:50] <rsalveti> but they're now all locked at the publisher
[01:55] <lifeless> hah
[01:55] <lifeless> mailman doc tests are not being run
[01:56] <StevenK> I thought that was by design?
[01:57] <lifeless> for our monkey patches? no
[02:21] <lifeless> wow
[02:21] <lifeless> test_uploadProcessor is full of pain
[02:41] <wallyworld> lifeless: i'm having a weird feature flag issue that has got me stumped, can you spare a minute to help a poor lost soul?
[02:42] <lifeless> sure
[02:42] <wallyworld> if i uncomment the commented out line, it works: https://pastebin.canonical.com/54429/
[02:43] <wallyworld> so something is messing with the feature flag infrastructure
[02:43] <wallyworld> to add a bit of content - i am expecting delete_allowed to be true
[02:44] <lifeless> and whats it set to in the test ?
[02:44] <wallyworld> flags = {u"disclosure.delete_bugtask.enabled": u"on"}
[02:44] <wallyworld> there are other tests which all work
[02:44] <wallyworld> but for this test, the flag cannot be found
[02:45] <lifeless> the rule :)
[02:45] <lifeless> the flag is always found, but may evaluate to None
[02:45] <lifeless> what does getFeatureFlag(
[02:45] <lifeless>             'disclosure.delete_bugtask.enabled')
[02:45] <lifeless> evaluate to ?
[02:45] <lifeless> oh, and are you sure its reaching your code?
[02:45] <wgrant> I assume the authorization cache is being populated by the owner check.
[02:45] <wallyworld> none i think, i'll check
[02:45] <lifeless> zope security caches
[02:46]  * lifeless bites back a biting commentary on using caches to solve architectural problems
[02:46] <lifeless> (and yes, I'm aware of CPU layer-N caches)
[02:47] <wallyworld> here's a test which works for example: i have a security adaptor and it is getting to that point ok
[02:47] <wallyworld> oops
[02:47] <wallyworld> paste error
[02:48] <wallyworld> https://pastebin.canonical.com/54430/
[02:48] <wallyworld> and to answer your question, getFeatureFlag('xxxx') evaluates to None
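[aside] The rule lifeless states — a flag lookup never fails, it just evaluates to None when no rule matches — can be modelled with a toy controller (hypothetical class, not the LP `FeatureController` API):

```python
# An unset flag does not raise; it simply evaluates to None, which is
# exactly what wallyworld's security adapter was (wrongly) seeing here
# for a flag that *should* have had a rule.
class ToyFeatureController:
    def __init__(self, rules):
        self.rules = dict(rules)

    def getFeatureFlag(self, name):
        # No KeyError for missing flags -- just None.
        return self.rules.get(name)

controller = ToyFeatureController(
    {u"disclosure.delete_bugtask.enabled": u"on"})
assert controller.getFeatureFlag(u"disclosure.delete_bugtask.enabled") == u"on"
assert controller.getFeatureFlag(u"no.such.flag") is None
```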
[02:48] <StevenK> I didn't think flags worked inside the security adapter.
[02:49] <wallyworld> inside the security adaptor, unless i uncomment that line
[02:49] <wallyworld> StevenK: they appear to work, at least for the tests i have which pass :-)
[02:49] <wgrant> StevenK: No reason they shouldn't.
[02:49] <wgrant> Apart from caching issues like this :)
[02:49] <wgrant> We'll see if they're relevant soon...
[02:50] <wallyworld> not sure if it's a caching issue per se
[02:50] <wgrant> It very probably is.
[02:50] <lifeless> wallyworld: I'd like to see the following: a print / pdb session in the security adapter showing the feature controller and the evaluation of the flag, and if using print, prints before and after the call so we can eliminate other entries into the codepath
[02:51] <StevenK> wgrant: Where is your obsolete-distroseries fix at?
[02:51] <wgrant> StevenK: I should finish that off today, good point.
[02:51] <wallyworld> lifeless: any particular attributes of the feature controller?
[02:52] <lifeless> nope
[02:53]  * wallyworld starts gathering data
[03:02] <wgrant> mwhudson: *cough* beautifulsoup on codeimport creation forms.
[03:02] <mwhudson> wgrant: heh heh heh
[03:02] <mwhudson> is that still there?
[03:02] <wgrant> I dare not check.
[03:02] <wgrant> lib/lp/code/browser/codeimport.py:from BeautifulSoup import BeautifulSoup
[03:02] <wgrant> lib/lp/code/browser/codeimport.py:        soup = BeautifulSoup(self.widgets['rcs_type']())
[03:02] <wgrant> Yes
[03:02] <mwhudson> wgrant: beatifulsoup > re.compile(r'(?<=class=["\'])(.*)(?=["\'])') though
[03:03] <wgrant> True.
[03:03] <mwhudson> wgrant: i think the beautifulsoup thing falls into my bucket of "automatic form generation is a crock of **** and you shouldn't use it ever"
[03:09] <wgrant> mwhudson: Somewhat like ORMs that lazy-load, automatic form generation makes small things easy but inevitably screws you over completely in expensive ways.
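[aside] The codeimport snippet quoted above uses BeautifulSoup to pull CSS classes out of widget HTML, versus the fragile regex mwhudson quotes. A stdlib sketch of the same parse (using `html.parser` rather than BeautifulSoup, so it runs with no extra dependencies; input HTML is made up):

```python
# A real HTML parser tracks tag/attribute structure, so quoting style and
# attribute order don't matter -- unlike the lookbehind/lookahead regex.
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(value.split())

collector = ClassCollector()
collector.feed(
    '<input class="radioType" type="radio"><label class="x y">CVS</label>')
assert collector.classes == ["radioType", "x", "y"]
```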
[03:11] <wallyworld> lifeless: here's some printed data. the act of putting in the code to print the data made the test pass. https://pastebin.canonical.com/54431/
[03:13] <lifeless> 15:52 < lifeless> nope
[03:13] <lifeless> 15:53  * wallyworld starts gather data
[03:13] <lifeless> before my adsl stopped
[03:14] <wallyworld> lifeless: here's some printed data. the act of putting in the code to print the data made the test pass. https://pastebin.canonical.com/54431/
[03:14]  * StevenK starts a fund to buy lifeless some better Internets
[03:14] <lifeless> take out a hit on telecom
[03:14] <StevenK> Haha
[03:14] <wallyworld> the first printout is just before the check_permission call
[03:15] <lifeless> wallyworld: I wanted the controller itself :)
[03:15] <wallyworld> ah
[03:15] <lifeless> wallyworld: so I could see if it was falling back to a different object
[03:15]  * wallyworld tries again
[03:15] <lifeless> e.g. due to the participation / interaction being futzed or something weird
[03:15] <StevenK> lifeless: My DSL has been connected since the end of Aug
[03:15] <lifeless> interesting data point that the debug fixed it
[03:15] <StevenK> So 45 days or so
[03:16] <StevenK> lifeless: I don't recall you having issues when you were in Epping
[03:17] <lifeless> StevenK: indeed
[03:17] <lifeless> StevenK: thus, telecom.
[03:17] <lifeless> I will ring and rant soon
[03:18] <StevenK> Do you hold out much hope?
[03:19] <wallyworld> lifeless: before and inside the call, the feature controller is <lp.services.features.flags.FeatureController object at 0xf7d4b10>
[03:19] <wallyworld> it fails without all the other print statements
[03:20] <wallyworld> i'll see if i can find which print statement makes it work
[03:22] <lifeless> StevenK: if we can id the issue, yes
[03:22] <lifeless> thats mainly dependent on getting through first level 'technical' support
[03:23] <StevenK> Oh, absolutely
[03:24] <lifeless> which the rant is all about
[03:25] <wallyworld> lifeless: so calling features.getAllFlags() before the check_permission call makes it work (as well as bug.default_bugtask)
[03:25] <wallyworld> and not calling either of those 2 things makes it fail
[03:25] <wgrant> jtv: There's a regression fix for cocoplum that I'd like to deploy tonight, and it's stuck behind your translations-export-to-branch fix. Are you likely to have QA for that done in the next 4 or so hours?
[03:26] <jtv> wgrant: I think I will, but it won't be much sooner.
[03:27] <wgrant> OK. I may have to cowboy it anyway, since there's an existing possibly unclobberable cowboy there.
[03:32] <wallyworld> lifeless: narrowed it down to feature_controller.rule_source.getAllRulesAsDict() - so it seems the StormFeatureRuleSource() content is getting clobbered somehow?
[03:35] <lifeless> I wonder if you can't trigger the first feature rule lookup from within a security adapter because they don't nest? [speculation]
[03:38] <wallyworld> not sure
[03:38] <wallyworld> but it all seems rather fragile at the moment
[03:39] <lifeless> wgrant: what do you think about my speculation ?
[03:39] <wgrant> lifeless: What don't nest?
[03:40] <lifeless> wgrant: I'm trying to come up with a story that would explain wallyworlds symptoms
[03:41] <wallyworld> weird thing is that bug.default_bugtask also makes it work
[03:41] <lifeless> you probably need to step through with pdb
[03:42] <wallyworld> yeah, started to do that. lots of api calls to look at
[03:42] <lifeless> zope is phat
[03:42] <wallyworld> at least it's not something dumb i'm doing wrong (hopefully)
[03:43] <wallyworld> seems like a genuine problem with the infrastructure
[03:46] <wgrant> lifeless: Are you sure the mailman test bug is actually a bug? We deliberately don't run MailmanLayer by default, because it's crap.
[03:50] <lifeless> wgrant: we have tests that exist; they should be run, or not exist.
[03:51] <lifeless> wgrant: otherwise they -will- bitrot and -will- just accumulate debt
[03:51] <StevenK> Delete them, then
[03:51] <lifeless> I found a bug that they were not being run, but it was fix released years ago indicating they were meant to run again.
[03:51] <lifeless> StevenK: not without chatting to curtis I think
[03:52] <StevenK> I wish I had more knowledge so we could delete lib/mailman
[04:08] <wgrant> s/I/LP/ s/more knowledge/some architecture/
[04:08] <StevenK> Harsh
[04:10] <lifeless> 6 uses of getLastOops
[04:17] <lifeless> mwhudson: hey, around ?
[04:17] <lifeless> mwhudson: have a quickie on codehosting
[04:18] <mwhudson> lifeless: yep
[04:18] <lifeless> mwhudson: make_error_utility appends the pid to the oops prefix
[04:18] <wgrant> Haahahah
[04:18] <lifeless> is this for any reason *other* than oops sucking at concurrency
[04:18] <wgrant> I think we established that there isn't.
[04:18] <mwhudson> lifeless: no, i'm pretty sure that's only to avoid races
[04:18] <wgrant> It's there to avoid concurrency issues, and to confuse the shit out of everyone.
[04:18] <lifeless> mwhudson: as the processes are ephemeral, I'm checking there's no need to keep that
[04:18] <lifeless> mwhudson: great, deleted.
[04:19] <lifeless> wgrant: if you want confusing
[04:19] <lifeless> grep for setOopsToken and note the EMAIL usage
[04:19] <lifeless> mwhudson: I'm assuming pullerworker is the same ?
[04:19] <wgrant> Huh. Handy.
[04:19] <mwhudson> lifeless: yes
[04:20] <lifeless> wgrant: do you happen to know if thats semantic or just crazy
[04:20] <wgrant> lifeless: I assume crazy.
[04:20] <wgrant> But don't know for sure.
[04:21] <wgrant> Delete it and see if matsubara complains? :)
[04:25] <lifeless> aieee @ available_oops_prefixes
[04:25] <wgrant> Delete
[04:27] <lifeless> oh man
[04:27] <lifeless> I hope its not punning that with concurrency limiting
[04:36] <wgrant> lol
[04:38] <wgrant> lifeless: I think you may have projectegg set badly in python-oops-tools
[04:38] <wgrant> It currently tries to use oops-tools.settings as the settings module.
[04:38] <wgrant> Which obviously isn't going to work :)
[04:44] <wallyworld_> lifeless: i've found the problem. the setAllRules() method on StormFeatureRuleSource needs a store.flush(). Or else the rules passed into the fixture setup are not written to the db because check_permission() has a @block_implicit_flushes decorator
[04:44] <wgrant> Hahaha
[04:45] <wallyworld_> and those other things which trigger the test to pass must have done so because they caused a flush
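[aside] A toy reconstruction of the bug wallyworld found (hypothetical classes, not Storm's API): rules added to an ORM store are only queued until flush; a code path that blocks implicit flushes then queries the database and sees nothing — hence the explicit `store.flush()` fix in `setAllRules()`.

```python
# Minimal model: "pending" is what's been added but not flushed, "db" is
# what queries under @block_implicit_flushes actually see.
class ToyStore:
    def __init__(self):
        self.pending = []
        self.db = []

    def add(self, obj):
        self.pending.append(obj)

    def flush(self):
        self.db.extend(self.pending)
        self.pending = []

def set_all_rules(store, rules, explicit_flush):
    for rule in rules:
        store.add(rule)
    if explicit_flush:
        store.flush()  # the missing call in the buggy version

broken = ToyStore()
set_all_rules(broken, ["delete_bugtask on"], explicit_flush=False)
assert broken.db == []  # check_permission finds no rules: flag is None

fixed = ToyStore()
set_all_rules(fixed, ["delete_bugtask on"], explicit_flush=True)
assert fixed.db == ["delete_bugtask on"]
```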
[04:45] <lifeless> wgrant: thats fixed by my branch
[04:45] <lifeless> wgrant: you're welcome to review it if you want
[04:48] <wgrant> omg
[04:49] <StevenK> wgrant: Hm?
[04:49] <wgrant> The number went down :)
[04:49] <wgrant> Significantly.
[04:49] <StevenK> By 7
[04:49] <wgrant> It was actually 272 this morning.
[04:50] <StevenK> Oh, so 10
[04:55] <mwhudson> 26 days to go!?
[04:56] <wgrant> Heh
[04:57] <StevenK> mwhudson: You tell funny jokes
[04:57] <StevenK> I think I have to file one, anyway
[04:58] <StevenK> I can't see DSP:+questions in our bugs
[05:01] <mwhudson> wgrant: is it the ppa publisher that's backed up?
[05:01] <wgrant> The queries are quick, so it's unlikely to time out much.
[05:01] <wgrant> mwhudson: Was, but yes.
[05:01] <wgrant> mwhudson: It's been fixed for a few hours.
[05:02] <wgrant> And now even has scriptmonitor running on it.
[05:02] <mwhudson> wgrant: ok, how often does it run when it's not backed up?
[05:02] <wgrant> Every 5 minutes.
[05:02] <wgrant> But often only every 10.
[05:02] <wgrant> Because it's crap.
[05:02]  * mwhudson has a vague memory of */20
[05:02] <mwhudson> hah
[05:02] <mwhudson> ok
[05:02] <wgrant> It was */20 long ago
[05:36] <lifeless> wgrant: you've been spelunking a lot; whats the fastest way to tell if script X has an oops config
[06:02] <lifeless> wgrant: you've been spelunking a lot; whats the fastest way to tell if script X has an oops config
[06:03] <wgrant> lifeless: Somewhat disturbingly, despite porting dozens of scripts around to LaunchpadScript and rewriting its internals, I've not run into that bit of code.
[06:03] <lifeless> I want to check the setOopsToken('EMAIL') thing is safe when gone, if you see what i mean
[06:05] <wgrant> Oh, that's lovely.
[06:05] <wgrant> Scripts normally just call globalErrorUtility.configure('something') themselves.
[06:12] <lifeless> +214
[06:12] <lifeless> -687
[06:12] <lifeless> 1874 lines of diff
[06:12] <lifeless> and we're not done yet
[06:12] <lifeless> + it would be freakishly hard to make this separate branches
[06:13] <lifeless> I pity the fool^Wreviewer
[06:14] <lifeless> StevenK: oops -> critical
[06:14] <lifeless> StevenK: also, we don't use confirmed :)
[06:50] <lifeless> poolie: hi, can we talk about your pending writes branch briefly ?
[06:50] <poolie> sure!
[06:50] <poolie> here, or phone?
[06:50] <lifeless> either, whats your pref ?
[06:51] <poolie> let's start here
[06:51] <lifeless> so, the bug (as I read it) is that when someone pushes twice in quick succession, we don't update the merge diff properly
[06:51] <lifeless> there are a few different orders to the race condition
[06:51] <lifeless> sometimes we generate the error and update properly
[06:51] <poolie> i think that's how you reach it yes
[06:51] <lifeless> y
[06:52] <poolie> yes it seems so
[06:52] <lifeless> sometimes we generate the error and don't update properly
[06:52] <poolie> that may be possible
[06:52] <lifeless> I'm worried that you're papering over the issue
[06:52] <poolie> i can understand that concern
[06:52] <poolie> however
[06:52] <poolie> i think there are really two bugs
[06:53] <poolie> 1- "sometimes mp diffs are not generated if the branch is repeatedly written to"
[06:53] <poolie> 2- "launchpad sends pointless spam"
[06:53] <poolie> i'm trying to fix 2
[06:53] <poolie> i'm not sure if 1 actually exists
[06:53] <lifeless> I'm sure it does
[06:53] <lifeless> per the analysis in comment #2
[06:55] <poolie> i thought that perhaps the completion of the second write would cause a new job to be generated
[06:55] <poolie> perhaps there is some ordering where that doesn't happen
[06:55] <lifeless> can only have one job outstanding for the branch
[06:55] <poolie> at any rate i don't see how leaving bug 2 open helps us fix bug 1
[06:55] <poolie> at the moment we don't even log when this occurs!
[06:55] <lifeless> so if the first job hasn't finished erroring before the second job is created, the second job isn't made and the first job just fails.
[06:56] <lifeless> poolie: we don't generate an OOPS ?
[06:56] <poolie> no
[06:56] <lifeless> ok
[06:56] <poolie> it is telling only the users who can't do anything about it
[06:56] <poolie> unless the idea is to annoy them (me) into fixing the whole bug :)
[06:56] <poolie> which is a valid, though risky, strategy
[06:57] <lifeless> so, I think bug 1, which is the bug your branch purports to be about, is about fixing the race condition
[06:57] <_mup_> Bug #1: Microsoft has a majority market share <iso-testing> <ubuntu> <Clubdistro:Confirmed> <Computer Science Ubuntu:Confirmed for compscibuntu-bugs> <dylan.NET.Reflection:Invalid> <dylan.NET:Invalid> <EasyPeasy Overview:Invalid by ramvi> <GenOS:In Progress by gen-os> <GNOME Screensaver:Won't Fix> <Ichthux:Invalid by raphink> <JAK LINUX:Invalid> <LibreOffice:In Progress by bjoern-michaelsen> <Linux:New> <Linux Mint:In Progress> <The Linux OS P
[06:57] <lifeless> bah, 1-
[06:57] <lifeless> :)
[06:57] <poolie> my mp is only about suppressing the mail
[06:57] <poolie> i get annoyed by the mail but i never see a missing diff
[06:57] <lifeless> And also, issue 1- is the only one where the user has no control over the situation
[06:57] <poolie> because they can delete/filter the mail?
[06:58] <lifeless> poolie: because they can push content into the branch, or delete the mp if they had done something crazy
[06:58] <lifeless> I admire your desire to stop sending spam, but I don't think, except for case 1-, that these branch mails -are- spam
[06:58] <lifeless> and case 1- has an analysis of the race condition, just needs coding
[06:58] <poolie> lots of people seem to disagree :)
[06:59] <lifeless> poolie: not on that bug
[06:59] <poolie> :(
[06:59] <lifeless> poolie: in general, 'lp sends too much mail', sure : but telling you something you requested fails is useful
[06:59] <poolie> it doesn't tell you what failed
[06:59] <poolie> as james said "I don't really know what it means, which merge proposal it is referring to, or what
[06:59] <poolie> I can do about it, so I don't know why I got an email about it."
[07:00] <poolie> lp really should not be sending that
[07:00] <poolie> it's different to bugmail
[07:00] <lifeless> so, the other bug, which I've unduped, is about the lack of context
[07:00] <poolie> which one?
[07:00] <lifeless> fixing that will address some of james_w's mystery around the mail
[07:00] <lifeless> bug 640882
[07:00] <_mup_> Bug #640882: " Launchpad error while generating the diff for a merge proposal" mails don't indicate branch <code-review> <email> <lp-code> <Launchpad itself:Triaged> < https://launchpad.net/bugs/640882 >
[07:00] <poolie> ok
[07:00] <poolie> i'll just drop it
[07:01] <poolie> i'm sad because i was trying to make this a little less crap and i feel like it's being held hostage to fixing the whole thing
[07:01] <poolie> not sending pointless mail to people is a step forward
[07:01] <poolie> recording when something goes wrong is a step forward
[07:01] <lifeless> I'd love to see improvements here, I don't think masking the issue is one; fixing the issue (which should be ~ as simple as self.suspend(5 minutes) or something) would be
[07:01] <lifeless> poolie: I agree that not sending pointless mail is a step forward; and recording when it goes wrong is a step forward.
[07:01] <poolie> how is this masking it?
[07:02] <lifeless> poolie: My understanding was that you were going to squelch the email for this case, and that that was the sum of the branch
[07:02] <poolie> and i was going to log that it failed
[07:03] <poolie> ok, if just doing self.suspend(5 minutes) is enough, i'll try that
[07:03] <lifeless> I'm handwaving
[07:03] <mrevell> Hi
[07:04] <lifeless> poolie: jobs have a defer-for-a-bit system, I don't know the details.
[07:04] <poolie> i don't feel you and aaron are taking into account the actual user data here
[07:04] <lifeless> poolie: I think for the pending-writes case, logging and not emailing is fine; I agree with Aaron that the other cases are different enough not to change.
[07:04] <poolie> nobody is saying "i'm glad lp told me about this" or "that explains why my thing had no diff"
[07:04] <lifeless> poolie: I think fixing the issue, logging and not emailing is even better
[07:05] <lifeless> I feel like you are saying 'not getting email is more important than the system working'
[07:05] <lifeless> I know thats not what you mean
[07:05] <lifeless> but it kindof feels that way
[07:05] <poolie> mm
[07:06] <lifeless> I think you mean 'not sending email in this case is better even if its not fixed', and I've acked that - twice I think - above
[07:06] <lifeless> pending writes shouldn't be categorised as a user error
[07:06] <poolie> mm
[07:06] <poolie> i get more annoyance from lp spam than i do from diffs being missing
[07:07] <poolie> in that sense it's more important
[07:07] <poolie> and, generally, there are always going to be some errors, and i think handling them gracefully is important
[07:07] <lifeless> I get annoyance from devs having to spend time tracking down, *again*, a self inflicted case of user confusion
[07:07] <lifeless> :)
[07:07] <poolie> why 'self inflicted'?
[07:07] <lifeless> (self inflicted by us developers)
[07:08] <poolie> oh i see
[07:08] <lifeless> poolie: because we created a system with a race condition, classified it as user error, and tada
[07:08] <poolie> yep
[07:08] <poolie> and, i think, did not look at the actual mail that was sent
[07:08] <lifeless> this needs two changes: unclassify it as user error, and fix the race condition
[07:08] <poolie> yep
[07:08] <lifeless> and yes, the lack of context in the mail is the icing on the cake
[07:08] <poolie> if the race is as simple as just rescheduling the job i can do that
[07:09] <lifeless> I think that if the branch has pending writes, the job should just wait for it
[07:09] <lifeless> indefinitely
[07:10] <poolie> i guess the 'lack of context' bug can then apply to other mail sent about branches, if any
[07:10] <lifeless> poolie: there are, IIRC, 3 other cases for MP's where the same template is used for the email
[07:10] <poolie> i agree, though i think the "users need to trust whether lp is working" argument applies equally there
[07:10] <lifeless> poolie: cases which this bugfix won't impact
[07:10] <lifeless> poolie: well, pushing up an empty branch and proposing it for merge, *is* a user error
[07:11] <lifeless> if you get one of those mails, I think its helpful (if it told you the branch :))
[07:11] <poolie> yes, bug 640882 will be irrelevant to the specific case it complains about, but relevant to things like empty branches
[07:11] <_mup_> Bug #640882: " Launchpad error while generating the diff for a merge proposal" mails don't indicate branch <code-review> <email> <lp-code> <Launchpad itself:Triaged> < https://launchpad.net/bugs/640882 >
[07:11] <poolie> ok
[07:12] <poolie> so
[07:12] <poolie> this would have been a lot easier if someone had just said "why don't you just call self.suspend and that will probably fix it"
[07:12] <poolie> in the first place
[07:13] <lifeless> poolie: that would have been nice
[07:13] <lifeless> understand I'm handwaving, but there is something like that there :)
[07:13] <lifeless> and I'll be happy (tomorrow) to go spelunking with you looking for it
[07:14] <poolie> it's probably fairly obvious on the base class
[07:14] <poolie> so then no oops, just deferral
[07:14] <poolie> and maybe a log message
[07:15] <lifeless> yah
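[aside] The fix agreed above can be sketched as follows (all names hypothetical — the real job API has a defer mechanism whose details lifeless says he doesn't know): when the branch still has pending writes, the diff job defers itself and logs, instead of OOPSing and emailing the user about a transient race.

```python
# Sketch: defer rather than fail when the race is hit. No OOPS, no mail;
# just a log message and a retry later.
import logging

RETRY_DELAY_SECONDS = 300  # the handwaved "self.suspend(5 minutes)"

def run_diff_job(branch, generate_diff, reschedule):
    if branch.pending_writes:
        logging.info("Branch %s has pending writes; deferring diff job.",
                     branch.name)
        reschedule(RETRY_DELAY_SECONDS)
        return None
    return generate_diff(branch)

class FakeBranch:
    def __init__(self, name, pending_writes):
        self.name = name
        self.pending_writes = pending_writes

delays = []
assert run_diff_job(FakeBranch("b", True), lambda b: "diff",
                    delays.append) is None
assert delays == [RETRY_DELAY_SECONDS]
assert run_diff_job(FakeBranch("b", False), lambda b: "diff",
                    delays.append) == "diff"
```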
[07:21] <jtv> wgrant: Q/A for those codehosting translations-export bugs is done.  Go ahead.
[07:21] <wgrant> jtv: Thanks.
[07:22] <poolie> lifeless: most of the bugs have all the issues of "no context" and "shouldn't get this mail anyhow" tangled together
[07:22] <poolie> please don't undupe them all
[07:22] <lifeless> poolie: there were two that were previously a unit, and you'd moved to the other bug, I was just restoring them
[07:25] <lifeless> poolie: (i.e. I've no more tweaking planned on these bugs)
[07:26] <StevenK> lifeless: The bug I filed is not the cause of the OOPS -- that is already filed.
[07:26] <StevenK> lifeless: I used High since the bug is *shown* in the OOPS, but isn't the cause.
[07:26] <lifeless> StevenK: ah, that wasn't clear to me. Sorry for creating noise.
[07:26] <StevenK> lifeless: Should I set it back to High, then?
[07:27] <lifeless> StevenK: up to you; lazy evaluation and timeouts can be nonobvious - we may well have timeouts due to that bug anyhow
[07:27] <lifeless> (it is a timeout isn't it ?)
[07:27] <StevenK> lifeless: Yes, but the timeout is due to the direction=backward madness
[07:28] <lifeless> ah right, a clear cause :)
[07:28] <StevenK> jtv: Bug 375013 is not marked OK, but 812500 is
[07:28] <_mup_> Bug #375013: Cannot commit directly to a stacked branch <commit> <launchpad> <lp-translations> <qa-ok> <rodeo2011> <stacking> <Bazaar:Fix Released by jameinel> <bzr-builder:Fix Released by jelmer> <Launchpad itself:Fix Committed by jtv> < https://launchpad.net/bugs/375013 >
[07:28] <jtv> StevenK: I guess one of my changes didn't come through.  Hang on.
[07:29] <poolie> lifeless: i think the other thing here is https://bugs.launchpad.net/launchpad/+bug/483945
[07:29] <_mup_> Bug #483945: No way to ask Launchpad to refresh a stale diff <code-review> <lp-code> <mp-preview-diff> <openstack> <trivial> <Launchpad itself:Triaged> < https://launchpad.net/bugs/483945 >
[07:29] <poolie> to give people a way to recover
[07:29] <lifeless> poolie: that would be nice
[07:29] <jtv> StevenK: actually, it did come through.  So the deployment report simply hasn't picked it up yet.
[07:30] <jtv> StevenK: the one that's not marked OK yet is the one I updated last, IIRC.  Here's hoping this is not a problem with multiple bugtasks.
[07:53] <StevenK> Can I get a review? https://code.launchpad.net/~stevenk/launchpad/dsp-questions-statement-death/+merge/79519
[08:04] <adeuring> goood mornin
[08:07] <StevenK> adeuring: Hi, I know it's not your OCR day, but would you mind a small (+22/-5) review? https://code.launchpad.net/~stevenk/launchpad/dsp-questions-statement-death/+merge/79519
[08:07] <lifeless> wgrant: hi
[08:07] <lifeless> wgrant: https://code.launchpad.net/~lifeless/python-oops-tools/amqp/+merge/79505
[08:07] <lifeless> adeuring: StevenK: I've reviewed that branch
[08:07] <lifeless> wgrant: baaah
[08:07] <StevenK> Oh, have you?
[08:07] <lifeless> StevenK: https://code.launchpad.net/~lifeless/python-oops-tools/amqp/+merge/79505
[08:07]  * StevenK refreshes
[08:08] <StevenK> lifeless: Haha
[08:08] <StevenK> lifeless: Fine, I'll look at your MP then. :-P
[08:09] <StevenK> lifeless: If you look at the OOPS attached to the bug it did 1,200 SPN queries
[08:10] <adeuring> StevenK: anyway, a nice branch!
[08:10] <lifeless> StevenK: yes, crazy shit man
[08:11] <StevenK> adeuring: :-)
[08:14] <StevenK> lifeless: r=me
[08:15] <lifeless> thanks!
[08:15] <stub> How do I set the submit branch?
[08:16] <StevenK> bzr push --remember
[08:16] <wgrant> That's the push branch.
[08:16] <StevenK> Bah
[08:17] <wgrant> wgrant@lucid-test-lp:~/launchpad/lp-branches/bug-876171$ grep -A1 lp-branches.$ ~/.bazaar/locations.conf
[08:17] <wgrant> [/home/wgrant/launchpad/lp-branches]
[08:17] <wgrant> submit_branch = bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel
[08:17] <lifeless> the default submit branch is the parent branch isn't it ?
[08:17] <wgrant> I think that applies in some places but not others.
[08:17] <stub> I gotta weird one from an old branch and nothing in locations.conf
[08:18] <wgrant> stub: .bzr/branch/branch.conf?
[08:18] <stub>   submit branch: bzr+ssh://bazaar.launchpad.net/%2Bbranch/losa-db-scripts/
[08:18]  * stub looks
[08:19]  * StevenK peers at the e-mail he just got from PQM
[08:19] <stub> yer - in there.
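[aside] For reference, a hedged summary of where bzr keeps these settings, per the exchange above (the URLs are the ones already quoted; treat the exact flags as an assumption to verify against `bzr help send`):

```shell
# Push location is remembered per-branch:
bzr push --remember lp:~user/launchpad/my-branch

# The submit branch is remembered the same way via bzr send:
bzr send --remember -o /dev/null \
    bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel

# Either can also be set directly in config, and branch config wins over
# the old value stub found:
#   .bzr/branch/branch.conf      (per-branch, where stub's stale value lived)
#   ~/.bazaar/locations.conf     (per-directory-tree, as in wgrant's grep)
```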
[08:19] <wgrant> StevenK: Did lifeless set the wrong default reviewer?
[08:19] <StevenK> PQMException: 'Failed to verify signature: gpgv exited with error code 2
[08:20] <wgrant> Yeah.
[08:20] <StevenK> I don't think I caused that
[08:20]  * wgrant looks.
[08:20] <wgrant> You sent email to it.
[08:20]  * wgrant fixes.
[08:20] <StevenK> Oh, the project is broken.
[08:21] <wgrant> The branch, specifically.
[08:21] <StevenK> Right
[08:52] <nigelb> Morning everyone
[08:52] <nigelb> On second thoughts, evening :)
[08:59] <rvba> lifeless: Hi Rob, may I email you with the details of a test isolation failure I'm fighting with?  Maybe you'll have an idea on what's going on.
[09:01] <lifeless> sure
[09:02] <rvba> Thanks :)
[09:02] <rvba> This is on a branch that is really not urgent.
[09:07] <lifeless> \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/ \o/
[09:07] <wgrant> Oh?
[09:07] <lifeless> (I have a loopback test grabbing oopses off of amqp into TestCase.oopses
[09:08] <wgrant> oh, nice.
[09:08] <lifeless> the end is in sight
[09:11] <lifeless> rvba: I will look tomorrow
[09:11] <rvba> lifeless: Thank you.
[09:11] <lifeless> rvba: if you want to look in the interim, I'd question the state that rabbit is in when the test starts
[09:12] <rvba> lifeless: hum, right, I'll have a look.
[09:13] <lifeless> rvba: I haven't looked at your failures yet but
[09:13] <lifeless> I note that 'session' is a thread local
[09:14] <lifeless> rvba: and they are generally trouble
[09:14] <lifeless> rvba: in particular, I don't see anythin in RabbitSession to handle rabbit going down
[09:14] <nigelb> bigjools: lol @ cricket :D
[09:15] <lifeless> rvba: currently, as I read the code, the first reset of rabbit will break all LP appservers trying to use rabbit
[09:15] <lifeless> bigjools: ^ you might like to file a bug on that, for fixing before we go love.
[09:15] <lifeless> *live*
[09:15] <bigjools> nigelb: meh
[09:15] <lifeless> bigjools: (or tell me I'm wrong :))
[09:15] <nigelb> muhahaha
[09:16] <rvba> lifeless: that's the reason why we wanted to isolation rabbit's failures as much as possible.
[09:16] <rvba> s/isolation/isolate/
[09:16] <bigjools> nigelb: it's probably the dodgy food upsetting them
[09:16] <lifeless> rvba: sure, but we need to handle things like rabbit being reset
[09:17] <lifeless> rvba: the problem is that RabbitSession assumes it *never* goes away, but the Unreliable variant assumes that *any error doesn't matter*
[09:17] <lifeless> rvba: there is a middle ground
[09:17] <lifeless> where an EPIPE is handled by reconnecting, but other errors cause a failure
[09:18] <rvba> lifeless: true, but since we don't know where this middle ground is we wanted to use the Unreliable variant and watch what was happening (hence the aggressive oops logging).
[09:18] <rvba> lifeless: that's the purpose of the branch in question.
[09:19] <lifeless> ok, so I can give a -little- guidance (I'm a novice here too :P) about the middle ground
[09:19] <lifeless> there are two broad cases I see
[09:20] <lifeless> firstly if rabbit goes away and comes back, the first transaction *after* it comes back it should start working
[09:20] <lifeless> this is testable (see the oops_amqp tests for inspiration)
[09:20] <lifeless> secondly, if rabbit goes away and comes back, messages queued for send-at-end during the current transaction should also be sent
[09:21] <lifeless> (also testable :P) - only if it goes and stays gone, should we be going into zomg mode
[09:21] <lifeless> rvba: I think aggressive logging and oopsing is great as well
[09:21] <lifeless> rvba: these two things just seem like high frequency cases we can anticipate
[09:21] <rvba> lifeless: makes sense.
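The second case above (queued send-at-end messages surviving a rabbit bounce) falls out naturally if nothing hits the wire until commit. A hand-wavy sketch, with all names hypothetical (real Launchpad code would hook into the transaction machinery instead):

```python
class DeferredSendSession:
    """Buffer messages during a transaction; deliver them at commit.

    Because nothing is sent until finish(), a broker restart in the
    middle of the transaction does not lose the buffered messages.
    """

    def __init__(self, connect):
        self._connect = connect  # callable returning a fresh connection
        self._pending = []

    def send(self, message):
        self._pending.append(message)  # nothing hits the wire yet

    def finish(self):
        # Connecting at commit time means we pick up a rabbit that has
        # come back since the transaction started; a real implementation
        # would also retry here per the first case discussed above.
        connection = self._connect()
        sent, self._pending = self._pending, []
        for message in sent:
            connection.publish(message)
        return sent
```

This also makes the behaviour easy to test: kill and restart the broker between `send()` and `finish()` and assert the messages still arrive.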
[09:23] <lifeless> wgrant: btw the rabbit-management package should really put the cli on sbin
[09:23] <rvba> lifeless: I'll file bugs about these two cases.
[09:23] <lifeless> rvba: thanks!
[09:24] <rvba> lifeless: thank *you* ;)
[09:24] <lifeless> rvba: I have a minor tweak to rabbit.py in lp:~lifeless/launchpad-useoops - dunno when the branch will land, but thought a headsup might be useful
[09:24] <rvba> Okay, I'll have a look.
[09:24] <lifeless> bah
[09:24] <lifeless> lp:~lifeless/launchpad/useoops
[09:25] <wgrant> lifeless: Probably, yeah.
[09:25] <lifeless> for now, -> bed
[09:25] <wgrant> lifeless: File a bug against the PPA oh wait :)
[09:25] <wgrant> Night.
[09:25] <rvba> Night lifeless.
[09:25] <lifeless> tomorrow I shall wire up subprocesses sending to amqp
[09:25] <lifeless> call sync() on the 7 uses of getLastOopsReport
[09:26] <lifeless> and purge purge purge purge purge purge
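The getLastOopsReport purge mentioned here replaces a shared, non-thread-safe global with a per-test list (`self.oopses[-1]`). The isolation pattern can be sketched like so (names illustrative; the real fixture wires into Launchpad's error reporting utility):

```python
class OopsCapturingTest:
    """Each test instance owns its oops list, so an oops written by
    test A can never be observed by test B, and no thread-local or
    global state is involved.

    A stand-in sketch, not the actual Launchpad test fixture.
    """

    def setUp(self):
        self.oopses = []

    def record_oops(self, report):
        # Registered as the oops publisher for this test only.
        self.oopses.append(report)

    def last_oops(self):
        # Isolated replacement for the old global getLastOopsReport().
        return self.oopses[-1] if self.oopses else None
```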
[11:00] <mrevell> Hey henninge, how's it going?
[11:00] <henninge> Hi mrevell!
[11:01] <henninge> mrevell: Going well, thanks!
[11:01] <mrevell> Good to hear :)
[11:04] <henninge> mrevell: I have been looking for a place to find out who was hired to replace me. Wasn't there a squad listing on the dev or help wiki?
[11:04] <mrevell> henninge, That person hasn't joined yet.
[11:04] <henninge> ah
[11:04] <henninge> hard to replace me, I know ...
[11:04] <henninge> :-P
[11:04] <mrevell> henninge, As for the page... let me look
[11:04] <mrevell> heh, true true :)
[11:05] <mrevell> henninge, There's this: https://dev.launchpad.net/Squads
[11:05] <henninge> oh, that simple
[11:06] <henninge> stupid moin, searching for "squads" does not yield that page.
[11:06] <henninge> ah, case-sensitive search
[11:06] <henninge> mrevell: thanks! ;)
[11:07] <mrevell> henninge, Heh. We really need to sort out the dev wiki. We have a new Usability and Communications Specialist joining soon who will help us with that.
[11:07] <henninge> cool
[13:02] <deryck> Morning, everyone.
[13:16] <abentley> benji: could you please review https://code.launchpad.net/~abentley/launchpad/mustache-bugs/+merge/79441 ?
[13:16] <benji> abentley: gladly
[13:22] <bigjools> anyone else notice that DatabaseLayer is massively slower to start up these days?
[13:31] <deryck> adeuring, ping for standup.
[13:31] <adeuring> deryck: ok, sorry
[13:35] <abentley> deryck: https://code.launchpad.net/~abentley/launchpad/mustache-bugs/+merge/79441
[13:51] <deryck> abentley, got it, thanks.
[14:24] <benji> abentley: the branch looks good; I did note one small thing I think you'll want to change
[14:24] <abentley> benji: what's that?
[14:24] <benji> abentley: I suspect you want to remove the "<em>Server-side mustache</em>" and "<em>Client-side mustache</em>" bits as well as making the client/server rendering conditional on the feature flag.
[14:26] <abentley> benji: The client-server rendering is already conditional on the feature flag. I don't want to remove the multiple copies yet, because I need to be able to QA all the renderings.
[14:27] <abentley> benji: See line 450 of the patch for where the rendering is made conditional.
[14:29] <benji> abentley: right, but it renders both versions; does the "normal" version get rendered if the feature flag is off?
[14:29] <abentley> benji: Yes, the normal version gets rendered regardless of the feature flag status.
[14:29] <benji> if so, I guess the feature flag is (at least at this moment) more about being able to do in-production QA and not about being able to really turn the feature on and off
[14:30] <abentley> benji: Right, the feature is only for the team developing it at the moment.
[14:31] <benji> abentley: k; I might have made that conditional on a query string instead of a feature flag, but I've heard that some people aren't me ;)
[14:32] <abentley> benji: This is part of an actual feature that is planned to take 4-6 weeks to complete, and the flag will prevent work on that feature from affecting normal users while we are working on it.
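The gating described above — always emit the normal rendering, add the in-development mustache variants only behind the flag — might look roughly like this (the flag name, `flags` mapping, and renderer callables are all invented for illustration; Launchpad views would consult `lp.services.features.getFeatureFlag` instead):

```python
def render_bug_listing(flags, render_normal, render_mustache):
    """Always render the normal listing; add the mustache variants
    only when the (hypothetical) feature flag is set.

    `flags` is a stand-in for the real feature-flag lookup.
    """
    output = [render_normal()]
    if flags.get("bugs.dynamic_bug_listings.enabled"):
        # Both renderings are emitted so each can be QA'd in
        # production behind the flag, as discussed above.
        output.append(render_mustache(side="server"))
        output.append(render_mustache(side="client"))
    return output
```

With the flag off, normal users see exactly the pre-feature page; the developing squad turns the flag on for themselves for the 4-6 weeks of work.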
[14:36] <sinzui> gary_poster, bigjools: can either of you allocate someone to fix bug 870130
[14:36] <_mup_> Bug #870130: shortlist error requesting recipe build <easy> <oops> <recipe> <Launchpad itself:Triaged> < https://launchpad.net/bugs/870130 >
[14:36] <jelmer> hi benji, can I add another branch to your queue?
[14:36] <benji> jelmer: certainly
[14:37] <bigjools> sinzui: I can
[14:37] <gary_poster> bigjools, if someone on your side could grab it...thank you bigjools :-) .
[14:37] <bigjools> :)
[14:39] <gmb> Can someone who knows something about the lazr.restfulclient .deb confirm or invalidate this bug: https://bugs.launchpad.net/lazr.restfulclient/+bug/876445
[14:39] <_mup_> Bug #876445: missing lazr.uri dependency in the debian/ubuntu package <lazr.restfulclient:New> < https://launchpad.net/bugs/876445 >
[14:39] <gmb> (Or tell me how to)
[14:41] <benji> gmb: I don't know anything about the .deb but if it doesn't include a dependency on lazr.uri, then it should
[14:42] <gmb> benji: Agreed, but I don't know where to look to check. I could grab the source package I suppose...
[14:43] <gmb> benji: And whaddaya know, it doesn't have that dependency.
[14:43] <jelmer> benji: Thanks - the MP is @ https://code.launchpad.net/~jelmer/launchpad/stacked-code-imports-newer-bzr/+merge/79538
[14:44] <rvba> benji: Hi, could you please review this js branch? https://code.launchpad.net/~rvb/launchpad/bug-869221-archindepwidget/+merge/79561
[14:44] <benji> rvba: sure
[14:44] <gmb> benji: See, I like this arrangement, other people get to do the thinking and I look like I've done some actual work.
[14:44] <gmb> Thanks :)
[14:44] <rvba> benji: Thank you.
[14:45] <benji> gmb: I do too, I just have to type and other people do the work
[14:46] <gmb> benji: It's symbiotic development at its best.
[14:46] <benji> gmb: I'm reminded of this xkcd: http://xkcd.com/722/
[14:46] <gmb> Heh :)
[14:51] <sinzui> jcsackett, do you have time to mumble in the new purple channel
[14:51] <jcsackett> sure.
[14:52]  * jcsackett goes to find the new purple channel.
[14:57] <gary_poster> sinzui, hi.  Thank you for the html5browser review and help.  On a slightly related subject, Francis wanted me to offer my services to your squad to help you with the new yuixhr tests, if that is of value.  They should be of particular interest to feature squads.  I'm happy to help however you think might be appropriate, including phone calls and/or more/better documentation.
[14:57] <gary_poster> Once the html5browser changes are live, I can (try to re-) land some simplifications and improvements to the yuixhr usage.  After that would probably be the right time to really dig in.
[14:57] <gary_poster> (That's all I had to say; just making the offer.)
[14:58] <sinzui> gary_poster, otp
[14:58] <gary_poster> cool
[15:29] <nigelb> HAHA
[15:29] <nigelb> sinzui: You guys did take the name I suggested!
[15:30] <nigelb> (I suggested Teal Assassins) :P
[15:30] <sinzui> nigelb, Then my thanks go to you and wallyworld_ who contributed the colour
[15:31] <nigelb> hehe
[15:31] <nigelb> I'm still laughing uncontrollably :P
[16:02] <flacoste> sinzui: shouldn't it be Aubergine Assassins ;-)
[16:02] <sinzui> gary_poster, bigjools: We have a spam project in the review list given to ~registry. The registrant must be suspended too. There may be other spam projects I did not see in my commercial view.
[16:03] <gary_poster> sinzui, bigjools, I'll take it
[16:03] <nigelb> flacoste: heh
[16:03] <bigjools> Dammit I wanted to rename to purple!
[16:03] <gary_poster> I'll start my CHR run in a few, and include that in it
[16:03] <sinzui> flacoste, I pondered that. I didn't want people thinking we will fix every Canonical stakeholder issue
[16:04] <sinzui> We can fix community bugs too
[16:04] <nigelb> heh
[16:08] <nigelb> flacoste: if they were named aubergine, you could sacrifice that squad to stakeholders :P
[16:09] <abentley> bigjools: You can rename to aubergine.
[16:09] <bigjools> I shall think of better
[16:10] <nigelb> cyan?
[16:10] <flacoste> Pink!
[16:12] <sinzui> bigjools, Teal is available now
[16:12] <bigjools> flacoste: oooo get you
[16:13] <bigjools> sinzui: lol
[16:13] <bigjools> Taupe Twits
[16:14] <nigelb> bigjools: *cough* cricket :P
[16:14] <nigelb> bigjools: You can still take "Men in blue"
[16:14] <bigjools> nigelb: how many games did you win over here all summer, again?
[16:14] <nigelb> haha
[16:14] <bigjools> ;)
[17:10] <mrevell> Night all
[17:37] <lifeless> morning
[17:42] <lifeless> sinzui: you might like to rename your lp team
[17:42] <lifeless> from 'green'
[17:43] <sinzui> lifeless, I renamed the teal team before the announcement
[17:43] <sinzui> is there a green team?
[17:43] <lifeless> sinzui: there is a wiki link
[17:43] <lifeless> I saw it in one of the pages you edited
[17:44] <sinzui> I will hunt those down
[18:37] <lifeless> flacoste: yo
[18:41] <flacoste> hi lifeless
[18:42] <flacoste> lifeless: do you want to talk now?
[18:42] <lifeless> that would be cool :)
[19:20]  * deryck goes offline for long late lunch, back later
[19:45] <micahg> benji: got a minute to chat about bug 876594?  I think this has the potential for ill will and is a regression in this case
[19:45] <_mup_> Bug #876594: rejected builds for synced packages send mail to Debian maintainer <derivation> <Launchpad itself:Triaged> < https://launchpad.net/bugs/876594 >
[19:46] <benji> micahg: sure, I'm glad to discuss it.  (Thanks for letting me know it's a regression, I'll update the bug.)
[19:46] <micahg> benji: I think it is, cjwatson could probably verify
[20:03] <cjwatson> micahg: sounds like it, yes.  We should not be mailing Debian maintainers for activity in Ubuntu just because their name was on the package we synced
[20:03] <cjwatson> (to see why not, imagine the full set of Debian derivatives)
[20:04] <micahg> benji: yeah ^^, so what you did was fine, thanks
[20:05] <benji> micahg: my pleasure
[20:05] <micahg> cjwatson: thanks
[22:06] <huwshimi> Morning all
[22:33] <elmo> are you guys ever going to implement the critical/high split?  the number in /topic is depressing me and I'm not even on the LP team
[22:33] <Beret> heh
[22:33] <StevenK> elmo: I hope so. Everytime I look through the Critical list, I get depressed.
[22:37] <lifeless> elmo: what critical/high split ?
[22:38] <elmo> lifeless: I thought there was discussion on splitting the 273 criticals into critical/high, with critical being reserved for the genuinely holy-crap-wtf criticals
[22:39] <lifeless> elmo: there is discussion, there isn't resolution
[22:40] <lifeless> elmo: we're analysing *why* we are generating so many new criticals
[22:40] <lifeless> elmo: that's a crucial step in addressing the issue, because we're fairly good at closing them.
[22:42] <elmo> lifeless: *shrug* k - just seems crazy to me - you de facto can't have 273 criticals, some of those must be more critical than others
[22:42] <elmo> i.e. the ones you're working on now and next
[22:43] <elmo> rather - you de facto can't have 273 criticals which are actually critical by the definition most people understand 'critical' to mean
[22:43] <lifeless> elmo: so we have a policy that says things that are broken are critical, plus things stakeholders have escalated are critical
[22:43] <lifeless> -all- the criticals meet one of those definitions
[22:44] <lifeless> yes its a problem, yes folk can't be working on 273 things in parallel, but the root issue here is that we are breaking things faster than we're fixing them
[22:46] <lifeless> if we shuffle all our highs to medium/low, criticals to high, and then pick a subset of brokenness to treat as critical, the problem doesn't get any better
[22:46] <elmo> I'm not saying that's not an issue
[22:46] <elmo> but I think it's orthogonal to the issue I'm unsolicitedly whining about
[22:47] <elmo> you could still figure out the breaking faster than fixing thing, if stuff got shuffled, e.g. by treating escalations as high
[22:47] <lifeless> politically, we want to work on escalations before we've fixed all thats broken
[22:47] <lifeless> (and escalations are ~5% of maintenance work atm, IIRC)
[22:48] <elmo> ok, sure
[22:49] <elmo> I guess what I'm saying is - all the other stuff aside, having non-now/next items in critical strikes me as insane, and doing it over such a long period of time has to be demotivational
[22:49] <elmo> but *shrug*, that's just IMO
[22:50] <elmo> I'll shut up and go back to my own criticals ;-)
[22:50] <lifeless> so, what is non-now/next in the critical bucket ?
[22:51] <elmo> I'm talking non-now/next in the kanban sense
[22:51] <elmo> at least AIUI
[22:52] <elmo> which means you can't possibly have a capacity that would allow you to fit all 273 into now + next
[22:52] <elmo> which means you've got X on the go and X that you'll be doing next
[22:52] <elmo> and that X+X << 273
[22:52] <lifeless> right, kanban wise if each engineer has a WIP limit of 2, now+next = 10*2*2 -> 40 items
[22:53] <elmo> where did 10 come from?
[22:53] <lifeless> but that aspect of kanban w.r.t. software development is primarily about directed-effort, this is category..
[22:53] <lifeless> squad size 5, maintenance squad count 2
[22:53] <elmo> ah
[22:55] <lifeless> so squads * members * WIP * (now + next=2)
[22:56] <StevenK> Er, we have 4 squads?
[22:56] <lifeless> 2 on maintenance
[22:56] <lifeless> feature squads are not working from the critical queue
[22:56] <StevenK> [09:53] < lifeless> squad size 5, maintenance squad count 2
[22:57] <lifeless> StevenK: yes?
[22:57] <StevenK> Oh, the number of people in the squad
[22:57] <lifeless> yes
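The capacity arithmetic from the exchange above, spelled out with the numbers as stated (2 maintenance squads of 5 engineers, per-engineer WIP limit of 2, and "now" plus "next" as two stages):

```python
# Numbers as given in the discussion; feature squads are excluded
# because they do not work from the critical queue.
squads = 2       # maintenance squads
members = 5      # engineers per squad
wip_limit = 2    # items in progress per engineer
stages = 2       # "now" + "next"

capacity = squads * members * wip_limit * stages
assert capacity == 40  # far fewer slots than the 273 open criticals
```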
[22:57]  * StevenK goes back to breaking the archive.
[23:02] <jelmer> hah, there is my punishment for deprecating Branch.revision_history
[23:02] <jelmer> lots of things in Launchpad seem to rely on it
[23:03] <wallyworld_> thumper: that utf-8 branch mail patch is deployed. can you let me know if you see any issues come up
[23:04] <lifeless> jelmer: also pqm
[23:04] <thumper> wallyworld_: awesome, thanks
[23:04] <lifeless> jelmer: would love a patch for that too
[23:04] <wallyworld_> thumper: np. and thanks for not mentioning the rugby :-(
[23:04] <thumper> wallyworld_: np
[23:11] <lifeless> elmo: when we restart rabbit in the dc, how long does it take to come back up?
[23:11] <lifeless> elmo: also, when we restart a server of the type its going to be on, how long would it be down for?
[23:12] <elmo> lifeless: I start to get worried if a DL380 takes > 100s to come back
[23:12] <lifeless> elmo: I'm writing some code that I'd like to be resilient to rabbit getting maintained, but to not assume it will always come good eventually
[23:12] <lifeless> elmo: so I'd like to have a timeout on its retry-connections duration, after which it will bail out and exit
[23:12] <lifeless> elmo: does this sound reasonable?
[23:12] <lifeless> (its the suck OOPSes from AMQP and push them into the oops-tools database code)
[23:13] <elmo> lifeless: restarting rabbit on a random staging box took ~4s
[23:13] <lifeless> ok, so ignoring truly exceptional things like box rebuilds, where we'd reconfigure to a different rabbit anyway, 2 minutes as a timeout seems sufficient ?
[23:14] <lifeless> elmo: and does this approach sound ok from your ops perspective, or would you rather just fail-hard, or never-fail policies ?
[23:14] <elmo> lifeless: is this oops, or in general?
[23:14] <lifeless> this specific code is in oops_amqp so yes, oops specific
[23:15] <lifeless> for the LP appservers we have clear transaction boundaries we can retry on, and disable rabbit if its not up at the start of the transaction
[23:15] <lifeless> I guess the txlongpoll service has/needs something similar to this as well - I haven't audited its code.
[23:15] <lifeless> wgrant: what does the txlongpoll twistd daemon do if rabbit is away / goes away ?
[23:16] <elmo> lifeless: does this mean that an app server could stall for up to 2m trying to talk to rabbit?
[23:16] <lifeless> elmo: no
[23:16] <lifeless> elmo: for oopses, the sender fails fast
[23:16] <lifeless> elmo: this is for the slave, the oops-tools side of it
[23:16] <elmo> hmm
[23:17] <wgrant> lifeless: Not sure.
[23:17] <lifeless> elmo: which has nothing tickling it to say 'try again now' because its only interface is rabbit pushing stuff to it
[23:17] <elmo> lifeless: so the thing trying to consume off the queue?
[23:17] <lifeless> yes
[23:17] <elmo> ok
[23:17] <elmo> how would a 'never-fail' policy work?
[23:17] <elmo> infinite retry?
[23:18] <lifeless> yes
[23:18] <lifeless> which makes me nervous :)
[23:18] <elmo> ok, I think I prefer the 2m-timeout or fail-hard
[23:18] <elmo> I don't mind which
[23:18] <elmo> I guess 2m-timeout is a little more forgiving
[23:18] <lifeless> just means a little more self healing
[23:18] <lifeless> one less zomg chase after a reboot of rabbit
[23:18] <elmo> self-healing would be nice
[23:18] <elmo> yeah
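The 2-minute bail-out agreed on above can be sketched as a deadline-bounded retry loop (a hypothetical helper, not the actual oops_amqp receiver code): keep retrying the connection, which self-heals across a rabbit restart of a few seconds up to ~100s for a full host reboot, then fail hard so the daemon exits loudly instead of spinning forever.

```python
import time


def connect_with_deadline(connect, timeout=120, interval=5,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry `connect` until it succeeds or `timeout` seconds elapse.

    `clock` and `sleep` are injectable so the behaviour is testable
    without real waiting. After the deadline the last error is
    re-raised, giving the fail-hard behaviour ops asked for.
    """
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except Exception:
            if clock() >= deadline:
                raise  # bail out: let the daemon exit loudly
            sleep(interval)
```

With `timeout=120` this tolerates the ~4s staging restart and the ~100s worst-case reboot elmo quoted, while still surfacing a rabbit that stays gone.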
[23:18] <elmo> because apparently I just broke U1 staging
[23:18] <lifeless> \o/
[23:18] <elmo> with my test restart of rabbit
[23:19] <lifeless> its bug filing time :)
[23:19] <elmo> so, if we could avoid that same brain damage in LP, I'd appreciate it
[23:19] <lifeless> yup
[23:19] <lifeless> we filed bugs about this for the new stuff last night, when I read the current code
[23:19] <lifeless> as I say, I haven't checked txlongpoll yet
[23:20] <lifeless> it's probably ok: if it's written naively, it will only connect when a JS session connects, and not share channels/connections, which scales poorly but would make it robust around this