[15:58] <intellectronica> me
[16:00] <Ursinha> haha
[16:00] <jtv> hi Ursinha
[16:00] <intellectronica> hihi
[16:00] <jtv> hi intellectronica
[16:00] <Ursinha> #startmeeting
[16:00] <Ursinha> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] <Ursinha> [TOPIC] Roll Call
[16:00] <Ursinha> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
[16:00] <MootBot> Meeting started at 10:00. The chair is Ursinha.
[16:00] <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] <MootBot> New Topic:  Roll Call
[16:00] <sinzui> me
[16:00] <Ursinha> me
[16:00] <Ursinha> !
[16:00] <herb> me
[16:01] <rockstar> ni!
[16:01] <jtv> me
[16:01] <Ursinha> lol
[16:01] <Ursinha> I demand a SHRUBBERY!
[16:01] <mars> me
[16:01] <jtv> (the rest of the Translations team is out on vacation.  Presumably I'll get a T-shirt)
[16:01] <Ursinha> matsubara, hi
[16:01] <Ursinha> lol
[16:01] <Ursinha> bigjools, hi
[16:01] <bigjools> me, still on TL call
[16:01] <Ursinha> okay, nothing for soyuz today
[16:02] <Ursinha> from oops land
[16:02] <Ursinha> stub, hi
[16:02] <matsubara> me
[16:02] <stub> me
[16:02] <Ursinha> matsubara, welcome back
[16:02] <matsubara> thanks Ursinha
[16:02] <Ursinha> [TOPIC] Agenda
[16:02] <Ursinha>  * Actions from last meeting
[16:02] <Ursinha>  * Oops report & Critical Bugs & Broken scripts
[16:02] <Ursinha>  * Operations report (mthaddon/herb/spm)
[16:02] <Ursinha>  * DBA report (stub)
[16:02] <MootBot> New Topic:  Agenda
[16:02] <Ursinha> [TOPIC] * Actions from last meeting
[16:02] <Ursinha>     * matsubara to chase rockstar about failure on updatebranches script
[16:02] <Ursinha>     * stub to get RC for branch that fixes bug 403283
[16:02] <Ursinha>         * landed in r8319
[16:02] <Ursinha>     * ursinha to file bug for OOPS-1300XMLP5
[16:02] <Ursinha>         * filed https://bugs.edge.launchpad.net/launchpad-foundations/+bug/403606
[16:02] <MootBot> New Topic:  * Actions from last meeting
[16:02] <Ursinha>     * Ursinha to keep one eye on UnicodeDecodeErrors, and will report back next meeting
[16:02] <Ursinha>         * we're having less errors, Salgado will try to fix bug 61171
[16:02] <Ursinha>     * mars to take a look at bug 354593
[16:02] <Ursinha>     * matsubara to chase salgado about people pruning script
[16:03] <jtv> Ursinha: We're having "fewer" errors :-P
[16:03] <stub> The person pruner is running on production. It will take 3 or 4 weeks to complete.
[16:03] <matsubara> Ursinha, I haven't had time to do that this week as I was on vacation
[16:03] <Ursinha> thanks jtv :P
[16:03] <Ursinha> matsubara, okay
[16:04] <matsubara> please, re-add to the list and I'll chase
[16:04] <Ursinha> [action] matsubara to chase rockstar about failure on updatebranches script
[16:04] <mars> stub, wow
[16:04] <MootBot> ACTION received:  matsubara to chase rockstar about failure on updatebranches script
[16:04] <matsubara> Ursinha, the other one about the pruning script too
[16:05] <Ursinha> [action] matsubara to chase salgado about people pruning script
[16:05] <MootBot> ACTION received:  matsubara to chase salgado about people pruning script
[16:05] <stub> Chase what? it is running.
[16:05] <Ursinha> but what stub said?
[16:05] <Ursinha> yes, yes
[16:05] <rockstar> Yes, what stub said.
[16:05] <Ursinha> matsubara, ^
[16:05] <salgado> stub, don't we need to pass --experimental to it?
[16:06] <stub> It is - I filed an rt and chased it through with spm.
[16:06] <stub> And monitoring too
[16:06] <matsubara> ok, so that's sorted then :-)
[16:06] <Ursinha> [action] remove last matsubara item as stub already reported it's ok
[16:06] <MootBot> ACTION received:  remove last matsubara item as stub already reported it's ok
[16:06] <matsubara> :-)
[16:06] <matsubara> thanks Ursinha
[16:06] <Ursinha> matsubara, np :)
[16:06] <Ursinha> mars, did you have the time to look at bug 354593?
[16:08]  * Ursinha wonders if mars is on the TL call as well
[16:08] <mars> nope
[16:08] <rockstar> Why is there a TL call this early?
[16:08] <Ursinha> *whew*
[16:08] <mars> I did not have a chance to address it
[16:08] <Ursinha> oops section is all foundations today
[16:08] <rockstar> I doubt my TL is on that TL call...
[16:08] <bigjools> he's not
[16:09] <Ursinha> well, mars, can you do that then, please? :)
[16:09] <mars> stub, would you be able to take the SSO bug?  Or find someone who has time to address it?
[16:09] <stub> The branding one?
[16:09] <Ursinha> stub, yes sir
[16:10] <mars> yes
[16:10] <Ursinha> stub, bug 354593
[16:10] <stub> I can try. I haven't done UI stuff for a long time though so it will be slow.
[16:11] <Ursinha> mars, ^
[16:11] <mars> Ursinha, ?
[16:12] <mars> stub, I can give you a hand with it
[16:12] <Ursinha> okay then
[16:13] <Ursinha> [action] stub to give a try on bug 354593 with mars help if needed
[16:13] <MootBot> ACTION received:  stub to give a try on bug 354593 with mars help if needed
[16:13] <Ursinha> moving on
[16:13] <Ursinha> [TOPIC] * Oops report & Critical Bugs & Broken scripts
[16:13] <Ursinha> mars, I have two bugs and one oops for foundations
[16:13] <Ursinha> mars, can you triage bug 403606, and take a look at it, if possible?
[16:13] <Ursinha> mars, also, OOPS-1307J16 shows an AssertionError without any description
[16:13] <Ursinha> and finally
[16:13] <MootBot> New Topic:  * Oops report & Critical Bugs & Broken scripts
[16:13] <Ursinha> mars, do you know if someone could have some time to fix bug 310818? we had some weird timeouts today and the oops report was borked
[16:14] <mars> Ursinha, I'll have to ask the team
[16:14] <Ursinha> mars, about what exactly? :)
[16:15] <Ursinha> btw, thanks for helping with the Unicode issues debugging last week, was very useful
[16:16] <mars> Ursinha, to see who on foundations has time to address the issue - I would guess gary or stub, but I don't know how heavily committed they are at the moment
[16:17] <Ursinha> mars, the bug 310818?
[16:17] <stub> I was going to look at it but someone designated me victim for the SSO exception stuff ;)
[16:17] <Ursinha> haha
[16:17] <gary_poster> Heh, It feels like I have a backlog to kingdom come, and I keep getting more. :-)  But if this is urgent, then it's urgent
[16:17] <stub> I can probably do both - what should be done first Ursinha?
[16:17] <Ursinha> stub, the oops one, I'd suggest
[16:18] <Ursinha> as I said earlier we had some weird timeouts and it would be nice to be able to debug them if they happen again
[16:19] <Ursinha> [action] stub to fix bug 310818
[16:19] <mars> Ursinha, sorry, I was reading the Expat error one.  That sounded like something gary would be able to address, since it starts in the heart of Zope, and has to do with exception bubbling through the architecture
[16:19] <MootBot> ACTION received:  stub to fix bug 310818
[16:19] <Ursinha> mars, right. gary_poster, want to fix that? :)
[16:20] <mars> gary_poster, ^ does that make sense?
[16:20] <gary_poster> looking
[16:21] <mars> Ursinha, I'll look at OOPS-1307J16
[16:21] <Ursinha> [action] mars to take a look at OOPS-1307J16
[16:21] <MootBot> ACTION received:  mars to take a look at OOPS-1307J16
[16:21] <Ursinha> thanks mars
[16:21] <Ursinha> mars, I'd open a bug but had no clue about what happened
[16:22] <gary_poster> Ursinha, not precisely clear on the goal but we can talk later.  I'm guessing this is low priority.  I'll take it.
[16:22] <gary_poster> (bug 403606)
[16:23] <Ursinha> thanks gary_poster
[16:23] <Ursinha> gary_poster, the point is that it fills the oops reports with those
[16:23] <Ursinha> it would be great to get rid of them
[16:23] <gary_poster> and they just mean that somebody is sending us bad XMLRPC, afaict, and we don't want to care?
[16:23] <mars> gary_poster, it would be nice if the ExpatError was, say, turned into a 400 HTTP status code
[16:24] <mars> right
[16:24] <Ursinha> I was told that creating a section on the summary or moving them to /dev/null is not a good idea (I agree)
[16:25] <gary_poster> ack, ok.
[16:25] <Ursinha> gary_poster, do you think it's to painful to fix?
[16:26] <Ursinha> *too
[16:26] <Ursinha> [action] Ursinha to learn to type
[16:26] <MootBot> ACTION received:  Ursinha to learn to type
[16:26] <gary_poster> Ursinha: It will involve changing zope publication machinery.  Doing so will mean either hacking our zope tree, which we are really trying not to do; or migrating to a newer version of the publication machinery, which should wait on the Zope-buildbot work that I keep not finishing, and then will be a migration exercise, possibly accompanied with a negotiate-with-upstream exercise.
[16:26] <gary_poster> lol
[16:27] <gary_poster> so...
[16:27] <gary_poster> the only quick and easy fix is the hack
[16:27] <Ursinha> I see.....
[16:27] <gary_poster> that I would be hoping to eliminate RSN
[16:27] <gary_poster> when I move all the zope stuff to eggs
[16:28] <Ursinha> gary_poster, so, a very temporary hack?
[16:28] <gary_poster> so I'm happy to have that be a bug, but I'd like it to be a back-burner bug, myself
[16:28] <gary_poster> right
[16:28] <gary_poster> I certainly understand the pain though :-(
[16:28] <mars> Ursinha, sound good?  We can review the solution outside the meeting
[16:28] <gary_poster> or have a hint of it at least
[16:28] <Ursinha> mars, sure
[16:29] <mars> thanks gary_poster
[16:29] <Ursinha> [action] Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
[16:29] <MootBot> ACTION received:  Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
[16:29] <Ursinha> thanks mars and gary_poster
[16:29] <Ursinha> well
[16:29] <gary_poster> thanks mars and Ursinha  :-)
[16:29] <Ursinha> :)
[16:29] <Ursinha> we're not having more InterfaceErrors (thanks salgado), but we're still having OperationalErrors (OOPS-1306J75, OOPS-1306J96) and DisconnectionErrors (OOPS-1306I440, OOPS-1306J343)
[16:29] <Ursinha> what can we do about it?
[16:30] <Ursinha> I think that's yours too mars
[16:30] <mars> Ursinha, I'll need to look at them
[16:30] <mars> (obviously :)
[16:30] <Ursinha> :)
[16:30] <Ursinha> stub, do you have any clues?
[16:31] <mars> stub, that looks like something for you - Storm barfing while talking over the internal network?
[16:31] <stub> gary_poster: You could probably switch off the oops in errorlog.py - we already skip certain well defined exceptions entirely.
[16:32] <gary_poster> stub: oh, ok.  not familiar with that.  sounds good on the face of it.
[16:32] <mars> stub, gary_poster, well, that's why I suggested wrapping the ExpatError in a 400 status - it's the nice HTTP thing to do, since the client is in the wrong, not us.
[16:32] <stub> Ursinha: All 'due to administrator command' are when the server killed the connection, usually because it was sitting idle too long.
[16:33] <gary_poster> mars, can't do that without getting into the publication machinery.  stub's approach is not as elegant as what you suggest, but much more doable in the short term.
[16:33] <mars> stub, so too much lag between opening the app server request and actually getting to issue commands?
[16:33] <gary_poster> hm, unless there's an indirection lurking around...checking...
[16:33] <mars> gary_poster, ok
[16:34] <stub> mars: For an appserver request, we kill anything that doesn't complete in 2 minutes. For scripts, we kill anything that is idle-in-transaction for 90 minutes (or something like that).
[16:35] <stub> mars: So a massive slowdown or pause anytime after the db transaction starts
[16:35] <mars> stub, ok, would you be able to do an analysis, or help me do one, after the meeting?
[16:35] <mars> to find what is taking so long to execute
[16:36] <stub> I already looked at this one - I have no idea why that query would stop (the OOPS I'm looking at is the one Ursinha pointed me at earlier)
[16:36] <mars> you know what the problem is, but I would have to troll the OOPSes to find the root cause
[16:36] <Ursinha> stub, did I?
[16:36] <stub> A number of requests timed out all at exactly the same time. I doubt we can reproduce it, and there doesn't seem enough information to diagnose it.
[16:37] <Ursinha> stub, that was another oops, I guess
[16:37] <stub> Yup. But the one I'm looking at is select replication_lag() again - this time it took so long it got terminated by the reaper.
[16:37] <Ursinha> stub, oh, I see
[16:38] <Ursinha> so they are related
[16:38] <Ursinha> stub, because of having not enough data in that oops you can't diagnose the Errors?
[16:39] <Ursinha> *Errors
[16:40] <Ursinha> mars, stub, we're running out of time, can we discuss that on -dev after the meeting?
[16:40] <mars> Ursinha, sure
[16:40] <Ursinha> all right
[16:40] <Ursinha> [action] mars and stub to discuss the Disconnection and OperationalErrors after the meeting
[16:40] <MootBot> ACTION received:  mars and stub to discuss the Disconnection and OperationalErrors after the meeting
[16:41] <Ursinha> critical bugs: we have bug 403283, that is in progress as commented on it
[16:41] <Ursinha> moving on!
[16:41] <Ursinha> thanks a lot guys
[16:41] <Ursinha> [TOPIC] * Operations report (mthaddon/herb/spm)
[16:41] <MootBot> New Topic:  * Operations report (mthaddon/herb/spm)
[16:41] <herb> 2009-07-28 - Rolled critical fixes to the app servers and scripts server.
[16:41] <herb> We've had some issues with codebrowse in the last week where the process dies but doesn't leave a core file. It's also not leaving anything interesting in the logs. We haven't filed a bug yet, but expect one soon. Any help in determining the best way to debug the issue would be helpful.
[16:41] <herb> mthaddon has a 2nd librarian instance up and running. We're preparing to load balance between them. mthaddon updated bug #403283 with some questions and it appears stub has responded to them.
[16:41] <herb> There aren't any pending queries or cherry pick requests.
[16:41] <herb> That's it for the LOSAs unless there are questions.
[16:42] <Ursinha> anything for herb?
[16:42] <Ursinha> thanks herb!
[16:42] <Ursinha> moving on
[16:43] <Ursinha> [TOPIC] * DBA report (stub)
[16:43] <MootBot> New Topic:  * DBA report (stub)
[16:43] <herb> thanks Ursinha
[16:43] <stub> I generated a new database baseline from the production database and landed it. We do this occasionally to ensure that the version of the db we are developing on matches what is actually running on production (patches made to the live system not backported to the trunk or stuffups can cause drift).
[16:43] <stub> If you have an approved but unlanded database patch you will need a new database patch number from me.
[16:44] <Ursinha> stub, oot?
[16:44] <Ursinha> :)
[16:44] <stub> oot
[16:44] <Ursinha> sweet
[16:44] <Ursinha> anyone wants to say something?
[16:45] <Ursinha> thanks stub :)
[16:45] <Ursinha> I have something to say to all contacts
[16:45] <Ursinha> we want to close the 2.2.7 milestone, so, if you have pending bugs or blueprints, please retarget or fix-release them
[16:46] <Ursinha> intellectronica, I saw that bugs team has a lot (really) of not assigned bugs targeted to 2.2.7
[16:46] <intellectronica> Ursinha: sure, will make sure we sort them out now
[16:46] <Ursinha> thanks a lot intellectronica
[16:46] <Ursinha> and all guys
[16:46] <Ursinha> you rock
[16:47] <Ursinha> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
[16:47] <Ursinha> #endmeeting
[16:47] <MootBot> Meeting finished at 10:47.
[16:47] <intellectronica> Ursinha: you rock too
[16:47] <herb> thanks everyone
[16:47] <gary_poster> (mars, re bug 403606: the only hook I see is the big honking one that specifies what request class to use.  we could do that probably.  I wish it could go in the standard implementation though, since returning a 400, as you suggest, is good reasonable behavior.  I suspect stub's error squelching solution will be an easier quick hack.)
[16:47] <Ursinha> thanks herb
[16:47] <Ursinha> ahh.. I love my job
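
The ExpatError discussion above (bug 403606) weighs two options: answering malformed XML-RPC with a 400 status, as mars suggests, or squelching the OOPS in errorlog.py, as stub suggests. A minimal Python sketch of the first idea follows. It is not Launchpad or Zope code: `handle_xmlrpc_body` is a hypothetical helper, and it uses the standard library's `xmlrpc.client` as a stand-in for the Zope publication machinery mentioned in the meeting.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def handle_xmlrpc_body(body):
    """Return an (HTTP status, payload) pair for a raw XML-RPC request body.

    Hypothetical helper for illustration only; the name and the return
    shape are assumptions, not part of Launchpad or Zope.
    """
    try:
        # loads() returns (params, methodname) for a well-formed call.
        params, method = xmlrpc.client.loads(body)
    except ExpatError as error:
        # Malformed XML is the client's mistake, so answer 400 instead of
        # letting the exception bubble up and be logged as a server error.
        return 400, "Bad Request: %s" % error
    return 200, (method, params)
```

Called with a well-formed body such as `xmlrpc.client.dumps((1, 2), methodname="add")`, the helper returns a 200 status with the method name and parameters; called with a body whose tags do not match, it returns a 400 status. As gary_poster notes, doing the same inside the real request path would mean touching the publication machinery, which is why stub's error-squelching route was considered the quicker hack.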