[15:00] <Ursinha> OOPS-1307J16
[15:00] <Ursinha> thanks ubottu
[16:00]  * Ursinha looks at matsubara
[16:00] <matsubara> #startmeeting
[16:00] <MootBot> Meeting started at 10:00. The chair is matsubara.
[16:00] <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] <matsubara> [TOPIC] Roll Call
[16:00] <MootBot> New Topic:  Roll Call
[16:00] <sinzui> me
[16:00] <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
[16:00] <gary_poster> me
[16:00] <Ursinha> me
[16:00] <danilos> me
[16:00] <matsubara> stub, cprov, herb, rockstar, intellectronica: hi
[16:00] <cprov> me
[16:01] <rockstar> ni!
[16:01] <mthaddon> me
[16:01] <intellectronica> me
[16:01] <matsubara> hi mthaddon
[16:01] <mthaddon> matsubara: herb won't be attending these meetings any more since he's no longer a LOSA
[16:02] <Ursinha> it's true
[16:02] <matsubara> mthaddon, indeed!
[16:02] <matsubara> let me update the page
[16:02] <Chex> hello!
[16:02] <mthaddon> matsubara: most likely Chex will be his replacement (given he's on the same timezone that herb was on)
[16:02] <Ursinha> hi Chex, welcome!
[16:02] <matsubara> mthaddon, all right thanks
[16:02] <matsubara> hi Chex, welcome
[16:02] <Chex> all: thank you
[16:03] <stub> moo
[16:03] <matsubara> ok, everyone is here
[16:03] <matsubara> [TOPIC] Agenda
[16:03] <MootBot> New Topic:  Agenda
[16:03] <intellectronica> hi Chex, welcome
[16:03] <matsubara>  * Actions from last meeting
[16:03] <matsubara>  * Oops report & Critical Bugs & Broken scripts
[16:03] <matsubara>  * Operations report (mthaddon/herb/spm)
[16:03] <matsubara>  * DBA report (stub)
[16:03] <matsubara> [TOPIC] * Actions from last meeting
[16:03] <Ursinha> matsubara, you may want to s/flacoste/gary_poster in that page
[16:03] <MootBot> New Topic:  * Actions from last meeting
[16:03] <matsubara> Ursinha, already done
[16:03] <Ursinha> matsubara, thanks
[16:03] <Andre_Gondim> me
[16:03] <matsubara>   * ursinha to chase mars about OOPS-1307J16 and file a bug about it
[16:03] <matsubara>   * matsubara to file a bug for OOPS-1315A253
[16:03] <matsubara>     * Filed https://launchpad.net/bugs/413706
[16:03] <matsubara>   * sinzui to file bugs for OOPS-1318S626, OOPS-1321EB223 and OOPS-1318EA4
[16:03] <matsubara>   * gary_poster to chase librarian-gc failure and report back to the list
[16:03] <matsubara>   * matsubara to ask stub to email the dba report to the list
[16:03] <matsubara>     * stub sent the dba report to the list
[16:04] <matsubara> hi Andre_Gondim, welcome
[16:04] <Andre_Gondim> thanks =]
[16:04] <matsubara> hi sinzui, did you file those bugs?
[16:04] <matsubara> Ursinha, no news about that oops? shall I remove the action item?
[16:05] <Ursinha> matsubara, do that, I'll file a bug if that happens again
[16:05] <matsubara> Ursinha, thanks
[16:05]  * sinzui has no screen
[16:05] <matsubara> re: the librarian-gc failure, it was disabled that week, that's why we had a script failure email to the list
[16:06] <gary_poster> stub is working on that as his next task
[16:06] <mthaddon> I think there's a CP pending approval for that
[16:06] <sinzui> matsubara: I did file bugs
[16:06] <matsubara> gary_poster, mthaddon: cool. thanks
[16:06] <stub> The next bit of work on the librarian may be related - depends on what happens with the cherry pick and test run ;)
[16:06] <Ursinha> gary_poster, this is bug 410576, right?
[16:06] <sinzui> OOPS-1315A253 is soyuz
[16:07] <matsubara> sinzui, thanks. if you have them handy, could you priv msg them to me?
[16:07] <sinzui> bug 413174
[16:07] <gary_poster> Ursinha, that's not my understanding.  hm, that's a dupe.
[16:08] <Ursinha> gary_poster, a dupe? is there another?
[16:08] <Ursinha> this one is set as Critical... I'll talk about it in the next section :)
[16:09] <sinzui> matsubara: OOPS-1318EA4 is new. It relates to another bug that I intend to fix in 3.0 I will file and assign it
[16:09] <matsubara> thanks sinzui
[16:09] <gary_poster> Ursinha: either dupe or related: bug 413749
[16:10] <Ursinha> gary_poster, let me see
[16:10] <Ursinha> matsubara, you can move to the next section and we keep discussing there
[16:10] <matsubara> ok, thanks Ursinha and gar0t0
[16:10] <matsubara> err
[16:10] <matsubara> gary_poster,
[16:10] <gary_poster> :-)
[16:10] <matsubara> [TOPIC] * Oops report & Critical Bugs & Broken scripts
[16:10] <MootBot> New Topic:  * Oops report & Critical Bugs & Broken scripts
[16:10] <matsubara> there you go Ursinha
[16:10] <Ursinha> okay
[16:11] <Ursinha> +branches timeout has a fix already committed, and also that horrible 'specications' bug is fix committed as well
[16:11] <Ursinha> so, two issues to ask: foundations and registry
[16:11] <Ursinha> sinzui, I can see a lot of these ExpatErrors, which are bug 403606, did barry say anything about fixing that?
[16:11] <Ursinha> gary_poster, bug 410576 is Critical but I see there's no activity for almost a week now, is that really critical?
[16:11] <Ursinha> (in the meantime, I'll check bug 413749)
[16:12] <gary_poster> Ursinha: I believe it is high: afaik, the criticality is what mthaddon describes in his comments to that issue.  This is what stub is going to next.
[16:12] <sinzui> Ursinha: barry has not provided any insight into the issue yet. I cannot estimate it
[16:12] <stub> It's critical because it is part of the impending librarian collapse.
[16:13] <sinzui> matsubara: bug #41648
[16:13] <mthaddon> gary_poster: it's critical - LP will blow up in 20 days or so if it's not fixed (as the librarian will run out of space)
[16:13] <matsubara> sinzui, hmm that doesn't look like a lp bug
[16:13] <sinzui> matsubara: bug #416483
[16:13] <matsubara> cool. thanks sinzui!
[16:13] <sinzui> ^ points to the related bug too
[16:13] <gary_poster> mthaddon, stub: (procedural, apologies) what does critical mean then?  I thought it meant drop everything, while afaict this is a do it within 10 days?
[16:14] <Ursinha> gary_poster, mthaddon, we have two bugs here, bug 410576 and bug 413749
[16:14] <Ursinha> gary_poster, that's my question as well
[16:14] <mthaddon> gary_poster: I think if we know it's going to blow up all of LP in a short period of time, that's critical
[16:14] <gary_poster> afaik 413749 is the (a?) symptom of 410576.  stub, mthaddon, can you please correct me?
[16:15] <stub> gary_poster: It is my top priority, as we need to know the genuine rate of disk consumption for the librarian so we can accurately predict when new disk has to be purchased and installed by, or soyuz has to decrease their consumption by
[16:15] <gary_poster> stub thank you
[16:15] <mthaddon> gary_poster: it's related, but fixing the librarian-gc will buy us more time, not fix it forever
[16:15] <gary_poster> ok, gotcha
[16:16] <gary_poster> So Ursinha, it is critical, and we should be moving to in progress, at least, within a day or so.
[16:16] <Ursinha> great gary_poster, thanks a lot
[16:17] <matsubara> Ursinha, anything else re: oops and critical bugs?
[16:17] <Ursinha> sinzui, could you poke barry again about that bug? I can do that as well if you want :)
[16:17] <sinzui> I will
[16:17] <Ursinha> thanks a lot sinzui
[16:17] <cprov> stub: we have to adjust the removal of BPRs to be more aggressive.
[16:17] <danilos> cprov: can you (i.e. Soyuz team) provide data flacoste asked for in https://bugs.edge.launchpad.net/launchpad-foundations/+bug/413749 so we've got raw numbers there as well?
[16:18] <mthaddon> cprov: any idea of how much space that would buy us?
[16:18] <cprov> danilos: sure, I can try.
[16:18] <stub> cprov: Bug 413749 has a soyuz task, so you may want to triage it.
[16:18] <matsubara> garbo-hourly failed on the 17th even after spm adjusted the check to 12 hours. stub do you know what's up?
[16:19] <stub> matsubara: I wasn't aware of that.
[16:20] <cprov> mthaddon: can't tell exactly, but I'll run the queries to estimate a few scenarios other than the 1 month quarantine for BPR files
[16:20] <mthaddon> ok
[16:21] <matsubara> there's a "Scripts failed to run: loganberry:garbo-hourly" email sent to the list on the 17th. could you investigate and reply to that email?
[16:21] <matsubara> stub, ^
[16:21] <Ursinha> cprov, can you follow up later on that bug then, please?
[16:21] <cprov> Ursinha: sure
[16:21] <Ursinha> thanks cprov
[16:22] <matsubara> [action] cprov to follow up on bug 413749
[16:22] <MootBot> ACTION received:  cprov to follow up on bug 413749
[16:22] <matsubara> [action] stub to investigate garbo-hourly failure after spm adjusted script checking to 12h
[16:22] <MootBot> ACTION received:  stub to investigate garbo-hourly failure after spm adjusted script checking to 12h
[16:24] <matsubara> [action] sinzui to poke barry about ExpatError OOPSes (bug 403606)
[16:24] <MootBot> ACTION received:  sinzui to poke barry about ExpatError OOPSes (bug 403606)
[16:24] <sinzui> done
[16:24]  * sinzui eagerly awaits an assessment
[16:25] <matsubara> cool
[16:25] <matsubara> I think that's all for this section
[16:25] <matsubara> thanks everyone
[16:25] <Ursinha> thanks a bunch sinzui
[16:25] <Ursinha> and everyone else :)
[16:25] <Ursinha> do ahead matsubara
[16:25] <Ursinha> *go
[16:25] <matsubara> [TOPIC] * Operations report (mthaddon/Chex/spm)
[16:25] <MootBot> New Topic:  * Operations report (mthaddon/Chex/spm)
[16:25] <danilos> mbarnett for the agenda as well? :)
[16:26] <mthaddon> :)
[16:26] <Chex> - Buildbot now hosted from the DC
[16:26] <Chex>  - Multiple Cherry Picks this past week
[16:26] <Chex>  - Will be beginning to implement recommendations from SplitIt Sprint before too long
[16:26] <Chex>  - Codebrowse needed restarting more than usual this week (see IncidentLog)
[16:26] <Chex>  - Incident with edge rollout breaking as one app server refused to stop, and interaction with the session DB being trashed - see Incident Report and most likely discussed earlier in the meeting
[16:26] <Chex>  - LOSA sprint this week to get new LOSAs (Chex, mbarnett) up to speed
[16:26] <matsubara> danilos, good catch. thanks
[16:26] <Chex> and that's it for us, unless there are any questions?
[16:27] <gary_poster> yay buildbot in DC! :-)
[16:28] <danilos> yeah, great stuff, looking forward to everything else that enables :)
[16:28] <danilos> (like the production branch in buildbot *grin*)
[16:28] <matsubara> thanks Chex
[16:28] <matsubara> [TOPIC] * DBA report (stub)
[16:28] <MootBot> New Topic:  * DBA report (stub)
[16:29] <stub> Our disk usage is going steadily up. Nothing alarming yet, but it did prompt me to turn on the long-running-transaction killer. Non-system transactions running over 3 hours will now be killed. This should alleviate database bloat, which adversely affects everything. It will also stop processes that block on long running transactions from blocking too long (like the garbo).
[16:29] <stub> I've bumped up the default statistics target to 250. We have twice over the last several months had a query chewing up huge amounts of disk space in temporary tables, and my best guess as to why is bad query plans. The higher statistics target should make this less likely.
[16:29] <stub> Done.
[16:29]  * Ursinha misses the oot thing
[16:29] <Ursinha> questions for stub?
[16:30] <danilos> stub: ok, so that means that fixing langpack exporter is now critical for us, right?
[16:31] <stub> danilos: I can turn it off if necessary. I'm not sure what effect it has on the langpack export.
[16:31] <stub> Will all of them be affected?
[16:31] <stub> oot
[16:31] <danilos> stub: most of the runs will
[16:31] <Ursinha> hehe
[16:32] <danilos> stub: I've made it critical for us, it should be a simple fix, it'll only require cherrypicking
[16:32] <stub> danilos: ok. I'd like that issue raised to high or critical. I'll turn the check to 8 hours which will cover the current longest transaction I'm seeing in the graphs.
[16:32] <stub> k
[16:32] <danilos> stub: it was high and scheduled for 3.0, now it's scheduled for asap :)
[16:32] <stub> Please add a note to the CP request that the limit needs to be put back.
[16:32] <danilos> stub: sure, thanks
[16:33] <Ursinha> thanks stub
[16:33] <Ursinha> and danilos
[16:33] <matsubara> thanks stub and danilos
[16:34] <stub> danilos: Bug number?
[16:34] <danilos> stub: bug 411697
[16:34] <matsubara> * In-team handling of OOPSes (Danilo)
[16:34] <danilos> ok, a long paste follows
[16:34]  * matsubara hands the mic to danilos 
[16:35] <danilos> Breaking news from the team leads call!  Read all about it!
[16:35] <danilos> Many of the duties Diogo and Ursula had you spoiled with (like trawling OOPS summaries and error logs and matching/filing relevant bugs) are what QA contacts in each team should do (generally, it was considered that this is what they should have been doing anyway).
[16:35] <danilos> According to Gary, Diogo is happy to continue maintaining oops-tools (and relevant infrastructure, which will stay in Foundations turf), but everybody else is invited to contribute and take interest in the tools if they want something added.
[16:35] <danilos> Similarly, if someone finds it hard to go through numerous places to see all the possible problems (i.e. going through several OOPS summaries, error-reports list, etc), they are welcome to improve our infrastructure for aggregating these.
[16:35] <danilos> I am personally hoping that once we pick a release manager for 3.0, (s)he'll take care that all QA contacts are on top of their game. Perhaps we can have Ursula and Diogo continue as is until RM for 3.0 is appointed.
[16:35] <danilos> Any suggestions on what should change in the format of the meeting to make sure this is not a regression compared to what we do today?
[16:36] <gary_poster> (eh, that summary came out in such a way that I feel I should have talked with matsubara first.  sorry, matsubara, and feel free to correct the summary about your personal position)
[16:37] <matsubara> gary_poster, it's correct :-)
[16:37] <danilos> gary_poster: (I was just being careful not to put words in matsubara's mouth, I should have talked to him first, but there just wasn't the time between the teamleads call and this meeting :)
[16:37] <gary_poster> cool :-)
[16:38] <danilos> anyway, how should the meetings be run from now on? matsubara, you want to keep running them?
[16:38] <gary_poster> +1 if you are willing matsubara
[16:38] <matsubara> danilos, yes, I talked to francis about it and Ursinha and I will still run the production meeting
[16:38] <cprov> +1
[16:38] <danilos> does anybody else have any comments? everybody, this means more work for you and less for matsubara, Ursinha :)
[16:39] <Ursinha> +1 from me
[16:39] <stub> How do teams claim an oops? The benefit of a central monitor and this meeting is when teams disagree on who the problem belongs to.
[16:39] <matsubara> but it'd be nice to have help from the QA contacts doing the daily oops analysis and help with triage
[16:39] <danilos> stub: that's for the release manager to worry about IMO, but in general, we should have a bug attached to all the OOPSes
[16:40] <stub> Who creates the bugs?
[16:40] <Ursinha> danilos, that's the idea
[16:40] <Ursinha> stub, it depends
[16:40] <Ursinha> stub, for instance, afaik, translations has been creating its own bugs for some time now
[16:40] <Ursinha> checking the summaries daily
[16:40] <danilos> stub: in general, we might be able to improve tools to split summaries by vhost initially
[16:40] <stub> I'm just wondering how we avoid them being dropped on the floor because, say, translations thinks an oops is a foundations issue and vice versa.
[16:40] <Ursinha> danilos, matsubara has the idea of using page ids
[16:40] <Ursinha> for splitting
[16:41] <Ursinha> *had
[16:41] <danilos> Ursinha: right, that might be a good one as well
[16:41] <stub> splitting the reports into areas of responsibility would address my concern I think.
[16:41] <danilos> Ursinha: actually, it's perfect
[16:41] <cprov> okay, at the risk of sounding like an idiot, who are the current QA contacts? TLs?
[16:41] <Ursinha> cprov, the people that attend this meeting
[16:41] <stub> TLs until they delegate ;)
[16:41] <matsubara> cprov, everyone who attends this meeting weekly
[16:41] <danilos> cprov: it means it's you! :)
[16:42] <matsubara> cprov, actually it's bigjools, but he's away today
[16:42] <cprov> fantastic! thanks.
[16:42] <Ursinha> danilos, :P, bigjools actually
[16:42] <danilos> heh, ok... in general, I think this is best done by a team lead
[16:42] <danilos> (and soon enough, I'll be replacing henninge as the translations QA contact)
[16:42] <Ursinha> danilos, it was the TLs' call when they appointed the QA contacts
[16:43] <danilos> Ursinha: I know
[16:43] <gary_poster> hm.  question.  if we *all* trawl oops, is that a collective time loss?
[16:43] <Ursinha> but that can be changed for this new experiment
[16:43] <Ursinha> gary_poster, if we separate per teams, not that much
[16:43] <Ursinha> I believe
[16:43] <danilos> so, matsubara, can we have an action for me to discuss with Ursinha and you how we can split OOPS reports into per-team summaries?
[16:43] <Ursinha> per "teams"
[16:43] <gary_poster> oh I see
[16:43] <danilos> gary_poster: right, see above
[16:43] <gary_poster> ok thanks
[16:44] <matsubara> [action] danilos, Ursinha and matsubara to discuss oops summaries split per team
[16:44] <MootBot> ACTION received:  danilos, Ursinha and matsubara to discuss oops summaries split per team
[16:44] <danilos> matsubara: thanks
[16:44] <Ursinha> gary_poster, we in fact have a new feature on oops-tools that associates a bug with an exception type (matsubara correct me if I'm wrong here)
[16:44] <Ursinha> this helps a lot
[16:44] <danilos> ubottu: thanks for nothing (just so you don't get used to praise only)
[16:45] <Ursinha> sometimes you freak me out ubottu
[16:45] <Ursinha> anyway
[16:45] <Ursinha> :)
[16:45] <danilos> anyway, that's all settled afaiac
[16:45] <matsubara> gary_poster, Ursinha: now we have a feature on oops-tools that once an oops is linked to a bug, subsequent oopses of that same type are already linked to the bug report
[16:45] <Ursinha> gary_poster, if you click the oops, most of them have a bug associated, on top left
[16:45] <danilos> we'll be reporting back, everything stays as is until we've got better oops reports, but do expect changes soon
[16:45] <matsubara> makes analysis much easier
[16:45] <Ursinha> bug report?
[16:45] <matsubara> next step is to add that info to the summary
[16:46] <gary_poster> heh.  ah I see cool
[16:46] <gary_poster> thanks Ursinha, matsubara
[16:46] <matsubara> all right. thanks danilos for bringing this up
[16:46] <Ursinha> ah, I got that
[16:46] <matsubara> and thanks everyone
[16:46] <Ursinha> thanks everyone
[16:46] <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
[16:46] <matsubara> #endmeeting
[16:46] <MootBot> Meeting finished at 10:46.
[16:47] <matsubara> 1 min late. sorry about that
[16:47] <gary_poster> :-) thanks matsubara
[16:47] <Ursinha> thanks matsubara and everyone!
[16:47] <gary_poster> thanks Ursinha :-)