=== danilo-afk is now known as danilo
=== mrevell is now known as mrevell-lunch
=== mrevell-lunch is now known as mrevell
=== Edwin is now known as Guest73773
=== Guest73773 is now known as EdwinGrubbs
[16:00] me?
[16:00] you
[16:00] us
[16:00] Ursinha: no, not you
[16:00] them
[16:00] ah
[16:00] :(
[16:00] sorry
[16:00] #startmeeting
[16:00] Meeting started at 10:00. The chair is matsubara.
[16:00] Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] roll call,roll call
[16:00] my firefox died
[16:00] me
[16:00] * stub belches
[16:00] hang on a second please
[16:00] me
[16:00] poor matsubara
[16:01] * jml eavesdrops
[16:01] * bigjools wafts stub's belch away
[16:01] ni
[16:01] Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:01] [TOPIC] Roll Call
[16:01] New Topic: Roll Call
[16:01] meeee
[16:01] me
[16:01] Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
[16:01] me
[16:01] me again
[16:01] i
[16:01] flacoste, hi
[16:02] me
[16:02] herb, hi
[16:02] me
[16:02] ok, everyone here.
[16:02] [TOPIC] Agenda
[16:02] New Topic: Agenda
[16:02] * Actions from last meeting
[16:02] * Oops report & Critical Bugs & Broken scripts
[16:02] * Operations report (mthaddon/herb/spm)
[16:02] * DBA report (stub)
[16:02] [TOPIC] * Actions from last meeting
[16:02] New Topic: * Actions from last meeting
[16:02] * matsubara to chase rockstar about failure on updatebranches script
[16:02] * stub to give a try on bug 354593 with mars help if needed
[16:02] * stub to fix bug 310818
[16:02] * mars to take a look at OOPS-1307J16
[16:02] * Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606
[16:02] * mars and stub to discuss the Disconnection and OperationalErrors after the meeting
[16:02] me
[16:02] Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
[16:02] Launchpad bug 310818 in launchpad-foundations "Oops report does not always log timed-out query" [High,In progress] https://launchpad.net/bugs/310818
[16:03] https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[16:03] yay, jml
[16:03] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
[16:03] I suck, I didn't chase rockstar about the updatebranches script failures
[16:03] matsubara, I thought we agreed that mwhudson would be better to chase on it.
[16:03] matsubara, got a URL for the failure?
[16:03] otoh, the script is not failing anymore...
[16:03] matsubara, I know mwhudson was looking at it on his Tuesday.
[16:03] jml would be good to ask as well.
=== cprov is now known as cprov-lunch
[16:04] rockstar, all right. I'll talk to jml and mwhudson later on today
[16:04] [action] * matsubara to chase mwhudson/jml about failure on updatebranches script
[16:04] ACTION received: * matsubara to chase mwhudson/jml about failure on updatebranches script
[16:04] matsubara, jml is here right now. :)
[16:04] jml, I'll get you an url for the scripts after the meeting. I need to trawl my emails to find it
[16:04] matsubara, ok. thanks.
[16:05] stub, how's 354593 fix coming along?
[16:05] why is this High again?
[16:06] I wonder if mars had time to look over OOPS-1307J16
[16:06] https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[16:06] flacoste, do you know ^?
[16:06] hmm, i put it as such
[16:06] any reason it should be?
[16:06] flacoste, according to the bug history you made it high :-)
[16:06] debranding of the SSO is a U1/ISD affair anyway
[16:06] matsubara: Slow. I need to discuss with people how to actually do it - maybe next week on the sprint if I get time.
[16:07] * sinzui agrees with flacoste
[16:07] stub: i think we should try to get stu and James to do it :-)
[16:07] especially, stu, it would be a test good case for transfer knowledge
[16:07] Anything that means I don't have to work out how ZPT macros works is fine by me.
[16:07] +1
[16:08] Ursinha, what's up with "Discuss the solution proposed by gary_poster after the meeting, about ExpatErrors and bug 403606"?
[16:08] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
[16:08] matsubara, the ExpatErrors were being discussed by mars and gary
[16:08] matsubara: that;s now registry. it actualy is a legitimate oops
[16:08] [action] stub to delegate bug 354593 to ISD
[16:08] Launchpad bug 354593 in launchpad-foundations "SSO exceptions views need proper branding" [High,Triaged] https://launchpad.net/bugs/354593
[16:08] ACTION received: stub to delegate bug 354593 to ISD
[16:08] it indicates a problem with mailman integration
[16:09] I will ask barry to look into bug 403606
[16:09] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
[16:09] stub, You recently fixed a DisconnectionError bug. was it related to the errors you discussed with mars? that action item is now done?
[16:09] thanks sinzui and gary_poster
[16:10] :-)
[16:10] matsubara: I landed code to log OOPS reports on DisconnectionError before retrying the request. Is that what you mean?
[16:10] stub, I mean: "* mars and stub to discuss the Disconnection and OperationalErrors after the meeting"
[16:11] stub, is that what caused the TransactionRollbackError oopses?
[16:11] We discussed. I don't recall much about the conversation though :)
[16:11] :-)
[16:11] Ursinha: That fix was, yes. I've got another branch that turns the volume down so we don't log the TransactionCommitError's
[16:12] [action] sinzui to ask barry to fix bug 403606
[16:12] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [Undecided,New] https://launchpad.net/bugs/403606
[16:12] ACTION received: sinzui to ask barry to fix bug 403606
[16:12] stub, good, I filed bug 409907 for that
[16:12] Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
[16:13] Ursinha, is there a bug for OOPS-1307J16?
[16:13] https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[16:13] matsubara, not that I opened one, because we needed to know what was going on over there
[16:13] to open the bug
[16:13] so mars was going to investigate that
[16:15] I don't recall having those anymore
[16:15] [action] ursinha to chase mars about OOPS-1307J16 and file a bug about it
[16:15] https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[16:15] ACTION received: ursinha to chase mars about OOPS-1307J16 and file a bug about it
[16:15] https://lp-oops.canonical.com/oops.py/?oopsid=1307J16
[16:15] I think that's all for last meeting's action items
[16:15] thanks everyone
[16:15] [TOPIC] * Oops report & Critical Bugs & Broken scripts
[16:15] New Topic: * Oops report & Critical Bugs & Broken scripts
[16:16] there are two issues to discuss
[16:16] one was about bug 409907, that I already mentioned to stub and it's being handled
[16:16] Launchpad bug 409907 in launchpad-foundations "TransactionRollbackErrors may prevent us to detect real issues" [Undecided,New] https://launchpad.net/bugs/409907
[16:16] the other is about the select replication_lag() timeouts we're having
[16:17] mthaddon also reported problems that we don't know if are related to that
[16:18] I don't know if there's much to be discussed at this point, because it seems we need to fix oops reports first to be able to see the real problem here
[16:18] is that correct stub:
[16:18] ?
[16:18] should we request a CP for the branch that fixes the oops log?
[16:19] given that we're skipping a release, that's probably a good idea
[16:19] flacoste, I've spoken with jtv yesterday about those,and he also said that was unlikely to be his changes fault (possible but unlikely)
[16:20] I landed code today that should tell us more about if the timeout is actually occuring due to blocking on the database, or elsewhere.
[16:20] s/his/translations/
[16:20] stub, should we request a CP?
[16:20] yeah, i really think a CP is a good idea
[16:20] (please please)
[16:20] [action] stub to request CP for his branch that fixes oops logging
[16:20] ACTION received: stub to request CP for his branch that fixes oops logging
[16:20] cool
[16:21] we have two critical bugs, already fix committed
[16:21] so, good
[16:21] cool
[16:21] about the failing scripts
[16:21] we had some scripts failing this week
[16:21] nightly, productreleasefinder and garbo-hourly
[16:22] and rosetta-poimport too
[16:22] nightly was already addressed by jtv
[16:22] matsubara, productreleasefinder isn't expected to fail anymore? sinzui?
[16:22] as a rosetta script was taking too much time and jtv will remove it from nightly and add a cronjob for it
[16:22] Ursinha: no, but the errors is see are not failures...the script was not run
[16:23] stub, do you know why garbo-hourly is failing?
[16:23] Its failing?
[16:23] matsubara: many scripts are not running because of one log process
[16:23] henninge, rosetta-poimport failed on the 5th. can you investigate and reply to the list?
[16:24] s/log/long/
[16:24] matsubara, it's not being run, it seems
[16:24] matsubara: sure, I will.
[16:24] stub, I got a few emails: "Scripts failed to run: loganberry:garbo-hourly"
[16:24] Ursinha: matsubara there is some traffic about this. spm reported the long running prcess a weeks ago. I has asked why the prf had not run
[16:24] and no replies to the list, so I'm asking here
[16:25] thanks henninge
[16:25] matsubara, actually stub repklied
[16:25] *replied
[16:25] Oh - there were some blocked runs because the rosetta export-to-branch script was running in a 5 hour long transaction
[16:25] So the script blocks because it doesn't want to make anything worse.
[16:25] [action] henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
[16:25] ACTION received: henninge to investigate rosetta-poimport script failure on the Aug 5th and report back to the list
=== salgado is now known as salgado-lunch
[16:27] so I guess it's ok
[16:28] that's all for this section
[16:28] from me
[16:28] thanks everyone
[16:28] !
[16:28] you can move on matsubara
[16:28] all right. thanks everyone
[16:28] [TOPIC] * Operations report (mthaddon/herb/spm)
[16:28] New Topic: * Operations report (mthaddon/herb/spm)
[16:28] 2009-07-31 - Rolled out r8323 to bzrsyncd
[16:28] 2009-08-05 - Cherry picks for code imports, lpnet* and the script server.
[16:29] Our monitoring system has been timing out in connecting to the app servers more often this week. Admittedly its timeout is set lower than the OOPS timeout. But we've also been noticing higher load on the app servers as well. This was discussed by Ursinha during the oops/critical bugs/broken scripts section.
[16:29] There's currently 1 cherry pick and 1 database query awaiting (dis)approval.
[16:29] The LOSAs currently have 14 bugs marked high and triaged. Only 1 of which is assigned to someone and targeted for a release. We would be grateful if we saw some movement on these.
[16:29] We're currently running with a single slave in preparation for the sprint next week.
[16:30] also wanted to check that there should be a cherry pick request for the cowboyed storm change to lpnet9 and lpnet10 (per the production status wiki page)
[16:30] cowboyed storm change?
[16:30] flacoste: https://pastebin.canonical.com/20503/ under eggs/storm-0.14salgado_storm_launchpad_288_308-py2.4-linux-i686.egg
[16:30] mthaddon, herb: i'll look at the LPS to approve/decline
[16:31] right
[16:31] mthaddon: the cherry pick would simply be to update that dependency
[16:31] herb, do you keep that list of 14 bugs somewhere? in a wiki page or have a tag to group them?
[16:31] matsubara: bugs.launchpad.net/~canonical-losas
[16:32] flacoste: well in any case, the CP that was requested (and performed) yesterday overwrote it, so it needs to be formalised so other CPs don't overwrite it again
[16:32] sinzui: can salgado makes an appropriate CP request?
[16:32] Yes
[16:33] it's simply a new upload to download-cache with a versions.cfg change
[16:34] sinzui, flacoste, intellectronica, rockstar: Could you take a look at herb's bug list (bugs.launchpad.net/~canonical-losas) and see what your teams can do about the high ones in the short term?
[16:35] ok
[16:35] clearly we're not looking for all of them to be fixed by the next meeting (though that would be great ;)
[16:35] just mostly would like to know they're staying on the right radars and are being worked on as appropriate.
[16:36] cool
[16:36] anything else for herb?
[16:36] herb: so, basically, these are mostly bugs which will make life easier for you when fixed?
[16:36] bug 348722 should become invalid when we update all pmt teams to become true private teams
[16:36] Launchpad bug 348722 in launchpad-code "Set default branch visibility to "forbidden" if any team set to 'Private'" [High,Triaged] https://launchpad.net/bugs/348722
[16:37] intellectronica: some of them are geniune operational issues, some of them are quality of life issues for the LOSAs
[16:37] There should be no private-membership teams at the start of week 1
[16:37] cool, sure, we'll take a look and see if there's any low hanging fruit
[16:38] barry will be working with the losas on August 11 to fix bug 325962
[16:38] Launchpad bug 325962 in launchpad-registry "lp-mailman startup is blocking on a pid file in the wrong directory" [High,Triaged] https://launchpad.net/bugs/325962
[16:38] sinzui: that was the one that was assgned and targetted at a release.
[16:39] herb, many times
[16:39] assigned even
[16:39] heh
[16:39] all right. I think that's it
[16:39] herb it failed my rules that bug is not high if it is not worked on by all parties in 3 months
[16:39] thanks
[16:39] thanks herb and everyone
[16:39] [TOPIC] * DBA report (stub)
[16:39] New Topic: * DBA report (stub)
[16:40] We set off some alerts when the poimport script and PostgreSQL decided that lots of disk space should be used. We see some smaller spikes, which is just PG using disk to store intermediary results, but this time it was large enough to set of the alarms.
[16:40] We have seen this once before, and in neither case have we been able to repeat it. My best hypothesis is the planner statistics triggering a really bad query plan, so I'll bump the planner statistic sample size on the production dbs in case this stops future occurances.
[16:41] henninge, maybe the last rosetta-poimport failure was related to that ^
[16:43] matsubara: I believe we already know what it was about and it may be related to that.
[16:43] matsubara: I'll talk to the guys.
[16:43] henninge, cool. thanks
[16:43] stub, anything else?
[16:43] Not that I can think of
[16:43] all right. thank you stub
[16:43] I guess that's all for today
[16:44] Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
[16:44] #endmeeting
[16:44] Meeting finished at 10:44.
[16:44] thanks everyone
[16:44] right on time
[16:44] :-)
[16:44] thanks guys
=== matsubara is now known as matsubara-lunch
=== salgado-lunch is now known as salgado
=== cprov-lunch is now known as cprov
=== matsubara-lunch is now known as matsubara
=== maxb_ is now known as maxb
=== salgado is now known as salgado-afk