[12:23] #t/nick mrevell-lunch
[12:23] damn
=== mrevell is now known as mrevell-lunch
=== salgado-afk is now known as salgado
=== mrevell-lunch is now known as mrevell
=== bac_afk is now known as bac
=== salgado is now known as salgado-brb
=== salgado-brb is now known as salgado
[16:00] #startmeeting
[16:00] Meeting started at 10:00. The chair is matsubara.
[16:00] Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] [TOPIC] Roll Call
[16:00] New Topic: Roll Call
[16:00] me
[16:00] me
[16:00] so, who is here today?
[16:00] me
[16:01] flacoste, Ursinha, Rinchen,?
[16:01] me
[16:01] me
[16:01] cprov, ping
[16:01] pong
[16:01] me
[16:01] me
[16:02] we're missing, code
[16:02] rockstar: ?
[16:02] me
[16:02] i'm Foundations and the DBA report
[16:02] Ursinha: who's the qa contact for rosetta?
[16:02] danilo[home]__: ?
[16:02] me
[16:02] danilo[home]__, ping
[16:03] ah hiding there
[16:03] ok, so we're missing code
[16:03] matsubara: right, danilo[home]__ is, surprisingly, my account at home :)
[16:03] let's move on then
[16:04] [TOPIC] Agenda
[16:04] New Topic: Agenda
[16:04] * Next meeting
[16:04] * Actions from last meeting
[16:04] * Oops report & Critical Bugs
[16:04] * Operations report (mthaddon/herb/spm)
[16:04] * DBA report (DBA contact)
[16:04] * Sysadmin requests (Rinchen)
[16:04] [TOPIC] Next meeting
[16:04] New Topic: Next meeting
[16:04] next meeting, same time next week? ok for everyone?
[16:04] yep
[16:04] me
[16:04] matsubara, here
[16:04] stub should be able to make it
[16:05] Rinchen, rockstar: noted. thanks
[16:05] he had a previous engagement tonight
[16:05] flacoste, nice
[16:05] great
[16:05] I forgot that we were doing this earlier.
[16:05] [TOPIC] * Actions from last meeting
[16:05] New Topic: * Actions from last meeting
[16:05] * stub to patch our fti regexp to avoid OOPSes (bug 174368) and discuss a proper fix with jtv
[16:05] Launchpad bug 174368 in launchpad-foundations "Search query triggering error in tsearch" [Undecided,Confirmed] https://launchpad.net/bugs/174368
[16:05] matsubara: that's not started
[16:06] a hanging bug
[16:06] honestly, i don't think it's high priority
[16:06] hmm shall I keep bringing it up during this meeting?
[16:06] it mostly triggers an OOPS when somebody tries spamming one of our search fields
[16:06] well deserved imho
[16:06] :-)
[16:07] :)
[16:07] :-)
[16:07] matsubara, well, guess not so
[16:07] but you might have a different opinion?
[16:07] I'd like to have it targeted to a milestone at least
[16:07] yes
[16:07] so we won't keep pushing oops bugs to the end of the queue
[16:07] well, if it's acceptable it should not be an OOPS should it?
[16:08] it's not acceptable
[16:08] there are some rare, but legitimate queries that are affected by it
[16:08] it's important that we fix OOPS bugs, even if they affect a few users
[16:08] targeting to a milestone doesn't mean much if there is no real dedication to finish it in a certain timeframe
[16:08] yepp
[16:09] danilos, indeed
[16:09] some bugs, like this one, for example, are not known how much they might take to solve (it might be a bunch of different things)
[16:09] i'll keep it on our radar
[16:09] well, that's the point of targeting it, isn't it? you set a deadline to fix it
[16:09] and try to get to it at the end of the cycle
[16:09] all right. thanks flacoste
[16:09] I'll take it off the actions from last meeting
[16:10] [TOPIC] * Oops report & Critical Bugs
[16:10] New Topic: * Oops report & Critical Bugs
[16:10] Today's oops report is about bugs 271561, 273363
[16:10] Launchpad bug 271561 in launchpad-bazaar "OOPS calling __repr__ in xmlrpc method" [Undecided,New] https://launchpad.net/bugs/271561
[16:10] Launchpad bug 273363 in launchpad-foundations "'LaunchpadDatabasePolicy' object has no attribute 'read_only' in xmlrpc server" [Undecided,New] https://launchpad.net/bugs/273363
[16:10] rockstar, any news about #271561?
[16:11] that's been happening at least once a day and I didn't see any progress in the bug report.
[16:11] matsubara, it's being worked on, that's all I know about it.
[16:11] flacoste, do you think bug 273363 might be related to bug 271902?
[16:11] Launchpad bug 271902 in launchpad-foundations "db_policy equals None causing OOPS" [High,In progress] https://launchpad.net/bugs/271902
[16:11] Edwin has a fix for the no attribute 'read_only' I believe
[16:11] matsubara: stub has a fix in review for that one
[16:11] and yes, they are dupped
[16:12] sinzui?
[16:12] sinzui and flacoste you might want to coordinate who will fix it then? :-)
[16:12] well, it's assigned to stuart
[16:12] but I guess that's on stub's turf
[16:12] 273363 is assigned to Edwin
[16:12] lol
[16:12] the other one is assigned to EdwinGrubb
[16:12] well, stuart has a branch fixing both in review
[16:13] It is caused by Edwin's fix to the cookie issue with feeds
[16:13] i'll speak to Edwin
[16:13] rockstar: can you assign it to the devel fixing that issue and change the status to in progress?
[16:13] matsubara: can you dup it?
[16:13] flacoste: sure
[16:13] matsubara, sure.
[16:13] thanks guys.
[16:13] ok
[16:13] Ursinha: stage is yours
[16:14] one critical, bug 273489
[16:14] Launchpad bug 273489 in rosetta "Remaining Intrepid template approvals" [Critical,In progress] https://launchpad.net/bugs/273489
[16:14] danilos, i've sent one email to jtv yesterday, to get more details on the problem
[16:14] right
[16:14] can you help me with that after the meeting?
[16:14] this has basically been 'fix committed'
[16:14] we are now importing all the Intrepid templates
[16:14] right
[16:14] I also have one soyuz critical as well. yesterday cprov identified an oops that affected 60 or so PPAs
[16:15] and this morning there were around 13K files left to import (yesterday afternoon around 18K)
[16:15] so, this should be completely fixed in two days at most
[16:15] cprov: can we expect an IR for that one? I presume no bug was filed and things were already fixed, right?
[16:15] danilos, great, thanks
[16:15] matsubara: and it was fixed at that time as well, with a production update query.
[16:15] it was only edge that was affected
[16:15] matsubara, which bug is it?
[16:15] so, no code change needed?
[16:15] matsubara: no
[16:16] Ursinha: ping me after the meeting for more details, but right after the meeting, I am likely to get out soon
[16:16] Ursinha: no bug reported for that one
[16:16] matsubara: as I said, we just had to rush the data migration.
[16:16] cprov: ok. so, Rinchen asked about doing an IR for that.
[16:17] matsubara, are we going to file one?
[16:17] Did we file a critical bug for that?
[16:17] so, if we didn't file a bug....
[16:17] no, I can file one, but it's kinda pointless, isn't it? since the problem is fixed
[16:18] matsubara, only in the sense that we fixed the problem, not what caused the problem
[16:18] and there'll be no code change
[16:18] it's not a bug, it's an edge rollout issue
[16:18] so let's start please with a write up to the dev list about what happened and how to prevent it and not do an IR
[16:18] is that acceptable to everyone?
[16:18] cprov already did that last night
[16:18] Rinchen: yes, email already sent.
[16:19] ok, thanks. I've been searching for it but haven't found it yet. :-(
[16:19] I'll keep looking.
[16:19] Thanks!
[16:19] and thanks for resolving it quickly last night
[16:19] thanks
[16:20] moving on
[16:20] thanks guys
[16:20] [TOPIC] * Operations report (mthaddon/herb/spm)
[16:20] New Topic: * Operations report (mthaddon/herb/spm)
[16:20] * 2008-09-19 - Updated 2.1.9 to r7035. This update included planned downtime. The service was down for approximately an hour.
[16:20] * During the week we've had a few app servers die and leave core files. flacoste was investigating.
[16:20] * 2008-09-23 - Cherry pick r7058 and r7064 to the scripts server and bzrsyncd server respectively.
[16:20] * 2008-09-24 - Cherry pick r7066 and r7072 to lpnet*, update edge* to r7072.
[16:20] herb: so i investigated the core files
[16:20] flacoste: any update on the dying app servers?
[16:20] cool
[16:21] unfortunately, the stack trace is pretty useless
[16:21] boo
[16:21] seems like accessing corrupted memory
[16:21] things that barry and mwhudson said we could consider is running the appserver
[16:21] using python2.4-dbg
[16:21] which has some more debugging stuff in it
[16:21] but it requires that the packages we are using also have a -dbg build
[16:21] and are three times slower
[16:22] another interesting thing
[16:22] that's less than ideal.
[16:22] is the stack trace posted by jtv
[16:22] that occurred in one of his scripts
[16:22] it seems to point to a zope or storm problem
[16:22] in the case of zope, the landscape team has a fix for it
[16:23] we should get it by moving to zope 3.4 (which we are going to attempt next week)
[16:23] but it wasn't clear from the discussion i overheard if it looked like the same problem they experienced
[16:23] another hypothesis made it a problem with the Storm C extensions
[16:24] it's a good working hypothesis that the script and app server death is related to the same symptom
[16:24] the fact that we have a better stack trace in the script case is probably due to the fact that it runs single-threaded
[16:24] ok. short of running with all -dbg packages is there anything we can do to help isolate the problem?
[16:24] so next step is to follow-up on the jtv, gustavo, barry discussion
[16:25] and see what conclusions were reached
[16:25] and if there is something we can try from there
[16:25] one last thing
[16:25] i gave mthaddon the command to extract a stacktrace from a core file
[16:25] that's really the only thing we need or can do with it
[16:25] https://launchpad.canonical.com/OSA/HowTo/BacktraceFromCoredump
[16:25] so you could just save the stack trace instead of the whole core files
[16:25] mthaddon: awesome!
[16:25] flacoste: ok. cool.
[16:26] EOT unless there are questions
[16:26] that's it from the LOSAs unless there are any questions.
[16:26] cool. thanks for the thorough explanation flacoste
[16:26] and thanks herb, mthaddon!
[16:26] [TOPIC] * DBA report (DBA contact)
[16:26] New Topic: * DBA report (DBA contact)
[16:26] Nothing unusual I'm aware of on the production database.
[16:26] Replication testing scheduled this week using demo.launchpad.net as
[16:26] per discussions with Francis. Turn off the monitoring if it was
[16:27] switched back on.
[16:27] Assuming demo.launchpad.net testing doesn't push us back to the
[16:27] drawing board, I want to have a replication version of the staging
[16:27] rollout scripts ready for next cycle, which should involve some
[16:27] testing this cycle to ensure they actually work.
[16:27] Schedule is to have the production Launchpad database replicated as
[16:27] part of the 2.1.11 release. Staging running replicated for the whole
[16:27] cycle should give enough experience for signoff from everyone. There
[16:27] should be no unusual downtime requirements. I will need to be around
[16:27] for the rollout though.
[16:27] The new DB baseline doesn't affect production or staging.
[16:27] mthaddon, herb: stuart will send you over notes on the changes this means for the staging roll-out process
[16:27] i can take questions
[16:28] flacoste, great, thx
[16:28] flacoste: ok. when should we expect notes?
[16:28] not before the end of next week i think
[16:28] it will take that much time to test on demo
[16:28] and then upgrade the staging scripts
[16:28] flacoste: at a high level how will this change the rollout process?
[16:28] flacoste: can Ursinha and I help with the testing somehow?
[16:28] the restoration of the DB
[16:29] matsubara: i don't think so, at this stage it's not about QA, but more about seeing performance
[16:29] matsubara: i'll talk to stuart and forward the offer, i might be mistaken
[16:29] herb: we'll be having a replicated DB (so a master and a slave) on staging
[16:30] flacoste: okie. tell him to ping/email us if he needs something
[16:30] so this affects how the DB sync is done
[16:30] flacoste: ok
[16:30] flacoste: thanks
[16:30] all right. thanks flacoste
[16:30] [TOPIC] * Sysadmin requests (Rinchen)
[16:30] New Topic: * Sysadmin requests (Rinchen)
[16:30] Hi!
[16:30] Is anyone blocked on an RT or have any that are becoming urgent?
[16:31] I have one RT #31795, which I'd like to suggest priority ~80
[16:31] Does anyone think this section of the meeting is worthwhile?
[16:31] Rinchen, it might if me had an RT...
[16:31] :-)
[16:31] s/me/we
[16:31] Rinchen: I guess we are free to raise RT tickets with you and our team leads whenever we want anyway
[16:31] I think it is worthwhile
[16:31] matsubara, ok, will look into that
[16:31] maybe we should make it possible for people to add an RT-related item to the agenda before the meeting, if they have one, but skip the section if no-one does
[16:31] Rinchen: thank you
[16:31] if nothing else it acts as a memory jog
[16:32] intellectronica: +1
[16:32] intellectronica: I like that
[16:32] danilos, intellectronica - yeah, that was my point. If others find it helpful though, I'm happy to continue it.
[16:32] it's not like it's a hard thing to do :-D
[16:32] I'll let matsubara and Ursinha make the call.
[16:32] Any other tickets?
[16:33] ok, if you do have something, please ping me!
[16:33] thanks matsubara
[16:33] ok, let's experiment with intellectronica's suggestion for a while then. I'll update MeetingAgenda page to reflect that
[16:33] thank you Rinchen
[16:34] anything else before I close?
[16:34] all right
[16:34] Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
[16:35] actually the log part is a lie
[16:35] thx!
[16:35] but thanks!
[16:35] #endmeeting
[16:35] Meeting finished at 10:35.
[16:35] thanks all, matsubara especially for running the meeting :)
[16:36] np
[16:36] thanks, matsubara
[16:36] thanks matsubara
=== matsubara is now known as matsubara-lunch
=== salgado is now known as salgado-lunch
=== matsubara-lunch is now known as matsubara
=== salgado-lunch is now known as salgado
=== thumper_laptop is now known as thumper
=== salgado is now known as salgado-afk