=== mrevell is now known as mrevell-lunch
=== bac_afk is now known as bac
=== salgado-afk is now known as salgado
=== mrevell-lunch is now known as mrevell
[15:57] #startmeeting
[15:57] Meeting started at 09:57. The chair is matsubara.
[15:57] Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[15:58] hmm, it seems that my clock is a bit early
[15:58] matsubara: isn't it in 1h?
[15:58] No, 2 min.
[15:58] 1500 UTC
[15:59] * rockstar preemptively me-s
[15:59] ok, I can wait another 2 min :-)
[16:00] * matsubara uses the time to actually fix ntp on this computer
[16:00] NOW!
[16:00] ok, thanks rockstar
[16:00] Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] :)
[16:00] [TOPIC] Roll Call
[16:00] New Topic: Roll Call
[16:00] so, who's here today?
[16:01] Ursinha, Rinchen ?
[16:01] matsubara, me and mrevell are here but we're on a call
[16:01] me
[16:01] we'll be here if you need us
[16:01] me
[16:01] Rinchen: okie
[16:01] so, bugs and code are here
[16:01] cprov: ?
[16:01] herb: ?
[16:01] me
[16:02] * mpt waves
[16:02] me (kinda)
[16:02] soyuz is here
[16:02] me
[16:02] hi mpt!
[16:02] me
[16:02] foundations is here
[16:02] translations is here
[16:02] sinzu1: hi
=== sinzu1 is now known as sinzui
[16:02] hi
[16:02] ok, registry is here.
[16:02] me
[16:03] flacoste: are you proxying for stub? or is stub here?
[16:03] stub?
[16:03] ?
[16:03] he's here :-)
[16:03] ok, so DBA is here :-)
[16:03] cool
[16:03] we're missing someone from the losas then
[16:03] they can join us later, let's move on
[16:03] [TOPIC] Agenda
[16:03] New Topic: Agenda
[16:04] * Next meeting
[16:04] * Actions from last meeting
[16:04] * Oops report & Critical Bugs
[16:04] * Operations report (mthaddon/herb/spm)
[16:04] * DBA report (DBA contact)
[16:04] [TOPIC] * Next meeting
[16:04] New Topic: * Next meeting
[16:04] so, same time next week?
[16:04] me
[16:04] yes
[16:04] :)
[16:04] hi herb
[16:05] all right then
[16:05] [TOPIC] * Actions from last meeting
[16:05] New Topic: * Actions from last meeting
[16:05] no actions from last meeting
[16:05] [TOPIC] * Oops report & Critical Bugs
[16:05] New Topic: * Oops report & Critical Bugs
[16:05] Ursinha: stage is yours
[16:05] bug 276950
[16:05] Launchpad bug 276950 in soyuz "Timeout accepting more than 6 packages in queue page" [Undecided,New] https://launchpad.net/bugs/276950
[16:05] matsubara filed this one yesterday
[16:05] right?
[16:06] BjornT is looking at that right now
[16:06] cool
[16:06] bug 277129
[16:06] Launchpad bug 277129 in launchpad-foundations "OOPS when attempting to render a timeout page" [Undecided,New] https://launchpad.net/bugs/277129
[16:06] i just filed it, can anybody take a look?
[16:06] flacoste, I just talked with salgado about that
[16:07] he's not aware of that, even having one OOPS authenticated as him
[16:07] and what was his assessment?
[16:07] I don't remember seeing that OOPS
[16:07] he never saw that
[16:08] anyway, stub said that it's a timeout-rendering-page problem
[16:08] as I wrote on the bug
[16:08] it doesn't look like that to me
[16:08] hmm, not sure of that looking at the traceback
[16:08] it's been happening a few times these days and with different users
[16:09] we don't have time to look into it this week, but we can next week
[16:09] flacoste, ok, if you want to give your two cents on what that would be, go ahead
[16:10] flacoste, and ok, thanks
[16:10] [ACTION] foundations to look up bug 277129
[16:10] ACTION received: foundations to look up bug 277129
[16:10] Launchpad bug 277129 in launchpad-foundations "OOPS when attempting to render a timeout page" [High,Triaged] https://launchpad.net/bugs/277129
[16:10] the last bug is translations' critical one, bug 273489
[16:10] Launchpad bug 273489 in rosetta "Remaining Intrepid template approvals" [Critical,Fix committed] https://launchpad.net/bugs/273489
[16:10] danilos, when did the script finish running?
[16:10] Ursinha: that was a few days ago, we are basically handling the manual stuff right now
[16:11] danilos, ok
[16:11] Ursinha: that one might even be 'Fix released' so far, I'll make sure to check with jtv
[16:11] danilos, ok, thanks
[16:11] matsubara, you can move on
[16:11] thanks guys
[16:11] thanks Ursinha
[16:11] thanks everyone.
[16:11] we have another critical bug which is already fix committed
[16:12] so, I won't bring it up again
[16:12] [TOPIC] * Operations report (mthaddon/herb/spm)
[16:12] New Topic: * Operations report (mthaddon/herb/spm)
[16:12] * 2008-09-25: Cherry pick r7080 to the FTP master, updating cron.
[16:12] * 2008-09-26: Cherry pick r7087 to the scripts server.
[16:12] * 2008-09-29: Cherry pick zope r46 to lpnet*
[16:12] * We still have app servers dying periodically. mwhudson provided some additional information to gather when core dumps are found.
[16:12] * Codebrowse was unresponsive and needed to be restarted a couple of times this week.
[16:13] flacoste: have you been able to look through the backtraces? anything of value there?
[16:13] re: app servers dying.
[16:13] nope
[16:13] well, nothing that I saw
[16:13] ok
[16:13] so are we at a dead end?
[16:13] i was very impressed by the kind of output you can get though
[16:14] when you know what you are doing
[16:14] heh
[16:14] and i was waiting to have at least another one to see a pattern
[16:14] ok
[16:14] Well, that's it from the losas unless there are any questions for us.
[16:15] ok. thanks herb
[16:15] thanks
[16:15] [TOPIC] * DBA report (stub)
[16:15] New Topic: * DBA report (stub)
[16:15] Nothing thrilling happening with the production systems - business as usual.
[16:15] DB patch review call with Mark scheduled for tomorrow, although that might not be relevant to this forum now.
[16:15] demo.launchpad.net is running with a replicated database backend just fine. The updated db maintenance scripts will be landing when the db freeze is lifted on launchpad/devel, so Friday or Monday. devs can easily run replicated if they want - we may want to give thought to making this the default.
[16:16] ping?
[16:16] stub: sounds like a good idea to run it replicated locally. is it going to be a big performance hit?
[16:16] yes
[16:16] stub: we hear you
[16:16] stub: we discussed rolling out replicated yesterday
[16:16] we weren't sure it was a good idea
[16:17] one question that came up though: do we have to take LP down to set up the replication?
[16:17] I don't think we would want to take the hit running tests against a replicated environment.
[16:17] The appservers and systems will only need to be down for a few minutes.
[16:18] It will take three hours to build the replica, but that can happen live. The appservers of course shouldn't be making use of the replica, as it won't have data on it at that point.
[16:18] Once the replica is built, you change the appserver configs and bounce them.
[16:19] ok, that makes sense
[16:19] I should add a master-only flag or something to make that easy, as it will probably be standard rollout procedure.
[16:19] so it was deemed safer not to roll out replication to staging next week
[16:20] well, run staging replicated
[16:20] and turn it on week-0
[16:20] if we need the performance boost
[16:20] we can turn it on in production after sufficient staging QA
[16:20] iow, we don't need to wait for the roll-out to turn it on in production
[16:20] I have some concern about how a replicated staging will affect a replicated production, given staging runs on the same server slated to be the lpmain slave and eventually the authdb master.
[16:21] But we can deal with that at the time - I think it is better to run the risk of losing staging for a while than purchase some very expensive hardware just in case.
[16:22] The concern being that the staging rebuild process will be going on for 4 hours of every day (if we continue with daily db builds) - that is 4 hours of a single core being hammered. I don't think we will really know how that impacts things in reality until we try it though.
[16:24] Sorry - 4.5 hours every day ;)
[16:24] stub: so it means chokecherry will serve as the main readonly slave as well?
[16:25] (once it all goes up, of course)
[16:25] there are proposals to change the staging db update to once a week
[16:25] Yes. We only have two servers with suitable hardware - hackberry (the current db server) and chokecherry.
[16:25] ok, thanks for clarifying stub
[16:26] It also impacts our disaster recovery - instead of having a hot spare, we'd need to scale back to a single server if one explodes.
[16:27] So we need to watch our load - at some point we will have too much load to run on a single box, and we need to think about DR before then.
[16:27] DR?
[16:27] disaster recovery
[16:27] Disaster Recovery
[16:28] ok
[16:28] i'll add that to the disaster scenarios page, if that's still up to date
[16:28] anything else stub?
[16:28] nah - I'm just rambling.
[16:29] thanks for bringing up your concerns
[16:29] I have one last thing to ask you all before closing the meeting
[16:30] I've noticed that the number of "New" bugs is climbing, so please keep an eye on the daily triage backlog. Ursinha and I can help each team, but you need to approach us.
[16:30] QA contacts, please pass that on to your TLs
[16:31] so I think that's all
[16:31] Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
[16:31] #endmeeting
[16:31] Meeting finished at 10:31.
[16:31] thanks matsubara
[16:31] Thanks matsubara
=== salgado is now known as salgado-lunch
=== salgado-lunch is now known as salgado
=== rockstar` is now known as rockstar
=== salgado is now known as salgado-afk