[12:23] <mrevell> #t/nick mrevell-lunch
[12:23] <mrevell> damn
[16:00] <matsubara> #startmeeting
[16:00] <MootBot> Meeting started at 10:00. The chair is matsubara.
[16:00] <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] <matsubara> [TOPIC] Roll Call
[16:00] <MootBot> New Topic:  Roll Call
[16:00] <sinzui> me
[16:00] <bigjools> me
[16:00] <matsubara> so, who is here today?
[16:00] <matsubara> me
[16:01] <matsubara> flacoste, Ursinha, Rinchen?
[16:01] <flacoste> me
[16:01] <Ursinha> me
[16:01] <bigjools> cprov, ping
[16:01] <cprov> pong
[16:01] <cprov> me
[16:01] <herb> me
[16:02] <matsubara> we're missing code
[16:02] <matsubara> rockstar: ?
[16:02] <intellectronica> me
[16:02] <flacoste> i'm Foundations and the DBA report
[16:02] <matsubara> Ursinha: who's the qa contact for rosetta?
[16:02] <matsubara> danilo[home]__: ?
[16:02] <danilos> me
[16:02] <Ursinha> danilo[home]__, ping
[16:03] <matsubara> ah, hiding there
[16:03] <matsubara> ok, so we're missing code
[16:03] <danilos> matsubara: right, danilo[home]__ is, surprisingly, my account at home :)
[16:03] <matsubara> let's move on then
[16:04] <matsubara> [TOPIC] Agenda
[16:04] <MootBot> New Topic:  Agenda
[16:04] <matsubara>  * Next meeting
[16:04] <matsubara>  * Actions from last meeting
[16:04] <matsubara>  * Oops report & Critical Bugs
[16:04] <matsubara>  * Operations report (mthaddon/herb/spm)
[16:04] <matsubara>  * DBA report (DBA contact)
[16:04] <matsubara>  * Sysadmin requests (Rinchen)
[16:04] <matsubara> [TOPIC] Next meeting
[16:04] <MootBot> New Topic:  Next meeting
[16:04] <matsubara> next meeting, same time next week? ok for everyone?
[16:04] <flacoste> yep
[16:04] <Rinchen> me
[16:04] <rockstar> matsubara, here
[16:04] <flacoste> stub should be able to make it
[16:05] <matsubara> Rinchen, rockstar: noted. thanks
[16:05] <flacoste> he had a previous engagement tonight
[16:05] <Ursinha> flacoste, nice
[16:05] <matsubara> great
[16:05] <rockstar> I forgot that we were doing this earlier.
[16:05] <matsubara> [TOPIC] * Actions from last meeting
[16:05] <MootBot> New Topic:  * Actions from last meeting
[16:05] <matsubara>  * stub to patch our fti regexp to avoid OOPSes (bug 174368) and discuss a proper fix with jtv
[16:05] <flacoste> matsubara: that's not started
[16:06] <Ursinha> a hanging bug
[16:06] <flacoste> honestly, i don't think it's high priority
[16:06] <matsubara> hmm, shall I keep bringing it up during this meeting?
[16:06] <flacoste> it mostly triggers an OOPS when somebody tries spamming one of our search fields
[16:06] <flacoste> well deserved imho
[16:06] <flacoste> :-)
[16:07] <Ursinha> :)
[16:07] <matsubara> :-)
[16:07] <Ursinha> matsubara, well, guess not so
[16:07] <flacoste> but you might have a different opinion?
[16:07] <matsubara> I'd like to have it targeted to a milestone at least
[16:07] <Ursinha> yes
[16:07] <matsubara> so we won't keep pushing oops bugs to the end of the queue
[16:07] <bigjools> well, if it's acceptable it should not be an OOPS should it?
[16:08] <flacoste> it's not acceptable
[16:08] <flacoste> there are some rare, but legitimate queries that are affected by it
[16:08] <matsubara> it's important that we fix OOPS bugs, even if they affect a few users
[16:08] <danilos> targeting to a milestone doesn't mean much if there is no real dedication to finish it in a certain timeframe
[16:08] <flacoste> yepp
[16:09] <Ursinha> danilos, indeed
[16:09] <danilos> for some bugs, like this one, for example, it's not known how long they might take to solve (it might be a bunch of different things)
[16:09] <flacoste> i'll keep it on our radar
[16:09] <matsubara> well, that's the point of targeting it, isn't it? you set a deadline to fix it
[16:09] <flacoste> and try to get to it at the end of the cycle
[16:09] <matsubara> all right. thanks flacoste
[16:09] <matsubara> I'll take it off the actions from last meeting
[16:10] <matsubara> [TOPIC] * Oops report & Critical Bugs
[16:10] <MootBot> New Topic:  * Oops report & Critical Bugs
[16:10] <matsubara> Today's oops report is about bugs 271561, 273363
[16:10] <matsubara> rockstar, any news about #271561?
[16:11] <matsubara> that's been happening at least once a day and I didn't see any progress in the bug report.
[16:11] <rockstar> matsubara, it's being worked on, that's all I know about it.
[16:11] <matsubara> flacoste, do you think bug 273363 might be related to bug 271902?
[16:11] <sinzui> Edwin has a fix for the no attribute 'read_only' I believe
[16:11] <flacoste> matsubara: stub has a fix in review for that one
[16:11] <flacoste> and yes, they are dupped
[16:12] <flacoste> sinzui?
[16:12] <matsubara> sinzui and flacoste you might want to coordinate who will fix it then? :-)
[16:12] <flacoste> well, it's assigned to stuart
[16:12] <matsubara> but I guess that's on stub's turf
[16:12] <sinzui> 273363 is assigned to Edwin
[16:12] <flacoste> lol
[16:12] <matsubara> the other one is assigned to EdwinGrubb
[16:12] <flacoste> well, stuart has a branch fixing both in review
[16:13] <sinzui> It is caused by Edwin's fix to the cookie issue with feeds
[16:13] <flacoste> i'll speak to Edwin
[16:13] <matsubara> rockstar: can you assign it to the devel fixing that issue and change the status to in progress?
[16:13] <flacoste> matsubara: can you dup it?
[16:13] <matsubara> flacoste: sure
[16:13] <rockstar> matsubara, sure.
[16:13] <matsubara> thanks guys.
[16:13] <Ursinha> ok
[16:13] <matsubara> Ursinha: stage is yours
[16:14] <Ursinha> one critical, bug 273489
[16:14] <Ursinha> danilos, i've sent one email to jtv yesterday, to get more details on the problem
[16:14] <danilos> right
[16:14] <Ursinha> can you help me with that after the meeting?
[16:14] <danilos> this has basically been 'fix committed'
[16:14] <danilos> we are now importing all the Intrepid templates
[16:14] <Ursinha> right
[16:14] <matsubara> I also have one soyuz critical. yesterday cprov identified an oops that affected 60 or so PPAs
[16:15] <danilos> and this morning there were around 13K files left to import (yesterday afternoon around 18K)
[16:15] <danilos> so, this should be completely fixed in two days at most
[16:15] <matsubara> cprov: can we expect an IR for that one? I presume no bug was filed and things were already fixed, right?
[16:15] <Ursinha> danilos, great, thanks
[16:15] <cprov> matsubara: and it was fixed at that time as well, with a production update query.
[16:15] <bigjools> it was only edge that was affected
[16:15] <Ursinha> matsubara, which bug is it?
[16:15] <matsubara> so, no code change needed?
[16:15] <cprov> matsubara: no
[16:16] <danilos> Ursinha: ping me after the meeting for more details, but right after the meeting, I am likely to get out soon
[16:16] <matsubara> Ursinha: no bug reported for that one
[16:16] <cprov> matsubara: as I said, we just had to rush the data migration.
[16:16] <matsubara> cprov: ok. so, Rinchen asked about doing an IR for that.
[16:17] <Ursinha> matsubara, are we going to file one?
[16:17] <Rinchen> Did we file a critical bug for that?
[16:17] <Rinchen> so, if we didn't file a bug....
[16:17] <matsubara> no, I can file one, but it's kinda pointless, isn't it? since the problem is fixed
[16:18] <Rinchen> matsubara, only in the sense that we fixed the problem, not what caused the problem
[16:18] <matsubara> and there'll be no code change
[16:18] <bigjools> it's not a bug, it's an edge rollout issue
[16:18] <Rinchen> so let's start please with a write up to the dev list about what happened and how to prevent it and not do an IR
[16:18] <Rinchen> is that acceptable to everyone?
[16:18] <bigjools> cprov already did that last night
[16:18] <cprov> Rinchen: yes, email already sent.
[16:19] <Rinchen> ok, thanks. I've been searching for it but haven't found it yet. :-(
[16:19] <Rinchen> I'll keep looking.
[16:19] <Rinchen> Thanks!
[16:19] <Rinchen> and thanks for resolving it quickly last night
[16:19] <matsubara> thanks
[16:20] <matsubara> moving on
[16:20] <Ursinha> thanks guys
[16:20] <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
[16:20] <MootBot> New Topic:  * Operations report (mthaddon/herb/spm)
[16:20] <herb> * 2008-09-19 - Updated 2.1.9 to r7035. This update included planned downtime. The service was down for approximately an hour.
[16:20] <herb> * During the week we've had a few app servers die and leave core files. flacoste was investigating.
[16:20] <herb> * 2008-09-23 - Cherry pick r7058 and r7064 to the scripts server and bzrsyncd server respectively.
[16:20] <herb> * 2008-09-24 - Cherry pick r7066 and r7072 to lpnet*, update edge* to r7072.
[16:20] <flacoste> herb: so i investigated the core files
[16:20] <herb> flacoste: any update on the dying app servers?
[16:20] <herb> cool
[16:21] <flacoste> unfortunately, the stack trace is pretty useless
[16:21] <herb> boo
[16:21] <flacoste> seems like accessing corrupted memory
[16:21] <flacoste> one thing that barry and mwhudson said we could consider is running the appserver
[16:21] <flacoste> using python2.4-dbg
[16:21] <flacoste> which has some more debugging stuff in it
[16:21] <flacoste> but it requires that the packages we are using also have a -dbg build
[16:21] <flacoste> and are three times slower
[16:22] <flacoste> another interesting thing
[16:22] <herb> that's less than ideal.
[16:22] <flacoste> is the stack trace posted by jtv
[16:22] <flacoste> that occurred in one of his scripts
[16:22] <flacoste> it seems to point to a zope or storm problem
[16:22] <flacoste> in the case of zope, the landscape team has a fix for it
[16:23] <flacoste> we should get it by moving to zope 3.4 (which we are going to attempt next week)
[16:23] <flacoste> but it wasn't clear from the discussion i overheard if it looked like the same problem they experienced
[16:23] <flacoste> another hypothesis made it a problem with the Storm C extensions
[16:24] <flacoste> it's a good working hypothesis that the script and app server death is related to the same symptom
[16:24] <flacoste> the fact that we have a better stack trace in the script case is probably due to the fact that it runs single-threaded
[16:24] <herb> ok. short of running with all -dbg packages is there anything we can do to help isolate the problem?
[16:24] <flacoste> so next step is to follow-up on the jtv, gustavo, barry discussion
[16:25] <flacoste> and see what conclusions were reached there
[16:25] <flacoste> and if there is something we can try from there
[16:25] <flacoste> one last thing
[16:25] <flacoste> i gave mthaddon the command to extract a stacktrace from a core file
[16:25] <flacoste> that's really the only thing we need or can do with it
[16:25] <mthaddon> https://launchpad.canonical.com/OSA/HowTo/BacktraceFromCoredump
[16:25] <flacoste> so you could just save the stack trace instead of the whole core files
[16:25] <flacoste> mthaddon: awesome!
[16:25] <herb> flacoste: ok. cool.
[16:26] <flacoste> EOT unless there are questions
[16:26] <herb> that's it from the LOSAs unless there are any questions.
[16:26] <matsubara> cool. thanks for the thorough explanation flacoste
[16:26] <matsubara> and thanks herb, mthaddon!
[16:26] <matsubara> [TOPIC] * DBA report (DBA contact)
[16:26] <MootBot> New Topic:  * DBA report (DBA contact)
[16:26] <flacoste> Nothing unusual I'm aware of on the production database.
[16:26] <flacoste> Replication testing scheduled this week using demo.launchpad.net as
[16:26] <flacoste> per discussions with Francis. Turn off the monitoring if it was
[16:27] <flacoste> switched back on.
[16:27] <flacoste> Assuming demo.launchpad.net testing doesn't push us back to the
[16:27] <flacoste> drawing board, I want to have a replication version of the staging
[16:27] <flacoste> rollout scripts ready for next cycle, which should involve some
[16:27] <flacoste> testing this cycle to ensure they actually work.
[16:27] <flacoste> Schedule is to have the production Launchpad database replicated as
[16:27] <flacoste> part of the 2.1.11 release. Staging running replicated for the whole
[16:27] <flacoste> cycle should give enough experience for signoff from everyone. There
[16:27] <flacoste> should be no unusual downtime requirements. I will need to be around
[16:27] <flacoste> for the rollout though.
[16:27] <flacoste> The new DB baseline doesn't affect production or staging.
[16:27] <flacoste> mthaddon, herb: stuart will send you over notes on the changes this means for the staging roll-out process
[16:27] <flacoste> i can take questions
[16:28] <mthaddon> flacoste, great, thx
[16:28] <herb> flacoste: ok.  when should we expect notes?
[16:28] <flacoste> not before the end of next week i think
[16:28] <flacoste> it will take that much time to test on demo
[16:28] <flacoste> and then upgrade the staging scripts
[16:28] <herb> flacoste: at a high level how will this change the rollout process?
[16:28] <matsubara> flacoste: can Ursinha and I help with the testing somehow?
[16:28] <flacoste> the restoration of the DB
[16:29] <flacoste> matsubara: i don't think so, at this stage it's not about QA, but more about seeing performance
[16:29] <flacoste> matsubara: i'll talk to stuart and forward the offer, i might be mistaken
[16:29] <flacoste> herb: we'll be having a replicated DB (so a master and a slave) on staging
[16:30] <matsubara> flacoste: okie. tell him to ping/email us if he needs something
[16:30] <flacoste> so this affects how the DB sync is done
[16:30] <herb> flacoste: ok
[16:30] <herb> flacoste: thanks
[16:30] <matsubara> all right. thanks flacoste
[16:30] <matsubara> [TOPIC] * Sysadmin requests (Rinchen)
[16:30] <MootBot> New Topic:  * Sysadmin requests (Rinchen)
[16:30] <Rinchen> Hi!
[16:30] <Rinchen> Is anyone blocked on an RT or have any that are becoming urgent?
[16:31] <matsubara> I have one RT #31795, which I'd like to suggest priority ~80
[16:31] <Rinchen> Does anyone think this section of the meeting is worthwhile?
[16:31] <rockstar> Rinchen, it might if me had an RT...
[16:31] <Rinchen> :-)
[16:31] <rockstar> s/me/we
[16:31] <danilos> Rinchen: I guess we are free to raise RT tickets with you and our team leads whenever we want anyway
[16:31] <bigjools> I think it is worthwhile
[16:31] <Rinchen> matsubara, ok, will look into that
[16:31] <intellectronica> maybe we should make it possible for people to add an RT-related item to the agenda before the meeting, if they have one, but skip the section if no-one does
[16:31] <matsubara> Rinchen: thank you
[16:31] <bigjools> if nothing else it acts as a memory jog
[16:32] <danilos> intellectronica: +1
[16:32] <matsubara> intellectronica: I like that
[16:32] <Rinchen> danilos, intellectronica - yeah, that was my point.  If others find it helpful though, I'm happy to continue it.
[16:32] <Rinchen> it's not like it's a hard thing to do :-D
[16:32] <Rinchen> I'll let matsubara and Ursinha make the call.
[16:32] <Rinchen> Any other tickets?
[16:33] <Rinchen> ok, if you do have something, please ping me!
[16:33] <Rinchen> thanks matsubara
[16:33] <matsubara> ok, let's experiment with intellectronica's suggestion for a while then. I'll update the MeetingAgenda page to reflect that
[16:33] <matsubara> thank you Rinchen
[16:34] <matsubara> anything else before I close?
[16:34] <matsubara> all right
[16:34] <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
[16:35] <matsubara> actually the log part is a lie
[16:35] <flacoste> thx!
[16:35] <matsubara> but thanks!
[16:35] <matsubara> #endmeeting
[16:35] <MootBot> Meeting finished at 10:35.
[16:35] <danilos> thanks all, matsubara especially for running the meeting :)
[16:36] <matsubara> np
[16:36] <intellectronica> thanks, matsubara
[16:36] <Ursinha> thanks matsubara