[00:10] OOPS-1362EA10197
[00:10] https://lp-oops.canonical.com/oops.py/?oopsid=1362EA10197
=== danilos-afk is now known as danilos
[16:00] #startmeeting
[16:00] Meeting started at 10:00. The chair is matsubara.
[16:00] Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] [TOPIC] Roll Call
[16:00] New Topic: Roll Call
[16:00] me
[16:00] Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
[16:00] ni!
[16:00] me
[16:00] me
[16:00] me
[16:00] me, sitting in for sinzui
[16:01] Ursinha, Chex: hi
[16:01] me
[16:01] * Chex is here
[16:01] danilos, are you joining us?
[16:01] matsubara, flaky internet
[16:01] allenap, hi
[16:01] me
[16:01] matsubara: yes
[16:01] me
[16:02] danilos, shall I make you the default translations person for the LP prod meeting?
[16:02] me?
[16:02] you!
[16:02] hi mbarnett
[16:02] welcome :-)
[16:02] :)
[16:02] ok, everyone is here
[16:02] [TOPIC] Agenda
[16:02] New Topic: Agenda
[16:02] * Actions from last meeting
[16:02] * Oops report & Critical Bugs & Broken scripts
[16:02] * Operations report (mthaddon/Chex/spm/mbarnett)
[16:02] * DBA report (stub)
[16:02] * Proposed items
[16:02] [TOPIC] * Actions from last meeting
[16:02] New Topic: * Actions from last meeting
[16:03] * barry to continue debug on bug 403606 after finishing 3.0 UI stuff
[16:03] * matsubara to trawl logs related to high load on edge yesterday and ping Chex about it
[16:03] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [High,Triaged] https://launchpad.net/bugs/403606
[16:03] matsubara: yes
[16:03] I still suck, haven't done that one.
[16:03] [action] * matsubara to trawl logs related to high load on edge yesterday and ping Chex about it
[16:03] ACTION received: * matsubara to trawl logs related to high load on edge yesterday and ping Chex about it
[16:04] EdwinGrubbs, I'll keep the action item for barry in the list since 3.0 craziness is not over yet :-)
[16:04] [action] * barry to continue debug on bug 403606 after finishing 3.0 UI stuff
[16:04] ACTION received: * barry to continue debug on bug 403606 after finishing 3.0 UI stuff
[16:04] Launchpad bug 403606 in launchpad-registry "ExpatError errors should be handled to not generate the OOPSes" [High,Triaged] https://launchpad.net/bugs/403606
[16:04] [action] matsubara to update translations qa contact on meeting agenda page
[16:04] ACTION received: matsubara to update translations qa contact on meeting agenda page
[16:05] [TOPIC] * Oops report & Critical Bugs & Broken scripts
[16:05] New Topic: * Oops report & Critical Bugs & Broken scripts
[16:05] Ursinha, go ahead
[16:06] hmm Ursinha was having connectivity problems before the meeting
[16:07] ok, so, let me take over for what I can
[16:08] bac is doing a great job with managing release of 3.0 and any critical bugs
[16:08] what we've seen is already listed on https://dev.launchpad.net/CurrentRolloutBlockers
[16:08] thanks. getting lots of good help.
[16:08] we have noticed increased timeouts on production DBs, due to postgres restart
[16:09] for more details, Ursinha will have to fill us in
[16:09] we had a bunch of disconnection errors as well
[16:09] matsubara: around the time of rollout or? (because that'd be expected if so)
[16:09] danilos, not sure, Ursinha just sms'ed me and asked me to bring it up
[16:10] but I take it the increase in those is due to the rollout, yes
[16:11] so, Ursinha and I will keep looking for the new oopses that happened since the rollout
[16:11] matsubara: I suggest we get more detailed input from Ursula when she shows up back online, but go on with the meeting for now; for what we know, all critical issues are being taken care of, and we are hoping for a re-rollout tonight
[16:11] and let each affected team know about them and add them to the CRB page
[16:11] matsubara: thanks matsubara, that'd be great
[16:11] we had some script failures as well related to the rollout
[16:11] and another one unrelated which bigjools already replied to
[16:12] (which is the packagediffs thing)
[16:12] we have 6 critical bugs
[16:12] fixed in the rollout
[16:13] 2 fix committed, 3 in progress and one triaged
[16:13] danilos, the one triaged is in rosetta. what's up? are you expecting to land code for that one for the second rollout?
[16:13] bac, do you have a time for the second rollout?
[16:14] matsubara: it's in progress
[16:14] thanks bigjools
[16:14] danilos, I mean bug 435891
[16:14] Launchpad bug 435891 in rosetta "recent update broke the urls used in launchpad integration" [Critical,Triaged] https://launchpad.net/bugs/435891
[16:14] matsubara: it's in progress
[16:14] matsubara: we anticipate a second roll out. perhaps later today, though no decision has been made yet.
[16:14] danilos, it's not what LP says :-)
[16:15] matsubara: it's what I say :)
[16:15] bug 435891 is in progress :)
[16:15] danilos, I trust you to beat LP and make it behave :-)
[16:15] thanks danilos
[16:15] the other in-progress one is almost landed (in pqm)
[16:15] matsubara: Disconnection errors
[16:15] about them
[16:16] do you know if they happened before or after the roll-out?
[16:16] if they happened before/during the roll-out they are expected
[16:16] we had a bug, now fixed, that logged DisconnectionError as regular OOPS
[16:16] are you in the production meeting?
[16:16] where it should be logged as a soft oops
[16:16] bac: yes, this is the production meeting
[16:16] sorry, wrong window
[16:16] so DisconnectionErrors logged after the roll-out are a problem
[16:16] flacoste, after the meeting I'll generate a new summary to find that out.
[16:17] ones before or during it (when running with unfixed code) are not a problem
[16:17] [action] matsubara to generate new oops summary including oopses only for after the rollout
[16:17] ACTION received: matsubara to generate new oops summary including oopses only for after the rollout
[16:17] matsubara, bac: we can determine the second roll-out time after we have that report
[16:18] matsubara: i was just made aware of bug 435628 that the bugs team is working on for a re-roll
[16:18] I'll defer the answer to bac
[16:18] Launchpad bug 435628 in malone "Attempting to file a bug on an Ubuntu source packages OOPSes when bug filing is disabled" [High,Triaged] https://launchpad.net/bugs/435628
[16:18] barry was going to look at annotating those oopses so we can tell if they were DisconnectionErrors returned to users, or just part of the normal reconnection workflow.
[16:19] matsubara: i agree with flacoste that we can make a decision after getting your report
[16:19] all right
[16:19] erm... gary... not barry
[16:19] :-)
[16:19] heh
[16:19] ok, I think that's all for this section
[16:19] let's move on
[16:20] thanks everyone!
[16:20] [TOPIC] * Operations report (mthaddon/Chex/spm/mbarnett)
[16:20] New Topic: * Operations report (mthaddon/Chex/spm/mbarnett)
[16:20] HI everyone, brief LOSA report this week.
[16:20] - LP 3.0 rollout: The rollout dominated Launchpad work this week. Briefly, things went fairly well, we
[16:20] observed a number of items in the rollout process that are being addressed. We have emailed out the
[16:20] full report today.
[16:21] And that's it for us, unless anyone has questions?
[16:22] three cheers for losas. :-)
[16:22] thanks Chex
[16:22] gary_poster: I think you mean three *beers* for each losa...
[16:22] lol, yeah, probably more appreciated
[16:22] :)
[16:22] * stub raises a glass and says 'cheers'
[16:23] * Chex drinks to that
[16:23] so, drunken DB report comes next? :)
[16:23] let's go to the bar, err, move on
[16:23] [TOPIC] * DBA report (stub)
[16:23] New Topic: * DBA report (stub)
=== stub is now known as drunken_master
[16:24] Database update seems to have gone smoothly apart from Bug #435674 being discovered. This will have caused a short hang of the SSO servers, but it was probably quick and hopefully nobody noticed.
[16:24] :)
[16:24] After the update, I needed to do some data migration. That increased replication lag, putting extra load on the master db. All done now and load is back to normal. This will have contributed to timeouts, so you may want to attribute timeouts before 1200 UTC to this.
[16:24] The appservers were observed in the wild basing their decision to use the slave database on how lagged that particular slave is, not how lagged the cluster is as a whole. This means that building new slave databases will no longer stop the appservers going into master-only mode.
[16:24] :)
[16:24] Launchpad bug 435674 in launchpad-foundations "fti.py wants to lock all replication sets" [High,Triaged] https://launchpad.net/bugs/435674
[16:24] oot.
[16:24] lol
=== drunken_master is now known as stub
[16:24] heh
[16:25] thanks for the update drunken_master, always appreciated
[16:25] thanks stub
[16:26] [TOPIC] * Proposed items
[16:26] New Topic: * Proposed items
[16:26] no proposed items
[16:26] anything else before I close?
[16:26] hi Ursinha
[16:26] just in time :-)
[16:26] I just want to say sorry
[16:27] and congrats to all lp team because LP 3.0 rocks
[16:27] :)
[16:27] indeed!
[16:27] Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
[16:27] #endmeeting
[16:27] Meeting finished at 10:27.
[16:27] thank you!
=== cprov is now known as cprov-lunch
=== cprov-lunch is now known as cprov
=== matsubara is now known as matsubara-lunch
=== salgado is now known as salgado-lunch
=== ursula_ is now known as Ursinha
=== danilos is now known as danilo-afk
=== matsubara-lunch is now known as matsubara
=== EdwinGrubbs is now known as Edwin-lunch
=== ursula_ is now known as Ursinha
=== salgado-lunch is now known as salgado
=== Edwin-lunch is now known as EdwinGrubbs
=== cprov is now known as cprov-afk
=== salgado is now known as salgado-afk
=== matsubara is now known as matsubara-afk
=== flacoste is now known as flacoste_afk
=== Ursinha is now known as Ursinha-afk