/srv/irclogs.ubuntu.com/2009/04/02/#launchpad-meeting.txt

[16:00] <matsubara> #startmeeting
[16:00] <MootBot> Meeting started at 10:00. The chair is matsubara.
[16:00] <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] <matsubara> [TOPIC] Roll Call
[16:00] <MootBot> New Topic:  Roll Call
[16:00] <rockstar> me
[16:01] <herb> me
[16:01] <cprov> me
[16:01] <sinzui> me
[16:01] <matsubara> Ursinha:
[16:01] <Ursinha> me
[16:01] <stub> me (on the right server this time)
[16:02] <danilos> me (if no call)
[16:02] <flacoste> me
[16:02] <matsubara> intellectronica: hi
[16:03] <intellectronica> me
[16:03] <matsubara> all right, everyone here
[16:03] <matsubara> [TOPIC] Agenda
[16:03] <MootBot> New Topic:  Agenda
[16:03] <matsubara>  * Actions from last meeting
[16:03] <matsubara>  * Oops report & Critical Bugs
[16:03] <matsubara>  * Operations report (mthaddon/herb/spm)
[16:03] <matsubara>  * DBA report (stub)
[16:03] <matsubara> [TOPIC] * Actions from last meeting
[16:03] <MootBot> New Topic:  * Actions from last meeting
[16:03] <matsubara>   * intellectronica to make efforts to take a look at bug 329908
[16:03] <matsubara>   * sinzui to talk to kiko about pending cp requests
[16:03] <ubottu> Launchpad bug 329908 in malone "DownloadFailed OOPS when reporting a bug with apport (dup-of: 349646)" [Undecided,New] https://launchpad.net/bugs/329908
[16:03] <ubottu> Launchpad bug 349646 in malone "apport uploads not being found in +filebug" [Undecided,Fix released] https://launchpad.net/bugs/349646
[16:04] <intellectronica> matsubara: that's fixed
[16:04] <matsubara> well, sinzui's one is not needed anymore since that's been released
[16:04] <matsubara> thanks intellectronica
[16:04] <sinzui> matsubara: I removed the requests because it was close to the rollout and the items were not critical
[16:04] <matsubara> sinzui: sure. thanks for checking
[16:04] <matsubara> moving on
[16:04] <matsubara> [TOPIC] * Oops report & Critical Bugs
[16:04] <MootBot> New Topic:  * Oops report & Critical Bugs
[16:05] * sinzui has a question about what is critical for unmaintained apps
[16:05] <matsubara> Ursinha: ?
[16:05] <Ursinha> me
[16:05] <Ursinha> 4 bugs to talk about
[16:06] <Ursinha> matsubara wants to talk about bug 353530
[16:06] <Ursinha> • bigjools, bug 347194, fixed as RC but still appears on lpnet
[16:06] <Ursinha> • sinzui: bug 353863
[16:06] <Ursinha> • bigjools, bug 353568, timeout at +source/package page
[16:06] <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
[16:06] <matsubara> sinzui: good question. You mean blueprint stuff?
[16:06] <ubottu> Launchpad bug 347194 in soyuz "IntegrityError: duplicate key value violates unique constraint "binarypackagerelease_binarypackagename_key"" [High,Fix committed] https://launchpad.net/bugs/347194
[16:06] <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
[16:06] <ubottu> Launchpad bug 353568 in soyuz "ubuntu/source/package/+index timing out" [High,Triaged] https://launchpad.net/bugs/353568
[16:06] <Ursinha> should we raise bug 353568 to critical?
[16:06] <matsubara> sinzui: I think we need to raise that question on the list
[16:07] <matsubara> cprov: what's up with the ones bigjools fixed?
[16:07] <flacoste> me again
[16:07] <matsubara> hi francis
[16:07] <flacoste> another X lock-up
[16:07] <flacoste> what did i miss?
[16:07] <matsubara> we're doing the oops section
[16:08] <Ursinha> flacoste, the bugs we'll discuss
[16:08] <sinzui> Ursinha: That looks like a critical bug to me
[16:08] <cprov> matsubara: I don't know, AFAICT it's not fixed.
[16:08] <matsubara> so far nothing for foundations
[16:08] <sinzui> Ursinha: I will give it to salgado who is already looking into login/account issues
[16:08] <Ursinha> sinzui, I couldn't reproduce that, don't know if matsubara tried that
[16:08] <matsubara> those oopses are likely to be candidates for RC and next re-roll
[16:08] <Ursinha> for sure
[16:08] <matsubara> Ursinha: I did not
[16:09] <Ursinha> thanks sinzui
[16:09] <flacoste> what login/account issues are we having?
[16:09] <sinzui> Ursinha: salgado saw many oopses he could not reproduce, but I think he can at least explain why
[16:09] <cprov> matsubara: I will look at it this afternoon, maybe I can do something quick to stop the timeout in production
[16:09] <Ursinha> flacoste, bug 353863
[16:09] <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
[16:09] <salgado> I'll need help with this one
[16:09] <matsubara> re: bug 353530, intellectronica could you take a look? it's about the OOPS in filing a bug using the email interface but I'm not sure that specific oops is under Bugs' responsibility
[16:09] <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
[16:10] <matsubara> cprov: cool. thanks
[16:10] <intellectronica> matsubara: according to steve's comment that's another case of missing permissions
[16:10] <intellectronica> but i'm not clear whether it was dealt with. i'll check
[16:10] <matsubara> I'm going to add those to the CurrentRolloutBlockers page and use that page to coordinate things that will go in for the re-roll
[16:10] <Ursinha> matsubara, afaik that was just fixed by adding the user to the conf file in the server
[16:10] <matsubara> intellectronica: seems to be dealt with, but my question is more about how we can avoid that in the future
[16:11] <Ursinha> as per spm explanations
[16:11] <Ursinha> to me
[16:11] <matsubara> so, apparently it was an unusual rollout requirement but nobody added it there
[16:12] <matsubara> Ursinha: don't say server, we have at least 10 "servers" out there :-)
[16:12] <Ursinha> matsubara, sorry :) s/server/server in which the conf was missing/
[16:12] <matsubara> anyway, glancing at it, could be that the slaves were missing the right config?
[16:13] <intellectronica> so it seems
[16:13] <rockstar> matsubara, might that be a question for the db report section?
[16:13] <flacoste> Ursinha, matsubara: we should add a test for missing permissions
[16:13] <flacoste> matsubara: did you file a bug about the one you wanted me to discuss with stub?
[16:14] <matsubara> flacoste: nope, but I have the pastebin here. I'll file a bug about it right after the meeting
[16:14] <matsubara> [action] matsubara to file a bug about the missing select permissions that delayed the rollout
[16:14] <MootBot> ACTION received:  matsubara to file a bug about the missing select permissions that delayed the rollout
[16:14] <flacoste> thanks
[16:15] <matsubara> [action] cprov to look up soyuz bugs 347194, 353568
[16:15] <ubottu> Launchpad bug 347194 in soyuz "IntegrityError: duplicate key value violates unique constraint "binarypackagerelease_binarypackagename_key"" [High,Fix committed] https://launchpad.net/bugs/347194
[16:15] <MootBot> ACTION received:  cprov to look up soyuz bugs 347194, 353568
[16:15] <ubottu> Launchpad bug 353568 in soyuz "ubuntu/source/package/+index timing out" [High,Triaged] https://launchpad.net/bugs/353568
[16:15] <cprov> matsubara: the first one is fixed
[16:15] <matsubara> err, sorry about that, I'll edit that entry
[16:16] <matsubara> [action] matsubara to edit #347194 out of the last action :-)
[16:16] <MootBot> ACTION received:  matsubara to edit #347194 out of the last action :-)
[16:16] <cprov> matsubara: some errors happened yesterday because I had to reprocess a bunch of binary uploads that failed after the rollout (due to the absence of the launchpad_auth DB user)
[16:17] <Ursinha> cprov, now it makes sense
[16:17] <matsubara> ah, so that also affected things other than the email interface.
[16:17] <Ursinha> thanks :)
[16:18] <cprov> Ursinha: yes, it was a nightmare, because the buildfarm was full and binaries could not be processed due to the lack of DB access
[16:18] <matsubara> [action] matsubara to include francis suggestion to bug 353530 and ursinha to summarize what spm told her
[16:18] <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
[16:18] <MootBot> ACTION received:  matsubara to include francis suggestion to bug 353530 and ursinha to summarize what spm told her
[16:18] <Ursinha> indeed
[16:19] <matsubara> salgado: how can we help you with that one?
[16:19] <salgado> matsubara, I'll let you know once I know. :)
[16:20] <matsubara> [action] salgado to debug and fix bug 353863
[16:20] <ubottu> Launchpad bug 353863 in launchpad-registry "TypeError when finishing creating user account in lpnet" [Undecided,New] https://launchpad.net/bugs/353863
[16:20] <MootBot> ACTION received:  salgado to debug and fix bug 353863
[16:20] <matsubara> I think I addressed everything
[16:21] <danilos> Ursinha: has there been any outcome of the timeout discussion?
[16:21] <matsubara> so, as usual after the release we are going to monitor the oops reports constantly and coordinate with the teams about any new oopses
[16:21] <Ursinha> danilos, I'm going to talk about it with stub in his section
[16:21] <danilos> Ursinha: ok, thanks
[16:21] <danilos> sorry for not following the script, I forgot my lines :)
[16:21] <Ursinha> danilos, :)
[16:22] <matsubara> [action] sinzui to email the list how we should address critical bugs on unmaintained apps (e.g. blueprint)
[16:22] <MootBot> ACTION received:  sinzui to email the list how we should address critical bugs on unmaintained apps (e.g. blueprint)
[16:22] <matsubara> sinzui: ^ is that correct?
[16:22] <sinzui> matsubara: yes
[16:22] <matsubara> ok, I think that's all for this section. All the critical ones are being handled
[16:23] <matsubara> thanks everyone
[16:23] <matsubara> [TOPIC] * Operations report (mthaddon/herb/spm)
[16:23] <MootBot> New Topic:  * Operations report (mthaddon/herb/spm)
[16:23] <herb> 2009-03-30 - Experienced some DB problems that affected the service. Launchpad was unavailable for approximately 9 minutes. stub sent out an email summarizing the issues.
[16:23] <herb> 2009-03-30 - Cherry picked r8054 and part of r7999.
[16:23] <herb> 2009-04-01 - Rollout of 2.2.3. Total downtime was approximately 100 minutes. I think there were a few hiccups on some DB permissions, but I haven't had an opportunity to catch up with mthaddon and spm on the details.
[16:23] <herb> Bug 156453 and bug 118625 continue to be a source of discomfort. I think rockstar has an update on these though.
[16:23] <ubottu> Launchpad bug 156453 in loggerhead "production loggerhead branch leaks memory" [Critical,In progress] https://launchpad.net/bugs/156453
[16:23] <herb> Bug 80895 and bug 119420 are a pain point for the LOSAs. I think something may have been scheduled for this cycle on this front. If so that's a total win from our point of view.
[16:23] <ubottu> Launchpad bug 118625 in launchpad-bazaar "codebrowse sometimes hangs" [High,Triaged] https://launchpad.net/bugs/118625
[16:23] <herb> When do we think we'll be doing a re-roll?
[16:23] <ubottu> Launchpad bug 80895 in malone "Give people five minutes to edit/delete their comment" [Undecided,Confirmed] https://launchpad.net/bugs/80895
[16:23] <ubottu> Launchpad bug 119420 in launchpad-answers "Cannot edit a comment" [Medium,Triaged] https://launchpad.net/bugs/119420
[16:24] <rockstar> herb, I can has update!
[16:24] <rockstar> :)
[16:24] <herb> woo!
[16:24] <rockstar> So we have a memory middleware currently that's allowing us to track down memory issues.
[16:24] <rockstar> herb, also, mwhudson and jam have been writing a C-based memory profiler as well, so we can track refs even better in bzrlib itself.
[16:25] <herb> excellent
[16:25] <matsubara> herb: I'll let you know about the re-roll once we know. :-)
[16:25] <herb> matsubara: appreciated.
[16:26] <rockstar> herb, unfortunately, I can't really tell if the "sometimes hangs" bug is related to the "leaks memory" bug.
[16:26] <matsubara> herb: re: the DB permission, I'm going to file a bug about it and flacoste and stub will discuss it :-)
[16:26] <herb> rockstar: I suspect so, but fixing the memory issue would be a huge win.
[16:26] <stub> it's not a bug, it was an operational issue
[16:26] <Ursinha> indeed
[16:27] <rockstar> herb, yes.  If they are unrelated, it's probably a bug in one of our dependencies.
[16:27] <stub> erm... if you are talking about the same one i'm thinking of.
[16:27] <matsubara> stub: I'm talking about the permission for the SSO user
[16:27] <stub> ok. different ;)
[16:27] <matsubara> :-)
[16:28] <matsubara> ok, anything else for herb?
[16:28] <matsubara> thanks herb.
[16:28] <herb> thanks matsubara
[16:29] <matsubara> and thank mthaddon and spm for handling the rollout so well too!
[16:29] <matsubara> moving on.
[16:29] <herb> matsubara: will do
[16:29] <matsubara> [TOPIC] * DBA report (stub)
[16:29] <MootBot> New Topic:  * DBA report (stub)
[16:29] <stub> Today's database update ran in about 100 mins with all replicas enabled. Earlier calculations indicated the downtime would be a bit under three hours. The discrepancy is because staging isn't as powerful and normal staging operations are underway during the restore.
[16:29] <stub> This was good from a downtime perspective, but does mean we can no longer get reliable rollout timings from staging. When rollout times are a concern, we might have to test the database upgrade process on a production server and calculate the time from there.
[16:29] <stub> I want to switch our master database to the new 16 core box from the current 8 core box in the next two weeks. This will require a few minutes downtime - I think a scheduled 10 minute outage will suffice. We might want to double up if there is other downtime required in the near future.
[16:29] <stub> A few days ago, generating a table bloat report managed to mess up PostgreSQL, causing all queries to the master to generate nothing but errors. A forced restart was required, causing a few minutes of downtime total. The cause has been tracked down and is being worked on upstream, and we can avoid it now we know what it is (don't feed temporary tables to pgstattuple).
[16:29] <stub> I've opened a couple of bugs about batch jobs that are taking too long. I generally don't care how long things take as long as their impact is light, but staging updates and post rollout processes are approaching 24 hours...
[16:30] <stub> A number of problems were caused by missing PostgreSQL authorization to the new launchpad_auth user on production. This authorization was added to staging, but missed getting into the production rollout tasks. spm sorted it a few hours after the rollout as I understand it. This is a purely operational issue outside the scope of our test suite (staging is the test bed for database connection authorizations). Ignore OOPSes and bugs like 3535
[16:30] <stub> All from me.
[16:30] <stub> Bug 353530
[16:30] <ubottu> Launchpad bug 353530 in malone "OOPS filing a bug using the email interface " [Undecided,New] https://launchpad.net/bugs/353530
[16:31] <Ursinha> stub, I have one oops, I don't know if it was just a hiccup
[16:31] <Ursinha> stub, https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1188D1214
[16:31] <ubottu> https://devpad.canonical.com/~jamesh/oops.cgi/1188D1214
[16:31] <matsubara> [action] matsubara to talk to mrevell to announce a maintenance in the DB for about 10 min outage in the next 2 weeks. ask mrevell to talk to stub about it
[16:31] <MootBot> ACTION received:  matsubara to talk to mrevell to announce a maintenance in the DB for about 10 min outage in the next 2 weeks. ask mrevell to talk to stub about it
[16:31] <stub> Ursinha: That's a bug needing fixing.
[16:32] <Ursinha> stub, I'll file a bug about it now
[16:32] <Ursinha> about the timeouts we mentioned during the week
[16:33] <Ursinha> it seems they indeed dropped
[16:33] <Ursinha> the main culprit now is the source package index page
[16:33] <Ursinha> danilos, ^
[16:33] <stub> Ok. So we need to be even less aggressive doing mass data migration.
[16:34] <Ursinha> if the timeouts continue over the next few days, we'll have to chase another cause.
[16:34] <danilos> stub, Ursinha: we'll have something similar coming up, how can we make sure the impact is not felt on our production machines?
[16:35] <stub> danilos: Either set the acceptable lag setting lower, or add a cooldown time after each batch.
[16:36] <herb> stub: or both?
[16:36] <danilos> stub: ok, I guess we'll have to experiment with these
[16:36] <stub> or both
[16:36] <matsubara> ok. I guess that's all for stub?
[16:36] <matsubara> thanks stub
[16:37] <Ursinha> thanks stub
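Editor's sketch, not part of the log: stub's two throttling options for mass data migration (a lower acceptable replication lag, a cooldown after each batch, or both) could be combined roughly as below. Every name here (migrate_in_batches, apply_batch, get_lag_seconds, and all the default values) is hypothetical and chosen for illustration; none of it is Launchpad's actual migration code.

```python
import time
from typing import Callable, Sequence


def migrate_in_batches(
    rows: Sequence[int],
    apply_batch: Callable[[Sequence[int]], None],
    get_lag_seconds: Callable[[], float],
    batch_size: int = 100,
    max_lag_seconds: float = 30.0,
    cooldown_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply rows in batches, throttled by replica lag and a fixed cooldown.

    Hypothetical helper: get_lag_seconds stands in for whatever reports
    replication lag, apply_batch for the real per-batch migration work.
    Returns the number of batches applied.
    """
    batches_done = 0
    for start in range(0, len(rows), batch_size):
        # Option 1: hold off while the replicas lag beyond the threshold.
        while get_lag_seconds() > max_lag_seconds:
            sleep(cooldown_seconds)
        apply_batch(rows[start:start + batch_size])
        batches_done += 1
        # Option 2: always pause after a batch to give production room.
        sleep(cooldown_seconds)
    return batches_done
```

Lowering max_lag_seconds makes the job yield sooner when replicas fall behind; raising cooldown_seconds slows it down unconditionally. The sleep parameter is injectable only so the pacing can be exercised without real waiting.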
[16:37] <matsubara> I have a minor announcement that I forgot to add to the agenda
[16:37] <matsubara> Next week is our second performance week
[16:37] <matsubara> so, please add the bugs you're going to work on in https://dev.launchpad.net/PerformanceWeeks/April2009
[16:38] <matsubara> and I think that's all
[16:38] <matsubara> anything else before I close?
[16:38] <matsubara> 3
[16:38] <matsubara> 2
[16:38] <matsubara> 1
[16:38] <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See the channel topic for the location of the logs.
[16:38] <Ursinha> stub, bug 353897
[16:38] <ubottu> Launchpad bug 353897 in launchpad-foundations "DisallowedStore OOPS in lpnet/+login" [Undecided,New] https://launchpad.net/bugs/353897
[16:39] <matsubara> #endmeeting
[16:39] <MootBot> Meeting finished at 10:39.
[16:39] <flacoste> stub: do you know why that bug is happening?
[16:39] <flacoste> Ursinha: i guess we should fix this before the re-roll?
[16:40] <stub> flacoste: a login.launchpad.net page is trying to access the MAIN_STORE, MASTER_FLAVOR which is disallowed (because it needs to keep running when lp is down for maintenance)
[16:40] <Ursinha> flacoste, 5 occurrences we have registered
[16:40] <flacoste> on loging!
[16:40] <flacoste> ok, this needs to be fixed
[16:40] <Ursinha> on loging
[16:41] <flacoste> create_unique_token_for_table
[16:41] <stub> He is spelling login with a Canadian accent
[16:41] <Ursinha> lol
[16:41] <flacoste> lol
[16:41] <flacoste> French Canadian accent!
[16:43] <stub> flacoste: Or more precisely, a login.launchpad.net page is attempting to create a LoginToken (which it can't) instead of an AuthToken (which it can)
[16:43] <flacoste> ok
[16:43] <flacoste> stub: i'll try to give it a shot this afternoon
[16:45] <stub> flacoste: It is a twisty maze
[16:46] <flacoste> stub: but i might punt it to you if i cannot complete it :-)
[16:46] <stub> flacoste: Salgado loves the authentication system.
[16:46] <flacoste> i think he has his share of problems already
=== salgado is now known as salgado-lunch
=== salgado-lunch is now known as salgado
=== matsubara is now known as matsubara-lunch
=== matsubara-lunch is now known as matsubara
=== salgado is now known as salgado-afk
