/srv/irclogs.ubuntu.com/2009/10/15/#launchpad-meeting.txt

=== bigjools is now known as bigjools-lunch
=== mrevell-lunch is now known as mrevell
=== bigjools-lunch is now known as bigjools
=== danilo_ is now known as danilos
matsubara#startmeeting16:00
MootBotMeeting started at 10:00. The chair is matsubara.16:00
MootBotCommands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]16:00
matsubaraWelcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.16:00
matsubara[TOPIC] Roll Call16:00
MootBotNew Topic:  Roll Call16:00
rockstarni!16:00
danilosme16:00
adeuringme16:01
adeuring(allenap is sick)16:01
danilos(or "coo", if anyone knows about kin dza dza ;)16:01
matsubaragary_poster, Chex, bigjools: hi16:01
mbarnetthello16:01
matsubarasinzui, hi16:01
gary_posterme16:01
gary_posterand hi16:01
matsubara:-)16:01
sinzuime16:01
mbarnettme16:01
matsubaraapologies from Stuart and Ursula16:01
bigjoolsme16:01
Chexhello16:02
matsubara[TOPIC] Agenda16:02
MootBotNew Topic:  Agenda16:02
matsubara * Actions from last meeting16:02
matsubara * Oops report & Critical Bugs & Broken scripts16:02
matsubara * Operations report (mthaddon/Chex/spm/mbarnett)16:02
matsubara * DBA report (stub)16:02
matsubara * Proposed items16:02
matsubara[TOPIC] * Actions from last meeting16:02
MootBotNew Topic:  * Actions from last meeting16:02
matsubara* matsubara to trawl logs related to high load on edge on 2009-09-09 ~1830UTC and ping Chex about it16:02
matsubara* matsubara to email the devel list about the new ErrorReportingUtility method16:02
matsubara    * done16:02
matsubara* matsubara to file a bug to have the HWSubmissionMissingFields oopses as informational only (note to self: see bug 438671 for more details)16:02
matsubara    * filed https://bugs.edge.launchpad.net/malone/+bug/44666016:02
ubottuLaunchpad bug 438671 in checkbox "HWSubmissionMissingFields OOPS on +hwdb/+submit" [Undecided,Confirmed] https://launchpad.net/bugs/43867116:02
matsubara* matsubara to look in lp-production-configs for the new oops prefixes.16:02
ubottuLaunchpad bug 446660 in malone "HWSubmissionMissingFields exceptions should be updated to be informational only" [High,Triaged]16:02
matsubara* all QA contacts to inform their teams about the new QA column and what they should do about it.16:02
matsubara* Chex to email the list about the new QA column in https://wiki.canonical.com/InformationInfrastructure/OSA/LPIncidentLog16:02
matsubaraI still haven't checked the high load logs. Chex or mthaddon, did you notice high loads after the 2009-09-09?16:04
Chexmatsubara: we still have been seeing some high loads, yes16:04
matsubaraI did look at the new prefixes in lp-production-configs. I need to update oops-tools to recognize those16:05
matsubaraI also noticed that some oops prefixes will conflict with existing ones, so I need to sort that out with...16:06
matsubaralosas I guess16:06
matsubara[action] matsubara to file a bug on oops-tools to recognize new oops prefixes and sort out conflicting prefixes with losas16:07
MootBotACTION received:  matsubara to file a bug on oops-tools to recognize new oops prefixes and sort out conflicting prefixes with losas16:07
matsubaraChex, re: the high load, could you take on the task of analysing the logs? my idea was to correlate information from the app servers logs with the apache logs and see if that could shed some light.16:09
matsubaramthaddon emailed the list about the new QA column, so everyone, read it and spread the word to your teams, please.16:09
ChexChex: yes sure, I can look at that.16:10
danilosmatsubara: it has just been discussed in the TL call as well, flacoste will champion the process16:10
Chexmatsubara: ^^  I mean..16:10
danilosmatsubara: (about QA Info column on LP incident log)16:10
matsubaraChex, cool, thanks a lot. ping me if you need any info on that16:10
matsubaradanilos, cool. thanks!16:11
matsubara[action] Chex to check app server logs and apache logs to see if they can shed any light on the high load issue.16:11
MootBotACTION received:  Chex to check app server logs and apache logs to see if they can shed any light on the high load issue.16:11
matsubara[TOPIC] * Oops report & Critical Bugs & Broken scripts16:12
MootBotNew Topic:  * Oops report & Critical Bugs & Broken scripts16:12
matsubarawe're seeing a bunch of DisconnectionErrors which are not informational only16:12
matsubarawhich means, the Retry mechanism is not enough for those cases.16:13
gary_postermatsubara: are these the ones on the xmlrpc server?16:13
gary_posteror something else?16:13
matsubaragary_poster, yes, most of them on xmlrpc server16:13
matsubarabut there are a few, like OOPS-1383I246, in login.launchpad.net16:13
ubottuhttps://lp-oops.canonical.com/oops.py/?oopsid=1383I24616:13
gary_postermatsubara: right.  I investigated and could not duplicate the ones in the xmlrpc server.  Kicking the xmlrpc server made them go away.  There's a bug number which I can get in a moment.  After discussing with flacoste, I think the best we can hope for is to figure out a way to add more diagnostic information should the problem happen again16:14
matsubaragary_poster, ok, I take it this is a foundations task then. let me know the bug number please (or I'll file a new one for the more diagnostic info needed issue, if that's not what the bug you mentioned is about)16:17
gary_posterbug 450593 .   Stuart has a follow up: check with losas if there were any unusual activity ATM16:17
ubottuLaunchpad bug 450593 in launchpad-foundations "Lots of DisconnectionErrors on xmlrpc server - staging" [Undecided,New] https://launchpad.net/bugs/45059316:17
gary_posterI think a comment saying that we should address by adding diagnostic information in case there is a repeat would be sufficient.  I'll do that.16:17
matsubarathanks gary_poster16:18
matsubaraapart from that we have a bunch of oopses that will need fixing given the new zero oops policy.16:19
matsubaraUrsula will keep an eye on those for now and let the team leads know which ones are happening more frequently16:19
matsubarawe had some script failures last week16:21
matsubarathe main one seems to be the branch-puller which was already discussed in the list16:21
matsubaracheckwatches failed on the 13th, but since no other email came out, I assume it was a blip. adeuring, can you confirm?16:22
adeuringmatsubara: erm, I have no idea...16:22
matsubaraand the product-release finder and update-cache failed to run on the 14th16:22
adeuringmatsubara: I'll ask Graham16:23
matsubarasinzui, do you know what's up with the product release finder script?16:23
matsubarawho's owns the update-cache script?16:23
matsubaras/'s//16:23
matsubarathanks adeuring16:23
gary_posterI don't know; looking16:24
matsubara[action] adeuring to check with gmb about checkwatches failure16:24
MootBotACTION received:  adeuring to check with gmb about checkwatches failure16:24
sinzuimatsubara: No, but I think the issue is not that it failed, but that a long process prevented it from running16:24
matsubarasinzui, right, that'd explain. could you check that's the root cause and reply to the list?16:24
sinzuimatsubara: okay16:25
matsubaramaybe the update-cache failure happened for the same reason16:25
gary_postermatsubara: I don't see an update-cache script in the LP tree.  (I do see variants like update-download-cache)16:25
matsubarajust a reminder to everyone, if a script fails and your team owns that script, please reply to the failure email saying that someone is taking a look at it.16:27
matsubaragary_poster, all I see is: "The script 'update-cache' didn't run on 'loganberry' between 2009-10-14 04:00:08 and 2009-10-14 22:00:08 (last seen 2009-10-13 11:36:51.345188)" not sure which script that one is monitoring.16:27
matsubarafor the critical bugs section, we have 4 bugs, 3 fix committed and 1 in progress16:29
matsubaradanilos, the one in progress is assigned to henning but he's on vacation16:30
matsubarais it really critical?16:30
danilosmatsubara: I'd have to check, sorry for not being on top of this16:31
gary_poster(I also looked for update-cache in lp-production-configs.  not there either.)16:32
matsubaragary_poster, I think it's cronscripts/update-pkgcache.py. IIRC, the losas script monitoring tool uses the script name defined in LaunchpadCronScript16:34
matsubara[action] danilos to check bug 438039, assess if it's really critical. if it is, land a fix, if it's not, update the importance16:35
MootBotACTION received:  danilos to check bug 438039, assess if it's really critical. if it is, land a fix, if it's not, update the importance16:35
ubottuLaunchpad bug 438039 in rosetta "bzr branch import script oopses sometimes" [Critical,In progress] https://launchpad.net/bugs/43803916:35
gary_postermatsubara: oh ok, thanks.  that script is either the one salgado was talking about that he owns, or something for soyuz, seems to me.16:36
bigjoolsit's traditionally maintained by soyuz16:36
gary_posterbigjools: ok, thanks16:36
bigjoolsbut in the new world order it could be registry16:36
matsubarabigjools, can you confirm that the update-cache failure described in the "Subject: Scripts failed to run: loganberry:productreleasefinder, loganberry:update-cache" refers to update-pkgcache.py and reply back to the email sent to the list?16:36
matsubaraok, you just did :-)16:37
bigjoolsit doesn't look like update-packagecache16:37
bigjoolserrr ah it is16:38
matsubarabigjools, it's the only script that has update-cache string in cronscripts/16:38
bigjoolssorry got confused by seeing productreleasefinder16:38
matsubara[action] bigjools to investigate update-cache failure and reply back to the list16:38
MootBotACTION received:  bigjools to investigate update-cache failure and reply back to the list16:38
matsubarabigjools, you might want to coordinate with sinzui since he'll check the product release failure one and suspects it might have failed because of a long running process16:39
bigjoolsmatsubara: is there an oops?16:39
sinzuionly an email that it did not run16:39
matsubarabigjools, nope16:39
bigjoolsand it was a one-off?16:39
sinzuibigjools: it did not start, and that is 99% of the time the fault of a long running process16:40
bigjoolsok16:40
* sinzui really does not think about the issue until it happens two days in a row16:40
bigjoolsand me16:40
matsubarasinzui, perhaps the script monitoring should have such a feature16:40
matsubarabut anyway, sorry for taking so long on this section16:41
matsubarathanks everyone16:41
matsubara[TOPIC] * Operations report (mthaddon/Chex/spm/mbarnett)16:41
MootBotNew Topic:  * Operations report (mthaddon/Chex/spm/mbarnett)16:41
Chexhello everyone16:41
gary_posterhi16:42
Chexsorry, notes failure:16:43
Chex- LP Ship-it progress:16:43
Chex; LP shipit is live on the new servers16:43
Chex; Nigel Pugh is now in charge of approving CPs to those servers16:43
Chex; We are still working on the new front-ends for LP Login and LP itself16:43
Chex- Buildd-manager DB restart issue/bugs: Bugs 451351 & 451349 have been16:44
ubottuLaunchpad bug 451351 in soyuz "buildd-manager doesn't give us a good way of determining it's in a failed state" [High,Triaged] https://launchpad.net/bugs/45135116:44
Chexfiled to address this issue, any movement to fix this problem?16:44
Chex- QA column in Incident Log: Tom sent an email to LP list on Oct 12, has16:45
Chexanyone reviewed the email and have comments/concerns about it?16:45
matsubaraChex, are oops reports from those new servers going to be rsync'ed to devpad? such oopses are supposed to be included in LP oops summaries?16:45
ChexLP Incidents of note: ; Applied: CP 9660 to lpnet, CP 9679 to lpnet16:45
Chex    ; Small LP outage (8 mins) : App servers (and16:45
Chex         librarians) didn't reconnect & had to be restarted after LP DBs16:45
Chexwere restarted: Bug filed: 45109316:45
Chexand that's our report for this week. sorry for the troubles there16:46
bigjoolsChex: I am looking into 451351 but don't expect anything soon, it's a hard problem16:46
Chexmatsubara: I am not sure on the status of oops summaries on the new servers, I will check on that16:47
matsubaraChex, cool, thanks.16:47
Chexbigjools: ok, thanks, just looking for status of progress.16:47
matsubaraChex, danilos mentioned that the QA column thing was discussed today in the TL meeting and flacoste will champion the process.16:48
Chexmatsubara: ok, that is great to hear.16:48
matsubaraChex, thanks for the report16:48
matsubaralet me move on as we are overdue16:48
matsubara[TOPIC] * DBA report (stub)16:48
MootBotNew Topic:  * DBA report (stub)16:48
matsubaraThe new replica to become the master for the authentication service has been taken offline, as the hardware was showing signs of strain keeping up with Launchpad's write load. The hardware is being beefed up to cope. The alternative is to just put the authdb replication set on this server and have the authentication service appservers connect to the main launchpad databases for the data they need to pull from the lpmain replication set.16:48
matsubaraNothing else to report.16:48
matsubarathat came from Stuart. any questions about dba's report?16:49
matsubaraok, I'll take that as a no :-)16:50
matsubara[TOPIC] * Proposed items16:50
MootBotNew Topic:  * Proposed items16:50
matsubarano new proposed items16:50
matsubaraThank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.16:50
matsubarasorry for overrunning16:50
matsubara#endmeeting16:51
MootBotMeeting finished at 10:51.16:51
gary_posterthanks matsubara16:51
=== salgado is now known as salgado-lunch
=== matsubara is now known as matsubara-lunch
=== salgado-lunch is now known as salgado
=== matsubara-lunch is now known as matsubara
=== salgado_ is now known as salgado
=== salgado is now known as salgado-afk
=== matsubara is now known as matsubara-afk
