/srv/irclogs.ubuntu.com/2010/01/28/#launchpad-meeting.txt

=== salgado is now known as salgado-brb
=== salgado-brb is now known as salgado
=== mrevell is now known as mrevell-luncheon
=== mrevell-luncheon is now known as mrevell
=== salgado is now known as salgado-afk
=== salgado-afk is now known as salgado-lunch
=== matsubara-lunch is now known as matsubara
[16:00] <matsubara> #startmeeting
[16:00] <MootBot> Meeting started at 10:00. The chair is matsubara.
[16:00] <MootBot> Commands Available: [TOPIC], [IDEA], [ACTION], [AGREED], [LINK], [VOTE]
[16:00] <matsubara> Welcome to this week's Launchpad Production Meeting. For the next 45 minutes or so, we'll be coordinating the resolution of specific Launchpad bugs and issues.
[16:00] <matsubara> [TOPIC] Roll Call
[16:00] <MootBot> New Topic:  Roll Call
[16:00] <matsubara> Not on the Launchpad Dev team? Welcome! Come "me" with the rest of us!
[16:00] <sinzui> me
[16:00] <al-maisan> me
[16:00] <danilo_> me
[16:01] <mrjazzcat> me
[16:01] <mbarnett> me
[16:01] <matsubara> sorry mrjazzcat, I always forget to ping you about the meeting. I'll add you to the "Who should be here?" section if you don't mind
[16:01] <mrjazzcat> yes, please
[16:01] <matsubara> on the MeetingAgenda page, I mean
[16:01] <mrjazzcat> no worries
[16:02] <matsubara> [action] add brian to the list of attendees in the MeetingAgenda page
[16:02] <MootBot> ACTION received:  add brian to the list of attendees in the MeetingAgenda page
[16:02] <matsubara> Ursula won't be around today
[16:02] <matsubara> and I'll be standing in for Gary
[16:02] <matsubara> rockstar, hi, around?
[16:03] <matsubara> allenap, hi
[16:03] <matsubara> well, let's move on and then Gavin and Paul can join in later
[16:03] <matsubara> [TOPIC] Agenda
[16:03] <MootBot> New Topic:  Agenda
[16:03] <matsubara> * Actions from last meeting
[16:03] <matsubara> * Oops report & Critical Bugs & Broken scripts
[16:03] <matsubara> * Operations report (mthaddon/Chex/spm/mbarnett)
[16:03] <matsubara> * DBA report (stub)
[16:03] <matsubara> * Proposed items
[16:03] <matsubara> [TOPIC] * Actions from last meeting
[16:03] <MootBot> New Topic:  * Actions from last meeting
[16:04] <matsubara> * allenap to dig the master bug of OOPS-1474EA771
[16:04] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:04] <matsubara> * salgado to take a look in the TypeError oopses (OOPS-1479S1000)
[16:04] <matsubara>   * already did that, this is bug 403281, it happened because mthaddon was testing the new read-only switch on staging.
[16:04] <matsubara> * rockstar to take a look in OOPS-1480CMP1
[16:04] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1479S1000
[16:04] <ubottu> Launchpad bug 403281 in launchpad-foundations "public xmlrpc requests broken during read only period" [Undecided,Triaged] https://launchpad.net/bugs/403281
[16:04] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1480CMP1
[16:04] <matsubara> ok, so I'll re-add both items for allenap and rockstar
[16:05] <matsubara> [action] * allenap to dig the master bug of OOPS-1474EA771
[16:05] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:05] <MootBot> ACTION received:  * allenap to dig the master bug of OOPS-1474EA771
[16:05] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:05] <matsubara> [action] * rockstar to take a look in OOPS-1480CMP1
[16:05] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1480CMP1
[16:05] <MootBot> ACTION received:  * rockstar to take a look in OOPS-1480CMP1
[16:05] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1480CMP1
[16:05] <matsubara> [TOPIC] * Oops report & Critical Bugs & Broken scripts
[16:05] <MootBot> New Topic:  * Oops report & Critical Bugs & Broken scripts
[16:05] <matsubara> we have some oops reports but most of them foundations issues
[16:06] <matsubara> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA884
[16:06] <matsubara> Looks like an anonymous user is trying to do some operation which (s)he's not allowed. Should we really log an oops for this?
[16:06] <matsubara> maybe related to https://bugs.edge.launchpad.net/launchpad-foundations/+bug/271029
[16:06] <matsubara> More non-informational disconnectionerrors https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489J147
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA884
[16:06] <matsubara> InternalError after the rollout https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489C1094
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489J147
[16:06] <ubottu> Ubuntu bug 271029 in launchpad-foundations "ForbiddenAttribute exception raised changing property of object" [Medium,Triaged]
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489C1094
[16:06] <matsubara> code team, BranchMergeProposalExists https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA174
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA174
[16:06] <matsubara> so, that's it and there's no one from Code to take a look at the BranchMergeProposalExists one
[16:06] <matsubara> [action] matsubara to email Tim about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA174
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA174
[16:06] <MootBot> ACTION received:  matsubara to email Tim about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA174
[16:06] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA174
[16:07] <matsubara> [action] matsubara to talk to leonard about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA884
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA884
[16:07] <MootBot> ACTION received:  matsubara to talk to leonard about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1488EA884
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1488EA884
[16:07] <matsubara> [action] matsubara to talk to salgado about More non-informational disconnectionerrors https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489J147
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489J147
[16:07] <MootBot> ACTION received:  matsubara to talk to salgado about More non-informational disconnectionerrors https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489J147
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489J147
[16:07] <matsubara> [action] matsubara to talk to stub or gary about InternalError after the rollout https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489C1094
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489C1094
[16:07] <MootBot> ACTION received:  matsubara to talk to stub or gary about InternalError after the rollout https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1489C1094
[16:07] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1489C1094
[16:07] <matsubara> lovely, looks like I'm running the meeting all by myself heheh
[16:07] <rockstar> me
[16:08] <allenap> me
[16:08] <al-maisan> :)
[16:08] <matsubara> on the broken scripts side
[16:08] <matsubara> sinzui, Scripts failed to run: loganberry:send-person-notifications seems to be broken
[16:09] <matsubara> sinzui, could you take a look and reply to the list?
[16:09] <sinzui> matsubara: all scripts appear to be broken
[16:09] <matsubara> all?
[16:09] <sinzui> They are not running and I am tempted to say something new was added that is taking forever and a day
[16:09] <matsubara> I only see notifications for send-person-notifications and garbo-hourly
[16:10] <matsubara> sinzui, can you confirm and reply to the list that's the case, at least for the send-person-notifications one?
[16:10] <matsubara> I'll ask losas and/or stub about garbo-hourly not running as well
[16:10] <allenap> matsubara: Re. OOPS-1474EA771, it's bug 508302, and deryck is working on it today.
[16:10] <ubottu> Launchpad bug 508302 in malone "NotImplementedError OOPS when reporting a bug" [High,In progress] https://launchpad.net/bugs/508302
[16:10] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:10] <matsubara> thanks allenap, I'll adjust the bug link on that oops report
[16:11] <matsubara> [action] matsubara to fix bug link on OOPS-1474EA771 to point to bug 508302
[16:11] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:11] <MootBot> ACTION received:  matsubara to fix bug link on OOPS-1474EA771 to point to bug 508302
[16:11] <ubottu> https://lp-oops.canonical.com/oops.py/?oopsid=1474EA771
[16:11] <matsubara> [action] sinzui to investigate failure on send-person-notifications and reply to the list with his findings
[16:11] <MootBot> ACTION received:  sinzui to investigate failure on send-person-notifications and reply to the list with his findings
[16:13] <matsubara> btw, updatebranches script also failed recently but that's been fixed by spm. the new rollout changed the script name and losas updated the notification thing to recognize the new name
[16:13] <matsubara> on the critical bugs side
[16:13] <rockstar> matsubara, updatebranches no longer runs.
[16:13] <mthaddon> matsubara: er, not quite
[16:13] <matsubara> we have 3 critical bugs
[16:13] <rockstar> It's been replaced by scan_branches
[16:13] <mthaddon> matsubara: we've had to revert it a bunch of times
[16:14] <matsubara> mthaddon, hmm no? spm's email seems to indicate that
[16:14] <mthaddon> matsubara: spm went to bed a while ago - a new problem was discovered since then
[16:14] <mthaddon> matsubara: abentley and Chex have been working on it
[16:14] <matsubara> oh, I was looking at this latest email to the list replying to one of the script failures notification
[16:15] <matsubara> well, if they're already working on it, it's ok. :-)
[16:15] <mthaddon> matsubara: not really...
[16:15] <matsubara> mthaddon, no? what else is expected?
[16:15] <mthaddon> matsubara: as I understand it, we've reverted to the old script because we still don't know what was wrong
[16:15] <mthaddon> matsubara: and the fact that we've reverted between the old and new scripts twice now on production is a problem in itself
[16:16] <mthaddon> matsubara: and also the fact that the first we heard about the problem was from a user report
[16:16] <matsubara> mthaddon, I meant it's ok in the sense that people are already working on a solution and there's nothing much to be done during this meeting to have people act on it
[16:16] <mthaddon> i.e. we don't have a good measure of when this problem is even happening
[16:17] <mthaddon> matsubara: maybe not, but I'd like a bit of discussion about this class of problem and what can be done to prevent it in the future
[16:17] <danilo_> mthaddon: what is the exact problem that we need to be able to track? (sorry, I am not fully up to date on what broke)
[16:17] <mthaddon> danilo_: aiui email notifications of branch updates failed to be sent out
[16:18] <gary_poster> mthaddon: "reverted ...twice...on production": I think we all agree this sucks.  However, AIUI, this was successfully QAd.  Either the QA was bad, or staging is not close enough to prod in some way.  I don't think we know yet.
[16:18] <matsubara> mthaddon, I'm unaware of the details as well. My expectation is that an IncidentLog will be filed and action to prevent it will be included in the incident log
[16:18] <danilo_> mthaddon: ah, right, that could have a bigger impact (it might be harming us in translations as well)
[16:19] <mthaddon> matsubara: this doesn't really qualify as an incident log item since there's no measurable service that's been interrupted (we don't have any kind of nagios monitoring of this) - I guess I'm asking how we plan to approach it from here
[16:20] <mthaddon> and how we got into this situation
[16:21] <danilo_> mthaddon, gary_poster, matsubara: we are obviously missing a dedicated "communications person" for this specific item (someone to keep the entire situation in check); we've discussed that approach before, it'd be nice to find someone who can offload the communication side from abentley and others working on it
[16:21] <gary_poster> danilo_: to the degree there's a failure there (communications), it'd probably be mine as RM
[16:21] <gary_poster> maybe we can have somebody else too
[16:21] <gary_poster> but that's RM stuff
[16:22] <gary_poster> but AIUI that's not the prob
[16:22] <danilo_> gary_poster, not necessarily, we discussed this in a TL call a few weeks (months?) back where we need someone to communicate with everyone
[16:22] <gary_poster> maybe so
[16:22] <gary_poster> but probs I see:
[16:23] <danilo_> gary_poster, it's mostly about having someone take responsibility for making sure problems are visible and we know what's going on
[16:23] <gary_poster> - we didn't catch this on staging.  Why?
[16:23] <gary_poster> either QA was bad or staging is too diff
[16:23] <gary_poster> we need to know why
[16:23] <gary_poster> and fix it
[16:23] <mthaddon> yep, I agree with that
[16:24] <gary_poster> then also, unless I misunderstand, mthaddon is saying that we don't have an automated nagios-like process verifying basic success on production for this thing
[16:24] <danilo_> gary_poster, neither of those is easy to fix (one depends on people always DTRT, another on machines always DTRT), so we need to be able to easily find out when it's broken rather than wait for users to report it
[16:24] <gary_poster> danilo_: but doesn't that depend on one of the three things I said?  (people DTRT, machines DTRT, nagios-like-thing DTRT)
[16:25] <danilo_> gary_poster, it does, I was typing before you typed the last one :)
[16:25] <mthaddon> gary_poster: it's possible we can't do that for *everything*, but if we decide this is a sufficiently important thing that we care about it if it fails, it sounds like we need to monitor it somehow, yeah (possibly we are already with OOPSes, but why didn't we catch it til a user told us about it?)
[16:25] <gary_poster> :-) ok
[16:25] <danilo_> gary_poster, the 4th is lack of coordination and communication :)
[16:26] <gary_poster> mthaddon: right. For me, this gets to my "too many different kinds of moving parts" in our architecture. If we have fewer moving parts then we can institute more uniform nagios-like-checks.
[16:26] <gary_poster> maybe the jobs system can help with this
[16:26] <danilo_> anyway, gary_poster, I think we should just raise the importance of ensuring sufficient monitoring of this part of code-hosting by thumper, and we can be done with the topic
[16:27] <gary_poster> maybe we can architect the jobs system to give us a nagios-like hook
[16:27] <danilo_> gary_poster, we don't have to solve the problem here :)
[16:27] <gary_poster> because doing it with cron scripts is a one-per job
[16:27] <matsubara> danilo_, can you raise the topic in the next TL meeting?
[16:28] <gary_poster> danilo_: ack.  I kind of disagree with your summary though, and your action item, so that's why I'm continuing to blather :-)
[16:28] <danilo_> matsubara, we are having a week long TL meeting next week, so it'd be best to action it for someone from code team to pass it on to thumper, imho :)
[16:28] <gary_poster> (IOW, this is not a problem for thumper, it is a problem for Björn, team leads, etc.)
[16:29] <danilo_> gary_poster, well, sure, I agree, but one step at a time
[16:29] <gary_poster> matsubara: two action items: :-)
[16:29] <matsubara> [action] rockstar to raise the importance of ensuring sufficient monitoring of this part (i.e. branch updates emails failing to be delivered) of code-hosting by thumper
[16:29] <MootBot> ACTION received:  rockstar to raise the importance of ensuring sufficient monitoring of this part (i.e. branch updates emails failing to be delivered) of code-hosting by thumper
[16:29] <danilo_> gary_poster, there's immediate problem and then there's the elegant solution; I'm always for fixing the immediate problem first and having the elegant solution come out of that
[16:29] <gary_poster> yeah, that's number one
[16:30] <gary_poster> number two is gary to bring up architecture concerns to team lead mtg :-)
[16:30] <danilo_> gary_poster, as for the other one, I think it ties in well with what we discussed today and what we'll want to discuss anyway
[16:30] <matsubara> [action] TLs + Bjorn to talk about "too many different kinds of moving parts" in our architecture. If we have fewer moving parts then we can institute more uniform nagios-like-checks.
[16:30] <MootBot> ACTION received:  TLs + Bjorn to talk about "too many different kinds of moving parts" in our architecture. If we have fewer moving parts then we can institute more uniform nagios-like-checks.
[16:30] <matsubara> does that summarize it well?
[16:31] <gary_poster> yeah thank you.  though it's probably my action, since I'm the one with the bee in my bonnet :-)  but that's fine
[16:31] <danilo_> gary_poster, matsubara: I don't like action items like that because they put no responsibility on anyone in particular, thus meaning that if they get done, they get done unrelated to the action item; thus, you don't really need it
[16:31] <gary_poster> so give it to me :-)
[16:31] <matsubara> danilo_, I'll add it to gary's queue when I add the summary to the MeetingAgenda page
[16:31] <danilo_> gary_poster, heh, that's ok, I am certain we would have discussed this regardless of us having any particular action item
[16:32] <danilo_> matsubara, sure, thanks
[16:32] <gary_poster> :-)
[16:32] <matsubara> it serves as a reminder as well
[16:32] <matsubara> anyway, thanks for the comments
[16:32] <rockstar> In fairness, the "not getting branch update emails" thing was because a rather large part of the code hosting system was made into a job.
[16:33] <gary_poster> To whom are you being fair? :-)
[16:33] <gary_poster> Never mind, I'll be quiet :-)
[16:33] <danilo_> :)
[16:33] <matsubara> we have 3 critical bugs, one in progress, one fix committed
[16:33] <rockstar> I'm not sure how "sufficient monitoring" would have fixed this.
[16:33] <matsubara> the other one is triaged, bug 511567
[16:33] <ubottu> Launchpad bug 511567 in launchpad-foundations "Can't remove authorised app" [Critical,Triaged] https://launchpad.net/bugs/511567
[16:33] <rockstar> gary_poster, to the code team in general.
[16:33] <matsubara> hmm
[16:33] <danilo_> rockstar, sufficient monitoring of scripts that do this
[16:33] <matsubara> that's a dupe
[16:33] <matsubara> and I filed that bug a few days ago
[16:34] <rockstar> danilo_, howso?
[16:34] <matsubara> or maybe I filed the dupe
[16:34] <gary_poster> rockstar: ah, gotcha.  Tim can beat us into shape at the TL sprint so we understand.
[16:34] <rockstar> gary_poster, yeah, I'll talk to him.
[16:34] <gary_poster> cool
[16:34] <danilo_> rockstar, monitoring should have caught the problem (i.e. "hey, this script is failing"); I won't pretend to understand the entire problem, so we might be entirely off base, but we should be able to check our service level
[16:34] <rockstar> danilo_, there wasn't a script failing.
[16:35] <rockstar> It ran fine, it was just a new script that had apparently left out some old functionality.
[16:36] <danilo_> rockstar, right, never mind the "implementation details", the problem is: "why we didn't catch it before someone told us it's failing"; there's not necessarily a technical solution
[16:38] <danilo_> matsubara, am I still on the channel?
[16:38] <matsubara> yes
[16:38] <danilo_> oh, ok, it's just everybody being quiet :)
[16:38] <danilo_> matsubara, I think we should go on
[16:38] <matsubara> sorry, I was looking for a bug report to dupe against 511567
[16:38] <matsubara> anyway
[16:38] <matsubara> thanks
[16:39] <matsubara> [TOPIC] * Operations report (mthaddon/Chex/spm/mbarnett)
[16:39] <MootBot> New Topic:  * Operations report (mthaddon/Chex/spm/mbarnett)
[16:41] <matsubara> hello?
[16:41] <matsubara> Chex, mbarnett ?
[16:41] <mbarnett> sorry
[16:42] <Chex> sorry
[16:42] <Chex> here is the report
[16:42] <Chex> - LP rollout 10.01 Wednesday was successful:
[16:42] <Chex>     : See https://wiki.canonical.com/InformationInfrastructure/OSA/LPRollout20100127 for more details.
[16:42] <Chex>     : The read-only switch left idle connections to the master DB, it is currently being investigated
[16:42] <Chex> - New LP Appserver is online, some issues with internal access, but now everything is OK.
[16:42] <Chex> - New branch-scanner having issues, just reverted back to old again.  Based on meeting discussion here,
[16:42] <Chex>         continuing to address.
[16:42] <Chex> and that's all for us.  Any questions/comments?
[16:43] <matsubara> Chex, what's this new LP appserver online? I guess I'll have to tell oops-tools about oops reports from it?
[16:43] <matsubara> [action] matsubara to update oops-tools to know about the new lp appserver
[16:43] <MootBot> ACTION received:  matsubara to update oops-tools to know about the new lp appserver
[16:43] <noodles775> Chex: do you know if the new servers have access to the private librarian?
[16:43] <mbarnett> matsubara: soybean was recently put online as a replacement for gangotri +
[16:44] <mbarnett> noodles775: that was resolved earlier today
[16:44] <matsubara> mbarnett, oh, so it's using the same config files?
[16:44] <noodles775> A user was seeing about 1 in 4 requests to download a... ah, great, thanks!
[16:44] <mbarnett> matsubara: it took over lpnet1, lpnet2, and edge1 from gangotri, stole lpnet9 from gandwana, and added a sparkly new lpnet15 standard lpnet appserver
[16:45] <matsubara> mbarnett, ok, it's the new lpnet15 instance I care about. I'll check the configs and update oops-tools accordingly
[16:45] <matsubara> thanks
[16:45] <matsubara> moving on
[16:45] <mbarnett> matsubara: thank you.
[16:45] <matsubara> [TOPIC] * DBA report (stub)
[16:45] <MootBot> New Topic:  * DBA report (stub)
[16:45] <matsubara> stub sent the report to the list
[16:46] <matsubara> allenap, he mentioned something about checkwatches being very cpu intensive. it's probably of interest to the Bugs team
[16:46] <allenap> mars: deryck has just forwarded the message to me.
[16:46] <allenap> matsubara: ^
[16:46] <matsubara> thanks allenap
[16:46] <matsubara> [TOPIC] * Proposed items
[16:47] <MootBot> New Topic:  * Proposed items
[16:47] <matsubara> no proposed items
[16:47] <matsubara> which brings this meeting to a close
[16:47] <matsubara> Thank you all for attending this week's Launchpad Production Meeting. See https://dev.launchpad.net/MeetingAgenda for the logs.
[16:47] <matsubara> and sorry for the delay
[16:47] <matsubara> #endmeeting
[16:47] <MootBot> Meeting finished at 10:47.
=== EdwinGrubbs is now known as Edwin-lunch
=== matsubara is now known as matsubara-afk
=== salgado is now known as salgado-afk
