/srv/irclogs.ubuntu.com/2010/09/14/#launchpad-dev.txt

mwhudson	mbarnett: ok, that's not what i expected	00:00
mwhudson	mbarnett: which machines did you try it on?	00:00
lifeless	sinzui: browser/team.py has an interesting thing	00:00
lifeless	it looks up the action from the form	00:00
mbarnett	mwhudson: galapagos, pear, russkaya	00:01
lifeless	but it doesn't seem to check that the action is for that row	00:01
mwhudson	!?	00:02
mwhudson	mbarnett: sorry if this is tedious, but can you pastebin the ~importd/.bazaar/bazaar.conf files from pear and russkaya?	00:03
spm	mwhudson: I'll sort that; mbarnett needs to go and en-cake-enate	00:05
sinzui	lifeless: yeah. I am staring at the template. I think the message is adapted to a widget and it builds ids from the message id... we automatically discard duplicate message ids	00:05
maxb	jelmer: Did the bzr-svn on launchpad somehow just not ever try "discovering revprop revisions" before the last rollout?	00:05
jelmer	maxb: it's always done that	00:05
maxb	huh	00:05
mbarnett	yes yes, cake levels dangerously low!	00:05
jelmer	maxb: But we didn't do KDE imports until recently	00:05
maxb	Yes, but the KDE imports which ran before the rollout didn't "discover revprop revisions"	00:06
maxb	oh, wait, I'm getting my dates wrong.	00:06
lifeless	sinzui: anyhow all tests are passing, except for	00:07
lifeless	>>> find_tag_by_id(admin_browser.contents, 'batchnav_first')['class']	00:07
sinzui	Yes that will be an issue for a few more moments. I have a patch	00:07
lifeless	sinzui: and I don't think the bug with POST there is any better or worse due to batching; if it's buggy it's always been buggy.	00:08
sinzui	I agree	00:08
mwhudson	spm: cool, and good morning	00:11
sinzui	lifeless: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/637654 has a proposed patch. It fixes the most common case of upper and lower. I imagine a page with two sets of BNs will continue to be broken	00:11
spm	mwhudson: hmm. you may have hit the nail. pear doesn't have that file	00:12
lifeless	sinzui: is this likely to cause other tests to fail ?	00:12
spm	mwhudson: https://pastebin.canonical.com/37119/ <== russkaya	00:12
lifeless	(I mean, will there be other fixups to do with that patch)	00:12
lifeless	sinzui: could you commit that patch somewhere and push it? I'll pull it in	00:13
sinzui	I bet not since there are no tests reporting that we have rubbish ids. I reported it as a separate bug because I think this issue is a separate concern from your branch now	00:13
* sinzui does		00:13
lifeless	sinzui: it is a separate concern, but either I leave the test out or I include your branch	00:13
mwhudson	spm: gar	00:14
spm	mwhudson: I assume C&P from one t'other?	00:14
mwhudson	spm: does /home/importd/.bazaar/sign-vcs-import exist on pear?	00:14
spm	nope	00:14
mwhudson	did pear get reinstalled at some point?	00:15
mwhudson	spm: can you look at /home/importd/.bazaar/sign-vcs-import on russkaya?	00:15
spm	mwhudson: I'd assume it being a new machine; some things may have been missed :-(	00:15
mwhudson	it probably references something ridiculous like ~importd/hoover/keys/key.gpg	00:15
spm	holy dooly	00:16
mwhudson	https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/SetUpCodeImportSlave -> "# Make sure ~importd/.bazaar/ and ~importd/botslave look like they do on a working slave. "	00:16
mwhudson	:(	00:16
spm	mwhudson: exec /usr/bin/gpg --no-default-keyring --keyring /home/importd/botslave/gpg/vcs-imports@canonical.com.pub --secret-keyring /home/importd/botslave/gpg/vcs-imports@canonical.com.secret --default-key A60FA0E1 "$@"	00:16
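The how-to's instruction to make ~importd/.bazaar/ and ~importd/botslave "look like they do on a working slave" boils down to a file-tree diff. A minimal sketch, assuming both trees (or copies of them) are reachable locally; the function names are hypothetical and not part of any Launchpad or LOSA tooling:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_on_new_slave(working_root, new_root):
    """Files present on the working slave but absent on the new one."""
    return sorted(relative_files(working_root) - relative_files(new_root))


# Hypothetical usage, with russkaya's tree copied next to pear's:
# missing_on_new_slave("/tmp/russkaya-importd", "/home/importd")
# Given the log, this would be expected to include .bazaar/sign-vcs-import.
```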
mwhudson	spm: i was sure this was working at some point on pear :(	00:17
mwhudson	maybe not	00:17
mwhudson	it only affects cscvs imports i guess....	00:17
spm	I'm surprised it's working on russkaya....	00:17
sinzui	lifeless: lp:~sinzui/launchpad/batch-ids-0 There are no tests for these links. Nor can I find any tests for the existence of the upper or lower navs. All the tests for the BN are getting the Next link using TestBrowser	00:22
lifeless	pulling	00:22
sinzui	I checked for uses of the default BN. Subclasses like root search and bug do their own layouts	00:23
lifeless	sinzui: so 'batchnav_first' is not what to look for	00:24
sinzui	lifeless try 'upper-batch-nav-batchnav-first' but keep in mind the nav will not be rendered if there is only one batch. We need 6 messages in the testrunner env or we add ?batch=1 to the url we are testing to set the size to 1 message	00:27
lifeless	and for the bottom lower-batch-nav-batchnav-first ?	00:30
sinzui	yep	00:30
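The id scheme sinzui describes (the navigation block's CSS class prefixed onto "batchnav-<link>", so the upper and lower navs no longer emit duplicate ids) can be sketched as below. This is a reconstruction from the ids quoted in this log, not the code in sinzui's branch; the link names "previous" and "last" are assumptions.

```python
def batch_nav_ids(css_class, links=("first", "previous", "next", "last")):
    """Build element ids for one batch-navigation block.

    css_class is "upper-batch-nav" or "lower-batch-nav"; prefixing it keeps
    the two blocks on one page from sharing any id.
    """
    return {link: "%s-batchnav-%s" % (css_class, link) for link in links}


upper = batch_nav_ids("upper-batch-nav")
lower = batch_nav_ids("lower-batch-nav")
```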
lifeless	?batch=1 doesn't do it	00:31
lifeless	http://paste.ubuntu.com/493359/	00:32
lifeless	hmm	00:33
lifeless	1 is too small	00:33
lifeless	moving it lower down	00:34
lifeless	sinzui: doesn't appear to be rendering the nav links to me	00:37
lifeless	Continue to hold the message, deferring\n your decision until later.</li>\n </ul>\n </div>\n\n <table class="listing">\n	00:39
sinzui	lifeless sorry, my screen was locked for a moment	00:39
lifeless	sinzui: ^ that's with batch=1 and 2 messages	00:39
wgrant	So, we have an issue with the OpenID identifier migration last week, causing incorrect accounts to be linked together... can someone poke around on staging to work out WTF is going on?	00:39
lifeless	2\n \n <span>\n messages have</span>\n been posted to	00:39
lifeless	wgrant: sure, once I'm finished here.	00:39
wgrant	lifeless: Thanks.	00:40
lifeless	sinzui: only one message is shown	00:40
lifeless	so the batch param worked	00:40
lifeless	it's the navigation bit that isn't	00:40
sinzui	I am a bad advisor. you are on the first batch. We should be checking for upper-batch-nav-batchnav-next	00:41
lifeless	no, you're fine	00:42
lifeless	it's not there	00:42
sinzui	:(	00:42
lifeless	after the advice	00:42
lifeless	Discard</strong> - Throw the message aw	00:42
lifeless	etc	00:42
lifeless	your decision until later.</li>\n </ul>\n </div>\n\n <table class="listing">\n <thead><tr>\n <th>Message detail	00:42
lifeless	is what's in the browser.contents	00:42
lifeless	the navigation bit is just awol	00:43
lifeless	it should be after that </div>	00:43
sinzui	We suppress rendering of the lower if there are no additional batches, so maybe that template fragment is wrong	00:44
lifeless	well there is an additional batch - batch=1 and two messages to moderate	00:45
lifeless	the count on the page shows '2' so we know there are two there	00:45
lifeless	and there is only one "approve" in the output, so we know only one got shown	00:45
sinzui	lifeless, the upper template must be rendered since there is clearly a batch. the view guards the rendering with this: ``if self.context.currentBatch():``	00:47
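A minimal sketch of the guard sinzui quotes, plus the next-link arithmetic, using a hypothetical FakeBatch in place of the real batch navigator (its method names mirror the log, its internals are assumed). With two held messages and batch=1, the first page has a current batch and a next link at start=1, so the guard alone cannot explain the missing navigation.

```python
from urllib.parse import urlencode

BASE = "http://launchpad.dev/%7Eguadamen/+mailinglist-moderate"


class FakeBatch:
    """Hypothetical stand-in for the navigator adapted from held_messages."""

    def __init__(self, total, start=0, size=1):
        self.total, self.start, self.size = total, start, size

    def currentBatch(self):
        # Items on this page; empty when there is nothing to show.
        return list(range(self.start, min(self.start + self.size, self.total)))

    def nextURL(self, base):
        next_start = self.start + self.size
        if next_start >= self.total:
            return None
        return "%s?%s" % (base, urlencode([("start", next_start), ("batch", self.size)]))


def render_next_link(batch, base):
    # Same shape as the guard in batching.py: no current batch, no markup.
    if not batch.currentBatch():
        return u""
    return batch.nextURL(base) or u""
```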
* sinzui is looking at canonical/launchpad/webapp/batching.py		00:47
lifeless	what is the context object going to be - held_messages ?	00:48
spm	mwhudson: pear now has those dirs/files setup per russkaya	00:49
mwhudson	spm: great, thanks	00:49
sinzui	lifeless: yes held_messages. we are adapting a BN	00:50
lifeless	so I did this	00:50
lifeless	+++ lib/canonical/launchpad/webapp/batching.py 2010-09-13 23:50:20 +0000	00:50
lifeless	@@ -40,7 +40,7 @@	00:50
lifeless	def render(self):	00:50
lifeless	if self.context.currentBatch():	00:50
lifeless	return LaunchpadView.render(self)	00:50
lifeless	- return u""	00:50
lifeless	+ return u"not rendered"	00:50
lifeless	not rendered was not included in the output	00:50
sinzui	?	00:51
* sinzui checks zcml		00:51
lifeless	I wanted to see if that code path was shortcircuiting or something	00:51
* sinzui checks other batches with the hacked template		00:54
lifeless	and this:	00:58
lifeless	+++ lib/canonical/launchpad/webapp/batching.py 2010-09-13 23:52:07 +0000	00:58
lifeless	@@ -38,6 +38,7 @@	00:58
lifeless	css_class = "upper-batch-nav"	00:58
lifeless		00:58
lifeless	def render(self):	00:58
lifeless	+ return u" fooo "	00:59
lifeless	if self.context.currentBatch():	00:59
lifeless	return LaunchpadView.render(self)	00:59
lifeless	return u""	00:59
lifeless	also doesn't show up in the output	00:59
sinzui	right. I am not seeing my template change when testing https://blueprints.launchpad.dev/firefox?batch=2	00:59
sinzui	Or I could run the instance that I made the change in instead	01:00
lifeless	and to cap it off, when I make that raise an Exception I don't get an error	01:00
sinzui	oh.. I wonder. I see >>> admin_browser.reload() which has a history of being buggy	01:01
lifeless	have a look at lib/lp/blueprints/templates/person-specworkload.pt	01:01
lifeless	sinzui: I'm sure it's not that, I made the view crash and the page rendered	01:02
sinzui	I see my hack in specs now that I am running the right branch	01:02
lifeless	I'm thoroughly confused	01:06
lifeless	is there a sample data team w/list ?	01:07
sinzui	lifeless me too, this always just works. Can you humour me by adding this link before we do the call to find_tag_by_id	01:08
sinzui	>>> admin_browser.open(	01:08
sinzui	... 'http://launchpad.dev/~guadamen/+mailinglist-moderate')	01:08
lifeless	of course	01:08
sinzui	lifeless there are no mls in data. I have a make harness note about making them after a request is made in the UI	01:09
lifeless	>>> admin_browser.open(	01:09
lifeless	... 'http://launchpad.dev/~guadamen/+mailinglist-moderate?batch=1')	01:09
lifeless	>>> find_tag_by_id(admin_browser.contents, 'upper-batch-nav-batchnav-first')['class']	01:09
lifeless	first	01:09
lifeless	>>> admin_browser.contents	01:09
lifeless	that's what the story does	01:09
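The story above relies on Launchpad's find_tag_by_id test helper. A stdlib-only stand-in (the name find_attrs_by_id is hypothetical; this is not the Launchpad helper) shows what the lookup does, and why the old id batchnav_first finds nothing once the ids carry the upper-/lower- prefix:

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Record the attributes of the first start tag with a given id."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None and dict(attrs).get("id") == self.wanted:
            self.attrs = dict(attrs)


def find_attrs_by_id(html, element_id):
    """Return the attribute dict of the first element with this id, or None."""
    finder = _IdFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.attrs
```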
sinzui	:(	01:09
lifeless	what's weirder	01:10
lifeless	I added a string literal and I can't see it	01:10
lifeless	it's almost like those divs are eaten	01:11
lifeless	when I put a literal above it works	01:11
lifeless	in the list of action descriptions	01:11
lifeless	but when I add another div at the place we have the navigation ones it disappears	01:12
sinzui	lifeless as a desperate act to verify this we could add size=1 to the BN instantiation in the view to be certain that the URL is not being ignored	01:12
lifeless	I'm certain it's not	01:12
lifeless	because only one "approve" action is in the contents	01:12
sinzui	Ah, yes, that is what I did to be certain something showed up in my env	01:12
lifeless	I suspect the metal:form stuff	01:13
lifeless	I'm positive it's simply not evaluating things without metal:fill-slot in that container	01:14
lifeless	I think if we add a div around it it will work, moving the widgets slot up	01:14
sinzui	we are adapting the message in the same manner that we want to adapt the BN	01:15
lifeless	yes, but the metal interpreter isn't evaluating things without slots	01:15
sinzui	We can certainly move the navs out of the form to be sure it works	01:15
lifeless	bet you that that is is	01:16
lifeless	is it	01:16
lifeless	yes	01:16
lifeless	it was	01:16
lifeless	I have this now	01:16
sinzui	\o/	01:16
lifeless	<a class="next" rel="next"\n href="http://launchpad.dev/%7Eguadamen/+mailinglist-moderate?start=1&batch=1"\n id="upper-batch-nav-batchnav-next">	01:16
lifeless	- <table class="listing" metal:fill-slot="widgets">	01:16
lifeless	+ <div metal:fill-slot="widgets">	01:16
lifeless	+ <tal:navigation	01:16
lifeless	+ replace="structure view/held_messages/@@+navigation-links-upper" />	01:16
lifeless	+	01:16
lifeless	+ <table class="listing">	01:16
lifeless	that's the key	01:16
sinzui	1.5h of confusion and 1 minute to fix with insight.	01:17
lifeless	we must be programming	01:17
lifeless	Thank you for this; I'll push up and propose for merge	01:18
sinzui	Thanks	01:19
lifeless	and I'll write a mail to the list with a) a howto and b) asking for where it should go	01:19
lifeless	does anyone remember the wiki page for the bug sprinty thing at the end of the year?	01:20
=== lifeless changed the topic of #launchpad-dev to: Launchpad Development Channel | Performance Tuesday | Week 1 of 10.10 | PQM is open for business | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
lifeless	sinzui: https://code.edge.launchpad.net/~lifeless/launchpad/registry/+merge/35354	01:24
wgrant	lifeless: So, I worked out what was up with the broken accounts.	01:40
wgrant	Sadly more will likely break soon.	01:40
lifeless	wgrant: ok cool	01:40
lifeless	I learnt how to batch stuff	01:40
lifeless	and to hate metal:form	01:40
wgrant	Heh.	01:40
wgrant	Who owns our OpenID consumer these days?	01:41
lifeless	consumer? foundations	01:42
lifeless	lunchtime	01:43
lifeless	thumper: does transaction time == scan time ?	01:53
thumper	lifeless: luckily, no	01:53
thumper	lifeless: 5.5 minutes to get the ancestry from bzrlib :(	01:54
lifeless	wheee	01:57
lifeless	I think your idea of decoupling the tip change may not be enough	01:58
lifeless	I'd start with autocommit	01:58
lifeless	IMBW	01:58
lifeless	but it seems like low effort for big return	01:58
=== Ursinha-brb is now known as Ursinha-afk
lifeless	thumper: can I borrow your eyeballs	02:15
thumper	lifeless: no, they're mine	02:22
thumper	lifeless: what do you need?	02:22
* mwhudson is reminded of the end of hotshots		02:22
lifeless	thumper: a review	02:23
lifeless	it's small, it will fix mailing list moderation (or make it fixable by further tuning)	02:23
MTecknology	Just on the wild off chance... Is there anyone that knows very basic accounting principles in here? I know there are very smart people in here and hoping one might be able to help me out..	02:43
lifeless	MTecknology: I do, enough to say 'run run away'	02:47
lifeless	spm: hey, don't suppose in the losa wiki you have a sql fragment to report on locks in the db ?	02:48
lifeless	spm: I know I wrote one up years ago ...	02:48
spm	yup sure do	02:48
lifeless	thumper needs it.	02:48
spm	it's a tad obscure to find tho. lp howto, troubleshooting from memory	02:48
MTecknology	lifeless: How about enough to help me figure out this problem that's driving me absolutely bonkers? I have the book - but the book doesn't cover the material.	02:48
thumper	lifeless: did you mute?	02:48
spm	thumper: https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/BlockedProcessesDBLocks as a general	02:49
spm	https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/PostgresOldQueries is also vaguely relevant	02:49
spm	MTecknology: google isn't helping find it? if they're basic accounting principles, there should be heaps of online references that explain them??	02:50
MTecknology	spm: My issue is understanding the basics of what I'm even reading online	02:50
spm	MTecknology: being quite serious here (I've got a few in the series): Perhaps "Accounting for Dummies"? serious suggestion, the dummies series are excellent for explaining the basic concepts. ??	02:52
MTecknology	spm: might be worth buying from ya - any chance you could try to help me in a query with this one?	02:54
MTecknology	or else there's a Barnes & Noble here if you weren't offering to sell	02:54
spm	MTecknology: accounting? hell no. I never studied it at school or uni. wouldn't have a clue. I just have a few Dummies books that I've found excellent for explaining early concepts in the topics in question. :-)	02:55
MTecknology	oh	02:55
MTecknology	BTW - This is what I'm fighting. Pearson Brothers recently reported an EBITDA of $13.5 million and net income of $2.6 million. It had $2.0 million of interest expense, and its corporate tax rate was 35%. What was its charge for depreciation and amortization?	02:55
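For the record, the question reduces to three standard identities: pre-tax income = net income / (1 - tax rate), EBIT = pre-tax income + interest expense, and D&A = EBITDA - EBIT. Worked in Python with Decimal to avoid float rounding (figures in $ millions):

```python
from decimal import Decimal

ebitda = Decimal("13.5")
net_income = Decimal("2.6")
interest = Decimal("2.0")
tax_rate = Decimal("0.35")

pretax_income = net_income / (1 - tax_rate)  # EBT = NI / (1 - t)
ebit = pretax_income + interest              # EBIT = EBT + interest
d_and_a = ebitda - ebit                      # D&A = EBITDA - EBIT
```

That gives $4.0M pre-tax income, $6.0M EBIT, and $7.5M of depreciation and amortization.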
spm	http://www.amazon.com/Accounting-Dummies-John-Tracy/dp/0764550144 fwiw	02:56
MTecknology	cheap, and probably much more useful than this $150 unbound see through sheets of paper book I have	02:57
spm	probably :-)	02:59
cr3	lifeless: hi there, sorry I couldn't answer you earlier. still around?	03:10
lifeless	yes	03:13
cr3	lifeless: so, regarding test runs, do you also feel that's the best way to describe a group of test results run at a point in time in a given context?	03:15
cr3	lifeless: typically, I prefer to name things with one word, like submission instead of test run, but I think the latter might be clearer	03:15
lifeless	I commented on that in #testrepository	03:17
lifeless	sorry otp now	03:17
cr3	lifeless: heh, you seem to have been on the phone all day :)	03:19
cr3	on an unrelated topic, I have a question about defining interfaces: if a class implements IBugTarget which inherits from IHasBugs, using bugs as an example, then that class typically defines a createBug method.	03:22
cr3	however, why not have the class have a bugs attribute which returns a IBugSet which, in turn, implements a create method	03:23
cr3	in other words, the difference is like product.createBug compared to product.bugs.create, does this make sense to anyone?	03:24
lifeless	thumper: https://dev.launchpad.net/LEP/FeatureFlags#preview	03:25
lifeless	thumper: if features.getFeature('code.incrementaldiff') == 'on':	03:28
lifeless	in templates, you do view/features/code.incrementaldiff	03:28
lifeless	or something like that	03:28
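lifeless is sketching the feature-flag check from memory ("or something like that"), so the following is a hypothetical in-memory stand-in, not the real Launchpad features API. It illustrates one property of comparing against the string 'on': an undefined flag name, including a misspelled one, comes back as None and silently reads as off.

```python
class FeatureFlags:
    """Hypothetical stand-in for the `features` object in the snippet above."""

    def __init__(self, rules):
        self._rules = dict(rules)

    def getFeature(self, name):
        # Undefined flags return None rather than raising.
        return self._rules.get(name)


def incremental_diff_enabled(features):
    return features.getFeature("code.incrementaldiff") == "on"


features = FeatureFlags({"code.incrementaldiff": "on"})
```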
lifeless	spm: how many cpus on the master db?	03:35
spm	lifeless: 16	03:36
lifeless	cr3: hi	03:57
lifeless	uhm	03:57
lifeless	in reverse order, I don't know, IBugSet really isn't the specific code I'd use if sketching it that way, and don't forget that all calls to SQL are ~ 1000 times slower than python.	03:58
lifeless	I don't have a brilliant name for the result of running many tests other than 'test run'	03:59
wgrant	lifeless: Any idea how to debug the +filebug issue?	04:08
lifeless	hmm	04:10
lifeless	spm: we really do need a hand	04:10
lifeless	spm: when you can, it's approx the top timeout on prod	04:10
spm	lifeless: sure, was on a call, earlier hence the terse reply. sup?	04:10
lifeless	spm: +filebug gives an apache/haproxy error, reliably, on staging and prod	04:11
lifeless	wgrant has been looking at it	04:11
lifeless	we need to know a bit more about whats actually going on.	04:11
wgrant	Bug 636801	04:11
spm	um. since when? I happily filed a bug earlier?	04:11
lifeless	or for someone to make the request to a naked appserver	04:11
wgrant	I guess we need someone to watch staging Apache and see why it errors.	04:11
lifeless	or something	04:11
wgrant	spm: Only when filing with lots of apport attachments.	04:12
lifeless	spm: with apport on a package with 20+ subscribers?	04:12
spm	no, just a soyuz one. qed. ;-)	04:12
* wgrant kicks mup.		04:12
lifeless	mup has mastered the fine art of silence.	04:12
spm	mup appears to have left the channel	04:12
wgrant	spm: WRT that Soyuz one, it seems to be a general problem. I've received complaints that lots of builds are dispatching repeatedly.	04:12
lifeless	spm: _mup_	04:13
cr3	lifeless: the interface question was mostly related to something containing other objects. put another way, I could have projects['bzr'].create_test_run() or projects['bzr'].test_runs.create()	04:13
spm	ahh. it hides under a new name.	04:13
lifeless	cr3: is this python or LP APIs ?	04:14
lifeless	cr3: if it's LP APIs you probably want to design to the wire protocol, given how round-trip-happy it is.	04:14
cr3	lifeless: ok, so every dot is a roundtrip potentially	04:15
wgrant	Not just potentially :(	04:15
lifeless	if by potentially you mean 'almost guaranteed'	04:15
lifeless	and by dot you mean 'python method invocation'	04:15
lifeless	(which includes __getattr__ aka '.')	04:15
cr3	I thought that perhaps launchpadlib could potentially cache information on the client side, sometimes avoiding a roundtrip	04:15
lifeless	cr3: optimise for cold cache :)	04:16
lifeless	(it can, under very limited circumstances)	04:16
lifeless	which I suspect we'll be limiting to about 2.5 hours in the near future	04:17
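A toy cost model of the two shapes cr3 compares, under lifeless's cold-cache assumption that each dot reaching a new resource is a round trip. RemoteObject and the operation names are hypothetical; this is not how launchpadlib is implemented, only an illustration of the request count.

```python
# Hypothetical named operations; a real web service defines its own.
OPERATIONS = {"create", "create_test_run"}


class RemoteObject:
    """Toy cold-cache proxy: fetching a sub-resource costs a GET and
    invoking a named operation costs a POST. Every request is logged."""

    def __init__(self, log, path):
        self._log = log
        self._path = path

    def _child(self, name):
        path = "%s/%s" % (self._path, name)
        self._log.append("GET " + path)
        return RemoteObject(self._log, path)

    def __getitem__(self, key):
        return self._child(key)

    def __getattr__(self, name):
        # Only reached for names not found normally; keep private names local.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in OPERATIONS:
            def invoke(*args, **kwargs):
                self._log.append("POST %s (%s)" % (self._path, name))
                return RemoteObject(self._log, self._path)
            return invoke
        return self._child(name)


nested, flat = [], []
RemoteObject(nested, "projects")["bzr"].test_runs.create()
RemoteObject(flat, "projects")["bzr"].create_test_run()
```

The extra fetch of the intermediate test_runs collection is the whole difference: flattening the operation onto the project saves one request per call.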
cr3	ok, that answers my question and provides good guidelines for the future. thanks!	04:17
cr3	lesson learned, now time for bed. cheerio folks!	04:20
wgrant	Bah, no staging.	04:23
spm	lifeless: so, been doing some log snarfing and head scratching. not finding any errors in apache - but if a POST, and timing out; tbh I wouldn't expect to. :-( If this can be reliably repeated, I'd suggest 'a' way forward, would be to sniff the traffic at the client end when doing such a thing. even tho the connection'd be ssl'd, I'd betcha we'd get useful info out of the flow.	04:29
lifeless	stub: https://code.edge.launchpad.net/~stub/launchpad/cronscripts/+merge/35279 reviewedish	04:31
spm	wgrant: ref soyuz; yeah, I'm sure I'd seen comments around this bug before; but didn't have enough "knowledge" to find 'em. So figured a new one with some detailed timing info may help Julian. Being a private build I had to be a tad circumspect in what I put in unf. :-/	04:31
lifeless	spm: we don't see the response on the client, that's the point.	04:31
lifeless	spm: client -> server, pause, 'could not connect to launchpad'	04:31
spm	lifeless: the tcp connection stays open forever until it gets client killed?	04:32
lifeless	spm: so we don't get an oops, don't get zip	04:32
lifeless	spm: no we get the haproxy/apache lalala page	04:32
spm	after what time period, repeatedly?	04:32
lifeless	wgrant: please tell spm how to make it happen, then he'll see	04:32
spm	same time period? longer? shortly? varies by moon phase and tides?	04:32
lifeless	10 seconds sometimes apparently, though I think that was during the overload	04:32
lifeless	30 normallyish, I thinks.	04:33
spm	different browsers to make a diff?	04:33
lifeless	spm: don't think we've tested, because the browser is working fine.	04:33
spm	just wondering if it's an internal browser timeout that's then kicking the server error	04:34
lifeless	I don't even know how to parse that	04:34
spm	ie. are packets actually flowing and then dying.	04:34
spm	or no packets flowing at all	04:34
lifeless	spm: its http - request/response model	04:35
lifeless	spm: and apport does preuploading of the bugs, so it's not a big post.	04:35
spm	for sure; I'm looking at the tcp layer to get clues for wtf is happening at the http layer.	04:35
lifeless	wgrant: what's a package that this has happened to ?	04:35
lifeless	spm: I don't think it's an http problem myself, I think it's appserver lalalalala land time genuinely, but we don't see the oops clearly	04:36
spm	fwiw, it should be pretty simple in staging: intranettertubers -> apache -> appserver. no squid, no haproxy.	04:36
lifeless	ok	04:36
wgrant	spm, lifeless: I was trying to prepare a case, but staging is borked.	04:37
lifeless	so we're seeing the apache 'server fail' message	04:37
spm	actually there's a point. wonder if the oops are being generated; we're just not seeing 'em. looks...	04:37
lifeless	OOPS-1717E1745, OOPS-1717G1716, OOPS-1717H1810, OOPS-1717K1882, OOPS-1717L1760	04:37
lifeless	OOPS-1717E1218, OOPS-1717E1837, OOPS-1717K1949, OOPS-1717M1234, OOPS-1717N1211	04:37
lifeless	OOPS-1717D703, OOPS-1717G778, OOPS-1717K884, OOPS-1717K885	04:37
lifeless	they are listed as soft timeouts on +filebug	04:37
lifeless	we also have some 'OffsiteFormPostError'	04:38
lifeless	OOPS-1717M14	04:38
spm	process-apport-blobs.log is remarkably unhelpful	04:38
wgrant	process-apport-blobs is fine.	04:38
lifeless	that happens async	04:38
lifeless	it's all in the appserver at this point	04:39
* spm is doing the Sherlock Holmes method of debug - eliminate the working, to discover the not ;-)		04:39
lifeless	:)	04:40
wgrant	Can we expect staging to return at some point?	04:40
wgrant	It's been down a lot lately...	04:40
lifeless	wgrant: there's about 6 queries per attachment	04:40
wgrant	lifeless: Really?	04:40
spm	launchpad-trace.log has zip with 'filebug' in it. orsum	04:41
lifeless	INSERT INTO BugAttachment (message, bug, libraryfile, type, title) VALUES (%s, %s, %s, %s, %s) RETURNING BugAttachment.id	04:41
wgrant	Message, BugAttachment, BugNotification, FUCKLOADS * BugNotificationRecipient	04:41
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname	04:41
lifeless	SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released	04:41
spm	wgrant: for some reason, staging is being updated 'continuously' regardless of need. haven't had a chance to chase. yet.	04:41
lifeless	SELECT StructuralSubscription.blueprint_notification_level, StructuralSubscription.bug_notification_level, StructuralSubscription.date_created, StructuralSubscription.date_last_updated	04:41
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,	04:41
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname	04:41
wgrant	lifeless: 'ugh' comes to mind.	04:41
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,	04:41
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,	04:42
lifeless	SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released,	04:42
lifeless	SELECT StructuralSubscription.blueprint_notification_level, StructuralSubscription.bug_notification_level, StructuralSubscription.date_created, StructuralSubscription.date_last_updated,	04:42
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,	04:42
lifeless	SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,	04:42
lifeless	SELECT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1 FROM LibraryFileContent WHERE LibraryFileContent.id = %s LIMIT 1	04:42
lifeless	INSERT INTO BugActivity (oldvalue, datechanged, whatchanged, message, newvalue, bug, person) VALUES (%s, CURRENT_TIMESTAMP AT TIME ZONE 'UTC', %s, %s, %s, %s, %s) RETURNING	04:42
lifeless	INSERT INTO Message (datecreated, owner, subject, rfc822msgid) VALUES (CURRENT_TIMESTAMP AT TIME ZONE 'UTC', %s, %s, %s) RETURNING Message.id	04:42
lifeless	I'm going to stop there	04:43
lifeless	'lots'	04:43
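wgrant's breakdown (one Message, one BugAttachment, one BugNotification, plus a BugNotificationRecipient row per recipient) implies that writes grow with attachments times recipients. A back-of-the-envelope sketch, assuming one INSERT per row and ignoring the Person, BugTask and StructuralSubscription SELECTs visible in the trace; the function names are hypothetical:

```python
def attachment_inserts(recipients):
    """INSERTs for one attachment: Message, BugAttachment, BugNotification,
    plus one BugNotificationRecipient per recipient (assumed one row each)."""
    return 3 + recipients


def filebug_inserts(attachments, recipients):
    """Total attachment-related INSERTs for a single +filebug POST."""
    return attachments * attachment_inserts(recipients)
```

For example, 10 attachments on a package with 20 recipients would be 10 x 23 = 230 INSERTs before any SELECTs; those counts are illustrative, not taken from the failing requests.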
lifeless	wgrant: mailed you, its an open package, normal person	04:44
wgrant	lifeless: Is this from a hidden OOPS?	04:44
wgrant	Hm, that's only 500 queries.	04:45
wgrant	Is this a soft timeout?	04:45
lifeless	yes	04:45
wgrant	Ah.	04:45
lifeless	but we've no reason to assume that this is unrelated ;)	04:45
wgrant	I'm more concerned with the bad error than the fact that there's an error.	04:46
wgrant	We know why it's timing out.	04:46
wgrant	We don't know why it's timing out like this.	04:46
lifeless	oh it concerns me too	04:47
lifeless	wgrant: are you doing some perf stuff today? It is tuesday...	04:47
wgrant	Are we going to be able to work this out on staging soon, or should we do it on prod now?	04:49
lifeless	prod it up	04:49
wgrant	OK.	04:51
wgrant	https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a breaks pretty repeatedly.	04:51
lifeless	wgrant: what's your ip for spm to look in apache logs	04:51
wgrant	Isn't the token in the URL sufficient?	04:52
spm	"trust me, I'm a sysadmin"	04:52
wgrant	Otherwise 122.108.38.217	04:52
* lifeless looks in shock at spm		04:52
jtv	wgrant: heya	04:52
wgrant	jtv: Morning.	04:52
jtv	wgrant: may or may not be related but last night at least, we had some edge breakage where one of the edge instances reported the wrong revision.	04:53
jtv	So it'd say it was at r11532 but actually seemed to be stuck at r11522 like the rest.	04:53
wgrant	jtv: Unrelated -- this has been going on since ~10.09.	04:53
jtv	Oh ok	04:53
jtv	nm that then :)	04:54
wgrant	Heh.	04:54
spm	bleh. apache says '502'	04:56
wgrant	Nothing useful in the error log?	04:56
wgrant	Or does that mean the appserver said 502?	04:56
lifeless	spm: now the question is, did an oops get generated	04:57
spm	[14/Sep/2010:04:50:42 +0100]	04:57
* wgrant stabs BST in the face.		04:57
spm	indeed	04:58
spm	[Tue Sep 14 04:50:55 2010] [error] [client 122.108.38.217] (70014)End of file found: proxy: error reading status line from remote server localhost, referer: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a	04:59
spm	[Tue Sep 14 04:50:55 2010] [error] [client 122.108.38.217] proxy: Error reading from remote server returned by /ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a, referer: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a	04:59
spm	^^ errorlog	04:59
wgrant	Aha.	04:59
wgrant	So, the appserver died.	04:59
wgrant	Or otherwise closed the connection.	04:59
spm	hmm. haproxy/squid are in there somewhere. there may be in ter est ing comp li ca tions	05:00
wgrant	I thought they were on the other side.	05:00
wgrant	But I could well be wrong.	05:00
wgrant	So I guess you need to go through all the layers :(	05:03
lifeless	apache -> ha -> appserver	05:03
wgrant	With Squid in front of Apache?	05:04
spm	apache -> (squid)? -> ha -> app; POSTs don't go via squiddly	05:04
wgrant	Ahh	05:04
wgrant	Handy.	05:04
lifeless	nor do authenticated requests IIRC	05:04
spm	correct	05:05
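The production request path spm and lifeless describe can be written as a tiny routing function. This is a simplification of what is said here (Squid is skipped for POSTs and for authenticated requests), not the actual Apache configuration; its practical consequence is that whatever lies behind a +filebug POST's 502 has to be in haproxy or the appserver.

```python
def lpnet_route(method, authenticated):
    """Hops a request takes on lpnet, per the description in this log."""
    hops = ["apache"]
    if method != "POST" and not authenticated:
        hops.append("squid")  # only anonymous, non-POST traffic is cached
    hops += ["haproxy", "appserver"]
    return hops
```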
lifeless	thumper: https://bugs.edge.launchpad.net/launchpad-code/+bug/637758 please put the code walkthrough we did in there, for gary's info when he sees the other bug I'm filing :)	05:05
thumper	ok	05:06
lifeless	spm: how many appservers for lpnet ?	05:08
spm	15	05:09
poolie	how's it going, wallyworld?	05:09
wgrant	I was a bit surprised to see O oopses over the weekend.	05:09
lifeless	so 60 threads	05:09
wgrant	I didn't realise there were quite that many.	05:09
lifeless	thumper: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/637761	05:11
poolie	wgrant: because the counter was broke, or because we actually had 0?	05:13
poolie	remarkably good if os	05:13
poolie	*so	05:13
mwhudson	does that 15 include the login and shipit servers?	05:13
wgrant	poolie: O != 0	05:13
poolie	haha	05:14
poolie	O meaning some particular category?	05:14
lifeless	poolie: server ID in the oops code	05:14
lifeless	A, B, C, ...	05:14
spm	mwhudson: no, those are extras	05:14
mwhudson	wow	05:14
mwhudson	lots of hardware	05:14
lifeless	some machines have multiple instances	05:14
lifeless	but yes.	05:14
poolie	aren't some higher letters used for something other than a machine id?	05:15
poolie	or maybe that's a different field	05:15
lifeless	it's an arbitrary string	05:15
lifeless	e.g. XML	05:15
lifeless	date before, serial after	05:15
lifeless	thumper: rt 41361 if you want to high-pri it	05:15
thumper	lifeless: ack, dealing with a user on #launchpad right now	05:16
wgrant	The appservers are single letters. Others are longer strings (eg. CW, FTPMASTER, PPA)	05:16
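The OOPS id layout described above (date first, then a server id that is a single letter for appservers and a longer string such as CW, FTPMASTER or PPA elsewhere, then a serial) can be split with a short regex. A sketch, assuming server ids contain only upper-case letters; a server id containing digits would make the split ambiguous. The function names are hypothetical.

```python
import re
from collections import Counter

# Date part, then the server/instance id, then a per-server serial number.
OOPS_ID = re.compile(r"^OOPS-(?P<date>\d+)(?P<server>[A-Z]+)(?P<serial>\d+)$")


def parse_oops_id(oops_id):
    match = OOPS_ID.match(oops_id)
    if match is None:
        raise ValueError("not an OOPS id: %r" % (oops_id,))
    return match.group("date"), match.group("server"), match.group("serial")


def oopses_per_server(oops_ids):
    """Count OOPS ids by server, e.g. to see whether one appserver dominates."""
    return Counter(parse_oops_id(oops_id)[1] for oops_id in oops_ids)
```

Applied to the soft-timeout OOPSes lifeless listed earlier, the server letters span D, E, G, H, K, L, M and N, suggesting the problem is not confined to one host.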
spm	woo. progress. Sep 13 23:26:16 localhost haproxy[15039]: 127.0.0.1:39282 [13/Sep/2010:23:26:00.844] lpnet-app lpnet-app/potassium_lpnet_5 0/0/0/-1/15230 502 1184 - - SH-- 67/38/38/2/0 0/0 "POST /ubuntu/+source/linux/+filebug/8dc224d8-bf85-11df-806b-0025b3df357a HTTP/1.1"	05:16
lifeless	I wonder how hard it would be to port storm & zope to stackless	05:16
lifeless	or jython	05:17
wgrant	spm: But what does it mean?	05:17
lifeless	wgrant: it means SH--	05:17
wgrant	Heh.	05:17
lifeless	wgrant: one thing it tells us	05:17
spm	lpnet 5 on potassium did the "work"	05:17
lifeless	potassium should have the oops	05:17
wgrant	If there was an OOPS.	05:18
wgrant	I was hoping it would tell us on what terms the response ended.	05:18
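The haproxy line spm found does say how the response ended. In HAProxy's HTTP log format the timers are Tq/Tw/Tc/Tr/Tt in milliseconds, where Tr = -1 means no complete response headers ever arrived, and the termination state "SH" means the server side aborted the session while haproxy was still waiting for response headers. A minimal parser covering only this default layout (the regex and names are this sketch's own, and it assumes the two cookie-capture fields are single tokens):

```python
import re
from datetime import datetime, timedelta

HTTP_LOG = re.compile(
    r"\[(?P<accept>[^\]]+)\] (?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+) "
    r"(?P<status>\d{3}) (?P<bytes>\+?\d+) \S+ \S+ (?P<term>\S{4}) "
)
# First termination-state character: who ended the session (subset).
CAUSE = {"C": "client aborted", "S": "server aborted or refused",
         "P": "proxy aborted", "c": "client timeout", "s": "server timeout",
         "-": "normal completion"}
# Second character: what haproxy was waiting for at that moment (subset).
PHASE = {"R": "request", "Q": "queue", "C": "server connection",
         "H": "response headers", "D": "data", "L": "last data", "-": "normal"}


def parse_haproxy_http(line):
    m = HTTP_LOG.search(line)
    if m is None:
        return None
    accepted = datetime.strptime(m.group("accept"), "%d/%b/%Y:%H:%M:%S.%f")
    total_ms = int(m.group("Tt"))
    return {
        "server": m.group("server"),
        "status": int(m.group("status")),
        "Tr": int(m.group("Tr")),
        "Tt": total_ms,
        "ended": accepted + timedelta(milliseconds=total_ms),
        "cause": CAUSE.get(m.group("term")[0], "other"),
        "phase": PHASE.get(m.group("term")[1], "other"),
    }


LINE = ('Sep 13 23:26:16 localhost haproxy[15039]: 127.0.0.1:39282 '
        '[13/Sep/2010:23:26:00.844] lpnet-app lpnet-app/potassium_lpnet_5 '
        '0/0/0/-1/15230 502 1184 - - SH-- 67/38/38/2/0 0/0 '
        '"POST /ubuntu/+source/linux/+filebug/8dc224d8-bf85-11df-806b-0025b3df357a HTTP/1.1"')
```

Adding Tt to the accept time puts the end at 23:26:16.074, matching the syslog stamp: lpnet_5 closed the connection roughly 15 s in without sending any headers.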
thumper	lifeless: I'm thinking that we are seeing other xmlrpc problems from the bzr client	05:20
thumper	lifeless: as it does lp name resolution lookups	05:20
lifeless	wgrant: divorced	05:21
spm	lifeless: wgrant: it also tells us, this timed out after 15 seconds; some other logs around there have 300secs, so ... funky.	05:21
lifeless	thumper: sorry, can you expand on that please.	05:21
spm	or succeeded after 270 seconds; so I'd suggest this is unlikely (but not ruled out) to be a timeout issue directly.	05:22
wgrant	spm: Does potassium have an opinion?	05:22
lifeless	the 270 seconds will be a file attachment	05:22
lifeless	wgrant: it likes water	05:22
wgrant	spm: Also, it didn't time out after 15 seconds.	05:22
wgrant	I don't think.	05:23
wgrant	Because I get that error in less than 14 seconds.	05:23
spm	15230 <== ms, ~ 15 secs	05:23
lifeless	:23:26:00 -> 23:26:16	05:23
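The haproxy line spm pasted at 05:16 packs its timers into `0/0/0/-1/15230` and the termination state into `SH--`. Below is a minimal sketch that decodes those two fields. It assumes haproxy's HTTP log layout, where the timers are Tq/Tw/Tc/Tr/Tt in milliseconds and -1 means that phase never completed. It also assumes a termination state whose first two characters give the cause of the session end and the session state at that moment. The code tables cover only the characters relevant here.

```python
from datetime import datetime

# Partial code tables: only the characters relevant to this log line.
END_CAUSE = {
    "S": "aborted by the server",
    "C": "aborted by the client",
    "-": "normal completion",
}
SESSION_STATE = {
    "H": "waiting for response headers",
    "D": "in the data phase",
    "-": "normal",
}


def decode_timers(field):
    """Split an HTTP-mode timer field 'Tq/Tw/Tc/Tr/Tt' into named ints (ms)."""
    names = ("Tq", "Tw", "Tc", "Tr", "Tt")
    return dict(zip(names, (int(v) for v in field.split("/"))))


def decode_termination(state):
    """Decode the first two characters of a termination state such as 'SH--'."""
    return (END_CAUSE.get(state[0], "other"), SESSION_STATE.get(state[1], "other"))


timers = decode_timers("0/0/0/-1/15230")
print(timers["Tt"] / 1000.0)   # 15.23 seconds in total
print(timers["Tr"] == -1)      # True: no complete response headers were received
print(decode_termination("SH--"))

# Same duration from the accept time and the syslog timestamp, as worked out by hand.
accepted = datetime.strptime("23:26:00.844", "%H:%M:%S.%f")
logged = datetime.strptime("23:26:16", "%H:%M:%S")
print((logged - accepted).total_seconds())  # 15.156
```

The two durations agree to within the one-second resolution of the syslog timestamp. Tr of -1 together with `SH` says the server side closed before sending response headers. That fits wgrant's point that this is not a plain proxy timeout.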
wgrant	(now, at least -- not sure about that request)	05:23
spm	:-)	05:23
wgrant	So it's not a pure timeout.	05:23
spm	I'd be inclined to rule out an apache/haproxy/squid timeout; not exclude it, but look elsewhere.	05:24
lifeless	spm: is that url in the zope logs on potassium	05:24
spm	huh. not the same url, but related: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1718F124	05:29
lifeless	spm: it's odd for the request to dispatch and not be in the access log	05:31
lifeless	spm: wouldn't you say?	05:31
spm	which log? the lp one?	05:31
lifeless	lpnet_5	05:31
spm	oh yes.	05:31
lifeless	isn't there an access log for it?	05:32
spm	I've found it in the trace log	05:32
lifeless	\o/	05:32
spm	but not in launchpad-access5.log-20100914	05:32
lifeless	ok that OOPS is also a soft timeout	05:32
lifeless	I wonder if there is something special going on	05:32
wgrant	There is clearly something special going on.	05:32
lifeless	wgrant: I have an experiment you could do	05:32
wgrant	Sure.	05:32
spm	lifeless: https://pastebin.canonical.com/37129/	05:33
lifeless	configure launchpad.dev to have a 1 second soft timeout and 1.2 second hard timeout	05:33
lifeless	point apport at it and have fun	05:33
lifeless	wgrant: theory: hard timeouts are breaking in this case	05:33
lifeless	and - boom - I think I know why	05:33
thumper	lifeless: could have been something else, don't worry about it	05:33
wgrant	Let's see...	05:33
lifeless	requesttimeline stuff we saw yesterday.	05:33
wgrant	Yeah, I wondered if that was related.	05:34
* wgrant finds the timeouts.		05:34
lifeless	spm: is there an OverlappingActionError in the lpnet5 appserver log ?	05:34
lifeless	spm: (not the trace log)	05:34
wgrant	lifeless: I only see soft_request_timeout	05:35
wgrant	No hard_request_timeout.	05:35
lifeless	db_statement_timeout	05:35
wgrant	Oh.	05:35
wgrant	The comment on that is misleading.	05:35
lifeless	really?	05:35
* thumper looks at the daily timeout candidates		05:35
wgrant	# SQL statement timeout in milliseconds. If a statement	05:35
wgrant	# takes longer than this to execute, then it will be aborted.	05:35
wgrant	# A value of 0 turns off the timeout. If this value is not set,	05:35
wgrant	# PostgreSQL's default setting is used.	05:35
thumper	what is BranchSet:CollectionResource:#branches ?	05:35
lifeless	thumper: an API call	05:35
wgrant	thumper: API branch collection.	05:35
thumper	yeah but which?	05:35
lifeless	I've got a bug open on the clarity for that	05:35
lifeless	thumper: you have to look at the oops	05:36
wgrant	/branches	05:36
lifeless	which is why i have a bug open	05:36
lifeless	wgrant: any collection.	05:36
wgrant	lifeless: It says BranchSet	05:36
lifeless	ah true, damn your eyes.	05:36
wgrant	But yes, normally it's stupid.	05:36
spm	lifeless: not that I can find. afaict, that "request" doesn't exist in the lpnet5 access log. even looking at the full 10 min period around	05:40
lifeless	wgrant: http://pastebin.com/naNudsp3	05:40
lifeless	spm: check nohup	05:40
spm	hrm. point	05:40
wgrant	Proxy Error	05:40
wgrant	BOOM	05:40
lifeless	for OverlappingActionError	05:40
wgrant	OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0xe91ebac>, <lp.services.timeline.timedaction.TimedAction object at 0xe71ebec>)	05:40
lifeless	wgrant: you reproduced ?	05:40
wgrant	You win.	05:40
lifeless	wgrant: \o/	05:41
wgrant	Question is... which are they...	05:41
spm	1st problem. nohup doesn't log times.	05:41
lifeless	apply my patch	05:41
wgrant	Ah!	05:41
lifeless	spm: never mind, my WAG was spot on	05:41
spm	raise OverlappingActionError(self.actions[-1], result)	05:41
spm	OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0x2b034a6f7990>, <lp.services.timeline.timedaction.TimedAction object at 0x1338fd90>)	05:42
spm	raise OverlappingActionError(self.actions[-1], result)	05:42
spm	OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0x13ff2f90>, <lp.services.timeline.timedaction.TimedAction object at 0x147b1f50>)	05:42
spm	lifeless: cool, fwiw tho ^^	05:42
lifeless	spm: thanks	05:42
=== jerinas is now known as j24
wgrant	Uh.	05:42
wgrant	OverlappingActionError: (<TimedAction SQL-launchpad-main-master[UPDATE Bug SET heat_]>, <TimedAction SQL-session[ UPDATE ]>)	05:42
wgrant	How?	05:42
lifeless	grah	05:42
wgrant	Oh.	05:42
wgrant	Maybe when it times out it doesn't close the action?	05:42
lifeless	see the comment in errorlog about this	05:42
wgrant	Which?	05:43
wgrant	Ah.	05:43
wgrant	I see.	05:43
lifeless	sorry	05:43
lifeless	in logTuple	05:43
lifeless	storm tracers are not a stack.	05:43
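The traceback wgrant got at 05:42 can be explained by his guess that a statement which times out never closes its timeline action. Below is a toy model of that failure. It assumes only the rule visible in the traceback (`raise OverlappingActionError(self.actions[-1], result)`): a new action may not start while the last one is still unfinished. The classes are simplified, hypothetical stand-ins, not the `lp.services.timeline` code, and `StatementTimeout` is invented for illustration.

```python
class OverlappingActionError(Exception):
    """Raised when an action starts while the previous one is still open."""


class TimedAction:
    def __init__(self, category, detail):
        self.category = category
        self.detail = detail
        self.finished = False

    def finish(self):
        self.finished = True

    def __repr__(self):
        return "<TimedAction %s[%s]>" % (self.category, self.detail)


class Timeline:
    """Toy timeline: each action must finish before the next one starts."""

    def __init__(self):
        self.actions = []

    def start(self, category, detail):
        result = TimedAction(category, detail)
        if self.actions and not self.actions[-1].finished:
            raise OverlappingActionError(self.actions[-1], result)
        self.actions.append(result)
        return result


class StatementTimeout(Exception):
    """Hypothetical stand-in for a statement being cancelled by a timeout."""


def run_statement(timeline, category, sql, timed_out=False):
    action = timeline.start(category, sql)
    if timed_out:
        # The timeout fires before finish() is reached, so the action stays open.
        raise StatementTimeout(sql)
    action.finish()


timeline = Timeline()
try:
    run_statement(timeline, "SQL-main-master", "UPDATE Bug SET heat_", timed_out=True)
except StatementTimeout:
    pass  # the request carries on and, for example, saves its session
try:
    run_statement(timeline, "SQL-session", " UPDATE ")
except OverlappingActionError as error:
    print(error.args)  # the still-open Bug UPDATE, then the session UPDATE
```

The pair of actions in the model mirrors the real traceback: the Bug `UPDATE` that never finished, followed by the session `UPDATE` that tripped the check.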
lifeless	our having a timeout tracer and a log tracer doesn't work as well as it should in theory.	05:45
lifeless	I think I'm going to create a stack-lock tracer that delegates to two other tracers and combine them.	05:45
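One possible reading of lifeless's "tracer that delegates to two other tracers" is a composite tracer with stack semantics. Its start hooks run in order and its finish hooks run in reverse, and every delegate that started is guaranteed a finish call, even when a later delegate raises. The hook names follow Storm's statement-tracer hooks (`connection_raw_execute`, `connection_raw_execute_success`, `connection_raw_execute_error`). `StackedTracer`, `RecordingTracer`, `BudgetTracer` and `StatementTimeout` are hypothetical stand-ins, and nothing here is registered with Storm.

```python
class StatementTimeout(Exception):
    """Hypothetical error raised when the request's time budget is spent."""


class RecordingTracer:
    """Stand-in for the logging tracer: opens one action per statement."""

    def __init__(self):
        self.open = 0
        self.events = []

    def connection_raw_execute(self, connection, raw_cursor, statement, params):
        self.open += 1
        self.events.append("start " + statement)

    def connection_raw_execute_success(self, connection, raw_cursor, statement, params):
        self.open -= 1
        self.events.append("finish " + statement)

    def connection_raw_execute_error(self, connection, raw_cursor, statement, params, error):
        self.open -= 1
        self.events.append("error " + statement)


class BudgetTracer:
    """Stand-in for the timeout tracer: refuses to start when out of time."""

    def __init__(self, exhausted):
        self.exhausted = exhausted

    def connection_raw_execute(self, connection, raw_cursor, statement, params):
        if self.exhausted:
            raise StatementTimeout(statement)

    def connection_raw_execute_success(self, *args):
        pass

    def connection_raw_execute_error(self, *args):
        pass


class StackedTracer:
    """Starts delegates in order, finishes them in reverse, never skips one."""

    def __init__(self, *tracers):
        self.tracers = list(tracers)
        self.entered = []

    def connection_raw_execute(self, connection, raw_cursor, statement, params):
        self.entered = []
        for tracer in self.tracers:
            try:
                tracer.connection_raw_execute(connection, raw_cursor, statement, params)
            except Exception as error:
                # Unwind the tracers that already started, newest first.
                self._finish("connection_raw_execute_error",
                             (connection, raw_cursor, statement, params, error),
                             reraise=False)
                raise
            self.entered.append(tracer)

    def connection_raw_execute_success(self, connection, raw_cursor, statement, params):
        self._finish("connection_raw_execute_success",
                     (connection, raw_cursor, statement, params))

    def connection_raw_execute_error(self, connection, raw_cursor, statement, params, error):
        self._finish("connection_raw_execute_error",
                     (connection, raw_cursor, statement, params, error))

    def _finish(self, hook, args, reraise=True):
        first_error = None
        for tracer in reversed(self.entered):
            try:
                getattr(tracer, hook)(*args)
            except Exception as error:
                if first_error is None:
                    first_error = error
        self.entered = []
        if reraise and first_error is not None:
            raise first_error


log_tracer = RecordingTracer()
stacked = StackedTracer(log_tracer, BudgetTracer(exhausted=True))
try:
    stacked.connection_raw_execute(None, None, "UPDATE Bug", ())
except StatementTimeout:
    pass
print(log_tracer.open, log_tracer.events)  # 0 ['start UPDATE Bug', 'error UPDATE Bug']
```

The design choice is that the composite owns the pairing of start and finish calls. Individual tracers then never have to assume anything about the order in which tracers were installed.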
lifeless	long term.	05:45
lifeless	for now, lets get fugly, lets get fugly.	05:45
=== jerinas_ is now known as j24
* spm decides now would be a good time to run away for lunch		05:46
lifeless	spm: I have a cowboy	05:47
lifeless	spm: when you return	05:47
lifeless	wgrant: what was the bug for this?	05:47
spm	lifeless: cool; I assume by then you'll also have an incident report to go with? ;-)	05:47
wgrant	Bug 636801	05:47
lifeless	spm: I can make one	05:51
lifeless	wgrant: please confirm that http://pastebin.com/iPnkpPpF fixes it	05:52
wgrant	lifeless: Great success.	05:55
lifeless	spm: we have the thing to cowboy	06:02
wgrant	stub: Hi.	06:22
stub	yo	06:22
wgrant	stub: The multiple OpenID identifiers stuff has had some interesting consequences.	06:22
wgrant	stub: In particular the bit where it respects email address linkage more than identifier linkage.	06:23
wgrant	Which results in people being logged in as the wrong person, and the real person OOPSing because they no longer have an identifier.	06:23
wgrant	I guess the users can fix it by merging the accounts... but I'm not sure that respecting the email address in the first place is a good idea.	06:24
lifeless	I don't get not respect.	06:24
wgrant	Hm?	06:24
lifeless	playing with words	06:24
wgrant	Hah.	06:25
lifeless	badly	06:25
stub	I don't understand what the problem case is. If you are logging into the OP using an email address, you want to login as the Launchpad Person attached to that email address.	06:26
stub	I suspect the cases that are broken were broken already, caused when LP accounts were merged (the main bug this change was supposed to tackle)	06:28
lifeless	spm: when you return : the thing to cowboy is https://code.launchpad.net/~lifeless/launchpad/cp/+merge/35364	06:28
lifeless	spm: it's going through the motions now to get into prod-devel, and I'll request a normal reroll tomorrow or so with it in it, but we should fix it now.	06:29
lifeless	wgrant: so, care to work on filebug ?	06:29
lifeless	wgrant: -huge- room for improvement.	06:29
lifeless	spm: and for edge, we need https://code.launchpad.net/~lifeless/launchpad/oops/+merge/35363	06:32
lifeless	again, it's in the pipe to be done the normal way	06:32
wgrant	stub: In the cases I know of, the user had changed their LP email address to blah+launchpad@some.domain	06:32
wgrant	stub: A package or translations import then recreated blah@some.domain	06:33
wgrant	So the next time they log in, they land in a different account.	06:33
stub	I see.	06:33
wgrant	What is the purpose of the email address match?	06:34
stub	Because people can change their email details in the OpenID Provider.	06:34
stub	Edit your emails, create a new account with the old email, be unable to log into Launchpad.	06:35
wgrant	Huh?	06:36
wgrant	If the LP person was tied to an identifier, then email addresses don't matter.	06:36
wgrant	It could be a little confusing in some cases, until the OpenID associations are listed clearly... but it wouldn't do strange things like this.	06:36
lifeless	-> shops. If issues, ring me	06:38
lifeless	oh, this is needed primarily on appservers, so just them for now.	06:38
stub	Create foo@example.com in the OP. Log into Launchpad. Edit the account to be bar@example.com. Create a new account for foo@example.com. Now if you log in as foo@example.com, you can't log into Launchpad as your email address in Launchpad is associated with a different identifier.	06:38
stub	Although the way we really triggered this was account merging.	06:39
wgrant	lifeless: ECHAN?	06:39
wgrant	stub: Ah, I see.	06:39
wgrant	Hmm.	06:39
stub	People had multiple accounts in the OP, and a Launchpad person with multiple email addresses. They had to log into the OP using the email address that happens to be linked to the correct Person.	06:39
wgrant	I wonder what should be done here.	06:41
wgrant	I cannot see a good solution.	06:42
stub	Because we can now link multiple identifiers to a Person, and because person merge does the right thing now, we might be able to drop some of the repair work login does now if the solution is causing worse problems.	06:42
wgrant	Does the use case you provided above have any legitimate reason for occurring?	06:43
stub	If it does, it is pretty obscure.	06:43
wgrant	Yes...	06:43
wgrant	So I wonder if the repair is useful.	06:43
wgrant	Or if it should just tell you that you are doing bad things.	06:43
stub	We are in a halfway stage to becoming a real OpenID consumer. I think the problems go away if we stop trusting the Canonical SSO and instead implement the work flow for attaching OpenID identifiers to Launchpad accounts. But there is a fair bit of work that needs doing first (shipit and our test infrastructure make this more complicated)	06:46
wgrant	Right, this is what I was thinking.	06:46
wgrant	Except for the test infrastructure.	06:46
wgrant	What's the issue with that?	06:46
wgrant	Does it do something stupid like using basic auth?	06:46
stub	The OpenID our tests use is the old Launchpad OP code. It uses the same underlying database tables, so it is all tied up in knots.	06:47
wgrant	Ah, right.	06:47
wgrant	stub: So, what should be done? Advise the users to merge?	06:57
stub	At the moment, yes.	06:58
wgrant	Thanks.	06:59
stub	That might be the preferred solution too, as we ensure the SSO database and Launchpad database remain in sync. I'm not sure.	07:00
* mwhudson eods		07:00
wgrant	The separation needs to eventually be far more obvious.	07:00
lifeless	wgrant: not echan, info for spm	07:15
spm	hm?	07:15
=== almaisan-away is now known as al-maisan
lifeless	spm: the cowboy	07:15
spm	ah	07:15
lifeless	spm: see all the backlog	07:15
lifeless	spm: we're missing out on many OOPS at the moment, it's rather important	07:16
spm	yarp; just getting it all together atm	07:16
spm	hrm. complex patch that one	07:16
lifeless	hah	07:17
stub	lifeless: by 'fail closed' do you mean if we can't load or parse the config file we should default to enabled?	07:50
stub	I can argue that either way. losas might have an opinion.	07:50
lifeless	I mean we should not run unless permitted to	07:51
spm	losas have lots of opinions, some of them are even relevant	07:51
lifeless	if there's something wrong, running is likely to add to the problem	07:51
lifeless	(as a default, for most cronscripts)	07:51
stub	ok. My reasoning for the current behaviour is a stuffup in the config mechanism (Apache dead, syntax error) shouldn't bring the Launchpad systems down. And your reasoning is just as valid :-) It does get noisy if things fail, but that is about it.	07:54
lifeless	it's just cronscripts now, isn't it?	07:55
stub	I could make it a config option and make it somebody else's problem ;)	07:55
stub	Yes - this is just cronscripts.	07:55
lifeless	so if all the cronscripts are down when apache is down, I don't think it's really a problem :)	07:56
lifeless	I mean, at that point, apache is down :)	07:56
stub	I think a typo or mistake is more likely - this config file is being edited by humans.	07:56
lifeless	so there are two places that can occur	07:57
lifeless	the lazr config providing the url	07:57
lifeless	and the referenced ini file	07:58
lifeless	for the former, it should change nearly never	07:58
lifeless	for the latter, we could provide a small lint-and-update tool	07:59
lifeless	spm: what do you think	08:00
spm	I don't think. sysadmin.	08:00
lifeless	and if someone paid you to ? :P	08:00
spm	with chocolate? I'd pretend to think REAL hard.	08:01
lifeless	hhha	08:01
spm	not sure I fully follow the issue? Q&D summary? something about not running cronscripts if parts of LP are borked?	08:02
lifeless	ok	08:02
lifeless	so there is a new ini file coming in	08:02
lifeless	it will disable all cronscripts in one hit, no need to touch cron	08:02
lifeless	if there is an error obtaining it (over http) and parsing it, what should happen: should the scripts run, or not run	08:03
spm	where "one hit" is ~ 26 easily accessible servers and 4 difficult ones?	08:04
lifeless	yes	08:04
spm	cool, just ensuring we appreciate what "one hit" means :-)	08:04
lifeless	as long as they can access it over http inside the network.	08:04
spm	Oh! I see. Right! thats.... funky.	08:04
lifeless	you don't like?	08:05
spm	No I do like	08:05
spm	the scripts should do whatever the last invocation was. which sucks, because now you're maintaining state as well.	08:05
spm	ie. network hiccups will occur. we don't want to clobber LP by such a transient	08:06
lifeless	spm: so tcp syn will retry three times anyway	08:07
spm	as a thought: you'd have 2 checks. the official "check http"; with a secondary, check local/state. We can script update the state if necessary - eg apache update	08:07
spm	even so. soyuz used to barf badly all the time on funky network woes.	08:07
spm	I'm thinking of something more resilient than what tcp et al give on their own.	08:08
spm	does that make sense? crackful?	08:08
lifeless	thinking	08:08
spm	lifeless: that should be done on PROD btw	08:09
lifeless	so, for daily and hourly crons	08:09
lifeless	wgrant: please break it	08:09
lifeless	wgrant: prod	08:09
wgrant	lifeless: OK.	08:09
spm	not edge, haven't done edge yet	08:09
lifeless	spm: for daily and hourly cronscripts, not running is fairly significant	08:10
lifeless	spm: OTOH daily and hourly things are background tasks mainly, and oops reporting etc is separate and not driven by this	08:10
wgrant	lifeless: Success.	08:10
wgrant	OOPS-1718H415	08:10
lifeless	spm: \o/	08:10
spm	right, which is why I'd want them to fail as safely as possible - via a local state "what did I do last time?" <== but state is also likely to be shared (maybe??) so likely updated more frequently.	08:11
spm	\o/	08:11
lifeless	so, personal opinion, if things are fucked royally I'd rather have the cronscripts not running to facilitate recovery.	08:11
spm	I'd probably suggest shying away from per-job state; go for a global	08:11
lifeless	as long as when they don't run they log it; if we find that network transients are an issue, we can iterate.	08:12
wgrant	As long as it keeps attempting to retrieve it frequently...	08:12
spm	and if things are that bad; humans are involved. and we can set the state file; or 'script roll out' an updated state file	08:12
stub	I understand the 'what I did last time' argument, but the extra complexity makes diagnosis complex. I'd say keep it as simple as we can.	08:12
lifeless	stub: I agree.	08:12
spm	in that case, fail quiet; don't run.	08:12
stub	But if that means maintaining state, we can do that (the cronscript can remember its last invocation on the file system, in /var/run or somewhere).	08:12
lifeless	spm: I'd fail closed, don't run, and log the failure.	08:13
lifeless	spm: why do you say fail quiet ?	08:13
stub	spm: At the moment, if the config file cannot be found (404) we emit a DEBUG and enable. For any other error, including syntax errors, we emit an ERROR traceback and enable.	08:14
spm	probably via some mechanism that makes it easy to nagios alert	08:14
spm	lifeless: don't cronspam bombard == quiet	08:14
lifeless	spm: that's in tension with 'diagnosable'	08:14
lifeless	spm: logging to a file would be ok?	08:14
lifeless	spm: also remember that this is on 404s and syntax errors	08:15
lifeless	spm: so things are messed up if it's happening at all	08:15
spm	lifeless: you're talking > 260 cron tasks. if we get a global fail, that's a LOT of spam to wade thru	08:15
spm	logging to disk is ok	08:16
lifeless	so, if we said:	08:16
lifeless	- on failure to get ini and parse, don't run.	08:16
lifeless	- log that to disk, not stderr	08:16
lifeless	- nagios should be monitoring those log files	08:16
lifeless	would that make sense to you?	08:16
spm	yup	08:16
lifeless	stub: to you ?	08:17
spm	* log that to disk, not stderr: like oops' perhaps in a vague handwavy way. known dir; date time stamped; we can nagios alert on files between 0-60 mins old type of thing	08:17
spm	setup a red button "archive and zot the cron logs" so the "all fixed" is not as painful	08:18
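spm's suggestion at 08:17 was to put failure logs in a known directory, date/time stamped, and alert on files between 0 and 60 minutes old. Below is a minimal sketch of such a check. It follows the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The directory layout and the function names are hypothetical. A wrapper script would print the message and exit with the returned status.

```python
import os
import time

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def recent_logs(directory, max_age_seconds, now):
    """Names of regular files in directory modified within max_age_seconds."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) <= max_age_seconds:
            found.append(name)
    return found


def check_cron_failure_logs(directory, max_age_seconds=3600, now=None):
    """Return (status, message): CRITICAL if any failure log is recent enough."""
    now = time.time() if now is None else now
    try:
        recent = recent_logs(directory, max_age_seconds, now)
    except OSError as error:
        return UNKNOWN, "UNKNOWN: cannot read %s: %s" % (directory, error)
    if recent:
        return CRITICAL, "CRITICAL: %d recent cron failure log(s): %s" % (
            len(recent), ", ".join(recent))
    return OK, "OK: no cron failure logs in the last %d minutes" % (max_age_seconds // 60)
```

Ageing out by modification time means the alert clears on its own an hour after the last failure. This pairs with spm's "archive and zot the cron logs" button for clearing it sooner.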
stub	Seems a little fragile relying on nagios like this. It is looking for an error rather than checking something is reacting correctly.	08:18
spm	stub: how so?	08:18
stub	We are relying on the cronscript to log things correctly and ...	08:19
stub	oh.. hang on.	08:19
spm	being aware of assumptions: I'd assume the apache/configs setup is monitored	08:19
stub	We already alert if scripts stop running.	08:19
spm	only via scriptactivity, but yes.	08:19
spm	it's arguable if that should be nagios'd. atm I'd be vehemently against it.	08:20
stub	So we could just disable silently (DEBUG or INFO - whatever) and rely on the existing checks to beep if things remain screwed up for too long.	08:20
lifeless	works for me	08:20
spm	that works	08:20
lifeless	as long as someone coming along to look can look at a file on disk to see what's up.	08:20
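The policy agreed here is to fail closed when the control file cannot be fetched or parsed, log that quietly, and rely on the existing script-activity alerts if scripts stay disabled too long. It could look roughly like the sketch below. The URL, the ini layout (a `[DEFAULT]` section plus optional per-script sections with an `enabled` key) and the function names are all hypothetical. The fetch function is injectable so the logic can be exercised without a network. Treating a script the file does not mention as enabled is also an assumption.

```python
import configparser
import logging
import urllib.request

# Hypothetical location of the shared control file.
CONTROL_URL = "http://cronscripts.example.internal/cronscripts.ini"

log = logging.getLogger("cronscript-control")


def fetch_over_http(url, timeout=10):
    """Return the control file body; raises OSError (HTTPError, URLError, timeouts)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def cronscript_enabled(script_name, fetch=fetch_over_http, url=CONTROL_URL):
    """Fail closed: if the control file can't be fetched or parsed, don't run."""
    try:
        parser = configparser.ConfigParser()
        parser.read_string(fetch(url))
        section = script_name if parser.has_section(script_name) else "DEFAULT"
        # Assumption: a script the file doesn't mention is allowed to run.
        return parser.getboolean(section, "enabled", fallback=True)
    except (OSError, configparser.Error, ValueError) as error:
        # Quiet by design: one INFO line in the script's own log, no cronspam.
        log.info("%s not running: cannot use %s (%s)", script_name, url, error)
        return False
```

A 404, a network failure, a missing section header and an unparseable `enabled` value all take the same "don't run" path. That matches stub's wish to keep the behaviour simple to diagnose.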
spm	I'm only aware of one script that really should have a nagios check against it - branch-merge-proposals	08:20
lifeless	wgrant:	08:21
lifeless	SQL time: 10701 ms	08:21
lifeless	Non-sql time: 4505 ms	08:21
lifeless	Total time: 15206 ms	08:21
lifeless	Statement Count: 536	08:21
spm	it's timely, and is also requiring human intervention on fail	08:21
stub	I think the 'too much spam to wade through' indicates too many scripts are emitting their logs via email. Perhaps they should log to file instead of stdout/email and losas look on disk when the alerts ping.	08:22
lifeless	stub: mthaddon wants that	08:23
lifeless	stub: for all basically	08:23
* StevenK kicks webservice.get()		08:23
spm	I think pretty much every LP script logs to STDERR by default, which is known as Doing It Wrong	08:23
StevenK	First call works, second call returns AttributeError: 'thread._local' object has no attribute 'interaction' :-(	08:23
stub	spm: There is --logfile available right now to log elsewhere.	08:23
lifeless	StevenK: login()	08:23
spm	stub: I'm sure we've used that elsewhere and it doesn't quite work as described...	08:24
StevenK	lifeless: But why does the first call to webservice.get() work fine?	08:24
lifeless	you're already logged in	08:24
StevenK	And how does that log me out?	08:24
lifeless	end of the request	08:24
lifeless	feel free to clean this up	08:24
stub	spm: That would be a bug (not surprising as nobody ever used --logfile after it got implemented 5 years ago)	08:24
spm	:-)	08:25
stub	--log-file=LOG_FILE Send log to the given file, rather than stderr.	08:25
spm	Ahh. That's not helpful as is. what you want is all "normal output" to go to stdout, or the above option. any real errors that REQUIRE manual intervention get sent to STDERR.	08:26
spm	atm we get craploads of "INFO" messages or "CRITICAL ERRORS" that are nothing of the sort, sent to STDERR	08:26
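spm's proposed split sends routine output to stdout (or to the `--log-file` target) and reserves stderr for errors that really need manual intervention. It can be sketched with the standard `logging` module, as below. `configure_script_logging` and `BelowLevelFilter` are hypothetical names, not the Launchpad script helpers. Using ERROR as the "needs a human" threshold is an assumption.

```python
import logging
import sys


class BelowLevelFilter(logging.Filter):
    """Let through only records below a threshold level."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno < self.level


def configure_script_logging(logger, log_file=None, alert_level=logging.ERROR,
                             stdout=None, stderr=None):
    """Routine records go to stdout (or log_file); alert-worthy ones to stderr."""
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    if log_file:
        routine = logging.FileHandler(log_file)
    else:
        routine = logging.StreamHandler(stdout if stdout is not None else sys.stdout)
    routine.addFilter(BelowLevelFilter(alert_level))
    routine.setFormatter(formatter)
    alerts = logging.StreamHandler(stderr if stderr is not None else sys.stderr)
    alerts.setLevel(alert_level)
    alerts.setFormatter(formatter)
    logger.setLevel(logging.INFO)
    logger.addHandler(routine)
    logger.addHandler(alerts)
    return logger
```

Each record lands on exactly one stream. A crontab entry could then redirect stdout to a file and leave only stderr to be mailed, though that wiring is also an assumption here.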
lifeless	spm: icing on edge is borked	08:26
poolie	stub: yeah we were just talking about this	08:26
poolie	lifeless: me too	08:26
lifeless	spm: we haven't got that part of the deployment right yet	08:27
stub	spm: ok. That is a change, but we could do it globally for all scripts at once. The desired behaviour will need to be documented (bug report?)	08:27
spm	lifeless: bleh. edge is autodeploying atm	08:27
lifeless	yes	08:27
lifeless	spm: need a new RT ?	08:27
poolie	i'm also getting a 'The following errors were encountered:	08:27
poolie	Server error, please contact an administrator.	08:27
poolie	OK	08:27
poolie	'	08:27
spm	stub: i think this is a bug I logged about 2 years ago... :-)	08:27
poolie	in an ajax thing	08:27
spm	poolie: I can't do anything until you contact me per the above error. sorry.	08:27
stub	spm: Yes - I was expecting production scripts to all be run with -q	08:27
lifeless	poolie: thats possibly/probably the thing we're deploying to fix	08:27
poolie	:)	08:28
poolie	like a finely-oiled machine	08:28
* spm watches poolie hop on his bike to drive down here and slap me upside the head....		08:28
poolie	i would but it's a bit cold and wet	08:28
poolie	and probably doubly so down there	08:28
spm	"horrible" <== and not just saying to keep you away	08:29
lifeless	spm: so now I have 11542 with no icing	08:30
lifeless	spm: can you check the apache ?	08:30
spm	yarp	08:30
spm	hrm. supposedly we are 11542 everywhere	08:31
lifeless	yes	08:31
lifeless	but the icing ain't	08:31
spm	isn't this the build farkup we saw the other week?	08:31
lifeless	https://bugs.edge.launchpad.net/+icing/rev11542/combo.css	08:31
lifeless	spm: that's meant to be static, from apache	08:31
spm	le sigh	08:31
StevenK	lifeless: With webservice.get(), login(), webservice.get() I get newInteraction called while another interaction is active. for the second .get	08:31
lifeless	StevenK: odd	08:32
lifeless	StevenK: perhaps the get() isn't what was throwing.	08:32
lifeless	StevenK: perhaps you're actually trying to access something in between the ( and )	08:32
StevenK	lifeless: Of course I am	08:32
lifeless	don't	08:32
lifeless	calculate the url outside of the function call	08:32
lifeless	because ...	08:33
lifeless	accessing objects requires a participation	08:33
spm	oh awesome. that file doesn't exist on edge.	08:36
lifeless	say what ?	08:36
stub	lifeless: I'll land the branch I have with the abspath and maybe the timeout if it is simple, as it is still an improvement, and open bugs and kanban tickets on the next set of changes.	08:36
lifeless	stub: thanks	08:36
spm	lifeless: exactly that. the folder is there, that particular file (and possibly some small number of others) aren't.	08:36
lifeless	spm: they are built by 'make compile'	08:36
lifeless	or possibly make build	08:37
spm	apparently not in this case....	08:37
spm	oh ffs. the make build blewup again.	08:39
lifeless	spm: does the deploy script abort when that happens ?	08:39
wgrant	Not my fault, this time, though!	08:39
spm	lifeless: https://pastebin.canonical.com/37132/	08:39
spm	the script can't - the make is continuing, so the deploy script doesn't know it's aborted	08:40
spm	(AIUI, IMBW)	08:40
lifeless	spm: filing an RT - it has to.	08:40
spm	No. I lie. It does see the error.	08:40
lifeless	make: *** [compile] Error 1	08:40
lifeless	Error 2 running ssh launchpad@banana make -C /srv/edge.launchpad.net/edge/launchpad build LPCONFIG=edge1	08:40
lifeless	Running ssh launchpad@banana "rm -rf /srv/edge.launchpad.net/edge/launchpad && ln -s /srv/edge.launchpad.net/edge/launchpad-rev-11542 /srv/edge.launchpad.net/edge/launchpad"	08:40
lifeless	it's not halting!	08:40
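In the deploy output above, `make ... build` fails with `Error 2`, yet the next command, which removes and repoints the `launchpad` symlink, runs anyway. Below is a hedged sketch of a step runner that stops at the first non-zero exit, using `subprocess.run(..., check=True)`. It is not the LOSA deployment script, whose internals don't appear in this log. `run_step` and `deploy` are hypothetical names.

```python
import subprocess
import sys


def run_step(command):
    """Run one deployment step; raises CalledProcessError on a non-zero exit."""
    print("Running", " ".join(command))
    subprocess.run(command, check=True)


def deploy(steps):
    """Run steps in order and stop at the first failure, so later steps
    (such as repointing the live symlink) never run after a failed build."""
    for step in steps:
        try:
            run_step(step)
        except subprocess.CalledProcessError as error:
            print("Aborting: %s exited with status %d"
                  % (" ".join(error.cmd), error.returncode), file=sys.stderr)
            return False
    return True
```

ssh exits with the remote command's status. So if the two `ssh launchpad@banana ...` commands from the log were passed as separate steps, the symlink step would never run after `make` failed.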
spm	yeah...	08:40
spm	yeah x 3	08:41
spm	ok. later problem. reverting.	08:41
lifeless	spm: why was it 11542 that rolled out ?	08:43
spm	no idea atm	08:43
lifeless	ok, that's tip of stable	08:43
lifeless	fair enough (but wtf with the error)	08:43
spm	oki, apaches rolled back; doing the app servers	08:43
adeuring	good morning	08:43
spm	truly. it's supposed to abort on errors. we use this logic all over the place. And it works on other systems :-(	08:44
lifeless	spm: can you do 11538 which I think was previous with the patch applied; and we may need to stop other edge rollouts till we fix.	08:44
spm	heya adeuring	08:44
spm	lifeless: that I have/am	08:44
spm	lifeless: launchpad@banana:/srv/edge.launchpad.net/edge$ rm launchpad ; ln -s /srv/edge.launchpad.net/edge/launchpad-rev-11538 launchpad	08:44
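The manual rollback above (`rm launchpad ; ln -s ...`) leaves a brief window in which `launchpad` does not exist. Below is a sketch of the usual create-then-rename pattern. It builds the new link under a temporary name, then uses `os.replace` to move it over the old one, which on POSIX is a single `rename(2)` and therefore atomic. The `.new` suffix is an arbitrary choice. The sketch assumes the existing `launchpad` path is a symlink, not a real directory.

```python
import os


def switch_symlink(link_path, target):
    """Repoint link_path at target with no window where link_path is missing."""
    tmp_link = link_path + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted earlier switch
    os.symlink(target, tmp_link)
    # rename(2) replaces the old symlink itself; it does not follow it.
    os.replace(tmp_link, link_path)
```

Called as `switch_symlink("/srv/edge.launchpad.net/edge/launchpad", "/srv/edge.launchpad.net/edge/launchpad-rev-11538")`, it performs the same rollback with no gap.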
lifeless	spm: I bet its python2.5	08:46
spm	lifeless: shrug, I just blame wgrant for everything. faster, easier, if less accurate	08:46
lifeless	spm: can you do this on a machine that's still 2.5 ?	08:46
wgrant	But it was me (and python2.5) last time this happened.	08:46
wgrant	So it's quite accurate.	08:46
spm	haha. lets not let FACTS get in the way here!!!!	08:47
lifeless	spm: find . -name 'potemplate.py'	08:47
spm	let me finish getting the apps restarted :-)	08:47
lifeless	spm: then, for each reported file, cd to that dir and run 'python -c "import potemplate"'	08:47
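lifeless's check is to find each `potemplate.py` and import it with the old interpreter. It generalises to compiling every module with the target interpreter. Below is a sketch that shells out to a given interpreter once per file. The `python2.5` default, the function names and the inline check program are assumptions, and the inline program sticks to syntax Python 2.5 accepts. It only catches syntax the old interpreter rejects. A bare `compile()` will not flag use of a library API added after 2.5, which is why actually importing the module, as suggested, is the stronger test.

```python
import os
import subprocess

# Runs inside the *target* interpreter, so it avoids post-2.5 syntax.
CHECK = "import sys; p = sys.argv[1]; compile(open(p).read() + '\\n', p, 'exec')"


def find_sources(root, filename=None):
    """Yield .py files under root, optionally only those with a given name."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py") and (filename is None or name == filename):
                yield os.path.join(dirpath, name)


def syntax_failures(root, interpreter="python2.5", filename=None):
    """Return (path, last stderr line) for each file the interpreter rejects."""
    failures = []
    for path in find_sources(root, filename):
        result = subprocess.run([interpreter, "-c", CHECK, path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)
        if result.returncode != 0:
            failures.append((path, result.stderr.strip().splitlines()[-1:]))
    return failures
```

Running `syntax_failures("lib/lp/translations", filename="potemplate.py")` on a host that still has 2.5 narrows the check to the files lifeless asked about.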
spm	edge3 done	08:47
spm	2010-09-14 07:48:01 WARNING SIGTERM failed to kill launchpad (7487). Trying SIGKILL <== yay. it's back! wooo!	08:48
spm	edge4 coming back	08:49
spm	edge1 coming back	08:50
stub	Is network syslog loathed by IS?	08:51
spm	edge2 on the way back	08:51
spm	stub: i don't mind it; no idea about others tho	08:51
spm	edge1 & 4 being difficult and not working	08:52
spm	edge2 is fine	08:53
spm	edge1 being really painful and needing to be manually killed.	08:54
spm	edge1 stabbing successful; it lives	08:54
spm	retrying edge4...	08:54
spm	edge5 coming back	08:56
spm	edge4 lives	08:56
spm	edge5 lives; should be done. verifying.	08:56
spm	lifeless: have you logged a bug on this essplosion?	08:58
lifeless	spm: urls like this: https://bugs.edge.launchpad.net/+icing/rev11542/combo.css - how are they served.	08:58
lifeless	spm: I RT'd it	08:58
lifeless	spm: for the explosion part	08:59
spm	that's the continue, but not the root cause?	08:59
lifeless	spm: waiting on your python2.5 test for confirmation	08:59
spm	ah k	08:59
spm	lifeless: so to recap a bit back - CP'd to prod; not to edge.	09:01
spm	cowboyed not CP'd.	09:02
lifeless	spm: right, can we do edge 11538 cowboy, not 11542 cowboy ?	09:02
spm	I'll throw that to Tom I suspect	09:02
lifeless	ok	09:02
lifeless	rev 11542 looks like the bust one	09:03
lifeless	which is jtv's patch	09:04
jtv	?	09:04
lifeless	jtv: you appear to have used python 2.6 in lib/lp/translations/browser/potemplate.py line 971	09:05
* jtv looks		09:05
lifeless	jtv: look at spm's pastebin	09:05
jtv	Wonder what's wrong with it…	09:06
jtv	ah	09:06
jtv	lifeless: want me to write up a quick patch?	09:08
mrevell	Hello	09:10
jtv	hi mrevell	09:10
jtv	lifeless, spm: I'd fix it thusly: http://paste.ubuntu.com/493508/	09:10
lifeless	spm: ping	09:12
spm	lifeless: yo	09:13
lifeless	spm: need you to try applying jtv's patch to a 11542 dir (one of the failed ones)	09:13
lifeless	and see if 'make build' will then work.	09:13
lifeless	jtv: could you please do a few things for me, it's getting on here.	09:13
lifeless	- file a bug that this is broken,	09:13
jtv	lifeless: speak	09:13
* jtv files bug		09:13
lifeless	- put your branch up for review etc etc - r=me to apply it, if spm confirms it works.	09:13
lifeless	- arrange for someone to let the LOSAs know when it lands in stable, so that edge updates can be reenabled.	09:14
lifeless	- (e.g. yourself, or your delegate)	09:14
jtv	On the way.	09:14
lifeless	I will mail the list about the process issue	09:14
spm	trying atm...	09:15
spm	try2 with right config...	09:15
jtv	bug 637868	09:16
lifeless	jtv: did you ec2 your test ?	09:17
spm	lifeless: I'd suggest that cowboy gets rolled in with jtv's fix and just a regular edge rollout rolled.	09:17
jtv	lifeless: not yet, not yet	09:17
lifeless	jtv: sorry, let me be more clear	09:17
lifeless	jtv: the patch that broke; did you land it:	09:17
lifeless	- by running all tests locally + pqm	09:18
jtv	ec2 land.	09:18
lifeless	- by ec2 land	09:18
lifeless	- ...	09:18
spm	jtv: lifeless: https://pastebin.canonical.com/37134/ looks good	09:19
jtv	spm: still looking good?	09:25
jtv	The codebase, I mean, not you. You'll always look good.	09:26
lifeless	spm: that looks good; thanks. jtv your patch fixes it.	09:27
jtv	BTW it's odd that this passed PQM, what with the pagetests exercising it.	09:27
lifeless	jtv: pqm doesn't run tests.	09:27
jtv	Sorry, buildbot.	09:27
lifeless	jtv: ec2 runs them, but it's probably running lucid	09:27
wgrant	buildbot is Lucid.	09:27
lifeless	buildbot has two separate jobs.	09:27
jtv	Well, we have lucid and hardy buildbot slaves.	09:27
lifeless	jtv: see my mail	09:28
* jtv will see mail		09:28
lifeless	I don't think bb requires both to be ok.	09:28
lifeless	but that's what we probably need	09:28
jtv	BTW should I MP the fix for stable or for devel?	09:29
lifeless	devel	09:29
jtv	OK. I branched off stable though just to be sure.	09:30
lifeless	now that edge rollouts are blocked, there's no panic (no reason to delay) but no panic.	09:30
jtv	lifeless: the MP is at https://code.launchpad.net/~jtv/launchpad/bug-637868/+merge/35373	09:31
wgrant	lifeless: ec2 has been running Lucid for a long time.	09:31
wgrant	I think this is about the third time things have broken.	09:32
lifeless	wgrant: definitely second.	09:32
lifeless	wgrant: bug 637854	09:33
wgrant	_mup_, I am disappoint.	09:34
wgrant	lifeless: At least it tries to prejoin.	09:34
lifeless	wgrant: ugh!	09:34
jtv	lifeless: so now I can land on devel as normal and just wait for the fix to percolate?	09:35
lifeless	yes	09:35
jtv	(I would appreciate a click on the button from you btw, to prove I didn't invent your approval :)	09:36
lifeless	I did	09:36
jtv	Oh! The MP just timed out for me is all	09:36
jtv	Thanks.	09:36
lifeless	you need to let mthaddon know when it's good to go on stable	09:36
lifeless	jtv: interesting, what OOPS id ?	09:36
jtv	I already ran the applicable pagetests through ec2… guess a full EC2 run makes no sense here.	09:36
jtv	lifeless: I don't know; focused on fixing my bug, so just reloaded	09:37
bigjools	http://www.workswithu.com/2010/09/07/measuring-the-value-of-canonicals-launchpad/	09:52
bigjools	there's a certain person posting comments on that one	09:53
wgrant	Let me guess...	09:53
wgrant	remarkable!	09:53
spiv	wgrant: you're psychic, clearly.	09:55
wgrant	He does have some good points, as usual.	09:59
lifeless	and they are clearly unbiased	10:04
lifeless	which is refreshing	10:04
bigjools	shame it's the same sound of that grinding axe	10:06
jtv	Speaking of grinding axes…	10:10
jtv	The builds-list.pt template is supposed to work for any BuildFarmJob but it tries to access build/dependencies. :(	10:11
jml	"No longer needed: Python 2.5"	10:12
wgrant	jtv: I'm glad you're completing the generalisation for us :P	10:12
jtv	wgrant: remember that axe I mentioned just now? Kindly insert it into one of your feet.	10:12
wgrant	Ow.	10:13
jtv	Good.	10:14
jtv	And thank you.	10:15
* wgrant limps away viciously.		10:15
bigjools	jtv: then return None	10:17
bigjools	you don't have any dependencies	10:17
wgrant	It's not on the interface.	10:17
wgrant	Assuming it is illegal.	10:17
jtv	No, that's the nasty bit.	10:17
jtv	It's in IPackageBuild.	10:17
jtv	So _implementing_ it in BuildFarmJob or BuildFarmJobDerived or my own specific buildfarmjob class isn't enough.	10:17
jtv	It needs to move into IBuildFarmJob, which is uglier than a Windows desktop.	10:18
bigjools	at least Windows has sound that works	10:18
jtv	To name some arbitrary example of extreme ugliness.	10:18
jtv	Yes, Windows often has working sound, so you can _hear_ the "I don't know this codec" error instead of just seeing it.	10:19
jtv	But we digress.	10:19
jtv	Here we are chattering about operating systems when wgrant's foot is oozing virtual blood.	10:20
jtv	Help the man, for God's sake!	10:20
wgrant	bigjools: Your sound isn't working?	10:21
wgrant	I think mine might be slightly crackly on Maverick.	10:21
wgrant	But it works mostly.	10:21
bigjools	maverick re-installed pulseaudio	10:21
jpds	jtv: He's in AU, dude.	10:21
bigjools	pulseaudio is a crock of shit	10:22
jtv	jpds: pronounced "Ow!!!"	10:22
wgrant	bigjools: WFM!	10:22
wgrant	Although I could be distracted by my poor, poor foot.	10:22
bigjools	it's insisting on using my laptop instead of my headset's mic	10:23
jtv	wgrant: nice save	10:23
bigjools	I've no idea how to make it do what I want instead of what it wants	10:23
jtv	Meanwhile, I just managed to insinuate a TranslationTemplatesBuild into a builder history!	10:23
wgrant	bigjools: Kubuntu doesn't have a nice control panel for it?	10:23
wgrant	jtv: But does it crash?	10:24
jtv	wgrant: no. Not that anyone'd notice: I don't see anything in the build code that would put a BuildFarmJob into the right state to show up there.	10:24
wgrant	Hm?	10:24
bigjools	wgrant: I dunno, what is it in Gnome?	10:24
jtv	bigjools: a crock of shit. Trick question.	10:25
wgrant	bigjools: The sound preferences thing (in the volume indicator's menu) has radio buttons for the input device.	10:25
* bigjools hears 2 drums and a cymbal falling off a cliff		10:25
allenap	Hi jml, thanks for looking into my Zope befuddlement. I think mwh's reply has now hit the nail on the head, so I'll wikify that and reply to the list.	10:25
jtv	AFAICS the Builder still selects and dispatches a BuildQueue object, not a BuildFarmJob. How's it ever going to update BuildFarmJob.{status,date_started,date_finished}?	10:26
bigjools	finally it works	10:27
jml	allenap, cool.	10:27
wgrant	jtv: It's complicated.™	10:28
wgrant	But it works.	10:28
jtv	wgrant: well since there is absolutely currently nothing coupling my BuildFarmJobs to my BuildQueues, I don't see how it can.	10:28
bigjools	so, when using pulse with skype, how do I make it ring the PC speakers and not in the headphones, which I might not be wearing? :/	10:30
wgrant	jtv: I believe it's handled by the IBuildFarmJobBehavior.	10:30
wgrant	jtv: IIRC you override updateBuild_WAITING.	10:30
wgrant	The default calls handleStatus on the build.	10:31
jtv	I think we already override that.	10:31
wgrant	I mean you do currently.	10:32
wgrant	But you should probably stop.	10:32
jtv	Ah	10:32
jtv	That could hurt.	10:32
wgrant	It will.	10:33
jtv	Hand me that axe, will you?	10:33
jtv	Thank—eewwww, there's blood all over the blade	10:33
jtv	Anyway, for now <puts axe aside> I guess it's enough to get display working and then next we can focus on tying the BuildQueue and the TranslationTemplatesBuild together.	10:34
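A rough model of the point wgrant makes: the default `updateBuild_WAITING` hands the result to the build's `handleStatus`, which is what keeps the BuildFarmJob fields current, so an override that skips it leaves them untouched. The method names come from the discussion above; the signatures, status strings and bodies are illustrative assumptions, not the real IBuildFarmJobBehavior code.

```python
# Hypothetical sketch: names echo IBuildFarmJobBehavior, updateBuild_WAITING
# and handleStatus from the discussion, but signatures and bodies are
# assumptions made for illustration.
from datetime import datetime, timezone


class FakeBuild:
    """Stand-in for a BuildFarmJob carrying the fields jtv asks about."""

    def __init__(self):
        self.status = "NEEDSBUILD"
        self.date_finished = None

    def handleStatus(self, status):
        # The build records its own outcome, so whatever calls this
        # keeps the status and date fields in sync.
        self.status = status
        self.date_finished = datetime.now(timezone.utc)


class BuildFarmJobBehaviorBase:
    def __init__(self, build):
        self.build = build

    def updateBuild_WAITING(self, status):
        # Default behaviour: delegate to the build.
        self.build.handleStatus(status)


class OverridingBehavior(BuildFarmJobBehaviorBase):
    def updateBuild_WAITING(self, status):
        # An override that never calls handleStatus leaves the
        # BuildFarmJob fields unchanged.
        pass
```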
bigjools	wgrant: the discussion in https://bugs.launchpad.net/bugs/635103 is a little over my head at the moment, do you know why it's not working for him yet fine in Ubuntu?	10:46
wgrant	bigjools: He wants to not have to download and upload the whole thing.	10:47
wgrant	To do that we'd need an ia32-libs-specific hack, to support the conglomeration of horrid hacks that is ia32-libs.	10:47
wgrant	I was going to tell him to go away. But you're probably a better person to do it :P	10:48
bigjools	wgrant: possibly, but I don't understand what that package is doing	10:49
wgrant	bigjools: Do you really want to know?	10:49
bigjools	yep	10:49
wgrant	Well.	10:49
wgrant	There is a reason that the source package is 700MB.	10:49
wgrant	It contains approximately an awful lot of packed source packages.	10:50
wgrant	It builds on amd64, but builds them for i386.	10:50
wgrant	Er, wait, no, it includes the binaries too.	10:51
wgrant	So it doesn't build them.	10:51
bigjools	Oo	10:51
wgrant	It extracts the i386 binaries, and produces a big amd64 ia32-libs binary containing all of them.	10:51
lifeless	fooooooogly	10:51
wgrant	So you have this huge source package containing dozens or hundreds of sources and binaries from the archive.	10:51
wgrant	Er, yes.	10:51
bigjools	dot com	10:51
wgrant	It will be rendered obsolete by multiarch.	10:52
wgrant	But multiarch hasn't happened yet.	10:52
lifeless	multiarch was 'coming' when we -started-	10:52
bigjools	dem's de magic words	10:52
wgrant	lifeless: it's seriously in development now, though.	10:52
wgrant	Some of the work has been done in the last year.	10:52
wgrant	Hell, even NMSP is happening.	10:52
lifeless	wgrant: I know.	10:52
wgrant	And derivative distros.	10:52
wgrant	This is incredible.	10:53
bigjools	I can only lever so much of the planet with the team size I have	10:53
jml	bigjools, well, to be true to the metaphor, you just need a better place to stand	11:02
bigjools	jml: I'll jump higher :)	11:03
bigjools	jml: when are you arriving here BTW?	11:06
jml	bigjools, Sunday. Let me check my ticket.	11:06
bigjools	you bought train ticket in advance? gosh :)	11:07
bigjools	insert preposition as required. Sigh.	11:07
jml	I think you mean "article", and yes I did.	11:08
jml	it's hard to get out of the habit of booking travel in advance	11:08
wgrant	Is this for the buildd-manager attack session?	11:09
jml	indeed it is.	11:09
wgrant	Excellent.	11:09
jml	bigjools, anyway, I'll be taking an afternoon train, probably the 1442	11:11
bigjools	jml: ok well when you get ensconced in the pub, gimme a shout and I'll pop over for a pint	11:13
jml	bigjools, will do.	11:14
wgrant	bigjools: I've received lots of complaints in the last few days that builds keep getting redispatched.	11:14
wgrant	Even on non-virt builders.	11:14
bigjools	jml: there should be taxis at the station but if there are not let me know and I'll come and pick you up	11:14
jml	bigjools, thanks.	11:15
bigjools	wgrant: the UI is a lie	11:15
bigjools	the early commit, is not :/	11:15
wgrant	Hm?	11:15
lifeless	jml: can you do me a favour?	11:15
wgrant	Even if it is committing before it confirms successful dispatch, why is the dispatch not successful?	11:16
jml	lifeless, quite possibly.	11:16
bigjools	what's happening is that we mark the build as running before it's completely dispatched. If there's a comms error then it looks like it gets re-dispatched after the next builder picks it up	11:16
lifeless	jml: my pqm-landed (nonec2) branch has a test failure	11:16
lifeless	https://lpbuildbot.canonical.com/builders/lucid_lp/builds/139/steps/shell_7/logs/summary	11:16
lifeless	jml: it's -extremely- shallow.	11:16
lifeless	jml: (add the missing tuple)	11:16
wgrant	bigjools: But there shouldn't comms errors :(	11:16
wgrant	+be	11:17
lifeless	jml: however the fix needs to be done to production-devel too.	11:17
bigjools	wgrant: we don't live in a perfect world	11:17
lifeless	jml: before the oops fix can be uncowboyed.	11:17
bigjools	routers drop out, DC engineers kick cables	11:17
bigjools	etc	11:17
lifeless	jml: yes/no ?	11:17
wgrant	bigjools: Over 20 minutes?	11:17
jml	lifeless, you'd like me to land the fix for you?	11:17
* gmb hates at typos in sampledata		11:17
gmb	'testible' indeed.	11:18
lifeless	jml: yes, on devel and production-devel	11:18
jml	gmb, it's a pun!	11:18
bigjools	wgrant: yes, that's the interval I see because of the bad scaling	11:18
lifeless	jml: it's 22:20 here, more or less	11:18
gmb	jml, I noticed that about half a second after pressing enter :)	11:18
wgrant	bigjools: Ahh, true, forgot that bit.	11:18
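A simplified, in-memory illustration of the ordering problem bigjools describes: if the build is marked as running before the dispatch is confirmed, a comms error leaves it looking like it is building until it gets redispatched. The real system commits this state to the database; `Build`, `CommsError` and both dispatch functions here are hypothetical names, not buildd-manager code.

```python
# Hypothetical sketch: Build, CommsError, dispatch_early and
# dispatch_confirmed are illustrative names, not buildd-manager code.


class CommsError(Exception):
    """Stand-in for a dropped connection to the builder."""


class Build:
    def __init__(self):
        self.status = "NEEDSBUILD"


def dispatch_early(build, send):
    # Mark the build as running first...
    build.status = "BUILDING"
    # ...so if sending fails, the build still looks like it is running.
    send()


def dispatch_confirmed(build, send):
    # Alternative ordering: only record BUILDING once send() succeeded.
    send()
    build.status = "BUILDING"
```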
jml	lifeless, ok, will do.	11:18
lifeless	jml: thank you!	11:18
jtv	Are we going into testfix?	11:37
jtv	The lucid_lp buildbot just failed.	11:37
jtv	Is failing, rather.	11:37
jtv	lib/canonical/launchpad/webapp/ftests/test_adapter.txt	11:37
jtv	Line 305, in test_adapter.txt Failed example:	11:38
jtv	get_request_statements()	11:38
jtv	Differences (ndiff with -expected +actual):	11:38
jtv	- [] + [(0, 0, 'SQL-launchpad-main-master', 'SELECT 2')]	11:38
wgrant	Is that what lifeless was talking about above?	11:40
jml	wgrant, yes.	11:44
jml	I wonder why emacs is segfaulting for me.	12:01
thumper	jml: because it hates you :)	12:02
bigjools	it's telling you to use a real editor	12:04
deryck	Morning, all	12:04
bigjools	howdy deryck	12:04
jml	bigjools, yeah, you're right. I've had to revert to "emacs -nw"	12:04
jml	deryck, hello	12:05
bigjools	:)	12:05
thumper	jml: ?? whazzat?	12:05
jml	thumper, try it!	12:05
thumper	not right now	12:05
jml	thumper, ok, I give up, it's emacs in a terminal	12:05
thumper	I'm trying to right a talk	12:05
thumper	ah.. no windows, it's so obvious	12:05
bigjools	that's either a typo or a clever play on words	12:05
thumper	bigjools: which one?	12:06
jml	thumper, "righting" a talk.	12:06
bigjools	"right a talk"	12:06
* thumper hangs head		12:06
bigjools	lol	12:06
thumper	it is a typo	12:06
thumper	I'd like to be more cleverer	12:06
=== al-maisan is now known as almaisan-away
=== matsubara-afk is now known as matsubara
jml	I'm off for lunch & errands. Back later.	13:25
=== almaisan-away is now known as al-maisan
cr3	leonardr: when you have a moment, I would have a question for you about routes when exposing a restful interface with lazr	14:31
leonardr	cr3, sure	14:34
leonardr	routes as in the url traversal code?	14:34
cr3	leonardr: yes, how can a collection be contextual? for example, let's say LP had /me/bugs and /project/foo/bugs, where both person and project would implement IHasBugs, how should the "bugs" part of the url be defined?	14:35
leonardr	cr3: that's called a scoped collection, and lazr.restful traverses from 'leonardr' to bugs or from 'mozilla' to bugs by attribute access on the person or project object	14:36
leonardr	so leonardr.bugs is /~leonardr/bugs	14:36
cr3	leonardr: ah, so it must be defined as an attribute, I thought it might be ProjectNavigation in the browser layer or perhaps even using the Bag	14:37
leonardr	cr3: no, once you have identified a specific object all further traversal happens through attribute access	14:37
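A toy model of the traversal rule leonardr describes: the first path segment identifies a specific object, and every later segment is resolved by attribute access on it, so `/~leonardr/bugs` becomes `leonardr.bugs`. `Person`, `traverse` and the dictionary of top-level objects are hypothetical; this is not lazr.restful's implementation.

```python
# Hypothetical sketch: Person, traverse and the top-level lookup dict
# are illustrative, not lazr.restful's actual traversal code.


class Person:
    def __init__(self, name, bugs):
        self.name = name
        self.bugs = bugs  # the scoped collection


def traverse(root, path):
    """Resolve a path like '/~leonardr/bugs' against a dict of top-level objects."""
    segments = [s for s in path.split("/") if s]
    # The first segment identifies a specific object...
    obj = root[segments[0]]
    # ...and every later segment is plain attribute access on it.
    for name in segments[1:]:
        obj = getattr(obj, name)
    return obj
```

For example, `traverse({"~leonardr": Person("leonardr", ["bug 1"])}, "/~leonardr/bugs")` returns `["bug 1"]`.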
cr3	leonardr: would it make sense to have the IHasBugs define a "bugs" attribute?	14:38
leonardr	cr3: afaik, yes	14:39
cr3	leonardr: but if IHasBugs has a searchBugs method already which should essentially behave like the bugs attribute, given no parameters, then wouldn't searchBugs and bugs look a lot the same?	14:40
cr3	leonardr: my concern is that every class implementing IHasBugs would essentially have to do something like: @property; def bugs(self): return self.searchParams();	14:42
leonardr	cr3: well, you don't _have_ to put 'bugs' in IHasBugs if different implementations get the bugs differently	14:43
leonardr	but, i have two things to say on top of that	14:44
leonardr	oh, never mind, you're saying that all the IHasBugs feature 'bugs'	14:44
leonardr	be that as it may, /bugs is better for the end-user than ?ws.op=searchBugs	14:45
leonardr	however, there's nothing to be done about that for now	14:45
leonardr	my next project will include things like	14:46
cr3	leonardr: I was mostly using IHasBugs as an example for a collection which might be implemented by more than one context	14:46
leonardr	cr3: sure, i know you're not really talking about IHasBugs. but i'm trying to deal with the situation as you posed it	14:47
leonardr	my next project will include features like the ability to designate a method as being "the method you call to generate a scoped collection"	14:47
leonardr	so you could tag searchBugs as the generator for /bugs	14:47
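A speculative sketch of the feature leonardr mentions, where a method is tagged as the generator for a scoped collection. With that in place, traversing to `bugs` calls `searchBugs()` with no arguments and no class needs its own `bugs` property. `scoped_collection`, `Project` and `traverse_attribute` are invented for illustration; this is not an existing lazr.restful API.

```python
# Hypothetical sketch: scoped_collection, Project and traverse_attribute
# are illustrative names, not an existing lazr.restful API.


def scoped_collection(name):
    """Mark a method as the generator for the collection at `name`."""
    def mark(method):
        method.scoped_collection_name = name
        return method
    return mark


class Project:
    def __init__(self, bugs):
        self._bugs = bugs

    @scoped_collection("bugs")
    def searchBugs(self, status=None):
        # Called with no arguments, this yields the whole collection.
        if status is None:
            return list(self._bugs)
        return [bug for bug in self._bugs if bug["status"] == status]


def traverse_attribute(obj, segment):
    # Prefer a method tagged as the generator for this segment; bound
    # methods expose the attributes set on their underlying function.
    for attr in dir(type(obj)):
        method = getattr(obj, attr)
        if getattr(method, "scoped_collection_name", None) == segment:
            return method()
    # Otherwise fall back to plain attribute access.
    return getattr(obj, segment)
```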
cr3	leonardr: I was grepping through lazr for the concept of an alias, like "bugs" is aliased to searchBugs or somesuch	14:48
henninge	sinzui: ping	15:01
sinzui	hi henninge	15:02
henninge	Hi!	15:02
henninge	I am a bit at loss at how to downgrade a package.	15:02
henninge	psycopg2 in this case	15:02
sinzui	henninge, do you have the deb?	15:02
henninge	sinzui: I have done that once or twice before but I could use a pointer, please ;-) ?	15:03
henninge	No, I was just searching for that.	15:03
* sinzui checks lp history		15:03
sinzui	henninge, download the deb from here: https://edge.launchpad.net/ubuntu/lucid/i386/python-psycopg2/2.0.13-2ubuntu2	15:04
sinzui	henninge, sudo dpkg -i --force-downgrade python-psycopg2_2.0.13-2ubuntu2_i386.deb	15:05
henninge	sinzui: thank you very much!	15:06
* henninge actually forgot to look on LP for the package ...		15:06
sinzui	henninge, i decided not to pin the version. I hold out some small hope that lp or psycho will resolve their differences. I downgrade after every update breaks lp	15:06
henninge	sinzui: yes, otherwise one might forget about the pinning and hit strange errors later ...	15:07
gmb	rockstar: So, deryck solved our JS wizard problem :)	15:52
rockstar	gmb, that's because deryck is awesome.	15:52
rockstar	gmb, what was the issue?	15:53
gmb	rockstar: Two things: 1) YUI auto-generates the CSS class names based on WIDGET.name - so the hidden class was yui3-lazr-wizard-hidden, which wasn't defined anywhere.	15:53
gmb	Also, widgets that aren't created as hidden can never be hidden.	15:53
gmb	(At least, that's how it behaves; I suspect there's a bug there)	15:54
rockstar	gmb, wait, how was it never created?	15:54
rockstar	gmb, and it's Widget.NAME, isn't it?	15:54
gmb	rockstar, Yeah, sorry, bad caps.	15:54
gmb	rockstar: Let me just check the patch deryck gave me so that I know I'm not BSing you.	15:54
gmb	rockstar: http://pastebin.ubuntu.com/493666/	15:55
rockstar	gmb, okay, so I had it defined as Wizard.NAME = "wizard"; so I don't know where the lazr comes from either.	15:55
gmb	rockstar: The wizard was created but by default visible was True.	15:55
gmb	At least, that's how it looks based on deryck's patch.	15:55
rockstar	gmb, okay.	15:55
rockstar	gmb, I'm not sure I understand the bottom patch though, to wizard.js.	15:56
gmb	rockstar: Yeah. lp:~gmb/lazr-js/wizard-widget/ contains deryck's fix and some further CSS fixes.	15:56
rockstar	gmb, I guess he's just demonstrating that there's missing CSS somewhere?	15:56
gmb	I don't know if you need to do more with it.	15:56
rockstar	gmb, well, it needs to get finished now. If it's firing events, it can probably start moving through steps now.	15:56
gmb	Well, I don't know. That seems to be related to the way the widget behaves... deryck, can you clarify exactly why your fix fixes?	15:56
deryck	rockstar, yeah, that's all. The CSS in use currently assumes the widget name is "lazr-formoverlay" and it wasn't set to hide by default.	15:57
rockstar	deryck, okay, so we can't reuse that name, so we need to just define yui3-lazr-wizard-hidden in the CSS then?	15:57
deryck	rockstar, yui3-NAME-hidden, where NAME is what you define in the widget. This is how all those CSS classes get built.	15:59
rockstar	deryck, yeah, so it should have been yui3-wizard-hidden	15:59
rockstar	deryck, and I thought I had defined that.	15:59
deryck	rockstar, right. There is a yui3-NAME for every class this widget descends from. But only the current NAME gets the hidden class added.	15:59
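A small Python model of the naming rule deryck describes. This is not YUI code. Every NAME in the widget's class chain gets a `yui3-NAME` class, but only the widget's own NAME gets the `-hidden` class, so a stylesheet rule written for another NAME's hidden class never matches. The chain `["wizard", "overlay", "widget"]` used below is an assumed example.

```python
# Hypothetical model of the naming rule from the discussion; this is not
# YUI code, it only computes the class names a widget would carry.


def widget_classes(names, hidden):
    """Return the CSS classes on a widget's bounding box.

    `names` lists NAME values from the widget's own class up through its
    ancestors, e.g. ["wizard", "overlay", "widget"] (an assumed chain).
    """
    classes = ["yui3-%s" % name for name in names]
    if hidden:
        # Only the widget's own NAME gets the -hidden class.
        classes.append("yui3-%s-hidden" % names[0])
    return classes
```

For example, `widget_classes(["wizard", "overlay", "widget"], hidden=True)` gives `['yui3-wizard', 'yui3-overlay', 'yui3-widget', 'yui3-wizard-hidden']`, which has no `yui3-overlay-hidden`.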
rockstar	deryck, yeah, okay.	16:00
deryck	rockstar, you did, but the CSS was not using it. And you couldn't tell because you weren't hiding by default with visible: false. So changes to the name had no effect.	16:00
rockstar	deryck, ah, that makes sense.	16:00
rockstar	So if it's not hidden by default, it can't be hidden again.	16:00
rockstar	That is, quite possibly, one of the silliest things I've ever heard.	16:01
gmb	rockstar: Yeah, I needed to mop the tea off my monitor when deryck told me.	16:01
rockstar	gmb, I'm digging in my junk drawer for a baby to punch as we speak.	16:01
gmb	...	16:02
deryck	well, no, not quite accurate	16:02
deryck	rockstar, wizard.render() shows the widget without the hidden class. Nothing in your code calls wizard.hide().	16:03
rockstar	deryck, the defaultCancel should have been doing it.	16:03
deryck	rockstar, I don't think so. That's only called after UI changes, AIUI.	16:03
deryck	at least, my reading of the code.	16:03
rockstar	deryck, no, it's called when the CANCEL event is fired. I know it was being called, because that's where the Y.log("aoeu") was.	16:04
rockstar	I think you and I both confirmed that the -hidden CSS class was getting set as well.	16:04
deryck	right, but that's not called on load.	16:04
rockstar	deryck, yeah, so I either have to hide by default and call .show() in the example, or call .hide() and then .show() on load.	16:05
deryck	CANCEL event is only fired by ESC or clicking away from the widget, no? I never saw "aoeu" until I did some action, not on load.	16:05
deryck	right	16:05
rockstar	deryck, I saw it when I clicked the cancel button.	16:05
rockstar	Yeah, okay. So what you're saying is what I'm understanding.	16:06
rockstar	Stupidness prevails.	16:06
rockstar	Thanks for sorting it out.	16:06
deryck	np :-)	16:06
rockstar	We need a page on the wiki of all the YUI gotchas.	16:06
deryck	yup	16:06
rockstar	We'll probably just forget the page exists and create it 4 more times, but whatever.	16:06
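A toy Python model of deryck's correction. `render()` only reflects the current `visible` value and does not hide anything by itself, while `hide()` adds the hidden class at any time. Both of rockstar's options therefore work: create the widget with `visible=False` and call `show()`, or render it and then call `hide()`. `ModelWidget` is a hypothetical stand-in, not YUI's Widget.

```python
# Hypothetical stand-in for the widget behaviour discussed above; it
# models the explanation, not YUI's actual Widget implementation.


class ModelWidget:
    def __init__(self, name="wizard", visible=True):
        self.visible = visible
        self.hidden_class = "yui3-%s-hidden" % name
        self.classes = set()

    def render(self):
        # Rendering just reflects the current `visible` value.
        self._sync()

    def hide(self):
        self.visible = False
        self._sync()

    def show(self):
        self.visible = True
        self._sync()

    def _sync(self):
        if self.visible:
            self.classes.discard(self.hidden_class)
        else:
            self.classes.add(self.hidden_class)
```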
salgado	jcsackett, I was going to have a second look at your unknown-blueprints-service-597738 branch but noticed there's some discussion still going on, and I'm wondering whether you're going to do any more changes to it or if it's ready for a second look	16:07
jcsackett	salgado: i'm still working on it.	16:19
jcsackett	i actually needed to add an attr to the view, so i need to write some tests as well.	16:19
jcsackett	salgado: i'll ping you and sinzui when i've pushed changes for round 2.	16:21
gmb	rockstar: So - for my own clarity - are you now going to do further work on the wizard to make it do wizardly things properly?	16:22
rockstar	gmb, I _can_, but it sounds like it's blocking you, and I'd like to avoid that as much as possible.	16:23
gmb	rockstar, Right. That sounds fair. In that case, I'll get cracking on getting it doing what we need and ping you if there are any further issues.	16:24
gmb	rockstar: Is there a specific bug or LEP that your work so far is tied to? I don't want to make something that doesn't do what you need it to do.	16:24
rockstar	gmb, does the overall design make sense?	16:24
rockstar	gmb, I had a kanban card with the work on it.	16:25
rockstar	(because we like to track our work in many different places)	16:25
gmb	rockstar: Yes. In fact, it's pretty much exactly what I had in mind for my hack, although that was less elegant :)	16:25
gmb	rockstar: Ah, cool. I shall go and find it.	16:25
salgado	jcsackett, ok	16:26
deryck	rockstar, hey, I think all the python was added to lazr-js for the testing story. To hook up the yui-unittest stuff with zope test runner.	16:42
rockstar	deryck, I'm not so sure. Our testing story uses Java. It can be fired up from the shell.	16:45
deryck	ah, ok. Maybe not then.	16:45
deryck	I thought that was why. Why all the Zope packages then?	16:45
deryck	and storm and lazr.restful. god good y'all. ;)	16:46
rockstar	deryck, so, the testing story does use lazr.testing, but it doesn't need to.	16:52
rockstar	deryck, also, it could be used for the lazr-js testing, but not have to be distributed to our projects as well.	16:52
deryck	right	16:52
deryck	rockstar, just thinking more....	16:55
deryck	rockstar, some of what the egg provides us is the js lint stuff.... perhaps that could be broken out into its own package.... separate the testing, python utils, and js file building stories a bit? Smaller simpler packages?	16:56
rockstar	deryck, yes.	16:56
rockstar	deryck, the problem is that we've tied ourselves to a buoy with no anchor, so we experience pain anytime we want to change anything.	16:56
deryck	right	16:57
deryck	yeah, maybe it's not easy to do a neat and clean separation.	16:57
rockstar	deryck, the launchpad build system is too closely tied to the lazr-js build system, which makes it exponentially more complicated.	16:57
jml	clearly you should attach a sail to your buoy	16:57
jml	or something.	16:57
rockstar	jml, engineering fail. :)	16:57
deryck	rockstar, it also makes adoption of lazr-js by any other web project outside Canonical difficult or impossible.	16:58
rockstar	deryck, exactly.	16:58
jml	rockstar, you're the one who's in pain and stuck to a buoy!	16:58
rockstar	jml, no, my boat is stuck to a buoy.	16:58
rockstar	jml, also, while the launchpad team does blow a lot of hot air, I think we want engines, not sails. :)	16:59
jml	rockstar, ok, as long as it's an electric motor	16:59
rockstar	deryck, I tried to use lazr-js this weekend. After about an hour, we gave up and used jQuery. :)	16:59
jml	with batteries charged by a wind farm.	16:59
rockstar	jml, yeah, because we also want to be environmentally responsible.	16:59
rockstar	Unfortunately, Google owns the wind farms, so we have to display Google Ads on our boat the whole time.	17:00
=== benji is now known as benji-lunch
=== matsubara is now known as matsubara-lunch
=== Ursinha is now known as Ursinha-lunch
jml	brb.	17:36
=== Ursinha-lunch is now known as Ursinha
mrevell	night all	18:06
=== matsubara-lunch is now known as matsubara
=== benji-lunch is now known as benji
=== leonardr is now known as leonardr-away
deryck	Some say money is the root of all evil, but it's really notifying subscribers in a web app request.	18:54
rockstar	deryck, :)	19:01
rockstar	deryck, sending mail in general...	19:01
=== leonardr-away is now known as leonardr
rockstar	james_w, why does Recipe.__str__ need to call .parse() ?	19:47
james_w	rockstar: it doesn't	19:48
rockstar	james_w, so I could write a patch that removes it, and you'd be happy with it?	19:49
james_w	rockstar: but it's a good way of ensuring that we are not generating malformed recipes in __str__	19:49
rockstar	james_w, shouldn't tests be enough?	19:49
james_w	clearly you have found a case where that breaks down, but I'm not convinced that it warrants getting rid of that	19:49
james_w	I would remove it if I could be convinced that there are many more cases where it isn't going to be a good thing	19:50
james_w	rockstar: yes, they /should/ be enough	19:50
rockstar	james_w, I can't foresee any other issues, but if I could foresee bugs, I would fix them before they became bugs.	19:52
james_w	yes	19:52
rockstar	james_w, I think that, for reads, we should trust that it creates them properly.	19:52
rockstar	james_w, if we have bugs, then we can deal with them.	19:52
james_w	why trust when you can verify?	19:52
rockstar	As it is, if __str__ ever creates a bad manifest, it'll explode on a user in Launchpad.	19:52
rockstar	james_w, I guess the question is "Is Launchpad bzr-builder's most common use case."	19:53
james_w	yes	19:54
rockstar	james_w, I'm all for verifying if it didn't raise an exception the way it does. I'd like it to warn and move on, but that would be difficult to look for in Launchpad.	19:54
james_w	bad manifest> explode how?	19:54
james_w	warnings> considered that, but who ever pays attention to warnings?	19:54
rockstar	james_w, yes, and warnings would be difficult to get out of a webapp.	19:55
james_w	yeah	19:55
james_w	well we don't have to use warn(), but still...	19:55
rockstar	james_w, you're verifying that functionality you wrote is working properly. That's noble and all, but if there were a bug, RecipeParser.parse() would raise an exception for something that isn't really the user's fault.	19:56
james_w	yes	19:56
rockstar	james_w, I think the best course of action would be to remove the parse in __str__. We could have a different method be more strict, but having __str__ be that strict seems odd.	19:58
james_w	having it be sure to generate a valid recipe is odd?	19:59
rockstar	james_w, in __str__ I think it is.	20:00
rockstar	james_w, how 'bout this: Recipe.get_manifest() will always return a valid manifest or raise an exception, while __str__ just returns the manifest, valid or not.	20:02
james_w	why would you ever want to put an invalid manifest in the edit box?	20:02
rockstar	In Launchpad, we could then call Recipe.get_manifest(), and if it raises an exception, get the raw string.	20:02
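A sketch of the split rockstar proposes. `__str__` only serialises, `get_manifest()` re-parses and either returns a valid manifest or raises, and the Launchpad side falls back to the raw text plus an error message. `Recipe`, `RecipeParseError`, `parse_recipe`, `text_for_edit_box` and the single "nest into '.'" rule are simplified assumptions (the rule is based on the "." directory case mentioned below); they are not bzr-builder's real classes or grammar.

```python
# Hypothetical sketch of the proposal; the classes, the toy parser and
# its one rule are simplified stand-ins, not bzr-builder's real code.


class RecipeParseError(Exception):
    pass


def parse_recipe(text):
    """Toy parser: assume `nest NAME URL DIR` lines, and reject DIR == '.'."""
    for line in text.splitlines():
        words = line.split()
        if len(words) >= 4 and words[0] == "nest" and words[3] == ".":
            raise RecipeParseError("'.' is not a valid nest location")
    return text


class Recipe:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Just serialise; no re-parse.
        return self.text

    def get_manifest(self):
        # Strict variant: a valid manifest, or an exception.
        text = str(self)
        parse_recipe(text)
        return text


def text_for_edit_box(recipe):
    """Return (text, error): the valid manifest, else the raw text and the reason."""
    try:
        return recipe.get_manifest(), None
    except RecipeParseError as e:
        return str(recipe), str(e)
```

james_w's objection still applies to this design: if a future `__str__` bug produces invalid text, the user is shown an error for something they did not cause.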
rockstar	james_w, I think we have a valid reason to put an invalid manifest in the edit box.	20:03
james_w	if there is a bug that causes a round-trip to fail, then you will force the user to correct the error, then hit save, which will corrupt it again when displaying it	20:03
rockstar	We can catch the exception and add an error box that says "This recipe is totally broken. Please fix it."	20:03
james_w	but you just said yourself that this would only happen for bugs where it isn't the user's fault	20:03
james_w	so don't we want an OOPS if they are bugs?	20:04
rockstar	james_w, in this case, it's kind of the user's fault, but we let them save the bad data, and now they can't fix it.	20:04
james_w	yes	20:04
rockstar	james_w, but we need a better migration path than oopsing on crap data.	20:05
james_w	but I'm now starting to think we should call it a bug and go back and rethink the fix	20:05
rockstar	james_w, I've been calling it a bug the whole time.	20:05
rockstar	Because to Launchpad, it is a bug.	20:05
rockstar	And I'm trying to solve it for both Launchpad AND bzr-builder.	20:05
rockstar	james_w, in this case, the current bug is caused by a bzr-builder bug getting fixed.	20:06
rockstar	The user used . as a directory, and that never worked in building. Now it fails earlier.	20:06
james_w	yes, but perhaps we should be saying that making the parser more strict without a format version bump is a bug, and should be fixed	20:07
rockstar	So the user entered bad data, but we accepted it.	20:07
rockstar	james_w, possibly. I suggested that to abentley, and he had a point that this was always bad data. It's just that it fails earlier now.	20:07
rockstar	james_w, and it never really affected users of bzr-builder itself like it did with Launchpad.	20:07
james_w	yes, it was always bad data	20:08
rockstar	james_w, it's just that now, the user has no way of getting to that data and fixing it themselves.	20:08
james_w	but there is a distinction between parse+build in the code, and perhaps trying to conflate them like that isn't the best idea	20:08
rockstar	So we need to teach Launchpad how to cope with crap data and encourage the user to fix the crap data.	20:09
abentley	james_w, I like validation. I think we should do more of it. For example, bogus revision specs.	20:09
rockstar	I'm not saying we shouldn't validate, but we should provide a path for coping with invalid data that doesn't require futzing with the database.	20:10
james_w	abentley: then please file bugs	20:10
abentley	I just think we should distinguish between "well-formed" and "valid", and allow parsing of recipes that are merely "well-formed".	20:10
abentley	james, there's already a bug about that.	20:10
james_w	rockstar: if the code had always been like this then the bad data could never get in the db. If we have a rule that more strict parsing would always result in a format bump then there are two ways we could get this problem in future:	20:11
james_w	1. a bug that means that the parser doesn't detect the problem in the first place	20:11
james_w	2. a bug in __str__ which is why the check is there	20:11
james_w	or I guess 3. that we want to make it stricter without a bump for some reason	20:11
rockstar	james_w, this change I'm proposing is STRICTLY for Launchpad sanity cases.	20:12
james_w	in either the first two cases then there is a bug that we want to know about	20:12
rockstar	james_w, have you seen https://bugs.edge.launchpad.net/launchpad-code/+bug/620868	20:12
rockstar	(this is the bug I'm addressing)	20:12
james_w	yes	20:13
james_w	abentley: I don't see a bug	20:14
rockstar	james_w, do you understand why that bug exists?	20:14
james_w	yes	20:14
rockstar	james_w, okay, there was a bug, #1 in your case. It was fixed.	20:15
rockstar	It caused existing data (that never really worked anyway) to cause oopses, but not provide a way for the user to fix it.	20:15
james_w	It's not #1 in my case	20:15
abentley	james_w, https://bugs.edge.launchpad.net/launchpad-code/+bug/592821	20:15
james_w	they were well-formed recipes before, just ones that were never going to work	20:16
james_w	abentley: ah, so not on bzr-builder	20:16
abentley	james_w, right. It doesn't have to be done there.	20:17
james_w	why not do it there?	20:17
rockstar	abentley, I'm not sure I see how your issue and mine go together her.	20:17
rockstar	s/her/here	20:18
abentley	james_w, because I don't know whether such a check is useful to bzr-builder. Because if it is useful, I don't know why it's not there.	20:19
james_w	because I never thought of checking?	20:19
abentley	james_w, because the set of valid revision specs can vary, and maybe you don't want to get into that.	20:19
james_w	exactly	20:19
james_w	and maybe Launchpad shouldn't either?	20:20
rockstar	james_w, in the case of your #1, yes, the parser wasn't detecting the problem, and now it is (with a newer bzr-builder)	20:20
abentley	james_w, we can guarantee that it won't vary between our appservers and our builders if we choose to.	20:20
rockstar	Because it wasn't detecting the problem, the invalid recipe made it into the database.	20:21
rockstar	Now, with a new bzr-builder, it finds the bad data and oopses.	20:21
abentley	rockstar, they go together because they are both issues of validation, where an incorrect value was put into a recipe field.	20:21
rockstar	Launchpad needs a way for users to deal with bad data that made it into the data (however that happens) and allow them to change it.	20:22
rockstar	abentley, okay.	20:22
rockstar	james_w, so I'm proposing a change that would help Launchpad by providing a better interface to bzr-builder's Recipe class.	20:23
rockstar	Launchpad needs to be more robust. We can validate 'til the cows come home, but if we don't give users a way to deal with invalid data, then we've only made things worse.	20:24
james_w	rockstar: I understand that, but I am looking to explore the issue in a little more depth. There's little point in asking the user to correct the problem if the problem was caused by us.	20:24
rockstar	james_w, the problem was caused by us, only in the fact that we let them put bad data into the database, but that would never actually succeed.	20:25
rockstar	james_w, format bump or whatever needs to happen, I'm happy with. My big concern, however, is that the user's don't suddenly find out their recipe is broken by finding an oops where their recipe used to be.	20:26
* rockstar should probably go eat something, so he stops this egregious use of apostrophes.		20:26
james_w	rockstar: I agree with you that they should be able to fix bad data that they put in	20:27
rockstar	james_w, the patch I propose would do that.	20:27
james_w	rockstar: I'm arguing that doing this across the board leads to us possibly asking users to "fix" perfectly valid recipes due to bugs in bzr-builder	20:27
rockstar	james_w, maybe. I'm less concerned with that at this point.	20:28
james_w	so I'm looking for ways for us to separate the two things such that we can ask them to fix bad data, while apologising for bugs and fix them	20:28
rockstar	james_w, I will always apologize to the user. The fact that we're just now telling them they're wrong is a no-no on our part.	20:29
james_w	all of the examples given so far are recipes where we can perfectly understand the intent, they just aren't going to work	20:32
james_w	so as Aaron said, splitting well-formed and valid might make sense	20:32
rockstar	james_w, which is what I'm proposing.	20:33
james_w	rockstar: at one level, yes, but we can push it deeper than the patch you are suggesting	20:33
rockstar	james_w, here's my patch: http://pastebin.ubuntu.com/493798/	20:33
james_w	yes, I perfectly understand the change you are proposing	20:34
rockstar	james_w, I don't see necessity for going any deeper than that. If I can catch the exception and somehow say "Hm, this is what you had, but for some reason bzr-builder doesn't think it's valid anymore." then I'm happy.	20:35
james_w	sure you are	20:36
rockstar	james_w, I'm not sure how much more apart "valid" and "well-formed" you want.	20:37
james_w	a way of saying to the user "your recipe is well-formed, but these things are likely to be problems:"	20:37
james_w	at any time	20:38
james_w	then we can make the parser more "strict", without causing issues like this, provide better assistance to the user, and still have validation that what we create is at least well-formed	20:40
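The split james_w describes, between "well-formed" and "likely problems", can be illustrated with a toy checker that returns both a well-formedness flag and a list of every warning it finds, rather than raising on the first one. This is a minimal sketch only: the recipe syntax is a simplified, made-up subset, the duplicate-name warning is just an example of a "likely problem", and `check_recipe` is a hypothetical name. None of this is bzr-builder's real parser or API.

```python
# Hypothetical sketch: a toy line-based recipe check, NOT bzr-builder's API.
# It separates "well-formed" (can be parsed and stored) from "likely problems"
# (parses fine, but probably won't build).

def check_recipe(text):
    """Return (well_formed, warnings) for a toy recipe text.

    Well-formed here means: a header line starting with '#', a base branch
    line, and every further line being 'merge NAME URL' or 'nest NAME URL PATH'.
    Warnings collect every suspicious thing instead of stopping at the first.
    """
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    if len(lines) < 2 or not lines[0].startswith("#"):
        return False, []
    warnings = []
    seen_names = set()
    for line in lines[2:]:
        parts = line.split()
        if parts[0] == "merge" and len(parts) == 3:
            name = parts[1]
        elif parts[0] == "nest" and len(parts) == 4:
            name = parts[1]
        else:
            # Not parseable at all: this is a well-formedness failure.
            return False, []
        if name in seen_names:
            # Parseable, but likely to fail later: report it as a warning.
            warnings.append("duplicate branch name: " + name)
        seen_names.add(name)
    return True, warnings
```

With this shape, an edit form could still display and store a well-formed recipe that has warnings, and could list all of them at once.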
rockstar	james_w, yeah, I have no opinions on the overall architecture of bzr-builder. I would just like something that works today. If there's a bigger picture, great.	20:48
james_w	this has an impact on LP too though	20:48
rockstar	james_w, I think it does. I think it'd be great for user's experience.	20:48
james_w	it's about user-experience, so I won't let you wash your hands of it as lying outside of LP ;-)	20:48
rockstar	james_w, however, right now, the user's experience is "WTF? Why can't I get to my recipe?" That's my big concern now.	20:49
james_w	sure, but it's only two recipes?	20:49
rockstar	james_w, yes, but that's two more oopses that we don't need.	20:50
james_w	the change you propose is a "narrow" interface to likely problems (it can only report one), and its API is a poor fit for using it everywhere	20:50
lifeless	morning	20:50
james_w	sure, it's just not a stop-the-line issue IMO	20:50
lifeless	so the OOPS comes from where?	20:50
james_w	so if we can we should come up with an API that nicely gives us the better experience and implement that	20:51
james_w	lifeless: https://bugs.edge.launchpad.net/launchpad-code/+bug/620868	20:51
lifeless	so maybe I'm confused	20:52
lifeless	we used a plain text field to store the recipe didn't we?	20:52
abentley	lifeless, no.	21:00
abentley	lifeless, we store the recipe in object form.	21:00
lifeless	sourcepackagerecipedata ?	21:03
abentley	lifeless, yes, and the SourcePackageRecipeDataInstructions that refer to it.	21:04
lifeless	ok, I see	21:04
lifeless	thanks	21:04
lifeless	I was wondering if it would make sense, when the recipe is invalid, to still permit it to be edited	21:04
lifeless	until it becomes valid.	21:04
abentley	lifeless, that is what we want to do.	21:05
lifeless	then we can handle a wider range of unexpected things like this	21:05
lifeless	abentley: awesome	21:05
abentley	lifeless, The problem is that we can't stringify the invalid recipe.	21:05
lifeless	what do we use stringification for?	21:05
abentley	lifeless, because bzr-builder checks for validity when it stringifies a recipe.	21:05
abentley	lifeless, we use stringification for displaying the field to the user so that they can edit it.	21:06
lifeless	does this mean that we can't show the user the invalid recipe	21:06
abentley	lifeless, yes.	21:06
lifeless	I see, certainly not going to help things along ;)	21:06
lifeless	and it helps me understand the chat you were having - thanks.	21:06
rockstar	abentley, http://pastebin.ubuntu.com/493798/	21:48
=== al-maisan is now known as almaisan-away
=== ajmitch_ is now known as ajmitch
wallyworld_	morning	22:19
abentley	thumper, http://pastebin.ubuntu.com/493848/	22:20
lifeless	Ursinha: hi	22:23
lifeless	https://bugs.edge.launchpad.net/launchpad-registry/+bug/615237	22:23
lifeless	oh, I see what's up	22:25
lifeless	Ursinha: the ec2land stuff	22:26
lifeless	that gets bugs from an MP, does it get it from the MP, or the branch ?	22:26
mwhudson	lifeless: the mp gets bugs from the branch, unless i'm missing context horribly	22:27
lifeless	mwhudson: so I use the same branch for domain-fixes	22:27
lifeless	mwhudson: MPs show bugs already fixed in earlier MPs	22:27
lifeless	mwhudson: but I need to know which -precisely- the ec2 land code uses, or where to find that code, to make it stop including fix-committed and fix-released bugs in the bugs list in the pqm mail.	22:28
mwhudson	ah right	22:29
mwhudson	i expect the ec2 land code isn't that impenetrable...	22:29
lifeless	indeed	22:30
lifeless	it's blatting a growing number of bugs every time i land	22:31
Ursinha	lifeless, the branch	22:34
lifeless	Ursinha: hi	22:34
lifeless	https://bugs.edge.launchpad.net/launchpad-foundations/+bug/638468	22:34
* Ursinha looks		22:34
Ursinha	lifeless, don't know if that works	22:35
lifeless	Ursinha: 'that' ?	22:36
Ursinha	lifeless, sorry, let me elaborate	22:36
Ursinha	lifeless, the problem is that people often mention bugs that weren't properly fixed but had code landed, i.e. bugs that are fix committed or fix released	22:36
Ursinha	so for that to work we'd need to ensure that all bugs that have a fix to land are !fix committed/released	22:37
Ursinha	otherwise we'll start missing things	22:37
Ursinha	my thoughts would be: create another branch	22:37
Ursinha	then that won't happen	22:38
lifeless	Ursinha: hang on a sec.	22:38
lifeless	Ursinha: ec2 land will error if there are no valid bugs right ?	22:38
Ursinha	lifeless, if there are no bugs and it's not no-qa, yes	22:39
lifeless	right	22:39
lifeless	so, if someone references only fix-committed and fix-released bugs	22:40
lifeless	they will get an error	22:40
lifeless	and in that error you could say (bug X Y and Z are also linked on the branch but are fix committed/fix released)	22:40
Ursinha	lifeless, what will you do if you're trying to land a branch which is already linked to a bug which is fix committed, but that's what you want to do? are you going to unlink the bug?	22:40
Ursinha	I don't like this idea	22:40
lifeless	Ursinha: if the code I'm landing is needed to fix that bug, it's not really fix committed, is it?	22:41
Ursinha	lifeless, it can be, but qa-bad	22:41
Ursinha	fix committed == there's a fix for that bug (working or not)	22:41
Ursinha	in progress == fix is in progress (it might have incremental fixes but the whole fix isn't committed yet)	22:42
Ursinha	s/committed/landed	22:42
lifeless	qa-bad should imply in-progress or triaged	22:42
lifeless	fix committed isn't 'commit in the tree', it's FIX in the tree.	22:43
Ursinha	right	22:43
lifeless	I think that when something is bust we should make the bug be in-progress again	22:43
lifeless	just like --incr	22:44
Ursinha	lifeless, manually?	22:44
lifeless	we could automate it	22:44
lifeless	but sure, manually.	22:44
lifeless	qa-bad + fixreleased makes no sense.	22:45
lifeless	qa-bad + fixcommitted also makes no sense.	22:45
Ursinha	qa-bad is added by the devel; we could a) change the status manually at the same time we're adding the qa-bad tag or b) make the bot check for qa-bad bugs and set them back to in progress	22:45
lifeless	Designing other workflows around nonsensical states is not going to work well.	22:45
lifeless	Ursinha: I like both a) and b)	22:45
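Option b), a bot pass that reopens qa-bad bugs, could look roughly like the following. This is a minimal sketch assuming a bug is represented as a plain status string plus a set of tags; `status_after_qa` and the data shapes are hypothetical, not the real tagger's code. It encodes the rule lifeless states: qa-bad combined with Fix Committed or Fix Released is a nonsensical state, so such bugs go back to In Progress.

```python
# Hypothetical sketch: statuses and the function name are illustrative,
# not Launchpad's actual tagger code.
CLOSED_STATUSES = {"Fix Committed", "Fix Released"}

def status_after_qa(status, tags):
    """Return the status a bug should have, given its QA tags.

    A bug tagged qa-bad cannot really be Fix Committed or Fix Released,
    so it is moved back to In Progress; any other bug is left unchanged.
    """
    if "qa-bad" in tags and status in CLOSED_STATUSES:
        return "In Progress"
    return status
```

Doing this in the bot (option b) as well as manually (option a) means a forgotten status change is still corrected on the next tagger run.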
Ursinha	lifeless, ok. what to say about branches that have several bugs linked, and some of them are already released	22:45
Ursinha	?	22:45
lifeless	just list the other bugs.	22:45
Ursinha	why are people reusing branches?	22:46
lifeless	Ursinha: convenience; clarity; organisation.	22:46
Ursinha	lifeless, well, I think we're trying to make the scripts work around a situation that could be avoided by just not reusing branches	22:48
Ursinha	and the problem is that the script already tries to work around some behaviors to create consistency	22:48
Ursinha	I think the scripts will get more and more confusing because of that, but if you think this is really worth it, I can do that	22:48
Ursinha	adding a mechanism to tagger to set qa-bad bugs to in progress isn't hard	22:49
lifeless	It makes my dev environment a lot easier to manage.	22:49
lifeless	I have 'librarian' for the librarian, 'registry' for registry, 'oops' for oops etc	22:49
lifeless	sometimes I have bug-X branches when I have multiple things in flight : but the whole kanban + RFWTAD workflow is about removing the need for parallel-tasking.	22:50
Ursinha	lifeless, what are you going to do about the fix committed bugs linked to your branches, when you land a new fix?	22:50
lifeless	Ursinha: I don't understand the question.	22:50
lifeless	if it's fix committed there is no more work to do on the bug.	22:51
lifeless	landing a new fix suggests it's not fix committed.	22:51
Ursinha	lifeless, that's not what I see following bugmail	22:51
lifeless	that's a contradiction	22:51
lifeless	Ursinha: can you point me at some examples?	22:51
Ursinha	lifeless, I'd have to do some gardening in my bugmail	22:52
Ursinha	lifeless, we can try that. the idea is to make ec2 land ignore fix committed bugs?	22:52
Ursinha	or error on them?	22:52
* wallyworld_ off to doctor appointment		22:52
lifeless	Ursinha: ignore fix(committed\|released) bugs	22:54
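lifeless's proposal, combined with the existing rule that ec2 land errors when there are no bugs unless the landing is no-qa, can be sketched as below. This is a minimal sketch assuming each linked bug is just an id and a status string. `Bug`, `NoBugsError`, and `bugs_for_landing` are hypothetical names, not the actual ec2 land code.

```python
# Hypothetical sketch of the bug filtering discussed for `ec2 land`;
# the names and data shapes are illustrative, not the real ec2 land code.
from dataclasses import dataclass

CLOSED_STATUSES = {"Fix Committed", "Fix Released"}

@dataclass
class Bug:
    id: int
    status: str

class NoBugsError(Exception):
    pass

def bugs_for_landing(linked_bugs, no_qa=False):
    """Return the open bugs to list in the landing (PQM) message.

    Fix Committed / Fix Released bugs are ignored. If nothing is left and
    the landing isn't marked no-qa, raise an error naming the skipped bugs.
    """
    open_bugs = [b for b in linked_bugs if b.status not in CLOSED_STATUSES]
    if not open_bugs and not no_qa:
        skipped = ", ".join(str(b.id) for b in linked_bugs)
        raise NoBugsError(
            "No open bugs linked; these are linked but already "
            "Fix Committed/Fix Released: " + (skipped or "none"))
    return open_bugs
```

Reporting the skipped bugs in the error is what lets someone who reuses a branch see why their landing was refused.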
Ursinha	lifeless, one case that came to mind now is, bug fix released but not really fixed. not tagged qa-bad or -needstesting, but has a new fix	22:54
Ursinha	what to do in this case?	22:54
Ursinha	I saw that happen a few times after releases	22:54
lifeless	so, something has been made better	22:54
lifeless	but not good enough	22:54
lifeless	with the QA workflow using bugs to permit commits to trunk to be deployed.	22:55
lifeless	we need a new bug for the QA workflow.	22:55
lifeless	don't we?	22:55
Ursinha	not sure what you mean	22:55
lifeless	so in this scenario:	22:55
Ursinha	I guess ec2 land should check all bugs, consider only the !fix committed/released ones, and if no bugs are left, error	22:55
lifeless	- bug X	22:55
lifeless	- branch lands that 'fixes X'	22:56
lifeless	we QA it - bugX [qa-ok]	22:56
lifeless	we deploy	22:56
lifeless	- bugx [FixReleased]	22:56
lifeless	then we realise it's still timing out (for instance)	22:56
lifeless	thats your scenario, right ?	22:56
Ursinha	right	22:57
lifeless	now, there are two places we might find this	22:57
lifeless	firstly, we might notice in QA	22:57
lifeless	and secondly we might notice after deploy	22:57
lifeless	if we notice in QA, because it's 'better' we don't need to stop the deploy.	22:57
lifeless	so any solution to this needs to cater for noticing in QA.	22:57
lifeless	if we notice in QA, we could set the bug to qa-incremental (is that right?)	22:58
Ursinha	hm	22:58
lifeless	so In-progress status, and 'branch is ok to land'	22:58
Ursinha	if we notice in qa, so it's qa-untestable	22:58
Ursinha	or qa-bad if devel thinks the fix is going to bork prod. if rolled out	22:59
lifeless	https://dev.launchpad.net/QAProcessContinuousRollouts#We can QA the branch, and it is an incremental step towards the fix of one or more bugs	22:59
Ursinha	lifeless, but only if you landed the first fix as --incr	22:59
Ursinha	if you know in advance that the fix might not be the last one, then land it as incremental and all of that will be done automatically	23:00
lifeless	Ursinha: So I guess I'm saying 'what you describe is us realising that the fix was incr, even if we didn't say it was'	23:00
Ursinha	bug in progress, qa-untestable	23:00
Ursinha	yes, I see that	23:00
lifeless	so the right thing to do, whether we notice in QA or after deploy, is to set the bug status in the same way.	23:00
Ursinha	I'm saying there's room for tweaking things :)	23:00
lifeless	yeah	23:00
lifeless	and so if we set the bug status the same way	23:01
Ursinha	qa-untestable, in that case	23:01
Ursinha	otherwise we'll be blocked	23:01
lifeless	we'll set it to in-progress, qa-untestable(or-qa-ok I guess for the testable-incremental-case)	23:01
Ursinha	doesn't matter for the script, both are "go-for-it"	23:02
Ursinha	I like qa-ok best	23:02
lifeless	right	23:02
lifeless	and ec2 land would correctly include this bug in the later landing	23:02
lifeless	because its in-progress	23:02
Ursinha	right	23:02
lifeless	that seems to work to me	23:02
Ursinha	lifeless, right. I'll update the wiki page and let you know	23:05
lifeless	Ursinha: thanks!	23:05
Ursinha	I don't like the way the theme in dev.lp.net separates the sections of the page	23:16
Ursinha	it's kinda hard to read	23:16
lifeless	yeah	23:17
lifeless	its rather awkward	23:17
lifeless	mbarnett: never mind, just BB flakiness	23:18
mars	lifeless, reading backscroll, looking sadly upon the BB waterfall - did you already pass the BB restart work along?	23:23
lifeless	mars: restart work ?	23:27
mars	lifeless, re: your "BB flakiness" comment	23:27
lifeless	mars: well I haven't debugged deeply, for clarity	23:28
mbarnett	lifeless: hehe.. kk	23:28
lifeless	mars: but the most recent build was for an older rev	23:28
lifeless	mars: so it hadn't tried tip of prod-devel	23:28
lifeless	(and the fix landed in prod-devel nearly 12 hours ago)	23:29
mars	lifeless, looking at the waterfall BB is completely hosed right now :(	23:30
mars	so I need to figure out what to tackle first	23:30
lifeless	mars: the machine gun ?	23:30
lifeless	mars: perhaps bring out the 'restart the world' card?	23:31
=== matsubara is now known as matsubara-afk
mars	lifeless, we used that card earlier today - I am worried	23:31
lifeless	oh	23:31
lifeless	that is a concern	23:31
mars	and lp and db_lp have been down for a week	23:32
lifeless	we have several CPs pending	23:32
mars	:(	23:32
lifeless	cannot rename, ubuntu bug uploads, and OOPS generation	23:32
mars	ok, so we need to get lucid_lp and prod up first	23:34
mars	mbarnett, I forced the lucid_lp build. If the build does not start in, say, 10 minutes, you will have to restart the build slave.	23:35
mbarnett	mars: kk	23:35
mars	ok, prod_lp is offline for some reason (is it an EC2 slave?)	23:36
lifeless	the log shows 'substantiating'	23:37
lifeless	so I'd say yes	23:37
mars	checking master.cfg	23:37
mars	yes, EC2Latent	23:38
mars	lifeless, you said prod_lp pulled a stale tip revision for its test run?	23:38
lifeless	mars: no, I said the last run in 'recent builds' was ages ago and for what is now obsolete	23:39
lifeless	mars: and that it hasn't had a more recent run which would get a better tip	23:39
lifeless	I hypothesised that it hadn't detected it	23:39
mars	ok, I'll force a build then	23:39
lifeless	alternatively, if the slave doesn't come up, I bet bb doesn't report that as a failed run.	23:39
lifeless	mars: I forced a build	23:40
mars	it is still offline	23:40
lifeless	see 23:18:22 in the waterfall	23:40
mars	ugh	23:41
mars	have to restart the build master then	23:41
mars	mbarnett, could you please restart the build master? That should get the prod_lp builder running again	23:43
mars	lifeless, EC2 build slaves need a master restart in order to bring them back up (lp, db_lp, prod_lp).	23:44
mars	I haven't seen this problem before, but "restart the world" sounds right	23:44
mars	use the unstoppable super weapons first	23:44
mars	always the unstoppable super weapons first	23:44
mbarnett	build master has been restarted	23:45
lifeless	where's the earth shattering kaboom	23:45
lifeless	there's meant to be an earth shattering kaboom	23:45
lifeless	</marvin>	23:45
mars	heh	23:45
mbarnett	does not appear to be starting back up happily	23:46
* mars mashes F5 a few more times in desperation		23:47
mars	j/k, that actually speeds server death in resource-exhausted environs :)	23:48
mars	mbarnett, do you have a log I could look at please?	23:48
mbarnett	mars: sure, give me a couple	23:50
mars	k	23:51