/srv/irclogs.ubuntu.com/2010/11/12/#launchpad-dev.txt

mwhudson	err!	00:20
mwhudson	why is branchChanged hitting AssertionErrors?	00:20
spiv	And no visible OOPS ID in the traceback sent to my 'bzr push' either...	00:21
mwhudson	yeah	00:21
spiv	On the other hand, LP did seem to successfully notice that my branch changed.	00:22
mwhudson	thumper: hello :-)	00:23
mwhudson	well	00:24
mwhudson	the assertionerror is because the transaction is doomed	00:24
mwhudson	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP119	00:25
mwhudson	ah no, being doomed	00:26
mwhudson	its in a timeout block	00:27
mwhudson	wth, there's a gap of 15s between recorded queries	00:28
* wgrant stabs qastaging.		00:29
StevenK	wgrant: What did qastaging ever do to you?	00:34
mwhudson	spiv: https://bugs.launchpad.net/launchpad-code/+bug/674305 <- feel free to hit the affects me too thing :-)	00:35
_mup_	Bug #674305: bzr push occasionally reports AssertionError on terminal <Launchpad Bazaar Integration:New> <https://launchpad.net/bugs/674305>	00:35
wgrant	StevenK: Timed out lots.	00:35
wgrant	Although it may just be that those pages are broken now.	00:35
wgrant	(Archive:+index, +packages, +delete-packages, that sort of thing)	00:35
wgrant	Hmm.	00:39
wgrant	It'd be nice if daily builds didn't all hit and DoS the build farm at the same time.	00:39
spiv	mwhudson: done, thanks!	00:41
lifeless	wgrant: https://bugs.launchpad.net/soyuz/+bug/672371	00:45
_mup_	Bug #672371: Archive:+packages timeouts <ppa> <qa-needstesting> <regression> <timeout> <Soyuz:Fix Committed by jelmer> <https://launchpad.net/bugs/672371>	00:45
thumper	mwhudson: hey	00:46
thumper	mwhudson: whazzup?	00:46
mwhudson	thumper: that bug	00:47
thumper	mwhudson: I think	00:48
mwhudson	thumper: https://bugs.launchpad.net/launchpad-code/+bug/674305	00:48
_mup_	Bug #674305: bzr push occasionally reports AssertionError on terminal <Launchpad Bazaar Integration:New> <https://launchpad.net/bugs/674305>	00:48
thumper	mwhudson: I think that may be the xmlrpc fuckage	00:48
thumper	mwhudson: not sure why there are massive gaps	00:48
mwhudson	thumper: the xmlrpc fuckage?	00:48
mwhudson	the same as for getJobForMachine?	00:48
thumper	mwhudson: all the timeouts on the xmlrpc server	00:48
thumper	mwhudson: exactly	00:48
mwhudson	hm, ok	00:48
thumper	I've not been able to find out why we have 8s gaps	00:49
thumper	with no obvious reason	00:49
mwhudson	:/	00:49
thumper	I spent almost a week chasing it	00:49
thumper	and I've nothing to show for it	00:50
wgrant	lifeless: Yeah, but isn't that in theory fixed?	00:50
lifeless	wgrant: see my last comment	00:50
wgrant	Oh.	00:50
lifeless	iz single slow query	00:50
lifeless	well	00:50
lifeless	there are other slow queries	00:50
lifeless	but thats the smoking gun	00:50
wgrant	does that also take forever on a real DB?	00:51
thumper	lifeless: ah... no	00:51
thumper	it isn't a slow query	00:51
thumper	it is the 15s gap between query execution and the next one that bothers me	00:52
thumper	mwhudson: I'd love some help chasing that down as I've exhausted my understanding on that problem	00:53
lifeless	wgrant: don't know	00:53
lifeless	thumper: huh, what are you talking about?	00:53
lifeless	thumper: I'm talking about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1776QS51	00:53
thumper	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP119	00:53
lifeless	query 33	00:53
thumper	lifeless: ^^	00:54
lifeless	thumper: I'll have a look	00:54
lifeless	thumper: that looks like thread starvation to me	00:55
thumper	lifeless: but it is only a guess	00:55
thumper	lifeless: and why is it starved	00:55
thumper	we don't know	00:55
thumper	we are just guessing	00:55
lifeless	thumper: the losas have the xml server split out as the highest ticket	00:56
lifeless	thumper: when thats done we'll have more resources for xmlrpc	00:56
lifeless	thumper: and after that the single threaded experiment will kick in	00:56
lifeless	thumper: if you want to work on this today, I suggest implementing the per thread stats	00:57
thumper	lifeless: no, I'm in the middle of something else	00:57
lifeless	https://bugs.edge.launchpad.net/launchpad-foundations/+bug/243554 for reference	01:01
_mup_	Bug #243554: oops report should record information about the running environment <oops-infrastructure> <Launchpad Foundations:Triaged> <OOPS Tools:Triaged> <https://launchpad.net/bugs/243554>	01:01
lifeless	wgrant: I have two problems answering for 'on a real db'	01:02
lifeless	wgrant: firstly, we don't have the substituted ids to reproduce	01:02
lifeless	wgrant: secondly don't have access and we're short staffed losa-wise.	01:02
lifeless	wgrant: where are you up to exam wise?	01:03
wgrant	lifeless: On the first day of a 12 day break.	01:05
wgrant	So not doing much.	01:05
lifeless	wgrant: Are you interested in tackling this perf issue? I have a trip on sunday for the cassandra training	01:06
wgrant	lifeless: We should have a stub soon, shouldn't we?	01:06
lifeless	and shopping/prep to do today	01:06
lifeless	wgrant: in a few hours yes	01:06
lifeless	wgrant: I'm strictly on leave, but I'm pretty bad at unwinding for < several-week periods.	01:07
wgrant	Heh.	01:07
lifeless	right now though, I have to do a shop-run. bbs.	01:07
wgrant	So, 11888 made it bad, and the fix in iforgetwhat didn't help?	01:07
lifeless	it helped	01:21
lifeless	but not enough	01:21
lifeless	we have two options	01:21
lifeless	fix the query - its taking 200ms per SPPH at the moment.	01:22
lifeless	rollback both 11888 and 11903(?)	01:22
lifeless	note that rolling back leaves the page at 10 seconds and the ajax status updating timing out.	01:22
wallyworld	thumper: hello, mcfly	02:15
thumper	wallyworld: whazzup?	02:15
wallyworld	i can't get branch lp:~wallyworld/launchpad/invalid-branch-link-message to merge properly	02:16
wallyworld	it's not in the codebase either locally or on loggerhead and any merge attempts via pqm or lp-land claim there is nothing to do	02:16
thumper	wallyworld: this is the revision that was backed out wasn't it?	02:17
wallyworld	yes	02:17
wallyworld	but i fixed it	02:17
wallyworld	ie backed out the bad yui stuff	02:17
thumper	right	02:17
wallyworld	it's gone past ec2 again no probs	02:17
thumper	did you reverse the reversed merge?	02:18
wallyworld	no. not sure what to do	02:18
thumper	right	02:18
thumper	what you need to do is to merge devel into your branch	02:18
thumper	then do a reverse merge of the revision that backed out your change	02:18
thumper	the guts of the problem is that most of your branch has been merged	02:19
thumper	and the files were then reverted	02:19
thumper	so you need to revert the revert	02:19
wallyworld	ok. noob alert. how do i do a reverse merge?	02:19
thumper	do you know the devel revision that reverted your merge?	02:19
thumper	wallyworld: it is a cherry pick merge	02:19
wallyworld	i did just after it happened :-)	02:20
spiv	wallyworld: merge -r NEW..OLD (rather than merge -r OLD..NEW)	02:20
thumper	wallyworld: I	02:20
wallyworld	i can see if i can find it	02:20
thumper	wallyworld: I'll leave you in spiv's capable hands	02:20
spiv	wallyworld: "bzr help revert" has an example:	02:20
thumper	wallyworld: to test the merge locally	02:20
spiv	“For example, "merge . --revision -2..-3" will remove the	02:21
spiv	changes introduced by -2, without affecting the changes introduced by -1.”	02:21
thumper	wallyworld: get an up to date devel, and go bzr merge --preview ../my-branch	02:21
thumper	wallyworld: that way you can see what pqm will be attempting to merge into devel	02:21
thumper	wallyworld: in the way of changes	02:21
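
The reverse-merge advice above can be sketched as a pair of small helpers that only build the `bzr` command lines quoted in the conversation (`merge -r NEW..OLD` and `merge --preview`). The function names, the revision number 1234 and the `../devel` path are hypothetical placeholders, not values from the log, and the real devel revno of the backout would have to be looked up first.

```python
def reverse_merge_args(reverting_revno, source):
    """Build a bzr command that undoes the changes made by one revision.

    Giving the range as NEW..OLD (rather than OLD..NEW) asks bzr to apply
    the inverse of that revision, i.e. to "revert the revert".
    """
    new = reverting_revno
    old = reverting_revno - 1
    return ["bzr", "merge", "-r", f"{new}..{old}", source]


def preview_merge_args(branch_path):
    """Build a bzr command that shows what a merge would change without applying it."""
    return ["bzr", "merge", "--preview", branch_path]


# Hypothetical values: 1234 stands in for the devel revision that backed
# out the change, and ../devel for an up-to-date local copy of devel.
undo_backout = reverse_merge_args(1234, "../devel")
preview = preview_merge_args("../my-branch")
```

Either list could be handed to `subprocess.run(...)` from inside the relevant working tree; the preview is meant to be run from devel, as thumper describes.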
wallyworld	ok. i'll have a wee looksy. thanks. i'll grab a quick bite first. suddenly i'm hungry	02:22
* thumper finally has the recipe index builds looking nice		02:28
thumper	now for the tests...	02:28
thumper	F**K ME - 150 / 1593 CodeImportSchedulerApplication:CodeImportSchedulerAPI	02:34
thumper	hard / soft timeouts	02:34
thumper	36 / 131 CodehostingApplication:CodehostingAPI	02:34
thumper	mwhudson: ^^^ that'll be contributing to the push issues	02:34
mwhudson	thumper: yep	02:35
mwhudson	also :(	02:35
* thumper has push failures like mwhudson had		02:43
lifeless	wgrant: so ;)	02:58
wgrant	lifeless: Hi. Just reinstalled and trying to get Launchpad running.	03:01
lifeless	meep!	03:01
wallyworld	poolie: ping	03:01
wgrant	Desktop + Soyuz on amd64 with lp-buildd in a VM does not fit well in 4GiB. :/	03:01
poolie	hi there wallyworld	03:01
poolie	hi wgrant, lifeless	03:01
wgrant	Afternoon poolie.	03:02
lifeless	hi poolie	03:02
wallyworld	hey, with the bzr 2.2.2 upgrade, we talked about doing it today from tip to avoid 2 lots of downtime. but i don't really think we should package trunk prior to official release. what downtime is involved? when i did the 2.2.1 upgrade, was there any downtime there?	03:03
poolie	so two things:	03:03
poolie	firstly, i wasn't really saying "you should package tip", just "it's safe to jump to tip if you want to"	03:04
poolie	we shouldn't normally need to	03:04
wgrant	There is a few seconds of downtime for codehosting upgrades.	03:04
poolie	and if there's a bug there for which you need an urgent deployment, it could be better to just do a release immediately	03:05
poolie	secondly i don't think it's really relevant to downtime	03:05
spiv	wgrant: although if you are a user 90% of the way through an hour long push the cost to you will be more than a few seconds...	03:05
poolie	i probably said "to avoid lag between us landing a fix and you running it"	03:05
poolie	hm iwbni it didn't interrupt running connections	03:05
wgrant	Hmm, true.	03:06
spiv	poolie: hmm, and in this case hypothetically it wouldn't need to; we don't need to restart the ssh server, just provide a new bzr so that new connections will get a fixed lp-serve...	03:06
wallyworld	so, me thinks it's better to wait for bzr 2.2.2 to be released next week and schedule a small outage	03:07
wallyworld	if needed at all	03:07
wgrant	We have a downtime window next week for the DB upgrade anyway.	03:07
lifeless	right	03:07
lifeless	otherwise we have to schedule downtime	03:07
lifeless	unless its zomg time	03:07
lifeless	we will once the relevant RT is done have no-downtime deploys to codehosting.	03:08
lifeless	but its (I think) third in the queue.	03:08
lifeless	and we're getting one item done every 2-3 weeks.	03:08
wallyworld	there's that cpu spin/wait issue that 2.2.2 fixes and a few people get hit by it but not so many that we shouldn't wait till next week...	03:08
spiv	Tangentially, I see https://lpstats.canonical.com/graphs/CodehostingPerformance/ looks a bit alarming ?	03:08
lifeless	it does	03:09
lifeless	fortunately its friday and noone will care about it till Monday	03:09
lifeless	<ha ha ha>	03:09
wallyworld	lifeless: you shouldn't care about it either. so much for you taking the day off. my wife would kill me if i worked too much on my "day off"	03:10
poolie	hm, is that a repeating pattern over the last 24h?	03:11
poolie	spm, are you back at work?	03:11
spm	poolie: I am, but seriously considering taking the rest off - having a horrible hayfever attack atm - has triggered a very nasty asthma response. :-/	03:12
lifeless	spm: :(	03:12
lifeless	spm: taken claratyne?	03:13
spm	indeed	03:13
lifeless	spm: saline solution as suggested can help a lot - gets the pollen out	03:13
spm	aye	03:13
spiv	mmm, neti pots.	03:13
lifeless	spiv: I ordered one wed	03:13
poolie	nasonex is great (prescription only)	03:14
wgrant	Hmm. It'd be nice if we had tracebacks for each SQL statement.	03:14
lifeless	poolie: yeah, mine runs out in a few days	03:15
lifeless	I've been given a (different) thing - I haven't read up to see if its equivalent yet.	03:15
wgrant	rofl	03:16
lifeless	'allonase' or something like that	03:16
wgrant	'I also suggest renaming "incomplete" to "need info", as it's much more	03:16
wgrant	descriptive. "Incomplete" makes it sound like the bug is in progress of	03:16
wgrant	being fixed, but not yet done.'	03:16
lifeless	wgrant: https://bugs.launchpad.net/launchpad-foundations/+bug/606959	03:16
_mup_	Bug #606959: oops should record the short traceback that caused each query? <Launchpad Foundations:Triaged> <https://launchpad.net/bugs/606959>	03:16
spiv	lifeless: heh	03:16
spiv	lifeless: what's nice about that idea is that although capturing tracebacks is a touch expensive, that shouldn't matter if you only do a reasonable number of queries ;)	03:17
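
The idea in bug 606959, recording a short traceback alongside each SQL statement, might look roughly like the sketch below. It uses the standard-library `sqlite3` and `traceback` modules purely for illustration; `TracingConnection` and `load_bug_titles` are hypothetical names, and this is not how Launchpad's OOPS machinery is actually implemented.

```python
import sqlite3
import traceback


class TracingConnection:
    """Wrap a DB-API connection and remember where each statement came from."""

    def __init__(self, connection, max_frames=5):
        self._connection = connection
        self._max_frames = max_frames
        self.log = []  # list of (sql, short_traceback) pairs

    def execute(self, sql, params=()):
        # Drop the frame for this method, then keep only the innermost callers.
        stack = traceback.extract_stack()[:-1][-self._max_frames:]
        short = [f"{frame.name}:{frame.lineno}" for frame in stack]
        self.log.append((sql, short))
        return self._connection.execute(sql, params)


def load_bug_titles(conn):
    # Stand-in for application code that issues a query.
    return conn.execute("SELECT 'example'").fetchall()


conn = TracingConnection(sqlite3.connect(":memory:"))
load_bug_titles(conn)
sql, where = conn.log[0]
```

Capturing the stack on every statement has a cost, which is spiv's point: it stays cheap as long as a request only issues a reasonable number of queries.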
lifeless	spiv: http://ecoyogastore.co.nz/eco-yoga-gear/neti-pot	03:17
lifeless	spiv: yeah	03:17
poolie	i saw, linked from the discussion of Go, google have a final bug status of "unfortunate"	03:17
poolie	that's nice	03:17
lifeless	lol	03:17
poolie	"suckstobeyou" :)	03:18
wgrant	I thought they added that specially for the naming bug.	03:18
spiv	lifeless: what web stores need for neti pots are photos more like http://www.flickr.com/photos/debrisdesign/502255811/	03:18
wgrant	But I may be wrong.	03:18
poolie	oh, maybe	03:18
poolie	it could be freeform for all i know	03:18
lifeless	spiv: yeah, I hope it has a manual	03:19
poolie	but it's a bit more precise for some things than 'wontfix'	03:19
spiv	lifeless: the internet can provide a guide or twenty, I'm sure.	03:19
lifeless	what we need is a closure-space	03:19
lifeless	N dimensions and a slider.	03:19
lifeless	like the colour-space pickers	03:20
wgrant	poolie: That's what Opinion is for!	03:20
wgrant	cough	03:20
lifeless	wgrant: thats an opinion!	03:20
wgrant	lifeless: :(	03:24
lifeless	seriously	03:25
lifeless	its still an experiment as far as I've heard	03:25
wgrant	Ah.	03:25
wgrant	OK, with Unity defeated, it is now time to look at that query.	03:29
lifeless	heh	03:29
lifeless	wallyworld: if you want to discuss https://bugs.launchpad.net/bugs/674329 further I'm happy to do so - I didn't mean to prevent discussion about whatever symptoms you ran into.	03:30
_mup_	Bug #674329: DecoratedResultSet eagerly fetches all results <performance> <Launchpad Foundations:Won't Fix> <https://launchpad.net/bugs/674329>	03:30
wallyworld	lifeless: hmmm. seems at first glance the whole concept of iterable results sets which load records in batches is not supported?	03:32
wallyworld	what if the query returns 10000000 records. and the user only wants to see 100 at a time?	03:33
lifeless	wallyworld: thats what batch navigator is for	03:33
lifeless	wallyworld: we do a count(*) (we should estimate instead, but thats orthogonal) and then use a slice (OFFSET X LIMIT Y in SQL) to only retrieve 100 at a time.	03:34
wallyworld	i realise that's what it is supposed to be for, but isn't the purpose defeated if __iter__ loads the whole lot anyway	03:34
lifeless	wallyworld: __iter__ is /not/ for 'do partial work'	03:34
lifeless	wallyworld: (neither in general, nor in this specific case)	03:34
lifeless	wallyworld: in this specific case its because the database server will do all the work requested, always.	03:35
lifeless	so we have to ask for the right amount of work up front rather than do some, do some more, and then say that we're done.	03:35
lifeless	wallyworld: if you consider the implications of ORDER BY/GROUP BY on the work required in the db, this should make a lot of sense	03:36
wallyworld	sorry for my dumbness, but isn't the whole concept of yield to avoid eagerly realising the entire list?	03:36
lifeless	uhm	03:36
lifeless	so, iterators, generators and lazy evaluation	03:37
wallyworld	why does the server do all the work? other databases don't enforce this?	03:37
lifeless	wallyworld: good question. Pg definitely does; others I won't speculate on.	03:37
wallyworld	sure, the database has to do some work to satisfy order by etc, but the step of extracting the data from the db into the result set needn't be done unless required	03:38
lifeless	nevertheless	03:38
lifeless	python-pgsql has a single large buffer with the results, no further network access occurs as we iterate the rows.	03:38
lifeless	Or so I am assured by Smart People.	03:39
lifeless	[specifically jamesh who dug into this in the past too]	03:39
wallyworld	ok then.	03:39
jamesh	by python-pgsql, you mean psycopg2?	03:39
lifeless	jamesh: blah - yes	03:39
wallyworld	lifeless: so to recap, if the result set has 10000000 rows, it's ok to do a list(rs) which effectively constructs an in memory data structure with all that data even if we only want to process 100 at a time?	03:41
jamesh	wallyworld: if you stop reading the result set early, the only effort you're going to save is the conversion of the result buffer to Python objects on the client side.	03:41
wallyworld	or am i missing something?	03:41
wgrant	wallyworld: You'll slice first.	03:41
wallyworld	yes, and for a large result set, that's significant and a potential performance issue	03:41
wgrant	wallyworld: The slice affects the issued query.	03:41
jamesh	if you know you will only need a subset of the rows, tell the database so that it can send you less info.	03:42
wallyworld	jamesh: i'm talking about say batch navigator which allows the user to scroll through the results 100 at a time.	03:43
wallyworld	we may want the whole lot eventually, but not all at once	03:43
wgrant	That slices, so the DB only sends those 100 rows.	03:43
wgrant	And only those 100 are turned into objects.	03:43
wallyworld	wgrant: not if a list(rs) is done??	03:43
wallyworld	which is what happens in DecoratedResultSet	03:43
lifeless	wallyworld: no, to recap, slice the resultset.	03:43
wgrant	wallyworld: __iter__ will only be called on the sliced version, right?	03:43
jamesh	wallyworld: how do you know you'll want them all eventually?	03:43
wgrant	slicing returns a new resultset.	03:44
wgrant	And __iter__ is called on that.	03:44
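
wgrant's point, that slicing returns a new result set whose query is already restricted and that `__iter__` only runs on that sliced object, can be illustrated with a toy class. This is a minimal sketch on top of `sqlite3`; the `ResultSet` class here is hypothetical and is not Storm's or Launchpad's `DecoratedResultSet` implementation.

```python
import sqlite3


class ResultSet:
    """Toy lazy result set: no SQL runs until the object is iterated."""

    def __init__(self, conn, sql, limit=None, offset=0):
        self._conn = conn
        self._sql = sql
        self._limit = limit
        self._offset = offset

    def __getitem__(self, index):
        # Slicing does not fetch anything; it returns a narrower result set.
        if not isinstance(index, slice) or index.step is not None:
            raise TypeError("only simple slices are supported in this sketch")
        if index.stop is None:
            raise ValueError("this sketch needs an explicit stop")
        start = index.start or 0
        return ResultSet(self._conn, self._sql,
                         limit=index.stop - start,
                         offset=self._offset + start)

    def statement(self):
        """Return the SQL and parameters that iteration would issue."""
        if self._limit is None:
            return self._sql, ()
        return f"{self._sql} LIMIT ? OFFSET ?", (self._limit, self._offset)

    def __iter__(self):
        # Eagerly reading every row is fine here: the query was already
        # narrowed by the slice, so "every row" means one batch.
        sql, params = self.statement()
        return iter(self._conn.execute(sql, params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO bug (title) VALUES (?)",
                 [(f"bug {n}",) for n in range(1, 1001)])

everything = ResultSet(conn, "SELECT id, title FROM bug ORDER BY id")
page = everything[100:200]   # still no query issued
rows = list(page)            # one query, restricted to 100 rows
```

With this shape, calling `list()` on the sliced object only ever materialises the 100 rows of the batch, which is why an eager `__iter__` does not defeat batching.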
jamesh	for example, how often do people go to the second page of results from a bug search?	03:44
wallyworld	jamesh: i said we may want them all eventually, say if the user scrolls to the end	03:44
lifeless	wallyworld: general principle: specify all the work you want within a transaction - call it 2 seconds of processing time.	03:44
wallyworld	:-)	03:44
lifeless	wallyworld: and ask for, and process that. No more (would be wasted). No less (would result in additional queries - lowers efficiency)	03:45
lifeless	wallyworld: the batch navigator does this slicing for you	03:45
lifeless	wallyworld: how about we get concrete. 'I'm trying to do X, and Y is happening'	03:45
wallyworld	ok. i think my problem is i misunderstood how the batch navigator works.	03:46
wallyworld	thanks for setting me straight :-)	03:46
lifeless	the batch navigator uses count() on the base result set to estimate the number of pages	03:46
* wallyworld crawls back to his hole		03:46
lifeless	and a slice to get the data for the current page	03:47
wallyworld	makes sense	03:47
lifeless	the count() is a performance issue with huge datasets	03:47
lifeless	we need to switch to estimators	03:47
wallyworld	yeah.	03:47
lifeless	but thats orthogonal	03:48
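
The two queries lifeless describes for the batch navigator, a `count()` to work out the number of pages and a slice for the current page, can be sketched as follows. The table, the `page_of_results` helper and the page size are assumptions for illustration and are not taken from Launchpad's batch navigator code.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO bug (title) VALUES (?)",
                 [(f"bug {n}",) for n in range(1, 1051)])


def page_of_results(conn, page, page_size=100):
    """Return (number_of_pages, rows_for_this_page) using two queries."""
    # Query 1: the count. On huge tables this is the expensive part,
    # which is why an estimate is suggested instead.
    total = conn.execute("SELECT COUNT(*) FROM bug").fetchone()[0]
    pages = math.ceil(total / page_size)
    # Query 2: only the rows for the requested page (0-indexed).
    rows = conn.execute(
        "SELECT id, title FROM bug ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return pages, rows


pages, rows = page_of_results(conn, page=10)
```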
wallyworld	also, in my case, i had a query with a group by so had to override count()	03:48
lifeless	erm	03:48
wallyworld	the default storm rs barfs	03:48
lifeless	:(	03:48
lifeless	I thought that was fixed in 0.18	03:48
wallyworld	you can't say select (*) from xxxx with a group by in it	03:49
wallyworld	no	03:49
wallyworld	i fixed it quite simply	03:49
wallyworld	but i also found a bug in Count()	03:49
wallyworld	it messes up count(distinct xxx)	03:49
poolie	lifeless, do you go to the losa meetings?	03:49
wallyworld	it leaves out () around the columns	03:49
poolie	i don't know the specific name for it, but i mean the one where francis asks them to do things	03:50
wallyworld	s/select()/select count()	03:50
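
The counting problem wallyworld describes can be reproduced in plain SQL: a `COUNT(*)` placed directly on a `GROUP BY` query yields one count per group rather than the number of groups. One common workaround is to count over the grouped query as a subselect, or to use `COUNT(DISTINCT ...)` with the column in parentheses. The sketch below uses `sqlite3` and a made-up `build` table; it is not wallyworld's actual fix, and it does not use Storm's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY, recipe TEXT)")
conn.executemany("INSERT INTO build (recipe) VALUES (?)",
                 [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)])

# COUNT(*) directly on the grouped query: one row per group, not a total.
per_group = sorted(
    conn.execute("SELECT COUNT(*) FROM build GROUP BY recipe").fetchall())

# Counting the groups: wrap the grouped query in a subselect.
grouped_total = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT recipe FROM build GROUP BY recipe) AS grouped"
).fetchone()[0]

# Equivalent here: COUNT(DISTINCT column), parentheses included.
distinct_total = conn.execute(
    "SELECT COUNT(DISTINCT recipe) FROM build").fetchone()[0]
```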
lifeless	poolie: no, tz fail. I get minutes, and have a separate meeting with ISF	03:50
poolie	k	03:50
lifeless	poolie: I do when I'm in a workable tz	03:50
poolie	i'll mail him then	03:51
poolie	thanks	03:51
poolie	jam, did you file an RT for starting lp-serve?	03:53
poolie	bug 660264	03:53
_mup_	Bug #660264: bzr+ssh on launchpad should fork, not exec <qa-ok> <Launchpad Bazaar Integration:Fix Committed by jameinel> <https://launchpad.net/bugs/660264>	03:54
jam	I've had an rt for a while now, 41340 IIRC, but I'm not positive	03:54
poolie	thanks, i'll check that	03:54
jam	sorry, 42156	03:54
* wallyworld goes to make a coffee and get his fire proof suit		03:55
jam	poolie: https://rt.admin.canonical.com/Ticket/Display.html?id=41791	03:56
poolie	that's not exactly the same as getting it running though	03:56
poolie	is there a ticket or bug for that?	03:57
poolie	iirc you need them to change some configuration scripts that you don't yourself have access to?	03:57
lifeless	poolie: the lp-serve thing is moving; jam needed to land more code	03:58
poolie	to do what?	03:58
jam	poolie: there is one, but I keep shooting blind as to the rt number	03:59
jam	Let me find the email	04:00
poolie	thanks	04:00
wgrant	Could someone run http://paste.ubuntu.com/530449/ on staging?	04:00
poolie	lifeless, while jam's looking, what do you understand the state of this to be?	04:01
poolie	i'd just like to make the bug accurate and work out where if anywhere it's getting stuck	04:01
lifeless	poolie: its in a back and forth discussion with the losas as they figure all the bits out	04:02
lifeless	poolie: its low priority (relatively that is) so I wouldn't expect it to happen rapidly	04:02
jam	poolie: 42199	04:02
lifeless	poolie: mwhudson was landing the init script for jam, and with that it should be able to be enabled on staging	04:03
lifeless	and then qad	04:03
lifeless	epic fail	04:04
lifeless	3142 OOPS-1776B79 BugTask:+index	04:04
poolie	so from that rt it looks like the next action is still 'get the service running on qastaging'?	04:04
lifeless	=== Top 10 Time Out Counts by Page ID ===	04:04
lifeless	Hard / Soft Page ID	04:04
lifeless	238 / 35 Person:+commentedbugs	04:04
lifeless	150 / 1593 CodeImportSchedulerApplication:CodeImportSchedulerAPI	04:04
lifeless	50 / 188 BugTask:+index	04:04
lifeless	36 / 131 CodehostingApplication:CodehostingAPI	04:04
lifeless	16 / 9 Person:+bugs	04:04
lifeless	14 / 352 Distribution:+bugs	04:04
jam	poolie: right, this whole week there haven't been enough l-osas, and there have been some critical things going on	04:04
lifeless	9 / 70 Archive:EntryResource:getBuildSummariesForSourceIds	04:04
lifeless	9 / 8 Archive:+copy-packages	04:04
lifeless	8 / 396 Distribution:+bugtarget-portlet-bugfilters-stats	04:04
jam	today there was only Ch-ex	04:04
lifeless	7 / 0 BugTask:+addcomment	04:04
lifeless	poolie: yes	04:04
poolie	k, i don't want to preempt the critical things, of course, i just want it to not stay stuck after that	04:05
lifeless	poolie: so in my queue its:	04:05
lifeless	- after RFWTAD stuff - thats important to finish getting single revs deployed and finish eliminating operation risk	04:05
lifeless	- after token librarian - thats old inventory which fixes timeouts for many private attachments (e.g. security builds)	04:06
lifeless	in terms of LOSA time	04:06
poolie	ok	04:07
lifeless	short interrupts to move it along are of course reasonable	04:07
poolie	so it's off john's plate until they get to it?	04:07
lifeless	poolie: John can best answer that	04:07
jam	lifeless, poolie: I'm at least pending them telling me what I need to do next	04:12
jam	the last round I didn't know I needed until they asked for it	04:12
poolie	mm there seem to be a few problems like that	04:12
thumper	RFC: http://people.canonical.com/~tim/recipe-latest-builds.png	04:21
thumper	it is using factory generated fake data, so I have multiple binary builds for the same arch	04:22
thumper	but the basics are there	04:22
thumper	this is up for review now	04:23
thumper	poolie: hi	04:28
thumper	poolie: we have another urgent need for committing to stacked branches	04:28
poolie	hi thumper	04:28
poolie	i think francis mentioned this...	04:29
thumper	poolie: bzr-builder commits to the branch	04:29
poolie	it was for.. right	04:29
poolie	and why does it want a stacked branch not a checkout?	04:29
thumper	poolie: and getting a branch for some big projects was using much more memory than the virtual builders had	04:29
thumper	poolie: because it never pushes	04:29
wgrant	thumper: Not a fan of the triplicated spr name and version, but apart from that it looks great.	04:30
thumper	poolie: apparently an alternative solution is to change the merge code	04:30
thumper	poolie: abentley wrote it all up	04:30
poolie	onto the bug about commit?	04:31
thumper	on the incident report	04:31
thumper	for the buildd failures	04:31
poolie	that was an email or a wiki page?	04:31
thumper	wiki page I believe	04:31
thumper	I could forward you the email if you like	04:32
thumper	aaron wrote solutions up for me	04:32
poolie	i can probably find it	04:32
thumper	rockstar: ping?	04:32
* thumper EODs		04:33
poolie	thumper, is that https://wiki.canonical.com/IncidentReports/2010-10-28-LP-build-manager-not-dispatching ?	04:34
thumper	poolie: ah, I see it isn't all on the incident report	04:35
lifeless	thumper: if its not pushing	04:40
lifeless	thumper: why commit at all?	04:40
lifeless	stub: what do you think of the idea of capturing query params in oops	04:42
lifeless	stub: it seems to me it will help reproducing issues a lot	04:42
stub	lifeless: We will be logging private information, including information lp devs technically shouldn't have access to.	04:43
stub	Some of that already leaks via the URL of course (so LP devs can learn about private teams they shouldn't know about)	04:45
stub	But that hasn't been a problem so far, as private stuff has been company internal rather than private to a subset of the company.	04:45
lifeless	stub: well, in theory :)	04:46
lifeless	stub: so, we also manually create many queries today	04:46
lifeless	so at least - today - we already leak that	04:46
stub	Content of some of the private bugs could be an issue, as that would violate vendorsec	04:50
lifeless	yeah	04:51
lifeless	all disclosure stuff is serious	04:51
lifeless	stub: when would we use content from a private bug in a query ?	04:52
lifeless	stub: INSERT I guess	04:52
lifeless	stub: + 'bugs like this'	04:52
lifeless	stub: uhm, for the INSERT case we could choose not to substitute	04:52
lifeless	s/substitute/include/	04:52
lifeless	stub: we're trying to figure out why https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS12 has multi second queries	04:53
lifeless	stub: doing them by hand with plausible ids is extremely fast - 130ms for the main lookup in the page	04:53
lifeless	stub: could it be something odd like the isolation level (what level does appserver run as), or is it just the specific ids that will be at issue?	04:55
lifeless	stub: ping	05:41
lifeless	hmm, nvm for a sec	05:41
lifeless	wallyworld: qastaging-slave vs main	05:42
lifeless	perhaps	05:42
lifeless	bah	05:42
lifeless	wgrant: ^	05:42
wgrant	lifeless: Could be, I suppose.	05:42
wallyworld	lifeless: ECONTEXT	05:42
lifeless	wallyworld: I was talking to wgrant ; tab fail.	06:04
wallyworld	lifeless: np. i figured that when i saw the rest of the conversation come through :-)	06:04
lifeless	stub: ping	06:14
=== almaisan-away is now known as al-maisan
stub	lifeless: pong	06:14
lifeless	hi	06:14
lifeless	I need your help	06:14
lifeless	we've got a very odd thing happening	06:14
lifeless	have a look at these two oopses	06:15
lifeless	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS12	06:15
lifeless	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS19	06:15
lifeless	this is the +packages page which is a current blocker for deploying	06:15
stub	isolation level doesn't cause slowdowns	06:15
lifeless	this page	06:15
lifeless	https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=204	06:15
lifeless	in 1777QS12 query 34 takes 6.3 seconds	06:16
lifeless	in 19 it takes 202ms	06:16
lifeless	and 39 takes 20 seconds	06:16
lifeless	we've shutdown cronscripts on asuka	06:16
lifeless	so the load should be tolerable (about 2 I believe - spm can confirm ?)	06:17
lifeless	running query 34 by hand, it takes about 200ms consistently, every time	06:18
stub	lifeless: Are the oopses from the first batch? The way we currently do batching means that if you have a large set of results, the later batches will always timeout.	06:18
lifeless	stub: same batch in both oopses	06:18
lifeless	stub: same exact url	06:18
lifeless	ahh	06:22
lifeless	I think I've managed to get a slow query	06:22
lifeless	\o/ finally	06:22
wgrant	!!	06:23
stub	OOPS-1777QS19 q39 is slow and comes with all the parameters (obviously we are not sanitizing the aborted query...)	06:25
lifeless	stub: yeah, its also genuinely slow locally	06:25
lifeless	by which I mean ro user on qastaging	06:25
lifeless	stub: thanks	06:26
stub	And that is slow because it is returning 1.35 million rows	06:28
lifeless	\o/	06:28
lifeless	wgrant: ^	06:29
wgrant	Hmm. Is that the newer version query?	06:29
lifeless	I think I just reused the existing grouped version of it	06:29
lifeless	sounds like it was inefficient already	06:29
lifeless	:)	06:29
lifeless	or buggy	06:30
wgrant	1.35 million rows sounds buggy.	06:30
lifeless	wgrant: this give you what you need to make a test, isolate n fix?	06:30
stub	lifeless: its missing a join condition	06:31
wgrant	lifeless: Maybe.	06:31
wgrant	Hah, so it is.	06:31
stub	lifeless: Its missing a 'AND sourcepackagename.id = sourcepackagerelease.sourcepackagename'	06:31
wgrant	SPN	06:31
wgrant	Yeah.	06:31
lifeless	stub: in the inner or outer?	06:32
stub	The outer	06:32
lifeless	2.7 seconds	06:33
lifeless	tolerable with just one	06:33
stub	So every matched row is being expanded to 38k rows.	06:33
wgrant	Ah.	06:33
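
The effect stub describes, where a table listed in FROM with no join condition multiplies every matched row by the size of that table, is easy to reproduce on a small scale. The sketch below uses `sqlite3` with two drastically simplified tables that borrow the names from the conversation; the schema and row counts are assumptions for illustration, not Launchpad's real schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sourcepackagename (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sourcepackagerelease (
        id INTEGER PRIMARY KEY,
        sourcepackagename INTEGER,
        version TEXT
    );
""")
conn.executemany("INSERT INTO sourcepackagename (id, name) VALUES (?, ?)",
                 [(n, f"pkg{n}") for n in range(1, 51)])
conn.executemany(
    "INSERT INTO sourcepackagerelease (sourcepackagename, version) "
    "VALUES (?, ?)",
    [(1, "1.0"), (1, "1.1"), (2, "2.0")])

# No join condition: every release is paired with every package name.
without_condition = conn.execute(
    "SELECT COUNT(*) FROM sourcepackagerelease, sourcepackagename"
).fetchone()[0]

# With the condition quoted in the log, each release matches one name.
with_condition = conn.execute(
    "SELECT COUNT(*) FROM sourcepackagerelease, sourcepackagename "
    "WHERE sourcepackagename.id = sourcepackagerelease.sourcepackagename"
).fetchone()[0]
```

Here 3 releases against 50 names give 150 rows without the condition and 3 with it; the same multiplication against roughly 38k names is what produces the 1.35 million rows seen in the OOPS.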
wgrant	I think in fact that it shouldn't be joining against SPN at all.	06:33
lifeless	wgrant: still badly needs tuning	06:33
lifeless	oh, I did change that, I removed spn.... but I bet storm is putting it back in.	06:34
lifeless	bastardo.	06:34
lifeless	how do you disable autotables?	06:34
lifeless	jamesh: ^	06:34
wgrant	lifeless: It's still explicitly there.	06:34
wgrant	clauseTables=[	06:34
wgrant	'SourcePackageName', 'SourcePackagePublishingHistory'])	06:34
wgrant	s/Name/Release/, I suspect.	06:34
stub	So we might be able to avoid the subselect using DISTINCT ON	06:34
jamesh	lifeless: what's the context?	06:34
lifeless	jamesh: nvm :)	06:35
lifeless	jamesh: I was thinking storm was seeing a table ref from an inner query and autotables adding it to the outer FROM	06:35
lifeless	jamesh: but I was wrong	06:35
wgrant	lifeless: So, how does it go if you remove the SPN join from the query?	06:36
jamesh	ah.	06:36
lifeless	wgrant: fine	06:36
lifeless	wgrant: what file is that in	06:36
lifeless	wgrant: 2.6 seconds	06:36
wgrant	lifeless: lib/lp/registry/model/distroseries.py	06:37
wgrant	2.6 seconds sounds sort of excessive.	06:37
lifeless	http://pastebin.com/bJ2TxmFc	06:38
stub	Hmm... distinct on makes it worse.	06:41
lifeless	ok	06:43
lifeless	thats up in PQM	06:43
lifeless	immediate fix	06:43
wgrant	And that hopefully makes it non-critical.	06:44
lifeless	yeah	06:49
lifeless	assuming theres nothing hiding behind it	06:49
lifeless	let me get the change cowboyed to see	06:49
wgrant	This explains why even trivial archives were timing out.	06:51
lifeless	wgrant: indeed	07:17
lifeless	ok its landed	07:18
adeuring	good morning	08:58
bigjools	morning	08:59
mrevell	Hey up, by the way	09:10
henninge	lifeless: It looks like your fix for bug 672371 did not help. +packages still times out on qastaging.	09:51
_mup_	Bug #672371: Archive:+packages timeouts <ppa> <qa-needstesting> <regression> <timeout> <Soyuz:Fix Committed by jelmer> <https://launchpad.net/bugs/672371>	09:51
henninge	What's next? Revert r11888?	09:52
henninge	jml: Hi! Any chance you could QA bug 673015?	09:55
_mup_	Bug #673015: Code of Conduct requirement for PPA upload rights is unnecessary <ppa> <qa-needstesting> <Soyuz:Fix Committed by jml> <https://launchpad.net/bugs/673015>	09:55
henninge	allenap: Hi! Any luck figuring out bug 667340?	09:56
_mup_	Bug #667340: Trac status of "Verified" confuses bug watcher <qa-bad> <trac-support> <trivial> <Launchpad Bugs:Fix Committed by allenap> <https://launchpad.net/bugs/667340>	09:56
henninge	stub: Can you please QA bug 673874 before starting on your weekend?	09:59
_mup_	Bug #673874: Improve bug comment caching <qa-needstesting> <Launchpad Bugs:Fix Committed by stub> <https://launchpad.net/bugs/673874>	09:59
allenap	henninge: No, not yet. It hasn't caused any regressions, so it's actually safe to go.	09:59
allenap	henninge: I'll mark it as qa-ok but continue to investigate.	09:59
henninge	allenap: thanks a lot!	09:59
henninge	gmb: can you please QA bug 672507 ?	10:04
_mup_	Bug #672507: Add bug_notification_level to the structural +subscribe view <qa-needstesting> <story-better-bug-notification> <Launchpad Bugs:Fix Committed by gmb> <https://launchpad.net/bugs/672507>	10:04
gmb	henninge: Sure.	10:04
gmb	henninge: Done	10:06
henninge	gmb: thanks a lot!	10:06
bigjools	henninge: jml needs my help to QA that	10:10
henninge	bigjools: thanks for offering it ;)	10:10
lifeless	henninge: see the comment in the bug	10:11
bigjools	he can't QA without it since it needs dogfood :)	10:11
lifeless	henninge: we're waiting on https://lpbuildbot.canonical.com/waterfall	10:11
henninge	lifeless: ah yes, thank you.	10:14
jml	hello.	10:19
jml	yes QA, I know I know	10:19
jml	bigjools: where do I need to point .dput.cf at?	10:26
bigjools	jml: http://pastebin.ubuntu.com/530615/	10:26
* bigjools processes your upload		10:28
bigjools	jml: rejected	10:29
jml	bigjools: why so?	10:30
bigjools	jml: can I help you make a dummy package that I know works	10:30
bigjools	"Unable to find python-testtools_0.9.6.orig.tar.gz"	10:30
bigjools	and it was a mixed upload it seems	10:30
jml	meaning?	10:30
bigjools	binaries and source	10:30
bigjools	jml: I normally use the "hello" package	10:31
bigjools	apt-get source hello	10:31
bigjools	cd hello-2.5	10:32
bigjools	dch -i	10:32
bigjools	<add a revision>	10:32
jml	yeah, that's what I did with testtools	10:32
jml	(so far so good)	10:32
* bigjools sighs at stuck keys		10:33
jml	heh	10:33
bigjools	ok, then you need to "debuild -S"	10:33
jml	ahh	10:34
jml	it's the -S that I didn't do	10:34
jml	uploaded	10:34
bigjools	accepting it this time	10:35
jml	yay	10:35
bigjools	you cleared the CoC from ~jml?	10:36
jml	bigjools: I did, but I'd like to double check with getUtility(IPersonSet).getByName('jml').is_ubuntu_coc_signer	10:36
* bigjools checks		10:36
bigjools	False	10:37
bigjools	qa-ok!	10:37
jml	sweet.	10:37
jml	bigjools: thanks!	10:37
bigjools	my pleasure	10:37
* bigjools goes to celebrate with caffeine		10:38
jml	henninge: what's the word on the crazy non-vc managed file that refers to class paths?	10:41
henninge	jml: It cannot be updated outside of a roll-out - at least not without Tom around ...	10:43
henninge	jml: So I am preparing a branch that adds the required import to c.l.i again with an XXX to remove it again after the roll-out.	10:44
jml	henninge: that seems unsatisfactory	10:44
henninge	and a special roll-out requirement to update that file	10:44
jml	henninge: can't we just add the requirement and leave c.l.i as-is?	10:45
henninge	jml: only if we go without a further deployment today	10:45
jml	henninge: so it needs a rollout-with-downtime?	10:45
henninge	so spm says, yes.	10:45
jml	henninge: did he say what it's needed for?	10:46
henninge	jml: hang on, I'll forward the mail	10:46
jml	henninge: thanks :)	10:46
jml	henninge: ok. I find this whole thing colossally annoying, but it looks like you guys are making the best of a bad situation.	10:55
henninge	jml: we are trying hard ... ;) thanks	10:55
henninge	and yes, it is annoying	10:55
=== al-maisan is now known as almaisan-away
lifeless	henninge: it passed buildbot	10:59
lifeless	henninge: when 914 hits qastaging	10:59
lifeless	then	10:59
lifeless	https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=204	10:59
lifeless	should start working	10:59
lifeless	that should be anytime now	11:00
henninge	lifeless: thanks! But it will be another 4 hours or so ...	11:00
lifeless	henninge: why?	11:01
henninge	https://lpbuildbot.canonical.com/builders/lucid_lp/builds/355	11:01
henninge	It just entered buildbot not passed it yet	11:01
lifeless	oh crumbs 913 I see	11:01
lifeless	ah well	11:01
lifeless	gl	11:01
lifeless	!	11:01
lifeless	and gnight all	11:01
henninge	lifeless: good night and thanks again.	11:02
=== matsubara_ is now known as matsubara
bigjools	hello - I am having trouble getting the webservice to work on dogfood. When I try and log in, there's a rejection because it can't traverse to '1.0'. HALP?	11:23
jelmer	bigjools: Have you tried using 'devel' rather than 1.0 ?	11:25
bigjools	yes, that's what I am using - which makes the error more odderer	11:25
bigjools	launchpad = Launchpad.login_with('testing', 'https://api.dogfood.launchpad.net/devel/')	11:25
jelmer	Shouldn't there be another /api/ in there?	11:25
bigjools	yes	11:26
bigjools	still fails!	11:26
jelmer	then I'm out of ideas :-)	11:26
bigjools	hmm using https://api.dogfood.launchpad.net/api worked	11:27
bigjools	ah you need to write version='devel' in the login_with params	11:32
bigjools	jml: got a sec?	11:59
jml	bigjools: sure	11:59
bigjools	jml: I'm probably doing something very very stupid but I have code blindness. See http://pastebin.ubuntu.com/530655/	12:00
bigjools	there's a code snippet and a pdb session	12:00
bigjools	the inner function callback can't see all of the outer method's variables....	12:00
LPCIBot	Project devel build (220): FAILURE in 2 hr 4 min: https://hudson.wedontsleep.org/job/devel/220/	12:01
LPCIBot	* Launchpad Patch Queue Manager: [r=lifeless][ui=none][no-qa] Remove StartsWith matcher from	12:01
LPCIBot	lp.testing.matchers in favour of one from testtools & fix some	12:01
LPCIBot	assertions that always passed.	12:01
LPCIBot	* Launchpad Patch Queue Manager: [r=lifeless][ui=none][no-qa] Really drop Sourcepackagename from getNewerSourceReleases - fixing massive timeouts on +packages.	12:01
jml	./me looks	12:02
deryck	Morning, all.	12:02
bigjools	morning deryck	12:02
jml	bigjools: you are masking them in scope, I think.	12:03
jml	bigjools: let me knock up a simpler example...	12:03
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.11 | PQM is open | firefighting: Lots of timeouts on qastaging!! | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
jml	bigjools: http://paste.ubuntu.com/530657/	12:04
henninge	OK, qastaging is timing out left and right ... :(	12:04
henninge	Ubuntu pages seem to work fine but any project page times out.	12:05
jml	bigjools: in "if file_sha1 == 'buildlog':", you are overriding out_file, out_file_name and out_file_fd	12:05
jml	bigjools: probably the thing to do is pass them in.	12:05
jml	e.g.	12:05
bigjools	jml: it's not got that far yet	12:05
jml	d.addCallback(got_file, out_file_name, out_file)	12:05
jml	bigjools: it doesn't matter.	12:05
bigjools	ok	12:05
jml	bigjools: run the python I pasted	12:05
bigjools	that's special	12:06
jml	bigjools: simply having an assignment in the scope masks the outer scope, whether or not the assignment has been evaluated.	12:06
jml	bigjools: I'm not sure it would be sensible to do anything else.	12:07
bigjools	jml: ok thanks , I'll pass 'em in	12:07
jml	bigjools: np.	12:07
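
The scoping rule jml describes can be reproduced in a few lines. This is a minimal sketch, not the code from the pastebin: the names `broken`, `fixed` and `got_file` and the file names are hypothetical stand-ins. It shows that an assignment anywhere in the inner function makes the name local for the whole body, and that passing the value in as an argument (as in `d.addCallback(got_file, out_file_name, out_file)`) avoids the problem.

```python
def broken():
    out_file = "outer.log"

    def got_file(file_sha1):
        # The assignment further down makes out_file local to got_file for
        # the whole body, so this read fails even though the assignment
        # itself is never executed.
        name = out_file
        if file_sha1 == "buildlog":
            out_file = "buildlog.log"
        return name

    return got_file("anything")


def fixed():
    out_file = "outer.log"

    def got_file(file_sha1, out_file):
        # out_file is now a parameter, so rebinding it is harmless.
        if file_sha1 == "buildlog":
            out_file = "buildlog.log"
        return out_file

    return got_file("anything", out_file)


try:
    broken()
    broken_result = "no error"
except UnboundLocalError as exc:
    broken_result = type(exc).__name__

fixed_result = fixed()
```
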
henninge	Argh!	12:14
henninge	I think I never realized how widespread the problems are that r11888 caused.	12:14
henninge	Maybe it's just that.	12:15
wgrant	henninge: It should be limited to pages on IArchive.	12:17
jml	henninge: can you please subscribe me to whatever bug you file for the XXX in c/l/interfaces/__init__?	12:17
henninge	jml: oh bug, right ... ;)	12:18
wgrant	henninge: Anything outside Archive:+(index|packages|copy-packages|delete-packages) is probably not 11888.	12:18
henninge	wgrant: thanks	12:18
henninge	although I wish it was ... (because there is a fix coming)	12:19
jml	bigjools: should I put that API gotcha on a wiki page somewhere?	12:26
bigjools	jml: not yet - I can't get it working still	12:26
jml	bigjools: ok.	12:26
bigjools	jml: there's an error from wadllib about "Can't look up definition in another url"	12:26
=== mrevell is now known as mrevell-lunch
jml	I've not seen that one before	12:27
bigjools	and I suspect I need leonardr	12:27
bigjools	yeah, it's doing something weird so that the /api is stripped somewhere	12:27
wgrant	The URL shouldn't have /api in it.	12:27
bigjools	but later depends on it being there	12:27
wgrant	/api is used to traverse from the webapp to the API -- you don't use it on api.launchpad.net.	12:28
bigjools	.........	12:28
bigjools	and so it works	12:28
bigjools	thanks wgrant	12:28
wgrant	Heh.	12:28
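
The gotcha resolved above can be sketched without touching the network. This assumes, based only on the behaviour reported in this log, that the client appends the API version to the service root, so the root for an `api.*` host carries neither `/api` nor a version. The helper `versioned_root` is hypothetical and exists only for illustration; it is not part of launchpadlib.

```python
def versioned_root(service_root, version):
    """Join a service root and an API version into one base URL.

    Hypothetical helper: it mirrors the behaviour described in the log,
    not launchpadlib's actual internals.
    """
    return service_root.rstrip("/") + "/" + version + "/"


# On api.* hosts the root has no /api segment and no version in it.
DOGFOOD_API = "https://api.dogfood.launchpad.net/"
base = versioned_root(DOGFOOD_API, "devel")

# With launchpadlib installed, the call described in the log would be
# roughly (not executed here):
#   Launchpad.login_with('testing', DOGFOOD_API, version='devel')
```
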
henninge	jml: bug 674476 I failed to mention it in the XXX, though... :/	12:30
_mup_	Bug #674476: Files outside the LP tree reference LP code <Launchpad Foundations:New> <https://launchpad.net/bugs/674476>	12:30
jml	henninge: thanks. that's ok.	12:30
henninge	and you are subscribed	12:31
henninge	jml: and thanks for reminding me about the bug	12:32
jml	henninge: np.	12:33
=== didrocks1 is now known as didrocks
mrevell-lunch	#	12:57
=== mrevell-lunch is now known as mrevell
=== almaisan-away is now known as al-maisan
=== beuno_ is now known as beuno
jml	Started in 15 minutes 27 seconds!	13:14
bigjools	jml: ha - remember how we added Deferred to lp_sitecustomise.py?	13:24
jml	yeah?	13:25
bigjools	jml: looks like I need DeferredList too :)	13:25
jml	bigjools: I thought DeferredList subclassed Deferred	13:25
bigjools	ForbiddenAttribute: ('addCallback', <DeferredList ....	13:25
jml	meh	13:26
bigjools	I guess zope doesn't care so much about that	13:26
jml	why might bugtask.date_closed be none, even though its status is one of Fix Released, Wontfix or Inprogress?	13:39
shadeslayer	jam: was my qtwebkit build fix0red?	14:03
=== matsubara is now known as matsubara-lunch
=== Ursinha is now known as Ursinha-lunch
gary_poster	henninge: where are we with the qastaging slowdown? I see that only qastaging is affected; staging and production are fine. The timeout exception I see is within database code, but since that's where we check for timeouts, that's not necessarily indicative.	14:24
gary_poster	Has anyone looked at qastaging logs? Has anyone looked at performance graphs? Has anyone tried to correlate performance graphs with revisions deployed on qastaging?	14:24
gary_poster	and, are we coordinating here or on -ops?	14:25
gary_poster	Hm, no tuolomne graphs of qastaging AFAICT :-/	14:26
gary_poster	Maybe I need to know the machine name(s)	14:27
gary_poster	qastaging is the same machines as staging, and staging is not timing out (as badly?) so machine load doesn't seem likely...	14:34
gary_poster	trying logs	14:34
henninge	gary_poster: staging is a lot of revisions behind qastaging atm	14:36
gary_poster	henninge: I figured it was something like that, yeah	14:36
gary_poster	so what has been done?	14:36
gary_poster	I saw stub's reply, but that didn't tell us much	14:37
gary_poster	I was about to grope around in logs	14:37
henninge	gary_poster: logs is good	14:38
henninge	I was hoping that the authors of the revision could check if any of their code could be causing this.	14:39
gary_poster	henninge: you identified the revision?	14:40
henninge	no, it's just any of the later ones.	14:41
gary_poster	ah :-)	14:41
henninge	but I could narrow down the range because it only started today.	14:41
gary_poster	henninge: you up for that while I do log groping? I can share log groping fun here. So far the only thing that does not look like chatter in the qastaging librarian log is "Exception KeyError: ((<class 'canonical.launchpad.database.librarian.LibraryFileAlias'>, (1890638,)),) in <function remove at 0x8f3ced8> ignored"	14:44
gary_poster	seems to be mostly happy though	14:44
henninge	gary_poster: I am looking at the revs atm, yes.	14:45
gary_poster	cool thanks	14:46
gary_poster	There are boatloads of "No handlers could be found for logger "librarian"" things in logs, which do make me a bit nervous	14:51
henninge	gary_poster: I noticed earlier that getting images from the librarian had a long delay.	14:52
jml	mrevell: flacoste: http://paste.ubuntu.com/530734/	14:53
gary_poster	henninge: yeah. Maybe. It doesn't smell like the cause to me. This is interesting though: qastaging app log is swamped with these: http://pastebin.ubuntu.com/530735/	14:53
henninge	gary_poster: What is a "DoomedTransaction"?	14:54
gary_poster	a transaction that must not be restarted	14:54
gary_poster	henninge: may be an unrelated problem. This started after the restart 2010-10-21T15:22:39, so it's been happening a looong time	14:56
henninge	ah, ok	14:57
abentley	jkakar: around? ResultSet.set is generating bad SQL.	15:07
=== Ursinha-lunch is now known as Ursinha
abentley	jkakar: http://pastebin.ubuntu.com/530742/	15:13
=== matsubara-lunch is now known as matsubara
jkakar	Can you paste the code that generates this, please?	15:17
jkakar	abentley: ^^ Also, am on a call, will be laggy.	15:18
abentley	jkakar: http://pastebin.ubuntu.com/530747/	15:22
jkakar	abentley: What's the __storm_table__ for SourcePackageRecipeBuild.	15:23
abentley	jkakar: __storm_table__ = 'SourcePackageRecipeBuild'	15:24
jkakar	abentley: Is that right?	15:25
abentley	jkakar: Yes.	15:25
jkakar	abentley: Can you do a JOIN in an UPDATE statement? It looks like you're building a bad query, not that Storm is generating a bad one.	15:25
abentley	jkakar: I'm not an expert on SQL syntax. It's possible that I'm asking Storm to do the impossible, but if I am, I expect Storm to tell me.	15:27
jkakar	abentley: Storm won't tell you if you're trying to do the impossible.	15:29
jkakar	abentley: It's reasonable to expect it, but Storm is just a "thin" cough layer with an expression compiler that generates SQL exactly as you specify	15:29
jkakar	abentley: The database is telling you that you're trying to do the impossible. Which, given that different backends have different definitions of "impossible", is probably right anyway.	15:30
abentley	jkakar: This doesn't seem like it should be impossible. Can't one do a subselect or something?	15:30
jkakar	abentley: Sure. The best thing to do is first, figure out what query you want to run. The second step is to figure out how to make Storm generate it.	15:31
jkakar	abentley: If you can write out the query you want I can help you figure out the second part.	15:31
abentley	jkakar: If that is actually how it is done, why not write the query directly?	15:32
jkakar	abentley: A few reasons:	15:32
jkakar	- Storm will expand a class name into a series of column names in a query, such as in a SELECT.	15:32
jkakar	- When you use Storm you get a result set that gives you powerful capabilities, like union, max, count, etc.	15:33
jkakar	- When your class changes, because you added or removed a column, you don't have to change your queries unless they involve one of the modified attributes.	15:33
jkakar	- Most of the time you already know what query you want, so it isn't hard to get from what you want to a store.find() call with Storm expressions.	15:34
jkakar	abentley: This sounds like a case where the problem is not knowing what query you want to run. With Storm you're always expected to know what query you want to run.	15:34
abentley	jkakar: What makes you say that?	15:34
jkakar	abentley: It was designed explicitly not to hide SQL from you, but in fact, to make it possible to generate the exact query you want.	15:34
jkakar	abentley: Because the query you're generating doesn't work (according to the database)?	15:34
jkakar	abentley: Sorry, that probably sounded offensive, but I mean no offense.	15:35
abentley	jkakar: I didn't set out to generate a query. I set out to use an existing function that returns a collection that provides functionality that should do what I want.	15:35
jkakar	abentley: Okay.	15:36
abentley	The query I actually want is, "Find all SourcePackageRecipeBuilds where recipe = X and set recipe to NULL", which I can work out in SQL if you like.	15:37
jkakar	abentley: That's the next step, yes, working it out in SQL so you know what you need Storm to generate for you.	15:37
abentley	jkakar: UPDATE SourcePackageRecipeBuild SET recipe = NULL WHERE recipe = 5	15:40
abentley	jkakar: 5 actually being a variable.	15:40
jml	css question	15:41
jml	if I want to have a heading that has an image followed by some text aligned to the middle of that img, how do I do that?	15:41
jkakar	abentley: store.find(SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == value).set(recipe=None)	15:41
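
The single-table UPDATE that abentley wrote out can be tried in isolation. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL, with a two-column table that is an assumption and not the real Launchpad schema. The recipe id is bound as a parameter rather than written into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; the real SourcePackageRecipeBuild has more columns.
conn.execute(
    "CREATE TABLE SourcePackageRecipeBuild "
    "(id INTEGER PRIMARY KEY, recipe INTEGER)"
)
conn.executemany(
    "INSERT INTO SourcePackageRecipeBuild (recipe) VALUES (?)",
    [(5,), (5,), (7,)],
)

recipe_id = 5  # "5 actually being a variable"
cursor = conn.execute(
    "UPDATE SourcePackageRecipeBuild SET recipe = NULL WHERE recipe = ?",
    (recipe_id,),
)
updated = cursor.rowcount
remaining = conn.execute(
    "SELECT recipe FROM SourcePackageRecipeBuild ORDER BY id"
).fetchall()
```
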
abentley	jkakar: I don't want to have two definitions of how you get the builds associated with a recipe, so how do I update getBuilds to return a ResultSet that works?	15:43
jkakar	abentley: Let me read some PostgreSQL documentation for a sec...	15:44
jkakar	abentley: Hmm, it looks like you could include multiple tables in an UPDATE... at least on PostgreSQL.	15:47
jkakar	abentley: I'm not sure exactly what you need... but I think you would benefit by writing a specialized query for the case you have.	15:52
jkakar	abentley: For two reasons, (1) it's a simpler query than the one from getBuilds and will probably run faster and (2) you'll run one less query than you do now (by specifying pending=True and pending=False).	15:53
bigjools	henninge: how big do the tarballs get that are produced by the TTBJs on the builders?	15:53
henninge	good question	15:54
henninge	bigjools: well, it's all text files so they should compress nicely.	15:54
abentley	jkakar: I disagree that it's a benefit. I'd rather have clearer code than simpler queries, and I think two queries is acceptable, and if I cared, I could update getBuilds so that I could get all builds at once.	15:54
bigjools	henninge: it's just that the code that jtv wrote reads them into memory ...	15:55
bigjools	it obviously works but I'd rather not have a time bomb	15:55
henninge	bigjools: They should not become very big, most projects don't have many templates	15:55
henninge	and if they have many, they are each small	15:56
jkakar	abentley: Okay. Updating getBuilds to optionally include the pending clauses then would do what you want... ie: use it in a way that doesn't include the pending clauses.	15:56
bigjools	henninge: typically what sort of size?	15:56
henninge	I'd have to research that. danilos, do you have a figure off the top of your head?	15:56
bigjools	henninge: the change I am making will mean we could potentially be reading as many of these as there are builders	15:56
bigjools	in parallel	15:56
danilos	henninge, 17	15:57
henninge	thanks danilos	15:57
bigjools	danilos is ever helpful :)	15:57
danilos	henninge, uhm, let me read the backscroll then	15:57
danilos	bigjools, if the tarball only includes translations, they should be small (never more than say 50M for the biggest case, but probably around 1M for most)	15:58
abentley	jkakar: So it would only support set if pending was not supplied?	15:58
jkakar	abentley: Yep.	15:58
abentley	jkakar: gross.	15:58
jkakar	abentley: Unless we change the way UPDATE statements are generated.	15:58
jkakar	abentley: So you were probably right in the beginning, there probably is a bug in Storm.	15:59
bigjools	danilos, henninge: aieeeee, I just looked at addOrUpdateEntriesFromTarball	15:59
bigjools	tarball_io = StringIO(content)	15:59
bigjools	if I have the file on disk is there a different method that will work?	16:00
danilos	bigjools, well, by that time, they are already in memory :) where is "content" initialized?	16:00
henninge	bigjools: actually, that's my code ;)	16:00
bigjools	danilos: either in the upload processor or from the builder	16:00
danilos	bigjools, we can as easily parse the file directly on-disk using the tarfile module, if I am not mistaken	16:01
bigjools	sorry but arbitrarily sized files going into stringio scares me	16:01
bigjools	I'm going to file a bug about this, it'll need fixes in a few places	16:01
danilos	bigjools, uhm, what I am trying to say is that StringIO is a shallow wrapper, entire file is already in the memory	16:01
bigjools	danilos: yes, it should not be :)	16:01
henninge	I get it	16:02
henninge	just some figures	16:02
danilos	bigjools, agreed, perhaps we need to save it to a tmp file before we process it	16:02
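
danilos's suggestion of parsing the tarball directly on disk can be sketched with the standard `tarfile` module. This is an illustration under assumptions, not the Launchpad code: the function name `iter_member_sizes`, the 64 KiB chunk size and the demo file names are all hypothetical. The point is that members are read in bounded chunks, so the whole archive never has to sit in memory the way it does with `StringIO(content)`.

```python
import io
import os
import tarfile
import tempfile


def iter_member_sizes(tar_path, chunk_size=64 * 1024):
    """Yield (name, size) for each regular file, reading in small chunks."""
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fileobj = tar.extractfile(member)
            size = 0
            while True:
                chunk = fileobj.read(chunk_size)
                if not chunk:
                    break
                # A real consumer would parse the chunk here.
                size += len(chunk)
            yield member.name, size


# Build a small demo tarball on disk (hypothetical template names).
tmp_dir = tempfile.mkdtemp()
tar_path = os.path.join(tmp_dir, "templates.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    for name, payload in [("po/a.pot", b"hello"), ("po/b.pot", b"x" * 1000)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

sizes = dict(iter_member_sizes(tar_path))
```
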
henninge	all of gimps templates are 736k	16:02
henninge	all of gtk+ templates (2) are 264k	16:02
bigjools	danilos: well I can make a tmp file available in the buildd-manager and the upload processor before it calls that method	16:02
bigjools	it currently has to read the file into memory before passing it	16:03
bigjools	if the template generation goes a bit wonky then it can easily take out the buildd-manager	16:03
bigjools	which Is Bad (TM)	16:03
danilos	bigjools, then it'd be a very simple fix on "our side"	16:04
bigjools	excellent, I'll file the bug and put some pointers to soyuz/buildmaster code in it	16:04
bigjools	cheers	16:04
danilos	bigjools, don't do it before you make the tmp file available :P	16:04
danilos	bigjools, also, note that we are using the same thing for actual Ubuntu package builds, so we'd want to fix that as well	16:05
bigjools	danilos: yes, that's what I was referring to above about the upload processor	16:05
danilos	bigjools, ok consigliere ;)	16:06
bigjools	heh	16:06
bigjools	was about to make a joke about an offer you can't refuse	16:06
danilos	heh	16:06
abentley	jkakar: Here's a version that seems to work: http://pastebin.ubuntu.com/530766/	16:07
jkakar	abentley: Yeah, not surprisingly. I wonder how that query performs compared to the other one, though?	16:07
abentley	jkakar: For the cases where both work, I bet they both perform the same. That's got to be trivial to optimize.	16:08
jkakar	abentley: Probably, yes. Though, in practice, I've occasionally seen dramatically different performance when a query uses a subselect vs. when it doesn't.	16:10
jkakar	It's hard to understand when that will be the case or why, though.	16:10
henninge	gary_poster: staging is timing out, too, now. It has been updated from 9955 to 9965	16:16
gary_poster	henninge: well, that seems to point a pretty strong finger at code then, which simplifies things in some ways. how is the revert going on qastaging?	16:17
henninge	gary_poster: it's taking it's time	16:17
gary_poster	:-)	16:17
henninge	its	16:17
gary_poster	ok	16:17
* gary_poster carefully replaces the second "it's" but leaves the first intact ;-)		16:18
henninge	thanks for being careful ;)	16:19
abentley	jkakar: filed as https://bugs.edge.launchpad.net/storm/+bug/674582	16:21
_mup_	Bug #674582: Storm may generate SQL errors on ResultSets.set for otherwise-working ResultSets. <Storm:New> <https://launchpad.net/bugs/674582>	16:21
jkakar	abentley: Thanks!	16:22
=== al-maisan is now known as almaisan-away
sinzui	henninge, gary_poster: I agree that staging is now as useless as qastaging, but I do not see what has changed to make SPR/SPPH queries slower.	16:27
gary_poster	sinzui, I am no longer actively investigating, because henninge's summary that revisions 11888 -> 11899 -> 11914 are a likely cause sounded like a good hypothesis. We are waiting to see if reverting these clears up qastaging.	16:29
sinzui	11914 is not on staging, so I discount that	16:33
gary_poster	yes, but it's part of the logical set	16:33
LPCIBot	Yippie, build fixed!	16:44
LPCIBot	Project devel build (221): FIXED in 4 hr 3 min: https://hudson.wedontsleep.org/job/devel/221/	16:44
sinzui	gary_poster, henninge, I do not understand the "set" point. I do not see that revision on staging 11914. I suspect that 11914 fixes the issue. I think the origin of the issue is 11899	16:53
henninge	sinzui: 11914 does not fix it, it had already been on qastaging and did not help	16:54
henninge	sinzui: we are currently reverting 914 and 899	16:55
henninge	gary_poster, sinzui: do you know if the revision display at the bottom of the LP page is dynamic or static?	16:59
henninge	i.e. Does it need a "make build" to be updated or is the information straight from the branch?	16:59
jml	people.c.c has an old launchpadlib :(	17:00
sinzui	henninge, it requires make build	17:03
henninge	so Chex just told me	17:03
bigjools	jml: I added a test to directly use downloadPage against a real slave in a test and it gets a "405 Method not allowed". Do you know if Twisted has the equivalent of urllib2.debug = True ?	17:06
sinzui	EdwinGrubbs, I am looking at distroseries.getCurrentSourceReleases() I think the subquery for max(spph.id) is doing a full table scan of SPNs because there is no constraint to return only the SPNs passed to the method	17:07
jml	bigjools: I don't know what urllib2.debug=True is.	17:08
jml	bigjools: and I don't know of any debugging foo off the top of my head	17:08
bigjools	jml: it dumps the http comms to stdout - I'm trying to work out what methods it's using that's not allowed	17:08
sinzui	EdwinGrubbs, I suspect that moving 'SourcePackageRelease.sourcepackagename IN %s" into the subquery will make the query faster	17:08
EdwinGrubbs	sinzui: that shouldn't be necessary since it looks like "spr.sourcepackagename = SourcePackageRelease.sourcepackagename" makes it search for all the spph/spr records for a single sourcepackagename.	17:11
jml	bigjools: nothing obviously like that in t.web	17:12
bigjools	jml: yeah, I looked too	17:12
jml	bigjools: wireshark maybe?	17:12
bigjools	tcpdump ... :)	17:12
jml	bigjools: ooh, did you know about from launchpadlib.uris import DOGFOOD_SERVICE_ROOT?	17:17
bigjools	yes	17:17
bigjools	I think I put it there and shamefully forgot	17:17
sinzui	EdwinGrubbs, that assumes that the query planner built that set first	17:17
EdwinGrubbs	sinzui: I've never seen an instance where the query planner thought that it would be faster to run a correlated subquery first and then limit the results of the outer query.	17:20
jml	why is it that bazaar.launchpad.net is so hard for dns servers to resolve?	17:20
EdwinGrubbs	s/limit/filter/	17:20
sinzui	EdwinGrubbs, since we are looking at a PG 8.4 change + the removal of the SPN table from the query. I think I should get sometimes based on where that constraint is placed	17:22
jml	mrevell: still around	17:23
jml	?	17:23
mrevell	Hi jml, sure am	17:24
jml	mrevell: I don't know where best to put this link on the beautifully presented https://dev.launchpad.net/BugJam – http://mumak.net/lp-bugjam-2010/	17:24
jml	mrevell: it's a count of the number of bugs fixed during the bug jam so far	17:24
EdwinGrubbs	sinzui: how many sourcepackagenames are passed in as an argument to getCurrentSourceReleases()	17:24
mrevell	jml, I love it :)	17:24
mrevell	jml, I'll put a link under "Tracking progress"	17:25
jml	mrevell: thanks.	17:25
sinzui	EdwinGrubbs, 1, but we get 38536 where we would expect 1 from natty, maybe 3 for maverick	17:26
sinzui	Edwin 1, my move of the constraint does not fix the issue, 2 I feel pretty good that getting what looks like getting a match for every SPN in natty implies an open join	17:29
EdwinGrubbs	sinzui: can I see the query plan?	17:30
sinzui	I will get it for you	17:31
bigjools	jml: well, that flushed out a nice bug in the tests we wrote a few weeks ago :)	17:33
jml	bigjools: which was?	17:34
bigjools	jml: it was constructing a url of the form /rpc/rpc	17:34
jml	bigjools: heh	17:34
jkakar	jml, abentley: Woah: http://paste.ubuntu.com/530794/	17:34
jml	jkakar: yeah, it's filed as a critical bug.	17:35
jml	jkakar: https://bugs.launchpad.net/launchpad-code/+bug/674305	17:36
_mup_	Bug #674305: bzr push occasionally reports AssertionError on terminal <codehosting-ssh> <xmlrpc> <Launchpad Bazaar Integration:Triaged> <https://launchpad.net/bugs/674305>	17:36
sinzui	EdwinGrubbs, this is the plan to get the current release of bzr, 1 SPN provided and only 1 expected: http://pastebin.ubuntu.com/530797/	17:37
jkakar	jml: Cool.	17:39
jkakar	jml: Dunno if it helps debugging, but this was with a bound branch, it wasn't a push (explicitly).	17:40
jml	jkakar: I'm not at all involved in fixing it	17:40
jml	<- part of the problem	17:40
jkakar	Heh	17:40
sinzui	EdwinGrubbs, sourcepackagename is still listed in the FROM. It was removed several revisions ago	17:40
sinzui	removing it from the query fixes everything	17:41
* sinzui looks at code again		17:41
EdwinGrubbs	sinzui: yeah, I was wondering where that table came from.	17:41
EdwinGrubbs	sinzui: so, is the code not broken? Was it just an old oops?	17:42
sinzui	EdwinGrubbs, It was removed a few days ago, lifeless removed it from clauseTables in r11914, but I suspect something else is putting the table in the from clause	17:43
sinzui	Edwin to be clear, the SPN joins were removed a few days ago, Lifeless then landed another branch to remove it from clauseTables. But this oops shows that the SPN table is still in the from clause	17:44
sinzui	EdwinGrubbs, ^	17:46
sinzui	EdwinGrubbs, sorry. I am looking at too many oopses. That oops was for an older revision	17:49
* sinzui tries query from r11915		17:50
sinzui	EdwinGrubbs, This is the correct plan for qastaging: http://pastebin.ubuntu.com/530801/	17:51
mrevell	sinzui, Thanks for your post wrt strategies for the bug jam.	17:58
sinzui	mrevell, you're welcome	17:59
mrevell	Have a wonderful weekend people. See you Monday.	17:59
EdwinGrubbs	sinzui: ok, the problem is that there are 1138 spr records for a single sourcepackagename.	18:01
sinzui	edwin I agree. I am looking for a constraint or a revised subquery that removes the loop of 1138	18:03
lifeless	moin	18:04
lifeless	sinzui: rev 11914	18:05
lifeless	EdwinGrubbs: ^	18:05
sinzui	lifeless: yes, but 11915 still times out	18:06
sinzui	lifeless we want to reduce the loop of SPRs in the query	18:06
bigjools	good bye, have a nice weekend	18:07
EdwinGrubbs	sinzui: since, there is only one valid spph record for all the spr records, you would get good performance by just eliminating the subquery and moving the conditions into the outer query. You will just have to eliminate the duplicates. DISTINCT won't let you choose the spph record with the max id, so you would have to do that in python, if it is important to get that spph record and not a random one.	18:07
* sinzui nods		18:08
lifeless	sinzui: works for me	18:08
lifeless	https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=204	18:09
lifeless	At least 782 queries/external actions issued in 17.77 seconds	18:09
lifeless	little slower than ideal	18:09
lifeless	trying again to remove cold cache effects	18:09
* sinzui just went from 9695.618 to 33.446 ms using a subquery table of just current ids		18:09
lifeless	At least 782 queries/external actions issued in 12.63 seconds	18:10
lifeless	sinzui: ^ https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=204	18:10
lifeless	sinzui: I welcome further improvements here	18:10
lifeless	EdwinGrubbs: bringing too much back and filtering in python will almost always be slower	18:10
sinzui	lifeless yes, we want to see a source package page load a single spr.	18:10
lifeless	storm is (relatively) slow at deserialisation, due to the cache coherency logic	18:11
sinzui	https://qastaging.launchpad.net/ubuntu/natty/+source/bzr	18:11
lifeless	At least 49 queries/external actions issued in 1.91 seconds	18:11
lifeless	view-source:https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=1	18:11
lifeless	interestingly that page is not flat yet, the binaries must be the cause because there is a test that its flat with sources...and the binary test seemed surprisingly low to me	18:12
EdwinGrubbs	lifeless: it won't bring too much back since there is only one spph record for 1300 spr records that meets the condition. So, the filtering in python might only have to deal with eliminating a handful of records.	18:13
sinzui	EdwinGrubbs, I essentially did the reverse, of your suggestion. I converted the subquery to get the max id to be a table of only viable candidates: http://pastebin.ubuntu.com/530811/	18:13
* sinzui now tries to do it the EdwinGrubbs approved way		18:14
lifeless	ah right, deep history leading to a slow query	18:15
lifeless	EdwinGrubbs: when we query for 200 rows	18:15
lifeless	EdwinGrubbs: what would happen then	18:15
lifeless	EdwinGrubbs: e.g. for http://pastebin.com/7jC2vD7G	18:16
EdwinGrubbs	lifeless: yes, I would like to do it in the database, but I don't know if getting just the max(spph.id) for each sourcepackagename is important or not. To do that in the database would require using a temp table in order to get rid of the subquery.	18:20
sinzui	EdwinGrubbs, lifeless. I think this is the solution we want to achieve in the code http://pastebin.ubuntu.com/530817/	18:20
EdwinGrubbs	sinzui: that only works for a single sourcepackagename.	18:21
EdwinGrubbs	sinzui: oh wait	18:22
sinzui	Edwin why? I see the table controls the SPNs	18:22
sinzui	me tries a list	18:22
sinzui	Edwin it does work with multiple SPNs	18:23
EdwinGrubbs	sinzui: ok, that makes sense. I was thinking that you would run into problems with group by, but you are just grouping by the spr columns, so it all works out.	18:24
sinzui	well, it certainly did not work until I add that	18:24
sinzui	EdwinGrubbs, I do not need the outer "SourcePackageRelease.sourcepackagename IN ()" do I?	18:25
EdwinGrubbs	sinzui: no	18:25
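The pastebin queries discussed above are not reproduced in this log. As a rough illustration of the general idea (restrict the MAX(id) subquery to viable candidates, grouped per package name), here is a self-contained sketch using sqlite3. The two-table schema, the column names and the sample data are simplified assumptions and do not match the real Launchpad schema or sinzui's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily simplified stand-ins for the spr and spph tables.
CREATE TABLE spr (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE spph (id INTEGER PRIMARY KEY, spr INTEGER REFERENCES spr(id),
                   status TEXT);
INSERT INTO spr VALUES (1, 'bzr', '2.2.0'), (2, 'bzr', '2.3b1'),
                       (3, 'vdr', '1.7.16'), (4, 'zsh', '4.3.10');
INSERT INTO spph VALUES (100, 1, 'published'), (101, 2, 'published'),
                        (102, 3, 'published'), (103, 1, 'superseded'),
                        (104, 4, 'published');
""")


def current_releases(conn, names):
    """Return (name, version) of the newest published release per name."""
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT spr.name, spr.version
        FROM (
            -- Derived table: one max publication id per requested name.
            SELECT MAX(spph.id) AS max_id
            FROM spph JOIN spr ON spr.id = spph.spr
            WHERE spph.status = 'published'
              AND spr.name IN ({placeholders})
            GROUP BY spr.name
        ) AS latest
        JOIN spph ON spph.id = latest.max_id
        JOIN spr ON spr.id = spph.spr
        ORDER BY spr.name
    """
    return conn.execute(sql, list(names)).fetchall()


releases = current_releases(conn, ["bzr", "vdr"])
```

Because the name filter lives inside the derived table, the outer query needs no second `IN (...)` condition, which mirrors the question sinzui asks above.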
sinzui	This is wicked fast	18:26
sinzui	I am going to start a branch and watch the tests pass	18:26
sinzui	gary_poster, henninge: I have a very fast query that fixes distroseries.getCurrentSourceReleases()	18:27
=== almaisan-away is now known as al-maisan
lifeless	(37 rows)	18:42
lifeless	Time: 186.584 ms	18:42
lifeless	sinzui: thats for the big page	18:42
lifeless	using your branch	18:42
lifeless	sinzui: love your work	18:43
sinzui	wow. I feel good	18:43
lifeless	this is going to knock +packages right back to zilch on the timeouts chart I think	18:43
sinzui	This will have to be stormified. I know how to write this in storm, but not sqlobject	18:43
lifeless	hmm?	18:44
lifeless	I mean thats great	18:44
lifeless	but I can help you do it in situ if you want	18:44
henninge	sinzui, lifeless: Is what you are doing related to the qastaging timeouts?	18:44
sinzui	yes	18:44
lifeless	henninge: do you mean on +packages?	18:45
sinzui	this looks like it will also fix many other timeouts in production too	18:45
henninge	no, the general timeouts we get on all kinds of pages.	18:45
lifeless	henninge: no	18:45
henninge	:(	18:45
lifeless	henninge: we get timeouts because of a few reasons	18:45
lifeless	a) cold cache effects in the db - its much smaller in memory than production	18:45
lifeless	b) we have inefficient code and staging hardware shows this up	18:46
lifeless	this is a case in point - sinzui is shaving many seconds off of a routine page	18:46
lifeless	c) contention/thrashing in the appserver due to all the scripts running on the appserver staging host asuka	18:46
lifeless	there is an rt open to address (c)	18:46
lifeless	(a) - retry a few times, if it eventually works prod will probably chew it up happily	18:47
lifeless	(b) - we need to fix our code. Which will help with (a) too	18:47
henninge	but it seems to be related to certain revisions of the code	18:48
henninge	it started on qastaging and when staging got updated with the same revisions it showed the same timeouts whereas before it (staging) was working fine.	18:48
lifeless	henninge: what pages specifically	18:48
henninge	all project homepages	18:49
henninge	launchpad.net/anyproject	18:49
henninge	all source packages	18:49
lifeless	from 11888 to 11914 we had a very broken query for getCurrentSourceReleases	18:49
henninge	launchpad.net/ubuntu/maverick/+source/anypackage	18:49
lifeless	all the pages you're listing are covered by it; it should be tolerable now - the same as before 11888	18:49
henninge	11914 did not fix it, though	18:49
lifeless	what EdwinGrubbs and sinzui are doing is about to make it much better	18:50
lifeless	henninge: this is one reason those pages are all slow on lpnet too	18:50
sinzui	lifeless, henninge: the method is used in soyuz, translations, registry, and bugs pages. Anything that wants to know the current release of a package is going to be between 50 and 100 times faster	18:51
lifeless	sinzui: yah	18:52
lifeless	sinzui: note that production db is much faster	18:52
lifeless	sinzui: so not all pages will zoom as much	18:52
lifeless	https://launchpad.net/ubuntu/natty/+source/bzr	18:52
rockstar	Does this error message in rabbitmq mean anything to anyone? It's preventing me from installing launchpad-developer-dependencies. http://pastebin.ubuntu.com/530828/	18:52
lifeless	but there are many pages which do this query that will benefit a great deal	18:53
lifeless	rockstar: is it already running?	18:53
rockstar	lifeless, no, it won't start.	18:53
henninge	lifeless: I don't find those pages particularly slow but maybe I am just so accustomed to LP slowness ...	18:54
henninge	;-)	18:54
rockstar	lifeless, when the package gets installed, it explodes and prevents anything else from being installed.	18:54
lifeless	rockstar: on maverick ?	18:54
rockstar	lifeless, yes.	18:54
lifeless	hmm	18:54
lifeless	I don't know sorry	18:54
henninge	OK, let's wait and see the outcome of that work.	18:54
lifeless	inet_tcp",{{badmatch,{error,duplicate_name	18:54
lifeless	makes me think the socket is in use	18:54
rockstar	lifeless, hm...	18:54
lifeless	which would happen if you had a rabbit instance already running	18:54
lifeless	e.g. if the devscript is buggy on upgrades	18:55
henninge	sinzui: you have my r-c approval for landing that if it gets too late.	18:55
rockstar	lifeless, oh! Yeah, the other change on this laptop is the u1 setup, so I guess that makes sense. I completely spaced that.	18:55
sinzui	henninge, thanks.	18:56
henninge	PQM is scheduled to close in 3 hours	18:56
rockstar	lifeless, everything are happy now.	18:56
henninge	so you will need r-c ;)	18:56
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel \| Week 3 of 10.11 \| PQM is closing at 22 UTC \| firefighting: Lots of timeouts on qastaging!! \| https://dev.launchpad.net/ \| Get the code: https://dev.launchpad.net/Getting
rockstar	lifeless, thanks for intervening between my head and the wall.	18:56
lifeless	rockstar: that was it ?	18:58
lifeless	rockstar: if so, please file a bug ... buggy package ;)	18:58
rockstar	lifeless, well, I should also say that I run launchpad in a chroot.	18:59
lifeless	rockstar: thats fodder for the bug report	19:00
flacoste	henninge, sinzui: could these slow pages be related to the changes to add latest releases to the source package pages?	19:03
flacoste	henninge, sinzui: that was added in a recent revision	19:03
sinzui	flacoste, I do not think so. The method was unchanged this year except for lifeless's changes this week	19:03
sinzui	flacoste I think this is PG 8.4	19:03
flacoste	sinzui: henninge says that timeouts increased with a recent revision	19:04
flacoste	as staging is now seeing the same timeouts as qastaging	19:04
flacoste	whereas it wasn't until it was updated	19:04
flacoste	and qastaging wasn't either yesterday	19:04
sinzui	flacoste yes, we had an open join, but there were timeouts none-the-less	19:04
sinzui	flacoste: we had two landings to fix the issue, neither was substantial	19:05
lifeless	I fluffed one	19:05
lifeless	removed an unneeded table constraint, left the table in by mistake.	19:05
lifeless	that went boom badly ;)	19:05
flacoste	pages affected are:	19:06
flacoste	all project homepages	19:06
flacoste	all sourcepackages	19:06
flacoste	according to henninge again	19:06
lifeless	flacoste: we did just discuss this	19:06
lifeless	20 minutes ago	19:06
flacoste	right, i read the backlog	19:07
lifeless	kk	19:07
flacoste	but it's not clear that we have identified the issue	19:07
lifeless	flacoste: we have an 8 second query	19:07
lifeless	that will come down to 140ms	19:07
lifeless	on qastaging	19:07
flacoste	sure	19:07
lifeless	we know they all use it	19:07
lifeless	until its fixed we have no data about what lies behind it	19:08
flacoste	well there is an alternative, which is to do a binary search to find the revision introducing the slowness	19:08
lifeless	flacoste: 11888	19:09
flacoste	lifeless: i was under the impression that we tried reverting 11888 and its two following fixes from qastaging, but that still resulted in all these pages timing out	19:10
flacoste	but now, i'm not sure, it's possible that only the follow-up fixes were reverted...	19:10
=== shadeslayer is now known as evilshadeslayer
lifeless	flacoste: 11888 is a confounding factor	19:11
flacoste	henninge: could you confirm/inform the above?^^^	19:11
lifeless	flacoste: with 11888 present, any other flaws would have been magnified	19:11
flacoste	right, but without it, we shouldn't see any more timeouts than before	19:11
lifeless	flacoste: even if 11888 isn't the cause of all the issues, we can't be sure without running with 11888 reverted and the others present	19:11
lifeless	flacoste: we're running 11887 live	19:12
flacoste	i know	19:12
lifeless	flacoste: so yes, I agree.	19:12
lifeless	we've always seen more timeouts on qastaging	19:12
lifeless	it has a 10 second timeout	19:12
flacoste	but i thought we had made that test on qastaging (no 11888, others present) and found that it was still timing out all over the place	19:12
flacoste	but maybe, that's not the test that took place	19:12
flacoste	let me check the branch...	19:13
flacoste	lp:~henninge/launchpad/stable-revert	19:13
flacoste	ah, no	19:13
flacoste	only 11899 and 11914 were reverted	19:14
flacoste	so your hypothesis holds	19:14
lifeless	ok	19:15
lifeless	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS139 is qastaging.launchpad.net/bzr	19:16
EdwinGrubbs	sinzui: can you look at these screenshots of the involvement portlet for the bugsupervisor versus the admins? https://devpad.canonical.com/~egrubbs/configuration/	19:46
sinzui	Edwin It was easier to write an exception?	19:48
sinzui	EdwinGrubbs, We wanted to hide the link once the tracker was configured	19:49
EdwinGrubbs	sinzui: well, the progress bar doesn't make sense with just one link being shown, so an exception seems like the cleaner solution. It would also be odd to have a single link hidden under the "Configuration options" expander.	19:52
sinzui	okay, I agree. Your approach is correct	19:53
lifeless	flacoste: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS139	19:55
henninge	flacoste: I did not revert 11888 in that branch because it has been on qastaging for a while without any trouble.	19:56
lifeless	qastaging.launchpad.net/bzr - its the query sinzui is redoing	19:56
henninge	flacoste: I am sorry for the misunderstanding	19:56
flacoste	lifeless: ack	19:56
sinzui	I am fixing the distroseries/source package problem illustrated by qastaging.launchpad.net/ubuntu/natty/+source/bzr	19:57
* henninge has to go away for a bit again		19:57
EdwinGrubbs	sinzui: can you review https://code.edge.launchpad.net/~edwin-grubbs/launchpad/bug-664788-configure-bugtracker-link-permission/+merge/40754	20:07
sinzui	I will	20:07
lifeless	thumper: perhaps we can cowboy in a squelch for the xmlrpc Fault	20:13
=== al-maisan is now known as almaisan-away
=== matsubara is now known as matsubara-afk
lifeless	sinzui: I have a favour to ask	20:30
lifeless	sinzui: add [rollback=11888] to your landing for the new query	20:31
sinzui	yes lifeless	20:31
sinzui	I will	20:31
lifeless	sinzui: it will tell qatagger that https://bugs.launchpad.net/soyuz/+bug/662523 can be unblocked so the deploy report is accurate	20:31
_mup_	Bug #662523: Archive:EntryResource:getBuildSummariesForSourceIds times out <bad-commit-11888> <timeout> <Soyuz:Fix Committed by lifeless> <https://launchpad.net/bugs/662523>	20:31
lifeless	sinzui: thank you!	20:31
wgrant	lifeless: So that SPN fix worked?	21:57
lifeless	wgrant: yes, and sinzui has an even more effective fix to make other uses of the query much more efficient	21:59
lifeless	wgrant: https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=204	21:59
wgrant	lifeless: How fast is sinzui's?	21:59
lifeless	8700->100ms	21:59
lifeless	wgrant: on production this is already tolerably fast, db server size yada yada yada	22:00
wgrant	Nice.	22:00
wgrant	Yep.	22:00
lifeless	wgrant: but I expect a positive improvement all over.	22:00
sinzui	wgrant: my mp has time summaries and SQL explains https://code.launchpad.net/~sinzui/launchpad/ds-getcurrentreleases/+merge/40756	22:01
lifeless	sinzui: I'm really glad you guys dug into this	22:01
sinzui	Making the SP and DS pages faster really has required half a dozen engineers looking at the same number of objects	22:02
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel \| Week 3 of 10.11 \| PQM is in release-critical mode \| firefighting: - \| https://dev.launchpad.net/ \| Get the code: https://dev.launchpad.net/Getting
lifeless	sinzui: we've 22 or so, looking all across the board	22:08
sinzui	I fear milestones will be the last to fix :( I have time to return to that one next week.	22:09
lifeless	+commentedbugs is the current most severe timeout	22:10
lifeless	and stub has a fix \o/	22:10
lifeless	I won't have time to do anything with it till week after next	22:10
wgrant	Let me guess... it's querying badly to try to find comments with index != 0?	22:16
lifeless	read the bug :)	22:17
cody-somerville	http://www.jacobian.org/writing/buildbot/ci-is-hard/ <-- lmao. "Django’s big. The test suite is around 40,000 lines of code in something like 3,000 individual tests. We work constantly to speed up the test suite, but best case it still takes about 5 minutes to run. This means that our CI absolutely needs to be distributed — a single test server won’t cut it."	22:32
lifeless	flacoste: ping	22:51
flacoste	hi lifeless	22:52
lifeless	flacoste: I think we need to treat this bzr thing as an emergency	22:52
lifeless	flacoste: its very frequent	22:52
poolie	lifeless, which?	22:52
lifeless	poolie: the backtrace on push	22:52
flacoste	lifeless: my understanding is that it's only annoying, not a real error	22:52
lifeless	flacoste: our users don't know this	22:52
lifeless	flacoste: perception	22:52
poolie	this is the zope error being shown to the user?	22:53
poolie	is there a bug?	22:53
poolie	bug number	22:53
lifeless	poolie: flacoste: https://bugs.launchpad.net/launchpad-code/+bug/674305	22:53
_mup_	Bug #674305: bzr push occasionally reports AssertionError on terminal <codehosting-ssh> <xmlrpc> <Launchpad Bazaar Integration:Triaged> <https://launchpad.net/bugs/674305>	22:53
wgrant	Also, doesn't it stop a scan from being requested?	22:54
flacoste	lifeless: any idea of how we could fix this apart from escalating this RT?	22:55
flacoste	wgrant: that would be new information	22:55
lifeless	flacoste: here are the options I know about	22:55
lifeless	wgrant: thats my understanding too	22:55
lifeless	flacoste: a) escalate the RT	22:55
wgrant	If it doesn't mean that no scan is requested, then we have bigger problems.	22:56
lifeless	b) wedge in some retry code here - high risk	22:56
lifeless	wgrant: there are multiple routes to trigger scans	22:56
lifeless	wgrant: its possible a redundant route is saving us	22:56
lifeless	wgrant: e.g. the disconnect hook	22:56
wgrant	Possibly.	22:56
lifeless	c) push the mailman improvement and hope its enough	22:57
lifeless	d) disable other services like codeimport that use the same service	22:57
flacoste	c and d looks like the main option at this time	23:02
flacoste	can we get confirmation that scan isn't triggered?	23:02
flacoste	i don't get any errors from here fwiw	23:06
lifeless	it seems to be a couple of users an hour - which, because its not (appearing-to-be) localised to product/bug like other timeouts, is particularly confusing and harmful to our users.	23:07
lifeless	theres no obvious rationale they can connect it to	23:08
lifeless	wgrant: we're talking with ops now	23:09
lifeless	Time Out Counts by Page ID	23:33
lifeless	HardSoftPage ID	23:33
lifeless	5707384CodeImportSchedulerApplication:CodeImportSchedulerAPI	23:33
lifeless	21132Person:+commentedbugs	23:33
lifeless	164561CodehostingApplication:CodehostingAPI	23:33
lifeless	44156BugTask:+index	23:33
lifeless	810ProjectGroup:+milestones	23:33
lifeless	6305Distribution:+bugtarget-portlet-bugfilters-stats	23:33
lifeless	5259Distribution:+bugs	23:33
lifeless	514Person:+bugs	23:33
lifeless	57DistroSeries:+queue	23:33
lifeless	54Archive:EntryResource:getBuildSummariesForSourceIds	23:33
lifeless	bah sorry for the formatting	23:33
lifeless	flacoste: ^ turning off code imports is probably the fastest thing we can do	23:33
lifeless	mbarnett: how hard is it to disable all code imports ?	23:33
lifeless	mbarnett: we should be able to see an immediate drop in that netstat over a couple of minutes if that were to help	23:34
lifeless	flacoste: 570 7384 CodeImportSchedulerApplication:CodeImportSchedulerAPI	23:36
lifeless	flacoste: 164 561 CodehostingApplication:CodehostingAPI	23:36
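Once the report rows are separated into "hard soft page-id" as in the two re-pasted lines above, they are easy to read programmatically. The sketch below is a minimal illustration under that assumption; the function name and whitespace-separated layout are hypothetical, and only the two rows lifeless re-pasted are used because the other counts in the fused paste cannot be split with certainty.

```python
def parse_timeout_line(line):
    """Split a 'hard soft page-id' report line into (hard, soft, page_id)."""
    hard, soft, page_id = line.split(None, 2)
    return int(hard), int(soft), page_id


lines = [
    "570 7384 CodeImportSchedulerApplication:CodeImportSchedulerAPI",
    "164 561 CodehostingApplication:CodehostingAPI",
]
rows = [parse_timeout_line(line) for line in lines]

# Rank by combined hard + soft timeouts to find the noisiest page id.
worst = max(rows, key=lambda row: row[0] + row[1])
```

Here `worst` is the CodeImportSchedulerAPI row, which matches the suggestion in the channel to turn off code imports first.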
flacoste	lifeless: good suggestion	23:36
lifeless	mbarnett: please turn off the importds	23:37
lifeless	mbarnett: And keep watching that netstat	23:37
lifeless	flacoste: https://devpad.canonical.com/~lpqateam/lpnet-oops.html#time-outs is where I'm looking	23:41
mbarnett	no more imports should be fired off after any currently running complete.	23:43
lifeless	flacoste: look at this:	23:43
lifeless	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP1011	23:43
lifeless	SQL time: 17 ms	23:43
lifeless	Non-sql time: 15074 ms	23:43
wgrant	Ow.	23:43
lifeless	flacoste: this is why I want a) single threaded appservers and b) in the main cluster	23:44
flacoste	right	23:46
flacoste	the GIL hypothesis	23:46
lifeless	yes	23:46
lifeless	for instance	23:46
lifeless	https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP4675	23:46
lifeless	but i have another hypothesis	23:46
lifeless	if we start the timer too early	23:47
lifeless	a deep queue could look like this as well	23:47
lifeless	see the 4675 in particular its a soft timeout	23:47
