/srv/irclogs.ubuntu.com/2010/11/12/#launchpad-dev.txt

mwhudsonerr!00:20
mwhudsonwhy is branchChanged hitting AssertionErrors?00:20
spivAnd no visible OOPS ID in the traceback sent to my 'bzr push' either...00:21
mwhudsonyeah00:21
spivOn the other hand, LP did seem to successfully notice that my branch changed.00:22
mwhudsonthumper: hello :-)00:23
mwhudsonwell00:24
mwhudsonthe assertionerror is because the transaction is doomed00:24
mwhudsonhttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP11900:25
mwhudsonah no, being doomed00:26
mwhudsonits in a timeout block00:27
mwhudsonwth, there's a gap of 15s between recorded queries00:28
* wgrant stabs qastaging.00:29
StevenKwgrant: What did qastaging ever do to you?00:34
mwhudsonspiv: https://bugs.launchpad.net/launchpad-code/+bug/674305 <- feel free to hit the affects me too thing :-)00:35
_mup_Bug #674305: bzr push occasionally reports AssertionError on terminal <Launchpad Bazaar Integration:New> <https://launchpad.net/bugs/674305>00:35
wgrantStevenK: Timed out lots.00:35
wgrantAlthough it may just be that those pages are broken now.00:35
wgrant(Archive:+index, +packages, +delete-packages, that sort of thing)00:35
wgrantHmm.00:39
wgrantIt'd be nice if daily builds didn't all hit and DoS the build farm at the same time.00:39
spivmwhudson: done, thanks!00:41
lifelesswgrant: https://bugs.launchpad.net/soyuz/+bug/67237100:45
_mup_Bug #672371: Archive:+packages timeouts <ppa> <qa-needstesting> <regression> <timeout> <Soyuz:Fix Committed by jelmer> <https://launchpad.net/bugs/672371>00:45
thumpermwhudson: hey00:46
thumpermwhudson: whazzup?00:46
mwhudsonthumper: that bug00:47
thumpermwhudson: I think00:48
mwhudsonthumper: https://bugs.launchpad.net/launchpad-code/+bug/67430500:48
_mup_Bug #674305: bzr push occasionally reports AssertionError on terminal <Launchpad Bazaar Integration:New> <https://launchpad.net/bugs/674305>00:48
thumpermwhudson: I think that may be the xmlrpc fuckage00:48
thumpermwhudson: not sure why there are massive gaps00:48
mwhudsonthumper: the xmlrpc fuckage?00:48
mwhudsonthe same as for getJobForMachine?00:48
thumpermwhudson: all the timeouts on the xmlrpc server00:48
thumpermwhudson: exactly00:48
mwhudsonhm, ok00:48
thumperI've not been able to find out why we have 8s gaps00:49
thumperwith no obvious reason00:49
mwhudson:/00:49
thumperI spent almost a week chasing it00:49
thumperand I've nothing to show for it00:50
wgrantlifeless: Yeah, but isn't that in theory fixed?00:50
lifelesswgrant: see my last comment00:50
wgrantOh.00:50
lifelessiz single slow query00:50
lifelesswell00:50
lifelessthere are other slow queries00:50
lifelessbut thats the smoking gun00:50
wgrantdoes that also take forever on a real DB?00:51
thumperlifeless: ah... no00:51
thumperit isn't a slow query00:51
thumperit is the 15s gap between query execution and the next one that bothers me00:52
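A minimal sketch of how the idle gap thumper describes could be located in a query timeline. The `(start_ms, duration_ms)` tuples, the `largest_gap` helper, and the sample numbers are all hypothetical; the real OOPS report format is not shown in this log.

```python
def largest_gap(queries):
    """Find the longest idle period between consecutive queries.

    `queries` is a list of (start_ms, duration_ms) tuples sorted by start.
    Returns (gap_ms, index_of_the_query_before_the_gap).
    """
    best = (0, None)
    for i in range(len(queries) - 1):
        start, duration = queries[i]
        next_start = queries[i + 1][0]
        # Idle time between the end of this query and the start of the next.
        gap = next_start - (start + duration)
        if gap > best[0]:
            best = (gap, i)
    return best


# Hypothetical timeline: fast queries, then roughly 15 s with nothing recorded.
timeline = [(0, 12), (15, 4), (20, 130), (15160, 3), (15170, 8)]
print(largest_gap(timeline))
```

A gap like this shows time spent outside the database, which is why a slow-query explanation does not account for it.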
thumpermwhudson: I'd love some help chasing that down as I've exhausted my understanding on that problem00:53
lifelesswgrant: don't know00:53
lifelessthumper: huh, what are you talking about?00:53
lifelessthumper: I'm talking about https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1776QS5100:53
thumperhttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP11900:53
lifelessquery 3300:53
thumperlifeless: ^^00:54
lifelessthumper: I'll have a look00:54
lifelessthumper: that looks like thread starvation to me00:55
thumperlifeless: but it is only a guess00:55
thumperlifeless: and why is it starved00:55
thumperwe don't know00:55
thumperwe are just guessing00:55
lifelessthumper: the losas have the xml server split out as the highest ticket00:56
lifelessthumper: when thats done we'll have more resources for xmlrpc00:56
lifelessthumper: and after that the single threaded experiment will kick in00:56
lifelessthumper: if you want to work on this today, I suggest implementing the per thread stats00:57
thumperlifeless: no, I'm in the middle of something else00:57
lifelesshttps://bugs.edge.launchpad.net/launchpad-foundations/+bug/243554 for reference01:01
_mup_Bug #243554: oops report should record information about the running environment <oops-infrastructure> <Launchpad Foundations:Triaged> <OOPS Tools:Triaged> <https://launchpad.net/bugs/243554>01:01
lifelesswgrant: I have two problems answering for 'on a real db'01:02
lifelesswgrant: firstly, we don't have the substituted ids to reproduce01:02
lifelesswgrant: secondly don't have access and we're short staffed losa-wise.01:02
lifelesswgrant: where are you up to exam wise?01:03
wgrantlifeless: On the first day of a 12 day break.01:05
wgrantSo not doing much.01:05
lifelesswgrant: Are you interested in tackling this perf issue? I have a trip on sunday for the cassandra training01:06
wgrantlifeless: We should have a stub soon, shouldn't we?01:06
lifelessand shopping/prep to do today01:06
lifelesswgrant: in a few hours yes01:06
lifelesswgrant: I'm strictly on leave, but I'm pretty bad at unwinding for < several-week periods.01:07
wgrantHeh.01:07
lifelessright now though, I have to do a shop-run. bbs.01:07
wgrantSo, 11888 made it bad, and the fix in iforgetwhat didn't help?01:07
lifelessit helped01:21
lifelessbut not enough01:21
lifelesswe have two options01:21
lifelessfix the query - its taking 200ms per SPPH at the moment.01:22
lifelessrollback both 11888 and 11903(?)01:22
lifelessnote that rolling back leaves the page at 10 seconds and the ajax status updating timing out.01:22
wallyworldthumper: hello, mcfly02:15
thumperwallyworld: whazzup?02:15
wallyworldi can't get branch lp:~wallyworld/launchpad/invalid-branch-link-message to merge properly02:16
wallyworldit's not in the codebase either locally or on loggerhead and any merge attempts via pqm or lp-land claim there is nothing to do02:16
thumperwallyworld: this is the revision that was backed out wasn't it?02:17
wallyworldyes02:17
wallyworldbut i fixed it02:17
wallyworldie backed out the bad yui stuff02:17
thumperright02:17
wallyworldit's gone past ec2 again no probs02:17
thumperdid you reverse the reversed merge?02:18
wallyworldno. not sure what to do02:18
thumperright02:18
thumperwhat you need to do is to merge devel into your branch02:18
thumperthen do a reverse merge of the revision that backed out your change02:18
thumperthe guts of the problem is that most of your branch has been merged02:19
thumperand the files were then reverted02:19
thumperso you need to revert the revert02:19
wallyworldok. noob alert. how do i do a reverse merge?02:19
thumperdo you know the devel revision that reverted your merge?02:19
thumperwallyworld: it is a cherry pick merge02:19
wallyworldi did just after it happened :-)02:20
spivwallyworld: merge -r NEW..OLD (rather than merge -r OLD..NEW)02:20
thumperwallyworld: I02:20
wallyworldi can see if i can find it02:20
thumperwallyworld: I'll leave you in spiv's capable hands02:20
spivwallyworld: "bzr help revert" has an example:02:20
thumperwallyworld: to test the merge locally02:20
spiv“For example, "merge . --revision -2..-3" will remove the02:21
spiv  changes introduced by -2, without affecting the changes introduced by -1.”02:21
thumperwallyworld: get an up to date devel, and go bzr merge --preview ../my-branch02:21
thumperwallyworld: that way you can see what pqm will be attempting to merge into devel02:21
thumperwallyworld: in the way of changes02:21
wallyworldok. i'll have a wee looksy. thanks. i'll grab a quick bite first. suddenly i'm hungry02:22
* thumper finally has the recipe index builds looking nice02:28
thumpernow for the tests...02:28
thumperF**K ME - 150 / 1593  CodeImportSchedulerApplication:CodeImportSchedulerAPI02:34
thumperhard / soft timeouts02:34
thumper36 /  131  CodehostingApplication:CodehostingAPI02:34
thumpermwhudson: ^^^ that'll be contributing to the push issues02:34
mwhudsonthumper: yep02:35
mwhudsonalso :(02:35
* thumper has push failures like mwhudson had02:43
lifelesswgrant: so ;)02:58
wgrantlifeless: Hi. Just reinstalled and trying to get Launchpad running.03:01
lifelessmeep!03:01
wallyworldpoolie: ping03:01
wgrantDesktop + Soyuz on amd64 with lp-buildd in a VM does not fit well in 4GiB. :/03:01
pooliehi there wallyworld03:01
pooliehi wgrant, lifeless03:01
wgrantAfternoon poolie.03:02
lifelesshi poolie03:02
wallyworldhey, with the bzr 2.2.2 upgrade, we talked about doing it today from tip to avoid 2 lots of downtime. but i don't really think we should package trunk prior to official release. what downtime is involved? when i did the 2.2.1 upgrade, was there any downtime there?03:03
poolieso two things:03:03
pooliefirstly, i wasn't really saying "you should package tip", just "it's safe to jump to tip if you want to"03:04
pooliewe shouldn't normally need to03:04
wgrantThere is a few seconds of downtime for codehosting upgrades.03:04
poolieand if there's a bug there for which you need an urgent deployment, it could be better to just do a release immediately03:05
pooliesecondly i don't think it's really relevant to downtime03:05
spivwgrant: although if you are a user 90% of the way through an hour long push the cost to you will be more than a few seconds...03:05
pooliei probably said "to avoid lag between us landing a fix and you running it"03:05
pooliehm iwbni it didn't interrupt running connections03:05
wgrantHmm, true.03:06
spivpoolie: hmm, and in this case hypothetically it wouldn't need to; we don't need to restart the ssh server, just provide a new bzr so that new connections will get a fixed lp-serve...03:06
wallyworldso, me thinks it's better to wait for bzr 2.2.2 to be released next week and schedule a small outage03:07
wallyworldif needed at all03:07
wgrantWe have a downtime window next week for the DB upgrade anyway.03:07
lifelessright03:07
lifelessotherwise we have to schedule downtime03:07
lifelessunless its zomg time03:07
lifelesswe will once the relevant RT is done have no-downtime deploys to codehosting.03:08
lifelessbut its (I think) third in the queue.03:08
lifelessand we're getting one item done every 2-3 weeks.03:08
wallyworldthere's that cpu spin/wait issue that 2.2.2 fixes and a few people get hit by it but not so many that we shouldn't wait till next week...03:08
spivTangentially, I see https://lpstats.canonical.com/graphs/CodehostingPerformance/ looks a bit alarming ?03:08
lifelessit does03:09
lifelessfortunately its friday and noone will care about it till Monday03:09
lifeless<ha ha ha>03:09
wallyworldlifeless: you shouldn't care about it either. so much for you taking the day off. my wife would kill me if i worked too much on my "day off"03:10
pooliehm, is that a repeating pattern over the last 24h?03:11
pooliespm, are you back at work?03:11
spmpoolie: I am, but seriously considering taking the rest off - having a horrible hayfever attack atm - has triggered a very nasty asthma response. :-/03:12
lifelessspm: :(03:12
lifelessspm: taken claratyne?03:13
spmindeed03:13
lifelessspm: saline solution as suggested can help a lot - gets the pollen out03:13
spmaye03:13
spivmmm, neti pots.03:13
lifelessspiv: I ordered one wed03:13
poolienasonex is great (prescription only)03:14
wgrantHmm. It'd be nice if we had tracebacks for each SQL statement.03:14
lifelesspoolie: yeah, mine runs out in a few days03:15
lifelessI've been given a (different) thing - I haven't read up to see if its equivalent yet.03:15
wgrantrofl03:16
lifeless'allonase' or something like that03:16
wgrant'I also suggest renaming "incomplete" to "need info", as it's much more03:16
wgrantdescriptive. "Incomplete" makes it sound like the bug is in progress of03:16
wgrantbeing fixed, but not yet done.'03:16
lifelesswgrant: https://bugs.launchpad.net/launchpad-foundations/+bug/60695903:16
_mup_Bug #606959: oops should record the short traceback that caused each query? <Launchpad Foundations:Triaged> <https://launchpad.net/bugs/606959>03:16
spivlifeless: heh03:16
spivlifeless: what's nice about that idea is that although capturing tracebacks is a touch expensive, that shouldn't matter if you only do a reasonable number of queries ;)03:17
lifelessspiv: http://ecoyogastore.co.nz/eco-yoga-gear/neti-pot03:17
lifelessspiv: yeah03:17
pooliei saw, linked from the discussion of Go, google have a final bug status of "unfortunate"03:17
pooliethat's nice03:17
lifelesslol03:17
poolie"suckstobeyou" :)03:18
wgrantI thought they added that specially for the naming bug.03:18
spivlifeless: what web stores need for neti pots are photos more like http://www.flickr.com/photos/debrisdesign/502255811/03:18
wgrantBut I may be wrong.03:18
poolieoh, maybe03:18
poolieit could be freeform for all i know03:18
lifelessspiv: yeah, I hope it has a manual03:19
pooliebut it's a bit more precise for some things than 'wontfix'03:19
spivlifeless: the internet can provide a guide or twenty, I'm sure.03:19
lifelesswhat we need is a closure-space03:19
lifelessN dimensions and a slider.03:19
lifelesslike the colour-space pickers03:20
wgrantpoolie: That's what Opinion is for!03:20
wgrant*cough*03:20
lifelesswgrant: thats an opinion!03:20
wgrantlifeless: :(03:24
lifelessseriously03:25
lifelessits still an experiment as far as I've heard03:25
wgrantAh.03:25
wgrantOK, with Unity defeated, it is now time to look at that query.03:29
lifelessheh03:29
lifelesswallyworld: if you want to discuss https://bugs.launchpad.net/bugs/674329 further I'm happy to do so - I didn't mean to prevent discussion about whatever symptoms you ran into.03:30
_mup_Bug #674329: DecoratedResultSet eagerly fetches all results <performance> <Launchpad Foundations:Won't Fix> <https://launchpad.net/bugs/674329>03:30
wallyworldlifeless: hmmm. seems at first glance the whole concept of iterable results sets which load records in batches is not supported?03:32
wallyworldwhat if the query returns 10000000 records. and the user only wants to see 100 at a time?03:33
lifelesswallyworld: thats what batch navigator is for03:33
lifelesswallyworld: we do a count(*) (we should estimate instead, but thats orthogonal) and then use a slice (OFFSET X LIMIT Y in SQL) to only retrieve 100 at a time.03:34
wallyworldi realise that's what it is supposed to be for, but isn't the purpose defeated if __iter__ loads the whole lot anyway?03:34
lifelesswallyworld: __iter__ is /not/ for 'do partial work'03:34
lifelesswallyworld: (neither in general, nor in this specific case)03:34
lifelesswallyworld: in this specific case its because the database server will do all the work requested, always.03:35
lifelessso we have to ask for the right amount of work up front rather than do some, do some more, and then say that we're done.03:35
lifelesswallyworld: if you consider the implications of ORDER BY/GROUP BY on the work required in the db, this should make a lot of sense03:36
wallyworldsorry for my dumbness, but isn't the whole concept of yield to avoid eagerly realising the entire list?03:36
lifelessuhm03:36
lifelessso, iterators, generators and lazy evaluation03:37
wallyworldwhy does the server do all the work? other databases don't enforce this?03:37
lifelesswallyworld: good question. Pg definitely does; others I won't speculate on.03:37
wallyworldsure, the database has to do some work to satisfy order by etc, but the step of extracting the data from the db into the result set needn't be done unless required03:38
lifelessnevertheless03:38
lifelesspython-pgsql has a single large buffer with the results, no further network access occurs as we iterate the rows.03:38
lifelessOr so I am assured by Smart People.03:39
lifeless[specifically jamesh who dug into this in the past too]03:39
wallyworldok then.03:39
jameshby python-pgsql, you mean psycopg2?03:39
lifelessjamesh: blah - yes03:39
wallyworldlifeless: so to recap, if the result set has 10000000 rows, it's ok to do a list(rs) which effectively constructs an in memory data structure with all that data even if we only want to process 100 at a time?03:41
jameshwallyworld: if you stop reading the result set early, the only effort you're going to save is the conversion of the result buffer to Python objects on the client side.03:41
wallyworldor am i missing something?03:41
wgrantwallyworld: You'll slice first.03:41
wallyworldyes, and for a large result set, that's significant and a potential performance issue03:41
wgrantwallyworld: The slice affects the issued query.03:41
jameshif you know you will only need a subset of the rows, tell the database so that it can send you less info.03:42
wallyworldjamesh: i'm talking about say batch navigator which allows the user to scroll through the results 100 at a time.03:43
wallyworldwe may want the whole lot eventually, but not all at once03:43
wgrantThat slices, so the DB only sends those 100 rows.03:43
wgrantAnd only those 100 are turned into objects.03:43
wallyworldwgrant: not if a list(rs) is done??03:43
wallyworldwhich is what happens in DecoratedResultSet03:43
lifelesswallyworld: no, to recap, slice the resultset.03:43
wgrantwallyworld: __iter__ will only be called on the sliced version, right?03:43
jameshwallyworld: how do you know you'll want them all eventually?03:43
wgrantslicing returns a new resultset.03:44
wgrantAnd __iter__ is called on *that*.03:44
jameshfor example, how often do people go to the second page of results from a bug search?03:44
wallyworldjamesh: i said we *may* want them all eventually, say if the user scrolls to the end03:44
lifelesswallyworld: general principle: specify all the work you want within a *transaction* - call it 2 seconds of processing time.03:44
wallyworld:-)03:44
lifelesswallyworld: and ask for, and process that. No more (would be wasted). No less (would result in additional queries - lowers efficiency)03:45
lifelesswallyworld: the batch navigator does this slicing for you03:45
lifelesswallyworld: how about we get concrete. 'I'm trying to do X, and Y is happening'03:45
wallyworldok. i think my problem is i misunderstood how the batch navigator works.03:46
wallyworldthanks for setting me straight :-)03:46
lifelessthe batch navigator uses count() on the base result set to estimate the number of pages03:46
* wallyworld crawls back to his hole03:46
lifelessand a slice to get the data for the current page03:47
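The count-plus-slice pattern lifeless describes can be sketched roughly as follows. This uses Python's built-in sqlite3 module rather than Launchpad's Storm/PostgreSQL stack, and the `bug` table, the `fetch_batch` helper, and the batch size are hypothetical names chosen for illustration; it is not the batch navigator's actual code.

```python
import sqlite3


def fetch_batch(conn, start, size):
    """Return (total_rows, rows_for_one_page)."""
    # One COUNT(*) so the pager knows how many pages exist.
    total = conn.execute("SELECT COUNT(*) FROM bug").fetchone()[0]
    # One sliced query: the database only sends `size` rows to the client.
    rows = conn.execute(
        "SELECT id, title FROM bug ORDER BY id LIMIT ? OFFSET ?",
        (size, start),
    ).fetchall()
    return total, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO bug (id, title) VALUES (?, ?)",
    [(i, "bug %d" % i) for i in range(1, 1001)],
)

total, page = fetch_batch(conn, start=200, size=100)
print(total, len(page), page[0][0], page[-1][0])
```

The slice is part of the SQL that gets issued, so iterating over `page` never touches the other 900 rows; that is the sense in which the work is requested up front rather than trimmed during iteration.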
wallyworldmakes sense03:47
lifelessthe count() is a performance issue with huge datasets03:47
lifelesswe need to switch to estimators03:47
wallyworldyeah.03:47
lifelessbut thats orthogonal03:48
wallyworldalso, in my case, i had a query with a group by so had to override count()03:48
lifelesserm03:48
wallyworldthe default storm rs barfs03:48
lifeless:(03:48
lifelessI thought that was fixed in 0.1803:48
wallyworldyou can't say select (*) from xxxx with a group by in it03:49
wallyworldno03:49
wallyworldi fixed it quite simply03:49
wallyworldbut i also found a bug in Count()03:49
wallyworldit messes up count(distinct xxx)03:49
poolielifeless, do you go to the losa meetings?03:49
wallyworldit leaves out () around the columns03:49
pooliei don't know the specific name for it, but i mean the one where francis asks them to do things03:50
wallyworlds/select(*)/select count(*)03:50
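A rough illustration of the two counting problems wallyworld mentions, using sqlite3 and a hypothetical `build` table. Wrapping the grouped query in a subselect is one common way to count grouped rows; the log does not say whether this is the fix he actually applied in Storm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build (id INTEGER PRIMARY KEY, arch TEXT)")
conn.executemany(
    "INSERT INTO build (arch) VALUES (?)",
    [("i386",), ("i386",), ("amd64",), ("armel",), ("amd64",)],
)

grouped = "SELECT arch, COUNT(*) FROM build GROUP BY arch"

# Swapping the column list for COUNT(*) while keeping GROUP BY returns one
# count per group, not a single total for the result set.
per_group = conn.execute(
    "SELECT COUNT(*) FROM build GROUP BY arch").fetchall()

# Wrapping the grouped query in a subselect counts the grouped rows.
n_groups = conn.execute(
    "SELECT COUNT(*) FROM (%s) AS grouped" % grouped).fetchone()[0]

# COUNT(DISTINCT ...) needs the parentheses around the column.
n_distinct = conn.execute(
    "SELECT COUNT(DISTINCT arch) FROM build").fetchone()[0]

print(sorted(r[0] for r in per_group), n_groups, n_distinct)
```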
lifelesspoolie: no, tz fail. I get minutes, and have a separate meeting with ISF03:50
pooliek03:50
lifelesspoolie: I do when I'm in a workable tz03:50
pooliei'll mail him then03:51
pooliethanks03:51
pooliejam, did you file an RT for starting lp-serve?03:53
pooliebug 66026403:53
_mup_Bug #660264: bzr+ssh on launchpad should fork, not exec <qa-ok> <Launchpad Bazaar Integration:Fix Committed by jameinel> <https://launchpad.net/bugs/660264>03:54
jamI've had an rt for a while now, 41340 IIRC, but I'm not positive03:54
pooliethanks, i'll check that03:54
jamsorry, 4215603:54
* wallyworld goes to make a coffee and get his fire proof suit03:55
jampoolie: https://rt.admin.canonical.com/Ticket/Display.html?id=4179103:56
pooliethat's not exactly the same as getting it running though03:56
poolieis there a ticket or bug for that?03:57
poolieiirc you need them to change some configuration scripts that you don't yourself have access to?03:57
lifelesspoolie: the lp-serve thing is moving; jam needed to land more code03:58
poolieto do what?03:58
jampoolie: there is one, but I keep shooting blind as to the rt number03:59
jamLet me find the email04:00
pooliethanks04:00
wgrantCould someone run http://paste.ubuntu.com/530449/ on staging?04:00
poolielifeless, while jam's looking, what do you understand the state of this to be?04:01
pooliei'd just like to make the bug accurate and work out where if anywhere it's getting stuck04:01
lifelesspoolie: its in a back and forth discussion with the losas as they figure all the bits out04:02
lifelesspoolie: its low priority (relatively that is) so I wouldn't expect it to happen rapidly04:02
jampoolie: 4219904:02
lifelesspoolie: mwhudson was landing the init script for jam, and with that it should be able to be enabled on staging04:03
lifelessand then qad04:03
lifelessepic fail04:04
lifeless3142  OOPS-1776B79    BugTask:+index04:04
poolieso from that rt it looks like the next action is still 'get the service running on qastaging'?04:04
lifeless=== Top 10 Time Out Counts by Page ID ===04:04
lifeless    Hard / Soft  Page ID04:04
lifeless     238 /   35  Person:+commentedbugs04:04
lifeless     150 / 1593  CodeImportSchedulerApplication:CodeImportSchedulerAPI04:04
lifeless      50 /  188  BugTask:+index04:04
lifeless      36 /  131  CodehostingApplication:CodehostingAPI04:04
lifeless      16 /    9  Person:+bugs04:04
lifeless      14 /  352  Distribution:+bugs04:04
jampoolie: right, this whole week there haven't been enough l-osas, and there have been some critical things going on04:04
lifeless       9 /   70  Archive:EntryResource:getBuildSummariesForSourceIds04:04
lifeless       9 /    8  Archive:+copy-packages04:04
lifeless       8 /  396  Distribution:+bugtarget-portlet-bugfilters-stats04:04
jamtoday there was only Ch-ex04:04
lifeless       7 /    0  BugTask:+addcomment04:04
lifelesspoolie: yes04:04
pooliek, i don't want to preempt the critical things, of course, i just want it to not stay stuck after that04:05
lifelesspoolie: so in my queue its:04:05
lifeless - after RFWTAD stuff - thats important to finish getting single revs deployed and finish eliminating operation risk04:05
lifeless - after token librarian - thats old inventory which fixes timeouts for many private attachments (e.g. security builds)04:06
lifelessin terms of LOSA time04:06
poolieok04:07
lifelessshort interrupts to move it along are of course reasonable04:07
poolieso it's off john's plate until they get to it?04:07
lifelesspoolie: John can best answer that04:07
jamlifeless, poolie: I'm at least pending them telling me what I need to do next04:12
jamthe last round I didn't know I needed until they asked for it04:12
pooliemm there seem to be a few problems like that04:12
thumperRFC: http://people.canonical.com/~tim/recipe-latest-builds.png04:21
thumperit is using factory generated fake data, so I have multiple binary builds for the same arch04:22
thumperbut the basics are there04:22
thumperthis is up for review now04:23
thumperpoolie: hi04:28
thumperpoolie: we have another urgent need for committing to stacked branches04:28
pooliehi thumper04:28
pooliei think francis mentioned this...04:29
thumperpoolie: bzr-builder commits to the branch04:29
poolieit was for.. right04:29
poolieand why does it want a stacked branch not a checkout?04:29
thumperpoolie: and getting a branch for some big projects was using much more memory than the virtual builders had04:29
thumperpoolie: because it never pushes04:29
wgrantthumper: Not a fan of the triplicated spr name and version, but apart from that it looks great.04:30
thumperpoolie: apparently an alternative solution is to change the merge code04:30
thumperpoolie: abentley wrote it all up04:30
poolieonto the bug about commit?04:31
thumperon the incident report04:31
thumperfor the buildd failures04:31
pooliethat was an email or a wiki page?04:31
thumperwiki page I believe04:31
thumperI could forward you the email if you like04:32
thumperaaron wrote solutions up for me04:32
pooliei can probably find it04:32
thumperrockstar: ping?04:32
* thumper EODs04:33
pooliethumper, is that https://wiki.canonical.com/IncidentReports/2010-10-28-LP-build-manager-not-dispatching ?04:34
thumperpoolie: ah, I see it isn't all on the incident report04:35
lifelessthumper: if its not pushing04:40
lifelessthumper: why commit at all?04:40
lifelessstub: what do you think of the idea of capturing query params in oops04:42
lifelessstub: it seems to me it will help reproducing issues a lot04:42
stublifeless: We will be logging private information, including information lp devs technically shouldn't have access to.04:43
stubSome of that already leaks via the URL of course (so LP devs can learn about private teams they shouldn't know about)04:45
stubBut that hasn't been a problem so far, as private stuff has been company internal rather than private to a subset of the company.04:45
lifelessstub: well, in theory :)04:46
lifelessstub: so, we also manually create many queries today04:46
lifelessso at least - today - we already leak that04:46
stubContent of some of the private bugs could be an issue, as that would violate vendorsec04:50
lifelessyeah04:51
lifelessall disclosure stuff is serious04:51
lifelessstub: when would we use content from a private bug in a query ?04:52
lifelessstub: INSERT I guess04:52
lifelessstub: + 'bugs like this'04:52
lifelessstub: uhm, for the INSERT case we could choose not to substitute04:52
lifelesss/substitute/include/04:52
lifelessstub: we're trying to figure out why https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS12 has multi second queries04:53
lifelessstub: doing them by hand with plausible ids is extremely fast - 130ms for the main lookup in the page04:53
lifelessstub: could it be something odd like the isolation level (what level does appserver run as), or is it just the specific ids that will be at issue?04:55
lifelessstub: ping05:41
lifelesshmm, nvm for a sec05:41
lifelesswallyworld: qastaging-slave vs main05:42
lifelessperhaps05:42
lifelessbah05:42
lifelesswgrant: ^05:42
wgrantlifeless: Could be, I suppose.05:42
wallyworldlifeless: ECONTEXT05:42
lifelesswallyworld: I was talking to wgrant ; tab fail.06:04
wallyworldlifeless: np. i figured that when i saw the rest of the conversation come through :-)06:04
lifelessstub: ping06:14
=== almaisan-away is now known as al-maisan
stublifeless: pong06:14
lifelesshi06:14
lifelessI need your help06:14
lifelesswe've got a very odd thing happening06:14
lifelesshave a look at these two oopses06:15
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS1206:15
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS1906:15
lifelessthis is the +packages page which is a current blocker for deploying06:15
stubisolation level doesn't cause slowdowns06:15
lifelessthis page06:15
lifelesshttps://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=20406:15
lifelessin 1777QS12 query 34 takes  6.3 seconds06:16
lifelessin 19 it takes 202ms06:16
lifelessand 39 takes 20 seconds06:16
lifelesswe've shutdown cronscripts on asuka06:16
lifelessso the load should be tolerable (about 2 I believe - spm can confirm ?)06:17
lifelessrunning query 34 by hand, it takes about 200ms consistently, every time06:18
stublifeless: Are the oopses from the first batch? The way we currently do batching means that if you have a large set of results, the later batches will always timeout.06:18
lifelessstub: same batch in both oopses06:18
lifelessstub: same exact url06:18
lifelessahh06:22
lifelessI think I've managed to get a slow query06:22
lifeless\o/ finally06:22
wgrant!!06:23
stubOOPS-1777QS19 q39 is slow and comes with all the parameters (obviously we are not sanitizing the aborted query...)06:25
lifelessstub: yeah, its also genuinely slow locally06:25
lifelessby which I mean ro user on qastaging06:25
lifelessstub: thanks06:26
stubAnd that is slow because it is returning 1.35 million rows06:28
lifeless\o/06:28
lifelesswgrant: ^06:29
wgrantHmm. Is that the newer version query?06:29
lifelessI think I just reused the existing grouped version of it06:29
lifelesssounds like it was inefficient already06:29
lifeless:)06:29
lifelessor buggy06:30
wgrant1.35 million rows sounds buggy.06:30
lifelesswgrant: this give you what you need to make a test, isolate n fix?06:30
stublifeless: its missing a join condition06:31
wgrantlifeless: Maybe.06:31
wgrantHah, so it is.06:31
stublifeless: Its missing a 'AND sourcepackagename.id = sourcepackagerelease.sourcepackagename'06:31
wgrantSPN06:31
wgrantYeah.06:31
lifelessstub: in the inner or outer?06:32
stubThe outer06:32
lifeless2.7 seconds06:33
lifelesstolerable with just one06:33
stubSo every matched row is being expanded to 38k rows.06:33
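The row explosion stub describes is what happens when a table is listed in FROM without a join condition. A small sketch with sqlite3 and two hypothetical tables, `pkg_name` and `pkg_release`, standing in for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pkg_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pkg_release (id INTEGER PRIMARY KEY, name_id INTEGER);
""")
# 50 names and 10 releases, each release pointing at exactly one name.
conn.executemany(
    "INSERT INTO pkg_name VALUES (?, ?)",
    [(i, "pkg%d" % i) for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO pkg_release VALUES (?, ?)",
    [(i, i) for i in range(1, 11)],
)

# Table in FROM but never constrained: every release pairs with every name.
unjoined = conn.execute(
    "SELECT COUNT(*) FROM pkg_release, pkg_name").fetchone()[0]

# With the join condition, each release matches a single name.
joined = conn.execute(
    "SELECT COUNT(*) FROM pkg_release, pkg_name "
    "WHERE pkg_name.id = pkg_release.name_id").fetchone()[0]

print(unjoined, joined)
```

The unconstrained form multiplies the result by the size of the extra table, which matches the "expanded to 38k rows" observation; dropping the unused table from the query, as wgrant suggests, removes the multiplication just as adding the condition does.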
wgrantAh.06:33
wgrantI think in fact that it shouldn't be joining against SPN at all.06:33
lifelesswgrant: still badly needs tuning06:33
lifelessoh, I did change that, I removed spn.... but I bet storm is putting it back in.06:34
lifelessbastardo.06:34
lifelesshow do you disable autotables?06:34
lifelessjamesh: ^06:34
wgrantlifeless: It's still explicitly there.06:34
wgrant            clauseTables=[06:34
wgrant                'SourcePackageName', 'SourcePackagePublishingHistory'])06:34
wgrants/Name/Release/, I suspect.06:34
stubSo we might be able to avoid the subselect using DISTINCT ON06:34
jameshlifeless: what's the context?06:34
lifelessjamesh: nvm :)06:35
lifelessjamesh: I was thinking storm was seeing a table ref from an inner query and autotables adding it to the outer FROM06:35
lifelessjamesh: but I was wrong06:35
wgrantlifeless: So, how does it go if you remove the SPN join from the query?06:36
jameshah.06:36
lifelesswgrant: fine06:36
lifelesswgrant: what file is that in06:36
lifelesswgrant: 2.6 seconds06:36
wgrantlifeless: lib/lp/registry/model/distroseries.py06:37
wgrant2.6 seconds sounds sort of excessive.06:37
lifelesshttp://pastebin.com/bJ2TxmFc06:38
stubHmm... distinct on makes it worse.06:41
lifelessok06:43
lifelessthats up in PQM06:43
lifelessimmediate fix06:43
wgrantAnd that hopefully makes it non-critical.06:44
lifelessyeah06:49
lifelessassuming theres nothing hiding behind it06:49
lifelesslet me get the change cowboyed to see06:49
wgrantThis explains why even trivial archives were timing out.06:51
lifelesswgrant: indeed07:17
lifelessok its landed07:18
adeuringgood morning08:58
bigjoolsmorning08:59
mrevellHey up, by the way09:10
henningelifeless: It looks like your fix for bug 672371 did not help. +packages still times out on qastaging.09:51
_mup_Bug #672371: Archive:+packages timeouts <ppa> <qa-needstesting> <regression> <timeout> <Soyuz:Fix Committed by jelmer> <https://launchpad.net/bugs/672371>09:51
henningeWhat's next? Revert r11888?09:52
henningejml: Hi! Any chance you could QA bug 673015?09:55
_mup_Bug #673015: Code of Conduct requirement for PPA upload rights is unnecessary <ppa> <qa-needstesting> <Soyuz:Fix Committed by jml> <https://launchpad.net/bugs/673015>09:55
henningeallenap: Hi! Any luck figuring out bug 667340?09:56
_mup_Bug #667340: Trac status of "Verified" confuses bug watcher <qa-bad> <trac-support> <trivial> <Launchpad Bugs:Fix Committed by allenap> <https://launchpad.net/bugs/667340>09:56
henningestub: Can you please QA bug 673874 before starting on your weekend?09:59
_mup_Bug #673874: Improve bug comment caching <qa-needstesting> <Launchpad Bugs:Fix Committed by stub> <https://launchpad.net/bugs/673874>09:59
allenaphenninge: No, not yet. It hasn't caused any regressions, so it's actually safe to go.09:59
allenaphenninge: I'll mark it as qa-ok but continue to investigate.09:59
henningeallenap: thanks a lot!09:59
henningegmb: can you please QA bug 672507 ?10:04
_mup_Bug #672507: Add bug_notification_level to the structural +subscribe view <qa-needstesting> <story-better-bug-notification> <Launchpad Bugs:Fix Committed by gmb> <https://launchpad.net/bugs/672507>10:04
gmbhenninge: Sure.10:04
gmbhenninge: Done10:06
henningegmb: thanks a lot!10:06
bigjoolshenninge: jml needs my help to QA that10:10
henningebigjools: thanks for offering it ;)10:10
lifelesshenninge: see the comment in the bug10:11
bigjoolshe can't QA without it since it needs dogfood :)10:11
lifelesshenninge: we're waiting on https://lpbuildbot.canonical.com/waterfall10:11
henningelifeless: ah yes, thank you.10:14
jmlhello.10:19
jmlyes QA, I know I know10:19
jmlbigjools: where do I need to point .dput.cf at?10:26
bigjoolsjml: http://pastebin.ubuntu.com/530615/10:26
* bigjools processes your upload10:28
bigjoolsjml: rejected10:29
jmlbigjools: why so?10:30
bigjoolsjml: can I help you make a dummy package that I know works10:30
bigjools"Unable to find python-testtools_0.9.6.orig.tar.gz"10:30
bigjoolsand it was a mixed upload it seems10:30
jmlmeaning?10:30
bigjoolsbinaries and source10:30
bigjoolsjml: I normally use the "hello" package10:31
bigjoolsapt-get source hello10:31
bigjoolscd hello-2.510:32
bigjoolsdch -i10:32
bigjools<add a revision>10:32
jmlyeah, that's what I did with testtools10:32
jml(so far so good)10:32
* bigjools sighs at stuck keys10:33
jmlheh10:33
bigjoolsok, then you need to "debuild -S"10:33
jmlahh10:34
jmlit's the -S that I didn't do10:34
jmluploaded10:34
bigjoolsaccepting it this time10:35
jmlyay10:35
bigjoolsyou cleared the CoC from ~jml?10:36
jmlbigjools: I did, but I'd like to double check with getUtility(IPersonSet).getByName('jml').is_ubuntu_coc_signer10:36
* bigjools checks10:36
bigjoolsFalse10:37
bigjoolsqa-ok!10:37
jmlsweet.10:37
jmlbigjools: thanks!10:37
bigjoolsmy pleasure10:37
* bigjools goes to celebrate with caffeine10:38
jmlhenninge: what's the word on the crazy non-vc managed file that refers to class paths?10:41
henningejml: It cannot be updated outside of a roll-out - at least not without Tom around ...10:43
henningejml: So I am preparing a branch that adds the required import to c.l.i again with an XXX to remove it again after the roll-out.10:44
jmlhenninge: that seems unsatisfactory10:44
henningeand a special roll-out requirement to update that file10:44
jmlhenninge: can't we just add the requirement and leave c.l.i as-is?10:45
henningejml: only if we go without a further deployment today10:45
jmlhenninge: so it needs a rollout-with-downtime?10:45
henningeso spm says, yes.10:45
jmlhenninge: did he say what it's needed for?10:46
henningejml: hang on, I'll forward the mail10:46
jmlhenninge: thanks :)10:46
jmlhenninge: ok. I find this whole thing colossally annoying, but it looks like you guys are making the best of a bad situation.10:55
henningejml: we are trying hard ... ;) thanks10:55
henningeand yes, it is annoying10:55
=== al-maisan is now known as almaisan-away
lifelesshenninge: it passed buildbot10:59
lifelesshenninge: when 914 hits qastaging10:59
lifelessthen10:59
lifelesshttps://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=20410:59
lifelessshould start working10:59
lifelessthat should be anytime now11:00
henningelifeless: thanks! But it will be another 4 hours or so ...11:00
lifelesshenninge: why?11:01
henningehttps://lpbuildbot.canonical.com/builders/lucid_lp/builds/35511:01
henningeIt just entered buildbot not passed it yet11:01
lifelessoh crumbs 913 I see11:01
lifelessah well11:01
lifelessgl11:01
lifeless!11:01
lifelessand gnight all11:01
henningelifeless: good night and thanks again.11:02
=== matsubara_ is now known as matsubara
bigjoolshello - I am having trouble getting the webservice to work on dogfood.  When I try and log in, there's a rejection because it can't traverse to '1.0'.  HALP?11:23
jelmerbigjools: Have you tried using 'devel' rather than 1.0 ?11:25
bigjoolsyes, that's what I am using - which makes the error more odderer11:25
bigjoolslaunchpad = Launchpad.login_with('testing', 'https://api.dogfood.launchpad.net/devel/')11:25
jelmerShouldn't there be another /api/ in there?11:25
bigjoolsyes11:26
bigjoolsstill fails!11:26
jelmerthen I'm out of ideas :-)11:26
bigjoolshmm using https://api.dogfood.launchpad.net/api worked11:27
bigjoolsah you need to write version='devel' in the login_with params11:32
bigjoolsjml: got a sec?11:59
jmlbigjools: sure11:59
bigjoolsjml: I'm probably doing something very very stupid but I have code blindness.   See  http://pastebin.ubuntu.com/530655/12:00
bigjoolsthere's a code snippet and a pdb session12:00
bigjoolsthe inner function callback can't see all of the outer method's variables....12:00
LPCIBotProject devel build (220): FAILURE in 2 hr 4 min: https://hudson.wedontsleep.org/job/devel/220/12:01
LPCIBot* Launchpad Patch Queue Manager: [r=lifeless][ui=none][no-qa] Remove StartsWith matcher from12:01
LPCIBotlp.testing.matchers in favour of one from testtools & fix some12:01
LPCIBotassertions that always passed.12:01
LPCIBot* Launchpad Patch Queue Manager: [r=lifeless][ui=none][no-qa] Really drop Sourcepackagename from getNewerSourceReleases - fixing massive timeouts on +packages.12:01
jml./me looks12:02
deryckMorning, all.12:02
bigjoolsmorning deryck12:02
jmlbigjools: you are masking them in scope, I think.12:03
jmlbigjools: let me knock up a simpler example...12:03
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.11 | PQM is open | firefighting: Lots of timeouts on qastaging!! | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
jmlbigjools: http://paste.ubuntu.com/530657/12:04
henningeOK, qastaging is timing out left and right ... :(12:04
henningeUbuntu pages seem to work fine but any project page times out.12:05
jmlbigjools: in "if file_sha1 == 'buildlog':", you are overriding out_file, out_file_name and out_file_fd12:05
jmlbigjools: probably the thing to do is pass them in.12:05
jmle.g.12:05
bigjoolsjml: it's not got that far yet12:05
jmld.addCallback(got_file, out_file_name, out_file)12:05
jmlbigjools: it doesn't matter.12:05
bigjoolsok12:05
jmlbigjools: run the python I pasted12:05
bigjoolsthat's special12:06
jmlbigjools: simply having an assignment in the scope masks the outer scope, whether or not the assignment has been evaluated.12:06
jmlbigjools: I'm not sure it would be sensible to do anything else.12:07
bigjoolsjml: ok thanks , I'll pass 'em in12:07
jmlbigjools: np.12:07
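A minimal sketch of the scoping rule jml describes above, not the code from either pastebin. The names `outer`, `reads_only`, `assigns_later` and `passed_in` are hypothetical; only `out_file` is borrowed from the discussion. It shows that an assignment anywhere in a nested function makes the name local for the whole function body, and that passing the value in as an argument avoids the problem.

```python
def outer():
    out_file = "outer value"

    def reads_only():
        # No assignment to out_file in this function, so the name
        # resolves to the variable in outer().
        return out_file

    def assigns_later(flag):
        # The assignment further down makes out_file local to this
        # function for its entire body, even when that line never runs.
        try:
            seen = out_file
        except UnboundLocalError:
            seen = None
        if flag:
            out_file = "inner value"
        return seen

    def passed_in(out_file):
        # Receiving the value as a parameter sidesteps the masking.
        return out_file

    return reads_only(), assigns_later(False), passed_in(out_file)


result = outer()
```

With a Twisted Deferred, the extra arguments given to `addCallback` are forwarded to the callback after the result, which is the shape of jml's `d.addCallback(got_file, out_file_name, out_file)` suggestion.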
henningeArgh!12:14
henningeI think I never realized how widespread the problems are that r11888 caused.12:14
henningeMaybe it's just that.12:15
wgranthenninge: It should be limited to pages on IArchive.12:17
jmlhenninge: can you please subscribe me to whatever bug you file for the XXX in c/l/interfaces/__init__?12:17
henningejml: oh bug, right ... ;)12:18
wgranthenninge: Anything outside Archive:+(index|packages|copy-packages|delete-packages) is probably not 11888.12:18
henningewgrant: thanks12:18
henningealthough I wish it was ... (because there is a fix coming)12:19
jmlbigjools: should I put that API gotcha on a wiki page somewhere?12:26
bigjoolsjml: not yet - I can't get it working still12:26
jmlbigjools: ok.12:26
bigjoolsjml: there's an error from wadllib about "Can't look up definition in another url"12:26
=== mrevell is now known as mrevell-lunch
jmlI've not seen that one before12:27
bigjoolsand I suspect I need leonardr12:27
bigjoolsyeah, it's doing something weird so that the /api is stripped somewhere12:27
wgrantThe URL shouldn't have /api in it.12:27
bigjoolsbut later depends on it being there12:27
wgrant /api is used to traverse from the webapp to the API -- you don't use it on api.launchpad.net.12:28
bigjools.........12:28
bigjoolsand so it works12:28
bigjoolsthanks wgrant12:28
wgrantHeh.12:28
henningejml: bug 674476 I failed to mention it in the XXX, though... :/12:30
_mup_Bug #674476: Files outside the LP tree reference LP code <Launchpad Foundations:New> <https://launchpad.net/bugs/674476>12:30
jmlhenninge: thanks. that's ok.12:30
henningeand you are subscribed12:31
henningejml: and thanks for reminding me about the bug12:32
jmlhenninge: np.12:33
=== didrocks1 is now known as didrocks
mrevell-lunch#12:57
=== mrevell-lunch is now known as mrevell
=== almaisan-away is now known as al-maisan
=== beuno_ is now known as beuno
jmlStarted in 15 minutes 27 seconds!13:14
bigjoolsjml: ha - remember how we added Deferred to lp_sitecustomise.py?13:24
jmlyeah?13:25
bigjoolsjml: looks like I need DeferredList too :)13:25
jmlbigjools: I thought DeferredList subclassed Deferred13:25
bigjoolsForbiddenAttribute: ('addCallback', <DeferredList ....13:25
jmlmeh13:26
bigjoolsI guess zope doesn't care so much about that13:26
jmlwhy might bugtask.date_closed be none, even though its status is one of Fix Released, Wontfix or Inprogress?13:39
shadeslayerjam: was my qtwebkit build fix0red?14:03
=== matsubara is now known as matsubara-lunch
=== Ursinha is now known as Ursinha-lunch
gary_posterhenninge: where are we with the qastaging slowdown?  I see that only qastaging is affected; staging and production are fine.  The timeout exception I see is within database code, but since that's where we check for timeouts, that's not necessarily indicative.14:24
gary_posterHas anyone looked at qastaging logs?  Has anyone looked at performance graphs?  Has anyone tried to correlate performance graphs with revisions deployed on qastaging?14:24
gary_posterand, are we coordinating here or on -ops?14:25
gary_posterHm, no tuolomne graphs of qastaging AFAICT :-/14:26
gary_posterMaybe I need to know the machine name(s)14:27
gary_posterqastaging is the same machines as staging, and staging is not timing out (as badly?) so machine load doesn't seem likely...14:34
gary_postertrying logs14:34
henningegary_poster: staging is a lot of revisions behind qastaging atm14:36
gary_posterhenninge: I figured it was something like that, yeah14:36
gary_posterso what has been done?14:36
gary_posterI saw stub's reply, but that didn't tell us much14:37
gary_posterI was about to grope around in logs14:37
henningegary_poster: logs is good14:38
henningeI was hoping that the authors of the revision could check if any of their code could be causing this.14:39
gary_posterhenninge: you identified the revision?14:40
henningeno, it's just any of the later ones.14:41
gary_posterah :-)14:41
henningebut I could narrow down the range because it only started today.14:41
gary_posterhenninge: you up for that while I do log groping?  I can share log groping fun here.  So far the only thing that does not look like chatter in the qastaging librarian log  is "Exception KeyError: ((<class 'canonical.launchpad.database.librarian.LibraryFileAlias'>, (1890638,)),) in <function remove at 0x8f3ced8> ignored"14:44
gary_posterseems to be mostly happy thoug14:44
gary_posterh14:45
henningegary_poster: I am looking at the revs atm, yes.14:45
gary_postercool thanks14:46
gary_posterThere are boatloads of "No handlers could be found for logger "librarian"" things in logs, which do make me a nit nervous14:51
gary_posterbit14:51
henningegary_poster: I noticed earlier that getting images from the librarian had a long delay.14:52
jmlmrevell: flacoste: http://paste.ubuntu.com/530734/14:53
gary_posterhenninge: yeah.  Maybe.  It doesn't smell like the cause to me.  This is interesting though: qastaging app log is *swamped* with these: http://pastebin.ubuntu.com/530735/14:53
henningegary_poster: What is a "DoomedTransaction"?14:54
gary_postera transaction that must not be restarted14:54
gary_posterhenninge: may be an unrelated problem.  This started after the the restart 2010-10-21T15:22:39, so it's been happening a looong time14:56
henningeah, ok14:57
abentleyjkakar: around? ResultSet.set is generating bad SQL.15:07
=== Ursinha-lunch is now known as Ursinha
abentleyjkakar: http://pastebin.ubuntu.com/530742/15:13
=== matsubara-lunch is now known as matsubara
jkakarCan you paste the code that generates this, please?15:17
jkakarabentley: ^^ Also, am on a call, will be laggy.15:18
abentleyjkakar: http://pastebin.ubuntu.com/530747/15:22
jkakarabentley: What's the __storm_table__ for SourcePackageRecipeBuild.15:23
abentleyjkakar: __storm_table__ = 'SourcePackageRecipeBuild'15:24
jkakarabentley: Is that right?15:25
abentleyjkakar: Yes.15:25
jkakarabentley: Can you do a JOIN in an UPDATE statement?  It looks like you're building a bad query, not that Storm is generating a bad one.15:25
abentleyjkakar: I'm not an expert on SQL syntax.  It's possible that I'm asking Storm to do the impossible, but if I am, I expect Storm to tell me.15:27
jkakarabentley: Storm won't tell you if you're trying to do the impossible.15:29
jkakarabentley: It's reasonable to expect it, but Storm is just a "thin" *cough* layer with an expression compiler that generates SQL exactly as you specify15:29
jkakarabentley: The database is telling you that you're trying to do the impossible.  Which, given that different backends have different definitions of "impossible", is probably right anyway.15:30
abentleyjkakar: This doesn't seem like it *should* be impossible.  Can't one do a subselect or something?15:30
jkakarabentley: Sure.  The best thing to do is first, figure out what query you want to run.  The second step is to figure out how to make Storm generate it.15:31
jkakarabentley: If you can write out the query you want I can help you figure out the second part.15:31
abentleyjkakar: If that is actually how it is done, why not write the query directly?15:32
jkakarabentley: A few reasons:15:32
jkakar- Storm will expand a class name into a series of column names in a query, such as in a SELECT.15:32
jkakar- When you use Storm you get a result set that gives you powerful capabilities, like union, max, count, etc.15:33
jkakar- When your class changes, because you added or removed a column, you don't have to change your queries unless they involve one of the modified attributes.15:33
jkakar- Most of the time you already know what query you want, so it isn't hard to get from what you want to a store.find() call with Storm expressions.15:34
jkakarabentley: This sounds like a case where the problem is not knowing what query you want to run.  With Storm you're always expected to know what query you want to run.15:34
abentleyjkakar: What makes you say that?15:34
jkakarabentley: It was designed explicitly not to hide SQL from you, but in fact, to make it possible to generate the exact query you want.15:34
jkakarabentley: Because the query you're generating doesn't work (according to the database)?15:34
jkakarabentley: Sorry, that probably sounded offensive, but I mean no offense.15:35
abentleyjkakar: I didn't set out to generate a query.  I set out to use an existing function that returns a collection that provides functionality that should do what I want.15:35
jkakarabentley: Okay.15:36
abentleyThe query I actually want is, "Find all SourcePackageRecipeBuilds where recipe = X and set recipe to NULL", which I can work out in SQL if you like.15:37
jkakarabentley: That's the next step, yes, working it out in SQL so you know what you need Storm to generate for you.15:37
abentleyjkakar: UPDATE SourcePackageRecipeBuild SET recipe = NULL WHERE recipe = 515:40
abentleyjkakar: 5 actually being a variable.15:40
jmlcss question15:41
jmlif I want to have a heading that has an image followed by some text aligned to the middle of that img, how do I do that?15:41
jkakarabentley: store.find(SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == $value).update(recipe=None)15:41
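A minimal sketch of the plain UPDATE abentley writes out above, run against an in-memory SQLite database instead of Launchpad's PostgreSQL through Storm. The table and column names come from the discussion; the sample rows, the two helper functions and the subselect variant are assumptions for illustration, and nothing here reproduces the pastebin contents.

```python
import sqlite3


def make_demo_db():
    # Hypothetical two-column stand-in for the real table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SourcePackageRecipeBuild ("
        "id INTEGER PRIMARY KEY, recipe INTEGER)"
    )
    conn.executemany(
        "INSERT INTO SourcePackageRecipeBuild (id, recipe) VALUES (?, ?)",
        [(1, 5), (2, 5), (3, 7)],
    )
    return conn


def detach_builds(conn, recipe_id):
    # Direct form: a single-table UPDATE with a simple WHERE clause.
    cur = conn.execute(
        "UPDATE SourcePackageRecipeBuild SET recipe = NULL WHERE recipe = ?",
        (recipe_id,),
    )
    return cur.rowcount


def detach_builds_subselect(conn, recipe_id):
    # Subselect form: the row selection lives in its own SELECT, so a
    # selection that needs joins can still drive a single-table UPDATE.
    cur = conn.execute(
        "UPDATE SourcePackageRecipeBuild SET recipe = NULL "
        "WHERE id IN (SELECT id FROM SourcePackageRecipeBuild WHERE recipe = ?)",
        (recipe_id,),
    )
    return cur.rowcount


conn = make_demo_db()
direct_count = detach_builds(conn, 5)
subselect_count = detach_builds_subselect(conn, 7)
remaining = conn.execute(
    "SELECT id, recipe FROM SourcePackageRecipeBuild ORDER BY id"
).fetchall()
```

Both forms clear the same rows for a simple filter. Whether they perform the same on a given PostgreSQL version would need checking with EXPLAIN ANALYZE.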
abentleyjkakar: I don't want to have two definitions of how you get the builds associated with a recipe, so how do I update getBuilds to return a ResultSet that works?15:43
jkakarabentley: Let me read some PostgreSQL documentation for a sec...15:44
jkakarabentley: Hmm, it looks like you could include multiple tables in an UPDATE... at least on PostgreSQL.15:47
jkakarabentley: I'm not sure exactly what you need... but I think you would benefit by writing a specialized query for the case you have.15:52
jkakarabentley: For two reasons, (1) it's a simpler query than the one from getBuilds and will probably run faster and (2) you'll run one less query than you do now (by specifying pending=True and pending=False).15:53
bigjoolshenninge: how big do the tarballs get that are produced by the TTBJs on the builders?15:53
henningegood question15:54
henningebigjools: well, it's all text files so they should compress nicely.15:54
abentleyjkakar: I disagree that it's a benefit.  I'd rather have clearer code than simpler queries, and I think two queries is acceptable, and if I cared, I could update getBuilds so that I could get all builds at once.15:54
bigjoolshenninge: it's just that the code that jtv wrote reads them into memory ...15:55
bigjoolsit obviously works but I'd rather not have a time bomb15:55
henningebigjools: They should not become very big, most projects don't have many templates15:55
henningeand if they have many, they are each small15:56
jkakarabentley: Okay.  Updating getBuilds to optionally include the pending clauses then would do what you want... ie: use it in a way that doesn't include the pending clauses.15:56
bigjoolshenninge: typically what sort of size?15:56
henningeI'd have to research that. danilos, do you have a figure off the top of your head?15:56
bigjoolshenninge: the change I am making will mean we could potentially be reading as many of these as there are builders15:56
bigjoolsin parallel15:56
daniloshenninge, 1715:57
henningethanks danilos15:57
bigjoolsdanilos is ever helpful :)15:57
daniloshenninge, uhm, let me read the backscroll then15:57
danilosbigjools, if the tarball only includes translations, they should be small (never more than say 50M for the biggest case, but probably around 1M for most)15:58
abentleyjkakar: So it would only support set if pending was not supplied?15:58
jkakarabentley: Yep.15:58
abentleyjkakar: gross.15:58
jkakarabentley: Unless we change the way UPDATE statements are generated.15:58
jkakarabentley: So you were probably right in the beginning, there probably is a bug in Storm.15:59
bigjoolsdanilos, henninge: aieeeee, I just looked at addOrUpdateEntriesFromTarball15:59
bigjoolstarball_io = StringIO(content)15:59
bigjoolsif I have the file on disk is there a different method that will work?16:00
danilosbigjools, well, by that time, they are already in memory :) where is "content" initialized?16:00
henningebigjools: actually, that's my code ;)16:00
bigjoolsdanilos: either in the upload processor or from the builder16:00
danilosbigjools, we can as easily parse the file directly on-disk using the tarfile module, if I am not mistaken16:01
bigjoolssorry but arbitrarily sized files going into stringio scares me16:01
bigjoolsI'm going to file a bug about this, it'll need fixes in a few places16:01
danilosbigjools, uhm, what I am trying to say is that StringIO is a shallow wrapper, entire file is already in the memory16:01
bigjoolsdanilos: yes, it should not be :)16:01
henningeI get it16:02
henningejust some figures16:02
danilosbigjools, agreed, perhaps we need to save it to a tmp file before we process it16:02
henningeall of gimps templates are 736k16:02
henningeall of gtk+ templates (2) are 264k16:02
bigjoolsdanilos: well I can make a tmp file available in the buildd-manager and the upload processor before it calls that method16:02
bigjoolsit currently has to read the file into memory before passing it16:03
bigjoolsif the template generation goes a bit wonky then it can easily take out the buildd-manager16:03
bigjoolswhich Is Bad (TM)16:03
danilosbigjools, then it'd be a very simple fix on "our side"16:04
bigjoolsexcellent, I'll file the bug and put some pointers to soyuz/buildmaster code in it16:04
bigjoolscheers16:04
danilosbigjools, don't do it before you make the tmp file available :P16:04
danilosbigjools, also, note that we are using the same thing for actual Ubuntu package builds, so we'd want to fix that as well16:05
bigjoolsdanilos: yes, that's what I was referring to above about the upload processore16:05
danilosbigjools, ok consigliere ;)16:06
bigjoolsheh16:06
bigjoolswas about to make a joke about an offer you can't refuse16:06
danilosheh16:06
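A rough sketch of the on-disk approach danilos mentions above: opening the tarball by path with the standard `tarfile` module so the whole archive never has to be held in a string. The function name `iter_template_files`, the `.pot` suffix filter and the demo tarball are assumptions for illustration; this is not the Launchpad `addOrUpdateEntriesFromTarball` code.

```python
import io
import os
import tarfile
import tempfile


def iter_template_files(tarball_path, suffix=".pot"):
    # Open by path: tarfile reads from the file as it goes, and only
    # one member's content is read into memory at a time.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(suffix):
                yield member.name, tar.extractfile(member).read()


def build_demo_tarball(path):
    # Hypothetical sample data standing in for a builder's output.
    files = [("po/demo.pot", b'msgid ""\n'), ("README", b"not a template\n")]
    with tarfile.open(path, "w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "templates.tar.gz")
    build_demo_tarball(demo_path)
    found = dict(iter_template_files(demo_path))
```

Peak memory then follows the largest single member, not the size of the whole tarball, which matters more if several builders hand back results in parallel.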
abentleyjkakar: Here's a version that seems to work: http://pastebin.ubuntu.com/530766/16:07
jkakarabentley: Yeah, not surprisingly.  I wonder how that query performs compared to the other one, though?16:07
abentleyjkakar: For the cases where both work, I bet they both perform the same.  That's got to be trivial to optimize.16:08
jkakarabentley: Probably, yes.  Though, in practice, I've occasionally seen dramatically different performance when a query uses a subselect vs. when it doesn't.16:10
jkakarIt's hard to understand when that will be the case or why, though.16:10
henningegary_poster: staging is timing out, too, now. It has been updated from 9955 to 996516:16
gary_posterhenninge: well, that seem to point a pretty stong finger at code then, which simplifies things in some ways.  how is the revert going on qastaging?16:17
gary_poster*strong16:17
henningegary_poster: it's taking it's time16:17
gary_poster:-)16:17
henningeits16:17
gary_posterok16:17
* gary_poster carefully replaces the second "it's" but leaves the first intact ;-)16:18
henningethanks for being careful ;)16:19
abentleyjkakar: filed as https://bugs.edge.launchpad.net/storm/+bug/67458216:21
_mup_Bug #674582: Storm may generate SQL errors on ResultSets.set for otherwise-working ResultSets. <Storm:New> <https://launchpad.net/bugs/674582>16:21
jkakarabentley: Thanks!16:22
=== al-maisan is now known as almaisan-away
sinzuihenninge, gary_poster: I agree that staging is now as useless as qastaging, but I do not see what has changed to make SPR/SPPH queries slower.16:27
gary_postersinzui, I am no longer actively investigating, because henninge's summary that revisions 11888 -> 11899 -> 11914 are a likely cause sounded like a good hypothesis.  We are waiting to see if reverting these clears up qastaging.16:29
sinzui11914 is not on staging, so I discount that16:33
gary_posteryes, but it's part of the logical set16:33
LPCIBotYippie, build fixed!16:44
LPCIBotProject devel build (221): FIXED in 4 hr 3 min: https://hudson.wedontsleep.org/job/devel/221/16:44
sinzuigary_poster, henninge, I do not understand the "set" point. I do not see that revision on staging 11914. I suspect that 11914 fixes the issue. I think the origin of the issue is 1189916:53
henningesinzui: 11914 does not fix it, it had already been on qastaging and did not help16:54
henningesinzui: we are currently reverting 914 and 89916:55
henningegary_poster, sinzui: do you know if the revision display at the bottom of the LP page is dynamic or static?16:59
henningei.e. Does it need a "make build" to be updated or is the information straight from the branch?16:59
jmlpeople.c.c has an old launchpadlib :(17:00
sinzuihenninge, it requires make build17:03
henningeso Chex just told me17:03
bigjoolsjml: I added a test to directly use downloadPage against a real slave in a test and it gets a "405 Method not allowed".  Do you know if Twisted has the equivalent of urllib2.debug = True ?17:06
sinzuiEdwinGrubbs, I am looking at distroseries.getCurrentSourceReleases() I think the subquery for max(spph.id) is doing a full table scan of SPNs because there is no constraint to return only the SPNs passed to the method17:07
jmlbigjools: I don't know what urllib2.debug=True is.17:08
jmlbigjools: and I don't know of any debugging foo off the top of my head17:08
bigjoolsjml: it dumps the http comms to stdout - I'm trying to work out what methods it's using that's not allowed17:08
sinzuiEdwinGrubbs, I suspect that moving 'SourcePackageRelease.sourcepackagename IN %s" into the subquery will make the query faster17:08
EdwinGrubbssinzui: that shouldn't be necessary since it looks like "spr.sourcepackagename = SourcePackageRelease.sourcepackagename" makes it search for all the spph/spr records for a single sourcepackagename.17:11
jmlbigjools: nothing obviously like that in t.web17:12
bigjoolsjml: yeah, I looked too17:12
jmlbigjools: wireshark maybe?17:12
bigjoolstcpdump ... :)17:12
jmlbigjools: ooh, did you know about from launchpadlib.uris import DOGFOOD_SERVICE_ROOT?17:17
bigjoolsyes17:17
bigjoolsI think I put it there and shamefully forgot17:17
sinzuiEdwinGrubbs, that assumes that the query planner built that set first17:17
EdwinGrubbssinzui: I've never seen an instance where the query planner thought that it would be faster to run a correlated subquery first and then limit the results of the outer query.17:20
jmlwhy is it that bazaar.launchpad.net is so hard for dns servers to resolve?17:20
EdwinGrubbss/limit/filter/17:20
sinzuiEdwinGrubbs, since we are looking at a PG 8.4 change + the removal of the SPN table from the query. I think I should get sometimes based on where that constraint is placed17:22
jmlmrevell: still around17:23
jml?17:23
mrevellHi jml, sure am17:24
jmlmrevell: I don't know where best to put this link on the beautifully presented https://dev.launchpad.net/BugJam http://mumak.net/lp-bugjam-2010/17:24
jmlmrevell: it's a count of the number of bugs fixed during the bug jam so far17:24
EdwinGrubbssinzui: how many sourcepackagenames are passed in as an argument to getCurrentSourceReleases()17:24
mrevelljml, I love it :)17:24
mrevelljml, I'll put a link under "Tracking progress"17:25
jmlmrevell: thanks.17:25
sinzuiEdwinGrubbs, 1, but  get get 38536 where we would expect 1 from natty, maybe 3 for maverick17:26
sinzuiEdwin 1, my move of the constraint does not fix the issue, 2 I feel pretty good that getting what looks like getting a match for every SPN in natty implies an open join17:29
EdwinGrubbssinzui: can I see the query plan?17:30
sinzuiI will get it for you17:31
bigjoolsjml: well, that flushed out a nice bug in the tests we wrote a few weeks ago :)17:33
jmlbigjools: which was?17:34
bigjoolsjml: it was constructing a url of the form /rpc/rpc17:34
jmlbigjools: heh17:34
jkakarjml, abentley: Woah: http://paste.ubuntu.com/530794/17:34
jmljkakar: yeah, it's filed as a critical bug.17:35
jmljkakar: https://bugs.launchpad.net/launchpad-code/+bug/67430517:36
_mup_Bug #674305: bzr push occasionally reports AssertionError on terminal <codehosting-ssh> <xmlrpc> <Launchpad Bazaar Integration:Triaged> <https://launchpad.net/bugs/674305>17:36
sinzuiEdwinGrubbs,  this is the plan to get the current release of bzr, 1 SPN provided and only 1 expected: http://pastebin.ubuntu.com/530797/17:37
jkakarjml: Cool.17:39
jkakarjml: Dunno if it helps debugging, but this was with a bound branch, it wasn't a push (explicitly).17:40
jmljkakar: I'm not at all involved in fixing it17:40
jml<- part of the problem17:40
jkakarHeh17:40
sinzuiEdwinGrubbs, sourcepackagename is still listed in the FROM. It was removed several revisions ago17:40
sinzuiremoving it from the query fixes everything17:41
* sinzui looks at code again17:41
EdwinGrubbssinzui: yeah, I was wondering where that table came from.17:41
EdwinGrubbssinzui: so, is the code not broken? Was it just an old oops?17:42
sinzuiEdwinGrubbs, It was removed a few days ago, lifeless removed it from clauseTables in r11914, but I suspect something else is putting the table in the from clause17:43
sinzuiEdwin to be clear, the SPN joins were removed a few days ago, Lifeless then landed another branch to remove it fix clauseTables. But this oops shows that the SPN table is still in the from clause17:44
sinzuiEdwinGrubbs, ^17:46
sinzuiEdwinGrubbs, sorry. I am looking at too may oopses. That oops was for an older revision17:49
* sinzui tries query from r1191517:50
sinzuiEdwinGrubbs, This is the correct plan for qastaging: http://pastebin.ubuntu.com/530801/17:51
mrevellsinzui, Thanks for your post wrt strategies for the bug jam.17:58
sinzuimrevell, your welcome17:59
mrevellHave a wonderful weekend people. See you Monday.17:59
EdwinGrubbssinzui: ok, the problem is that there are 1138 spr records for a single sourcepackagename.18:01
sinzuiedwin I agree. I am looking for a constraint or a revised subquery that removes the loop of 113818:03
lifelessmoin18:04
lifelesssinzui: rev 1191418:05
lifelessEdwinGrubbs: ^18:05
sinzuilifeless: yes, but 11915 still times out18:06
sinzuilifeless we want to reduce the loop of SPRs in the query18:06
bigjoolsgood bye, have a nice weekend18:07
EdwinGrubbssinzui: since, there is only one valid spph record for all the spr records, you would get good performance by just eliminating the subquery and moving the conditions into the outer query. You will just have to eliminate the duplicates. DISTINCT won't let you choose the spph record with the max id, so you would have to do that in python, if it is important to get that spph record and not a random one.18:07
* sinzui nods18:08
lifelesssinzui: works for me18:08
lifelesshttps://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=20418:09
lifeless    At least 782 queries/external actions issued in 17.77 seconds18:09
lifelesslittle slower than ideal18:09
lifelesstrying again to remove cold cache effects18:09
* sinzui just went from 9695.618 to 33.446 ms using a subquery table of just current ids18:09
lifeless    At least 782 queries/external actions issued in 12.63 seconds18:10
lifelesssinzui: ^ https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=20418:10
lifelesssinzui: I welcome further improvements here18:10
lifelessEdwinGrubbs: bringing too much back and filtering in python will almost always be slower18:10
sinzuilifeless yes, we want to see a source package page load a single spr.18:10
lifelessstorm is (relatively) slow at deserialisation, due to the cache coherency logic18:11
sinzuihttps://qastaging.launchpad.net/ubuntu/natty/+source/bzr18:11
lifeless    At least 49 queries/external actions issued in 1.91 seconds18:11
lifelessview-source:https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=118:11
lifelessinterestingly that page is not flat yet, the binaries must be the cause because there is a test that its flat with sources...and the binary test seemed surprisingly low to me18:12
EdwinGrubbslifeless: it won't bring too much back since there is only one spph record for 1300 spr records that meets the condition. So, the filtering in python might only have to deal with eliminating a handful of records.18:13
sinzuiEdwinGrubbs, I essentially  did the reverse, of your suggestion. I converted the subquery to get the max id to be a table of only viable candidates: http://pastebin.ubuntu.com/530811/18:13
* sinzui now tries to do it the EdwinGrubbs approved way18:14
lifelessah right, deep history leading to a slow query18:15
lifelessEdwinGrubbs: when we query for 200 rows18:15
lifelessEdwinGrubbs: what would happen then18:15
lifelessEdwinGrubbs: e.g. for  http://pastebin.com/7jC2vD7G18:16
EdwinGrubbslifeless: yes, I would like to do it in the database, but I don't know if getting just the max(spph.id) for each sourcepackagename is important or not. To do that in the database would require using a temp table in order to get rid of the subquery.18:20
sinzuiEdwinGrubbs, lifeless. I think this is the solution we want to achieve in the code http://pastebin.ubuntu.com/530817/18:20
EdwinGrubbssinzui: that only works for a single sourcepackagename.18:21
EdwinGrubbssinzui: oh wait18:22
sinzuiEdwin why? I see the table controls the SPNs18:22
sinzuime tries a list18:22
sinzuiEdwin it does work with multiple SPNs18:23
EdwinGrubbssinzui: ok, that makes sense. I was thinking that you would run into problems with group by, but you are just grouping by the spr columns, so it all works out.18:24
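The pastebin queries are not reproduced in the log, so the following is only a sketch of the general pattern being discussed: build a small derived table holding the highest publication id per source package name, then join back to it, instead of evaluating a correlated MAX() subquery once per release row. The `spr` and `spph` tables are a heavily simplified, hypothetical stand-in for the real schema, and SQLite stands in for PostgreSQL.

```python
import sqlite3


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE spr (id INTEGER PRIMARY KEY,
                          sourcepackagename TEXT, version TEXT);
        CREATE TABLE spph (id INTEGER PRIMARY KEY,
                           sourcepackagerelease INTEGER, distroseries TEXT);
        INSERT INTO spr VALUES (1, 'bzr', '2.1'), (2, 'bzr', '2.2'),
                               (3, 'hello', '2.5'), (4, 'bzr', '2.3');
        INSERT INTO spph VALUES (10, 1, 'natty'), (11, 2, 'natty'),
                                (12, 3, 'natty'), (13, 4, 'maverick');
        """
    )
    return conn


def current_releases(conn, distroseries, names):
    # Only the "?" placeholders are formatted into the SQL text;
    # the values themselves are always passed as parameters.
    marks = ", ".join("?" for _ in names)
    query = f"""
        SELECT spr.sourcepackagename, spr.version
        FROM (
            SELECT MAX(p.id) AS spph_id
            FROM spph AS p
            JOIN spr AS r ON r.id = p.sourcepackagerelease
            WHERE p.distroseries = ? AND r.sourcepackagename IN ({marks})
            GROUP BY r.sourcepackagename
        ) AS latest
        JOIN spph ON spph.id = latest.spph_id
        JOIN spr ON spr.id = spph.sourcepackagerelease
        ORDER BY spr.sourcepackagename
    """
    return conn.execute(query, [distroseries, *names]).fetchall()


conn = make_demo_db()
one_name = current_releases(conn, "natty", ["bzr"])
two_names = current_releases(conn, "natty", ["bzr", "hello"])
```

Because the name filter sits inside the derived table, the outer query needs no separate `sourcepackagename IN (...)` condition, which matches the exchange at the end of this thread.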
sinzuiwell, it certainly did not work until I add that18:24
sinzuiEdwinGrubbs, I do not need the outer "SourcePackageRelease.sourcepackagename IN ()" do I?18:25
EdwinGrubbssinzui: no18:25
sinzuiThis is wicked fast18:26
sinzuiI am going to start a branch and watch the tests pass18:26
sinzuigary_poster, henninge: I have a very fast query that fixes distroseries.getCurrentSourceReleases()18:27
=== almaisan-away is now known as al-maisan
lifeless(37 rows)18:42
lifelessTime: 186.584 ms18:42
lifelesssinzui: thats for the big page18:42
lifelessusing your branch18:42
lifelesssinzui: love your work18:43
sinzuiwow. I feel good18:43
lifelessthis is going to knock +packages right back to zilch on the timeouts chart I think18:43
sinzuiThis will have to be stormified. I know how to write this in storm, but not sqlobject18:43
lifelesshmm?18:44
lifelessI mean thats great18:44
lifelessbut I can help you do it in situ if you want18:44
henningesinzui, lifeless: Is what you are doing related to the qastaging timeouts?18:44
sinzuiyes18:44
lifelesshenninge: do you mean on +packages?18:45
sinzuithis looks like it will also fix many other timeouts in production too18:45
henningeno, the general timeouts we get on all kinds of pages.18:45
lifelesshenninge: no18:45
henninge:(18:45
lifelesshenninge: we get timeouts because of a few reasons18:45
lifelessa) cold cache effects in the db - its much smaller in memory than production18:45
lifelessb) we have inefficient code and staging hardware shows this up18:46
lifelessthis is a case in point - sinzui is shaving many seconds off of a routine page18:46
lifelessc) contention/thrashing in the appserver due to all the scripts running on the appserver staging host asuka18:46
lifelessthere is an rt open to address (c)18:46
lifeless(a) - retry a few times, if it eventually works prod will probably chew it up happily18:47
lifeless(b) - we need to fix our code. Which will help with (a) too18:47
henningebut it seems to be related to certain revisions of the code18:48
henningeit started on qastaging and when staging got updated with the same revisions it showed the same timeouts whereas before it (staging) was working fine.18:48
lifelesshenninge: what pages specifically18:48
henningeall project homepages18:49
henningelaunchpad.net/anyproject18:49
henningeall source packages18:49
lifelessfrom 11888 to 11914 we had a very broken query for getCurrentSourceReleases18:49
henningelaunchpad.net/ubuntu/maverick/+source/anypackage18:49
lifelessall the pages you're listing are covered by it; it should be tolerable now - the same as before 1188818:49
henninge11914 did not fix it, though18:49
lifelesswhat EdwinGrubbs and sinzui are doing is  about to make it much better18:50
lifelesshenninge: this is one reason those pages are all slow on lpnet too18:50
sinzuilifeless, henninge method is used in soyuz, translations, registry, and bugs pages. Anything that wants to know the current release of a package is going to be between 50 and 100 times faster18:51
lifelesssinzui: yah18:52
lifelesssinzui: note that production db is much faster18:52
lifelesssinzui: so not all pages will zoom as much18:52
lifelesshttps://launchpad.net/ubuntu/natty/+source/bzr18:52
rockstarDoes this error message in rabbitmq mean anything to anyone? It's preventing me from install launchpad-developer-dependencies. http://pastebin.ubuntu.com/530828/18:52
lifelessbut there are many pages which do this query that will benefit a great deal18:53
lifelessrockstar: is it already running?18:53
rockstarlifeless, no, it won't start.18:53
henningelifeless: I don't find those pages particularly slow but maybe I am just so accustomed to LP slowness ...18:54
henninge;-)18:54
rockstarlifeless, when the package gets installed, it explodes and prevents anything else from being installed.18:54
lifelessrockstar: on maverick ?18:54
rockstarlifeless, yes.18:54
lifelesshmm18:54
lifelessI don't know sorry18:54
henningeOK, let's wait and see the outcome of that work.18:54
lifelessinet_tcp",{{badmatch,{error,duplicate_name18:54
lifelessmakes me think the socket is in use18:54
rockstarlifeless, hm...18:54
lifelesswhich would happen if you had a rabbit instance already running18:54
lifelesse.g. if the devscript is buggy on upgrades18:55
henningesinzui: you have my r-c approval for landing that if it gets too late.18:55
rockstarlifeless, oh!  Yeah, the other change on this laptop is the u1 setup, so I guess that makes sense. I completely spaced that.18:55
sinzuihenninge, thanks.18:56
henningePQM is scheduled to close in 3 hours18:56
rockstarlifeless, everything are happy now.18:56
henningeso you will need r-c ;)18:56
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.11 | PQM is closing at 22 UTC | firefighting: Lots of timeouts on qastaging!! | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
rockstarlifeless, thanks for intervening between my head and the wall.18:56
lifelessrockstar: that was it ?18:58
lifelessrockstar: if so, please file a bug ... buggy package ;)18:58
rockstarlifeless, well, I should also say that I run launchpad in a chroot.18:59
lifelessrockstar: thats fodder for the bug report19:00
flacostehenninge, sinzui: could these slow pages be related to the changes to add latest releases to the source package pages?19:03
flacostehenninge, sinzui: that was added in a recent revision19:03
sinzuiflacoste, I do not think so. The method was unchanged this year except for lifeless's changes this week19:03
sinzuiflacoste I think this is PG 8.419:03
flacostesinzui: henninge says that timeouts increased with a recent revision19:04
flacosteas staging is now seeing the same timeouts as qastaging19:04
flacostewhereas it wasn't until it was updated19:04
flacosteand qastaging wasn't either yesterday19:04
sinzuiflacoste yes, we had an open join, but there were timeouts none-the-less19:04
sinzuiflacoste, we had two landings to fix the issue, neither was substantial19:05
lifelessI fluffed one19:05
lifelessremoved an unneeded table *constraint*, left the table in by mistake.19:05
lifelessthat went boom badly ;)19:05
flacostepages affected are:19:06
flacosteall project homepages19:06
flacosteall sourcepackages19:06
flacosteaccording to henninge again19:06
lifelessflacoste: we did just discuss this19:06
lifeless20 minutes ago19:06
flacosteright, i read the backlog19:07
lifelesskk19:07
flacostebut it's not clear that we have identified the issue19:07
lifelessflacoste: we have an 8 second query19:07
lifelessthat will come down to 140ms19:07
lifelesson qastaging19:07
flacostesure19:07
lifelesswe know they all use it19:07
lifelessuntil its fixed we have no data about what lies behind it19:08
flacostewell there is an alternative, which is to do a binary search to find the revision introducing the slowness19:08
lifelessflacoste: 1188819:09
flacostelifeless: i was under the impression that we tried reverting 11888 and its two following fixes from qastaging, but that still resulted in all these pages timing out19:10
flacostebut now, i'm not sure, it's possible that only the follow-up fixes were reverted...19:10
=== shadeslayer is now known as evilshadeslayer
lifelessflacoste: 11888 is a confounding factor19:11
flacostehenninge: could you confirm/inform the above?^^^19:11
lifelessflacoste: with 11888 present, any other flaws would have been magnified19:11
flacosteright, but without it, we shouldn't see any more timeouts than before19:11
lifelessflacoste: even if 11888 isn't the cause of all the issues, we can't be sure without running with 11888 reverted and the others present19:11
lifelessflacoste: we're running 11887 live19:12
flacostei know19:12
lifelessflacoste: so yes, I agree.19:12
lifelesswe've always seen more timeouts on qastaging19:12
lifelessit has a 10 second timeout19:12
flacostebut i thought we had made that test on qastaging (no 11888, others present) and found that it was still timing out all over the place19:12
flacostebut maybe, that's not the test that took place19:12
flacostelet me check the branch...19:13
flacostelp:~henninge/launchpad/stable-revert19:13
flacosteah, no19:13
flacosteonly 11899 and 11914 were reverted19:14
flacosteso your hypothesis holds19:14
lifelessok19:15
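
The binary search over revisions that flacoste suggests above can be sketched as follows. This is only an illustration: the `is_bad` callback stands in for "deploy this revision and check whether the pages time out", and the lower bound 11880 in the usage note is a hypothetical known-good revision, not one named in the discussion.

```python
def first_bad_revision(good, bad, is_bad):
    """Find the earliest bad revision between a known-good and a known-bad one.

    Assumes the regression persists once introduced, i.e. every revision
    at or after the culprit is bad and every earlier one is good.
    """
    if good >= bad:
        raise ValueError("good must be an earlier revision than bad")
    # Invariant: `good` is known good, `bad` is known bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad
```

For example, `first_bad_revision(11880, 11914, lambda rev: rev >= 11888)` narrows the range down to 11888 in five checks. As lifeless points out, a single confounding revision can mask other regressions, so a search like this only finds the first one.
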
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS139 is qastaging.launchpad.net/bzr19:16
EdwinGrubbssinzui: can you look at these screenshots of the involvement portlet for the bugsupervisor versus the admins? https://devpad.canonical.com/~egrubbs/configuration/19:46
sinzuiEdwin It was easier to write an exception?19:48
sinzuiEdwinGrubbs, We wanted to hide the link once the tracker was configured19:49
EdwinGrubbssinzui: well, the progress bar doesn't make sense with just one link being shown, so an exception seems like the cleaner solution. It would also be odd to have a single link hidden under the "Configuration options" expander.19:52
sinzuiokay, I agree. Your approach is correct19:53
lifelessflacoste: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777QS13919:55
henningeflacoste: I did not revert 11888 in that branch because it has been on qastaging for a while without any trouble.19:56
lifelessqastaging.launchpad.net/bzr - its the query sinzui is redoing19:56
henningeflacoste: I am sorry for the misunderstanding19:56
flacostelifeless: ack19:56
sinzuiI am fixing the distroseries/source package problem illustrated by qastaging.launchpad.net/ubuntu/natty/+source/bzr19:57
* henninge has to go away for a bit again19:57
EdwinGrubbssinzui: can you review https://code.edge.launchpad.net/~edwin-grubbs/launchpad/bug-664788-configure-bugtracker-link-permission/+merge/4075420:07
sinzuiI will20:07
lifelessthumper: perhaps we can cowboy in a squelch for the xmlrpc Fault20:13
=== al-maisan is now known as almaisan-away
=== matsubara is now known as matsubara-afk
lifelesssinzui: I have a favour to ask20:30
lifelesssinzui: add [rollback=11888] to your landing for the new query20:31
sinzuiyes lifeless20:31
sinzuiI will20:31
lifelesssinzui: it will tell qatagger that https://bugs.launchpad.net/soyuz/+bug/662523 can be unblocked so the deploy report is accurate20:31
_mup_Bug #662523: Archive:EntryResource:getBuildSummariesForSourceIds times out <bad-commit-11888> <timeout> <Soyuz:Fix Committed by lifeless> <https://launchpad.net/bugs/662523>20:31
lifelesssinzui: thank you!20:31
wgrantlifeless: So that SPN fix worked?21:57
lifelesswgrant: yes, and sinzui has an even more effective fix to make other uses of the query much more efficient21:59
lifelesswgrant: https://qastaging.launchpad.net/~yavdr/+archive/stable-vdr/+packages?start=0&batch=20421:59
wgrantlifeless: How fast is sinzui's?21:59
lifeless8700->100ms21:59
lifelesswgrant: on production this is already tolerably fast, db server size yada yada yada22:00
wgrantNice.22:00
wgrantYep.22:00
lifelesswgrant: but I expect a positive improvement all over.22:00
sinzuiwgrant: my mp has time summaries and SQL explains https://code.launchpad.net/~sinzui/launchpad/ds-getcurrentreleases/+merge/4075622:01
lifelesssinzui: I'm really glad you guys dug into this22:01
sinzuiMaking the SP and DS pages faster really has required half a dozen engineers looking at the same number of objects22:02
=== henninge changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.11 | PQM is in release-critical mode | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
lifelesssinzui: we've 22 or so, looking all across the board22:08
sinzuiI fear milestones will be the last to fix :( I have time to return to that one next week.22:09
lifeless+commentedbugs is the current most severe timeout22:10
lifelessand stub has a fix \o/22:10
lifelessI won't have time to do anything with it till week after next22:10
wgrantLet me guess... it's querying badly to try to find comments with index != 0?22:16
lifelessread the bug :)22:17
cody-somervillehttp://www.jacobian.org/writing/buildbot/ci-is-hard/ <-- lmao. "Django’s big. The test suite is around 40,000 lines of code in something like 3,000 individual tests. We work constantly to speed up the test suite, but best case it still takes about 5 minutes to run. This means that our CI absolutely needs to be distributed — a single test server won’t cut it."22:32
lifelessflacoste: ping22:51
flacostehi lifeless22:52
lifelessflacoste: I think we need to treat this bzr thing as an emergency22:52
lifelessflacoste: its very frequent22:52
poolielifeless, which?22:52
lifelesspoolie: the backtrace on push22:52
flacostelifeless: my understanding is that it's only annoying, not a real error22:52
lifelessflacoste: our users don't know this22:52
lifelessflacoste: perception22:52
pooliethis is the zope error being shown to the user?22:53
poolieis there a bug?22:53
pooliebug number22:53
lifelesspoolie: flacoste: https://bugs.launchpad.net/launchpad-code/+bug/67430522:53
_mup_Bug #674305: bzr push occasionally reports AssertionError on terminal <codehosting-ssh> <xmlrpc> <Launchpad Bazaar Integration:Triaged> <https://launchpad.net/bugs/674305>22:53
wgrantAlso, doesn't it stop a scan from being requested?22:54
flacostelifeless: any idea of how we could fix this apart from escalating this RT?22:55
flacostewgrant: that would be new information22:55
lifelessflacoste: here are the options I know about22:55
lifelesswgrant: that's my understanding too22:55
lifelessflacoste: a) escalate the RT22:55
wgrantIf it doesn't mean that no scan is requested, then we have bigger problems.22:56
lifelessb) wedge in some retry code here - high risk22:56
lifelesswgrant: there are multiple routes to trigger scans22:56
lifelesswgrant: its possible a redundant route is saving us22:56
lifelesswgrant: e.g. the disconnect hook22:56
wgrantPossibly.22:56
lifelessc) push the mailman improvement and hope its enough22:57
lifelessd) disable other services like codeimport that use the same service22:57
flacostec and d look like the main options at this time23:02
flacostecan we get confirmation that scan isn't triggered?23:02
flacostei don't get any errors from here fwiw23:06
lifelessit seems to be a couple of users an hour, which, because it's not (appearing-to-be) localised to product/bug like other timeouts, is particularly confusing and harmful to our users.23:07
lifelesstheres no obvious rationale they can connect it to23:08
lifelesswgrant: we're talking with ops now23:09
lifelessTime Out Counts by Page ID23:33
lifelessHard Soft Page ID23:33
lifeless570 7384 CodeImportSchedulerApplication:CodeImportSchedulerAPI23:33
lifeless21132Person:+commentedbugs23:33
lifeless164 561 CodehostingApplication:CodehostingAPI23:33
lifeless44156BugTask:+index23:33
lifeless810ProjectGroup:+milestones23:33
lifeless6305Distribution:+bugtarget-portlet-bugfilters-stats23:33
lifeless5259Distribution:+bugs23:33
lifeless514Person:+bugs23:33
lifeless57DistroSeries:+queue23:33
lifeless54Archive:EntryResource:getBuildSummariesForSourceIds23:33
lifelessbah sorry for the formatting23:33
lifelessflacoste: ^ turning off code imports is probably the fastest thing we can do23:33
lifelessmbarnett: how hard is it to disable all code imports ?23:33
lifelessmbarnett: we should be able to see an immediate drop in that netstat over a couple of minutes if that were to help23:34
lifelessflacoste: 570 7384 CodeImportSchedulerApplication:CodeImportSchedulerAPI23:36
lifelessflacoste: 164 561 CodehostingApplication:CodehostingAPI23:36
flacostelifeless: good suggestion23:36
lifelessmbarnett: please turn off the importds23:37
lifelessmbarnett: And keep watching that netstat23:37
lifelessflacoste: https://devpad.canonical.com/~lpqateam/lpnet-oops.html#time-outs is where I'm looking23:41
mbarnettno more imports should be fired off after any currently running complete.23:43
lifelessflacoste: look at this:23:43
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP101123:43
lifelessSQL time: 17 ms23:43
lifelessNon-sql time: 15074 ms23:43
wgrantOw.23:43
lifelessflacoste: this is why I want a) single threaded appservers and b) in the main cluster23:44
flacosteright23:46
flacostethe GIL hypothesis23:46
lifelessyes23:46
lifelessfor instance23:46
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1777XMLP467523:46
lifelessbut i have another hypothesis23:46
lifelessif we start the timer too early23:47
lifelessa deep queue could look like this as well23:47
lifelesssee the 4675 in particular its a soft timeout23:47