/srv/irclogs.ubuntu.com/2010/07/20/#launchpad-dev.txt

=== Ursinha is now known as Ursinha-afk
wgrantspm: Hi. I'm trying to QA my fix for bug #592573. The differences between the right panels on https://launchpad.net/builders and https://edge.launchpad.net/builders show that there are ~146 builds in some sort of limbo. Do you have a moment to do a bit of DB poking to work out what they are and why?03:15
_mup_Bug #592573: BuilderSet.getBuildQueueSizes doesn't consider non-binary builds <Soyuz:Fix Committed by wgrant> <https://launchpad.net/bugs/592573>03:15
spmwgrant: hrm. curious. sure. I guess the ones you have via the bug is a good starter?03:19
wgrantspm: SELECT id, builder, lastscore, job, job_type, processor FROM buildqueue WHERE virtualized=true;03:27
wgrantAt the moment that query *should* be empty.03:27
wgrantBut it will probably return around 61 rows03:28
spm(23461 rows) ho ho ho ho03:28
wgrantOh, oops. Forgot a join.03:28
spmheh, np03:28
wgrantSELECT id, builder, lastscore, job, job_type, processor FROM buildqueue JOIN job ON job.id = buildqueue.job WHERE virtualized=true AND job.status = 0;03:28
spmyou did get the '61', just missed the 23,400 as well. so.. not too shabby?03:29
wgrantSELECT buildqueue.id, builder, lastscore, job, job_type, processor FROM buildqueue JOIN job ON job.id = buildqueue.job WHERE virtualized=true AND job.status = 0;03:29
spmyarp. 61 rows03:29
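The only change between the last two queries is that `id` became `buildqueue.id`. Once `job` is joined in, both tables have an `id` column, so the bare name is presumably what tripped up the second attempt. A minimal sketch of that, assuming sqlite3 as a stand-in for PostgreSQL and a made-up cut-down schema (the real tables have many more columns):

```python
import sqlite3

# Toy schema: only the columns needed to show the ambiguity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE buildqueue (id INTEGER PRIMARY KEY, job INTEGER, virtualized BOOLEAN);
    INSERT INTO job VALUES (1, 0), (2, 1);
    INSERT INTO buildqueue VALUES (10, 1, 1), (11, 2, 1);
""")


def run(select_list):
    """Run the joined query with the given select list."""
    sql = ("SELECT %s FROM buildqueue JOIN job ON job.id = buildqueue.job "
           "WHERE virtualized = 1 AND job.status = 0" % select_list)
    return conn.execute(sql).fetchall()


try:
    run("id")  # both tables have an "id" column
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True

rows = run("buildqueue.id")  # qualified, so the query is accepted
```

Here only the queue row whose job has status 0 comes back from the qualified query.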
spmone looks suspiciously old. very low job# compared to the rest03:30
wgrantThere should be nothing sensitive there (I excluded logtail). Can you pastebin, please?03:30
spmhttp://paste.ubuntu.com/466237/03:31
wgrantOh wow :(03:31
spmhrm? in what sense?03:31
wgrantCan you 'SELECT MAX(id) FROM buildqueue;' just to see how old those are?03:31
spm3708959 eek03:32
wgrantAh, so nothing new. Good.03:32
wgrantJust from the early days of the move to the job system, I suspect.03:32
wgrant(those are all queued builds that are being ignored, so are somehow corrupt)03:33
spmahh I see. is this something that should be cleaned up? or can be happily ignored?03:35
wgrantWe need to clean it up. I guess I'll talk to Julian about it.03:36
spmoki, ta03:37
wgrantSELECT * FROM buildpackagejob WHERE job=2691238;03:37
wgrantI suspect it's the BuildPackageJobs that are missing.03:38
spm   id   |   job   |  build03:38
spm--------+---------+---------03:38
spm 495749 | 2691238 | 168415503:38
wgrantDamn.03:38
mtaylorspm: ola amigo. como estas03:45
mtaylorspm: or, should I say - ¿como estas?03:45
spmmtaylor: alas, my spanish is limited to that picked up via overhearing Dora the Explorer. So beyond Hola and Gracios(sp?) you've lost me :-)03:46
mtaylorspm: ¿como estas? means "what's up?" - although I honestly don't speak much myself03:46
spmheh03:47
mtaylorbut I'm in cozumel, mx, so I'm trying to make an effort past "uno mas magarita, por favor"03:47
wgrantspm: One last one:03:48
wgrantSELECT buildqueue.id, builder, lastscore, buildqueue.job, job_type, processor, build FROM buildqueue JOIN job ON job.id = buildqueue.job JOIN buildpackagejob ON buildpackagejob.job = job.id WHERE virtualized=true AND job.status = 0;03:48
wgrantThen I can examine the builds myself.03:48
spmmtaylor: ha!03:48
* mtaylor cringes that you're implementing queues in the database... then decides to shut his mouth before he's asked to fix it03:49
spmwgrant: http://paste.ubuntu.com/466245/03:49
spmwhat do you mean *implementing*?!? implemented. :-)03:49
mtaylor*shudder* :)03:50
wgrantmtaylor: Before my time :(03:50
wgrantThis is five years old :(03:50
spmI believe plans are to move to something like rabbit, or whatever. but nfi.03:50
mtaylorwgrant: it's ok - in my days as a mysql consultant, I saw _many_ _MANY_ things implemented inside of an RDBMS that didn't belong there03:50
mtaylorbut my favorite things people mis-use dbs for are queues and email03:51
mtaylorfree tip #1 from formerly over-paid db consultant - if your app needs some sort of message system that's similar to email ... USE EMAIL SERVERS .. don't half-way implement private broken email in database tables03:52
mtaylorspm: :)03:52
wgrantspm: Ah, it's not a bug after all. Most of those are builds that were cancelled by manual SQL to mark them superseded.03:52
spmah03:52
wgrantNot all, but most. I'll work out how to clean them up.03:53
wgrantThanks.03:53
spmmtaylor: that sounds suspiciously like heresy. email for email!?!? I mean, srsly!?!?03:53
spmwgrant: np03:54
mtaylorspm: I know - right? you should have seen the look on the client's face the first time I suggested that as the fix for their performance isues03:55
mtaylorissues03:55
mtaylor"um, hai! you've implemented email in php ... try learning the internet"03:55
spmheh03:55
spmNIH. Alive and well.03:56
mtayloryup. also "I don't want to learn anything"03:56
lifelessmorning04:58
* wgrant is still looking for someone to land three branches.05:19
mwhudsonhere's two fun lines to have next to each other:05:29
mwhudsonfrom canonical.launchpad.layers import WebServiceLayer05:29
mwhudsonfrom canonical.testing.layers import FunctionalLayer05:29
lifelessmwhudson: can you help wgrant out?05:43
lifelesspretty please?05:43
mwhudsonah right05:46
mwhudsonwgrant: url me up05:46
wgrantmwhudson: lp:~wgrant/launchpad/bug-598345-restrict-dep-contexts, lp:~wgrant/launchpad/refactor-_dominateBinary and lp:~wgrant/launchpad/really-publish-ppa-ddebs05:47
* mwhudson grrs at not being able to give branch urls to ec2 land05:48
wgrantCan't you?05:49
wgrantI thought you could now.05:49
mwhudsonwell it didn't work for these05:49
wgrantDoes anyone happen to know if BFB works for Hardy yet?05:55
mwhudsonwgrant: ok, three instances started05:59
mwhudsonwgrant: did it not work for hardy at some point?06:00
wgrantmwhudson: Oh, right, I remember now.06:01
wgrantIts bzr was old, so it didn't work with 2a.06:01
wgrantThanks for sending those off.06:03
wgrantmwhudson: Ah, no, it is really broken like this: http://launchpadlibrarian.net/52193804/buildlog.txt.gz06:23
wgrant  bzr-builder: Depends: python-debian but it is not going to be installed06:23
mwhudsonwgrant: weee06:24
* wgrant builds it the old way :(06:26
lifelesswgrant: hi08:03
wgrantlifeless: Hi.08:39
rockstarmtaylor, re: Tarmac needing commit message set... The reasoning is because Tarmac doesn't know what to set the commit message to when it commits if you don't specify it.08:40
rockstarmtaylor, I realize that this is rather awkward. Merge queues will fix that.08:40
* rockstar keeps rolling the squeaky wheel08:40
pooliewhat advice do we normally give to people who don't receive the account confirmation mail?08:40
lifelesscheck spam08:41
wgrantSSO does hate some domains, though.08:41
poolieor they hate us?08:43
poolieis there any escape?08:43
wgrantWell, OK, one of my email aliases that is handled by a Canonical machine does not get LP or SSO email.08:44
lifelessit becomes a losa ping08:44
wgrantEverything else works.08:44
wgrantSo there's something not quite right about some email setup somewhere.08:44
lifelesswgrant: orly? have you filed an rt?08:44
wgrantlifeless: The address isn't important to me, so I never really bothered.08:45
wgrantlifeless: What was that ping before about?08:48
lifelessbuildds08:48
lifelessseen the thread on deployment?08:48
wgrantYes.08:48
wgrantThey don't care if the connection dies.08:48
lifelesscan you confirm or deny slave behaviour ?08:48
wgrantYou can kill buildd-manager at any time, and it will be fine.08:49
lifeless\o/08:49
lifelesselmo: ^08:49
lifelesswgrant: what other things can go wrong08:49
wgrantbuildd-manager and the protocol are pretty stateless.08:49
wgrant(apart from DB build state)08:49
lifelesssure08:49
lifelesswgrant: is there anything that can be done to make the publisher runs able to be interrupted without causing havoc ?08:50
wgrantlifeless: Um.08:50
wgrantThey shouldn't be tooo bad at the moment.08:50
lifelessany manual recovery == too bad08:50
wgrantFor PPAs it would be really bad.08:50
wgrantFor primary it should just about work.08:50
wgrant(primary has atomic dists/ update; PPAs do not)08:51
=== almaisan-away is now known as al-maisan
wgrantPhase A (file publishing) is fine, since it doesn't matter if we publish something again on the next run.08:54
wgrantB (domination) is all in one transaction, and doesn't touch the filesystem at all.08:55
wgrantBut C and D touch indices, so will leave the archive inconsistent.08:55
wgrantHm, actually, it's slightly worse.08:56
wgrantIf we get through A fine, commit, then get terminated, there'll be no dirty pockets to force index regeneration when everything is rerun.08:56
lifeless2pc needed ?08:57
wgrantThat would mean a half-hour transaction.08:57
lifelessugh08:57
wgrant(alternatively we could make the publisher less goddamn slow)08:57
lifelessor start domination later ?08:57
wgrantSo, there are two problems:08:58
lifelessthere are two sorts of people in the world ....08:58
wgrant1) We rely on state internal to the publisher process to work out whether we need to regenerate indices.08:58
wgrant2) Termination during index generation will leave the archive inconsistent for at least a few minutes.08:58
lifelessseems like they are both addressed by making index regeneration into a separate task08:59
lifelesspipeline-like08:59
wgrantApplying the atomic dists/ update system to all archives would solve #2 simply.08:59
wgrantFor #1... we may need to store dirty pockets in the DB.09:00
lifelessalso less code variation - ++09:00
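One way to picture the "atomic dists/ update" idea: write the new indices into a fresh directory next to the live one, then switch over in a single rename, so a kill mid-generation leaves the old tree untouched. This is only a sketch under assumptions, not how the primary archive actually does it: it assumes `dists` is a symlink, and `publish_indices` and the `dists.tmp` name are made up for illustration.

```python
import os
import tempfile


def publish_indices(archive_root, files):
    """Write `files` (relative path -> text) as a new tree, then swap it in.

    Assumes archive_root/dists is a symlink (or does not exist yet).
    """
    new_tree = tempfile.mkdtemp(prefix="dists-", dir=archive_root)
    for rel_path, content in files.items():
        path = os.path.join(new_tree, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(content)
    # Build the new symlink under a temporary name, then rename it over
    # "dists"; the rename is the only step readers can observe.
    tmp_link = os.path.join(archive_root, "dists.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(new_tree), tmp_link)
    os.replace(tmp_link, os.path.join(archive_root, "dists"))
    return new_tree
```

Cleaning up superseded trees is left out; a real version would need to remove them once nothing is reading from them.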
wgrantPerhaps.09:03
wgrantBut the way it's done now is not exactly... acceptable.09:03
mrevellGuten Morgen.09:03
lifelesswgrant: are bugs filed to fix it?09:06
wgrantlifeless: It's cron.publish-ftpmaster. Everyone is terrified.09:07
lifelessis it risky ?09:07
wgrantIt's evil and does some things that nobody understands any more.09:09
wgrantWell, I guess that's just dsync.09:09
wgrantMorning bigjools.09:13
wgrantYou may be, er, interested to know that we have somewhere around 146 inconsistent builds on production.09:13
wgrantThey have BuildQueues and Jobs, but are not actually pending.09:13
bigjoolssigh09:15
bigjoolsgivebacks?09:15
wgrantSome are SUPERSEDED, others are FULLYBUILT.09:15
bigjoolsdid I just catch lifeless looking at cron.publish-ftpmaster?09:15
wgrantThe former can be explained by LOSAs manually cancelling builds, I suppose.09:15
bigjoolsI expect so - doing it wrong.  Are they package builds?09:16
bigjoolsbinary that is09:16
wgrantThey're all BPBs, yes.09:16
bigjoolsare they rebuilds?09:16
wgranthttp://paste.ubuntu.com/466245/ are the virtualized ones. Although there are a couple of legitimate builds that snuck in there.09:16
bigjoolshow was this discovered?09:17
wgrantThe ones with score == 0 are the buggy ones.09:17
wgrantI was QAing my getBuildQueueSizes change.09:17
wgrantCompare the build queue sizes on edge and production.09:17
bigjoolssigh09:17
bigjoolslastscore=0 indicates a retry09:18
wgrantAh, true. I guess it would be NULL if they'd just finished naturally.09:18
wgrantThey're fortunately all fairly old.09:19
bigjoolsI need to clean up the rebuild anyway09:19
wgrantThere are 80 or so non-virt builds which I didn't query for, since the non-virt build queues are large.09:20
bigjoolshow old?09:20
wgrantWell, we're up to BQ 3700000 or so.09:21
wgrantMost of them are around 333000009:21
bigjoolsdid you get any dates?09:21
wgrantThey should be in Job, but I didn't query for them, no.09:22
wgrantI decided I'd bothered people enough.09:22
bigjoolsit's so much harder to query across jobs now :(09:23
wgranthttp://paste.ubuntu.com/466245/  has the query09:24
bigjoolsI have some pre-potted ones09:24
wgrantJust throw a couple of extra columns like BPB.status and Job.date_created in, I suppose.09:24
bigjoolsargh, I can't even do this on staging any more09:25
wgrantNo perms?09:25
bigjoolsno, we wipe the queues when restoring09:25
bigjoolsto stop staging collecting builds from production09:26
wgrantOh, true.09:26
wgrantSo, the few builds on there from before 3300000 are all SUPERSEDED.09:26
wgrantThen there are a couple around 330000009:26
wgrantAnd the rest around 333000009:26
wgrantI haven't checked the statuses for the last two categories, though09:27
bigjoolswgrant: there's one (!) like that on dogfood09:32
wgrantbigjools: What's its status?09:33
wgrantBuild status, not Job status.09:33
* bigjools rides SQL wild horses09:37
lifelessbigjools: asking about things that will make deployment hard09:38
lifelessbigjools: easy deployment is a really important thing for iterating on performance at faster than monthly cycles.09:38
bigjoolslifeless: do you know how it will make it hard?09:38
lifelessbigjools: no idea, wgrant mentioned it was all09:39
bigjoolsother than being a PoS bash script, I don't see the problem09:39
lifelessbigjools: for the publisher specifically I'm told that interrupting it causes damaged PPA archives until it's run again, and in some cases until the PPA has a new build and the publisher runs again.09:39
bigjoolsyes it's possible, but easy to fix09:40
wgrantbigjools: If we die during !primary index generation, the indices will be inconsistent until the next run.09:40
wgrantAnd if we die before index generation, there'll be no dirty pockets, so no index regeneration will occur.09:40
bigjoolswe need to write to a tmp dir like ubuntu does09:40
wgrantThat's what I suggested.09:40
wgrantBut it's not trivial to port, given how it's done now.09:41
bigjoolsand remove the stupid partial commits09:41
wgrantWe can't do that until we publish everything really quickly.09:41
lifelessbigjools: the impact of that on the deployment story is that we have to be careful about when we start deployments, shutting down the task well in advance, which adds latency and that adds perceived downtime.09:41
wgrantI think a half-hour transaction over the whole primary archive would make stub cry a bit.09:41
lifelessnone of this is intractable, but the more steps that are needed, the more coordination, the slower the process is.09:42
bigjoolsyeah, whoever wrote the publisher knew nothing about recoverability09:42
lifelessyup09:43
wgrantbigjools: Alternatively, we could write indices atomically like the primary archive, and store dirty pockets somewhere persistent.09:43
wgrantEverything else should work.09:43
lifelesswgrant: everything you do to make the system more resilient, j'adore09:43
bigjoolsme likey atomic09:43
wgrantOr we make the publisher complete in a few seconds, and do it all in one transaction.09:44
wgrantBut I can't get it much below two minutes.09:44
lifelesscan you make it pipeline / incremental ?09:44
wgrantThe thing that takes all the time (after optimisation) is serialising the indices.09:45
lifelessdoes the db know the indices ?09:45
bigjoolspipelining is possible but has ramifications09:45
lifelesscould you, just commit, then do the indices as a read-back from the db ?09:46
lifeless(in principle)09:46
bigjoolslifeless: you mean lazy generation?09:48
bigjoolsnot sure I follow09:48
jmlI'm fixing the conflict, btw09:49
jmlis there any meaningful difference between IStore(self) and Store.of(self)?09:50
jmlIStore uses c.l.webapp.adapter.get_store09:50
lifelessI think the second reads more easily09:51
wgrantDoes the former respect the master/slave policy?09:51
lifelessbigjools: I mean doing the transaction commit as soon as possible09:51
lifelessbigjools: and generating the indices from the result of the commit, not within the commit09:51
wgrantlifeless: The problem is that if we commit before doing indices, we don't know that we need to regenerate them later.09:52
bigjoolslifeless: what wgrant said09:52
lifelesswgrant: that seems fixable09:52
bigjoolshehe - take a look at the publisher :)09:52
wgrantlifeless: Right, by storing dirty pockets in the DB, which fixes it all.09:52
wgrantBut is very ugly.09:52
lifelesscompared to 30 minute db transactions with uninterruptable cron scripts that prevent rollouts for 30% of the day.09:53
lifelessnot ugly at all.09:53
wgrantShhhhh.09:53
lifelessI'm very keen to see things here improved.09:54
lifelessThe general principle of 'do small amounts of work, often' and 'delay till outside of transactions things that don't need to be in the transaction' are very dear to me.09:54
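The "dirty pockets in the DB" idea can be sketched roughly as follows: the publishing step records which pockets it touched in the same commit as the publish, and index generation becomes a separate step that works from that table and clears each row only after the index is written. Everything here is hypothetical (the `dirty_pocket` table, the function names, sqlite3 standing in for PostgreSQL); it is an illustration of the approach, not the publisher's code.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dirty_pocket ("
                 "archive TEXT, pocket TEXT, UNIQUE (archive, pocket))")
    return conn


def publish_files(conn, archive, pockets):
    """Phase A stand-in: mark pockets dirty in the same commit as the publish."""
    with conn:  # commits on success, rolls back on an exception
        for pocket in pockets:
            conn.execute("INSERT OR IGNORE INTO dirty_pocket VALUES (?, ?)",
                         (archive, pocket))


def regenerate_indices(conn, write_index):
    """Separate step: rebuild indices for whatever is still marked dirty."""
    done = []
    rows = conn.execute("SELECT archive, pocket FROM dirty_pocket").fetchall()
    for archive, pocket in rows:
        write_index(archive, pocket)  # if this dies, the row stays dirty
        with conn:
            conn.execute(
                "DELETE FROM dirty_pocket WHERE archive = ? AND pocket = ?",
                (archive, pocket))
        done.append((archive, pocket))
    return done
```

If the process is terminated between the two steps, the rows survive, so the next run still knows which indices to regenerate.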
jmlwgrant, the former goes straight to the master, I think.09:56
wgrantjml: Ah.09:56
bigjoolswgrant: does this look sane to display the affected builds? http://pastebin.ubuntu.com/466361/10:04
mwhudsonwgrant: your branches were 1 of 3, did you get the emails?10:06
wgrantmwhudson: Yeah, saw that. Thanks.10:12
wgrantbigjools: I would have said buildfarmjob.status NOT IN (0, 6), but looks OK.10:12
bigjoolswgrant: using NOT IN makes queries slow10:13
wgrantbigjools: Ah, I guess so.10:13
wgrantbigjools: Also, grab buildqueue.virtualized.10:13
bigjoolswhy?10:13
wgrantWhy not?10:14
wgrantMight as well get all the categorisation information.10:14
wgrantI only omitted it from the initial query because I was restricting to virt jobs.10:16
* wgrant grumbles about the incomplete Registry<->Soyuz split.10:23
wgrantmwhudson: Tests fixed. Can you please rerun, if you're still around?10:41
wgrantI guess not.10:42
wgrantCan someone else please re-EC2 lp:~wgrant/launchpad/refactor-_dominateBinary and lp:~wgrant/launchpad/bug-598345-restrict-dep-contexts?10:43
bigjoolswgrant: problematic queue rows blitzed, how's it looking?10:49
wgrantbigjools: So none of them were actually pending?10:49
bigjoolssome where but I left those alone ;)10:50
bigjoolssigh10:50
bigjoolssome *were*10:50
wgrantThat looks OK.10:50
wgrantThe numbers match now.10:50
bigjools\\o/10:50
wgrantI will be really glad when we get the model rework done, and such inconsistency becomes impossible.10:50
bigjoolsyarp10:50
wgrantBecause we will be able to avoid storing the status in four or five places...10:51
bigjoolsalthough notice that all of those were from march/april10:51
wgrantYeah.10:51
bigjoolswgrant: https://edge.launchpad.net/~oem-archive/+archive/budapest/+build/188025710:52
bigjoolscan you see that?10:52
wgrantbigjools: No. Is that the one that failed to upload this morning?10:53
bigjoolsyes10:54
wgrantI considered going hunting for the upload log, but decided the search space was slightly too big.10:54
bigjoolsonly it says it's built properly so it looks like it dispatched twice :/10:54
wgrantWhat was the upload error?10:54
bigjoolsPM10:54
wgrantOr is it FULLYBUILT now?10:54
wgrant?10:54
wgrantOh.10:55
wgrantRight.10:55
bigjoolsit's built10:55
wgrantSo it was built four times.10:55
wgrantSucceeded the first and last.10:55
wgrantBut somehow was retried after the first.10:55
wgrantYay.10:55
wgrantI thought we'd weeded out all of that :(10:55
bigjoolswgrant: I suspect double clicking on UI buttons11:05
wgrantbigjools: But... transactions.11:05
bigjoolswgrant: that already causes havoc on copy packages11:05
wgrantFor copies, sure.11:05
bigjoolswgrant: they don't help if the db constraints don't catch the problem11:05
wgrantbigjools: But a retry resets the BFJ status.11:05
wgrantThey both have to update the same row.11:06
lifelessI have a suspicion something is double-forwarding every now and then11:06
lifelessor something11:06
bigjoolsto the same thing11:06
lifelesssee the bug I filed about getting two bugs11:06
wgrantbigjools: I really hope that Postgres doesn't accept that.11:06
bigjoolswgrant: unless there's a constraint, it will11:06
bigjoolslifeless: really? ugh11:06
lifelesselmo: https://bugs.edge.launchpad.net/soyuz/+bug/60739711:08
_mup_Bug #607397: buildds need to survive the buildd master being upgraded <Soyuz:Incomplete> <https://launchpad.net/bugs/607397>11:08
lifelesselmo: can you please describe there the build farm issue you related to me - or perhaps its no longer an issue ?11:08
wgrantbigjools: With SERIALIZABLE as the isolation level, a concurrent update like that is prevented.11:09
wgrantI just checked.11:09
wgrantI wonder what Storm uses, though.11:09
lifelesselmo: nvm11:10
wgrantIt defaults to serializable, as I'd hoped.11:10
wgrantAhh.11:12
wgrantBut LP overrides it to READ COMMITTED, and that allows it.11:12
wgrantDamn.11:12
lifelessbwah11:13
lifelessall of lp  is read committed ?11:13
wgrantI think so. Need to check Postgres logs harder.11:13
bigjoolsthat's.... not good11:13
* wgrant looks harder.11:13
wgrantAt least the appserver transactions immediately set it to READ COMMITTED, or so the postgresql logs show.11:14
lifelessbah11:15
lifelesswgrant: you broke our builds11:15
lifeless:P11:15
wgrantlifeless: It was too quick to be my fault :(11:15
lifelessBuild Reason:11:15
* wgrant blames the build system.11:15
lifelessBuild Source Stamp: [branch bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel] HEAD11:15
lifelessBlamelist: William Grant <me@williamgrant.id.au>11:15
lifelessBUILD FAILED: failed shell_6 compile11:15
wgrantOh yes, I got the email.11:15
wgrantBut the test suite doesn't take an hour.11:15
wgrantSo either I broke things really badly, or the build system is broken as usual.11:16
wgrantOr someone turned on parallelisation while I wasn't looking.11:16
lifelessno11:16
lifelessnot done yet11:17
wgrantbigjools: Ah, wait, there's a UNIQUE on buildpackagejob.build. So you can't queue the build twice regardless of how screwed LP's default transaction level may or may not be.11:18
bigjoolsthat's the kind of constraint I like11:19
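A small illustration of why a UNIQUE on `buildpackagejob.build` rules out queueing the same build twice, whatever the isolation level. It uses sqlite3 rather than PostgreSQL and a cut-down, assumed version of the table; `queue_build` and the second job id are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buildpackagejob ("
             "id INTEGER PRIMARY KEY, "
             "job INTEGER NOT NULL, "
             "build INTEGER NOT NULL UNIQUE)")


def queue_build(job_id, build_id):
    """Try to attach a job to a build; return False if one already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO buildpackagejob (job, build) VALUES (?, ?)",
                (job_id, build_id))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on build rejected the second row.
        return False


first = queue_build(2691238, 1684155)
second = queue_build(2691239, 1684155)  # different job, same build
```

The second call is refused by the database itself, so the double dispatch has to come from somewhere else.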
wgrantSo we have more hunting to do.11:19
bigjoolsgrar11:19
wgrantAnything in the logs yet, or must we go librarian diving?11:19
bigjoolshang on11:20
wgrantHmm. The librarian GC is rather aggressive :/11:22
lifelessmpt: hai11:22
lifelessmpt: lunch is @ 111:22
lifelessmpt: are you planning to starve me?11:22
wgrantIt will immediately kill anything unreferenced and with no expiry date set.11:22
mptlifeless, clearly, by "1" I meant "2"11:23
lifelessawesome11:23
lifelessUrsinha-afk: how does the oops <-> bug stuff work ?11:23
lifelesswgrant: are you working on 78 SELECT COUNT(*) FROM Archive, BinaryPackageBuild, BuildFarmJob, PackageBuild WHERE distro_arch_se ... tus=$INT AND Archive.purpose IN ($INT,$INT) AND Archive.id = PackageBuild.archive AND ($INT=$INT):11:24
lifeless ?11:24
wgrantjml: Um, canonical.uuid has been gone for more than a hundred devel revisions...11:24
wgrantlifeless: I don't really know how to fix it.11:24
wgrantThere's nothing obviously wrong with it that I can see.11:25
wgranthttp://paste.ubuntu.com/465800/ is the EXPLAIN ANALYZE of the other slow query, which is just about the same.11:25
lifelessright11:26
lifelessso dropping the count * separate query will save 5 seconds11:26
lifelesssorry 7.711:26
wgrantBut that's lazr.restful.11:26
wgrantNot sure we can do much about that.11:26
wgrantAnd the query shouldn't take long at all anyway.11:26
lifelesssure we can11:26
lifelessis there a bug for it ?11:26
wgrantThe slow queries?11:26
jmlwgrant, I'm just the messenger.11:26
lifelessthe problem11:26
wgrantI filed one a month or two ago.11:26
lifelesswhats the number11:26
* wgrant is hunting.11:27
wgrantIf bug search was a little better...11:27
lifelesshush11:27
wgrantBug #59070811:27
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>11:27
wgrantjml: Well, since I have no recent shipit, I can do nothing.11:28
lifelessbigjools: hi - https://bugs.edge.launchpad.net/soyuz/+bug/59070811:29
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>11:29
* wgrant just commented with the paste.11:30
wgrantlifeless: I wonder if we should test kernel delayed copies and acceptance from +queue before taking the timeout down permanently. Those are done infrequently, take ages, and it's pretty bad if they stop working.11:30
lifelesswgrant: can you add a test plan for testing them on staging ?11:34
lifelesswgrant: staging is at 12 seconds.11:34
wgrantlifeless: Um, I'm not sure if testing on staging is valid.11:34
lifelesswgrant: why wouldn't it be ?11:35
wgrantlifeless: It sucks performance-wise.11:35
lifelessright11:36
lifelessso if it works on staging, we're set for prod.11:36
wgrantTrue.11:36
wgrantbigjools: Any luck? It'd be nice to get onto it before librarian-gc deletes the evidence.11:45
bigjoolswgrant: no, best start diving11:45
wgrantbigjools: I guess you could just look for any recent restricted 'uploader.log's...11:50
bigjoolsurh11:51
wgrantSince buildd-manager's log doesn't seem to be much help in this sort of situation.11:51
lifelesswgrant: so, I've commented and escalated this12:00
wgrantlifeless: Thanks.12:00
lifelesswgrant: I think lazr restful is hurting us here and we may want to change it.12:01
wgrantlifeless: Possibly, that probably breaks all clients in the wild.12:01
lifelessseparately fixing up the query to not do table scans - +112:01
lifelessflacoste: http://people.canonical.com/~flacoste/tags-burndown-report.html is not updating ?12:01
lifelesswgrant: on this url, they are already broken.12:01
deryckMorning, all.12:02
lifelessit was timing out regularly on prod before12:02
lifelessnow its just -clear-  :)12:02
lifelesshey deryck12:02
wgrantCan someone please re-EC2 lp:~wgrant/launchpad/refactor-_dominateBinary and lp:~wgrant/launchpad/bug-598345-restrict-dep-contexts?12:02
bigjoolswgrant: I can do it locally if nobody volunteers ec212:08
bigjoolsbtw I haz logs12:08
wgrantOoh, logs.12:08
wgrantAre the logs useful?12:08
bigjoolssomewhat, I'll PM you12:08
=== al-maisan is now known as almaisan-away
=== mrevell is now known as mrevell-lunch
=== almaisan-away is now known as al-maisan
poolielifeless, hi?13:15
=== mrevell-lunch is now known as mrevell
danilosadiroiban, hi, it seems there's a conflict in +templates fix of yours now (I'm trying to land it); can you please take a look and fix it :)13:19
* wgrant still needs someone to land those two branches.13:21
adiroibandanilos: Hi. Looking...13:21
daniloswgrant, got MPs for me that I can just pass into "ec2 land"? (fwiw, I had some problems with ec2 land lately, so I am not promising it will work)13:23
wgrantdanilos: Thanks. https://code.edge.launchpad.net/~wgrant/launchpad/refactor-_dominateBinary/+merge/29667 and https://code.edge.launchpad.net/~wgrant/launchpad/bug-598345-restrict-dep-contexts/+merge/30203 are the MPs.13:24
=== Ursinha-afk is now known as Ursinha
jmlmars, hi13:34
marsHi jml, what's up?13:34
jmlmars, I don't understand your recent email.13:34
marsjml, the "Hurray for failing fast" one?13:35
jmlmars, yes. the build *did* go on to fail with indecipherable errors.13:35
marsbuild 1066, right?13:36
marsAccording to the waterfall, I see "pull new revisions [failed]", and "compile [failed]", and that's it13:36
marsAccording to this it never ran the test suite13:36
jmlmars, it still tried to.13:37
jmlmars, and the error the compile fails with is: zope.configuration.xmlconfig.ZopeXMLConfigurationError: File "/srv/buildbot/slaves/launchpad/devel/build/script.zcml", line 7.4-7.3513:37
jml    ZopeXMLConfigurationError: File "/srv/buildbot/slaves/launchpad/devel/build/lib/canonical/configure.zcml", line 157.4-158.4213:37
jml    ZopeXMLConfigurationError: File "/srv/buildbot/slaves/launchpad/devel/build/lib/canonical/shipit/configure.zcml", line 55.413:37
jml    ImportError: No module named uuid13:37
jmlmars, I mean, it's definitely better than running the whole test suite.13:37
jmlmars, which is wonderful :)13:38
marshmm13:38
marsI don't know why it moved on to compile_6.  Just a sec, checking the config13:39
jmlbut if a dependent branch had changed in a subtler way, it still would have gone on to run the whole suite13:39
marsinteresting13:41
marsmy fix did *not* land13:41
marsthe compile steps are supposed to halt the build by default13:42
marsperhaps that is what caught it13:42
jmlwhat caught it was that it's an import error13:42
marsmthaddon, ping, was there any word on landing my buildbot "fail fast" config change?13:42
jmland the apidoc generation has to import everything13:42
mthaddonmars: we landed it but didn't restart the builder - want me to do that now?13:42
jmloh I see, there were no steps beyond the compile one.13:43
jmlactually, no, there were13:43
marsmthaddon, sure, there are three branches in there, but we have to do it sometime :/13:43
marsjml, yes, but you are right about the subtle changes things.  If it misses pulling GPG or something, then it happily goes onward into the suite.  You are right, we are lucky it happened to fail and halt on the compile step.13:45
adiroibandanilos: conflict solved and I have pushed the changes13:45
adiroibandanilos: the branch should be ready for ec2-test and landing13:46
jmlmars, ok cool. I'll gladly watch my two branches get delayed if your fix for that gets deployed & works.13:46
jml(will I have to resubmit?)13:46
jmlactually, I guess it's just forcing another rebuild13:46
marsjml, nope, it will be pulled into the next build13:46
marsright13:46
* jml moves on to his next problem13:47
jmlhow can I run pyflakes on doctests?13:47
jmlI seem to remember being able to do so13:47
Ursinhaor, lifeless, hi13:49
marsjml, if you don't find something in the list archive, it might be on the Hacking pages13:49
marsjml, check your "Sent" folder, dated 13/7/2009, "[EMACS] Another flymake trick"13:50
lifelessUrsinha: hi! ;)13:51
Ursinhalifeless, you asked about the oops <-> bug link, I assume you're talking about the bug link in the oopses?13:51
jmlmars, :) thanks.13:51
Ursinhalifeless, it was too early in this timezone when you pinged me :)13:51
lifelessUrsinha: yeah13:51
lifelessUrsinha: and yes, I asked for your backscroll :)13:51
lifelessUrsinha: I also filed a bug13:52
jmlmars, actually, nothing useful in there :\13:52
Ursinhalifeless, let me see13:52
barrylosa ping: bazaar.lp.net seems unhappy. can i haz restart?13:53
lifelessbarry: hi13:53
lifelessbarry: whats your bug # ?13:53
marsjml, Ah, sorry, thought that mail was right on target.  Maybe the great Warsaw knows13:53
mthaddonbarry: unhappy in what way?13:53
barrymthaddon, can't connect13:53
mthaddonbarry: via bzr+ssh?13:54
lifelessbarry: its happy for me13:54
barrylifeless, bug #?13:54
lifelessbarry: try turning on your network13:54
barryhttp://bazaar.launchpad.net/~ubuntu-dev/ubuntu-dev-tools/trunk/files/head:/doc/13:54
lifeless?13:54
lifelessbarry: you said you had a page timing out13:54
mthaddonbarry: that's codebrowse - was just restarted13:54
marsbarry, works for me13:54
barrylifeless, ah, yes13:54
lifelessbarry: also if that page comes up13:54
barrylifeless, https://edge.launchpad.net/~pythoneers/+archive/py27stack4/+packages?start=0&batch=20013:54
Ursinhalifeless, bug 607087 ?13:54
_mup_Bug #607087: enable 'search by method' <OOPS Tools:New> <https://launchpad.net/bugs/607087>13:54
lifelessbarry: wait 60 seconds and hit ctrl-R13:54
barrymthaddon, thanks! works now13:54
lifelessUrsinha: no, a new one ;)13:54
lifelessUrsinha: I'd love that one fixed too, of course ;)13:55
Ursinhahehe13:55
lifelessUrsinha: I filed the other one in launchpad-foundations13:55
Ursinhaah13:55
Ursinhalifeless, do you have the #:13:55
lifelessbecause its about our docs13:55
lifelessuhm13:55
wgrantbarry: Is this another of those 2700-build monsters that gets deleted a couple of hours after it finishes using days of build farm time?13:55
lifelesssec13:55
Ursinhalifeless, I can find it here, no prob13:55
barrywgrant, ;) no13:55
lifelessbarry: so is that your hacked url13:56
barrywgrant, just 150-ish packages13:56
marsmthaddon, btw, can we update the buildbot configs trunk with my 'fail fast' patch?  Or do you want to wait for it to run successfully first?13:56
lifelessbarry: or the original13:56
barrylifeless, it is hacked13:56
lifelessbarry: what url fails13:56
Ursinhalifeless, bug 60768013:56
_mup_Bug #607680: documentation needed on oops<->bugs linking <Launchpad Foundations:New for matsubara> <https://launchpad.net/bugs/607680>13:56
wgrantbarry: Ah, good. Sanity.13:56
lifelessUrsinha: yes!13:56
barrylifeless, that url.  iow.  normally you get batches of 50 but i want to see all packages in one page13:56
Ursinhalifeless, what links oopses to bugs is part of the oops-tools13:56
jmlbarry, do you have a copy of pyflakes-doctest, or know where I can find one?13:57
barryjml, atm, i don't13:57
jmlbarry, thanks anyway.13:57
barryjml, yeah, sorry13:57
lifelessUrsinha: ok, thats cool. However most devs don't have that on their disk :)13:57
* barry -> reboots. hopefully will brb13:57
Ursinhalifeless, exactly :) afaiu it's a simple mechanism that associates the "oops signature" to a bug number13:58
jmlahh13:58
jmlit's in old versions of the tree13:58
Ursinhathat's why we have incorrect links sometimes13:59
Ursinhalifeless, what exactly are you aiming with that?13:59
flacostelifeless: that graph was moved to lpstats14:00
danilosadiroiban, wgrant: all your branches are on ec2 now14:00
flacostelifeless: https://lpstats.canonical.com/graphs/LPQA/14:01
flacostelifeless: https://lpstats.canonical.com/graphs/LPQAByTeam/14:01
daniloslifeless, I have a suggestion for bug 590708, I've added it to the bug and emailed some reasoning to the list as well14:01
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>14:01
wgrantdanilos: Thanks.14:01
danilosbigjools, ^14:01
daniloslifeless, ignore the timing differences on the bug (that's with explain analyze which usually doubles the times) and in the email though ;)14:02
wgrantdanilos: Ooh, that's good.14:02
daniloswgrant, the traditional translations tricks fwiw :)14:02
daniloswgrant, hopefully the queries are compatible :)14:03
danilosor, let's say equivalent14:03
mthaddonmars: which branch is that again?14:07
lifelessUrsinha: I want to be able to make the associations, so I want the way I should do that documented :)14:09
lifelessflacoste: ok cool, its linked from https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy14:09
lifelessflacoste: how does oops tie into that graph ?14:10
lifelessdanilos: actually the explain in the bug case adds about 1000ms if you compare to the oopses14:10
daniloslifeless, well, even explain analyze runs quickly (300ms) on an optimized limit 50 query for me14:12
Ursinhalifeless, ah, that's bloody simple :) to make the association when there's no association, I mean, because we still cannot edit that other that using sql directly on oops-tools application db14:12
Ursinha*other than14:12
Ursinhalifeless, I'll make sure that's documented somewhere14:13
daniloslifeless, but anyway, I was just trying to point out that the times I recorded are not really correct (I've run it a number of times with both explain analyze and without), but my conclusions should generally hold for these queries14:13
lifelessdanilos: yeah14:13
lifelessdanilos: want to make a patch ?14:13
lifelessUrsinha: ok, so how does one do it ?14:14
daniloslifeless, not really, I've got a few branches in my queue already :)14:14
daniloslifeless, it'd be good to test queries on production DB first as well14:14
lifelessflacoste: is zerooopspolicy still active? or died-a-quiet-death ?14:14
lifelessmthaddon: could you try danilos query on a slave please? from the bug https://launchpad.net/bugs/59070814:15
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>14:15
mthaddonlifeless: am in the middle of some ISD deployments at the moment - can it wait a little?14:15
lifelessmthaddon: of course14:15
mthaddonthx14:16
flacostelifeless: it's kind of a not fully implemented policy14:16
lifelessok14:16
flacostelifeless: the intent is still there, but we lack the tools to fully back it up14:16
Ursinhalifeless, if an oops doesn't have a bug associated, add the bug number to the text box where the number usually is, click "Bug #", and it's done14:16
lifelessbut in principle folk have the goahead to shelve stuff to work on oops14:16
lifelessUrsinha: oh, on the web ui ?14:16
flacostelifeless: the OOPS reports don't make it easy to get the "list of things" that need fixing14:16
Ursinhalifeless, as you can see in this one, for example: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1661CCW114:16
Ursinhalifeless, yes sir14:17
lifelessah doh! I see14:17
Ursinha:)14:17
flacostelifeless: yes14:17
Ursinhalifeless, it clearly needs to be documented! :)14:17
lifelessUrsinha: nah, i'm still a little tired here, its been busy ;)14:17
lifelessUrsinha: close with 'lifeless is blind', if you like.14:18
lifelessflacoste: ok, so lets talk about making it easy to see what needs to be fixed.14:18
Ursinhalifeless, hehe, no, it really needs to be somewhere, so we can make easy to see what needs to be done14:18
lifelessflacoste: As I get the policy, every oops in the oops report - say just the top 15 timeouts for now - should be a high/critical bug right ?14:18
lifelessUrsinha: does the oops report sent to the list list the bugs when there is one ?14:19
Ursinhalifeless, yes, right below the oops signature/count14:19
Ursinhalifeless, as you can see in the oops reports14:20
UrsinhaI guess I just repeated what you said, but nevermind :)14:21
flacostelifeless: there should be clear list of things to fix for each team14:21
flacostecurrently, they have to look for that list across several reports and several questions14:22
flacostes/questions/sections/14:22
Ursinhaflacoste, I've discussed that with diogo a few times, and we need to improve the oops signature in order to better group the oopses14:23
Ursinhadone that, I guess the per team's reports will be more accurate14:23
UrsinhaI also would like to know from the teams which oopses are real ones and which ones are only tainting the summaries14:23
Ursinhalike the checkwatches one, for example14:23
flacosteUrsinha: best way to do that is to book a call with each TL and go over a couple of reports with them14:24
Ursinhalike bug 59234514:24
_mup_Bug #592345: Checkwatches produces a lot of OOPSes that aren't real LP failures <oops> <Launchpad Bugs:Triaged> <https://launchpad.net/bugs/592345>14:25
Ursinhaflacoste, I started with bugs team14:25
Ursinhabut I'll have to do that with every other one, yes14:25
lifelessUrsinha: they're all real.14:25
lifelessIn my simple simple opinion14:25
flacostelifeless: not really14:26
flacostefor example a 404 isn't a real OOPS14:26
flacosteand a lot of checkwatches failures are of that sort14:26
lifelessflacoste: here - <gmb> Ursinha, We already have a backoff mechanism in place. However, now that we're tracking BugWatchActivity we can probably stop recording OOPSes for some things.14:26
flacosteright14:26
lifelessflacoste: I agree a 404 isn't a real oops, but the oops system already knows about that one and could trivially skip it14:27
lifelessflacoste: (except where we have an internal ref that makes a 404)14:27
lifelessbarry: so what url was timing out again? not the hacked one, the real one.14:28
lifelessUrsinha: help me out with the oops reports here -14:30
lifelessthe 19th report, edge errors14:30
lifeless=== Top 15 Time Outs (total of 108 unique items) ===14:30
lifeless 876 SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_close ... ON BugTask.bugwatch = "_prejoin5".id14:30
lifelesswhen I open the first oops up14:31
lifelessit has a bug #14:31
lifelessbut I didn't see the bug reference on the page14:31
Ursinhalifeless, it's wrong14:31
Ursinhahm14:31
barrylifeless, i don't think the unhacked one was timing out :/14:31
Ursinhalifeless, it has?14:31
lifelessbarry: ah. Don't hack the url14:31
Ursinhalifeless, let me just do my daily call with foundations, I'll get back to you in a moment14:31
barry>:-#14:31
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=1661A101014:31
barry:)14:31
lifelessUrsinha: ^14:31
lifelessUrsinha: but it thinks it is a translations problem, yet its a bugs issue14:33
Ursinhalifeless, I'll explain14:33
lifelessafter your call :)14:34
lifelessgmb: are you back ?14:37
Ursinhalifeless, so14:41
Ursinhalifeless, this bug is linked incorrectly because of the "oops signature" oops-tools uses to identify the uniqueness of a problem, and link it to the bug14:41
Ursinhalifeless, we're aware that the way it is now isn't good because it doesn't work for timeouts14:42
lifelessUrsinha: can we purge the old one then ? I mean, we need an improvement in that eventually.14:42
Ursinhalifeless, it's a known issue14:42
lifelessbut right now its steering folk wrong14:42
Ursinhalifeless, sure, I can do that14:42
lifelesssweet14:45
lifelessUrsinha: do you think we'll be ready to switch to the new workflow this cycle ?14:47
Ursinhalifeless, that will require changes in the scripts we use today14:47
UrsinhaI'm not sure14:47
Ursinhalifeless, when the new cycle starts? in three weeks?14:48
lifeless1 week I think, we're in 4 of 10.07 according to topic.14:48
lifelessof course, topic could be lying14:48
lifelessUrsinha: if you could have a set of bugs tagged release-features-when-they-are-done, I might try to help out on the change14:48
wgrantThis is 10.08 week 1.14:49
=== Ursinha changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 1 of 10.08 | PQM is open | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews | Use http://paste.ubuntu.com/ for pastes
marslifeless, do you mean switch to the new merge workflow at the end of this cycle?14:50
lifelessmars: thats the question14:50
Ursinhalifeless, we still need to create the blesser, and integrate that with the merger infrastructure14:50
Ursinhaand mars knows about the second part better14:51
lifelessgiven its non trivial, its not jfdi-able14:51
lifelessI'd like to be able to see where I should help make this happen14:51
marsyes, I can think of three separate code changes (including one entirely new script or application)14:51
lifelesseither by begging some time from team leads, or by helping out and doing, as appropriate.14:51
lifelessits a crucial change to make our production story cleaner and better14:52
marswell, it will happen - as to an alpha of the new cycle, I think getting that for 10.08 is unknown (I don't know our velocity), but I would think 10.09 for an alpha or even a beta could definitely happen14:53
UrsinhaI agree with mars14:53
marslifeless, I'll keep this moving forward, and I will let you know how things are going, so you can jump in where you think it's best14:56
lifelessif you make it visible to me14:56
lifelessI'll help in some fashion14:56
lifelessbetter deployment is a key enabler for overall velocity14:56
lifelessand this is a necessary condition for better deployment14:57
marslifeless, it should be on the Foundations Kanban board - I'll make a lane now14:57
lifelesswow14:57
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=1661C18314:57
lifelessmars: so kanban is great too - my main point is if *I* can figure out what is still todo, I can help somehow :)14:58
lifeless14 seconds of non-sql time14:58
marsheh, ok14:59
jmljust to be clear, are our buildbots running Python 2.6?15:03
jml(because ec2 test certainly isn't)15:03
marsjml, only the lucid_db_lp one is15:04
jmlmars, thanks.15:04
jmlnext question15:04
marsjml, ok, so 'lucid ec2 image' goes on my ToDo list15:04
jmlhow do I use "with" statements in doctests in Python 2.5?15:04
marsbenji might know?15:04
gary_posterfrom __future__ in a >>> line doesn't work?15:05
* benji reads the scrollback15:05
jmlgary_poster, apparently not.15:07
lifelessgary_poster: from __future__ is magic IIRC15:07
lifelessgary_poster: it changes the parser15:07
lifelessIIRC to do that with a non .py compile you need to pass a compile flag in15:07
benjijml: I assume you tried "from __future__ import with_statement" and it didn't work.15:08
jmlbenji, that's correct.15:08
lifelessderyck: whats 'null bug task' all about15:08
benjihmm, there is some support for future in doctests, let me look at the source real quick15:08
jmlmy experimentation cycle is quite long, since I don't know how to build Launchpad locally for Python 2.515:09
derycklifeless, null is a workaround for not being able to delete a task, since marking it invalid means you continue to receive mail15:09
lifelessnot the null project15:09
lifelessthe code path15:09
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=1661C18315:09
lifelessPage-id: NullBugTask:+index15:09
lifelesstime: 4388 ms15:10
lifelessnon-sql time: 14274 ms15:10
lifelessStatement Count: 49915:10
wgranterm, shouldn't NullBugTask just redirect now?15:10
gary_posterlifeless and bigjools, leonardr and I are discussing his reply to https://bugs.edge.launchpad.net/soyuz/+bug/590708 .15:12
gary_posterWe are considering the backwards compatibility issues of what he described, because we feel we're the ones who are most likely to care about that, and we are responsible for it.15:12
gary_posterIf we decide that leonardr's proposal is acceptable, I have the understanding that you are calling this a critical issue and that we should proceed to work on it, pushing our other tasks aside per the usual "critical" behavior.15:12
gary_poster(1) Do I understand correctly?  (2) If so, feedback on his reply would be appreciated, particularly if you have concerns.15:12
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>15:12
bigjoolsgary_poster: sounds good to me - AIUI we are supposed to treat OOPSes as "stop the line"15:13
* bigjools reading the reply15:14
gary_posterI don't think that policy is practically acceptable for Foundations on a global basis, but that's a different conversation for a different forum15:14
lifelessmthaddon: https://bugs.edge.launchpad.net/soyuz/+bug/590708 - another for the queue, equal basis with doing the queries from danilo on a slave15:15
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>15:15
lifelessmthaddon: leonardr needs to correspond with the folk having trouble15:16
lifelessleonardr: wgrant is one of those people15:16
wgrantHi.15:16
mthaddonlifeless: can you ping losa, I've passed the baton to another losa for interrupt queries15:16
leonardrwgrant, i'd like to see the program that's getting the timeouts15:16
leonardrso that i can see if your program will break if we apply the fix i've proposed15:17
lifelessmthaddon: sure, sorry I should have in that case too.15:17
lifelesslosa ping15:17
Chexlifeless: morning15:17
wgrantleonardr: http://qa.ubuntuwire.org/ftbfs/source/build_status.py15:17
Chexlifeless: or evening in your case?!15:17
lifelessChex: currently mid avo15:17
lifelessChex: as I'm in prague15:17
Chexlifeless: ah yes, you're sprinting, cool.15:18
lifelessChex: if you look up a bit15:18
lifelessthere is a bug - one of several - high frequency timeouts15:18
wgrantleonardr: It's currently timing out every time it runs.15:18
leonardrgary: wgrant's script doesn't use len(), it just iterates over the resultset15:18
gary_posterjml: I wanted to abstract the Python selection so that it wouldn't have to always be the system's Python, but it was regarded as unnecessary.  The "test with another version of Python" story is another way in which that feature would be valuable, though.  Maybe it will come to life at some point.15:19
jmlgary_poster, *nod*15:19
lifelessChex: we'd like two bits of losa assistance in the short term on this - a) danilo has a faster proposed query, we'd like to validate its performance on a production slave.15:19
jmlgary_poster, I guess it's only really necessary during interim phases like this one.15:19
gary_posterright15:19
lifelessChex: b) we may need to get leonardr in contact with some users via api keys15:19
lifelessChex: a) should be cheap - can we do that please; b) ask leonardr in a minute :)15:20
benjijml: any futures included in the test globs are respected15:20
leonardrwgrant: did it ever work?15:20
wgrantleonardr: Until the edge timeout reduction, it worked about 75% of the time.15:20
wgrantleonardr: Until 10.05, it worked 100% of the time.15:20
bigjoolswgrant: it broke with the build farm model changes?15:21
jmlbenji, cool. how would I add a future to a test glob?15:21
wgrantbigjools: Somewhat, yes.15:21
lifelessgary_poster: I don't have a position-specific opinion on who should do the work; LEAN dictates team accountability (all of LP devs being a single team), not smaller granularity; but we don't seem to do that at the moment : and I'm finding my feet.15:21
wgrantbigjools: I suspect it was sitting just under the threshold, or something has gone really wrong plan-wise.15:21
benjiI'll figure out an example.15:21
jmlbenji, thank you.15:21
bigjoolsideally we should make the query very very fast15:22
gary_posterlifeless: sure, fair enough on all counts15:22
wgrantCertainly.15:22
lifelesswe want to both make it fast, and avoid unnecessary work.15:22
Chexlifeless: ok, looking at the bug, and your request15:22
lifelessthe count(*) there seems to be unnecessary for many cases.15:22
lifelessChex: thanks15:22
bigjoolsunless we make the query(ies) faster, LP will never get faster15:22
lifelessright15:23
Chexlifeless: im a little confused what you would like me to do for you?15:23
lifelessChex: run the query in https://bugs.edge.launchpad.net/soyuz/+bug/590708/comments/815:23
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>15:23
bigjoolsso I think the count(*) is a red herring15:23
lifelessChex: on a slave15:23
lifelessbigjools: its not the root cause here, but its not a non-issue.15:23
bigjoolsagreed15:23
wgrantCOUNT is always going to be fairly slow.15:24
Chexlifeless: ok, so one of the DB slaves, seems ok, and the SQL seems to be a SELECT only, so thats ok, too15:24
wgrantAnd it15:24
lifelessfixing *either* the query, or the count(*) will fix this issue.15:24
wgrantit's unnecessary.15:24
Chexlifeless: any idea on the performance hit for the query?15:24
lifelessChex: by ok, can you paste the output please :)15:24
lifelessChex: its being hit by bots at least once an hour15:24
bigjoolscount should be as quick as the query itself15:24
lifelessbigjools: hell no15:24
bigjools?15:24
Chexlifeless: you mean the bots are generating that query once an hour?15:25
lifelessChex: something very like it, but causing table scans.15:25
lifelessbigjools: sec, let me get the query tested first.15:25
Chexlifeless: understood, ok, will run on one of the Db slaves, chokecherry15:25
lifelessChex: thanks15:25
lifelessbigjools: ok, so count(*) has to complete the entire thing, ignoring offsets and limits15:26
lifelessbigjools: this is always more work except the special case when the limit of the first chunk matches the total work15:27
bigjoolslifeless: if the query has an order, that doesn't apply15:27
lifelessbigjools: why do you say that?15:27
bigjoolsit has to complete the query to order it!15:27
wgrantEven with indices?15:27
lifelessthat depends on the query15:27
lifelessvery very much depends on the query15:27
bigjoolsyes15:28
* bigjools goes back to buildd-manager hacking15:28
lifelessanyhow, my point is just - don't assume count(*) is effectively free: its not. :)15:28
bigjoolslifeless: hell no15:28
bigjoolsbut if the original query is quick, and it bloody well should be, then the count should not matter in the bigger picture15:29
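The disagreement above about count(*) can be illustrated with a toy model. This is only a sketch, not PostgreSQL's planner: `CountingSource`, `first_batch`, and `count_all` are hypothetical names, and the model assumes rows can be streamed in the required order (for example via a suitable index), which is the case lifeless describes. When the rows must be fully sorted first, as bigjools points out, both paths visit every row.

```python
from itertools import islice


class CountingSource:
    """Hypothetical stand-in for a table scan that records rows visited."""

    def __init__(self, total_rows):
        self.total_rows = total_rows
        self.rows_visited = 0

    def scan(self):
        # Rows are produced lazily, already in the desired order.
        for row_id in range(self.total_rows):
            self.rows_visited += 1
            yield row_id


def first_batch(source, limit):
    """Model of a LIMITed query: stop as soon as the batch is full."""
    return list(islice(source.scan(), limit))


def count_all(source):
    """Model of COUNT(*): every matching row must be visited."""
    return sum(1 for _ in source.scan())


# 2614 is the result size wgrant mentions for the 'Failed to build' query.
batch_source = CountingSource(2614)
batch = first_batch(batch_source, 50)

count_source = CountingSource(2614)
total = count_all(count_source)
```

In this model the batched fetch touches only as many rows as the batch size, while the count touches all of them, which is why the count is not automatically "as quick as the query itself".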
Chexlifeless: bigjools: https://pastebin.canonical.com/34829/15:29
lifelessI agree it would be lost in noise today15:29
lifelessso, we may have stale statistics or something15:30
lifelessChex: thanks!15:30
bigjoolslifeless: "date_finished IS NOT NULL" kills that query15:30
lifelessbigjools: is date_finished indexed ?15:30
bigjoolsyes15:31
bigjoolsbut it does an index scan over 103k rows15:31
Chexlifeless: you're welcome15:31
bigjoolswhen you use NOT IN it has no choice15:31
=== mtaylor is now known as mtaylor|breakfas
bigjoolsChex: thanks from me too :)15:31
lifelessbigjools: not in - yes, I get that. I don't see 'is not null' being == to 'not in', but I may be missing a specific technicality.15:32
bigjoolsmy tiny brain conflated the two, my bad15:32
lifelesshehe no worries.15:33
benjijml: import __future__15:33
lifelessis not null can be badly affected by index statistics and index selectivity though15:33
benjiand then in your test setUp, add a global like...15:33
bigjoolshmmm status15:34
benjitest.globs['with_statement'] = __future__.with_statement15:34
lifelessleonardr: so, I have a small separate idea for you.15:34
leonardrlifeless, ok15:34
lifelessleonardr: what if, when the result set is < pagination size15:34
lifelessleonardr: lazr restful simply *does not* call len() on the result set15:34
jmlbenji, sweet. thanks.15:34
benjijml: I don't have a Python 2.5 so I can't test it, so hopefully it'll work as-is15:34
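benji's suggestion can be sketched as a standalone script. This is a minimal illustration, not Launchpad's test setup: `DOC` and `run_with_future_glob` are made-up names. doctest inspects the test globals for `__future__` features and passes the matching compiler flags when it compiles each example. On Python 2.5 that is what enables `with`; on Python 3 the flag is accepted but has no effect, because `with` is always available.

```python
import __future__
import doctest

# A doctest that uses a "with" block. On Python 2.5 this would not compile
# unless the with_statement future flag reaches the doctest compiler.
DOC = '''
>>> class Managed(object):
...     def __enter__(self):
...         return 'entered'
...     def __exit__(self, *exc_info):
...         return False
>>> with Managed() as value:
...     print(value)
entered
'''


def run_with_future_glob():
    """Run DOC with the future feature placed in the test globals."""
    # Equivalent to test.globs['with_statement'] = __future__.with_statement
    globs = {'with_statement': __future__.with_statement}
    parser = doctest.DocTestParser()
    test = parser.get_doctest(DOC, globs, 'with_example', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_with_future_glob()
```

The same assignment can go in a suite's setUp hook, as benji describes, so every doctest in the suite picks up the flag.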
lifelessleonardr: in this particular case we have 19 results15:34
leonardrlifeless, makes sense15:34
bigjoolslifeless: we can improve it with an index btree(date_finished, status)15:34
jmlbenji, well, I'll try that and resubmit via ec215:35
bigjoolslifeless: it's scanning over status15:35
lifelessleonardr: I know it won't help in the greater case15:35
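The "tiny hack" lifeless proposes could look roughly like the sketch below. It is an assumption-laden illustration, not lazr.restful code: `fetch_batch` and its parameters are hypothetical, `rows` stands in for a sliceable result set, and `count_all` stands in for the expensive COUNT(*) query. The idea is to ask for one row more than the batch size; if the window comes back short, the total is already known and the count is never issued.

```python
def fetch_batch(rows, start, batch_size, count_all):
    """Return (entries, total_size), skipping the count when possible.

    rows: sliceable result set (hypothetical stand-in for a DB result set).
    count_all: callable standing in for the expensive COUNT(*) query.
    """
    # Fetch one extra row so we can tell whether more results exist.
    window = list(rows[start:start + batch_size + 1])
    entries = window[:batch_size]
    reached_end = len(window) <= batch_size
    if reached_end and (window or start == 0):
        # The result set ends inside this batch, so the total is known.
        total_size = start + len(entries)
    else:
        # More rows remain (or start overshot the end): fall back to counting.
        total_size = count_all()
    return entries, total_size


count_calls = []


def make_counter(rows):
    """Build a count callable that records each time it is invoked."""
    def count_all():
        count_calls.append(1)
        return len(rows)
    return count_all


small = list(range(19))      # the 19-result case mentioned in the log
small_entries, small_total = fetch_batch(small, 0, 50, make_counter(small))
calls_after_small = len(count_calls)

large = list(range(2614))    # larger than one batch, so the count is needed
large_entries, large_total = fetch_batch(large, 0, 50, make_counter(large))
calls_after_large = len(count_calls)
```

As lifeless notes, this only helps small result sets; a 2614-row result still pays for the count.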
wgrantlifeless: Only 19? There should be more than that...15:35
jmlbenji, in ~3hrs I'll know if it worked.15:35
benjiheh15:35
benjiit might be quicker for one of us to install 2.515:35
lifelesswgrant: privmsg15:37
lifelessleonardr: how big a fix would that short hack I'm proposing be ?15:37
leonardrlifeless: very easy15:37
lifelessfix/workaround15:37
lifelessleonardr: I propose, if you think its sensible, that we:15:37
lifeless - do this tiny hack; get that cowboyed to prod15:37
lifeless - do the larger one you proposed, if you think its tolerable15:38
lifeless - add the index bigjools is proposing, after evaluating it on staging15:38
wgrantlifeless: The 'Failed to build' query that normally times out should return 2614 results.15:39
lifelesswgrant: thats odd :P15:40
wgrantThat's larger than the batch size.15:40
lifelesswgrant: yes, we have to deal with bigger things15:40
leonardrlifeless et al: a quick analysis of the oopses shows that we seem to have three users15:40
leonardr1. leann ogasawara15:40
leonardr2. someone at ubuntuwire.org15:40
bigjoolslifeless: one thing I think we need to do is to document the queries we're using and the indexes that they need.  They're currently disjoint and we have no idea what needs what and what indexes are obsolete (which are a waste of processing time)15:40
leonardr3. someone at cranberry.canonical.com (wgrant?)15:41
bigjoolslifeless: a bit like the prejoins too15:41
wgrantleonardr: I have access to no Canonical machines. I manage the script on ubuntuwire.org.15:41
leonardrno, sorry, leann ogasawara is the person at cranberry.canonical.com15:41
leonardrour third client is someone from optusnet.com.au15:41
wgrantThat's me.15:41
lifelessbigjools: I think thats a great idea. Start doing it however, we can iterate to make it structured later.15:42
jmlwgrant, not internode?15:42
wgrantjml: No. Too far for reasonable ADSL performance.15:42
bigjoolslifeless: one thing that really scares me is changing prejoins - we've simply got no idea what it will affect.15:43
wgrantjml: And Optus' caps are about an order of magnitude larger now than they were 6 months ago.15:43
wgrantSo it's not that bad.15:43
lifelessbigjools: we'll be a lot safer once staging's timeout limit is down to 5 seconds15:43
leonardrwgrant: ok, so we have two users, you and leann15:43
lifelesswe can rev leann's launchpadlib pretty easily15:43
lifelesslet me go ask her15:43
=== deryck is now known as deryck[lunch]
jmlwgrant, fair enough :)15:44
jmlleann is being asked now in person15:44
leonardrah, i just asked her on irc, but ok15:45
lifelessshe was walking past the door to a meeting15:50
lifelessshe is running in the dc on the platform lp lib - hardies I suspect15:50
lifelessbut she can RT an upgrade anytime.15:50
leonardrlifeless, have we run the queries that oopsed to see how many results they actually return?15:50
lifelessthere are 800 or so15:50
lifelessI'd rather not do that by hand15:51
lifelesswgrant says one in particular he does routinely returns 2.7K15:51
james_wisn't backwards compatibility not really an issue as len() didn't work for a long time?15:52
lifelessleonardr: when doing 'approve' on an MP please approve the overall proposal too - so that the queue is representative of the work reviewers have left to do15:53
leonardrsorry15:53
* wgrant sleeps.15:53
wgrantThanks for looking at this.15:53
lifelessleonardr: no probs15:55
lifelessleonardr: its a small thing, but it helps the flow.15:55
leonardrjames_w: it's somewhat unlikely, but there was a published workaround for a long time, so i want to make sure15:55
lifelessflacoste: is there a burndown chart of oops and timeout bugs ?16:00
lifelessflacoste: or should I perhaps ask jml with his mad graphing skills to write one16:00
jmlI only do graphs with wobbly lines that go upwards and to the right.16:01
lifelessdon't worry, we can make this one do that16:02
lifelessjml: how hard would it be for you to do this?16:02
lifelesshmm meeting time16:03
jmllifeless, about as hard as it would be for you.16:03
jmlmaybe a little bit less if I used lpstats.16:03
* bigjools chuckles16:05
lifelessjml: I think you underestimate startup cost/activation energy16:08
jmllifeless, maybe.16:09
jmllifeless, if you email me with exactly what you want, I can give it a go.16:09
jmllifeless, but if you want it today or tomorrow, you're genuinely better off finding someone else.16:10
lifeless have mailed you16:11
marsrockstar, ping16:15
rockstarmars, distracted pong16:15
marsrockstar, just wondering what the progress was on your YUI 3.1 upgrade.16:16
rockstarmars, ah, very close.  Dealing with a bug where YUI 3.1 doesn't like the lp.client.16:16
marsok16:17
rockstarmars, I'll send it to you for review.16:17
marsI'm going to get someone to test 1.0, then we can look at rippling the change upward through the lazr-js tree16:17
marsrockstar, sure16:17
rockstarmars, 1.0?16:18
leonardrlifeless et al: ogasawara is indeed accessing the total_size (using the workaround since len() doesn't work in old wadllib)16:18
leonardrin fact, that's all she's doing with the data16:19
marsrockstar, yes, there are three lines of development right now: trunk (dev), a 1.0 release branch that will become 1.0-dev, and the 2.0-dev line16:19
rockstarmars, is there anyone else outside of Canonical using lazr-js?16:20
marsrockstar, not to my knowledge16:20
rockstarmars, I guess I'm asking "is there much point in the overhead required for coordinating various lines of development" ?16:20
rockstarI think there should be trunk, and then the project using it can maintain their own branch of that.16:20
marsyes, because we have projects on 1.0, and also projects on 2.016:20
rockstarmars, what projects are those?16:21
marsISD and LP are on 1.0, U1 and Landscape are on 2.016:21
rockstarmars, afaik, LP is on 0.9.2, which means very little, since we're not really on a branch at all.16:22
marsrockstar, I'm working to get it down to two branches: 1.0 dev, and 2.0 dev16:22
marsyes, the fact we are not on a branch is a problem as well16:22
rockstarmars, what I'm saying is "the branch we're based off has little to do with what we're actually running"16:22
rockstarmars, landscape, for instance, maintains their own branch.16:22
leonardrogasawara says that when it worked, her script found a total_size of 2614. given that neither of our users will benefit from the small-batch optimization, i think i should be doing something else (if we have a consensus about what else should be done)16:23
rockstarIf we land something on the 1.0 branch, we shouldn't be risking the breakage of a bunch of other projects.16:23
marsrockstar, yes, and that leads to everything being one big hairball :)16:23
rockstarmars, how's that?16:23
marsbecause changes like the YUI 3.0/3.1 split or the distribute debacle force other projects to maintain branches16:24
rockstarmars, if we make changes in trunk and then the projects then pull when they're ready, then we really only have one branch to worry about as lazr-js (trunk), and two branches to worry about as LP (trunk and whatever we're running)16:24
mars4 projects with 4 branches and mainline is 5 times the maintenance work needed16:24
rockstarmars, the YUI 3.0/3.1 debacle wouldn't have happened if sidnei had landed in trunk.16:24
marsand no one else would have been able to use or patch trunk16:25
marsyou either have everyone maintaining a private fork, or you consolidate16:25
marsI want to get everything consolidated16:25
rockstarmars, everyone needs to maintain a fork anyway.16:25
marsrockstar, why?16:25
rockstarHopefully it's a "pull only" fork.16:25
rockstarmars, because if we update 1.0, we can break other projects.16:26
rockstarAllowing the other projects to pull in changes when they are ready is (IMHO) the best option.  We only have one line of development, and when they're ready to upgrade, they do.16:27
marsthat leads to real problems with versioning and contribution16:27
rockstarmars, howso?16:27
marsyou have to write two patches (one for you, one for mainline), and you can't just pull trunk to get some new feature - there could be massive changes, meaning you have to backport and maintain yet another patch for your private fork16:28
marsPeople should just be able to skip between releases16:28
marsand releases should be documented in the changes they perform16:28
rockstarmars, I think you ought to propose this to a mailing list somewhere, and find out what other projects are doing.  I have suspicions whether or not it needs to be this complicated.16:29
rockstarmars, having two lines of development might mean that I need to patch my private fork, 1.0, and 2.0.16:30
marsrockstar, I don't see the complexity - we'll have one mainline (2.0), and one legacy line (1.0)16:30
marsrockstar, then you should drop your private fork, and fix mainline16:30
rockstarmars, I think this is better for the mailing list.  I suspect that the private fork is a feature other projects are a bit attached to (Launchpad historically is)16:32
marsrockstar, would you be willing to test the 1.0 branch to see if it builds on your system?  I would like to make lazr-js hackable again.16:38
rockstarmars, do you need it right now, or can it be in 1 hour?16:39
marsrockstar, better make that ~3 hours then, I'll be taking lunch around 12:3016:39
marserr, "I'll be taking lunch in 1 hour"16:40
rockstarmars, okay, that will work better, because I need to eat dinner soon.  I'm happy to test it though.16:40
marscool16:40
rockstarbigjools, are you still around?16:40
bigjoolsyep16:40
lifelessleonardr: lol!16:42
lifelessleonardr: so just exposing the size only thing would help her16:42
leonardrlifeless: yes, if we implemented the full solution she would have to change her script but we could get it to work16:43
lifeless\o/16:44
rockstarbigjools, I'm having a pretty hard time changing the status of a SPRBuild...  The security proxy is only slightly the problem.16:50
rockstarbigjools, do you have methods for flipping the switches?  All I see is handleStatus, and then things like "_handleStatus_OK" etc.16:51
bigjoolsrockstar: what are you trying to do?16:51
leonardrlifeless, i posted an update to the bug. i think we should shelve the 'don't run the count(*)' optimization since it's somewhat difficult and it won't solve the big problems. do you want me to work on the annotation-based solution?16:51
leonardr(shelve for purposes of this problem, not permanently)16:51
rockstarbigjools, make a SourcePackageRelease tied to a recipe16:51
rockstar(it's a hole in our testing currently)16:52
bigjoolsrockstar: in a test or in the code?  if the latter, at what point in the pipeline?16:52
rockstarbigjools, in a test.16:53
bigjoolsok16:53
rockstarbigjools, it won't let me set ISourcePackageRecipeBuild.source_package_release16:53
bigjoolsok let me check16:53
=== deryck[lunch] is now known as deryck
bigjoolsrockstar: dude, ISourcePackageRecipeBuild.source_package_release is a property so I'm not surprised :)16:56
bigjoolsrockstar: set SourcePackageRelease.source_package_recipe_build16:57
rockstarbigjools, *facepalm* I was looking at the interface thinking it'd give me all I needed...16:58
rockstar:)16:58
bigjools:)16:58
bigjoolsrockstar: if it makes you feel any better, I've done this exact same thing myself16:58
rockstarbigjools, it's a sign that we should delete all interfaces.16:58
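A minimal sketch of why the assignment above fails, assuming the build side only exposes a read-only property and the stored reference lives on the release side. The classes here are toy stand-ins, not the real Launchpad models, and the lookup through a plain list is a hypothetical replacement for a database query.

```python
class SourcePackageRelease:
    """Toy stand-in: this side holds the stored reference."""

    def __init__(self, name):
        self.name = name
        self.source_package_recipe_build = None


class SourcePackageRecipeBuild:
    """Toy stand-in: this side only derives the release."""

    def __init__(self, releases):
        # Hypothetical: a plain list stands in for a database lookup.
        self._releases = releases

    @property
    def source_package_release(self):
        # Read-only: find the release that points back at this build.
        for release in self._releases:
            if release.source_package_recipe_build is self:
                return release
        return None


releases = [SourcePackageRelease("example_1.0")]
build = SourcePackageRecipeBuild(releases)

# Assigning to a property without a setter raises AttributeError.
try:
    build.source_package_release = releases[0]
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# Setting the reference on the release side works, and the
# property on the build then reflects it.
releases[0].source_package_recipe_build = build
linked_release = build.source_package_release
```

In a test, this means building the link from the SourcePackageRelease side and then reading it back through the build.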
pooliehow do i do an api query for bugs containing a particular tag (and maybe other tags)?17:01
lifelessleonardr: please17:01
lifelessbigjools: BuildFarmJob.status <> 1 - thats an issue17:12
lifelesslosa ping17:14
mthaddonlifeless: hi17:15
lifelesshi, we'd like to run an analyze on packagebuild after checking how big the table is17:15
lifelesson each db server17:15
lifelessand then check explain analyze SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM Archive, BinaryPackageBuild, BuildFarmJob, PackageBuild WHERE distro_arch_series IN (109, 110, 111, 112, 113, 114) AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.build_farm_job = BuildFarmJob.id AND (BuildFarmJob.status <> 117:16
lifelessagain17:16
lifelessif the table is -huge- we obviously don't want to wedge things.17:16
mthaddonlifeless: erm, why do we need to do this?17:18
lifelessalso I'd like to know the range of values in BuildFarmJob.status - select distinct status from buildfarmjob;17:18
lifelessmthaddon: because we have an API call timing out - taking 18 seconds - and the query plan suggests a mismatch between statistics and actual data.17:19
lifelesshttp://paste.ubuntu.com/465800/17:19
lifelesswe've identified a few issues all at once related to this:17:19
lifeless - the api is doubling the db load by one of its bits of magic17:19
lifeless - the query is extremely slow itself17:19
mthaddonlifeless: I'd like to get stub's input on that (particularly why it's out of whack in the first place) ideally17:20
lifeless - the query contains an exclude - status <>1  rather than status in (2,3,4,5,6) or whatever it should be17:20
lifelessmthaddon: hes on a plane.17:20
lifelessmthaddon: I tried to ring him just before :(17:20
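A small sketch of the two status checks discussed above, run against an in-memory SQLite table rather than the real PostgreSQL database. The table layout and the status values are assumptions for illustration; the real set of BuildFarmJob.status values is not given in the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buildfarmjob (id INTEGER PRIMARY KEY, status INTEGER)"
)
# Hypothetical status values; the real ones are unknown here.
conn.executemany(
    "INSERT INTO buildfarmjob (status) VALUES (?)",
    [(s,) for s in [0, 1, 1, 1, 2, 3, 1, 6]],
)

# Range of values actually present in the column.
distinct = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT status FROM buildfarmjob ORDER BY status"
    )
]

# Exclusion form: everything except status 1.
excluded = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM buildfarmjob WHERE status <> 1 ORDER BY id"
    )
]

# Inclusion form: list the wanted statuses explicitly.
wanted = [s for s in distinct if s != 1]
placeholders = ", ".join("?" for _ in wanted)
included = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM buildfarmjob "
        f"WHERE status IN ({placeholders}) ORDER BY id",
        wanted,
    )
]
conn.close()
```

Both forms return the same rows on this toy data. Whether the explicit list gives the planner a better estimate on the real tables is something only an EXPLAIN on the real database can show.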
mthaddonlifeless: sure - is this a critical issue now?17:20
lifelessits timing out for ogsawara and wgrant every time; they aren't able to retry to fix it.17:21
lifelessif its simply stale statistics, that would be a very easy bandaid.17:21
lifelesswe can also raise the timeout again17:21
lifelessby which I mean, 'retries also time out'17:22
mthaddonI think increasing the timeout is a better short term fix - what should we increase it to?17:24
mthaddonlifeless: and it times out for them on both edge and lpnet?17:24
lifelessyes17:24
lifelessthats my [weak] understanding17:24
lifelessedge is doing a timeout every 120 seconds17:25
lifelessprod is a lot more unhappy, but that is primarily the bug attachment oops which gmb has been working on.17:25
lifelesswhich reminds me17:25
mthaddonlifeless: so which one are we changing? edge or lpnet? and to what?17:25
lifelessmthaddon: do you know how to generate a manual oops report for the oops since this morning on edge and lpnet?17:26
lifelessmthaddon: that would help me answer your question17:26
lifelessbecause I know what fixes are in-progress17:26
mthaddonlifeless: I don't, no :(17:27
lifelessmthaddon: then I'd say lets raise it back to 14 seconds on edge17:28
lifelessI know that most of the prod ones are the bug attachment script17:28
lifelessand it is in progress17:29
mthaddonlifeless: like this? https://pastebin.canonical.com/34844/17:29
lifelessmthaddon: could we run an analyze on staging at least, see how long it takes, and if it improves the query ?17:29
lifelessmthaddon: yes, that patch will raise the edge timeout.17:30
=== al-maisan is now known as almaisan-away
=== beuno is now known as beuno-lunch
mthaddonlifeless: it's hard to say if that will match production since the load on the DBs is so different though (having never been asked to do this before for LP is throwing up a minor red flag as "doing it wrong" as well)17:33
lifelessmthaddon: I'm positive stub has done analyzes to fix statistics many times. A grep of the lp-code logs will probably find some ;)17:33
mthaddonin any case, I'm pushing out the cowboy to edge with the higher timeout now17:33
lifelessthanks17:33
lifelessmthaddon: I'm curious how, since its the same revno ...17:34
mthaddonlifeless: I landed the branch that allows me to specify a revno17:34
Ursinhalifeless, lpnet oopses since 00utc: https://devpad.canonical.com/~lpqateam/lpnet-oops.html#time-outs17:34
lifelessmthaddon: \o/17:34
lifelessmthaddon: thats awesome17:34
mthaddons/specify a revno/specify a custom directory name/17:34
lifelessUrsinha: thanks17:34
Ursinhalifeless, same for edge and staging: https://devpad.canonical.com/~lpqateam/edge-oops.html#time-outs https://devpad.canonical.com/~lpqateam/staging-oops.html#time-outs17:35
nigelbbryceh: where is the code for it?17:41
nigelb(I wish blueprints had a comments area too for each action item)17:41
brycehnigelb, hang on I'm composing an email17:41
nigelbheh :D17:41
brycehnigelb, damn you're quick ;-)17:41
nigelbhaha17:41
mthaddonlifeless: timeout increased on edge17:47
mthaddonlifeless: although the config change hasn't been landed, so it'll be overwritten on next rollout unless that happens17:48
=== mtaylor|breakfas is now known as mtaylor
lifelessmthaddon: ok, can you do that too - or should I just land an r=mthaddon to increase it ?17:57
lifelessUrsinha: thanks17:57
mthaddonlifeless: r=mthaddon would be great, thx17:58
mthaddonlifeless: has it fixed the issue?17:58
=== beuno-lunch is now known as beuno
lifelessmthaddon: don't know18:06
lifelessmthaddon: wgrant may have gone to sleep18:06
lifelessmthaddon: yes its fixed18:08
mthaddoncool18:08
lifelessat least for leanne18:14
lifelessbut I think they're looking at the same think18:15
lifeless*thing*18:15
lifelesssinzui: bug 607879 - if you want to discuss with me, gimme a shout18:27
_mup_Bug #607879: https://bugs.edge.launchpad.net/~person/+participation timeouts <oops> <timeout> <Launchpad Registry:Triaged> <https://launchpad.net/bugs/607879>18:27
lifelesslosa ping18:57
brycehnigelb, ok finally got that email out18:58
Chexlifeless: hi there18:58
lifelessChex: hi, uhm channel confusion - query plan tweaking on staging18:59
Chexlifeless: ok, run that query on staging DB, then?18:59
lifelessChex: so an analyze of packagebuild on staging19:00
lifelessand then19:00
lifeless# explain analyze SELECT BinaryPackageBuild.distro_arch_series, BinaryPackageBuild.id, BinaryPackageBuild.package_build, BinaryPackageBuild.source_package_release FROM Archive, BinaryPackageBuild, BuildFarmJob, PackageBuild WHERE distro_arch_series IN (109, 110, 111, 112, 113, 114) AND BinaryPackageBuild.package_build = PackageBuild.id AND PackageBuild.build_farm_job = BuildFarmJob.id AND (BuildFarmJob.status <> 1 OR BuildFarm19:00
lifelesson staging19:01
daniloslifeless, you don't mind me doing the query I suggested two times on production slave? (i.e. you still have reasons to believe it would hurt us)19:01
daniloslifeless, fwiw, your query above was cut-off19:02
Chexlifeless: ERROR:  column "buildfarm" does not exist19:02
ChexLINE 5: ... BuildFarmJob.id AND (BuildFarmJob.status <> 1 OR BuildFarm)...19:02
lifelessChex: http://paste.ubuntu.com/465800/19:02
lifelessChex: top line19:02
Chexlifeless: oops, yeah pastebin is better, thanks19:02
Chexlifeless: http://pastebin.ubuntu.com/466583/19:03
lifelessChex: and you analyzed packagebuild first ?19:04
Chexlifeless: sorry, no I did not19:04
daniloslifeless, fwiw, I wasn't thinking of doing analyze on production DB :)19:05
lifelessChex: please do :) - there is a mismatch between rows=702 and rows=28253 in the middle of the explain19:05
lifelessthat jtv pointed out19:05
jtvdanilos: this is "analyze," not "explain analyze"19:06
danilosjtv, right, lifeless stopped me from doing explain analyze on production slave because it would "mess up caches on production DBs"19:07
lifelessdanilos: well no, you were saying something that I interpreted to mean 'drop caches'19:07
lifelessdanilos: which is rather different from 'run twice to eliminate cold cache effects'19:07
daniloslifeless, well, I was saying exactly this: "losa quick ping: hi, can you please check how caches on DB server affect executing a query at https://bugs.edge.launchpad.net/soyuz/+bug/590708/comments/8 (i.e. do it a few times on the same production slave DB)"; I'd never interpret it the way you did, but that's not up for debate :)19:08
_mup_Bug #590708: DistroSeries.getBuildRecords often timing out <api> <oops> <soyuz-build> <timeout> <Soyuz:Triaged by michael.nelson> <https://launchpad.net/bugs/590708>19:08
lifelessdanilos: crossed wires happen :)19:09
daniloslifeless, yeah, you were doing something very similar here so I guess that's where the confusion comes from ;)19:09
danilosanyway, Chex, can you please try the query above twice on a single production slave to compare the results?19:10
Chexlifeless: http://pastebin.ubuntu.com/466585/19:10
Chexlifeless: this look any better?19:10
jtvNope19:11
lifelesswell, 2 seconds better19:11
lifelessChex: so, to check - you did 'analyze packagebuild' then the query from the pastebin I linked earlier ?19:12
lifelessjtv: Nested Loop  (cost=0.00..11792.41 rows=2783 width=8) (actual time=0.068..2155.895 rows=904092 loops=1)19:12
lifelessjtv: thats another row expectation mismatch ?19:12
Chexlifeless: that is correct19:12
Chexthe analyze packagebuild then the query you pasted.19:13
lifelesscool19:13
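A rough sketch of how the mismatch in the Nested Loop line quoted above can be quantified. The regular expression and the helper name `row_estimates` are made up for this example; the only input is the plan line as it appears in the log.

```python
import re

PLAN_LINE = (
    "Nested Loop  (cost=0.00..11792.41 rows=2783 width=8) "
    "(actual time=0.068..2155.895 rows=904092 loops=1)"
)

# Matches the planner estimate and the measured figures on one node.
NODE_RE = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<estimated>\d+) width=\d+\) "
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<actual>\d+) "
    r"loops=(?P<loops>\d+)\)"
)


def row_estimates(line):
    """Return (estimated rows, actual rows over all loops) for a node."""
    match = NODE_RE.search(line)
    if match is None:
        raise ValueError("not an EXPLAIN ANALYZE node line")
    estimated = int(match.group("estimated"))
    # The actual row count is reported per loop, so multiply by loops.
    actual = int(match.group("actual")) * int(match.group("loops"))
    return estimated, actual


estimated, actual = row_estimates(PLAN_LINE)
ratio = actual / estimated
```

On this line the planner expected 2783 rows and got 904092, which is more than 300 times the estimate.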
Chexdanilos: ok, sure, hang on19:13
lifelesscan you please do analyze archive and analyze binarypackagebuild too19:13
Chexthen analyze packagebuild, then the query?19:13
lifelessanalyze archive; analyze binarypackagebuild; the query19:14
jtvlifeless: that looks like a mismatch, yes...  but that looks like a definite but19:14
jtvbug19:14
Chexlifeless: ok.19:14
jtvI mean, why a million rows there?19:14
Chexlifeless: http://pastebin.ubuntu.com/466592/19:15
lifelessoh!19:16
jtvAh, it's a highly unfortunate thing... those million rows are the Archive × PackageBuild records19:16
lifelessChex: lets do this differently.19:17
lifelessChex: analyze packagebuild; analyze archive; analyze binarypackagebuild; - outside of a transaction19:17
lifelessChex: we don't want to rollback, we want the analyzes committed19:17
lifelessI'm not 100% sure about the impact of analyze + rollback :-19:18
Chexlifeless: oh, fair enough, hang on19:19
Chexlifeless: done19:19
Chexlifeless: now try your query?19:19
lifelessnow, the explain query please :)19:19
Chexlifeless: http://pastebin.ubuntu.com/466597/19:21
lifelessok, well that fairly definitely answers that19:21
lifelessthanks19:21
lifelessdanilos: I've finished monopolising Chex for now :)19:21
lifelessjtv: I agree that that 900K loop finding 0 rows is an issue19:22
jtvIt was only expecting 2.0358 iterations there I guess.19:23
brycehnigelb, had a chance to try it out?  thoughts so far?19:23
jtvSorry, 20,35819:23
lifelessso there are lots of bpb records19:24
lifelessand pb records19:24
lifelessI guess19:24
lifelessplease tell me we haven't split out a common table to join that has 1:1 mapping to the table we filter on ?19:24
Chexdanilos: now _your_ query..19:24
danilosChex, mine should be quick ;)19:25
lifelessbigjools: are you still around ?19:26
Chexdanilos: http://pastebin.ubuntu.com/466604/19:27
Chexdanilos: note the much quicker run the 2nd time19:27
danilosChex, excellent, just what I suspected :)19:31
daniloslifeless, jtv: the times with the above query seem much better this time, see http://pastebin.ubuntu.com/466604/ :)19:31
jtvdanilos: why do you join PackageBuild twice?19:31
jtvI mean, not that I'm arguing against the speedup...  :-)19:32
danilosjtv, because I am smart :)19:32
jtvthat can't be it19:32
danilosjtv, that's the same trick we use in translations: note how packagebuild-archive join takes the most time in the original query19:32
danilosjtv, because it joins entire packagebuild with archive (across all rows); this forces postgres to avoid that so it's much faster :)19:33
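A toy illustration of the Archive × PackageBuild blow-up jtv describes, using an in-memory SQLite database. The two-table schema and the `archive` column on `packagebuild` are assumptions for the sketch. It does not reproduce the rewritten query from the pastebin, only the difference between a join with and without a usable join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE archive (id INTEGER PRIMARY KEY);
    CREATE TABLE packagebuild (
        id INTEGER PRIMARY KEY,
        archive INTEGER REFERENCES archive(id)
    );
    """
)
conn.executemany(
    "INSERT INTO archive (id) VALUES (?)", [(i,) for i in range(1, 5)]
)
conn.executemany(
    "INSERT INTO packagebuild (id, archive) VALUES (?, ?)",
    [(i, (i % 4) + 1) for i in range(1, 7)],
)

# No join condition: every archive is paired with every build.
unconstrained = conn.execute(
    "SELECT count(*) FROM archive, packagebuild"
).fetchone()[0]

# With the join condition: one row per build.
constrained = conn.execute(
    "SELECT count(*) FROM archive "
    "JOIN packagebuild ON packagebuild.archive = archive.id"
).fetchone()[0]
conn.close()
```

With 4 archives and 6 builds the unconstrained join yields 24 rows against 6 for the constrained one. The same multiplication on production-sized tables is what produces the roughly one million intermediate rows seen in the plan.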
jtvIt almost looks as if the original query intended something like this... two of the join conditions occurred double, just like with yours.19:34
lifelessits joining to workaround the split out19:35
lifelessI think the split out should not have been done at the table level: N separate tables with a common columnprefix19:35
lifelessdatabases tables are not classes :)19:36
daniloslifeless, I think both of these are much faster simply because the caches are already warm (because of your test :)19:37
danilosanyway, now I go away and will be able to sleep at night :)19:38
daniloscheers19:38
* danilos goes19:38
=== danilos is now known as daniloff
=== leonardr is now known as leonardr-afk
Ursinhasinzui, hello19:38
Ursinhasinzui, we had a bunch of oopses like https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1661XMLP111, MailingListAPIView19:39
Ursinhasinzui, I see there's a bug for this oops, bug 531371, which is already fix released19:39
_mup_Bug #531371: oops MailingListAPIView email already in use <mailing-lists> <oops> <Launchpad Registry:Fix Released by sinzui> <https://launchpad.net/bugs/531371>19:39
lifelessdaniloff: gnight. I think your query is very clever; It would be good for storm to do that for us19:40
sinzuiUrsinha, see my email about how I hate Launchpad developers who offer crap services for free19:40
Ursinha:)19:41
UrsinhaI will19:41
sinzuiUrsinha, I am sick and cannot help, the LOSAs corrupted the DB because they thought they were being nice to monte19:41
Ursinhaoh, argh19:41
lifelesssinzui: get well19:42
lifelessI forgot you were ill19:42
Ursinhasinzui, please, get well19:42
sinzuiUrsinha, There is a question tracking how to fix the data. Someone needs to kill the private project that should never have been modified19:42
Ursinhaoh, that one with the pending mailing list? I see.19:43
Ursinhathanks sinzui, sorry to ping19:45
lifelessUrsinha: so, the grouping20:23
lifelessUrsinha: does it just use the exception type, or the exception type + the string ?20:23
Ursinhalifeless, exception type + value20:26
Ursinhalifeless, bug 46126920:26
_mup_Bug #461269: oops reports should be grouped by oops signature not exception type and exception value <OOPS Tools:Triaged> <https://launchpad.net/bugs/461269>20:26
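A sketch of the difference between the two grouping keys. The log does not say how an OOPS signature is computed, so the `signature` function below is purely hypothetical: it replaces runs of digits in the exception value with a placeholder. The sample reports are invented.

```python
import re
from collections import defaultdict


def signature(exc_type, exc_value):
    """Hypothetical signature: mask digits so variable parts collapse."""
    return (exc_type, re.sub(r"\d+", "#", exc_value))


def group(reports, key):
    """Group (type, value) reports by the given key function."""
    groups = defaultdict(list)
    for exc_type, exc_value in reports:
        groups[key(exc_type, exc_value)].append((exc_type, exc_value))
    return dict(groups)


# Invented sample data.
reports = [
    ("TimeoutError", "statement took 14021 ms"),
    ("TimeoutError", "statement took 18344 ms"),
    ("KeyError", "'foo'"),
]

# Grouping on exception type + value, as described above.
by_type_and_value = group(reports, lambda t, v: (t, v))

# Grouping on the hypothetical signature.
by_signature = group(reports, signature)
```

With type + value as the key, the two timeouts land in separate groups because their values differ. With the masked signature they share one group.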
Ursinhalifeless, I was about to leave to have some food20:28
lifelessciao20:28
lifelessI'm just opportunistic on asking stuff20:28
lifelessno need to hang around for me20:28
Ursinhalifeless, okay :) anything else, just ask and I'll answer when I return20:28
lifelesskk20:28
=== Ursinha is now known as Ursinha-nom
flacostelifeless: your lp:~lifeless/launchpad/soyuz mp diff is screwed up20:28
flacosteand lifeless, I had test failures back from my ec2 land (feedparser branch)20:30
flacostethis is the fix I applied: http://pastebin.ubuntu.com/466624/20:30
flacostedo you have a better suggestion?20:30
lifelessflacoste: thats fine with me, its not hugely beautiful, but its not ugly.20:32
flacosteyeah, my feeling also, wondered if there was a better known idiom20:33
lifelessits essentially mocking; we could use an official mock, but it wouldn't be any smaller.20:33
flacosteright20:34
flacostewhat library would you recommend for mocking (unrelated to this branch, asking for another project)20:35
flacostedo you use something in bzr?20:35
lifelesswe don't routinely mock20:36
lifelessmocking has some risks20:36
lifelessand some rewards20:36
lifelessuhm20:36
lifelessfor your line there, even with a mocking library, I'd probably just do the lambda :)20:36
flacosteok20:37
lifelessfrom the school of 'simplest is often clearest'20:38
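A minimal sketch of the "just do the lambda" approach, assuming the test needs to replace one method that would otherwise touch the network. `FeedFetcher` and its methods are hypothetical names, not taken from the feedparser branch or the pastebin.

```python
class FeedFetcher:
    """Hypothetical class with one method that needs the network."""

    def download(self, url):
        raise RuntimeError("network access is not available in tests")

    def title(self, url):
        # First line of the downloaded document, stripped.
        return self.download(url).split("\n", 1)[0].strip()


fetcher = FeedFetcher()

# Shadow the method on this one instance. An instance attribute is
# not bound, so the lambda takes only the url argument.
fetcher.download = lambda url: "Example feed\nbody\n"

result = fetcher.title("http://feed.example/")
```

The replacement is visible at the call site and only affects the one instance, which is most of what a mocking library would provide for a case this small.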
lifelessflacoste: speaking of reviews20:40
lifelessI got the queue down to 020:40
lifelessfor devel anyhow20:41
lifelessah the soyuz branch is messed up because db-devel exists20:41
benjiflacoste: I've enjoyed using Gustavo Niemeyer's Mocker on another project (http://labix.org/mocker)20:43
benjimany other options at http://pycheesecake.org/wiki/PythonTestingToolsTaxonomy#MockTestingTools20:44
lifelessI much prefer verified fakes to mocks20:44
lifelessless skew20:44
lifelessbut this is not a late-at-night discussion I think; its been ... intense today20:44
flacostethanks benji20:44
benjilifeless: I suspect Mocker can do what you want; it's quite full-featured.20:45
lifelessbenji: the point is to not mock.20:45
lifelessbenji: so no, it can't :)20:45
benjiwhat do you mean by "verified fakes"?20:46
lifelessjust that20:46
lifelessa fake (not a mock or stub) that is verified to behave the same20:46
lifelessas a full implementation20:46
lifelesse.g. sqlite in-memory db's are a pretty good verified fake for disk databases.20:46
benjiok, Martin Fowler's definition of "fake"20:47
lifelessyes, I find the definition to be usefully precise20:48
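A sketch of what "verified" can mean in practice, following the SQLite example given above: the same checks run against the in-memory fake and against an on-disk database, and the results are compared. The `exercise` function and the `job` table are made up for this illustration.

```python
import os
import sqlite3
import tempfile


def exercise(conn):
    """Run the same operations against any connection."""
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO job (status) VALUES (?)", [(0,), (1,), (1,)]
    )
    conn.commit()
    return conn.execute(
        "SELECT status, count(*) FROM job GROUP BY status ORDER BY status"
    ).fetchall()


# The fake: an in-memory database.
fake = sqlite3.connect(":memory:")
fake_result = exercise(fake)
fake.close()

# The full implementation: a database file on disk.
with tempfile.TemporaryDirectory() as tmp:
    real = sqlite3.connect(os.path.join(tmp, "jobs.db"))
    real_result = exercise(real)
    real.close()

# The fake counts as verified for these operations if both agree.
behaviour_matches = fake_result == real_result
```

The verification only covers the operations that `exercise` performs. Anything outside it, such as durability across restarts, remains unverified for the fake.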
marsrockstar, ping21:01
lifelesshmm21:27
lifelessmore count(*) taking ages21:27
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1662EA43321:27
deryckok, I need to break for awhile.  Until later on then......21:33
lifelessflacoste: may need to think about making oops' critical rather than high... teams have lots of high already :)21:33
flacostelifeless: that was the idea of zerooopspolicy21:34
lifelessflacoste: I thought it said high, not critical21:34
lifelessyes, it says high on the wiki21:34
flacostehmm, ok21:37
lifelessflacoste: If the goal is 'in front of the queue', critical would seem appropriate to me.21:37
lifelessflacoste: but I wasn't part of the discussion for the policy, I don't want to just jump in here21:37
lifelesshttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1662EA447 is fun21:43
lifelessanyone have ideas on what to do with *that* ?21:43
lifelessI guess its getting the wadl ?21:44
benjihmm; I did a rocketfuel-get and now my tests fail21:46
benjiis the devel branch broken?21:46
lifelessshouldn't be21:47
lifelessUrsinha-nom: when you return, if you could regenerate the edge oops-since-utc0 page, I think I've got bugs made for most of them now21:54
lifelessrinze: please set the MP status when reviewing as well22:17
rockstarmars, sorry, hi.22:25
=== leonardr-afk is now known as leonardr
lifelessI'm -> sleep22:29
lifelessgary_poster: so you know, edge is back up to 14 seconds, 12 seconds is past the knee and unsafe22:29
gary_posterheh, ok, thanks for update22:30
lifelesshttps://lpstats.canonical.com/graphs/OopsEdgeHourly/ shows it quite graphically22:30
lifelessprod is still unhappy - https://lpstats.canonical.com/graphs/OopsLpnetHourly/ -  but the pending fixes should make a dramatic difference to that22:31
=== flacoste is now known as flacoste_afk
lifelessand your pqm hack is on my kanban todo, but I've been bouncing from thing to thing all day.22:31
lifelesson the bright side I seem to have gotten past the stomach ache part of this lurgy, so I can actually concentrate again.22:31
lifelessand with that, I bid you all asnore.22:32
gary_posterthank you and good night22:35
wgrantCan someone please ec2 land https://code.edge.launchpad.net/~wgrant/launchpad/refactor-_dominateBinary/+merge/29667? danilos tried to do it last night, but apparently ended up starting two instances for the *other* branch.23:25
nigelbbryceh: trying out now.  Do I get to confirm on the upstream tracker before it gets submitted?23:38
brycehnigelb, yes23:39
brycehnigelb, btw do you find that aspect important?  I've considered eliminating that as an extraneous step if it isn't considered important23:39
nigelbSince I'm testing now, I'd find it important.  But when I'm using the tool, I'd find it extraneous23:41
nigelbI keep getting, "Sory produce xorg in ubuntu does not exist or you're not allowed to report a bug in it" :/23:42
brycehyeah try another package.  'xorg' isn't supported, but e.g. 'xserver-xorg-video-intel' is23:42
bryceh(there isn't actually an 'xorg' package upstream, it's a non-source debian package only)23:43
nigelbahh23:43
nigelbbryceh: same error with xserver-xory-video-intel23:52
brycehhrm23:54
brycehnigelb, ok try now23:56
brycehweird, I was sure I'd fixed that already23:56
nigelbbryceh: wow, just WOW23:59
