[00:04] atm [00:04] losa pin [00:04] lifeless: heya [00:04] ^ [00:04] thumper: is it on staging ? [00:04] lifeless: staging is down [00:05] thumper: well, spmdo works miracles [00:05] anyhow [00:05] the answer is: [00:05] - to profile [00:05] a) get it on staging [00:05] b) losa ping and ask them to enable profiling. hit the page. ask them to disable [00:05] and the profile ends up on devpad 3 minutes later [00:05] thumper: if its on edge, do you have a user-oops to look at perhaps ? [00:06] no, but I can create one with ++oops++ [00:06] :) [00:07] right, I'd do that - until we have a staging-daily environment its the best you can do until it has percolated to staging [00:08] * thumper nods === Ursinha is now known as Ursinha-afk [00:21] wgrant: ping [00:39] thumper: Hi. [00:39] wgrant: hi [00:39] wgrant: IBuildFarmBuildJob says that build can't be None [00:39] does that fit with your understanding? [00:39] because the database doesn't have that constraint [00:40] thumper: Indeed it doesn't. That's a bug. [00:40] But those tables are scheduled for removal as soon as Translations gets their job fixed. [00:41] ah... [00:41] what? [00:41] Hm? [00:41] I know there was a migration from some tables to some other tables [00:41] but I thought that this is the one we migrated to not from [00:41] because the recipe stuff is using it [00:42] The migration is half done. [00:42] (at least on production) [00:42] was the recipe stuff landed this cycle? [00:42] The remainder cannot be done until Translations does the first half. [00:42] which might be on db-devel only [00:42] Last cycle. [00:42] Production has the current build farm schema. [00:43] if this is a table that needs to go, and recipe stuff should not be using it, and it should have landed last cycle [00:43] why are we getting oops 1705EB2401 [00:43] Recipe stuff is using it. [00:44] but it shouldn't be? [00:44] It is part of the old queueing model. [00:44] SPRBs and BPBs have been ported to the new one. [00:44] But they still use the old one too. [00:45] Since TTBJs use the old one. [00:45] rockstar: btw, I'd really like us to keep the approval means 'human says ok' and use Queued for tarmac. [00:45] Once TTBJs use the new one, we can quickly stop using and remove the old one. [00:45] rockstar: really really really like us to do that. [00:47] thumper: So, the table is still in use, but is redundant. [00:49] wgrant: do you know if the TTBJs have been moved? [00:50] They haven't. [00:50] I was hoping it would happen this cycle. [00:50] But apparently not. [00:50] * thumper adds reminder to poke danilos [01:07] wgrant: whats DistroSeries:+templates all about ? [01:07] reminder to self; nag stub about PPR reliability, staging. [01:09] lifeless: Isn't that Translations? [01:09] ah, that would make sense [01:09] 29.25 - 99% completion time [01:10] How is that possible? [01:10] How can the 99% be above the timeout? [01:10] the magic of science [01:10] easily === Ursinha-afk is now known as Ursinha [01:28] hm [01:28] does someone remember where that dead easy graph generator was? [01:29] where you could graph any number on any website? [01:29] webnumbr? [01:29] poolie: yes, thanks [01:50] wgrant: around ? [01:52] wgrant: were you going to poke at https://bugs.edge.launchpad.net/soyuz/+bug/618372 or is it my imagination ? [01:52] <_mup_> Bug #618372: Distribution:+search slow 50% of requests [01:54] lifeless: I don't think I was doing anything about that. 
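The IBuildFarmBuildJob point above (an interface that declares build can never be None while the table carries no such constraint) is the kind of interface/schema drift a small sketch makes concrete. This is a rough illustration only; the names, field type, and DDL below are assumptions, not Launchpad's real declarations:

    # A minimal sketch of the mismatch, assuming zope.interface/zope.schema.
    # The interface, field type, and DDL are illustrative stand-ins, not
    # Launchpad's actual IBuildFarmBuildJob declarations or schema.
    from zope.interface import Interface
    from zope.schema import Int


    class IBuildFarmBuildJobSketch(Interface):
        """Promises that every row has a build."""

        build = Int(title=u"The related build", required=True)  # "can't be None"


    # ...while the hypothetical table never enforces that promise:
    CREATE_TABLE_SQL = """
        CREATE TABLE BuildFarmBuildJobSketch (
            id serial PRIMARY KEY,
            build integer REFERENCES Build(id)  -- no NOT NULL, so NULLs slip in
        );
    """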
[01:55] I did look and cringe at the code and views a few weeks ago, but that was for other reasons. [01:55] wgrant: care to be enticed ? [01:58] I have four projects due tomorrow... so no, sorry. [01:58] wgrant: meep. get on them :) [01:58] Heh, maybe. [01:58] wgrant: Did you see this one - 19997.01launchpad-main-masterSELECT COUNT(*) FROM Bug WHERE Bug.private = FALSE [01:58] bah [01:58] thats 1 9997 [01:58] on https://api.staging.launchpad.net/1.0/bugs?assignee=xxx [01:59] Hah. [01:59] Awesome. [01:59] Any idea what's causing that? [01:59] not yet. [02:00] but I LoLed [02:00] wgrant: also, https://dev.launchpad.net/ArchitectureGuide may be interesting for you; as always I seek feedback on these things. [02:05] Hmm. [02:06] wgrant: as I say in the presentation, the metrics are noddy [02:06] but until we try, we won't have any [02:07] Launchpad's never really done the whole metric thing. [02:07] So it's a good start. [02:16] wgrant: so, distro:+search [02:16] it was the top timeout yesterday [02:16] Hm, that's good, considering how many other timeouts there used to be. [02:16] 57 / 13 Distribution:+search [02:17] 44 / 114 CodeImportSchedulerApplication:CodeImportSchedulerAPI [02:17] anyhow [02:17] you said it was terrible and should use <<>>> [02:17] did you have some advice for what it should use ? [02:18] wgrant: yes, we're getting the first rung of omg out of the way [02:18] wgrant: I'm nearly ready to lower the timeout again [02:19] in fact, I'll throw that up now [02:20] What are they sitting at now? [02:20] lpnet is 17 [02:21] I'll leave edge for now [02:21] hmm edge had 29 timeouts [02:21] drop both [02:21] Yup. [02:22] It would be nice to see how much the two cache tables actually win us. [02:23] Because the query there isn't really using them. [02:25] I can't see what's generating that query. [02:25] I'll poke some [02:25] perhaps do-from-scratch [02:25] Oh, right, there. [02:26] Distribution.searchBinaryPackages, with exact_match=True. [02:26] At that point it's really not using the only beneficial part of the cache. [02:26] (binpkgnames) [02:27] IIRC that search is just about the only user of the tables... so if you can make it work without them we can delete lots. [02:33] thumper: ping [02:34] spm: ping [02:34] lifeless: heyo [02:34] hey [02:35] that config change didn't merge [02:35] or something [02:35] and I have more now [02:35] also the diff updater is slow? [02:35] by which I mean, please check it doth not need shooting. [02:35] yeah. :-) started looking then got an alert that calamansi has gone awol - just chsing that down atm. I've decied I'm going to relearn forth. the stack based thing is more in tune with my reality/ [02:36] you could go the musical route [02:36] learn fifth [02:36] * lifeless boom tishes [02:36] * spm stares blankly at the screen and conceeds the point to lifeless [02:37] spm: I'm pretty sure the MP daemon is ded [02:37] 10 mins and 5 deep in interrupts. le sigh :-) [02:37] :< [02:37] are they all LP ? [02:38] alas no === Ursinha is now known as Ursinha-afk [02:39] yeah. looks wedged. about 30 mins. killing... [02:40] 20 mins. not 30. [02:40] lifeless: What are your performance targets? [02:40] For timeout and 99%. 
[02:40] wgrant: the same, and both were in the vision document ;) [02:41] wgrant: the immediate target is 5 seconds for 99% of requests [02:41] the long term one is 1 second, with hard timeout still at 5 seconds [02:41] (so right now they are the same, once we get below 5 seconds across the board it will be different) [02:41] lifeless: I didn't read the presentation. [02:42] wgrant: :P [02:42] Right, I was wanting the long term one. [02:42] wgrant: not even 2 months back ? [02:42] (I read the presentation just after the Epic) [02:42] Yeah. [02:42] yeah [02:42] lifeless: well you'll be pleased to know that it was either you or maxb that caused this problem <== science-less accusation [02:42] spm: ^^ [02:42] but those mp's should now be processed :-) [02:43] thanks [02:44] spm: ok, https://code.edge.launchpad.net/~lifeless/lp-production-configs/timeouts/+merge/34042 is updated; can you please (I know you're busy) eyeball the updated diff, so i can start the 1 hour lead-in to make the change happen ? [02:44] sure sure [02:45] lifeless: +1'd [02:46] lifeless: actually - I assumed that we have already dropped to equivalent values on staging as a trial? [02:46] staging is set to 10 seconds [02:46] the data I am going off is the last few days of oops reports for lpnet and edge [02:46] sweet; re-confirming the +1. ta. [02:46] because not enough people hammer staging for it to mean anything [02:47] :-) [02:47] I have a selection of hammers here I'm happy to lend? [02:47] small sledge, ball pein, couple of reular claws; japanese chisel (my personal fave); few others.... [02:48] rubber ducky, you're the one === Ursinha-afk is now known as Ursinha [02:51] lifeless: pong [02:52] thumper: I want your +1 on lowering the timeouts [02:52] thumper: since the new policy seems to say noone can JFDI even with sysadmin agreement [02:53] thumper: the branch which will need to be deployed is running through now (its not an LP branch) [02:53] lifeless: lowering which timeouts to what? [02:53] lpnet and edge hard timeouts, by 1 second each [02:53] do it [02:53] +1 [02:54] thank you [02:54] spm: that is in pqm now, all going well in ~ 60 minutes I'll be asking for a deployment to all appservers. [02:54] spm: if that is convenient [02:55] * spm consults the diary [02:55] spm: I can use the magic words if you want [02:55] "NOW"? [02:56] spm: NOW, Knave! [02:56] Knave. I like it. [02:56] but actually, NOW+60 :P [02:56] ${NOW}+60? [02:56] :> [03:07] what happened between 1500 and 1600 graph time yesterday ? [03:07] https://lpstats.canonical.com/graphs/OopsEdgeHourly/ [03:58] spm: help [03:58] zope.configuration.xmlconfig.ZopeXMLConfigurationError: File "/home/pqm/pqm-workdir/home/---trunk/launchpad/script.zcml", line 7.4-7.35 [03:58] ZopeXMLConfigurationError: File "/home/pqm/pqm-workdir/home/---trunk/launchpad/lib/canonical/configure.zcml", line 80.4-86.10 [03:59] ImportError: No module named debian [03:59] I got that merging the config change [03:59] make: *** [lib/canonical/launchpad/apidoc/index.html] Error 1 [03:59] blink [03:59] It means the pqm chroot's packages are out of date [03:59] * spm holds head in hands and cries a little. on the inside at any rate. [03:59] spm: can you smack or arrange a smack, of apt-get update; apt-get-upgrade in it ? 
[04:00] yah, one sec === Ursinha is now known as Ursinha-zzz [05:05] spm: hi [05:05] spm: https://bugs.edge.launchpad.net/soyuz/+bug/618372/comments/6 [05:05] <_mup_> Bug #618372: Distribution:+search slow 50% of requests [05:05] can you please rnu the query in that link on a prod slave [05:05] I've checked staging already, its 10x slower than stub found [05:06] I just need the \timing for it [05:09] db tuning: funrollloops for adults [05:11] wgrant: whould would searchBinaryPackages use the dspc ? [05:16] wgrant: also Distribution.searchBinaryPackages interface docstring lies! [05:30] lifeless: Why wouldn't it? [05:31] Ah. [05:31] It returns DSPCs, rather than DSPs? [05:31] yes [05:31] so its not as simple as 'make a DSP returning function' [05:31] lifeless: ew. fun. [05:32] spm: pweese [05:32] wgrant: distroarchseries searchBinaryPackages doesn't, it does something different [05:33] wgrant: I don't understand why they have or need different code [05:33] wgrant: though you may hate das - it uses BPPH, or is that ok ? [05:34] lifeless: Well, it might be that simple if you change the callsites to not be insane. [05:34] "may hate das"? [05:34] Oh. [05:34] DAS [05:34] Right. [05:35] lifeless: If DAS's implementation is fast, it's fine. [05:35] its not on the top candidates page [05:35] I'll shove it into D [05:36] DSPC is basically just a precalculated version of that join. [05:36] lifeless: prod1. Time: 173.932 ms [05:36] spm: grahfuck [05:36] repeats are 50ms... [05:36] spm: what pg version is staging running ? [05:36] 8.3 [05:36] ok [05:37] so there is something really quite different there >< [05:37] spm: that was the one in my comment, comment 6 ? [05:38] yup [05:39] thanks, appreciated. [05:39] * lifeless subscribes stub again [05:44] wgrant: also, DAS has no code reuse at all. sob. [05:45] lifeless: is it worse that foo.specifications ? [05:45] *than [05:46] yes [05:46] specs at least has a mix in I could pull code into [05:46] impressive [05:46] DAS was stuck between Soyuz and Registry for a while. [05:46] It's not well-loved. [05:46] it appears positively hated [05:46] but fast (or unused) [05:47] https://edge.launchpad.net/ubuntu/+search?text=mplayer is still boom on prod [05:48] oh and thats interesting [05:48] distroseries search is totally differnet [05:48] \o/ [05:48] * lifeless needs a drink [05:51] wgrant: is that a bug? [05:52] anyhow, there would be a different if I drop DSPC - rather than one row per source package it would be N rows - one per binary [05:52] so, I'm going to assume thats actually the desired win for now [05:52] and try and ask bigjools about this later [05:53] lifeless: Can't you just get distinct DSPs? [05:54] wgrant: *think* [05:54] lifeless: Hm? [05:55] wgrant: oh, hmm. [05:55] yes, should be doable [05:55] DAS.sBP returns BinaryPackageRelease, [05:56] wgrant: you asked about 99% stuff and timeouts before [05:56] wgrant: I forgot to answer [05:56] the time at which 99% of requests complete can be over the timeout if either: [05:56] - enough requests fail (e.g. 2%) [05:57] - the requests are completing with high durations as soft time outs due to bugs in the timeout trapping code (or no queries happening after the timeout) [05:57] stub: hi! [05:57] stub: got a few minutes? [05:57] lifeless: I don't see how it's possible unless the timeout stuff is buggy. [05:57] Since nothing can finish after the timeout -- it would have timed out. [05:58] wgrant: the timeout code works by checking for a timeout at predefined points. 
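A rough sketch of the checkpoint idea described above: before each query, compute how much of the request budget remains and hand it to PostgreSQL as statement_timeout, so the database aborts statements that would blow the budget. This assumes psycopg2; the 5000 ms budget, DSN, and helper names are illustrative, not Launchpad's actual timeout machinery:

    # Sketch only: at each checkpoint, work out the remaining request budget
    # and set it as statement_timeout before running the next query.
    import time

    import psycopg2

    HARD_TIMEOUT_MS = 5000  # illustrative budget


    class RequestTimedOut(Exception):
        pass


    def execute_with_budget(cursor, started_at, sql, params=None):
        remaining_ms = int(HARD_TIMEOUT_MS - (time.time() - started_at) * 1000)
        if remaining_ms <= 0:
            # The Python-side checkpoint: give up before issuing another query.
            raise RequestTimedOut()
        # The server-side guard: PostgreSQL aborts statements that run longer
        # than this (with the lock-wait caveat noted above).
        cursor.execute("SET statement_timeout = %d" % remaining_ms)
        cursor.execute(sql, params)


    conn = psycopg2.connect("dbname=launchpad_dev")  # illustrative DSN
    cur = conn.cursor()
    start = time.time()
    execute_with_budget(
        cur, start, "SELECT COUNT(*) FROM Bug WHERE Bug.private = FALSE")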
[05:58] currently that predefined point is 'sql queries within publication' [05:58] yo [05:58] lifeless: Ah, so an SQL query won't be terminated? [05:58] wgrant: it will, because we tell the server the remaining time [05:58] Ah. [05:58] wgrant: but there are caveats - pg doesn't cancel queries until you get the locks yoiu're waiting on [05:59] wgrant: and if stop querying and go into python, or librarian, or email time, you won't time out. [05:59] these are bugs. they will be fixed. [05:59] Ew. [06:00] stub: so, I have a bug you looked at that puzzles me on staging; https://bugs.edge.launchpad.net/soyuz/+bug/618372 and... [06:00] <_mup_> Bug #618372: Distribution:+search slow 50% of requests [06:00] stub: PPR is failing very often - it seems to run sodium into swap, which triggers a meltdown [06:00] and [06:00] stub: staging is failing waiting for a slony timeout, so its now not updated since sunday. [06:01] stub: I have no idea how these slot into your priorities [06:01] I got the PPR email. I haven't seen it go that big. We can fix that but it will be slower generating reports (how much? Dunno until it is done) [06:02] wgrant: so its not hugely ew; unless we raise a signal right on the timeout its always going to be a little fuzzy [06:02] Staging probably means the slon daemons died. Restarting the process should sort things, but I was going to have a look now. [06:02] wgrant: (and doing that has *serious* hair on it) [06:02] stub: its died the same way three times in a row [06:03] wgrant: so if its approximate, even without massive-overruns, all you need is 2% timing out and the 99% point will be outside the timeout. [06:03] ok. no idea what happened then - the email I had wasn't enough to diagnose. [06:04] stub: I say this because I've seen the same pastebin contents +- the list of exact what is attached, on tuesd morn, wed morn & today :) [06:08] That is odd, because the only log on the system is from the pastebin I got emailed yesterday. [06:08] So the restore has been attempted once and it failed, or the logs are disappearing or this was from manual runs [06:12] is https://pastebin.canonical.com/36551/ what you got yesteday ? [06:12] that happened overnight from my perspective [06:13] oic. This is a code only update, and the slon daemons have died for whatever reason so it can't apply new patches to the existing database. [06:14] So the code only update needs to bouce the replication daemons to ensure they are running [06:14] jtv: bug 618393 - you say it would be hard to do for individual translators - could you expand why? it might be like the one I did for Bug.userCanView (which grants EDIT) [06:14] <_mup_> Bug #618393: TranslationGroup:+index slow 1-2% of requests timing out [06:14] brb [06:15] spm: /srv/staging.launchpad.net/staging/launchpad/database/replication; LPCONFIG=staging ./slon_ctl.py stop; LPCONFIG=staging ./slon_ctl.py start [06:16] spm: I've run that so the code update should work now. [06:16] stub: context? that needs to be in the staging restore script? [06:16] oh right. ta. I'll rm the lock [06:16] spm: So we must not have a nagios check on the staging replication lag [06:17] cat /etc/nagios/nrpe.cfg :-) [06:17] nope === jamesh_ is now known as jamesh [07:04] spm: Do you know if there is movement on the PG 8.4 on staging request? [07:05] stub: that's stuck on me atm, and no is the short answer; I plan on going into an irc hole of /ignore * tomorrow to make it happen - get the packages ready as in. [07:05] Ta [07:06] ... 
not helped by codebounce wanting all the swap it can get it's grubby mitts on [07:06] spm, lifeless: One of the things that is improved every PG release is the query planner. Some of the issues we are looking at the moment might just evaporate. [07:06] ^^^ boundless optimism on display from the DBA [07:06] (Oooo! it rhymes!) [07:07] If I can keep 'that should be fixed in the next release' up until retirement, I'll be sweet. [07:07] stub + sweet. the mind rebels. === henninge_ is now known as henninge [08:52] good morning [09:03] spm: have you EOD'd ? [09:03] I'm still getting ZopeXMLConfigurationError: File "/home/pqm/pqm-workdir/home/---trunk/launchpad/lib/canonical/configure.zcml", line 80.4-86.10 [09:03] ImportError: No module named debian [09:03] landing the config change [09:03] losa ping ^ [09:05] '/home/---trunk/' [09:05] ---? [09:05] pqm glue [09:05] Yay. [09:06] wgrant: do you recall the cause of that error ? [09:06] wgrant: its stale package versions right ? [09:06] mthaddon: ping [09:06] one sec - in a meeting [09:06] mthaddon: I'll write here, take it up when you get free :) [09:07] the above error is occuring trying to land a change to launchpad-production-config; AIUI landing those configs does a full buildout, so the remaining cause is a stale apt package for (IIRC) python-apt, or something like that. [09:08] There should be new packages in CAT; spm had updated this earlier today, we thought. [09:08] lifeless: Or stale sourcedeps. [09:08] I would deeply deeply appreciate it if we can fix this so that the production-configs change can land [09:08] Hm, actually, python-debian is a package now. [09:09] So, yeah, old version of the package. [09:09] so it may be that PQM is pulling in an old sourcedep [09:09] e.g. if for some reason its using a config-manager config rather than buildout [09:09] (I suspect it may be) [09:09] then, that config may be stale and need to be updated to use the python-debian package rather than the sourcedep. [09:09] lifeless: sorta - was on the weekly with tom [09:10] I will pop back later to offer what assistance I can [09:10] spm: mthaddon: sorry for interrupting the meeting [09:10] so what update to launchpad-dependencies do we think will fix this? [09:10] oh right. so we updated the packages in the chroot for pqm on prasé that should have pulled in the latest hotness of everything [09:10] so, its not the packages [09:11] its probably the manner in which it sets up 'sourcecode' ? [09:11] mthaddon: https://pastebin.canonical.com/36602/ is the list of what was updated [09:12] I don't see python-debian there [09:13] (pqm-hardy)pqm@praseodymium:~$ dpkg -l | grep python-apt [09:13] ii python-apt 0.7.4ubuntu7.5 Python interface to libapt-pkg [09:13] er... [09:13] no [09:13] python-debian [09:13] sorry, wrong package [09:13] ii python-debian 0.1.9 python modules to work with Debian-related d [09:13] whats that launchpad-dependencies package version ? [09:14] 0.72~0.IS.8.04 [09:14] current is 0.81 [09:15] that's not been installed on any of the servers yet [09:15] but nothing in the changelog looks relevant [09:15] and if this works on the servers its clearly not the issue [09:15] mthaddon: can you pastebin me the pqm stanza for production-configs ? [09:15] hrm. that bb amis rt I'm working on may be relevant to that version.... [09:15] yeah, just finding it now [09:16] hrm. no. 72 should be fine. [09:18] wgrant: can you check your sourcecode dir [09:18] tell me, do you have a python-debian subdir ? 
=== danilo_ is now known as danilos [09:20] lifeless: I don' [09:20] t [09:20] wgrant: ah, you're on a recent release [09:21] wgrant: look in lib, it has deb822 and debian symlinks [09:21] lifeless: Yeah, I'm not sure they're used any more [09:21] they are new [09:21] rev 11324 [09:22] Ah. [09:28] mthaddon: spm: Thanks for the help. [09:28] sure [09:28] stub: when is pg8.4 realistically happening; I don't want to wait another cycle to fix the batch of issues. [09:29] stub: but I could; I *really* don't want to wait another 2 cycles. [09:30] lifeless: It will take 1 week after we are happy with staging - swap over one box per day. [09:30] stub: Rather than waiting for the query planner to magically be fixed, shouldn't we be reporting problems? [09:30] lifeless: It is on a separate cycle to rollouts [09:30] lifeless: np [09:30] stub: ok, thats good to know. [09:32] wgrant: I haven't actually seen problems with the query planner. I have seen issues where it is not smart enough. [09:32] wgrant: if the planner isn't smart enough in 8.4 they would reasonably say 'try 9' [09:33] wgrant: we should look at bringing up a trunk pg box to do such evaluations on [09:33] but not just yet [09:33] stub: Well, one could argue that a completely dumb planner is just not smart enough. [09:34] ok, pqm is now trying that -again- [09:34] we'll know in ~ an hour [09:34] wgrant: Yes. But it isn't like upstream is doing nothing - I mentioned PG 8.4 before *because* of the insane speed upstream moves at. [09:34] wgrant: Its been rare to trip over a PG issue that hasn't already been fixed. [09:35] stub: Ah, I see. [09:35] ... twitter: When you click on these links from Twitter.com or a Twitter application, Twitter will log that click. We hope to use this data to provide better and more relevant content to you over time. ... [09:36] not evil. no, not at all. [09:36] I think the issue with GIN indexes is one that hasn't been dealt with that will help us [09:36] And there is an open bug on that. [09:37] lifeless: Doesn't just about everybody do that now? [09:37] Google, Facebook... [09:37] lifeless: Which is pretty stupid, as every twitter client I have seen goes through a URL shortening service. [09:38] noodles775: Bug 628427 looks relevant. [09:38] <_mup_> Bug #628427: oops on /builders - LocationError(SourcePackageRecipeBuildJob, 'build') - BuilderSet:+index [09:38] (morning) [09:39] * noodles775 looks. Hi :) [09:39] stub: they are going to unwrap it [09:39] stub: store it, and rewrap it, AFAICT [09:42] wgrant: Indeed, I've updated 628239. [09:45] lifeless: Assuming URL shortening services play ball. [09:45] stub: indeed, but they just announced this so I assume their ducks are lined up [09:45] I guess difficult not too [09:46] I heard some time ago they were planning this? [09:47] yeah [09:48] I mean they've done their 'mail all users' anouncement [09:48] I recall discussion on using this to bypass character limits, so http://someservice.com/lets/have/our/tweet/in/a/url would... erm... I forget why. [09:53] noodles775: The expandy bit for PPA source packages lists binaries (without links), then later links to all of the files. Is there any particular reason that it doesn't instead have "(i386) (amd64) (lpia)" links next to each binary package? [09:53] It would save space and probably be more understandable. [10:06] bigjools: Do you know much about lp.archivepublisher, specifically its test suite? [10:06] allenap: lots. 
But I am OTP, I can talk in 30m [10:06] bigjools: Cool, thanks :) [10:07] allenap: or you can grab one of my esteemed team [10:09] bigjools: Sure, anyone in particular for lp.ap? My questions are about test layers and possible weird interactions with zope.component. [10:19] jelmer: Do you have time to help me with a problem in lp.archiveuploader? [10:19] allenap, sure [10:20] jelmer: In 4 tests of test_pool.TestPool, I get "Could not adapt" errors for some new cachedproperty code I've done. [10:21] jelmer: http://pastebin.ubuntu.com/487147/ [10:21] It's a plain TestCase. [10:21] It has no layer. [10:21] So no Zope. [10:21] wgrant: Yeah, but the adapters are also registered with the global site manager when propertycache is imported. [10:21] allenap: Is there a global site manager? [10:22] wgrant: Always. [10:22] wgrant: Well, zope.component.getGlobalSiteManager() will always return something :) [10:23] Hmm. [10:23] wgrant, jelmer: One oddity is that these tests run fine locally. On EC2 they fail. I think there's something weird going on when the full suite is run. [10:24] allenap: you could consider nuking the zope aspect of this [10:24] allenap: I don't think it would make any difference to the spelling [10:24] lifeless: Of propertycache? Yeah, that's on my mind :-/ I'd like to try and understand why it's not working if I can. [10:25] lifeless: Yeah, agreed. It's just very frustrating and confusing :) [10:25] allenap: I'm at a loss so far, my thought would also be that it's related to layers but then it shouldn't work locally either... [10:27] jelmer: Yeah, bloody odd. Okay, I'll email the list to see if anyone has inspiration, and remove the Zope-ish bits so this can land. Thank you :) [10:28] Graaaaah. [10:29] archiveuploader, your lack of test coverage and corresponding missed brokenness distress me. [10:31] The relevant test elides the important bit. [10:31] Gah. [10:32] wgrant: Gah indeed :) [10:33] wgrant: 'gahk' [10:35] \o/ [10:35] lifeless: That too, if it's the sound of choking on doctests. [10:35] https://bugs.edge.launchpad.net/ubuntu/+bug/1 didn't time out [10:35] <_mup_> Bug #1: Microsoft has a majority market share showing all 1327 pages did [10:36] Which pages? [10:37] comments [10:37] blah [10:37] I knew what I meant [10:37] Ah. [10:37] They should be paginated :( [10:37] lets see what the query count is down to [10:39] hmm [10:39] 1327 comments in the summary [10:39] comment 1350 is the end one shown [10:39] methinks there is a bug [10:40] anyhow, at least that is down to 408 queries :P [10:41] hmm, assignments still is right on the edge [10:41] allenap: hi, did someone answer your question? [10:41] At least 76 queries issued in 13.06 seconds [10:42] bigjools: Yeah, jelmer helped out. It baffled him too, so I'm taking lifeless's advice, which is to remove the Zopeish bits for now. [10:42] ooh thats nice, bug index in 1.17 seconds now [10:42] allenap: I didn't really pay attention to what you were talking about, so that doesn't mean much to me :) [10:42] for a more regular bug [10:43] lifeless: There may be "deleted" comments... I /think/ they're still counted so that urls are stable. [10:43] allenap: ahh [10:43] anyone else feel bug pages are a bit snappier today ? [10:43] on edge, since it deployed [10:44] bigjools: I'll explain if you're interested? Part of my plan was also to write to the list so you can wait for that if you want. 
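The "Could not adapt" failures discussed above come down to whether an adapter registered with the global site manager at import time is still visible in a test that has no Zope layer. A minimal sketch of that registration pattern, with made-up interfaces rather than the real propertycache code:

    # Import-time registration against the global site manager, assuming
    # zope.interface/zope.component. IThing, ICache, and DictCache are
    # stand-ins for the propertycache adapters discussed above.
    from zope.component import getGlobalSiteManager
    from zope.interface import Interface, implementer


    class IThing(Interface):
        """Something we want a per-object property cache for."""


    class ICache(Interface):
        """The cache interface being adapted to."""


    @implementer(ICache)
    class DictCache(object):
        def __init__(self, context):
            self.context = context
            self.data = {}


    # Runs when this module is imported -- no ZCML, no test layer required,
    # which is why a plain TestCase should still be able to adapt.
    getGlobalSiteManager().registerAdapter(DictCache, (IThing,), ICache)


    @implementer(IThing)
    class Thing(object):
        pass


    # Adaptation by calling the interface; this is what raises
    # TypeError("Could not adapt") when the registration has gone missing.
    cache = ICache(Thing())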
[10:44] allenap: oh ok I'll wait, don't explain twice [10:45] bigjools: Cool :) [10:47] all of bug 1's comments can be retrieved in 2.5 seconds [10:47] <_mup_> Bug #1: Microsoft has a majority market share wgrant: ^ [10:47] lifeless: Wow! [10:47] wgrant: theres not reason to paginate, we should be able to spit it out easily [10:47] SQL time: 2550 ms [10:47] Non-sql time: 14016 ms [10:47] Total time: 16566 ms [10:47] Statement Count: 408 [10:47] Impressive. [10:48] with some more tuning [10:48] I reckon we can take that under 2 seconds sql [10:48] next step is a profile on staging, which will be tomorrow - need the patch in db-stable [10:49] it also does some daft stuff - grabs all the official bug tags for all related things [10:49] rather than joining and getting the relevant ones [10:49] thats trivial, but still linear vs log [10:52] mthaddon: we run the default threads per appserver right? [10:52] mthaddon: what would be in the impact on you & deployment if I asked for a change to 1 thread per appserver instance and 4 times the appserver instances. [10:52] mthaddon: same hardware. [10:53] lifeless: it'd be a real maintenance headache (4 times as many initscripts), but it'd be doable [10:54] mthaddon: there is mounting evidence that we're having cross thread interaction [10:54] not bugs, just python being python [10:54] you're saying python can't do threads? [10:54] python can't do threads [10:54] well known fact [10:54] * mthaddon gets a poster made up [10:55] Well, CPython can't do threads. [10:55] if we were primarily waiting on db, it wouldn't matter [10:55] but reality is we're getting that sorted fairly well [10:55] wgrant: I'm not sure I'm visualising what you mean (re. the build/package links on the package details expander) [10:55] i expect db load to go down over the next few months [10:56] noodles775: I want to remove the binaries from the file list, replacing them with links in the 'Built packages' section. [10:56] wgrant: yes, thats true. But its what everyone means when they don't say 'python the specification' [10:56] except perhaps 4 or 5 people [10:57] Heh. [10:57] lifeless: we benched this in montreal as part of the splitit project, and certainly at the time, there was no evidence that we were being hurt by gil contention [10:57] That will hopefully change soon. [10:57] elmo: I didn't say anything about gil contention [10:57] while, I'm sure a lot's changed since then, this is not the kind of change I want to run in without some evidence not only that it won't hurt but that it'll help [10:57] elmo: I'm talkin about gil serialisation which is the primary gil issue [10:57] lifeless: then s/contention/problems with python and threads/ [10:58] elmo: naturally. I'd want an oops-per-instance report; then try one server with the different config and see if the oops rate changes more than due to the splitting of the instances [10:58] plus PPR stats of the same [10:58] what I mean is, specifically, we messed around with number of app servers, number of threads per app server quite a lot [10:58] elmo: what application [10:59] elmo: because if it was the SSO the results are meaningless for this [10:59] it was SSO and bugs [10:59] bugs? 
[10:59] bugs are problems in software, but that's not important right now [10:59] ok [10:59] lifeless: no, seriously, I mean malone/bugs.launchpad.net - whatever you want to call it [11:00] so the reason the SSO stuff isn't relevant is that it isn't pushing 15K responses around [11:00] elmo: I didn't know you benched malone [11:00] elmo: do you have the data -raw or massaged- for me to look at? [11:01] elmo: when a request takes 5 seconds to render, most of that time is pure bytecode interpretation [11:01] which is primarily serialised [11:01] so a 7 second request with 2 seconds db and 5 seconds render [unloaded] [11:02] means, effectively, ~4 seconds during which the other threads can do nothing - but its not in one big block because the GIL gets released [11:02] lifeless: I'm sorry, i don't have any of the data to hand - what there is may be on the wiki if you search for 'splitit' [11:02] right [11:02] so wait, how is this not about contention then? [11:02] if the timeout is near the requests actual render time then, two of these at once will time each other out. [11:03] i.e. what are you trying to solve by having one app server? [11:03] parallelism [11:03] responsiveness more precisely [11:04] so, the problem with all of this is that app servers are not exactly slim and svelte [11:05] 8056 launchpa 20 0 814m 510m 9512 S 19 8.5 729:46.99 /usr/bin/python -S bin/run -i lpnet7 [11:05] elmo: much of the footprint is per-thread, in theory. [11:05] we can't actually have 16 of those [11:05] elmo: of course not. [11:05] well we'd want 16? [11:05] if you run up a single thread appserver and put it under load, you can see how big it will be [11:06] we don't want 16 of *those*, we want N of *those/N* where N is the current threadcount (4 for launchpad) [11:06] they won't drop quite linearly in size [11:08] elmo: https://wiki.canonical.com/TaskForce/SplitIt/PerformanceSprint doesn't talk about bugs at all, only shipit and sso :( [11:08] elmo: a key finding: but the average response time increase as the number of concurrent users increase. [11:09] wgrant: I don't see how you could move all the "Package files" (ie. 3 per build) next to the one built package? (can you scribble on a screenshot? it might make this easier. I've confidence that what you want will make sense ;) ). [11:09] elmo: thats exactly what I'm worried is strongly affecting LP [11:09] noodles775: My though was to stick '(i386) (amd64) (lpia)' links next to each package name. [11:10] wgrant: OK, what about when the package files are no longer published (ie. we'd want the builds to still be linked)? [11:12] elmo: ok thanks I read the wrapup [11:12] elmo: what I read correlates with the theory I'm putting forward [11:12] elmo: if I'm right, even halving the thread count and doubling the appserver count would be an improvement [11:13] elmo: there is no suggestion, in the benchmarks, that you tried 1 or 2 threads per appserver. [11:15] noodles775: http://williamgrant.id.au/f/1/2010/listing-archive-extra.png [11:16] noodles775: They're not the build links. [11:18] wgrant: Yes, I find that more intuitive (thinking as a user after the binary package). [11:21] noodles775: Right. I'm looking at this since a friend complained how confusing it was. [11:24] wgrant: as a user, I'd want those links (to the debs) to be available from the main PPA page. There's certainly room for a column with the built debs (when available) at https://edge.launchpad.net/~wgrant/+archive/experimental [11:25] noodles775: Well, PPA +index needs a bit of a rethink. 
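The threads-per-appserver argument above rests on CPython serialising bytecode execution on the GIL, so two CPU-heavy renders in one process stall each other while separate single-threaded processes would not. A toy experiment showing the effect (illustrative workload only, not Launchpad rendering):

    # The same CPU-bound "render" run in two threads versus two processes.
    # Under CPython the threaded run takes roughly as long as doing the work
    # serially, because bytecode execution holds the GIL; separate processes
    # overlap fully. Numbers vary by machine.
    import time
    from multiprocessing import Process
    from threading import Thread


    def fake_render(n=3000000):
        total = 0
        for i in range(n):
            total += i * i
        return total


    def timed(label, workers):
        start = time.time()
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print("%s: %.2fs" % (label, time.time() - start))


    if __name__ == "__main__":
        timed("2 threads  ", [Thread(target=fake_render) for _ in range(2)])
        timed("2 processes", [Process(target=fake_render) for _ in range(2)])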
[11:26] wgrant: indeed... there was some thought put into it about 6 months ago, but never the time to actualise it :/ [11:34] Can I change PPA key generation to always use 'Launchpad PPA for '? At the moment it will be named after the display name of your first PPA and then shared with the rest, which is almost always not what you want. [11:34] +1 [11:34] bigjools: briefly, before I crash [11:34] bigjools: distro/+search [11:34] yarp? [11:34] distro/series/+search [11:35] distro/series/arch/+search [11:35] all return different objects [11:35] the /arch/ one I can understand [11:35] distro/+search is performing very badly [11:35] near-top timeout [11:35] by volume, yesterday [11:36] I'm wondering if you want them harmonised, and if so which objet - DSP or DSPCache is appropriate (assuming that the performance is acceptable either way) [11:37] lifeless: I'd need to think about it. I don't recall the use cases for /arch and /series and it might be useful to talk to Ubuntu folks first. [11:38] Do we log who uses pages? [11:38] bigjools: so the thing I really care about is DSP/DSPCache, and AFAICT they are meant to be approx equivalent in UI [11:38] lifeless: I think DSPCache was invented for performance reasons, if we can do without it I'd be ecstatic since the script that generates the cache only runs daily. [11:38] wgrant: effectively, yes. [11:38] bigjools: ok, I will have a fiddle on staging tomorrow [11:38] bigjools: also I have a favour to ask [11:41] lifeless: can I breathe yet? :) [11:41] bigjools: the favour is this: have a look at [11:42] https://lpstats.canonical.com/graphs/OopsEdgeHourly/ [11:42] and [11:42] https://lpstats.canonical.com/graphs/OopsLpnetHourly [11:42] about 70 minutes after mthaddon finishes the timeout CP [11:42] I was going to monitor this myself, but an orthogonal issue prevented the CP until now. [11:42] what are you CPing? [11:42] if it has gone ballistic, roll it back [11:42] bigjools: 1 second timeout drop [11:43] /o\ [11:43] ok [11:43] on soft or hard or both? [11:43] hard [11:43] soft gets tweaked a little to match other reporting tools [11:43] lpnet and edge [11:44] I don't expect a disaster, or I wouldn't be doing it. [11:44] but if it is terrible, we have the tools to undo it rapidly. [11:44] mthaddon: can assist rolling back the timeouts *if* needed. [11:44] bigjools: are you up for this ? [11:45] lifeless: I'll monitor it for an hour or so then I have to go out [11:45] I can pass the baton to someone else [11:45] bigjools: oh, and as an expectation - 50 hard oopses an hour is tolerable [11:46] 200 hard oopses an hour - rollback [11:46] good data point, thanks [11:46] thank you! [11:46] * lifeless crashes [11:46] sleep well [11:46] Night lifeless. [12:03] Morning, all. [12:08] Morning deryck :) === almaisan-away is now known as al-maisan [12:40] gmb: I have a faint recollection that ProductWithLicenses was one of yours. Is that right? [12:42] jtv, No, mot I. [12:42] *not [12:42] I'll ask bzr then.. [12:42] * gmb ;unches [12:42] * gmb learns to type [12:43] ah, 'twas EdwinGrubbs [12:44] EdwinGrubbs: very minor bug in ProductWithLicenses. [12:51] bigjools: http://bazaar.launchpad.net/~wgrant/launchpad/ppa-key-name/revision/11486 [12:55] heh ""Not Celso Providelo"" [13:06] bigjools: So, does it look reasonable? [13:06] I can't think of a reason why it should not land [13:06] OK, I'll propose it then. [13:06] Thanks. 
[13:06] cheers [13:07] * bigjools -> lunch [14:05] wgrant: I got your script working in a virtualenv with the new launchpadlib; I did have to make some small changes: http://pastebin.ubuntu.com/487228/ [14:05] I'm going to see if the changes I made indicate bugs in the latest launchpadlib or if they were intentional changes [14:06] Ah, great. [14:06] speed-wise, in unscientific testing I got no meaningful difference in total run time [14:30] Is sourcepackage one word or two? I keep forgetting what we prefer? [14:31] It's capitalised as two. [14:31] There have been a few debates around this. === Ursinha-zzz is now known as Ursinha === kiko` is now known as kiko [15:02] adeuring, almost done on your review. Some questions, just to be clear, though.... [15:02] adeuring, there is no public api change here, right? [15:02] deryck: yes, the api did not change [15:03] adeuring, and concerning the bits of the webservice test you added, the main reason for that test is to verify the app server is serving the librarian file? [15:03] deryck: right [15:04] adeuring, so the bits that would fail if this were not true are the X-Powered-By header line? The rest is just setup? [15:06] deryck: yes; thoguh the X-powered-by header is not really important; the point is that "websevice.get(...)" returns the file data [15:06] deryck: a corresponding test some lines up in that test connect to the Librarian in a completely different way to read a publci file [15:07] gotcha [15:07] adeuring, so what is the final webservice.patch for in the diff? [15:08] deryck: some other tests later in that file need a public bug [15:09] adeuring, ah, because it's sample data based. So I think we could expand the doc portion of this doctest to make this clear. [15:10] deryck: i'll add a note [15:10] I do realize the webservice tests are kind of a weird page/doc test amalgam, but it wasn't obvious to me. [15:17] adeuring, so lifeless concern about this was that by serving files from the app server, we will most certainly have timeout issues now with private attachments via the API? [15:18] that was a question, i.e. is that ^^ correct? [15:18] deryck: it could be. But I am not 100% sure: IIRC, the timeout exceptions are raised somewhere "near" the storm layer -- but once we have the LFA ready, the request does not touch DB objects anymore [15:22] adeuring, so this wasn't lifeless concern then? [15:22] what was? [15:22] deryck: no, as i understood him, these timeouts were his concern [15:23] ..perhaps aside from additional load for the app servers [15:24] adeuring, and you're suggesting that you're not sure you agree that timeouts will be an issue? [15:24] deryck: well, we should test that ;) Should be easy: We just need to upload a sufficiemtly large file to the staging librarian and then to download it again via a sufficiently slow connection [15:25] adeuring, at any rate, we really don't have another option, short of the "enable retracers in dc" option that we cannot do, right? [15:26] right. and regarind testing. there is even this linux kernel feature to throttle connections, about which I always forget how to use it [15:32] adeuring, ok. I'm about ready to r=me this. I asked gary_poster to take a look over my shoulder, just to make sure I'm not missing something. [15:32] deryck: ok, thanks! [15:32] sorry for paranoia, but I want this to go in without further problems. [15:33] adeuring, you could go ahead and ping losa to cowboy the patch to staging and performance test it, since that is a primary concern from lifeless. 
[15:33] deryck: ok, though I think testing it tomorrow should fine [15:35] adeuring, well, ubuntu beta is today and they need the retracers running. I'd like to end today telling them to point the retracers at edge and go for it, but I want to have confidence this will work. [15:35] deryck: ok [15:48] deryck: could you please add your review to to mp? [15:48] adeuring, yes, sorry. [15:48] npmp ;) [15:56] adeuring, you're ready to go now. See my comments/reminder about the staging test. [15:56] deryck: thanks! [15:56] np! === matsubara is now known as matsubara-lunch === Ursinha is now known as Ursinha-lunch === salgado is now known as salgado-lunch [17:03] rockstar, here are the permission changes that are giving me grief: http://pastebin.ubuntu.com/487320/ [17:04] rockstar, or lp:~abentley/launchpad/recipe-interfaces === benji is now known as benji-lunch === Ursinha-lunch is now known as Ursinha === matsubara-lunch is now known as matsubara === beuno is now known as beuno-lunch === salgado-lunch is now known as salgado === benji-lunch is now known as benji === deryck is now known as deryck[lunch] === gary_poster changed the topic of #launchpad-dev to: Code hosting offline 8.00-9.30 UTC on Friday 3rd September for unexpected hardware maintenance. http://is.gd/eRMxF | Launchpad Development Channel | Performance Tuesday | Week 3 of 10.09 | PQM is OPEN | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [18:59] lifeless, you around today yet? === al-maisan is now known as almaisan-away === beuno-lunch is now known as beuno === deryck[lunch] is now known as deryck [19:53] jcsackett: hi [19:53] lifeless: heya. [19:54] so, EdwinGrubbs and I have looked into what was plaguing me yesterday. Edwin may have found a fix for the test: http://pastebin.ubuntu.com/487390/ [19:54] so, we may now be good. when i pinged you earlier, still no luck. [19:55] i do have the one extra query isolated, if you're interested. [19:55] what is it [19:55] looking at your patch [19:55] the flush+reset would need an explanation at a minimum [19:56] https://pastebin.canonical.com/36645/ [19:56] lifeless ^ [19:56] and yeah, this is without comments; just seeing if it passes in the full suite. [19:58] that looks like _init to me [20:01] what happens is that _init - a storm hook - is called *before* the variables are assigned [20:01] Person._init checks self.teamownerID [20:02] and storm does an on-demand query to populate the attributes [20:03] * jcsackett nods. [20:03] so this problem is already fix-committed, right? [20:04] the storm bug is yes [20:04] fixing it though, uncovered a nastier one [20:04] https://bugs.edge.launchpad.net/storm/+bug/620615 [20:04] <_mup_> Bug #620615: please contact the developers [20:05] bug 628762 [20:05] <_mup_> Bug #628762: propertycache adaption failures in test suite [20:05] allenap: jfdi ;) [20:09] oops graphs are looking good [20:12] lifeless: that does look unpleasant. 
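The extra query being chased here comes from a Storm load hook touching a column attribute before the object has been filled in, which forces the on-demand "id=%s limit 1" select mentioned above. A stripped-down sketch of that shape (illustrative table and hook body, not Launchpad's Person class):

    # If the hook fires while the object is still "empty" (bug 619017), the
    # attribute access makes Storm issue an on-demand
    # "SELECT ... WHERE id = %s LIMIT 1" to populate the row -- the query
    # fingerprint mentioned above.
    from storm.locals import Int


    class Team(object):
        __storm_table__ = "team"

        id = Int(primary=True)
        teamowner_id = Int()

        def __storm_loaded__(self):
            # Reading a column here is what costs the extra query when the
            # hook is invoked before the object's variables are assigned.
            if self.teamowner_id is not None:
                pass  # e.g. set up team-specific state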
[20:12] slight increase on prod, edge seems unaffected \o/ [20:28] james_w`: ping [20:29] hi lifeless [20:29] hiya [20:32] james_w`: so, swallowing eh [20:33] yeah, I don't want to trigger exceptions [20:33] I was starting to reply [20:33] I figured I'd chat [20:33] usually means that something has gone wrong, and so I don't want my tests to pass [20:34] cleanUp is used from two locations [20:34] as a cleanUp [20:35] and from reset [20:35] now, cleanUp isn't nestable [20:35] I mean TestCase.addCleanUp - [20:35] cleanups are regular work-or-throw functions [20:35] they can't represent 2 exceptions [20:36] what you have to do is to have two cleanups [20:37] so, if TestCase.addCleanUp could in some way handle functions that want to blow up many times [20:37] we could change the fixture protocol to match [20:38] hmm [20:38] why not mandate that cleanups can be run multiple times? [20:38] lets say we make TestCase.addCleanUpExt [20:38] it takes things that yield exceptions [20:38] and has an adapter that is used by addCleanUp [20:38] to turn a regular function into a yields-exc-info [20:39] james_w`: because for existing trivial cleanups its undefined whether thats safe or not [20:39] james_w`: and for many, it won't be, or will raise the same way every time [20:40] james_w`: its also less general: it only helps with cleanUps that happen to be a loop-on-a-list [20:40] mmm, now what about reset [20:41] it will need to signal exceptions in the same way as cleanup [20:42] (whether thats raise a MultiException with a list of exc_infos, or return an iterable, I dunno yet) [20:42] it also needs to signal 'setUp failed' [20:43] probably any different exception would be sufficient for that - e.g. user caused exceptions are fine. [20:44] testresources optimisation should stop I guess if it sees reset fail, and skip all the tests with that fixture [20:44] lifeless, can you tell me if http://pastebin.ubuntu.com/487402/ is ec2 being flaky or a real problem? i can't duplicate it locally and it seems to have nothing to do with my branch [20:45] leonardr: line 18 is that storm bug I was discussing with jcsackett [20:45] Person._init being called to early [20:45] i see [20:45] the fingerprint is the id=%s limit 1 call [20:45] james_w`: what do you think of my sketching ? [20:46] hi folks [20:46] lifeless: sounds reasonable to me [20:46] hi sabdfl [20:46] does the bug description deliberately specify a different font-face, or is that an bug? [20:46] for example, https://bugs.edge.launchpad.net/indicator-network/+bug/621168 [20:46] <_mup_> Bug #621168: Connection to encrypted hidden network fails [20:46] new font is looking pretty good generally, mono version on the way [20:47] deryck: this ones for you I think? [20:47] ah, yes. [20:48] sabdfl, it's a bug. It's falling back to the YUI css for some non-obvious reason. [20:48] sinzui and I worked on this a bit already. [20:48] deryck, the YUI CSS is more specific than the one in LP, so it wins [20:48] aha. low importance, i was just curious, but please do file it so it doesn't get lost [20:48] sabdfl, will do. [20:49] jcsackett: with your query count bug [20:49] thanks deryck [20:49] jcsackett: please use [20:49] Equals(count), Equals(count + 1) [20:49] jcsackett: so that we don't have to simulaneously change the tests when we get storm 0.18 [20:51] leonardr: changing the test you're hitting to be similar to the pastebin changes jcsackett pasted before - with a MatchesAny(Equals(n), Equals(n+1)) and a flush + reset should work around it. 
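A sketch of the tolerant query-count assertion being suggested, built only from testtools matchers; the recording helper is a placeholder and the baseline count is made up:

    # Accept n or n + 1 queries so the test needn't change again when Storm
    # 0.18 stops issuing the extra query from its load hook.
    # run_and_record_queries() is a hypothetical helper for however the test
    # captures issued SQL; in the Launchpad version this is also where the
    # store.flush()/store.reset() pair goes, so counting starts from a clean
    # Storm cache.
    from testtools import TestCase
    from testtools.matchers import Equals, MatchesAny

    BASELINE = 5  # illustrative expected query count


    class QueryCountSketch(TestCase):

        def test_tolerates_storms_extra_query(self):
            queries = run_and_record_queries()  # hypothetical helper
            self.assertThat(
                len(queries),
                MatchesAny(Equals(BASELINE), Equals(BASELINE + 1)))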
[20:51] leonardr: I will be very happy when storm 0.18 arrives ;)( [20:51] beuno, thanks for the "more specific" pointer [20:52] deryck, also, a different font may break the widget layout. You had worked on making them fluid rather than fixed, may of fixed the font-face problem [20:53] deryck, we may want to make lazr-js be less specific about CSS, so it doesn't end up being a battle between the projects' CSS and the widget [20:56] beuno, IIRC, the font had to be specified to make it re-usable. Changing the font will force changes to the widget, whether it happens in lazr-js on lp's use of the widget. [20:57] deryck, right. I was thinking of just specifying the font more generically on the DOM elements [20:57] vaguely enough that it applies by default, but can still be easily overriden [20:58] ah, so it does win in the specificity battle. [20:58] right [20:59] otherwise we end up having to slap !important on [20:59] or have crazy CSS rules [20:59] either way, LP still needs to catch-up with YUI 3.1.2 [20:59] so you will probably have to hack it in LP first [21:00] this was more of a for-the-future thought [21:01] right [21:01] thanks, I appreciate it. [21:02] and now we have bug 629063 [21:02] <_mup_> Bug #629063: Description editing widget should use the Ubuntu font [21:02] lifeless: do you want me to make that change in this branch, or is it ok to land my branch as is? ie. are you just telling me how to work around it for now? [21:03] your branch won't land because you're tickling the storm cache enough that you'll see that extra query. I don't know what your branch changes :) [21:04] so you need to make that change in your branch, so you can land it. I'll happily review the workaround for you === dobey_ is now known as dobey === abentley_ is now known as abentley [21:35] ok [22:04] lifeless: all right, see how http://pastebin.ubuntu.com/487469/ grabs you [22:08] leonardr: that should be enough, yes. give it a sping - with XXX around the flush and reset saying 'bug 619017' please [22:08] <_mup_> Bug #619017: __storm_loaded__ called on empty object === matsubara is now known as matsubara-afk === salgado is now known as salgado-afk [23:14] Is there anyway to change an attribute in a zope macro? [23:20] bdmurray: what do you mean? [23:20] lifeless: lib/lp/bugs/templates/bug-portlet-dupe-subscribers-content.pt uses subscriber-row from bug-portlet-subscribers-content and I want to chagne the title attribute in subscriber-row [23:21] hmmm [23:21] I don't tihnk so [23:21] it also feels weird to me [23:23] losa ping [23:23] https://staging.launchpad.net/successful-updates.txt claims 9737 [23:23] but the running instance is 9710 [23:44] spm: when you start, I know you want to crawl into a hole and hide; I have two small things batched up so that they don't interupt. 
[23:44] spm: one is the staging thing above [23:45] spm: the other is can you please put the canonical losa tag whatever it is onto https://bugs.edge.launchpad.net/launchpad-foundations/+bug/629139 [23:45] <_mup_> Bug #629139: cannot make production config changes [23:45] jcsackett, hello [23:45] jcsackett, just saw your comment on bug 623428 about the bug status change, I'm investigating [23:45] <_mup_> Bug #623428: Don't get ZODB 3.10 by installing Zope2.13a [23:46] hmmm [23:46] bug 623408 [23:46] <_mup_> Bug #623408: Offiical_* booleans must be deprecated in favor of usage enums [23:46] there you go :) [23:47] it's not supposed to change the bug status, just tag it as untestable [23:47] because of the new merge workflow [23:48] I'm taking care of it [23:50] Ursinha: thanks. [23:51] lifeless: spm: i have tagged that bug. [23:51] mbarnett: oh awesome, thanks [23:51] mbarnett: I figured you'd EOD when the ping fell into silence :) [23:51] lifeless: we had an austin meetup this afternoon.. was in transit home [23:52] meetup? [23:52] a bunch of canonical employees who live in the area worked from a coffee shop together for a couple hours. [23:52] nice [23:52] yeah, i hadn't put a face on a couple people yet, so it was great. [23:53] I met the closest canonical staffer to me, in prague :P [23:54] (we both live in NZ)