[00:26] <lifeless> bdmurray: 'bugtask-search
[00:26] <lifeless> bdmurray: 'bugtask-search|patches-view'
[00:26] <lifeless> mmm, possible with brackets, depends on the engine
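The grouping lifeless refers to can be shown with Python's `re` engine (a minimal sketch; the page IDs are just the ones from the conversation, and other engines may differ):

```python
import re

# Parentheses group the alternation, so the anchors apply to both
# page IDs rather than binding only to the first/last alternative.
grouped = re.compile(r"^(bugtask-search|patches-view)$")
ungrouped = re.compile(r"^bugtask-search|patches-view$")

assert grouped.match("bugtask-search")
assert grouped.match("patches-view")
assert not grouped.match("xxx-bugtask-search")
# Without grouping, '$' binds only to the second alternative, so a
# string merely *ending* in 'patches-view' still matches.
assert ungrouped.search("xxx-patches-view")
```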
[01:57] <lifeless> it might be nice to give community devs - those that have reached reviewer level, for instance, access to the OOPS summaries
[01:58] <lifeless> also
[01:58] <lifeless> wow, 1135 second page request. ouch.
[03:26] <mars> lifeless, it looks like your patch worked.  The generic everyday "Threads left garbage" message was raising an error, which locked up the entire process stack in os.wait() calls.
[03:27] <mars> and that is why it was random - some thread somewhere wasn't shut down before the testrunner proceeded.  Maybe a race condition.
[03:28] <mwhudson> mars: yay for progress on that one
[03:30] <mars> mwhudson, yes, I'm happy we finally found the source.  Now we just have to land the fixes for each of the links in the chain of cascading failures.
[03:31] <mwhudson> the boring bit :-)
[03:31] <mars> hehe
[03:31] <mwhudson> mars: do you know what's going on with ec2 thinking failing tests runs are successes?
[03:31] <mars> that is a mystery to me.
[03:32] <mars> I couldn't reproduce it.  But thanks to this fix I now know where to look in the zope code for errors.
[03:37] <cody-somerville> lifeless, That rpm.newrelic.com website crashes my browser :P
[03:38] <lifeless> oops :P
[03:39] <lifeless> mars: glad to have helped
[07:06] <wgrant> lifeless: Why is bug #516709 Soyuz? Isn't it Code? All of the changes seem to be to the branch, not upload rights.
[07:06] <mup> Bug #516709: revisit official package branch permissions <Soyuz:New> <Ubuntu Distributed Development:New> <https://launchpad.net/bugs/516709>
[07:32] <bilalakhtar> Hi there lp devs, do I need to run lp on my own computer before submitting a patch? I know it would be better to do that, but isn't lp too bulky to download?
[07:35] <beuno> bilalakhtar, ideally you would; you can try submitting the merge proposal and seeing if you can get a developer to run the tests for you
[07:36] <bilalakhtar> beuno: oh ok thanks
[07:37] <beuno> bilalakhtar, what is this patch about?
[07:38] <bilalakhtar> beuno: I haven't begun work yet, but I want to add the following feature in lp answers: One should be able to assign someone to a question or change its status using an AJAX overlay.
[07:38] <bilalakhtar> beuno: such a feature already exists in malone
[07:39] <beuno> ah
[07:39] <beuno> so, just to give you a tip, you may need to build the API for that, because those are old parts of the code and probably don't have APIs to leverage javascript
[07:40] <beuno> which means you will need to run LP, because it's a significant chunk of work  :)
[07:40] <bilalakhtar> ahha
[07:41] <bilalakhtar> thanks for the info, beuno
[07:41] <bilalakhtar> beuno: What do you mean by "API" ?
[07:41] <bilalakhtar> beuno: launchpadlib?
[07:41] <beuno> a level down, internal API
[07:41] <bilalakhtar> beuno: you mean the RESTful API that lp exposes?
[07:42] <beuno> yes
[07:42]  * bilalakhtar understands his task, and still he is determined to work on this feature
[07:43] <beuno> bilalakhtar, that is awesome
[07:43] <beuno> I look forward to it!
[07:44] <bilalakhtar> beuno: Thanks, it will take a week, since I don't get to code very often
[07:45] <bilalakhtar> beuno: Another question: Why are there 4 lp branches? On which one should I work?
[07:45] <spm> the latter Q is easy. devel. the former... lengthy to explain.
[07:47] <wgrant> None of the Answers stuff is exposed over the API yet. This is not going to be a simple task.
[07:47] <bilalakhtar> spm: I think the answer is this: All branches merge into devel, then go into edge soon, where they will be deployed to edge.lp.net. The staging branch is the code behind staging.lp.net, and the lp:launchpad branch is for running the production part of lp.
[07:47] <beuno> I don't say this very often, but what spm said
[07:48] <spm> basically, the 4 branches allow devs to keep developing without DB changes blocking all updates, such that edge keeps getting updates till we do a release. staging is where db mods are trialed.
[07:48] <wgrant> Ignore stable (edge) and db-stable (staging).
[07:48] <spm> unless you're a losa :-)
[07:48] <wgrant> And probably read http://dev.launchpad.net/Trunk
[07:48] <wgrant> spm: Shh.
[07:48] <bilalakhtar> beuno and wgrant: I will try to copy the code from malone :)
[07:48] <spm> heh
[07:49] <wgrant> bilalakhtar: It's not that easy.
[07:49] <bilalakhtar> losa?
[07:49] <spm> (launchpad, landscape, ubuntu-one and other stuff) operational sys admin
[07:49] <bilalakhtar> wgrant: ok, then I will try once; if I fail then I will search for a bug to patch
[07:49] <poolie> bilalakhtar, that's great
[07:49] <wgrant> It's probably best to try some smaller things first.
[07:50] <spm> the l has become somewhat overloaded. I prefer the "l == legendary" explanation myself.
[07:50] <poolie> good idea though
[07:50] <bilalakhtar> wgrant: good tip, will search for some tiny bugs
[07:50] <wgrant> API + JS is not the best combination to start with.
[07:50] <wgrant> But Answers could certainly do with lots of AJAX.
[07:50] <spm> hey noodles775
[07:50] <noodles775> Hiya
[07:50] <bilalakhtar> noodles775: hi there, buildd admin
[07:51] <noodles775> ehem
[07:51] <noodles775> erm, what's wrong...
[07:51]  * noodles775 starts loading pages :)
[07:51] <bilalakhtar> I can't believe that lp was proprietary once! Actually, I joined lp quite late
[07:52] <poolie> https://bugs.edge.launchpad.net/launchpad-answers/+bug/58670 would probably be easy and useful
[07:52] <mup> Bug #58670: Highlight comments from the reporter <feature> <ui> <Launchpad Answers:Triaged> <Launchpad Bugs:Triaged> <https://launchpad.net/bugs/58670>
[07:52] <noodles775> Yeah, lp has certainly benefited lots from being open :) (IMO)
[07:52] <poolie> or https://bugs.edge.launchpad.net/launchpad-answers/+bug/226690
[07:52] <mup> Bug #226690: Not obvious that expired questions can be reopened <confusing-ui> <Launchpad Answers:Triaged> <https://launchpad.net/bugs/226690>
[07:52] <bilalakhtar> poolie: thanks
[07:52] <beuno> I remember when it was closed, wgrant was cranky all the time instead of hyper-productive   :)
[07:53] <poolie> and cranky :)
[07:54] <wgrant> beuno: See, I wasn't just complaining for the sake of complaining.
[07:55]  * beuno hugs wgrant 
[07:55]  * poolie hugs you both
[07:55] <beuno> :)
[07:56]  * bilalakhtar is amazed to find lp managed by bots :)
[07:56] <poolie> i certainly feel better about contributing now it's open
[07:56] <poolie> even though it was possible before, it just feels better now it's properly open not just internally open
[07:56] <poolie> and there's more infrastructure and documentation towards helping others
[07:57] <bilalakhtar> God has said: The world will end at a time when non-living things will take control over the jobs of people.
[07:57]  * bilalakhtar agrees with poolie beuno and wgrant 
[08:08] <bilalakhtar> A question: How large is the lp devel branch?
[08:08] <noodles775> wgrant: just in case you're gone when I try later, is there anything non-obvious that I should watch out for when trying to run a SPRecipeBuild locally (that's not on the runningsoyuzlocally wiki)?
[08:12] <lifeless> wgrant: they seemed to be about upload rights to me
[08:12] <lifeless> wgrant: besides which, code, soyuz, its all the same
[08:16] <wgrant> noodles775: Barring the bug that you're trying to fix, it's pretty simple. Just 'make run_codehosting' (also starts the appserver), push a branch up, and create a recipe through the UI, request a build, start buildd-manager.
[08:17] <wgrant> Also, recipe builds will crash the buildd due to another bug.
[08:17] <noodles775> Thanks.
[08:17] <adeuring> good morning
[08:17] <wgrant> Bug #587109
[08:17] <mup> Bug #587109: Needs to cope with not receiving package_name from the master <Launchpad Auto Build System:Incomplete> <https://launchpad.net/bugs/587109>
[08:21] <wgrant> noodles775: Is buildd-manager crashing?
[08:21] <wgrant> Maybe it's having recipe fun.
[08:21] <wgrant> The queue is large, and logtails appear to not be updating, at any rate.
[08:22] <poolie> bilalakhtar, about 213MB
[08:22] <bilalakhtar> poolie: oops, will take more than 2 hours on my connection
[08:22] <wgrant> poolie, bilalakhtar: Plus 210MB for one set of deps, and another 100MBish for the other set.
[08:23] <wgrant> Plus 100-200MB for the apt dependencies.
[08:23] <bilalakhtar> wgrant: Should I use rocketfuel?
[08:23] <wgrant> bilalakhtar: You should use rocketfuel-setup, yes.
[08:24] <wgrant> Doing it manually is possible, but difficult.
[08:24] <bilalakhtar> wgrant: ok. so which branch should I work on? I am confused between db-devel and devel, even though I read that Trunk page
[08:25] <wgrant> bilalakhtar: If you need to make database changes, work on db-devel. Otherwise, use devel.
[08:25] <noodles775> wgrant: the last synced log looks fine (up to 8:08 UTC).
[08:26] <wgrant> noodles775: Argh. Maybe it's just being slow at processing uploads.
[08:26] <noodles775> erm, that's obviously not utc.
[08:26] <wgrant> Hey, you never know...
[08:29] <wgrant> Mm, yeah, it's mostly filled up now.
[08:29] <wgrant> Although something is still wrong.
[08:34] <spm> noodles775: I've not had a chance to chase; but would the irregularity around retry-depwait be a possible issue here?
[08:35] <noodles775> spm: possibly - einsteinium's last entry in the log is certainly 2010-06-04 07:42:40+0100 [-] ***** einsteinium is MANUALDEPWAIT *****
[08:36] <spm> ew. the retry-depwait log is *full* of entries like that. 2010-06-04 07:09:33 INFO    Found 1076 builds in MANUALDEPWAIT state.
[08:36] <noodles775> but the others (shipova, rosehip) simply have: 2010-06-04 07:40:34+0100 [QueryWithTimeoutProtocol,client] <rosehip:http://rosehip.ppa:8221/> marked as done. [0]
[08:39] <noodles775> yep, last mention some of the other idle builders is also MANUALDEPWAIT.
[08:39] <noodles775> s/idle/idle 386/
[08:39] <wgrant> noodles775: Um, is that nearly an hour ago?
[08:40] <noodles775> Yes.
[08:40] <wgrant> They're the latest?
[08:40] <noodles775> Yep... latest mention in the log.
[08:41] <wgrant> But it's still showing regular scans?
[08:41] <noodles775> Yep. And starting new builds on other buildds.
[08:41] <noodles775> wgrant: sorry...
[08:42] <noodles775> wgrant: It's dispatching new builds, but you're right, last mention of starting scanning cycle is at: 2010-06-04 07:40:39+0100 [-] Starting scanning cycle.
[08:43] <wgrant> noodles775: It's dispatching?
[08:43] <noodles775> 2010-06-04 08:31:56+0100 [-] startBuild(http://dubnium.ppa:8221/, shotwell, 0.5.2-1~karmic1, Release)
[08:43] <wgrant> As in, has been for the last hour?
[08:44] <wgrant> That's pretty special.
[08:44] <wgrant> Since the startBuild calls are asynchronous.
[08:44] <wgrant> Unless the DB calls are slow?
[08:44] <wgrant> Which they might well be....
[08:45] <wgrant> But not that slow.
[08:45] <wgrant> Surely.
[08:46] <wgrant> And no hints of any recipe builds firing accidentally?
[08:47] <noodles775> I'll check in a tick... but checking the frequency of "Starting scanning"... aside from one anomaly, they're all around 2hrs apart :/
[08:48] <wgrant> For whole long?
[08:48] <wgrant> Er.
[08:48] <wgrant> *how* long.
[08:49] <noodles775> It was fine last night (a few times per minute)...
[08:50] <wgrant> Ah, good. I was hoping the logging wasn't inconsistent.
[08:50] <noodles775> Seems to have gotten progressively worse since 2010-06-03 19:15:53+0100 [-] Starting scanning cycle.
[08:51] <wgrant> Since that would be... not unheard of in buildd-manager.
[08:51] <wgrant> Hmm.
[08:51] <wgrant> And there are startBuild calls spread throughout the intervals?
[08:52] <wgrant> This isn't a failure mode I've seen before.
[08:52] <noodles775> yeah, me either... it's very strange.
[08:53] <spm> would it help if I mention that I have the utmost confidence in you guys to figure it out? morale booster? ;-)
[08:54] <wgrant> spm: Ah, you EOD in 5 minutes, that's why you're so happy :P
[08:55] <spm> my secret is out :-)
[09:03] <noodles775> wgrant: the i386 buildds have filled up a bit (but all without logs).
[09:09] <wgrant> noodles775: Are the startBuilds delayed (implicating the synchronous bit), or are they all within a couple of seconds (implicating the async bit, and ewwww Twisted)?
[09:09] <noodles775> wgrant: there are definite breaks between some startBuilds calls... I'm including that on the bug I'm creating so we can collect info there.
[09:11] <wgrant> OK, great.
[09:11] <wgrant> This is a really, really odd one.
[09:16] <noodles775> wgrant, bigjools : I've created bug 589577 which has a small snippet of the log before I lost my connection to the log server (and can't reestablish)
[09:16] <mup> Bug #589577: buildd is not scanning regularly <Soyuz:New> <https://launchpad.net/bugs/589577>
[09:16] <noodles775> bigjools: it's also got a link to the irc conversation so far.
[09:16] <wgrant> Ugh.
[09:17] <wgrant> 13 minutes.
[09:19] <bigjools> my immediate thought is that the network has a problem
[09:20]  * noodles775 hopes bigjools is right :)
[09:21] <bigjools> I am checking with IS
[10:02] <stub> noodles775: So that build page that was timing out. Does this query provide all the information we need for the bits that are timing out? http://paste.ubuntu.com/444495/
[10:13] <noodles775> stub: checking
[10:14] <stub> noodles775: Things also run more than twice as fast on a slave node (that particular query takes just over 5 seconds on the master, but 1.4 seconds on a slave). I don't think anyone will care if the stats are maybe a few seconds out of date.
[10:18] <noodles775> stub: did you try with the SUM too?
[10:19] <stub> That's what is in the pastebin, isn't it?
[10:19] <noodles775> stub: ah, i didn't scroll past the first..
[10:19] <noodles775> stub: er, I was looking at the wrong paste... got it.
[10:21] <noodles775> stub: great, so I can update the storm code to (1) run on the slave and (2) use the count/sum in the findspec rather than querying once for each. Certainly looks much better.
[10:22] <stub> Yup.
[10:22] <noodles775> Or should I just use the query verbatim (so we know exactly what's being executed)?
[10:22] <noodles775> (and thanks!)
[10:22] <stub> Using Storm to generate the query should give you pretty much the same thing
[10:23] <stub> You can check by turning on the storm SQL tracing. Or getting a user requested oops report.
[10:24] <stub> I've been looking into indexes - BuildFarmJob.status and Archive.require_virtualized help a little with the existing query, but not much and not at all with the count/sum query
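The one-query count/sum that stub and noodles775 settle on can be illustrated with a toy sqlite3 table (a stand-in schema invented for this sketch, not the real Launchpad build tables or the pastebin query): fetching both aggregates in a single SELECT instead of issuing one query per statistic.

```python
import sqlite3

# Toy stand-in for the real build tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build (status TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO build VALUES (?, ?)",
    [("done", 5), ("done", 7), ("pending", 3)])

# One round trip returns both statistics; the slower alternative is
# one COUNT query plus a separate SUM query over the same rows.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(seconds) FROM build WHERE status = ?",
    ("done",)).fetchone()
```

Storm can generate an equivalent single query by putting both aggregates in the find spec, which is the change noodles775 describes.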
[12:02] <wgrant> stub: Why's the master so slow? Load?
[12:04] <wgrant> noodles775: buildd-manager still looks a bit unhappy... any progress?
[12:05] <noodles775> wgrant: bigjools is still investigating it (we tried disabling retryDepwait in case it was table locks), but not yet that I'm aware of (I've switched back to the builders index now that stub's provided one query to rule them all :) ).
[12:05] <wgrant> Aha.
[12:06] <bigjools> wgrant: the findBuildCandidate query is taking 10 minutes instead of 10 milliseconds
[12:06] <bigjools> we have a missing index
[12:06] <noodles775> Which one? (I thought stub added BFJ.status?)
[12:07] <wgrant> bigjools: After seeing the location of the break in the logs, I had a suspicion it might be DBish.
[12:07] <bigjools> noodles775: don't know, stub is looking at the query for me
[12:07] <noodles775> ah, great.
[12:08] <noodles775> oh, that's *the* query... ew.
[12:08] <wgrant> Yes. *That* query.
[12:08] <wgrant> I wonder if there is actually a bigger one in LP.
[12:08] <bigjools> yes, the BUDQ
[12:08] <wgrant> Maybe the one to expire PPA files.
[12:08] <bigjools> add "F"s to taste in that acronym
[12:09] <wgrant> Hm.
[12:09] <wgrant> I wonder if this is related to the getBuildRecords timeouts that started with 10.05.
[12:24] <stub> bigjools: Looking at that query, I'm tempted to say scrap it and start again.
[12:25] <bigjools> stub: it's necessarily complicated
[12:26] <bigjools> because it's built up from different parts of the code
[12:26] <stub> Maybe scrap all the EXISTS, refactoring it to precalculate them into temporary tables.
[12:26] <bigjools> stub: see lib/lp/buildmaster/model/builder.py: _findBuildCandidate
[12:27] <stub> So that sounds like what needs to be refactored. If the code is generating something that complex and unoptimizable because it has to, there is a problem.
[12:28] <wgrant> stub: Oh, you've recovered from the horror-induced coma already?
[12:29] <bigjools> stub: I think that's a good approach, but how can we fix this critical problem right now?
[12:29] <bigjools> can you see a missing index?
[12:29] <bigjools> it was working fine until noodles' model change
[12:31] <stub> Strangely enough, I just ran that query = 440ms
[12:32] <stub> So the horrible bits only got executed 115 times because the raw queue isn't that big.
[12:32] <bigjools> yeah, some of them run fast, some are slow
[12:32] <bigjools> I think it depends on the architecture
[12:33] <stub> So I need a slow one
[12:34] <bigjools> we could run the b-m with storm tracing on
[12:34] <stub> The planner will choose different plans depending on table statistics - eg. using a sequential scan instead of an index lookup if it believes a large percentage of the table needs to be retrieved anyway.
[12:36] <stub> So all those exists get executed for each and every row not filtered by the preceding criteria. That means between 0 and 54k times I think.
[12:37] <stub> I can try and choose some bad preceding criteria.
[12:38] <noodles775> stub, bigjools: if it's any help, you can see how little changed in that query with bzr diff -c10937 lib/lp/buildmaster/model/builder.py (shown here: http://pastebin.ubuntu.com/444570/)
[12:38] <bigjools> I suspect the massive buildqueue is not helping
[12:38] <bigjools> I'm gonna blow away any disabled archive buildqueues
[12:41] <stub> Do you have the algorithm for finding the next build candidate in English?
[12:43] <bigjools> I can try
[12:49] <bigjools> stub: http://pastebin.ubuntu.com/444576/
[13:21] <stub> So if we have 35k items in the queue (such as we have now for processor=1 and virtualized=true), we order them by lastscore and check them one at a time until all our criteria match. That might be a lot of time.
[13:22] <stub> If an item doesn't match criteria, why do we keep its lastscore high? If we bumped it to the end of the queue (or just increased it by some factor), the queue items with poor scores would bubble to the end.
[13:30] <bigjools> stub: scores never change unless changed manually
[13:30] <stub> bigjools: So for the slow cases I've found, it is the 80% utilization check that is the killer
[13:30] <bigjools> :(
[13:30] <bigjools> crap
[13:30] <stub> Not really
[13:31] <bigjools> are you going to tell me there's a quick fix? :)
[13:31] <stub> If we have 10k items in the queue, all in the same archive, we currently end up issuing that query 10k times (failing each time) to get past them
[13:32] <stub> So we move that out of the SQL. Instead, we do that check when filtering the first real item from the potential candidates, and cache it for subsequent checks in the loop
[13:33] <stub> Or alternatively, we calculate the list of banned archives once first and filter that way.
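A rough sketch of stub's alternative, in plain Python over toy data rather than the real Storm models or the production query: compute the set of over-capacity archives once, then filter queue candidates against it, instead of re-running the 80% EXISTS check for every row. The dict keys, the 0.8 threshold placement, and both function names are assumptions for illustration only.

```python
from collections import Counter

def banned_archives(active_builds, num_arch_builders, threshold=0.8):
    """Archives already using >= threshold of one architecture's builders."""
    counts = Counter(build["archive"] for build in active_builds)
    limit = threshold * num_arch_builders
    return {archive for archive, n in counts.items() if n >= limit}

def next_candidate(queue, active_builds, num_arch_builders):
    # The banned set is computed once per call, not once per queue row.
    banned = banned_archives(active_builds, num_arch_builders)
    for job in sorted(queue, key=lambda j: j["lastscore"], reverse=True):
        if job["archive"] not in banned:
            return job
    return None
```

With 12 of 14 builders already busy on one daily-build PPA, that PPA's high-scoring job is skipped and a lower-scoring job from another archive is dispatched instead.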
[13:36] <stub> Does the theory sound sane?
[13:37] <bigjools> I'm not sure
[13:38] <bigjools> the utilisation is very dynamic
[13:38] <bigjools> the point is that the query is doing what we'd have to do in Python anyway, so caching seems the only option
[13:39] <bigjools> I'm going to blitz 54k queue items which should speed this up a bit
[13:39] <wgrant> Do we know that the 80% query is actually doing much useful?
[13:39] <bigjools> yes, it is
[13:39] <wgrant> and why is destroying those items a good idea? Is it not possible to filter out the suspended ones first?
[13:39] <bigjools> it stops the daily builds from monopolising the farm
[13:40] <wgrant> Mmm not really. There are several daily PPAs.
[13:40] <wgrant> So it stops a single PPA from monopolising it, and just makes several do it :P
[13:40] <bigjools> well it depends on when they start building
[13:40] <bigjools> yes it's still possible given enough PPAs
[13:40] <bigjools> but at least they still get a look in
[13:41] <bigjools> hmmm actually it won't help by blitzing them
[13:43] <bigjools> stub: ok for now I suggest we cowboy out the 80% check until we find a better solution
[13:43] <stub> It's a quick way of confirming the theory. I don't know if the slow query I manufactured is the same as the slow queries we are seeing on production.
[13:44] <stub> Actually, I can confirm since I can see the important bit. Goes slow when virtualized=true and processor=1
[13:48] <bigjools> stub: the first part of the query filters out jobs that are not waiting
[13:48] <bigjools> so it should not be checking that many rows in that 80% check
[13:50] <stub> If I comment out that chunk, the query stops taking minutes (I give up and cancel), and instead takes 614ms.
[13:51] <bigjools> ok I will run this stuff that blows away the suspended jobs
[13:51] <bigjools> and see if it makes any difference
[13:51] <stub> I've lost the analyze from before that pointed to it... I seem to recall about 56k checks but I'm not sure where they came from since I would have thought only 35k would be checked
[13:51] <bigjools> 56k is the number of buildqueue rows
[13:52] <bigjools> like I said, it's not supposed to be checking all those!
[13:52] <bigjools> 54k of them are suspended
[13:52] <stub> So PG decided to do the check before filtering because it thought it would be faster :-(
[13:52] <bigjools> :(
[13:52] <bigjools> this was fine before the rollout, I can't work out what's broken it
[13:52] <bigjools> s/rollout/re-roll/
[14:02] <stub> Still slow if I force the filtering properly - only a max of 1.8k checks
[14:02] <bigjools> gah
[14:03] <bigjools> ok it's gotta go
[14:04] <stub> Why is the archive=2 check inside the exists?
[14:04] <bigjools> it only applies to PPAs
[14:05] <bigjools> public PPAs
[14:06] <bigjools> flacoste: just the man!
[14:07] <bigjools> flacoste: we've got problems with the buildd-manager being very slow, I need to make a cowboy, can you approve this please: http://pastebin.ubuntu.com/444605/
[14:07] <bigjools> it removes the slow query part
[14:07] <stub> Yes, but the Archive table is from the outer scope
[14:08] <flacoste> bigjools: ???
[14:08] <flacoste> bigjools: what does the slow part do?
[14:08] <bigjools> flacoste: something has changed to make the dispatcher query very slow, we don't know what's caused it
[14:09] <flacoste> bigjools: iow, what functionality/conditions are we disabling?
[14:09] <bigjools> flacoste: limits builder usage to 80% of an architecture for each archive
[14:09] <flacoste> bigjools: why do we do that? or what are the consequences of not doing that?
[14:09] <bigjools> flacoste: the consequences are that a single PPA can hog all the builders of a single architecture
[14:10] <bigjools> basically the daily build ppas
[14:10] <bigjools> but that's currently less bad than a 2 hour scan cycle
[14:10] <flacoste> bigjools: i agree, should i worry about stub's comment about the Archive table being from the outer scope?
[14:11] <bigjools> flacoste: that's part of what we're removing from the query
[14:11] <stub> Don't mind me - I'm just trying to decode this query
[14:11] <bigjools> I'd like to restore the build farm first, then look at this problem with less pressure
[14:11] <stub> +1
[14:12] <bigjools> I can restart retry-depwait as well and see if those indexes worked
[14:14] <wgrant> While you're considering build-related queries, getBuildRecords timeouts are causing cron to spam me far too frequently since 10.05. It might be related, I guess, so I thought I might point it out.
[14:14] <bigjools> wgrant: api?
[14:14] <wgrant> bigjools: That's the one.
[14:14] <bigjools> ok
[14:14] <bigjools> did you file a bug?
[14:15] <wgrant> No. I was going to wait to see if it persisted -- it has. I'll file one tomorrow.
[14:15] <flacoste> bigjools: did you start an incident report?
[14:15] <flacoste> bigjools: r=me with an incident report :-)
[14:16] <bigjools> flacoste: I've not had time to fart, let alone write an incident report
[14:16] <bigjools> but trust me when I say I've been thinking about it :)
[14:16] <flacoste> good
[14:21] <stub> I just can't decode how that EXISTS is supposed to work at all (the one we are removing). It is trying to filter out jobs if the archive is utilizing 80% capacity. It counts the number of jobs currently active for the archive, but does not count the total capacity so how does it make that calculation?
[14:24] <bigjools> hmmmm
[14:24] <bigjools> good point
[14:25] <wgrant> It divides by %s, which is num_arch_builders, doesn't it?
[14:28] <bigjools> ah yes - can't tell from the raw log :)
[14:28] <sinzui> maxb ping
[14:28] <maxb> pong
[14:29] <sinzui> maxb: I think I recall you used suggested a fix for a gpg error we were/are seeing when we import a key
[14:29] <sinzui> s/used/once/
[14:29] <maxb> Um. Can you show the exact error you are talking about, to try to jog my memory?
[14:32] <sinzui> maxb: I updated bug 568456
[14:32] <mup> Bug #568456: GpgmeError raised importing public gpg key <oops> <Launchpad Foundations:Triaged> <https://launchpad.net/bugs/568456>
[14:33] <sinzui> maxb I recall someone suggested using str() in a cowboy one or two releases ago to fix a gpg error.
[14:34] <maxb> I definitely recall discussing str vs. unicode issues. It's not, however, obvious to me that this is the same or a related issue
[14:34] <maxb> I believe we got a more informative message than 'General error' at that time
[14:37] <sinzui> yes. I think the real error is masked. This would be easier to fix if we could reproduce it
[14:44] <noodles775> sinzui: you might have been thinking of this: https://code.edge.launchpad.net/~michael.nelson/launchpad/ppa-generate-key-failure/+merge/24871
[14:46] <sinzui> noodles775, yes!
[14:46] <sinzui> noodles775, this may help
[14:53] <bigjools> stub, noodles775: buildd-manager healthy again with that query part ripped out
[14:53] <stub> bigjools: How many builders for processorfamily 1 are there?
[14:56] <bigjools> stub: https://edge.launchpad.net/builders
[14:56] <bigjools> 17
[14:57] <bigjools> well, 14 for PPAs
[15:00] <bigjools> I need food, BBIAB
[15:03] <stub> bigjools: http://paste.ubuntu.com/444626/ is the query modified to use NOT IN to filter out archives that are over capacity, and runs reasonably fast. I'm not sure of the rules though - can only PPA archives go over capacity?
[15:27] <bigjools> stub: yes, that rule only applies to PPAs
[15:30] <stub> bigjools: http://paste.ubuntu.com/444637/ then?
[15:32] <bigjools> stub: one other option is to factor out that query so we have a python list of archives that is evaluated once, and then we plumb that result into the bigger query
[15:36] <bigjools> the list is never going to be very big
[15:57] <stub> bigjools: The bit in the 'NOT IN' should only be evaluated once inside that query. If you are calling that function multiple times though in a transaction, then factoring it out would be better
[15:57] <bigjools> stub: no, it gets called once for each polled builder
[15:58] <bigjools> stub: anyway, that's awesome, thanks
[16:31] <krkhan> "bin/test -m bugs" starts running *all* tests. what would be the correct way of running only tests related to bugs?
[16:32] <krkhan> "bin/test -t bugs" does the same
[16:34] <beuno> -v?
[16:34] <beuno> I don't remember exactly
[16:39] <bigjools> -m bugs should only run the tests in the bugs module
[16:42] <krkhan> -m bugs spends an eternity on setting up layers e.g. "Set up canonical.testing.layers.FunctionalLayer". is that normal?
[16:48] <bigjools> unfortunately yes
[16:50] <krkhan> ouch :-) okay
[18:18] <EdwinGrubbs> Ursinha, do you know if anyone from translations will be available today?
[18:41] <EdwinGrubbs> adiroiban, ping
[19:04] <bdmurray> Is it not possible to subscribe to wiki pages on dev.launchpad.net?  I tried subscribe user and got a 'you are not allowed to perform this action' message.
[19:11] <krkhan> bin/test fails with the error IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'. i'm using karmic. any ideas what's causing it?
[19:52] <bdmurray> krkhan: how are you using bin test? what is the full command?
[20:40] <maxb> bdmurray: So, the problem is that whoever wrote the moin skin for dev.lp.net did a somewhat poor job, and removed the 'subscribe' action link
[20:41] <maxb> the 'subscribe user' link you see is, I think, for subscribing other people
[20:42] <maxb> bdmurray: Workarounds are to either manually append ?action=subscribe to the page url, or to go to https://dev.launchpad.net/UserPreferences and enter page names in the relevant form field
[20:43] <rockstar> sinzui, if I have required=True in my interface field, and use use_template() to copy the field to a form schema, shouldn't the validation check it if it's blank?
[20:44] <lifeless> gary_poster: thanks for digging into the oops view thing
[20:44] <lifeless> gary_poster: I do enjoy finding turtles-all-the-way-down bugs :)
[20:45] <gary_poster> lifeless: sure.  lol, yeah, that's what this was. :-)
[21:02] <sinzui> rockstar, I think the answer is true. I have not used use_template, but I have used copy_field, and it does copy the required=* behaviour
[21:03]  * sinzui had to override the complete copy behaviour in fact.
[21:03] <rockstar> sinzui, well, it doesn't seem to be grabbing that behavior.
[21:03] <sinzui> do you know if copy_template is also using copy_field to build a schema?
[21:04] <rockstar> sinzui, no, I don't.  Lemme try something.
[21:05] <bdmurray> maxb: thanks for the work around!
[21:07] <rockstar> sinzui, so, it looks like the required=True part is checked AFTER validate, so you have to check the data to see if the key exists in the validate method...
[21:07] <rockstar> sinzui, that seems...backward.
[21:08] <sinzui> rockstar, that is not right, field validation does happen before form validation in LaunchpadFormView.
[21:08] <rockstar> sinzui, This oops says otherwise: https://lp-oops.canonical.com/oops.py/?oopsid=1615EA2203
[21:13] <sinzui> rockstar, LaunchpadFormView's _validate() validates the widget input before the view's validate() method. That is why the view's method can check for field errors at the start
[21:14] <rockstar> sinzui, hm, so I'm not sure why this bug is occurring then.  If I add the "if data.get('name', None):" it works fine, gives me the validation error, etc.
[21:15] <rockstar> sinzui, does it continue on, gathering all field errors from _validate and validate before it gives you all errors?
[21:16] <sinzui> rockstar all field errors are created as the widgets are iterated. Then the view's validate() method gets the data to do additional rule checking, invariant kinds of checking
[21:17] <sinzui> rockstar, does this field also have a default? because that can create a value
[21:17] <rockstar> sinzui, no, no default.
[21:18] <sinzui> rockstar, is there another oops id, that url will not load
[21:19] <rockstar> sinzui, I don't think so:  OOPS-1615EA2203
[21:19] <rockstar> That's the one from barry's bug.
[21:21] <sinzui> rockstar, looking at that oops, is the error that name has a space in it?
[21:23] <rockstar> sinzui, the way I reproduced it, it was entirely empty.
[21:23] <sinzui> ISourcePackageRecipe?
[21:23] <rockstar> sinzui, yes.
[21:24] <sinzui> that is a bad name
[21:24] <krkhan> bdmurray: i was using bin/test -t lp.bugs. but i'm getting a lot of other failures as well. i guess the devel branches aren't supposed to pass each and every test, are they?
[21:24] <sinzui> TextLine(title=_("Name"), required=True, constraint=name_validator,description=_("The name of this recipe."))
[21:25] <sinzui> rockstar, what view is this? I want to read the @action
[21:26] <sinzui> rockstar, is this it: SourcePackageRecipeAddView(RecipeTextValidatorMixin...)
[21:28] <sinzui> rockstar, I have seen this
[21:30] <sinzui> rockstar, When the vocabulary field is invalid, it is not in the data. It may also be true for NameFields. I think you want to check for field errors at the start of validate()
[21:32] <sinzui> rockstar, several views have a guard at the start of validate()
[21:32] <sinzui> if len(self.errors) > 0:
[21:32] <sinzui>     return
[21:33] <rockstar> sinzui, ah, okay, that makes more sense.
[21:33] <sinzui> I don't see any looking at the errors, they just get the len() and leave if it is not zero
[21:37] <rockstar> sinzui, okay.  I would think that LaunchpadFormView would be a better place for that check.  What do you think?
[21:38] <sinzui> No, because validate() is allowed to overwrite those errors. The ones we get from zope field are so arcane that gary has to look them up
[21:38] <gary_poster> heh
[21:38] <rockstar> hahahahahaha
[21:39] <sinzui> The bug supervisor message is an example of one that is easy to put into a sentence, but the field validation about vocabularies makes no sense to a user
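The guard sinzui quotes, shown in context. This is a hypothetical stand-in class, not the real LaunchpadFormView; only the `len(self.errors)` guard itself comes from the conversation, and the name-validation rule is invented for illustration:

```python
class ExampleAddView:
    """Hypothetical stand-in for a LaunchpadFormView subclass."""

    def __init__(self):
        self.errors = []

    def validate(self, data):
        # Field/widget validation has already run. If a field failed,
        # it is simply absent from `data`, so bail out early rather
        # than raising KeyError below.
        if len(self.errors) > 0:
            return
        if " " in data["name"]:
            self.errors.append("Name may not contain spaces.")
```

Without the guard, a blank required field would surface as a KeyError (an OOPS) instead of a field error, which matches the failure rockstar is chasing.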
[22:07] <zyga> gmb, ping
[22:08] <zyga> gmb, I'd love to use your django-xmlrpc library
[22:36] <adiroiban> EdwinGrubbs: hi
[22:51] <EdwinGrubbs> adiroiban, I was told you could answer some questions I have about LP Translations. I'm working on caching the sum of the POTemplate.messagecount in the DistributionSourcePackage table in a new column called po_message_count. I'm wondering if the best place in the code to update the cache would be POTemplate.importFromQueue() and POFile.updateStatistics() since they both update POTemplate.messagecount.
[22:53] <adiroiban> hm... well, in LP translations can be updated by both an import, or by direct submission via web interface
[22:53] <adiroiban> ah
[22:53] <adiroiban> sorry
[22:54] <adiroiban> for the POTemplate
[22:54] <adiroiban> so. for messagecount of a POTemplate, importFromQueue() should do it...
[22:55] <adiroiban> I am not sure why POFile.updateStatistics() is updating the POTemplate.messagecount ...
[22:55] <adiroiban> looking
[23:03] <adiroiban> EdwinGrubbs: I don't know why the code from POFile.updateStatistics() is touching the potemplate.messagecount. It was added to fix bug 371453, but I cannot find any clue
[23:03] <mup> Bug #371453: Broken statistics <message-sharing> <ui> <Launchpad Translations:Fix Released by danilo> <https://launchpad.net/bugs/371453>
[23:05] <EdwinGrubbs> adiroiban, do you think it's safe to add extra logic to those two methods?
[23:06] <adiroiban> The code from POFile.updateStatistics() that is updating the POTemplate looks strange, so before adding anything I would talk with Danilo ...
[23:06] <adiroiban> but Danilo is on leave
[23:07] <adiroiban> POTemplate.importFromQueue should be the right place for updating the POTemplate.messagecount cached value
[23:09] <adiroiban> EdwinGrubbs: also, POFile.updateStatistics() is called in POTemplate.importFromQueue() ... after
[23:09] <adiroiban>             # Update cached number of msgsets.
[23:09] <adiroiban>             self.messagecount = self.getPOTMsgSetsCount()
[23:09] <EdwinGrubbs> hmmm, that doesn't seem good
[23:10] <adiroiban> EdwinGrubbs: also
[23:10] <adiroiban> the code from POFile
[23:10] <adiroiban> is using potemplate.getPOTMsgSets().count()
[23:10] <adiroiban> instead of potemplate.getPOTMsgSetsCount()
[23:12] <adiroiban> but this is a minor fact
[23:12] <adiroiban> so, to answer your initial question
[23:13] <adiroiban> I would say that POTemplate messagecount cache data should be updated only in importFromQueue
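A sketch of the cache EdwinGrubbs describes, over plain objects rather than the real database models: adjust the cached sum by the delta in the one method that recomputes messagecount. The class and attribute names mirror the conversation, but the method signature and delta bookkeeping are assumptions for illustration:

```python
class DistributionSourcePackage:
    def __init__(self):
        self.po_message_count = 0  # cached sum over all templates

class POTemplate:
    def __init__(self, source_package):
        self.source_package = source_package
        self.messagecount = 0

    def import_from_queue(self, new_count):
        # Adjust the cached sum by the delta instead of re-summing
        # every template's messagecount on each import.
        delta = new_count - self.messagecount
        self.messagecount = new_count
        self.source_package.po_message_count += delta
```

Updating only here keeps the cache consistent as long as importFromQueue really is the single writer of messagecount, which is exactly the open question about POFile.updateStatistics() above.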
[23:14] <EdwinGrubbs> thanks
[23:16] <adiroiban> EdwinGrubbs: I could not find the MP for the branch that added those lines in POFile.updateStatistics()
[23:16] <adiroiban> if you can find it, maybe you could find some tips regarding those lines
[23:16] <EdwinGrubbs> ok