[00:00] StevenK: I can has blessing? [00:00] thumper: https://code.launchpad.net/~wgrant/launchpad/bug-707741-addFile-master/+merge/47606 please [00:01] wgrant: I was getting there :-) [00:01] StevenK: Great, thanks. [00:01] So many regressions :( [00:02] leonardr: is there some reason we can't have a destructor parameter? That doesn't seem to alter the boolean nature [00:05] lifeless: Thanks. [00:09] lifeless: Bugs needs a kanban UI. [00:11] ... well, this is amusing. [00:11] getUnlandedSourceBranchRevisions has a reasonably complex query, and it is completely untested. [00:12] I expect that sort of thing from Soyuz, not Code :/ [00:24] poolie: https://pastebin.canonical.com/42398/ [00:27] -> allergy injection [00:29] * henninge eods [00:36] StevenK: Another regression fix for you, if you have a moment: https://code.launchpad.net/~wgrant/launchpad/bug-707808-unlanded-revisions/+merge/47617 [00:38] lifeless, thumper: https://code.launchpad.net/~wgrant/launchpad/bug-707808-unlanded-revisions/+merge/47617 if you please [00:58] StevenK: ack [00:59] * StevenK kicks the merging code [01:04] thumper: https://code.launchpad.net/~deon-wurley/phpldapadmin/1.2.x <- are Remote branches without URLs legal? [01:04] wgrant: yes... [01:04] wgrant: but sad [01:04] WTF [01:04] But OK. [01:04] wgrant: not much point really [01:04] wgrant, hi [01:05] Ursinha: Hi, 'sup? [01:05] wgrant, so, there are bunches of ApacheLogParserError [01:05] Ah, those. [01:05] All fixed, but not deployed yet. [01:05] like this: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1851PPA1051 [01:05] Since germanium isn't in nodowntime. [01:05] hmm, I see. [01:06] wgrant, thanks :) [01:07] I hope we can germanium in nodowntime soon :( [01:07] +put [01:08] thumper: Thanks. [01:08] np [01:08] All major regressions fixed, yay. [01:09] Now deploy them? [01:09] Maybe in 8 or so hours. [01:11] Longer, devel will hit testfix [01:14] Heh. [01:14] Oh fuck, you're serious. [01:15] Gnrwgwnfjew [01:15] why [01:15] ? [01:15] That spurious librarian thing. [01:15] oh FFS [01:15] We should kill the current build. [01:15] yes [01:15] +1 [01:15] spm: Can we easily kill a buildbot build? [01:15] yah. which one. [01:15] Where easy is er, not [01:15] If not, could you do it the hard way? :P [01:15] lucid_lp [01:15] where kill == stop. [01:15] spm: 556 on lucid_lp [01:17] I'm off to lunch now; could someone force a build once it's done? [01:18] wgrant: Aye [01:18] spm: Tell me when it stops hurting [01:18] so the lsave is completely ded. how do we clean this up so it won't run that revno again? [01:19] The revno is fine [01:19] The errors are spurious and we don't want to be in testfix [01:19] oh I see. perhaps getting the spurious nature of the errors fixed would be a GoodIdea™? ;-) [01:19] Here, have a Hudson [01:19] That should help [01:20] :-) [01:20] spm: Do you need to restart the master to get the slave back? [01:20] seems to be showing back to me. should be able to force and magic happen [01:21] It's buildbot, it isn't close to magic [01:21] spm: Done, thanks [01:21] anti-magic? [01:21] wgrant: Build forced [02:28] * thumper stabs the librarian [02:28] AttributeError: 'NoneType' object has no attribute 'add_section' [02:28] while trying to set up the librarian in tests [02:28] anyone got any ideas? [02:40] poolie: ok I'm back [02:40] thumper: that means the layer which supplies the config to edit hasn't been setup [02:41] lifeless: which I'm getting why? 
[02:41] it has only just started happening [02:41] after merging dev [02:41] also [02:41] bin/kill-test-services is borked [02:41] theres a bug on that [02:41] with an explanation; fix should be trivial [02:42] lifeless, i think your mail is fine [02:42] i replied too [02:45] poolie: well, I've just sent my mail [02:46] interesting new reply from max [02:46] lifeless: my librarian layer won't setup [02:47] thumper: Turn LP_PERSISTENT_TEST_SERVICES off. [02:47] It doesn't work any more. [02:47] ah [02:47] poo [02:47] iz bug [02:47] I put work in to make it work [02:47] It's been that way for a week or so now. [02:47] but its not actually tested [02:47] Ah. [02:47] I see. [02:47] I didn't *intend* to break it [02:47] but I couldn't tell that I *had* [02:47] also [02:47] its useless for parallel testing [02:48] I presumed you turned a blind eye to your breakage of it. [02:48] It is, yes. But we have no parallel testing yet. [02:48] so I'm not very interested in maintaining it [02:48] wgrant: we have some, a decent subset of tests work fine in parallel now [02:48] Indeed. [02:48] It is close. [02:48] The main problem now is AppServerLayer. [02:48] indeed [02:48] Plus some directories that need randomising. [02:49] yes [02:49] lifeless: You binged? [02:49] 2.7 is close enough to be done by a maintenance squad, and I think parallel testing is probably almost there too. [02:50] maxb: python-subunit 0.0.6 in the lp ppa [02:50] maxb: part of fixing pqm to mail on conflicts [02:50] maxb: however it looks like the losas can grab from the bzr ppa ok [02:50] wgrant: feature work -> feature queue [02:50] wgrant: if you're looking for something to do, I see a single bug causing 16K oopses a day. [02:51] wgrant: (hint) [02:51] Heh, no, I'm not looking for stuff to do :( [02:51] That's the codebrowse connection closed one? [02:51] lifeless: No fair if it's loggerhead [02:51] wgrant: seriously, yes its close, but I guarantee it will have a long tail [02:51] StevenK: loggerhead probably has more test coverage the LP proper [02:52] wgrant: and the long tail is what makes me want to give it to a feature squad: get it live in buildbot/hudson, working in tarmac, profile it and assess scaling, disk IO utilisation, make recommendations on a new server to run massively parallel tests [02:52] wgrant: figure out the damn db leak bug [02:53] lifeless: I think that figuring that out is probably best achieved by deleting layers and zope.testrunner. [02:53] wgrant: I have handwavy plans for that [02:53] wgrant: its also pretty shallow [02:53] but, time. [02:53] Indeed. [02:53] wgrant: focus will get us places! [02:53] lifeless: I'd prefer if codebrowse actually looked like the rest of LP [02:53] StevenK: so would codebrowse :P [02:54] StevenK: it is themeable [02:54] But I can recall you telling me that's pointless in Dallas [02:54] StevenK: that doesn't sound like me; I think I may have said tha tthat is the least of the issues [02:54] StevenK: Everything is pointless in Dallas. [02:55] StevenK: which != pointless, but I could well have been confusing the issue [02:55] i might try just running tip on a big tree [02:55] and see how it does [02:56] huwshimi: hi [02:57] lifeless: Hey there [02:57] huwshimi: I dunno if you've seen https://dev.launchpad.net/BugTriage yet ? [02:58] lifeless: Uh, no. Are you referring to the bug I just filed? [03:00] huwshimi: yes I am :) [03:00] huwshimi: have a read :) [03:00] lifeless: Yeah thanks. Reading it now. 
[03:01] huwshimi: its useful context to fit in with the team; the specific thing that alerted me was creating a medium importance bug (we don't use medium) and then starting work on it (thus ignoring the critical and high bugs) [03:01] huwshimi: I think your particular focus means that looking at all critical/high bugs is irrelevant to you [03:02] huwshimi: -but- you probably want to be sorting the design related bugs by the impact/importance you think they have [03:03] huwshimi: anyhow, no drama; we have a massive learning curve and I primarily wanted to let you know about this part of it :) [03:03] lifeless: Sure thanks for letting me know. I'm about to submit a bunch of bugs so I'm glad you told me now :) [03:14] lifeless: There was some discussion last week about triaging bugs ourselves. Was there a decision made about that? Am I better off not self triaging for now? [03:18] huwshimi: self triage is great [03:19] follow the guidelines in the wiki page [03:19] beyond that, if there are disagreements, english is a wonderful tool for figuring em out ;) [03:19] lifeless: OK thanks for that :) [03:31] * thumper grrs [04:41] * thumper is slowly getting there [04:52] lifeless: So I don't think I can do the heavy lifting with SQL. If the recipe has no builds, http://pastebin.ubuntu.com/558848/ returns nothing, and so no daily builds get dispatched [04:52] how do i, through the api, find all bugs assigned to a particular team? [04:52] s/team/person [04:53] StevenK: looking [04:53] StevenK: you probably want either a LEFT OUTER JOIN or a UNION, or a COALESCE [04:53] StevenK: probably a left outer join [04:54] lifeless: Storm doesn't have a LeftOuterJoin? [04:54] hmm... [04:54] how do I emit a tag in a page template of a particular type? [04:55] StevenK: it does, LeftJoin IIRC [04:55] we use it all over [04:55] view/tag is something like "h1", or "span" [04:55] and I want to get

in the pt based on view/tag [04:55] * thumper is open to suggestions [04:55] * StevenK is open to patches, never having done joins through Storm before [04:55] I'm thinking I may need an attribute and structure [04:56] StevenK: grep for LeftJoin [04:56] StevenK: I'm in the middle of a deep n meaningful over loggerhead [04:56] its breaking my brain to think SQL at the same time [04:57] Mission accomplished [04:57] * StevenK hides [05:19] lifeless: If you want to point what I'm doing wrong in http://pastebin.ubuntu.com/558856/ , that would be awesome. I can't find any usage like this in the tree [05:22] * thumper is almost done refactoring widgets [05:25] StevenK: wtf [05:26] Clearly then, the answer is "everything" [05:26] StevenK: the pattern is [05:27] (IIRC) LeftJoin(lefttable, righttable, condition) [05:27] e.g. [05:28] SourcePackageRecipe LEFT OUTER JOIN SourcePackageRecipeBuild on SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id [05:28] -> [05:28] LeftJoin(SourcePackageRecipe, SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id) [05:28] the result of that is a table itself [05:28] so to nest you'd do [05:29] LeftJoin(LeftJoin(SourcePackageRecipe, SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id), PackageBuild, PackageBuild.id ==SourcePackageRecipeBuild.package_build_id) [05:29] but you don't need a nested left join [05:29] you only need one outer join - at the point you're willing to have a NULL row [05:29] so [05:29] Join(LeftJoin(SourcePackageRecipe, SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id), PackageBuild, PackageBuild.id ==SourcePackageRecipeBuild.package_build_id) [05:29] etc [05:30] Join(Join(LeftJoin(SourcePackageRecipe, SourcePackageRecipeBuild, SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id), PackageBuild, PackageBuild.id ==SourcePackageRecipeBuild.package_build_id), BuildFarmJob, BuildFarmJob.id == PackageBuild.build_farm_job_id) [05:30] is your using object [05:30] then in the where clause you put [05:30] Or(BuildFarmJob.id == None, BuildFarmJob.date_created > one_day_ago) [05:35] w00t w00t [05:35] StevenK: does that make sense? [05:35] StevenK: use LP_DEBUG_SQL to see the sql being emitted [05:35] StevenK: and adjust until you have a query you're happy with [05:36] Yes, it made sense [05:36] ok [05:36] did it help ? [05:40] lifeless: Not really, it still doesn't return SPRecipes that haven't built [05:41] I suspect SourcePackageRecipeBuild.id == None will help, since they are created for builds, which then creates the PackageBuild and BFJ [05:43] StevenK: you probably want to bring back the sprb as well [05:44] since that will tell you the when. Or perhaps not. [06:03] * thumper EODs [06:08] * huwshimi waves goodbye [06:21] StevenK: need more help? 
I have cycles in ~ 10 [06:42] lifeless: Sorry, I was picking up Sarah, and I EOD'd 1.6 hours ago [06:44] kk [06:44] grab me tomorrow [06:45] and we can nut it out [07:12] lifeless: Was my plan === almaisan-away is now known as al-maisan [09:19] good morning [09:22] Guten morgen === al-maisan is now known as almaisan-away === allenap changed the topic of #launchpad-dev to: Launchpad development https://dev.launchpad.net/ | PQM is open | firefighting: - | On call reviewer: allenap | https://code.launchpad.net/launchpad-project/+activereviews === salgado-afk is now known as salgado [11:35] Does anyone know what in the sweet hell this is about?: [11:35] An error occurred during a connection to launchpad.dev. [11:35] SSL received a record that exceeded the maximum permissible length. [11:35] (Error code: ssl_error_rx_record_too_long) [11:42] gmb: This is on a fresh LP install? [11:45] wgrant: No, it's an old one. But I did have to re-run rocketfuel-setup recently to correct a couple of pebkacs. [11:47] good morning launchpad [11:47] * gmb re-does the apache config dance [11:49] gmb: Odd. I normally only see that when it's a fresh install needing an Apache restart, or I've broken the vhost config somehow. [11:49] wgrant: Yeah. I think I've borked the vhost config in some mysterious way (probably because I have a remote access setup and rocketfuel-setup fought with it). [11:49] Ah. [11:49] I'm redoing it now. [11:50] I find the nicest way to manage a remote access thingy is to let rocketfuel-setup manage the local-launchpad config file, but a2dissite it, copy it, and modify the copy === Ursinha is now known as Ursinha-afk [11:54] maxb: Very wise. I shall do that henceforth. [11:55] Hoorah. It works. [11:58] /me lunches [12:06] Morning, all. [12:08] morning deryck === mrevell is now known as mrevell-lunch [12:20] what's the best thing to use when writing a setup.py these days? setuptools? distribute? [12:24] salgado: what sort of features do you need from it? [12:24] distutils is widely available and seems to work pretty well for basic python modules [12:29] jelmer, our needs seem to be rather basic, so maybe distutils will be enough indeed [12:31] salgado: I copy an existing setup.py :) [12:45] jml, that's always a good strategy [13:01] can someone give me a dumb manager's summary of the codebrowse discussion on the mailing list? === jcsackett changed the topic of #launchpad-dev to: Launchpad development https://dev.launchpad.net/ | PQM is open | firefighting: - | On call reviewer: allenap, jcsackett | https://code.launchpad.net/launchpad-project/+activereviews === mrevell-lunch is now known as mrevell === almaisan-away is now known as al-maisan [14:13] benji, your last message to that mp I reviewed yesterday had only some quoted text from my previous message [14:14] salgado: hmm, that's odd. I just said "sorry about that, thanks for asking Curtis to look at it" [14:15] benji, looks like the issue I encountered yesterday; already reported a bug for it [14:16] do you have the bug number at hand? [14:18] jelmer, ping [14:18] jelmer, unping [14:18] hi [14:19] i'm trying to setup launchpad on a virtual machine, but when i try make run i get an importerror (no module named loom.branch) [14:19] anybody got any hints? [14:20] http://pastebin.com/JBn59ekF [14:22] flacoste, now that domain expertise is spread across squads, maybe it would be good to have a list of domain experts, e.g. on the wiki?
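
A minimal Storm sketch of the join pattern lifeless dictates above at 05:26-05:30, for reference. The model classes and column names come straight from that exchange; the store handle and the one_day_ago cutoff are stand-ins, so treat this as a sketch rather than the exact code StevenK landed:

    from datetime import datetime, timedelta

    from storm.expr import Join, LeftJoin, Or

    # SourcePackageRecipe, SourcePackageRecipeBuild, PackageBuild and
    # BuildFarmJob are the Launchpad model classes named in the
    # discussion; `store` is a storm.store.Store.  The cutoff is only
    # ever called one_day_ago above, so define it literally.
    one_day_ago = datetime.utcnow() - timedelta(days=1)

    # FROM clause, per 05:29: recipes outer-joined to their builds,
    # with the remaining build tables inner-joined on.
    origin = Join(
        Join(
            LeftJoin(
                SourcePackageRecipe, SourcePackageRecipeBuild,
                SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id),
            PackageBuild,
            PackageBuild.id == SourcePackageRecipeBuild.package_build_id),
        BuildFarmJob,
        BuildFarmJob.id == PackageBuild.build_farm_job_id)

    # WHERE clause, per 05:30: either no build at all (the NULL row from
    # the outer join) or a build created within the last day.
    result = store.using(origin).find(
        SourcePackageRecipe,
        Or(BuildFarmJob.id == None,
           BuildFarmJob.date_created > one_day_ago))

Setting LP_DEBUG_SQL, as lifeless suggests at 05:35, shows the SQL Storm emits. Note that this is the version that later (22:40 onwards) turns out to drop recipes with no builds: the inner Joins applied after the LeftJoin discard the NULL rows again, which is what the right join sketched further down fixes.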
[14:25] (i can import the branch in python shells, so it should be installed and everything) [14:28] oyv: are you following the instructions on the wiki? [14:29] i use instructions from here: https://dev.launchpad.net/Getting [14:29] (and the "running" page) [14:30] oyv, ok. what version of ubuntu are you running? [14:32] Ubuntu 10.04.1 LTS (lucid lynx) [14:34] oyv: i encountered this at one point; i am trying to remember how i fixed it. [14:38] oyv: i assume you are not actually using bzr loom? it shouldn't fail in that case, i'm just defining what's going on. [14:39] i don't think i'm using it, not really sure what it is ;) [14:40] it's a plugin for bzr, http://doc.bazaar.canonical.com/plugins/en/loom-plugin.html. many lp developers use it, but last i checked it was not a dependency. [14:41] and if it is, it should have been automatically installed when you hit the update step in the get/run instructions. [14:41] jcsackett: it is a dep [14:41] jml: really? did that change? when i started on lp i was told it wasn't, b/c i was hitting this (or a similar) issue. [14:42] edb@launchpad:~/launchpad/lp-branches/devel$ bzr plugins [14:42] [14:42] loom 2.1.0 Loom is a bzr plugin which adds new commands to manage a loom of patches. [14:42] * jcsackett supposes he could have been easily misinformed. [14:42] jcsackett: since 2008, I think. it's needed to process looms [14:42] doesn't that mean it's installed? [14:43] either is fine with me; but more importantly it means I need to go pour my coffee right now [14:43] pfft, wrong chan [14:45] jml: dig. clearly i was misinformed. :-P [14:51] oyv: sorry, i'm not sure what might be going on. it does seem that it is installed as a plugin. [14:52] i could be of more help if i had a functioning machine right now, but i'm rebuilding after hardware failure yesterday, so i can't explore much on my end. :-/ === al-maisan is now known as almaisan-away [14:53] ok, thanks anyway :) [14:55] Project devel build (396): FAILURE in 4 hr 38 min: https://hudson.wedontsleep.org/job/devel/396/ [14:55] Launchpad Patch Queue Manager: [r=lifeless, [14:55] stevenk][bug=707741] Fix LibrarianClient.addFile to function under [14:55] SlaveDatabasePolicy. [14:55] oyv: I don't suppose you get a traceback related to that ImportError? [14:56] http://pastebin.com/JBn59ekF [14:56] When you're running "make run", the copy of bzr-loom that is supposed to be being used is the one found via bzrplugins/loom/ in the Launchpad tree [14:56] benji, it's bug 708258 [14:56] <_mup_> Bug #708258: Failed to parse merge proposal comment sent via email < https://launchpad.net/bugs/708258 > [14:56] salgado: thanks === Ursinha is now known as Ursinha-afk [14:58] hmm [14:59] * maxb sobs a bit as launchpad-database-setup tries to configure pg8.2 [14:59] there's only one folder in the bzrplugin folder, lpserve [14:59] Your tree is broken then [14:59] damnit.. [14:59] ;) [14:59] thanks [14:59] abentley: that's what the wiki pages related to the Launchpad in 30 minutes presentation was meant to do [15:00] abentley: https://dev.launchpad.net/Foundations/ComponentHelp [15:00] i think Tim did something similar [15:00] and so did Bugs [15:01] not sure if Soyuz and Translations put their stuff in that format [15:02] ah, adeuring, call time. Sorry got distracted running tests and answering emails.
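
On salgado's setup.py question at 12:20 above: for basic needs jelmer points at distutils, which ships with Python. A minimal example of the kind jml says he copies; every name in it is an invented placeholder:

    # Minimal distutils-based setup.py.  All metadata here is made up
    # for illustration.
    from distutils.core import setup

    setup(
        name='example-package',
        version='0.1',
        description='A basic Python module',
        author='Example Author',
        author_email='author@example.com',
        py_modules=['example'],
    )

With that in place, python setup.py sdist produces a source tarball and python setup.py install installs the module.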
[15:02] abentley: it would probably make sense to collate all the content in one place, or at least make an easy-to-find index page to it [15:02] deryck: ok, no problem [15:02] goooooood morning [15:02] flacoste, +1 === Ursinha-afk is now known as Ursinha [15:07] bigjools, can you confirm that we never send emails for successful binary builds? Because there are some tests of Build.notify with BuildStatus.FULLYBUILT. [15:09] abentley: I can confirm that [15:09] that test sounds a bit bong [15:09] bigjools, great. [15:11] henninge, I need 5 minutes to wrap up some notes, and then we can chat. if that works for you. [15:11] deryck: that's fine [15:12] bigjools, since recipe builds want to notify on success, I'm moving the differentiation into BinaryPackageBuild.notify/SourcePackageRecipeBuild.notify, and fixing the tests. [15:12] abentley: you *want* them to notify on success or you're removing that? [15:12] bigjools, I *want* them to notify on success. [15:13] ok [15:13] really? [15:13] sounds odd to be [15:13] me [15:13] will the coalesce? [15:13] they [15:13] james_w, yes. How else do I know that a recipe build has happened? [15:13] prepare for wrath if you do this! [15:13] bigjools, we've been doing this from the start. [15:14] interesting [15:14] because they happen every day? because Launchpad is reliable and does what I ask? [15:14] we really need a better email notification story [15:15] james_w, no, they only happen when the source changes. [15:15] james_w, my bzr recipe triggers once or twice a month. [15:15] are you /really/ going to send people 200 emails every day in the extreme case? [15:15] how is that useful? === salgado is now known as salgado-lunch [15:15] james_w, it's useful so that people know when there's a new build. [15:16] but that information isn't very useful at all as you move towards heavy users [15:16] james_w, if you get it and you don't want it, you can filter it out. If you don't get it and you want it, you have no option. [15:16] do you really want to discourage heavy use of the service using email volume [15:17] gmb: Do you have time to do a sanity check on https://code.launchpad.net/~allenap/launchpad/freedesktop-importance-flip-flop-bug-707478/+merge/47667 please? [15:17] allenap: Sure. [15:17] client-side filtering has been deemed to be not sufficient in the bugs case [15:19] james_w, client-side filtering is better than no mail. [15:19] are you sure your users would agree with that? [15:19] james_w, I am sure you can find users to disagree with anything. [15:20] well, let's all go shopping then [15:22] allenap: Approved. [15:22] james_w, I am just fixing a bug where the wrong kind of notification is sent, not increasing the number of notifications. [15:23] gmb: Thank you :) [15:23] ok, then you are absolved from any responsibility for the system [15:26] james_w, so I've been talking with jelmer about this, and we both agree that there's a lot of room for improvement in the build notification story. [15:27] james_w, jelmer would like to see a notification of successful binary builds, grouped so that you only get one email for multiple builds. [15:28] james_w, and then we would be able to omit the success emails for recipe builds. [15:29] all emails that LP sends should be controllable from a single page [15:29] per person, I mean [15:30] bigjools, I don't know what to think about that. It would be a verrrry long page. 
[15:30] potentially but not always [15:31] it can be ajaxified but you know what I mean - the subject of being sent machine-generated email is very divisive [15:35] bigjools, generally, asking users to make choices is bad. The fewer choices, the smoother something feels. This makes me think that the need for such a page indicates a design problem. [15:40] allenap: thanks for the review! [15:40] abentley: that's the Gnone way, which is completely disagree with [15:40] bigjools, I'm not disputing the actual need, but I do wonder whether we're not thinking big enough. [15:40] s/is/I/ [15:41] adeuring: did you see my review of your branch? [15:41] anyway, I'm not bikeshedding over this [15:42] henninge: yes, thanks. Actually, I think we should keep the assert() calls in setUp() because it is so esay to cause a mess there, and I am not sure if this would always result in test failures [15:43] adeuring: well, I think that is true for a lot of places in the code, though. [15:44] adeuring: but I don't really mind leaving them in there. [15:44] henninge: OK; perhaps my concerns are related to my unfamiliarity with translations ;) [15:45] adeuring: ... I tried not to be so blunt ;-) [15:45] adeuring: I have done that before, added asserts to make sure I got things right but once it worked, I removed them. [15:45] henninge: ;) but nevertheless, if we can screw the setup just by exchanging two factory calls, I think it is worth to check we don't do that... [15:46] adeuring: that's why I suggested leaving just the one in there. [15:46] and a comment [15:47] "# The order of creation is important here to make sure is_current_ubuntu is set to False. [15:47] self.assertFalse(message.is_current_ubuntu)" [15:48] adeuring: ^ like that, only use the right variable for 'message'. [16:01] gmb: Do you remember why sourceforge.net bug watches are disabled? [16:02] allenap: Because our super-duper sourceforge slurping screenscraping fantastical disaster rather relied on them never, ever changing their templates, and they did. [16:02] (I might have overstated how good that code actually was) [16:09] gmb: Oh yes, thanks. I've just had a look at their templates and they're very much simpler now. Might fix that bad boy. [16:09] allenap: Don't they offer an API now? [16:10] gmb: Do they? Awesome. [16:10] ISTR they do. Maybe I saw that on the ForgePlucker mailing list. === salgado-lunch is now known as salgado [16:14] allenap, jcsackett could you please review https://code.launchpad.net/~abentley/launchpad/build-mail3/+merge/47679 ? It's long, but only because of indentation fixes in a doctest. [16:14] abentley: sure. [16:15] jcsackett, thanks. [16:15] gmb: I can't find an API, but tell me if you happen upon it. [16:16] allenap: Yeah, I'm not able to find one either, disappointingly. Must have been a happy dream. [16:16] I need new dreams. [16:16] Hehe, you do :) === almaisan-away is now known as al-maisan [16:30] deryck, I feel like I lack the domain knowledge to dive into any of the tasks on the Kanban board. Can you suggest one? [16:30] * deryck is looking [16:31] abentley, so I'm sure I lack the domain knowledge too :-) However, the card for bug 696009 seems a nice next step..... [16:31] <_mup_> Bug #696009: Provide ITranslationMessage.shareIfPossible unit tests < https://launchpad.net/bugs/696009 > [16:32] abentley, it's still in the area of test clean up. so will broaden domain knowledge as cleanup happens. [16:33] deryck, okay, I'll tak that. [16:33] s/tak/take. 
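
In code, the notify() split abentley describes at 15:12 looks roughly like this. These are stand-in classes illustrating the shape of the change, not Launchpad's implementation; bigjools confirms at 15:09 that binary builds never mail on success, while recipe builds deliberately do (15:13):

    # Stand-ins for the real status constant and mailer.
    class BuildStatus:
        FULLYBUILT = 'FULLYBUILT'

    def send_build_notification(build):
        print('would send mail about %r' % build)

    class PackageBuildBase:
        def __init__(self, status):
            self.status = status

    class BinaryPackageBuild(PackageBuildBase):
        def notify(self):
            # Success is silent for binary builds; only failures mail.
            if self.status == BuildStatus.FULLYBUILT:
                return
            send_build_notification(self)

    class SourcePackageRecipeBuild(PackageBuildBase):
        def notify(self):
            # Recipe builds mail on success too, so owners learn that a
            # new build happened (15:13-15:15).
            send_build_notification(self)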
[16:33] ok, cool === deryck is now known as deryck[lunch] [16:35] adeuring: am I to expect a new revision on your branch? Otherwise I'll start landing mine (which includes yours and abentley's) [16:35] actually, I'll make sure to merge the latest first [16:38] henninge: well, I'd prefer to leave the tests as they are... so, no new revesion ;) [16:38] adeuring: ok, thanks. ;) [16:46] abentley: comments on your MP. feel free to answer there or here. [16:46] jcsackett, fire away. [16:47] abentley: as commented on the MP, i'm wondering about the use of self.builds in the from_build method. [16:48] mainly, in the first hit, can that get big enough to cause performance issues? [16:48] jcsackett, it is always one or zero hits. [16:49] abentley: so does the cache help? the cache is always gone at the end of the transaction, yes? [16:50] I broke the build; rolling it back now. Sorry folks. [16:50] jcsackett, the cache will not help with the cases we have at hand, because we only use it once. [16:50] so putting it in cached_property is forward looking? [16:51] jcsackett, but I felt that it made sense to cache it since one does not expect an attribute to be expensive. === al-maisan is now known as almaisan-away [16:51] Oh, nice. I specified --rollback for lp-land but it doesn't seem to have tagged the commit message thus. [16:51] * gmb files a bug. [16:53] jcsackett, you could say putting it in cached_property is forward-looking. [16:53] abentley: so do you see any problem with the use of self.builds being called in there? since no preloading will happen, can hitting self.builds get sufficiently expensive to be a worry? [16:54] i'm not saying it will. i'm not familiar with this part of the codebase, so i'm wondering. === almaisan-away is now known as al-maisan [16:54] jcsackett, I don't know what you're asking. Are you wanting me to run an EXPLAIN on the query? [16:55] abentley: i'm just double checking performance concerns. [16:55] jcsackett, what kind of answer are you looking for? [16:57] abentley: builds is a SQLMultipleJoin; i'm wondering if when self.build gets called we may return a huge rowset in some cases. [16:57] jcsackett, as I said, it's always one or zero results. === al-maisan is now known as almaisan-away [16:57] abenltey, ah! i thought you meant from_build was called one or zero times. [16:57] jcsackett, there is a comment in the code saying it shouldn't be a multiple join. [16:59] abentley: yes, i see now. [16:59] apologies for the confusion. [16:59] jcsackett, cool === Ursinha is now known as Ursinha-lunch [17:01] abentley: r=me. i need follow up, as a mentee :-P. bac, can you follow up on https://code.launchpad.net/~abentley/launchpad/build-mail3/+merge/47679? === jcsackett changed the topic of #launchpad-dev to: Launchpad development https:/​/​dev.launchpad.net/​ | PQM is open | firefighting: - | On call reviewer: allenap, jcsackett* | https://code.launchpad.net/launchpad-project/+activereviews === matsubara is now known as matsubara-lunch [17:14] jcsackett, I've added a comment per your suggestion. [17:19] abentley: thank you. :-) === deryck[lunch] is now known as deryck === beuno is now known as beuno-lunch [17:43] hurrah! Windmill re-enable branch finally made it through. === matsubara-lunch is now known as matsubara [18:26] jml: morning [18:26] deryck: cool [18:26] lifeless: hi [18:27] jml: would you like a catchup call - we seem to be trading 1-liners this week [18:27] lifeless: Yes, I'd like one. Only have 15mins though. [18:28] skype? [18:28] lifeless: sure. 
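
The cached_property jcsackett and abentley discuss at 16:46-16:53 is, in outline, a memoizing descriptor: the first access runs the query, later accesses are plain attribute reads, which preserves the expectation that "one does not expect an attribute to be expensive". A generic sketch; Launchpad used its own helper for this, and expensive_build_lookup is invented:

    class cached_property(object):
        """Non-data descriptor: compute once, then cache on the instance."""

        def __init__(self, func):
            self.func = func

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self
            value = self.func(obj)
            # Shadow the descriptor with the computed value so later
            # attribute lookups never reach __get__ again.
            obj.__dict__[self.func.__name__] = value
            return value

    def expensive_build_lookup(view):
        return []  # imagine the one-or-zero-row builds query here

    class MergeProposalView(object):
        @cached_property
        def builds(self):
            return expensive_build_lookup(self)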
=== beuno-lunch is now known as beuno [18:46] flacoste: ping [18:59] hi abentley [19:00] bac, hi. [19:00] abentley: i'm trying to mentor jcsackett's review. the diff in the MP is overwhelmed by lint changes. i've tried to get a diff with just the non-lint changes but failed. could you easily whip one up and paste it? [19:02] bac, not sure. I don't think they were separate commits, or anything. [19:04] bac, how's this? http://pastebin.ubuntu.com/559153/ [19:05] abentley: 228 is much better than 1654! :) thanks! [19:08] hello, all! Is there any way of seeing how many people are using a PPA of mine/a package? Or even just seeing how many people have branched from bzr or downloaded a .tar.gz? Thanks! [19:13] aroman: there is an LP API that can get you download statistics for a PPA [19:14] hi thumper [19:14] lifeless: ah, but there's no sort of graphical frontend to see that information? [19:14] not yet [19:14] flacoste: can you talk briefly about your review? [19:14] lifeless: ah, alright, well i'll check that out then. I assume it's python, right? [19:14] flacoste: hi; can I get a small timeslice today ? (but not for ~30) [19:14] thumper: sure, mumble? [19:14] ok [19:14] aroman: its a RESTful api, but we have a python library you can use [19:15] lifeless: how small? [19:15] flacoste: 10 min ? [19:15] lifeless: excellent, thanks! [19:15] flacoste: crossing t's dotting i's on the loggerhead thing [19:27] Yippie, build fixed! [19:27] Project devel build (397): FIXED in 4 hr 31 min: https://hudson.wedontsleep.org/job/devel/397/ [19:27] Launchpad Patch Queue Manager: [r=gary][ui=none][bug=704685] BugSubscription.bug_notification_level [19:27] is now exposed in the devel webservice API. [19:27] yay gmb ! [19:28] oh, boo [19:28] that's hudson [19:28] on buildbot, it is being rolled back :-/ [19:28] oh, why? [19:29] https://lpbuildbot.canonical.com/builders/lucid_lp/builds/561/steps/shell_6/logs/summary [19:30] ah scaling test failed [19:30] I'd like to make those more reliable and isolated [19:30] they are just a little flaky atm [19:30] (different standalone vs in a larger run) [19:31] yeah [19:31] I think they are still a net win [19:31] but it can be frustrating getting them going [19:38] allenap: around? === Ursinha-lunch is now known as Ursinha [19:58] bigjools: got a minute for a question about FTPArchiveHandler? [19:58] jtv: yup [19:59] great [19:59] I was just wondering… it uses the LoopTuner all over the place already, but only for gc. [19:59] Was it originally meant to commit transactions as well? [20:00] no, it's writing out files for apt-ftparchive to use [20:01] Okay, but it's not writing to the DB that I can see and it's a big suspect in the long-transaction crime mystery I'm pursuing. [20:02] So it'd seem that a few commits would be a relatively harmless way to get around the timeouts. [20:03] I was thinking perhaps I could force the main body of work on the slave store, and throw in a few commits. [20:03] * bigjools thinks [20:03] that code is probably the bit I understand the least in soyuz [20:03] it's never gone wrong since cprov left so I never had to look at it :) [20:05] jtv: I'm a bit scared of using the slave store in case of replication delays [20:06] Scared is sensible. But what's there to replicate? [20:06] all the publishing data it relies on, which is fast-changing [20:07] I see. [20:08] jtv: it would be a good start to work out if it really is keeping a transaction open [20:08] if it's not writing to the db can we change the $mumble [20:08] It has to be. 
It's not committing anywhere, and it's looping over lots of packages AFAICS. [20:08] The $mumble? [20:09] yeah, the thing [20:09] Oh, the thing. [20:09] The database policy? [20:09] isloation? [20:09] isolation ,even [20:09] Isolation level? [20:09] dunno if that would affect it [20:09] That would affect reads, not too much to do with writes. [20:09] can we connection r/o to the master? [20:09] sigh [20:10] connect* [20:10] I don't think we can, no. [20:10] Something we could try though is run a test with a slave-only policy, and see what fails. [20:11] ok [20:11] I see why you suggested the slave now [20:11] lifeless: Hi. [20:11] lifeless: Is this about the BugMessage crack? [20:11] bigjools: Well it also helps scale-out and locking, of course. [20:11] yes [20:12] allenap: though I'm not sure its crack L) [20:12] bigjools: unless there's something that changes the database and then immediately runs the script, we should get about as consistent a view of the database from a slave as we do from the master—just slightly older. [20:13] I'm reading the code now to try and remind myself of the weirdnesses. [20:13] allenap: so [20:13] we decorate Message to make it an IndexedMessage but the actual relation is BugMessage which we don't expose at all [20:13] thumper: i'm back [20:14] thats beside the point though [20:14] bigjools: "the thing"—https://www.ubersoft.net/comic/hd/2011/01/next-time-try-index-cards [20:14] flacoste: want to continue? [20:14] thumper: sure thing [20:14] we can still decorate but get the indices from the db once they are cached [20:15] jtv: the thing I am scared of is writing out inconsistent indices in the Ubuntu archive. That would be fairly nasty. [20:15] now while I am scared I am not sure how realistic the chances of that are [20:16] In that case, we could try moving to the slave and _not_ committing. [20:17] If that works out we'd still get a single huge transaction, but also a guarantee that it takes no write locks. [20:17] lifeless: Yeah, that sounds sane. [20:17] jtv: well publish-distro does do commits [20:17] Yes, just not in its huge loops. [20:17] after a-f runs [20:17] it needs to mark stuff as publishe [20:17] d [20:18] allenap: the goal is to be able to do a range query on BugMessage and then get just the Messages we want [20:18] bigjools: I guess that happens all the way at the end? [20:18] allenap: which will also let us move to ajax population of pages like bug 1 [20:18] <_mup_> Bug #1: Microsoft has a majority market share < https://launchpad.net/bugs/1 > [20:19] lifeless: Judging by Bug._indexed_messages, that should work well. [20:19] allenap: indeed; I wrote Bug._indexed_messages in october or so as part of optimising within the prior schema [20:19] lifeless: Like load-on-scroll? [20:19] lifeless: Yeah, it looked like one of yours :) [20:19] allenap: hah! I'll take that as a compliment. [20:19] allenap: and yes, load on scroll [20:19] bigjools: if we did the actual work on the slave but the marking-as-published on the master, we'd risk marking a slightly newer version as published than we actually published. Is that a problem? [20:20] Awesome. [20:20] allenap: should be easy if we can get one message + adjacent actions pretty cheaply. [20:20] jtv: there's 4 steps in the publisher: 1) write files to pool, 2) domination, 3) generate files for a-f and run it, 4) write release files [20:20] jtv: big problem, yes [20:20] lifeless: Adjacent actions?
[20:21] bigjools: though strictly speaking, I would expect that that problem already exists to about the same extent [20:21] allenap: look at a BugTask:+index page [20:21] allenap: it shows action action message action message etc interleaved [20:21] allenap: it pretends there is one sequence [20:22] jtv: actually, the publishing record would only get written by this process anyway at this point in its lifecycle [20:22] lifeless: Ah, yes, I wrote a lot of that ;) I was having an association fail on the word "actions". [20:22] Well, not a lot, some. [20:22] bigjools: assuming there's no overlapping runs, I guess it'd only be a problem if there were a few last changes while the script was running, and then nothing before the next run (so it'd think it was already published or something). [20:23] lifeless: Are you going to work on this, or are you softening me up to do it? [20:23] lifeless: i'm free whenever you are [20:23] allenap: I'm helping on the oops/performance aspect [20:24] allenap: future stuff will be feature cards I suspect, unless you were to have a sudden fit of zomg I want to do this [20:24] flacoste: voice? [20:24] lifeless: sure [20:24] lifeless: Okay, so you'll get the index into the database, and the get-adjacent-actions and load-on-scroll stuff is up to others? [20:25] lifeless: skype me [20:25] allenap: yeah, I don't have long enough timeslices to do larger stuff [20:26] lifeless: That's cool, I just wanted to be sure what's expected of me. [20:27] be excellent [20:27] thats it [20:28] Heh :) [20:30] jtv: so it marks everything published before a-f runs [20:30] there's the long transaction [20:30] bigjools: it may help my understanding if I see what that entails… do you happen to know where that is done? [20:31] jtv: look at lib/lp/archivepublisher/publishing.py [20:31] * jtv looks at lib/lp/archivepublisher/publishing.py [20:31] and scripts/publish-distro.py which is what calls it [20:32] C_doFTPArchive is for Ubuntu, C_writeIndexes is for PPAs [20:32] Ah, that helps [20:32] So the marking-as-published… what do I look for? [20:32] ArchivePublisherBase.publish() [20:33] follow it down to there from distroseries.publish() [20:33] ah cool [20:33] But I think the script commits soon after that happens anyway. [20:33] the latter being called from A_publish() [20:34] The problem is in C_doFTPArchive [20:34] (I think) [20:34] yes [20:34] each stage is wrapped it try_and_commit() [20:34] in* [20:37] ahh got it: setPublished [20:38] And we want to publish a state that's no older than that datepublished timestamp, right? [20:40] jtv: so my concern is that we only see half of the records that were just set published if C_doFTPArchive is using a different store [20:41] or maybe even none [20:41] but that's extreme [20:43] I wonder if we could set "now minus replication lag" as the publication date. [20:43] hi deryck, you still around? [20:43] the date is not a concern [20:44] it's the inconsistency [20:45] bigjools: would that change though if we otherwise ran everything on the slave store? [20:45] because what can happen is that we write the pool file, miss it in a-f and then write the release file with it in [20:45] that would be quite disastrous [20:45] What's a-f stand for by the way? [20:45] Apt FTPArchive [20:46] in that case we'd end up with checksums that are incorrect [20:46] You're saying the whole pool file might be lost? Or an individual package that's in there? 
[20:46] no [20:47] a-f writes the indexes - with replication lag it could miss something [20:47] then if that something replicates before we get to stage "D" where it writes the Release file, the Release file will be wrong compared to the index [20:48] I think [20:48] Well if it's that complicated I probably can't afford to mess with it anyway. :/ [20:48] the publisher is *the* most critical part of soyuz, if it goes wrong we can do a lot of damage [20:49] which is why I am rather conservative in this area :) [20:49] If we could move the whole thing minus something small over to the slave, we'd have a consistent view (just a slightly older one) and no worries. But if there's any sort of read-modify interaction it gets riskier. [20:49] yeah [20:50] we should go through it sometime and record all the interactions [20:53] bigjools: would it make sense to do a trial run with a slave-only policy and setPublished disabled? [20:54] I dont think we have any tests that do everything at once like that, so we'd need to check the archive state manually [20:55] Meaning we'd probably miss something? [20:57] possibly but if you point an apt client at the archive it would soon belly-ache [20:58] we should get wgrant to comment on this [21:04] thumper: StevenK: we having standup? [21:05] thumper: i'm talking, jsut a sec [21:05] thumper: you ok, my sound backend bad [21:05] fixing [21:05] wallyworld_, StevenK: FYI https://code.launchpad.net/~thumper/launchpad/refactor-lazrjs-text-widgets/+merge/47634 [21:05] thumper: I can hear you too [21:17] thumper: https://code.edge.launchpad.net/~wallyworld/launchpad/recipe-find-related-branches/+merge/47367 === salgado is now known as salgado-afk [21:30] Ugh, so my laptop has decided that when I'm logged in I should only be able to type numbers :( [21:30] Yet, you're typing not numbers. :-) [21:31] StevenK: I am using my netbook to whinge about it [21:31] Heh [21:33] thumper: https://pastebin.canonical.com/42451/ [21:34] bigjools: Hi [21:34] morning wgrant [21:37] bigjools: This whole discussion makes me cry. [21:39] Why do we want to move it to the slave? [21:39] wgrant: no doubt [21:39] huwshimi: numlock? [21:39] benji: Thanks, I just figured that out right then. [21:39] We *could* do it, and things would remain consistent, but some things would show as published when they weren't. [21:39] thumper: I've been trying! [21:39] wgrant: jtv has been investigating but he wants to get rid of a long-running transaction [21:39] And the latency would be two hours. [21:40] benji: They could have at least labelled it :) [21:40] wgrant: yes that's exactly my concern in addition to incorrect indices vs release files [21:41] bigjools: We may have extra stuff on disk, but we may also be removing stuff from disk later that is still referenced by the indices. [21:41] So no, we cannot sensibly use the slave. [21:41] However, yes, we can remove the long-running transactions by making the publisher not take 300 years. [21:41] But that is a fair bit of work. [21:41] we could add a ro connection to the master [21:41] about 300 years [21:42] lifeless: Is it only R/W transactions that are the problem? [21:42] wgrant: no [21:42] That's what I thought. [21:42] wgrant: but they are a bigger problem than r/o transactions [21:42] r/o transactions prevent index and row gc (mvcc chaff) [21:43] r/w transactions cause related row exclusive locks [21:43] Right. 
but its the actual *changes* in r/w transactions that matter [21:43] * bigjools gets tests working with a version column of type debversion \\o/ [21:43] length is just a (poor) proxy for predicting problems. [21:43] There should be none. But it would be nice to verify that. [21:43] bigjools: Yay! [21:44] now, to optimise the queries [21:44] thumper: So I use Mumble on the old laptop the audio is choppy and I can talk, if I use the new laptop the audio is excellent and I can't talk [21:45] heh [21:45] wgrant: btw, the 2-builds-per-builder happened again today [21:45] bigjools: Same builder? [21:45] no, after talking to lamont it turns out that it happens when he kills a long-running stuck build [21:45] I noticed 6 or so builders earlier disabled with strange messages. [21:46] But they were all happy once I re-enabled them. [21:46] because we rely on aborting nonvirt builders [21:46] flacoste: poolie: draft sent to you [21:46] so I am going to change the code to disable the builder if we see it aborting [21:47] bigjools: Why? [21:47] abort does not work on nonvirts [21:47] ABORTING will drop to ABORTED, modulo a slave bug which doesn't properly kill sbuild because of a missing trap. [21:47] exactly [21:47] so until that's fixed... [21:47] wgrant: yes [21:47] wgrant: can I brain dump and have you chase? [21:48] wgrant: ok so lp prod configs was broken for qastaging mailman [21:48] the qa port is 9097 [21:48] bigjools: Until that's fixed, buildd-manager will ignore the builder and there is a nagios check to tell lamont to fix it. [21:48] Right. [21:48] it said 8097 [21:48] there is a branch to fix that [21:48] I believe that got reviewed and landed. [21:48] so if it has [21:48] It should be good? [21:49] then it should be deployed and the mailman log should be calling into the api with the newer code [21:49] It doesn't seem to be landed. I might fix that. [21:49] remaining steps [21:49] I've only run mailman locally a couple of times, so I'm not 100% on how exactly it works, but I guess I'll work it out. [21:51] lifeless: qastaging should automatically update configs with its usual update, right? [21:51] wgrant: yes [21:51] wgrant: so, 1) get the right port out [21:52] 2) make sure its querying members (via the logs) [21:52] 3) profit [21:52] Great, thanks. [21:52] If all goes well, we can deploy in a couple of hours :) [21:53] lifeless: So, the query Storm is creating is: SELECT SourcePackageRecipe.build_daily, SourcePackageRecipe.daily_build_archive, SourcePackageRecipe.date_created, SourcePackageRecipe.date_last_modified, SourcePackageRecipe.description, SourcePackageRecipe.id, SourcePackageRecipe.is_stale, SourcePackageRecipe.name, SourcePackageRecipe.owner, SourcePackageRecipe.registrant FROM SourcePackageRecipe LEFT JOIN SourcePackageRecipeBuild ON SourcePac [21:53] Hm. That was a bit longer than it looked in the terminal [21:54] bigjools: We haven't purged non-main indices for existing PPAs yet, have we? [21:54] wgrant: new ppas won't have them at all [21:54] I know. [21:54] It broke a few things :) [21:54] But they're all fixed now. [21:55] But we were also going to manually remove the old ones. [21:56] lifeless: replied [21:57] thanks [21:57] poolie: hi [21:58] wgrant: no [21:58] wgrant: go ahead and clean up, then we can re-enable the daily job [22:40] lifeless: O hai.
I have added a recipe with no builds and dropped the query and it returns 0 rows [22:40] StevenK: you need to paste the query without getting cut off :) [22:42] lifeless: http://pastebin.ubuntu.com/559270/ ; I've taken what LP_DEBUG_SQL gave me, replaced the %s's and dropped most of SourcePackageRecipe's columns from the SELECT [22:43] hi wgrant! [22:43] If I drop the extra JOINs, it returns a row [22:44] Morning jtv. [22:45] wgrant: bigjools & I were looking at the long transactions in the archive publisher earlier. [22:45] jtv: So I saw. [22:45] You're everywhere. [22:45] We cannot sanely move it onto a slave. [22:45] I was wondering whether we could move essentially the whole process over to a slave without risking inconsistencies. [22:45] We need to make it take less insanely long. [22:45] Ah. [22:46] That answers that. [22:46] jtv: wgrant made some more scary points that I'd not mentioned in addition to my scary ones [22:46] Oh good [22:46] that sounds like fun. [22:46] Making it take less insanely long looks somewhat doable, but I'll need a way to test. [22:46] It's FTPArchiveHandler.run that takes the bulk of the time, right? [22:47] There are two things that take forever: file list generation, and a-f itself. [22:47] The latter can be parallelised. [22:47] StevenK: I'm fiddling [22:47] wgrant: I'd like to see a-f and NMAF in a race [22:47] StevenK: trivially doing left joins all over works, but that can get inefficient [22:48] wgrant: and the former looks like it may be a typical naïve-fetch-inside-loop pattern. [22:48] I think it would be close [22:48] It's _almost_ a simple prejoin but there's an interaction with slicing to watch out for. [22:48] bigjools: a-f is hundreds of times slower. [22:48] Er. [22:48] NMAF is. [22:48] NMAF is *slower*? [22:48] jtv: Possibly. But the queries themselves are bad. [22:48] StevenK: Yes, it issues many many many more queries. [22:49] wgrant: the pattern, or some of the individual ones? If you've got a list of known troublemakers, that'd help me see. [22:50] jtv: I don't recall exactly. But I believe it's fairly efficient query-count-wise, but the queries are not quick. [22:50] It's been more than a year since I seriously tried optimising this phase of the publisher. [22:50] I got a *long* way, but it was sort of full of hacks. [22:51] Down to 3-4 seconds for each index file, on NMAF, for 1.5x primary's size. [22:51] cprov also optimised the a-f file list code a lot just before he left. [22:51] My main suspect based on following the arrows in the code was FTPArchiveHandler.publishFileLists. [22:52] That contains some loops over all source/binary packages, I believe. [22:52] Yes, but that uses the views. [22:52] which are evil [22:52] * jtv crosses himself [22:52] It should not do any queries. [22:52] Luckily I made merit by visiting the temple: saw a working Difference Engine 2 yesterday. [22:52] * bigjools looks for garlic and silver [22:52] Until right at the end when I do the disabled check. [22:52] (which is only once per pocket-das) [22:53] What's a pocket-das? [22:53] (PackagePublishingPocket, DistroArchSeries) [22:53] hardy-i386-RELEASE [22:53] For instance [22:54] How many cores does cocoplum have?
[22:54] StevenK, wgrant: I am putting my debversion stuff on DF, please to not be touching for a bit [22:54] bigjools: k === matsubara is now known as matsubara-afk [22:54] unless you guys are using it in which case I can wait [22:55] wgrant: 4 [22:55] :( [22:55] StevenK: SourcePackageRecipeBuild JOIN PackageBuild ON PackageBuild.id = SourcePackageRecipeBuild.package_build JOIN BuildFarmJob ON BuildFarmJob.id = PackageBuild.build_farm_job right join SourcePackageRecipe ON SourcePackageRecipeBuild.recipe = SourcePackageRecipe.id [22:55] StevenK: put that in from 'FROM' ... 'WHERE' [22:56] wgrant: it looked to me as if, unless we mess with the code's structure which I'm reluctant to do, it should do a bunch of batched prefetch queries in each iteration of its loop-tuner callback. If it's using views, maybe that's worth eliminating as well. [22:56] jtv: Because it's using views, it's already entirely prefetched. [22:56] All you can do is optimise the view. [22:56] Actually, prefetching may be what's slowing it down then. [22:57] However, file list generation is only a couple of minutes. [22:57] Running a-f takes 15 or so. [22:57] lifeless: That looks good. How do I tell Storm that? [22:57] RightJoin [22:57] Instead of left? [22:57] StevenK: works like LeftJoin in terms of function calls [22:57] StevenK: yes; same parameters, same translation [22:58] StevenK: play with it a little to get the invocation you need [22:58] I'm just checking the query plan on staging [22:58] 96ms [22:59] so fast [23:00] StevenK: you'll also want DISTINCT, because you only want one row per sourcepackagerecipe [23:00] StevenK: as written, on staging, we can get [23:00] id | name | owner [23:00] ----+-------------+--------- [23:00] 15 | awn-testing | 1382524 [23:00] (3 rows) [23:04] lifeless: I'm still distilling your query into Storm. *Then* I'll worry about distinct [23:04] StevenK: I'd do it for you, but known how this works will help you [23:06] wgrant: drat. So to speed the code up significantly we'd have to run parallel apt-ftparchive instances on separate file lists? [23:07] jtv: Yes. We already have separate file lists, so it's fairly easy. [23:07] lifeless: Storm has SELECT ... FROM SourcePackageRecipe; whereas your query has SELECT ... FROM SourcePackageRecipeBuild; is that pertinent? [23:07] wgrant: and then… just concatenate the outputs? [23:08] jtv: No. Each file list results in one index. [23:08] If you look at the log, you'll see it generates one index for (maverick-release, source, main), another for (maverick-release, i386, universe), etc. [23:09] Each of those is big. [23:09] And each of those can be done in parallel. [23:09] StevenK: yes [23:09] StevenK: or probably yes, show me thje FROM..WHERE bit ? [23:10] lifeless: http://pastebin.ubuntu.com/559277/ [23:11] wgrant: ah so parallel runs per architecture, plus one for source, and they should all overlap fairly well. [23:11] jtv: Yes. [23:11] The hard part would probably managing the parallel processes. [23:11] Yes. And that's not terribly hard. [23:12] In the run I'm looking at, running a-f takes 16 minutes. File list generation takes 1.5. [23:30] lifeless: I suspect you got distracted? 
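
A sketch of the parallelisation wgrant and jtv converge on above (23:07-23:12): one apt-ftparchive run per already-generated file list, a few at a time. The helper names and config layout are invented; only the one-independent-run-per-index idea comes from the discussion.

    import subprocess
    from multiprocessing.dummy import Pool  # thread pool; a-f does the real work

    def run_apt_ftparchive(config_path):
        # Each config drives the indexes for one suite/component/arch
        # combination, so the runs are independent of one another.
        return subprocess.call(['apt-ftparchive', 'generate', config_path])

    def generate_indexes(config_paths, workers=4):
        # cocoplum has four cores (22:55), so cap the parallelism there.
        pool = Pool(workers)
        try:
            return pool.map(run_apt_ftparchive, config_paths)
        finally:
            pool.close()
            pool.join()

With the 16 minutes of apt-ftparchive time spread over independent indexes, four-way parallelism would reclaim most of the wall-clock cost, leaving the 1.5 minutes of file list generation (23:12) as the serial part.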
[23:37] StevenK: born distracted [23:37] Duh [23:42] StevenK: ok [23:42] so that thing you pasted means [23:43] 'select the things joined together where there are 0 or many sourcepackagerecipes' [23:43] what you want is [23:43] 'select source package recipes where there are 0 or many (things joined together)' [23:43] right joins allow NULL on the left hand side [23:43] lifeless, spm, as a specific next step on bug 701329, can we get haproxy logs? [23:43] <_mup_> Bug #701329: error: [Errno 32] Broken pipe < https://launchpad.net/bugs/701329 > [23:44] that is, they include 'every row on the right hand side' [23:44] poolie: can we not just catch the exception ? [23:44] or, try to titrate them to a level where they tell us about interesting errors without flooding the system [23:44] poolie: what are you trying to determine [23:44] why haproxy closed the connection [23:44] and from which ip it came [23:45] poolie: can you back it out a step higher [23:45] I don't understand why this isn't a fairly direct code change [23:47] so, to close this particular bug, we can, say [23:47] just make it not oops when the connection is abruptly closed? [23:47] perhaps it should log a warning-type error [23:47] that seems pretty easy [23:47] none of our other servers log on this [23:47] *or* [23:47] it makes me wonder if we are papering over a problem [23:47] none of our other servers are experiencing it [23:47] so i'd like to see a bit more data about why the connection is being dropped [23:48] haproxy is the obvious place to look for that [23:48] and, i think it is a bit weird to have a production service where logging is entirely disabled [23:48] (perhaps "it's weird" is not a reason to change it but i think it's reasonable to ask) [23:48] I thought you were told that it was a volume problem ? [23:49] haproxy being a bit of a firehose [23:49] anyhow [23:49] if you're working on this bug [23:49] thats cool [23:49] lifeless, i'd guess that zope etc don't traceback on epipe, and i agree that in general loggerhead shouldn't [23:49] uhm, what can I do to hepl. [23:49] firstly, have you looked at some of the OOPSes? [23:49] yes [23:49] i'd like to turn logging >=WARNING [23:49] and see if that is in fact a firehose [23:50] if it is, that's a bit alarming, and i'd like to just see what it's logging [23:50] then we can turn it off again [23:50] is that ok? [23:50] I've no objections in principle, though the losas generally make informed decisions about this sort of thing :) [23:51] spm, do you mind trying that? [23:51] we start with "no" and negotiate from there. [23:51] sigh [23:51] poolie: nope, worth a shot. :-) [23:51] thanks [23:51] * StevenK was waiting for "Depends. Do you have cake?" [23:51] the haproxy manual says you can set the log leevl [23:52] StevenK: it's poolie. he lives closer than wgrant, and rides a motorbike ie Hells Angel in training. I've learnt to be nice to him. [23:52] poolie: so [23:52] poolie: looking at https://lp-oops.canonical.com/oops.py/?oopsid=1836CBA1 [23:52] poolie: there are several things that make this hard to analyze [23:53] poolie: if I was working on it, I would fix those first [23:53] poolie: firstly there is no total time recorded [23:53] * StevenK suspects that poolie is shinier, due to lifeless completly context switching and goes afk for ten or so [23:53] poolie: nor any timeline (which we could use to track xmlrpc / bzr call overhead if we wanted to) [23:53] i've learned to front-load anything that may need SA action [23:53] StevenK: sorry, I thought I answered you? 
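
For reference, lifeless's 22:55 rewrite expressed back in Storm, under the same assumptions as the earlier sketch. The right join keeps every SourcePackageRecipe row even when the build side is NULL (23:43), and config(distinct=True) supplies the DISTINCT he asks for at 23:00:

    from storm.expr import Join, Or, RightJoin

    origin = RightJoin(
        Join(
            Join(
                SourcePackageRecipeBuild, PackageBuild,
                PackageBuild.id == SourcePackageRecipeBuild.package_build_id),
            BuildFarmJob,
            BuildFarmJob.id == PackageBuild.build_farm_job_id),
        SourcePackageRecipe,
        SourcePackageRecipeBuild.recipe == SourcePackageRecipe.id)

    result = store.using(origin).find(
        SourcePackageRecipe,
        Or(BuildFarmJob.id == None,
           BuildFarmJob.date_created > one_day_ago))
    result.config(distinct=True)  # one row per recipe, per 23:00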
[23:54] StevenK: @10:44 [23:54] poolie: if we had timing data, we could assess whether this is really a timeout or not [23:55] poolie: in fact, we could add a thing to say 'if the time is > some N, convert to a timeout' [23:55] yup [23:55] we probably need to expose an API call to ask for the timeout [23:55] but if haproxy is saying "request blah blah timeout, killed" [23:55] e.g. 'what should my timeout be please?' [23:55] that seems easier [23:56] poolie: I worry that we're going to need a lot of correlation to match up all 15K a day [23:56] poolie: gathering data at source seems more useful (to me) [23:56] poolie: anyhow, thats just what I'd do if I was working on it [23:57] however [23:57] note that haproxy probably won't tell us [23:57] if this is coming in behind haproxy [23:57] spm: how often does nagios query ? [23:58] i think it's up to 3 times every 10 minutes [23:58] lifeless, i don't want to match up all of them, i just want to know if haproxy thinks its client is dropping the connection, or if it is dropping the connection itself [23:58] or if it's oblivious [23:58] it shouldn't be too hard to answer [23:58] sure [23:59] but probably easier to look in the logs and count [23:59] I am open to many ways to figure out whats up [23:59] * thumper is busy writing real documentation
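
One concrete form of the "just make it not oops" option poolie floats at 23:47: treat a client hangup as a logged warning rather than an OOPS. The surrounding handler is invented; the errno check itself is the standard idiom for a broken pipe.

    import errno
    import logging
    import socket

    log = logging.getLogger('loggerhead')

    def send_body(sock, data):
        try:
            sock.sendall(data)
        except socket.error as e:
            if e.errno == errno.EPIPE:
                # The client dropped the connection mid-response; note it
                # and move on instead of recording a full OOPS traceback.
                log.warning('client closed connection early: %s', e)
                return
            raise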