[05:16] <haobug> hi there,
[05:17] <haobug> i am trying to make launchpad run on my own server, but i didn't find a step-by-step guide for it. can anyone help me out?
[05:21] <wgrant> haobug: https://dev.launchpad.net/Running is just about all there is.
[05:24] <haobug> wgrant: does the bzr --no-plugins line download from the remote again? i have already downloaded the source. can i make use of that?
[05:28] <wgrant> haobug: To avoid redownloading it, run 'mkdir ~/launchpad; bzr init-repo ~/launchpad/lp-branches; bzr branch /PATH/TO/YOUR/LOCAL/BRANCH ~/launchpad/lp-branches/temp', then run ~/launchpad/lp-branches/temp/utilities/rocketfuel-setup and follow the instructions from 'Building' on that page.
[05:34] <haobug> wgrant: i know know so much about bazzer, it seems 'bzr branch/checkout' are making a copy of the original directory. that looks strange.
[05:35] <haobug> sorry, i don't know so much about bazzer.
[05:37] <lifeless> What is bazzer ?
[05:38] <haobug> lifeless: sorry again, i mean Bazaar. mistyped.
[05:39] <wgrant> haobug: That's right. I suggested that so it could pre-seed ~/launchpad/lp-branches with the data you downloaded earlier, to minimise the volume that rocketfuel-setup must download.
[05:49] <haobug> i have another question, about Karma: what mathematical model is it built on? that is my main reason for trying out the launchpad all-in-one system.
[05:55] <haobug> let me make my question clean and short (sorry for my poor English): what mathematical model are Karma points based on?
[11:52] <cjwatson> As I understand the PostgreSQL docs, if I have "CONSTRAINT livefs__owner__distroseries__name__key UNIQUE (owner, distroseries, name)", I don't also need to do "CREATE INDEX livefs__owner__distroseries__name__idx ON LiveFS (owner, distroseries, name);" - is that right?
[11:52] <cjwatson> (http://www.postgresql.org/docs/9.1/static/indexes-unique.html)
[11:53] <wgrant> cjwatson: You shouldn't need to, no.
[11:54] <wgrant> The alternative is to "CREATE UNIQUE INDEX livefs__owner__distroseries__name__key ON LiveFS (owner, distroseries, name);" instead of using CONSTRAINT in the table definition.
[11:54] <wgrant> But it makes no difference for a new, empty table.
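A minimal sketch of the point above, that a UNIQUE constraint already brings its own index. It uses Python's bundled sqlite3 module as a stand-in so it can run anywhere; PostgreSQL is what the discussion is about, and the table here is cut down to the three constrained columns plus an id. The helper name unique_indexes is hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL purely so the sketch is runnable;
# the table is simplified from the one under discussion.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE LiveFS (
        id INTEGER PRIMARY KEY,
        owner INTEGER NOT NULL,
        distroseries INTEGER NOT NULL,
        name TEXT NOT NULL,
        CONSTRAINT livefs__owner__distroseries__name__key
            UNIQUE (owner, distroseries, name)
    )
    """
)


def unique_indexes(connection, table):
    """Map each unique index on `table` to its ordered column names."""
    found = {}
    rows = connection.execute('PRAGMA index_list("%s")' % table).fetchall()
    for row in rows:
        index_name, is_unique = row[1], row[2]
        if is_unique:
            columns = connection.execute(
                'PRAGMA index_info("%s")' % index_name
            ).fetchall()
            found[index_name] = [column[2] for column in columns]
    return found


# No CREATE INDEX was issued, yet the constraint is backed by an index.
indexes = unique_indexes(conn, "LiveFS")
```

Listing the indexes after creating the table shows one unique index over (owner, distroseries, name), so a second plain index on the same columns would only duplicate it.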
[11:56] <cjwatson> OK, thanks.  I'm really not very experienced at all with this.  Does http://paste.ubuntu.com/7351944/ look remotely sensible?
[11:56] <cjwatson> I don't know whether indices of lots of columns for this sort of thing (updated maybe tens of times per day) are useful
[11:57] <cjwatson> LiveFS.builds actually hits a couple more columns for ordering, which I don't know whether/how I should include
[12:02] <wgrant> It totally depends on the sorts of queries you're doing. Indices are pretty much just B-trees, so they have the performance characteristics you'd expect: early index columns should approximate the common query filters, and the query's sort columns must be a suffix of the index's columns.
[12:02] <wgrant> Do you have some example queries?
[12:06] <cjwatson> http://bazaar.launchpad.net/~cjwatson/launchpad/livefs/view/head:/lib/lp/soyuz/model/livefs.py lines 97-146
[12:07] <wgrant> +-- LiveFS.getMedianBuildDuration
[12:07] <wgrant> +CREATE INDEX livefsbuild__livefs__das__finished__idx
[12:07] <wgrant> +    ON LiveFSBuild (livefs, distroarchseries, date_finished);
[12:07] <wgrant> That's usable, but would be more efficient with a DESC on the date_finished
[12:09] <cjwatson> Probably right in general, although the current code (which I cloned-and-hacked from sourcepackagerecipebuild) will not make effective use of that
[12:09] <cjwatson> http://bazaar.launchpad.net/~cjwatson/launchpad/livefs/view/head:/lib/lp/soyuz/model/livefsbuild.py#L248
[12:09] <wgrant> Ah, indeed, no DAS
[12:09] <wgrant> oh
[12:10] <wgrant> doesn't use last_completed_build?
[12:10] <wgrant> Er
[12:10] <wgrant> Does SPRB really do that?
[12:10] <cjwatson> You think it should just use the last one as an estimate?  That doesn't seem very robust in cases where we have builders of different speeds
[12:10] <cjwatson> It does
[12:10] <cjwatson> I assume that should be done with a postgres expression ...
[12:11] <wgrant> I don't even
[12:11] <wgrant> It shouldn't be done at all
[12:11] <wgrant> That's an unbounded set.
[12:11] <wgrant> At least take the last ten or something.
[12:12] <wgrant> Doing it in postgres (or just pulling back date_finished and date_started, rather than instantiating entire objects) would be more efficient, but it's still pointlessly inefficient.
[12:12] <wgrant> That first build 5 years ago isn't very interesting.
[12:12] <wgrant> The median of the last dozen might be.
[12:13] <cjwatson> mm, right.  can I slice in the usual way after doing an order_by?
[12:14] <wgrant> Yeah
[12:15] <wgrant> I'd store.find((LiveFSBuild.date_finished, LiveFSBuild.date_started), blah blah blah).order_by(Desc(LiveFSBuild.date_finished))[:10] or so
[12:15] <cjwatson> Right, makes sense
[12:17] <wgrant> Just pull back ten sets of two datetimes, rather than instantiating 3462 full SourcePackageRecipeBuilds just to pull those two fields out.
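A rough sketch of the bounded median wgrant describes: take only (date_finished, date_started) pairs, keep the most recent ten, and take the median of their durations. It works on plain tuples rather than Storm results, and the function name median_build_duration and its limit parameter are illustrative assumptions, not Launchpad code.

```python
from datetime import datetime, timedelta
from statistics import median


def median_build_duration(rows, limit=10):
    """Median duration of the `limit` most recently finished builds.

    `rows` is an iterable of (date_finished, date_started) datetime pairs,
    i.e. just the two columns rather than full build objects.
    Returns a timedelta, or None if no build has finished.
    """
    finished = [(f, s) for f, s in rows if f is not None and s is not None]
    # Newest first, mirroring ORDER BY date_finished DESC ... [:limit].
    finished.sort(key=lambda pair: pair[0], reverse=True)
    durations = [(f - s).total_seconds() for f, s in finished[:limit]]
    if not durations:
        return None
    return timedelta(seconds=median(durations))
```

In the real code the sort and the limit would be done by the database, as in the store.find(...) line above, so only ten rows ever leave PostgreSQL.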
[12:25] <wgrant> Oh that's right, I was meant to be looking for sane indices, not crying over bad existing code.
[12:26] <wgrant> last_completed_build (if it should exist in the first place) wants (livefs, date_finished DESC). builds wants (livefs, GREATEST(date_started, date_finished) DESC, date_created DESC, id DESC). The requestBuild one is fine.
[12:27] <cjwatson> I was thinking of using last_completed_build from cdimage, but maybe that's wrong and it should be looking up a particular build
[12:27] <wgrant> getMedianBuildDuration post-rewrite wants (livefs, distroarchseries, date_finished DESC) WHERE date_finished IS NOT NULL
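For illustration, here is roughly what that partial, descending index and the query it serves could look like. The sketch runs against Python's bundled sqlite3 as a stand-in for PostgreSQL, with a cut-down LiveFSBuild table and text timestamps; the helper recent_finished is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE LiveFSBuild (
        id INTEGER PRIMARY KEY,
        livefs INTEGER NOT NULL,
        distroarchseries INTEGER NOT NULL,
        date_started TEXT,
        date_finished TEXT
    );
    -- Only finished builds are indexed, newest first within each
    -- (livefs, distroarchseries) pair.
    CREATE INDEX livefsbuild__livefs__das__finished__idx
        ON LiveFSBuild (livefs, distroarchseries, date_finished DESC)
        WHERE date_finished IS NOT NULL;
    """
)

# Twelve finished builds, one still running, and one on another architecture.
for day in range(1, 13):
    conn.execute(
        "INSERT INTO LiveFSBuild (livefs, distroarchseries, date_started,"
        " date_finished) VALUES (1, 1, ?, ?)",
        ("2014-04-%02d 10:00:00" % day, "2014-04-%02d 10:30:00" % day),
    )
conn.execute(
    "INSERT INTO LiveFSBuild (livefs, distroarchseries, date_started)"
    " VALUES (1, 1, '2014-04-13 10:00:00')"
)
conn.execute(
    "INSERT INTO LiveFSBuild (livefs, distroarchseries, date_started,"
    " date_finished) VALUES (1, 2, '2014-04-20 10:00:00', '2014-04-20 11:00:00')"
)


def recent_finished(connection, livefs, das, limit=10):
    """Fetch (date_finished, date_started) for the newest finished builds."""
    return connection.execute(
        """
        SELECT date_finished, date_started FROM LiveFSBuild
        WHERE livefs = ? AND distroarchseries = ?
          AND date_finished IS NOT NULL
        ORDER BY date_finished DESC
        LIMIT ?
        """,
        (livefs, das, limit),
    ).fetchall()


rows = recent_finished(conn, 1, 1)
```

The query repeats the index's IS NOT NULL condition, which is what lets a planner consider a partial index for it.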
[12:27] <wgrant> last_completed_build doesn't know about DASes.
[12:27] <wgrant> So I don't think it's useful for cdimage.
[12:28] <cjwatson> A reasonable point
[12:28] <wgrant> I try.
[12:28] <cjwatson> Why wouldn't builds want archive in there too?
[12:29] <cjwatson> (Oh, in fact I think I came up with last_completed_build before it occurred to me that it might be clever to add version to LiveFSBuild ...)
[12:31] <wgrant> Because we're not filtering to a small subset of the archives, so adding the column to the index would serve mostly to make it useless for the sort. If I have an index on foo(bar, id) and do SELECT * FROM foo WHERE bar = 1 ORDER BY id LIMIT 10, the DB can just seek to bar in the index and read the first ten entries.
[12:32] <wgrant> If I have foo(bar, baz, id) and do SELECT * FROM foo WHERE bar = 1 AND baz != 2 ORDER BY id LIMIT 10, the DB has to seek to bar, then read through the index for every value of baz that isn't 2, compile a list of all of the rows, and then sort in memory to find the latest ten.
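The two foo examples above can be tried out with a small experiment. This is only a sketch using Python's bundled sqlite3 and its EXPLAIN QUERY PLAN output as a stand-in; PostgreSQL's planner and plan text differ in detail, and the table name foo_wide and helper needs_sort are made up for the comparison.

```python
import sqlite3


def needs_sort(connection, sql):
    """True if the plan includes a separate sort step for ORDER BY."""
    plan = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return any("TEMP B-TREE FOR ORDER BY" in row[-1] for row in plan)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (bar INTEGER, baz INTEGER, id INTEGER);
    CREATE INDEX foo__bar__id__idx ON foo (bar, id);

    CREATE TABLE foo_wide (bar INTEGER, baz INTEGER, id INTEGER);
    CREATE INDEX foo_wide__bar__baz__id__idx ON foo_wide (bar, baz, id);
    """
)

# Index (bar, id): seek to bar = 1, then read entries already ordered by id.
narrow_sorts = needs_sort(
    conn, "SELECT * FROM foo WHERE bar = 1 ORDER BY id LIMIT 10"
)

# Index (bar, baz, id): within bar = 1 the entries are ordered by baz first,
# so the id ordering has to be produced by an extra sort.
wide_sorts = needs_sort(
    conn,
    "SELECT * FROM foo_wide WHERE bar = 1 AND baz != 2 ORDER BY id LIMIT 10",
)
```

With the extra column in the middle of the index, the LIMIT can no longer stop the read early, which is the cost wgrant is describing.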
[12:33] <cjwatson> We are filtering to a small subset, namely one ...
[12:33] <wgrant>             LiveFSBuild.archive_id == Archive.id,
[12:33] <wgrant>             Archive._enabled == True,
[12:33] <wgrant> That's more than one.
[12:33] <cjwatson> Oh, er, idiot
[12:33] <cjwatson> Yes
[12:33] <cjwatson> OK, that makes sense then
[12:33] <wgrant> Though you probably do want a method that filters by archive.
[12:34] <wgrant> In that case, you'd always want one with (livefs, archive, SORT) instead of just (livefs, SORT)
[12:34] <cjwatson> I'm not sure where such a method would actually be exposed.
[12:34] <wgrant> But for the general case it's far more efficient to read the first maybe 20 rows until you find ten that have valid archives, rather than having to read every row for the interesting archives.
[12:34] <wgrant> A good question.
[12:35] <wgrant> LiveFS.getBuilds, probably. I'm not sure the builds property is actually valuable.
[12:35] <wgrant> getBuilds can take an optional archive, das, pocket, etc.
[12:35] <cjwatson> It's an API convenience, I think.
[12:36] <cjwatson> cdimage will probably just be looking at the one it gets back from requestBuild.
[12:37] <cjwatson> I haven't quite decided what LiveFS:+builds should look like yet
[12:37] <cjwatson> It could perhaps reasonably display .builds[:10] or so
[12:38] <cjwatson> Or just batchnav the whole of .builds
[12:38] <cjwatson> We'll certainly want to be able to go back and look at old logs, but I don't know that the effort of adding fancy filtering forms and the like would be worthwhile
[12:41] <cjwatson> Maybe by status and architecture, like DS:+builds, I guess
[12:41] <cjwatson> I guess I can fill in indices needed for browser code in a -1 DB patch
[12:44] <wgrant> Right, and we can apply indices live
[12:44] <wgrant> So you could do them in 40 separate patches and it wouldn't be a major stab-worthy inconvenience.
[12:49] <cjwatson> OK, so I'll do the file URLs API method, nuke last_completed_build, do the reasonably clear parts of the indices above, and then I think I should have something worth reviewing.  I plan to defer the garbo job to a subsequent branch since it isn't actually vital until we start doing a non-trivial number of builds.
[12:49] <wgrant> Yep
[12:49] <wgrant> And this branch is already quite large enough
[12:50] <cjwatson> Indeed
[12:54] <cjwatson> Hm, I wonder if I need to do some archiveuploader work here, since there's no .changes file
[12:58] <wgrant> cjwatson: Not archiveuploader, but something similar, yeah.
[12:59] <wgrant> This isn't a package upload, so there's no reason to involve archiveuploader.
[12:59] <wgrant> But we need to do the librarian upload asynchronously (as we should with TTBJs, but we sorta get away with it now because they're so rare).
[13:11] <cjwatson> Doesn't it need to go through process-upload --builds?
[13:11] <cjwatson> Which is basically a shim over archiveuploader ...
[13:12] <wgrant> Why does it need to do that?
[13:12] <wgrant> process-upload is used to process package uploads.
[13:12] <cjwatson> I thought that was responsible for pulling everything off the buildds
[13:13] <cjwatson> Or rather dealing with all the stuff the buildds have pushed to builddmaster
[13:13] <wgrant> It's up to the BFJB
[13:14] <wgrant> The base implementation of BFJB._handleStatus_OK (which really should be in some other mixin) pulls the files off the buildd into a temporary directory, performs some sanity checks, then throws them into incoming/ for process-upload to handle.
[13:14] <wgrant> It then sets the build status to UPLOADING
[13:14] <wgrant> process-upload eventually runs, processes the upload, and sets the status to FULLYBUILT
[13:14] <wgrant> TTBJs skip UPLOADING, and just go from BUILDING -> FULLYBUILT inside the BFJB
[13:14] <wgrant> We can't really do that for LFSBs, because the librarian upload will take A While™.
[13:15] <wgrant> So we need some other async job.
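The status flow described above can be written down as two small transition tables. This is a sketch for orientation only: the status names come from the chat, while the BuildStatus enum, the two flow dictionaries and advance() are hypothetical and not Launchpad's actual classes.

```python
from enum import Enum


class BuildStatus(Enum):
    BUILDING = "BUILDING"
    UPLOADING = "UPLOADING"
    FULLYBUILT = "FULLYBUILT"


# Package builds: the behaviour hands files to an async job (process-upload),
# which later marks the build FULLYBUILT.
ASYNC_FLOW = {
    BuildStatus.BUILDING: {BuildStatus.UPLOADING},
    BuildStatus.UPLOADING: {BuildStatus.FULLYBUILT},
}

# TTBJs: everything happens inside the behaviour, so UPLOADING is skipped.
SYNC_FLOW = {
    BuildStatus.BUILDING: {BuildStatus.FULLYBUILT},
}


def advance(flow, current, target):
    """Return `target` if `flow` allows moving there from `current`."""
    if target not in flow.get(current, set()):
        raise ValueError(
            "cannot go from %s to %s" % (current.name, target.name)
        )
    return target
```

Under this reading, livefs builds would follow the asynchronous flow, with some job other than process-upload doing the UPLOADING to FULLYBUILT step.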
[13:15] <wgrant> But there's no reason to shoehorn that into archiveuploader, unless you really like changes files.
[13:15] <cjwatson> Right, which means I need something very like process-upload except that it processes all the files in the upload rather than needing a .changes file
[13:15] <wgrant> It's similar to the very top level of process-upload, indeed.
[13:15] <wgrant> But it has basically none of the checks.
[13:16] <cjwatson> It seemed that it might make more sense to make the .changes dependency in process-upload be dependent on the BFJB
[13:16] <cjwatson> But maybe that's ridiculously hard
[13:16] <wgrant> It just runs listdir to find pending uploads, then for each it: finds the build from the dir name, listdirs the upload dir, uploads the files to the librarian, and sets the build status.
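The loop wgrant outlines could be sketched as below. Everything Launchpad-specific is passed in as a callback, because the real lookups, librarian client and status setters are not shown in this conversation: find_build, upload_file and set_status are hypothetical stand-ins, as is the function name itself.

```python
import os


def process_pending_uploads(incoming_dir, find_build, upload_file, set_status):
    """Process every pending upload directory under `incoming_dir`.

    find_build(dir_name) -> build or None   (hypothetical lookup)
    upload_file(build, path)                (hypothetical librarian upload)
    set_status(build, status)               (hypothetical status setter)
    Returns the list of builds that were processed.
    """
    processed = []
    for dir_name in sorted(os.listdir(incoming_dir)):
        upload_dir = os.path.join(incoming_dir, dir_name)
        if not os.path.isdir(upload_dir):
            continue
        # Find the build from the directory name.
        build = find_build(dir_name)
        if build is None:
            continue
        # Upload every file in the directory; no .changes file is needed.
        for file_name in sorted(os.listdir(upload_dir)):
            path = os.path.join(upload_dir, file_name)
            if os.path.isfile(path):
                upload_file(build, path)
        set_status(build, "FULLYBUILT")
        processed.append(build)
    return processed
```

A real job would also need error handling and cleanup of processed directories, which this sketch leaves out.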
[13:17] <wgrant> You could possibly abstract process-upload to only invoke archiveuploader in some circumstances, but that seems like a lot of refactoring for little gain.
[13:18] <cjwatson> It would be nice not to have to basically reimplement BFJB._handleStatus_OK except for the incoming dir name
[13:18] <cjwatson> I guess I could make archiveuploader ignore livefs jobs
[13:18] <wgrant> It also wouldn't be fatally hard to refactor _handleStatus_OK slightly to let you customise that easily.
[13:19] <cjwatson> True
[13:20] <wgrant> Note that this will change a bit once we have the 1SS buildd-master satellite. BFJBs will be thoroughly neutered, and all expensive work must be deferred to a later asynchronous cronjob. At that point I suspect that _handleStatus_OK will become totally generic and just dump files into a directory, and there'll be something like process-upload running over a directory of all types of build results. But I'm not sure it's worth you doing that ...
[13:21] <wgrant> ... restructuring as part of the already complicated livefs work.
[13:21] <wgrant> I'd use a separate cronjob on alphecca, probably using some refactored bits of process-upload's build integration.
[13:21] <wgrant> The only bit of archiveuploader that is relevant at all is the build handling bit of process-upload.
[13:23] <cjwatson> OK
[13:23] <cjwatson> Hopefully my to-do list will shorten at some point :)
[13:24] <wgrant> Indeed