[00:12] <StevenK> wgrant: Shall I put a deployment together?
[00:13] <wgrant> StevenK: I was just looking to see what our state was. Please do.
[00:40] <wallyworld> wgrant: i've been reading some postgres tuning stuff and it seems it's best to create separate, single column indices to aid sorting, rather than a multi-column index. do you agree?
[00:45] <wgrant> wallyworld: Where'd you read that?
[00:45] <wgrant> It is nonsense :)
[00:45] <wallyworld> i'd have to find the link again
[00:45] <wgrant> You can't sort using a composition of multiple indices -- that doesn't make sense
[00:46] <wgrant> It has to be a single composite index
[00:46] <StevenK> wallyworld: History -> Recently Closed Tabs ?
[00:46] <StevenK> If using Firefox
[00:46] <wallyworld> there's a lot of recently closed tabs
[00:46] <wgrant> Indices are usually b-trees, and it's a bit difficult to aggregate b-trees usefully :)
[00:46] <wallyworld> i can't recall which one
[00:47] <wallyworld> in this case, we need to sort on distroseries, spn, archive, date, person
[00:47] <wgrant> Why?
[00:47] <wgrant> Which query's this?
[00:47] <wallyworld> where person is creator or maintainer or neither
[00:47] <wallyworld> because we need the grouping
[00:48] <wgrant> Right, but think about how you'd follow indices to get what you need
[00:48] <wallyworld> so the batch iteration can figure out the distinct sprs
[00:48] <wgrant> You should quickly realise that that index is entirely inappropriate for it
[00:48] <wgrant> We have a person, and we want the n most recent uploads
[00:48] <wgrant> So the first key has to be the person
[00:48] <wgrant> Following that must be the overall sort key
[00:49] <wgrant> So we have (creator, dateuploaded DESC, id)
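The index shape wgrant describes (filter column first, then the overall sort key) can be tried out in miniature. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, and the table name `spr` and index name `spr_creator_date_idx` are made up for illustration; the real Launchpad schema will differ.

```python
import sqlite3

# In-memory stand-in for the real table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spr ("
    " id INTEGER PRIMARY KEY,"
    " creator INTEGER NOT NULL,"
    " dateuploaded TEXT NOT NULL)"
)
# Filter column first, then the sort key, as in (creator, dateuploaded DESC, id).
conn.execute(
    "CREATE INDEX spr_creator_date_idx"
    " ON spr (creator, dateuploaded DESC, id)"
)


def explain_latest_uploads(connection, creator, limit=75):
    """Return the planner's description of the 'n most recent uploads' query."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN"
        " SELECT id FROM spr WHERE creator = ?"
        " ORDER BY dateuploaded DESC, id LIMIT ?",
        (creator, limit),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan = explain_latest_uploads(conn, creator=1)
print(plan)
```

If the plan names the composite index and shows no separate sort step, the database can read the newest rows for one creator straight off the index and stop at the limit.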
[00:49] <wallyworld> sure, i didn't say that would be the index order
[00:49] <wgrant> Ah
[00:49] <wgrant> Why do we need to sort on distroseries, spn, archive?
[00:49] <wallyworld> but if we need to sort on those columns, they would need to be in the index somewhere, no?
[00:49] <wgrant> We only need that if we do the distinct in postgres
[00:49] <wallyworld> for the frouping
[00:49] <StevenK> wgrant: Tempted to nail bug 779367 shut
[00:49] <_mup_> Bug #779367: spurious failure in test_builder.TestSlave <critical-analysis> <spurious-test-failure> <test-system> <Launchpad itself:Triaged> < https://launchpad.net/bugs/779367 >
[00:49] <wallyworld> grouping
[00:49] <wgrant> But we can't efficiently do the distinct in postgres
[00:49] <wgrant> StevenK: Indeed
[00:50] <wallyworld> wgrant: if we need to iterate the batch, we need the grouping
[00:50] <wgrant> wallyworld: how so?
[00:50] <wallyworld> so that all the 'like' records are together in the batch
[00:50] <wgrant> Postgres requires that, but a Python implementation would not
[00:51] <wgrant> We're doing it in Python precisely because postgres can't do it efficiently
[00:51] <wgrant> If we sort by the grouping, then we have to traverse the whole set anyway
[00:51] <wgrant> So the index is pointless
[00:51] <wallyworld> so we could have a (archive, date) at the start, and an (archive, date) at the end of the query
[00:51] <wallyworld> and we would not know they were for the same archive
[00:52] <wallyworld> since they would be in different batches
[00:52] <wgrant> Ah, indeed, it's batched, forgot that
[00:52] <wallyworld> so sadly we need to group
[00:52] <wgrant> To do the batching we will have to evaluate the whole thing in Python, either way
[00:52] <wgrant> We cannot group at the DB level.
[00:53] <wallyworld> so i will have to iterate the entire result set
[00:53] <wgrant> Which means that batches after the first will be inefficient, but there's not much we can do
[00:53] <wgrant> You don't have to iterate the entire resultset
[00:53] <wgrant> You only have to iterate until you find 75 that are distinct
[00:53] <wallyworld> which could be everything
[00:53] <wgrant> It could be, yes.
[00:53] <wgrant> But it's not very likely
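The "iterate until you find 75 that are distinct" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the Launchpad code: `first_distinct` is a hypothetical helper, and rows are plain tuples of (id, distroseries, spn, archive) already ordered newest first.

```python
def first_distinct(rows, key, limit=75):
    """Walk rows in their existing order and keep the first row seen for
    each key, stopping as soon as `limit` distinct keys have been found.

    Returns the kept rows and the number of rows that had to be examined.
    """
    seen = set()
    picked = []
    examined = 0
    for row in rows:
        examined += 1
        k = key(row)
        if k in seen:
            continue  # an older duplicate of something already picked
        seen.add(k)
        picked.append(row)
        if len(picked) >= limit:
            break
    return picked, examined


# Hypothetical data: (id, distroseries, spn, archive), newest upload first.
rows = [
    (9, "quantal", "foo", "ppa-a"),
    (8, "quantal", "foo", "ppa-a"),
    (7, "precise", "foo", "ppa-a"),
    (6, "quantal", "bar", "ppa-a"),
    (5, "quantal", "foo", "ppa-a"),
    (4, "lucid", "baz", "ppa-b"),
]
picked, examined = first_distinct(rows, key=lambda r: r[1:], limit=3)
print([r[0] for r in picked], examined)
```

The worst case wallyworld raises still exists: if nearly every row is a duplicate, `examined` approaches the size of the whole set.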
[00:54] <wallyworld> also, the creator or maintainer is optional
[00:54] <wgrant> How's it optional?
[00:54] <wgrant> +ppa-packages has a person as context
[00:54] <wgrant> Therefore it always has a creator
[00:54] <wallyworld> sure, but the other ones don't
[00:54] <wallyworld> there are 3
[00:54] <wgrant> They're all pages under :Person
[00:54] <wgrant> So they will always have a creator or maintainer
[00:54] <wallyworld> but the same method is used elsewhere
[00:55] <wallyworld> with no creator or maintainer
[00:55] <wgrant> It presumably has some other context, though
[00:55] <wallyworld> but it will be slow then
[00:56] <wgrant> In which other contexts is it used?
[00:56] <wallyworld> getLatestMaintainedPackages
[00:56] <wallyworld> it's also on the person page
[00:57] <wallyworld> it sets uploader_only=False
[00:58] <wallyworld> so no creator or maintainer is added to the query
[00:59] <wgrant> But it's on Person, so it presumably has some kind of person filter
[01:00] <wallyworld> not that i can see unless i missed something
[01:01] <wgrant> Well, it would otherwise be global
[01:01] <wgrant> Which is entirely nonsensical
[01:02] <wgrant> It is called in three modes: maintainer, non-PPA uploader, and PPA uploader
[01:02] <wallyworld> i just checked, i think there is an extra condition added in, but just not there
[01:02] <StevenK> https://code.launchpad.net/~stevenk/launchpad/undisable-tests/+merge/132235
[01:02] <wgrant> uploader_only=False and ppa_only=True would result in an unbounded set, but that's never used
[01:03] <wallyworld> so it does seem it's always creator or maintainer in the query
[01:04] <wgrant> Indeed
[01:05] <wgrant> It doesn't really make sense for a method on Person to not constrain by person :)
[01:06] <wallyworld> true
[01:06] <wallyworld> so, what would a reasonable batch size be? 75?
[01:07] <wgrant> That tends to be the default
[01:07] <wallyworld> ok, will try that. we'll have to test with known pathological cases
[01:09] <wgrant> Well, this will be exposed to a new variety of pathological cases
[01:09] <wallyworld> yeah, sadly that may be the case
[01:09] <wgrant> It may almost be worth denorming this, but the workaround for these pages shouldn't be too terrible for now
[01:09] <wallyworld> let's see how this goes first
[01:18] <StevenK> wallyworld, wgrant: https://code.launchpad.net/~stevenk/launchpad/undisable-tests/+merge/132235
[01:19] <wallyworld> WCPGW
[01:20] <wallyworld> is it worth looking at WHY they are fragile?
[01:20] <wallyworld> and perhaps fixing a root cause issue?
[01:21] <StevenK> Any data we have on these tests is >18 months old, we need to enable them to see what is going on now
[01:21] <wallyworld> so maybe just an "ec2 test" run to see what's up?
[01:21] <wallyworld> i fear that they may well pass once, get through ok, and then fail tomorrow
[01:22] <StevenK> I'll be tossing it at ec2 land once the MP is approved
[01:22] <StevenK> wallyworld: And we can disable them if they prove to be doing their old tricks, but then we have current data
[01:23] <wallyworld> guess so
[01:23] <StevenK> What I am doing is a Curtis Approved Plan.
[01:23] <wallyworld> a couple of ec2 test runs would also provide data
[01:23] <wallyworld> ok
[01:23] <StevenK> wallyworld: Wrong. buildbot is parallel, and ec2 is not.
[01:24] <wallyworld> were the failures only parallel? i don't think so
[01:24] <wallyworld> anyway, r=me
[01:24] <StevenK> No, but they can be impacted by it
[01:25] <wallyworld> maybe the stars will align and it will all be ok, let's see
[01:37] <nigelb> Hi! I have an old lp tree lying around.
[01:37] <nigelb> Will doing a bzr update and make run "Just Work"?
[01:38] <nigelb> s/update/pull/g
[01:41]  * nigelb does rf-get and hopes
[01:45] <wallyworld> it should work, but you will need to do a make schema before you run
[01:48] <StevenK> nigelb: rf-get will also grab new sourcedeps and such too
[01:49] <nigelb> StevenK: Oh awesome.
[01:49] <nigelb> wallyworld: Noted.
[01:49] <nigelb> This is gonna take a while. it's at least a year old tree.
[02:49] <wallyworld> wgrant: here's the mp for that related software work https://code.launchpad.net/~wallyworld/launchpad/ppa-packages-timeout-1071581/+merge/132236
[03:02] <StevenK> lp.code.model.tests.test_sourcepackagerecipebuild.TestBuildNotifications.test_handleStatus_OK_successful_upload loses
[03:03] <StevenK> Fails on ec2
[03:07] <wallyworld> :-(
[03:12] <nigelb> Apparently it's not very smooth.
[03:13] <nigelb> I got hit with "ImportError: No module named convoy.meta"
[03:13] <StevenK> Then your launchpad-dependencies is out of date
[03:19] <nigelb> I thought rf-get would update it. no?
[03:20] <nigelb> I'm on 0.119~lucid1
[03:22] <StevenK> nigelb: rf-get won't update launchpad-dependencies, it's a system package, and rf-get does not have root.
[03:23] <StevenK> nigelb: Is python-convoy installed?
[03:24] <nigelb> StevenK: It wasn't. Started install.
[03:56] <wallyworld> StevenK: archives have a checkArchivePermission method. the lp.View security adaptor checks this, and also checks if the user is subscribed for private archives, is there any reason why the subscription check is not done inside the checkArchivePermission method?
[03:57] <wallyworld> ie irrespective of any other check permission checks, shouldn't subscribers always have access?
[04:08] <StevenK> wallyworld: No, P3A subscribers are not allowed to see the archive
[04:09] <StevenK> They can browse it, but not Archive:+packages
[04:09] <StevenK> Which is why it's a little strange
[04:09] <wallyworld> StevenK: so, i'm looking at person+index
[04:10] <wallyworld> ok, Archive+packages only uses the checkArchivePermission check
[04:10] <wallyworld> but not the p3a subscription one
[04:10] <nigelb> woah
[04:10] <nigelb> it worked I think.
[04:12] <wallyworld> StevenK: Archive+packages uses lp.View permission, which also does the p3a subscriber check
[04:14] <nigelb> hrm. did lp upgrade postgres versions?
[04:14] <nigelb> I have this http://hastebin.com/qepumeveto.vhdl
[04:15] <wallyworld> not sure, you need 9.1
[04:15] <wallyworld> rf won't upgrade it
[04:15] <wallyworld> but launchpad-database-dependencies should i think
[04:15] <nigelb> if only rf told me.
[04:16] <wallyworld> i never use rf
[04:16] <nigelb> ah
[04:16] <wallyworld> but rf should still work nonetheless
[04:16] <nigelb> rf did most of the work.
[04:16] <wallyworld> i haven't looked at the code so cannot really comment
[04:16] <nigelb> The only things I failed doing was upgrading the dependency packages :)
[04:17] <wallyworld> well, that's not too bad
[04:44] <nigelb> Attempt #4.
[04:45] <nigelb> Gah. Failed at the exact same point. I suspect I need to do some magic to start the right version of postgres. But now, I have to start the day job.
[04:46] <StevenK> wallyworld: Actually, I might be wrong. We may allow Archive:+packages, but want to forbid subscribers from copying packages out using +copy-packages
[04:46] <wallyworld> right, that makes more sense
[04:47] <StevenK> nigelb: Do you have time to pastebin the error?
[04:50] <nigelb> StevenK: I pastebin'd it earlier. http://hastebin.com/qepumeveto.vhdl
[04:51] <StevenK> nigelb: You're still using 8.4?
[04:52] <nigelb> StevenK: I installed 9.1, but I suspect it's not yet running. I dont want to break it just before I start work since I need PG for work.
[04:53] <StevenK> nigelb: Ah, that could be a problem, since the LP setup scripts sort of destroy other databases
[04:55] <nigelb> StevenK: I suspected as much :)
[05:08] <stub> You want to use the LXC environment I think
[05:09] <nigelb> I'm on lucid > <
[05:18] <StevenK> Then you want to upgrade to Precise, and create a Lucid LXC for LP and another for $work? :-)
[05:19] <nigelb> Yeah, that's the eventual plan.
[05:20] <StevenK> lpsetup in the LP PPA will create a Lucid LXC for LP very easily
[05:20] <nigelb> The trouble is upgrading.
[05:20] <nigelb> My harddisk has no more space.
[05:20] <nigelb> so I need to get a new one first before I upgrade.
[05:22] <lifeless> nigelb: or delete crap :)
[05:22] <nigelb> lifeless: I've deleted most of the crap I don't want.
[05:23] <nigelb> Everything that's left, I do want :)
[05:23] <StevenK> Oh bleh, everyone knows /usr isn't important
[05:24] <StevenK> Or /lib
[05:24] <nigelb> I deleted a ton of Lp branches.
[05:24] <nigelb> I had all the ones I worked on lying around :)
[05:27] <nigelb> I'm hoping to get a 1TB harddisk this month.
[05:27] <nigelb> That should clear up significant amounts of space.
[05:29]  * StevenK is still waiting for a du to scare nigelb with
[05:37] <StevenK> steven@undermined:~/launchpad/lp-branches% du -sh .
[05:37] <StevenK> 63G	.
[05:39] <nigelb> o_O
[05:50] <StevenK> Can we destroy delayed copies yet?
[05:53] <StevenK> Hahaha
[05:53] <StevenK> lp-oops is broken
[05:54] <nigelb> so, are lp-oops errors logged with lp-oops-oops? ;)
[05:55] <lifeless> nigelb: yes
[05:55] <StevenK> The OOPS is useless
[05:55] <nigelb> ha!
[05:55] <StevenK> TemplateDoesNotExist: 500.html
[05:56] <StevenK> So it's an ISE
[05:56] <StevenK> Thanks for the content, you useless piece of crap, Django
[05:57] <lifeless> thats fixed in trunk
[05:58] <StevenK> I'm sure lp-oops is getting updated

[06:44] <wgrant> wallyworld: The last index is not useful
[06:45] <wgrant> Well, it would be useful if we wanted more than about 20% of the rows in the query, but we don't
[07:06] <wallyworld> wgrant: the final query would use it, no?
[07:07] <wallyworld> since we just order by date, id
[07:08] <wallyworld> and on the individual pages, we do return all matching rows
[07:08] <wallyworld> not just the top 5
[07:11] <wgrant> wallyworld: "all matching rows" == 75
[07:11] <wallyworld> huh?
[07:11] <wallyworld> the individual pages load all matching records
[07:11] <wallyworld> and then batch them
[07:11] <wallyworld> i think you are referring to the other query to load the ids
[07:12] <wgrant> We don't ever load all matching records unless we're on the last batch, do we?
[07:12] <wallyworld> we do - the 3 individual pages for maintained, uploaded etc
[07:13] <wgrant> They're in batches of 75
[07:13] <wallyworld> a summary is shown on the related software overview page
[07:13] <wallyworld> but the whole result set needs to be ordered, right?
[07:13] <wallyworld> so that the batches can be calculated
[07:13] <wgrant> But we only need to calculate the batches up to the current position
[07:14] <wgrant> So if I'm on the first page, I only need to load until I have 75 records to show
[07:14] <wgrant> I don't need to grab all 30000
[07:14] <wallyworld> but wouldn't the load be done using the index
[07:14] <wallyworld> otherwise how would it efficiently know what to load
[07:14] <wallyworld> since we are ordering by date
[07:15] <wgrant> But we're filtering by creator/maintainer at that point
[07:15] <wallyworld> so we only grab 75, but we need to know which 75
[07:15] <wgrant> That's why we have the creator- and maintainer-prefixed indices
[07:15] <wallyworld> no, not there we aren't
[07:15] <wallyworld> at that point, we are filtering by id in(....)
[07:16] <wallyworld> we filter by creator etc to figure out which ids to include in the final result set
[07:16] <wgrant> And that bit is only ever called with 75 items
[07:16] <wgrant> Because the final set is known by then
[07:16] <wgrant> We just have to load them
[07:16] <wallyworld> only the ids
[07:16] <wallyworld> and not the order
[07:16] <wgrant> Sure
[07:16] <wallyworld> so select * from sourcepackagerelease where id in (...) order by date desc
[07:17] <wallyworld> why doesn't that need an index?
[07:17] <wgrant> But how can the sort index be used for that?
[07:17] <wgrant> It's not what needs an index.
[07:17] <wgrant> It's a matter of what *can* be indexed
[07:17] <wallyworld> an index on date can be used
[07:17] <wgrant> How?
[07:17] <wgrant> sourcepackagerelease has more than 1.5 million rows
[07:17] <wallyworld> to efficiently communicate the order to traverse in
[07:18] <wgrant> Indices don't work that way
[07:18] <wgrant> In any database :)
[07:18] <wgrant> An index on (dateuploaded DESC, id) is useful for sorting in some situations
[07:18] <wallyworld> so you're saying sorting doesn't use indices?
[07:18] <wgrant> But in this case we'd have to traverse 1.5 million index tuples to pull out 75 of them
[07:18] <wgrant> Which is entirely not worth it
[07:19] <wgrant> It would never be silly enough to use an index for that sort
[07:19] <wallyworld> why? we would just traverse the first 75 in order
[07:19] <wgrant> Huh, how?
[07:19] <wgrant> We have IDs
[07:19] <wallyworld> since the index is ordered by date
[07:19] <wgrant> We know the IDs we want
[07:19] <wgrant> How can we efficiently look that up in the index?
[07:20] <wallyworld> well we may have 10000 ids
[07:20] <wgrant> Even if we have 10000, the density is only <0.01 still
[07:20] <wallyworld> i guess it could ignore the index and traverse all the ids
[07:20] <wgrant> If we were to force it to use the sort index, it would have to traverse 1.5 million tuples to find 75
[07:21] <wallyworld> why?
[07:21] <wgrant> If it doesn't use the index, it can do 75 index lookups and sort in memory
[07:21] <wallyworld> the index gives the order
[07:21] <wgrant> Think about the physical layout of the index
[07:21] <wgrant> The index allows us to find a tuple based on its dateuploaded *and* its id
[07:22] <wgrant> Or to traverse tuples in (dateuploaded DESC, id) order
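wgrant's point about the two access paths can be simulated in plain Python. The sketch below is a toy model, not how PostgreSQL is implemented: a dict stands in for the primary-key index, a sorted list of (dateuploaded, id) pairs stands in for the sort index, and both function names are hypothetical.

```python
def load_by_ids(rows_by_id, wanted_ids):
    """One primary-key lookup per wanted id, then sort the few rows in memory."""
    rows = [rows_by_id[i] for i in wanted_ids]
    rows.sort(key=lambda r: (-r["dateuploaded"], r["id"]))
    return rows, len(wanted_ids)


def load_by_walking_sort_index(sorted_index, rows_by_id, wanted_ids):
    """Walk the (dateuploaded DESC, id) index from the top, keeping wanted ids."""
    wanted = set(wanted_ids)
    found = []
    visited = 0
    for _date, row_id in sorted_index:
        visited += 1
        if row_id in wanted:
            found.append(rows_by_id[row_id])
            if len(found) == len(wanted):
                break
    return found, visited


# Toy table of 1000 rows; dateuploaded is just an integer here for simplicity.
rows_by_id = {i: {"id": i, "dateuploaded": i} for i in range(1000)}
sorted_index = sorted(
    ((r["dateuploaded"], r["id"]) for r in rows_by_id.values()),
    key=lambda entry: (-entry[0], entry[1]),
)
wanted_ids = [5, 500, 900]

by_lookup, lookups = load_by_ids(rows_by_id, wanted_ids)
by_walk, visited = load_by_walking_sort_index(sorted_index, rows_by_id, wanted_ids)
print(lookups, visited)
```

Both paths return the same rows in the same order, but the walk has to pass over almost the whole index because the wanted ids are scattered through it. That is the density argument: with 75 ids out of 1.5 million rows, looking each one up and sorting in memory is far cheaper.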
[07:22] <wallyworld> which is what the query is asking for
[07:22] <wallyworld> so it just has to walk the index
[07:22] <wallyworld> 75 records at a time
[07:22] <wgrant> huh?
[07:23] <wgrant> We know the IDs of the 75 records that we need
[07:23] <wallyworld> but not the date order
[07:23] <wallyworld> so we have 100000 ids but don't know the order to present them in
[07:23] <wgrant> Sure, but how does "75 records at a time" come into it?
[07:23] <wallyworld> that's the batch size
[07:23] <wgrant> We don't have 100000 IDs
[07:23] <wgrant> We have 75
[07:23] <wgrant> We partitioned early on, didn't we?
[07:24] <wallyworld> the rs.count() may be 100000
[07:24] <wgrant> Because we read in IDs in date order, until we had 75 unique ones
[07:24] <wallyworld> no, you are confusing the 2 queries
[07:24] <wallyworld> we did the limit to 5 unique ones
[07:24] <wallyworld> because that's what the overview page shows
[07:25] <wgrant> Then +ppa-packages shows batches of 75 unique ones
[07:25] <wallyworld> but the individual pages show everything
[07:25] <wallyworld> yes, out of say 100000
[07:25] <wallyworld> so it has to know which 75 to load at a time
[07:25] <wallyworld> so it has to have them ordered
[07:25] <wgrant> The first query filters by creator or maintainer, and crawls back in time until it finds 75 unique ones
[07:25] <wgrant> This is always filtering by person, so it can use the creator- or maintainer-prefixed index
[07:25] <wallyworld> no it doesn't
[07:26] <wgrant> Then, once it has those 75 IDs, it does a second query to load the SPRs
[07:26] <wallyworld> it doesn't stop at 75
[07:26] <wgrant> Oh?
[07:26] <wallyworld> because the batching is done at the view level
[07:26] <wgrant> + if max_results and len(ids) >= max_results:
[07:26] <wallyworld> yes, max results is 5 and optional
[07:26] <wallyworld> max results is used for the overview page summaries
[07:27] <wallyworld> to show 5 from each list
[07:27] <wgrant> Ah, well then this will probably always time out
[07:27] <wallyworld> but the individual pages don't use max_results
[07:27] <wgrant> There's no point having the indexed sort if you're just going to load the entire set anyway
[07:27] <wallyworld> only the ids
[07:27] <wgrant> The whole problem with the query is that it ends up loading the whole set
[07:28] <wallyworld> not the records, just the ids
[07:28] <wallyworld> and then the outer query is batched
[07:28] <wgrant> That's irrelevant; it's still considering ~100000 more tuples than is required
[07:28] <wallyworld> so an individual person can have 100000 ppas?
[07:29] <wallyworld> because the ids are filtered
[07:29] <wallyworld> to match the creator/maintainer etc
[07:29] <wallyworld> and only those matching ones go on to be used in the outer query
[07:31] <wgrant> 100000 SPRs? Sure.
[07:32] <wallyworld> well, the only alternative is to implement a custom batcher up in the view layer
[07:33] <wallyworld> which can be done
[07:34] <wallyworld> wgrant: is the current implementation worth trying? it won't be worse than what's there
[07:35] <wgrant> wallyworld: It is strictly worse than what's there
[07:35] <wgrant> We're no longer just asking postgres to consider every related row
[07:35] <wgrant> We're asking it to return bits of them to Python
[07:35] <wgrant> In terms of O(disk accesses) it's no worse, but there's an extra constant multiplier there :)
[07:36] <wallyworld> ok, there's no choice then, but to move the code
[07:36] <wallyworld> to a batcher
[07:36] <wgrant> Right
[07:37] <wallyworld> there will still be issues
[07:37] <wgrant> Right, there is great potential for pathological cases
[07:37] <wallyworld> if the batches are not traversed sequentially
[07:37] <wgrant> Oh?
[07:37] <wallyworld> if someone url hacks
[07:38] <wgrant> I'm not seeing the issue
[07:38] <wallyworld> the relationship of records loaded to batches is not defined
[07:38] <wgrant> (though ideally you'd use a derivative of SRF here, which doesn't support URL hacking anyway)
[07:38] <wallyworld> so it would need to walk the records sequentially
[07:38] <wallyworld> instead of using offset and limit
[07:39] <wallyworld> as one would do for a normal query
[07:39] <wgrant> Right, that's the problem with an offset-based batcher
[07:39] <wgrant> Although I guess due to deduping even a range-based batcher would have to traverse the whole lot
[07:39] <wallyworld> yep
[07:40] <wgrant> Still, this will make the page suck less until we can denorm it properly
[07:40] <wgrant> And the MP you have is strictly worse than the status quo
[07:40] <wallyworld> and as someone clicks through the batches, stuff needs to be retained in memory
[07:41] <wgrant> Not retained in memory as such
[07:41] <wgrant> But subsequent batches will also have to calculate all previous batches
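The cost wgrant describes, where each later batch has to recompute every earlier one, shows up clearly in a small model. This is a sketch under assumptions: `distinct_rows`, `batch`, and `CountingIter` are hypothetical names, rows are (id, grouping key) tuples, and a real batcher would stream from a database cursor rather than a list.

```python
from itertools import islice


class CountingIter:
    """Iterator wrapper that records how many source rows were consumed."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)
        self.consumed += 1
        return item


def distinct_rows(rows, key):
    """Yield the first row seen for each key, preserving the input order."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


def batch(rows, key, start, size=75):
    """Return distinct rows [start, start + size). Deduplication has no
    shortcut, so everything before `start` is replayed from the beginning."""
    return list(islice(distinct_rows(rows, key), start, start + size))


def make_rows():
    # Hypothetical data: (id, grouping key); every key appears twice.
    return [(i, i // 2) for i in range(20)]


source = CountingIter(make_rows())
first = batch(source, key=lambda r: r[1], start=0, size=3)
first_cost = source.consumed

source = CountingIter(make_rows())
third = batch(source, key=lambda r: r[1], start=6, size=3)
third_cost = source.consumed
print(first_cost, third_cost)
```

The third batch reads more than three times as many source rows as the first, which is why deep pages get progressively slower with this approach.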
[07:41] <wallyworld> or if it is retained, then not really
[07:41] <wgrant> How can we retain it?
[07:42] <wallyworld> a session variable
[07:42] <wgrant> Hah
[07:44] <wallyworld> wgrant: so if someone is many batches into scrolling through the table, then each next click presents more and more work for the batcher
[07:44] <wgrant> Sure
[07:44] <wallyworld> with a large number of records, it surely will timeout
[07:44] <wgrant> And that's something we probably have to live with until we fix the schema
[07:45] <wgrant> It's relatively rare that someone will go deep enough that it will be a problem
[07:45] <wgrant> And it's completely not worth hacking something into the session for this temporary fix
[07:45] <wallyworld> i will have to handle Last properly though
[07:45] <wallyworld> and Prev
[07:46] <wgrant> They'll work fine as long as the user has a small number of uploads
[07:47] <wallyworld> why don't i just say fuck it, there's no need to order by date the records on the individual pages
[07:47] <wgrant> They have to be ordered somehow
[07:47] <wgrant> Or batching will not work
[07:47] <wgrant> It's not the order that's the problem; it's the distinct on
[07:48] <wallyworld> yes, i'm just wondering out loud if we can avoid that
[07:48] <wgrant> If we dump the distinct on then the entire problem becomes trivial
[07:48] <wallyworld> we need the distinct on to show the latest 5 for the overview page
[07:48] <wgrant> Why?
[07:49] <wallyworld> just because
[07:49] <wgrant> By the current definition of the latest 5 we need the distinct
[07:49] <wallyworld> that's how it is put together
[07:49] <wgrant> But by the current definition of +ppa-packages we need it there too
[07:49] <wallyworld> probably, i was just wondering out loud
[07:50] <wallyworld> oh well, custom batcher it is i guess
[07:51] <wallyworld> wgrant: so what's the "normal" number of ppas a person would have on that page?
[07:51] <wgrant> s/ppas/packages/?
[07:51] <wallyworld> yes
[07:51] <wgrant> Let's see
[07:51] <wallyworld> curtis has 38 i think
[07:52] <wgrant> Curtis is at the low end :)
[07:52] <wgrant> The most is probably about 250k
[07:52] <wgrant> But that's exceptionally high
[07:52] <wallyworld> so my implementation was premised on it only being a few 100 at most
[07:52] <wallyworld> wow 250k!
[07:52] <wgrant> Your implementation is basically what postgres chooses today, except slower.
[07:54] <wgrant> The top 100 ranges from 2k to 220k
[07:55] <wallyworld> ok
[07:57] <wgrant> It may almost be worth just adding a new table maintained by garbo to make all these pages trivial
[07:57] <StevenK> And Archive size?
[07:58] <StevenK> Since I'm still sure you'll murder me for two columns on archive
[07:59] <wgrant> Well
[07:59] <wgrant> A similar approach is probably useful there
[07:59] <wgrant> archive already has a tonne of related stats columns, and I haven't worked out whether they're worth it yet
[08:00] <wgrant> They probably are
[08:00] <wgrant> s/worth it/worth having in the main table/
[08:00] <wallyworld> there's all sorts of stuff that can be denormed in the spr, spph area isn't there?
[08:01] <wgrant> Those are the two big ones
[08:01] <wallyworld> i should just finish this branch though
[08:02] <StevenK> wgrant: It's clear we need a schema change. Even pulling the rows from LFC is 2.8seconds
[08:02] <wgrant> StevenK: Certainly, we need a schema change
[08:02] <wgrant> Calculating all this stuff live is insane
[08:02] <wgrant> It's just not clear what the detail of that schema change is
[08:02] <wgrant> For +ppa-packages and co it's much clearer and simpler
[08:03] <StevenK> The two I can think of are Archive.{source,binary}_size or Archive.size as an array
[08:03] <wgrant> It makes no sense as an array
[08:03] <wallyworld> maybe i should just do the denorm then
[08:03] <wgrant> wallyworld: Or decide to not do the DISTINCT ON
[08:04] <wallyworld> would that matter? we would get dupes then
[08:04] <wgrant> It would change the definition of the page
[08:04] <wgrant> It's not clear whether it would be changing it in a problematic fashion
[08:04] <wallyworld> from what i gather, we are not sure what the page is used for
[08:04] <wallyworld> or who uses it
[08:05] <wgrant> We know sort of
[08:05] <wgrant> Anyway, the page can retain its current definition in a performant manner if we do the denorm, otherwise it probably cannot. It can be made performant without the denorm if we eliminate the distinctness
[08:06] <wallyworld> and does that "sort of" indicate we could remove the distinct on?
[08:06] <wgrant> I don't think removing the distinct on would be a problem
[08:06] <wgrant> It's probably more confusing than anything else
[08:06] <wgrant> We've had at least one bug filed about that aspect of the behaviour, although I think it was invalided.
[08:07] <wallyworld> we would need to ensure we display a distinguishing piece of data for each record
[08:07] <wallyworld> let's discuss tomorrow
[08:07] <wgrant> Indeed
[08:07] <wgrant> The denorm is quite simple, if we opt to do it
[08:08] <wgrant> Instantaneous updates are not mandatory, so we can just do it in garbo-frequently
[08:08] <wallyworld> it would be nice if it also has line of sight for other fixes
[08:08] <wallyworld> StevenK: i replied to your mp comment - do you agree?
[08:08] <wgrant> Oh, crap, that MP
[08:08]  * wgrant glances
[08:08] <wallyworld> lol, you have too much on your plate
[08:09] <wallyworld> ah, trick or treaters, better get some lollies for them
[08:09] <wgrant> crazy qlders
[08:09] <StevenK> I'd still prefer an override policy that backs onto UnknownOverridePolicy
[08:09] <wgrant> Sure
[08:09] <wgrant> That's the ideal solution
[08:10] <wgrant> find_and_apply_overrides should be replaced by override policy calls
[08:10] <wgrant> But Ian quite reasonably wants to be done before the heat death of the universe :)
[08:10] <wallyworld> yes, but i thought we agreed that could wait
[08:10] <StevenK> Like override policies are hard to right :-P
[08:10] <StevenK> Sigh, write
[08:11] <StevenK> Didn't I write all of the current ones?
[08:11] <wgrant> I think so
[08:11] <StevenK> wallyworld: Then add an XX
[08:11] <StevenK> XXX
[08:12] <StevenK> Obviously, typing is HARD
[08:12] <wallyworld> ok, i'll take another look, maybe it can be done easily
[08:12] <wgrant> This fix is nearly correct; let's not overcomplicate it now with it so close to done
[08:12] <wgrant> This is a useful incremental improvement
[08:12] <wgrant> Which doesn't break anything
[08:13] <wallyworld> what did i miss?
[08:13] <wgrant> You shouldn't be overriding the section, just the component
[08:13] <wallyworld> is there an existing bug for the XXX i would add?
[08:14] <StevenK> Not sure, but it would be Low
[08:14] <wallyworld> ah ok, i saw somewhere else was overriding the section
[08:14] <wallyworld> why does that other place do the section but not here?
[08:15] <wgrant> wallyworld: We want to inherit the section of a previous publication, because for a particular binary it doesn't usually change
[08:15] <wallyworld> but not if we get the component from a source publication
[08:15] <wgrant> eg. libfoo is in libs, libfoo-dev is in libdevel
[08:15] <wgrant> It's a different dimension to component
[08:16] <wgrant> component is archive policy, section is just package categorisation
[08:16] <wallyworld> ok, i have no clue about the data model
[08:16] <wallyworld> thanks
[08:16] <wallyworld> i'll fix that bit
[08:16] <wgrant> It makes sense to inherit the archive policy from the source, but not the package category (since the binaries usually specify a sensible section, and the section differs between binaries)
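The rule wgrant lays out here can be written down as a small decision function. This is a sketch of the rule as stated in the conversation, not Launchpad's actual override code: `Override` and `resolve_binary_override` are hypothetical names, and the component and section values are plain strings.

```python
from collections import namedtuple

# Hypothetical value type; Launchpad's real override objects differ.
Override = namedtuple("Override", "component section")


def resolve_binary_override(declared_section, source_component,
                            previous_binary=None):
    """Pick the component and section for a binary being published.

    - If this binary was published before, inherit both from that
      publication, since its section does not usually change.
    - Otherwise take the component (archive policy) from the source
      publication but keep the section the binary itself declares,
      because sections differ between binaries of one source.
    """
    if previous_binary is not None:
        return Override(previous_binary.component, previous_binary.section)
    return Override(source_component, declared_section)


# New binaries built from a source published in "main".
libfoo = resolve_binary_override("libs", source_component="main")
libfoo_dev = resolve_binary_override("libdevel", source_component="main")

# A binary with an earlier publication keeps that publication's values.
earlier = Override(component="universe", section="python")
again = resolve_binary_override(
    "misc", source_component="main", previous_binary=earlier)
print(libfoo, libfoo_dev, again)
```

Both new binaries land in the source's component while keeping their own sections, matching the libfoo and libfoo-dev example above.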
[08:17] <wallyworld> ok, makes sense
[08:17] <wgrant> The code structure is a bit terrible, but it was already awful so I can hardly object on that basis
[08:17] <wallyworld> i didn't realise section was categorisation
[08:18] <wallyworld> i tweaked an existing method
[08:18] <wgrant> Cish libraries are in libs, X stuff is x11, Python libraries are in python, etc.
[08:18] <wgrant> Right, but that method is called in an existing thing to handle various existing override cases
[08:18] <wgrant> So it's slightly ugly
[08:18] <wallyworld> yes, but it provided reusable code
[08:18] <wallyworld> so i just hijacked it a bit
[08:19] <wgrant> Right
[08:19] <wallyworld> keeps loc count low :-)
[08:19]  * wgrant needs to head out for a while now... hopefully StevenK can approve it, otherwise I can in a bit
[08:19] <wallyworld> thanks again
[08:19] <wallyworld> wgrant: going trick or treating :-)
[08:19] <wgrant> Heh
[08:19] <wgrant> Not since I lived in Canada :)
[08:20] <wallyworld> it's getting bigger here
[08:20] <wallyworld> i don't mind it cause it's pagan and celtic in origin
[08:21] <StevenK> And it helps you're a Scrooge who hates Christmas :-)
[08:21] <wallyworld> yep, that's me
[08:21] <wallyworld> i hate anything to do with organised religion
[08:21] <wallyworld> especially since they stole Christmas from the pagans
[08:22] <wallyworld> cause it was originally Saturnalia
[08:46] <adeuring> good morning
[08:53] <deryck> Morning, adeuring
[08:53] <adeuring> hi deryck
[11:07] <czajkowski> cjwatson: https://bugs.launchpad.net/launchpad/+bug/1073492  is this down to some of the work you've done recently ?
[11:07] <_mup_> Bug #1073492: Sync changelog doesn't include all changelogs between release version and new Debian version <derivation> <Launchpad itself:New> < https://launchpad.net/bugs/1073492 >
[11:10] <tumbleweed> czajkowski: that's not a launchpad bug
[11:12] <tumbleweed> oh, it's not what I thought it was
[11:12] <czajkowski> :)
[11:13]  * tumbleweed is glad to see syncpackage getting it right, for a change  :)
[11:19] <cjwatson> czajkowski: not sure, would probably need to spend more concentration than is typically available during a conference to figure that out
[11:19] <cjwatson> czajkowski: I don't recall changing anything around there *intentionally*
[20:13] <jcsackett> sinzui: can you review https://code.launchpad.net/~jcsackett/launchpad/invalid-product-names/+merge/132405
[20:14] <sinzui> I will
[20:15] <jcsackett> thanks.
[20:28] <sinzui> jcsackett, r=me
[20:28] <jcsackett> sinzui: awesome thanks.
[20:29] <sinzui> jcsackett, you might want to look at bug 1055751 again
[20:29] <_mup_> Bug #1055751: Permission denied: "Cannot create '+filediff' viewing diffs on +branch urls <403> <codebrowse> <oops> <Launchpad itself:Triaged> < https://launchpad.net/bugs/1055751 >
[20:29] <sinzui> I think the issue is in translatePath or BranchNamespace where we do not expand or lookup the aliased name
[20:30] <jcsackett> sinzui: ok, happy to take another look at it.
[20:30] <jcsackett> i've got some distance on it now and a bit more knowledge of the domain. :-P
[20:31] <sinzui> jcsackett, We get the error when users are browsing the series alias rather than the 3-part/5-part names that the code you were looking at expects
[20:31] <sinzui> eg lp:launchpad -> lp:~launchpad-pqm/launchpad/devel
[20:32] <sinzui> jcsackett, I am not feeling well, and I have the added stress of children putting on Halloween costumes. I will send an email summarising my effort to be productive to the squad
[20:33] <jcsackett> sinzui: dig.
[20:57] <diptanuc`> Hi Guys
[20:57] <diptanuc`> Just saw that you guys use Go to power services at canonical. Was interested to talk more about what you do with it.
[21:14] <sinzui> diptanuc`, Launchpad doesn't use Go. JuJu and related cloud tools use it.
[21:50] <diptanuc`> sinzui: Oh i see, who is the right person to talk to?
[21:50] <diptanuc`> Is there a channel where they hang out?
[21:50] <lifeless> diptanuc`: gustavo niemeyer would be a good choice, in #juju
[21:53] <czajkowski> cr3: ping, trying to work out your issue on https://answers.launchpad.net/launchpad/+question/212793
[21:54] <cr3> czajkowski: thanks, I'm very curious to understand how to get checkbox-ihv to look like certify-web
[21:54] <wgrant> cr3: Read the paragraph at the top of +sharing
[21:55] <wgrant> On both projects
[21:56] <czajkowski> cr3: I can't see the-ihv but as wgrant says, if you read that area on the +sharing it should become clearer.
[21:56] <cr3> wgrant: yeah, they look different. should I have asked for a proprietary project instead of a private one?
[21:56] <wgrant> cr3: You created an open source project
[21:57] <cr3> wgrant: my mistake, it seems I should've created a proprietary one :(
[21:57] <wgrant> You probably wanted to select Other/Proprietary as the license, which will enable you to use commercial features
[21:57] <wgrant> You can change it on +edit
[21:57] <czajkowski> cr3: is it actually proprietary though
[21:57] <czajkowski> or is it that's how you think it should be set up for sharing ?
[21:58] <cr3> czajkowski: it's actually proprietary
[21:59] <czajkowski> ok
[22:00] <wgrant> Right, proprietary features tend to be easier to enable when you configure your project as not just another random open source project :)
[22:04] <wgrant> cr3: Once you've set it to Other/Proprietary you'll get a 30 day trial commercial subscription, and you can set stuff on +sharing
[22:04] <wgrant> You'll need to poke someone (or RT) to get the subscription extended
[22:04] <czajkowski> aka me via commercial@lp.net
[22:05] <wgrant> Ah yes, that's the address
[22:05] <czajkowski> wgrant: care to explain https://answers.launchpad.net/launchpad/+question/212880  to me ?
[22:06] <cr3> czajkowski, wgrant: thanks folks, I'll have another look into this tomorrow
[22:06] <czajkowski> cr3: np
[22:06] <wgrant> czajkowski: Looking
[22:43] <czajkowski> wgrant: thank you
[23:46] <wallyworld> StevenK: i've updated the code to add the XXX etc
[23:57] <StevenK> wallyworld: I was expecting the XXX to be just above the if block you added
[23:58] <wallyworld> StevenK: well, it's the whole method that needs to be refactored
[23:58] <StevenK> Then why is the XXX half-way through it?
[23:58] <wallyworld> it's in the doc at the top
[23:59] <StevenK> Well, the hack is in processUnknownFile