[00:00] And yeah, we probably need to pull PCJ into it
[00:01] StevenK: Of course, you don't have to do all this in SQL, but you'll want to use at least fragments of this to not make several queries per binary
[00:04] wgrant: The package_name I'm already doing handles source and PCJs, so I might leave that, but yes, looping over the binaries is bad
[00:06] StevenK: And you really want to be avoiding doing even one query per upload, ideally.
[00:08] wgrant: I have to do at least one?
[00:08] StevenK: Why?
[00:08] You can probably get away with <5 queries per batch
[00:08] Right
[00:08] Even for batch sizes larger than 5 :)
[00:43] * StevenK stabs Storm
[00:43] TypeError: Expected int, found : (1,)
=== cyclicflux is now known as Guest64285
[01:22] 2012-11-28 01:22:14 DEBUG2 [PopulatePackageUploadSearchableNames] Iteration 7 (size 532.2): 2.258 seconds
[01:22] wgrant: ^
[01:26] StevenK: Much better :)
[01:28] Dare I look at the code...
[01:29] Heh
[01:29] No, you daren't
[01:29] I recognise that SQL...
[01:29] wgrant: I was trying to avoid spending four hours converting it to Storm
[01:30] Huh, did that actually work? I would expect the 500 updates to take more than 2 seconds
[01:30] I guess because the DB is local it works
[01:30] I wasn't sure how to phrase it as one update
[01:31] There's only one bulk update in the codebase
[01:31] and it's on line 637
[01:37] wgrant: I wasn't even clear on how to phrase it as SQL
[01:40] Oh god, that monstrous 34 line SQL has to go into BulkUpdate, doesn't it?
[01:40] Well, you don't have to do the string calculation in SQL
[01:40] It's probably awkward to
[01:41] So you can end up just passing the final strings into BulkUpdate
[01:41] Yes
[01:42] update_columns turns into PackageUpload.searchable_names, values=list of the names from results, where=list of ids from results?
[01:43] Reading BulkUpdate to figure out how it works has left me more confused.
[01:43] Sort of. The Values expression is a subquery table, so it will want to be a list of [(id, searchable_names)]
[01:43] So list(results), then
[01:44] By happy coincidence
[01:44] The where= will want to match the Values' id with PackageUpload.id, so the rows get matched with each other.
[01:44] And then updated_columns wants to set PackageUpload.searchable_names to the Values' searchable_names
[01:45] How do I reference Values'?
[01:45] Do I need to ClassAlias?
[01:45] (in this case, the Values virtual table is represented to Storm as the cache_data ClassAlias. I did that since the layout of the table is similar to the real LatestPersonSourcePackageReleaseCache table, but in your case it might be best to just define a class there with two cols)
[01:45] Or you could alias PackageUpload
[01:45] Since it will have the two columns that you need
[01:49] wgrant: http://pastebin.ubuntu.com/1393383/
[01:52] StevenK: Are you deliberately shadowing the Values import there?
[01:52] Values is a storm expression class
[01:54] Nope, renamed to PUValues
[01:56] That's still not a valid variable name
[01:56] Oh, but I guess it's a classalias
[01:56] So eeh
[01:57] UpdateUpload ?
[01:57] Trying to be short and descriptive, and failing
[02:00] wgrant: Wait, so updated_columns is a list of the updated names, or a Storm expr?
[02:03] StevenK: Updated columns is the SET dict.
[02:03] {some_col_to_set: the_value_expression, another_col_to_set: some_other_value_expression}
[02:04] In the existing example, the value expressions just directly reference columns from the Values expression.
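Reading between the lines, the shape being described looks roughly like the sketch below. It is only a sketch: BulkUpdate and Values are Launchpad's own expression helpers, not standard Storm, and the argument names and order shown here are assumptions pieced together from the conversation rather than the real signatures.

```python
# Sketch only: BulkUpdate/Values are Launchpad expression helpers whose exact
# signatures are assumed here; ClassAlias is standard Storm.
from storm.info import ClassAlias

# Alias PackageUpload so the VALUES virtual table exposes id and
# searchable_names columns that the SET dict and WHERE can reference.
NewNames = ClassAlias(PackageUpload, "new_names")

update = BulkUpdate(
    # updated_columns: the SET dict, column-to-set -> value expression.
    {PackageUpload.searchable_names: NewNames.searchable_names},
    table=PackageUpload,
    # values: the virtual table; rows is a list of (id, searchable_names)
    # tuples (argument shape assumed, not quoted from the real helper).
    values=Values(NewNames, rows),
    # where: line the VALUES rows up with the real PackageUpload rows.
    where=PackageUpload.id == NewNames.id)
store.execute(update)  # store: the master store for PackageUpload
```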
[02:04] Which is probably sufficient for your needs too
[02:04] But they could be any Storm expression
[02:07] ProgrammingError: syntax error at or near "1"
[02:07] LINE 1: ...d SET searchable_names="_3".searchable_names FROM 1, mozilla...
[02:07] Haha
[02:07] sounds like you didn't actually use Values
[02:09] I took "The Values expression is a subquery table, so it will want to be a list of [(id, searchable_names)]" to mean values=list(results) where results is the RS from the SQL o' doom
[02:12] StevenK: It's not quite that simple. Look at how the LPSPRC thing does it
[03:11] wgrant: My attempted simplification of LPSPRC's thing is not going well
[03:13] StevenK: Oh?
[03:14] TypeError: zip argument #1 must support iteration
[03:14] Which falls out of compile_bulkupdate
[03:18] StevenK: Diff?
[03:19] wgrant: http://pastebin.ubuntu.com/1393537/
[03:21] StevenK: Well, I'm not sure if it's the cause of this error, but your Values expression doesn't seem to contain the ID
[03:22] Which isn't going to be helpful when you try to match the searchable_names up to their corresponding PackageUploads
[03:24] Oh
[03:24] I see why it's crashing
[03:24] Your list of rows is just a list of values
[03:24] You dropped a layer of list comprehensions
[03:25] Yes, see 'attempted simplification'
[03:25] Well, you simplified the table to be a single column, which isn't supported because it makes no sense whatsoever :)
[03:26] (unless you want to choose a random row -- there's no other key)
[03:26] WCPGW
[03:28] Then I don't need the for data in ... second list comprehension?
[03:28] Since I have a list of tuples already?
[03:29] StevenK: Since it's just an int and a string you might not need the list comps
[03:29] It was required in the LPSPRC job to cope with the Reference and DateTime fields
[03:30] It doesn't love that either
[03:31] I've not used zip very often
[03:33] In fact, it's clearly not working at all
[03:34] What's the error now, and what's the code?
[03:34] Well, zip((PackageUpload.id, PackageUpload.searchable_names), results)
[03:34] => [(, (1, u'mozilla-firefox'))]
[03:35] Which is clearly why dbify_value is choking
[03:35] dbify_value is for values
[03:35] Not columns
[03:36] Oh
[03:36] I think you might be nested one level too deep now
[03:36] .. I am?
[03:36] And here I was thinking I needed another loop
[03:36] Where did you get that zip() invocation from?
[03:37] Trying to munge it from LPSPRC
[03:37] Maybe you're still missing a list comprehension
[03:38] If you look at LPSPRC, the values list comp iterates over the rows of the resultset. It zips each row with the columns, then dbifys each value.
[03:38] Whereas you seem to be zipping the entire resultset with the columns.
[03:48] 0-1@SQL-main-master UPDATE PackageUpload SET searchable_names=cache_data.searchable_names FROM (VALUES (7::integer, E'netapplet-1.0.0.tar.gz'::text), (8, E'cnews'), (9, E'cnews'), (10, E'netapplet-1.0.0.tar.gz'), (11, E'mozilla-firefox'), (12, E'language-pack-de'), (13, E'commercialpackage'), (14, E'pmount'), (15, E'iceweasel')) AS cache_data(id, searchable_names) WHERE PackageUpload.id = cache_data.id
[03:48] * StevenK declares victory
[03:53] wgrant: Can I have DF's database back?
[03:54] Hmm?
[03:54] I hold no locks of significance
[03:54] 2012-11-28 03:52:52 INFO [PopulatePackageUploadSearchableNames] Blocked on 0:40:49.687301 old xact launchpad_main@launchpad_dogfood/16564 - in transaction.
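Put together, the fix that produced the working UPDATE ... FROM (VALUES ...) above amounts to feeding a plain list of (id, searchable_names) tuples into the Values table. A minimal sketch of the difference, with names assumed as in the earlier snippet:

```python
# results is the big joined SELECT ("SQL o' doom") yielding one
# (id, searchable_names) pair per PackageUpload.
rows = [(id, names) for id, names in results]

# The broken attempt paired the column objects with whole rows:
#   zip((PackageUpload.id, PackageUpload.searchable_names), results)
# The LPSPRC job instead zips the columns with *each row*, inside the list
# comprehension, and dbifys each value; that extra layer is only needed for
# Reference/DateTime columns, so a plain list of int/text tuples is enough
# here, passed straight into the Values virtual table.
```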
[03:55] Ah
[03:55] That was probably my iharness
[03:55] fg
[03:56] * StevenK waits for PPUSN to ramp up
[03:56] It's only doing ~ 200 per iteration
[03:56] Where's the time going?
[03:57] 2012-11-28 03:57:07 DEBUG2 [PopulatePackageUploadSearchableNames] Iteration 39 (size 334.5): 1.334 seconds
[03:57] 2012-11-28 03:57:24 DEBUG2 [PopulatePackageUploadSearchableNames] Iteration 43 (size 769.7): 3.574 seconds
[03:57] Sure, but which queries are being slow?
[03:58] (LP_DEBUG_SQL will be helpful here)
[03:58] Argh
[03:59] LP_DEBUG_SQL=1 is not helpful
[03:59] Why not?
[03:59] The queries are massive
[03:59] Ah, true, there might be a bit of scrolling :)
[03:59] A bit?
[04:06] 2012-11-28 04:06:19 DEBUG2 [PopulatePackageUploadSearchableNames] Iteration 8 (size 1019.7): 2.791 seconds
[04:07] Seems the slowest bit is the UPDATE
[04:08] Or building it
[04:09] wgrant: Perhaps we're being slowed down by trgm
[04:09] Ah, that's likely indeed
[04:09] The strings are large
[04:10] However, we're hovering around 850 per iteration
[04:10] As long as we have a reasonable number of queries (ie. <10) and the UPDATE is the slowest bit, we are probably doing well
[04:10] w
[04:10] Bleh
[04:10] 4 queries per batch
[04:11] done is doing a SELECT 1 FROM (...), then __call__ is getting a list of ids, calculating the data, and then running the bulk update
[04:13] Sounds good :)
[04:14] wgrant: So should I put up the DB patch?
[04:15] Might as well
[05:12] wgrant: Do you want to review the DB patch, or wait for stub?
[05:15] I imagine he'll be around soon
[05:16] wgrant: I can't rip out most of IPackageUploadSet.getAll() due to the API :-(
[05:27] StevenK: What do you mean?
[05:32] wgrant: version and exact_match are exported via IDistroSeries.getPackageUploads()
[05:36] Oh my god
[05:36] I'd forgotten what getAll looked like
[05:37] Heh heh
[05:38] StevenK: So, exact_match=True is fairly easily doable
[05:38] You just need to match LIKE 'term %' OR LIKE '% term %' OR LIKE '% term'
[05:39] As for version, um.
[05:40] We could check that Ubuntu doesn't use it and then entirely unexport it
[05:40] Or we could denorm that too
[05:40] I'd suggest the former
[05:40] I was wondering about that
[05:43] Wondering if I care enough to grep appserver logs to check for getPackageUploads.*version=
[05:44] We could be naughty, make version a no-op and see if anyone notices :-P
[05:48] The PPR is probably running around now, so it's not a good time to be grepping
[05:49] But I guess it might be worth doing
[05:49] grep a week of logs or so, I guess
[05:49] Not sure if -security use it
[05:51] StevenK: So it turns out all the vocabs and person searching in general are sort of completely terrible
[05:52] My performance fix comes out to -150 or so
[05:52] Haha
[05:52] Are you still abusing DF?
[05:52] Yeah
[05:52] Well, garbo-hourly is on its second run
[06:06] wgrant: zcat -f *access*112[0-7]* | grep -E 'getPackageUploads.*version=' turned up nothing for all four appservers.
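The exact_match clause wgrant sketches above (matching a whole name at the start, middle, or end of the space-separated searchable_names string) could look something like this in Storm terms. The equality arm for a single-name string is an addition here, and in real code any LIKE wildcards in the term would need escaping:

```python
from storm.expr import Like, Or

def exact_match_clause(term):
    # searchable_names is a space-separated list of names, so an exact
    # match means the term appears as a whole word somewhere in it.
    return Or(
        PackageUpload.searchable_names == term,                     # only name
        Like(PackageUpload.searchable_names, term + ' %'),          # first
        Like(PackageUpload.searchable_names, '% ' + term + ' %'),   # middle
        Like(PackageUpload.searchable_names, '% ' + term))          # last
```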
[06:07] StevenK: Most of the logs are bzip2ed, IIRC
[06:07] They are not
[06:07] Also, make sure they're actually syncd
[06:07] Oh, maybe that finally got fixed
[06:07] That would be handy
[06:07] All but gac were gzip'd until I got annoyed and filed an RT
[06:08] so, remove it, I suppose :)
[06:10] wgrant: garbo-hourly is done again if you want me to stop touching DF
[06:12] It's roughly half done with two runs
[06:14] StevenK: https://code.launchpad.net/~wgrant/launchpad/bug-1075767/+merge/136572
[06:14] w
[06:14] Bleh
[06:19] wgrant: r=me
[06:30] StevenK: Thanks
=== cyclicflux is now known as CyclicFlux
[08:49] good morning
=== yofel_ is now known as yofel
[09:48] stub: https://code.launchpad.net/~stevenk/launchpad/db-searchablenames-for-pu/+merge/136568 would love a review
[10:10] StevenK: How are you going to be searching that btw?
[10:11] Do you expect searches to return few results, or are we going to have lots of results and we need to order them by similarity or similar?
[10:20] stub: contains_string
[10:20] And like
[10:20] stub: And both, sadly
[10:21] Depends on if it's via a view, or the API
[10:22] So if a search matches 100,000 results we need to return them all?
[10:23] I doubt it's 100,000 results
[10:23] Perhaps 400-500
[10:23] Usually nowhere near that many
[10:27] Ok, that sounds sanish
[10:28] StevenK: If we needed to LIMIT, ordering my similarity, we might need a GiST index instead of GIN or perhaps an alternative data structure
[10:28] c/my/by
[10:50] stub: Can I get a +1 on that MP?
[10:55] StevenK: Done, but index creation needs to be in a separate patch to apply live (took > 5s on staging)
[11:01] Ah, right
[11:02] I'll split it out tonight and land just ALTER TABLE tomorrow
[11:30] wgrant,StevenK: We use the version parameter in the event that somebody uses the 'queue {accept,reject} PACKAGE/VERSION' syntax, which is rather handy
[11:31] I'm sure we could do it otherwise by getting all the uploads for a given version and filtering manually, but I'm not sure that's an improvement (especially in the case of accepting from the rejected queue)
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== abentley changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: abentley | Firefighting: - | Critical bugs: ~170
=== gary_poster is now known as gary_poster|away
[15:35] )
[15:37] party
[15:38] rick_h_: you bringin the cake!
[15:39] czajkowski: sure, my wife makes a mean homemade cake, or maybe pie
[15:39] mmmm cake
=== gary_poster|away is now known as gary_poster
[15:50] abentley: a review for you if you get a chance please https://code.launchpad.net/~rharding/launchpad/nonpublic_1052659/+merge/136479
[15:50] rick_h_: Sure.
[15:53] rick_h_: visible_specification_query already removes inactive products. Line 1299.
[15:54] looking
[15:57] abentley: ah ok
[15:57] rick_h_: Could you explain why moving _cache_people is helpful? I can see why making it static is good, but I don't see why you want to move it to SpecificationSet.
[15:58] abentley: because it's used in two places
[15:58] it was originally inside of the method _preload_specifications_people
[15:58] so for me to access it outside of that context I needed to move it to its own method
[15:59] I can't just use _preload_specifications_people since I need to do things like limit and order the original result set
[16:00] rick_h_: You need to limit and order the original, not the decorated one?
[16:00] abentley: right
[16:00] rick_h_: Why?
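For reference, the two-stage patch stub asks for above (the cheap column change landed first, the slow index build split into its own patch so it can be applied live) would be plain SQL in the patch files. The DDL is sketched here as strings, with the table and index names assumed rather than taken from the actual MP, and assuming pg_trgm is installed:

```python
# Stage 1: cheap, can be applied to the live DB straight away.
ALTER_PATCH = """
    ALTER TABLE PackageUpload ADD COLUMN searchable_names text;
"""

# Stage 2: the trigram index; building it took >5s on staging, hence the
# separate patch. GIN suits unordered containment searches; GiST would be
# the choice if results had to be LIMITed by similarity ranking.
INDEX_PATCH = """
    CREATE INDEX packageupload__searchable_names__idx
        ON PackageUpload USING gin (searchable_names gin_trgm_ops);
"""
```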
[16:01] so if I'm understanding how the iter hook stuff works it first grabs the results of the original query, then generates a list of people ids, then does a query for all those people, and attaches them to the original result set
[16:01] rick_h_: Sorry, I've got to go now. I'll ping you when I'm back.
[16:01] if you only want 10 specs (the home page stuff only lists 5 or so) then you want to get that result set limited first, then run it through the person fetcher method
[16:01] abentley: rgr, np
[16:01] thanks
[16:04] rick_h_: abentley is deryck about today ?
[16:05] czajkowski: no, he's sick today
[16:05] :(
[16:06] rick_h_: any idea if/when documentation will be out to explain the private bugs, proprietary stuff, as am getting a lot of users very confused atm
[16:06] czajkowski: not at the moment. deryck was going to pull together some notes and go over the material already out there
[16:07] but with a friday deadline we've been cranking on the bugs vs docs so far
[16:07] nods
[16:07] czajkowski: so the *if* is we will have some docs. Then *when* might not be until next week but don't quote me on it. I guess call it 'in progress'
[16:07] no I understand
[16:08] just trying to point people at stuff to explain it
[16:08] if you can do us a favor though, keep tabs on the questions/issues that come up
[16:08] np
[16:08] it'll help direct us to make sure we hit points in the FAQish section or what not
[16:08] and potential future bugs to make things easier/less crazy
[16:09] the terminology seems to be the confusing part
[16:09] heh, yea...
[17:47] rick_h_: Back.
[17:47] abentley: awesome
[17:48] rick_h_: The iter_hook_stuff doesn't fire until you actually start to access the results. So you don't need the resultset in final form before you create the DecoratedResultSet.
[17:49] abentley: ok right, but I moved it from something defined inside the _preload_specifications_people method
[17:49] rick_h_: The other call sites for _preload_specifications_people do ordering and sorting afterward, so if that was a performance issue, we'd know already.
[17:49] to its own method on the HasSpecificationsMixin
[17:50] abentley: ok, gotcha, but the reason to move it was more to keep the _preload_specifications_people as is.
[17:50] rick_h_: Right. I said why, you said because I need the resultset in its final form, I'm saying no you don't.
[17:51] I guess what you're saying is that I could use that and append my order and limit onto the returned DecoratedResultSet?
[17:51] rick_h_: Why can't you use _preload_specifications_people as is?
[17:51] abentley: ok, I gotcha. Because I didn't realize it could be used as it was. So let me update that then.
[17:52] rick_h_: Thanks.
[17:54] rick_h_: The other call sites are Product.specifications and Distribution.specifications, if you want to see how they do it.
[17:54] abentley: will do
[17:57] rick_h_: Most implementations of IHasSpecification.specifications are not well-tested, so I would add tests for the old implementation, make sure they passed, then do the stormification.
[17:58] rick_h_: I'd like to see something similar for SpecificationSet. You can just copy the tests from test_product.TestSpecifications, but please make sure they pass against both implementations.
[17:59] abentley: can you see -ops just running out the door and there is an issue
[17:59] please
[18:48] abentley: do you have time to review https://code.launchpad.net/~jcsackett/launchpad/private-blueprint-dependencies/+merge/136756 ?
[18:49] jcsackett: sure.
[18:49] thanks.
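abentley's point about the iter hook, in sketch form: the pre_iter_hook only fires when the result set is actually iterated, so the decorated result set can be ordered and sliced just like the plain one. Helper and column names here are illustrative and the import path is from memory, not copied from the branch under review:

```python
from storm.expr import Desc

from lp.services.database.decoratedresultset import DecoratedResultSet


def _preload_specifications_people(self, results):
    def cache_people(rows):
        # Called only with the rows actually fetched; bulk-load the people
        # they reference (assignee, drafter, approver) in a single query
        # and prime the cache. Details elided.
        ...
    # Nothing runs yet: the hook is deferred until iteration.
    return DecoratedResultSet(results, pre_iter_hook=cache_people)

# A caller can therefore order and limit the decorated set; only the five
# rows that are finally iterated get passed to cache_people.
specs = self._preload_specifications_people(base_query)
latest_five = list(specs.order_by(Desc(Specification.datecreated))[:5])
```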
[18:52] jcsackett: I don't think this is a view issue. I think we want to forbid this kind of change at the model level using a storm validator. We can catch the exception at the view to set the field error.
[18:54] abentley: if we do that, we end up having to set the error and abort outside of the validation, right? that seems awkward. i can see an argument for having the model forbid it as well, but i think validate's the right place for this check.
[18:55] jcsackett: It's not awkward. I do it all the time.
[18:55] abentley: can you point me to an example? i've only dealt with form handling within validate.
[19:00] jcsackett: lp.code.browser.BranchEditFormView.change_action
[19:00] jcsackett: The bit where it handles BranchTargetError
[19:01] abentley: huh, that's it?
[19:01] ok, i'll make that change.
[19:01] jcsackett: Yes.
[19:01] i'll ping you when it's updated.
[19:02] jcsackett: Cool.
[19:10] jcsackett: The one thing you could run into is I believe self.next_url must not be set for this to work.
[19:30] abentley: changes have been made; worked out quite nicely, thanks.
[19:30] jcsackett: Thanks. Looking.
[19:33] jcsackett: Nice. r=me.
[19:34] abentley: thanks.
[21:11] abentley, jcsackett: It's a bit strange to use a validator there, due to artifact-specific grants. You can't assume that just because two blueprints are private the permissions are equal.
[21:12] wgrant: not sure i follow you; we're not testing for equality between two blueprints. just ensuring that if the root one is public, it doesn't have a private dependency.
[21:12] wgrant: That's a good point.
[21:13] jcsackett: There is also the case where both are proprietary, but one is visible to the user and the other is not.
[21:13] aaah.
[21:14] abentley: i think though that's a situation handled by the follow up work. the strict "no dependency" rule only really applies in the public/non-public situation.
[21:15] jcsackett: But do we want that rule if it won't prevent oopses?
[21:15] abentley: we do, because it makes the excised dependency graph still valid information.
[21:17] jcsackett: Could you explain what you mean by that?
[21:17] It's perfectly valid for a public blueprint to depend on a private one
[21:17] abentley: i was writing out what i meant, and i may be changing my mind. gimme a sec to think through this. :-P
[21:17] Unlike branch stacking, a blueprint is useful without its deps
[21:17] perhaps we just cease to show the dependency graph if you don't have permission to see the various parts of it.
[21:18] b/c i'm not comfortable with the possibility of information leakage that anything else implies.
[21:18] jcsackett: Cease to show the graph completely?
[21:18] As long as you don't traverse through invisible blueprints there's no possible leakage
[21:18] abentley: that's becoming my preference, yes.
[21:19] But your solution is easiest
[21:19] Hm
[21:19] On the other hand, hiding the graph entirely reveals that there are private things :)
[21:19] wgrant: yes, and time is a factor, since we're operating under a hard deadline.
[21:19] jcsackett: I am okay with that. Of course, I'm not a stakeholder, so I recommend getting deryck's buy-in.
[21:19] You're well aware of how little respect I have for that deadline :)
[21:20] wgrant: don'
[21:20] wgrant: don't i know it. :-P
[21:20] wgrant: but i don't think it's budging.
[21:20] Yeah
[21:21] abentley: i'm going to kill the ec2 land run for the branch, and we'll bring it up at standup tomorrow, i guess?
[21:22] jcsackett: Sounds good.
[21:22] done.
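The pattern abentley points at (BranchEditFormView.change_action catching BranchTargetError) transplanted to the blueprint case would look roughly like the sketch below. The exception name and the createDependency call are placeholders, the import locations are from memory, and, as the end of the discussion shows, the rule itself was later reconsidered:

```python
# On the blueprint dependency form view (a LaunchpadFormView subclass).
# Assumed imports: action from lp.app.browser.launchpadform and
# canonical_url from lp.services.webapp.
@action("Add dependency", name="add")
def add_action(self, action, data):
    try:
        # The model (via a method or a Storm validator) raises when a
        # public blueprint would gain a private dependency.
        self.context.createDependency(data["dependency"])
    except DependencyIsPrivate as error:
        # Turn the model error into a form field error and abort.
        self.setFieldError("dependency", str(error))
        return
    # Only set next_url on success; per abentley's note above, it must not
    # be set for the field-error path to work.
    self.next_url = canonical_url(self.context)
```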
=== lifeless_ is now known as lifeless
=== abentley changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: - | Firefighting: - | Critical bugs: ~170
[22:36] wgrant: how bad is doing check_permission over a sequence of items? I seem to recall that being a performance horror.
[22:40] jcsackett: For blueprints? It'll issue a query per object, unless you've precached a positive or negative assertion for each
[22:44] so, gets bad with scale.
[22:46] the easiest way to deal with not navigating the bad nodes in dep graphs is to just filter with check_permission. as a proof of concept it works, but i think the performance question could become bad.
[22:47] jcsackett: Well, I'd do the filter in the initial DB query
[22:47] IIRC it's a WITH RECURSIVE, to which you could just add the usual access filter
[22:48] 're referring to modifying those attrs?
[22:48] ...something strange just happened...
[22:48] trying again... so the graph stuff doesn't do any querying directly. it uses the spec's "dependencies" and "blocked_dep" attrs. but i guess those might be what you're referring to.
[22:49] Right, Specification.all_deps and Specification.all_blocked use a WITH RECURSIVE query to determine the dependency tree in two queries.
[22:50] wgrant: ok. i'll take a look at modifying those tomorrow morning. thanks.
[22:51] It should be easy enough
[23:34] wgrant: I'd not played with the db setting you'd mentioned there
[23:34] I mainly just got stuck because a bad db query will come out as some sort of .pt 'location error'
[23:34] so got really confused for a bit until I pdb'd through it and got at the real error
[23:34] rick_h_: Setting LP_DEBUG_SQL makes the appserver and bin/harness etc. spew DB queries (including times) to stderr, which can be handy
[23:35] very cool
[23:35] LocationErrors are usually an issue with the Python expression, not the actual SQL
[23:35] Since TAL has a bad habit of suppressing AttributeErrors
[23:35] Who needs tracebacks anyway :)
[23:35] well, it was that I did .using(tables) vs .using(*tables) so the sql generated was comma separated through the join section :/
[23:36] Hm, I wouldn't expect that to be a LocationError, but who knows
[23:36] and that came out to the app/traceback level as a locationerror
[23:36] heh, well I do have a way of breaking things in fun ways :)
[23:36] Heh
[23:37] I did want to say that I feel like an LP dev now. Back to back deploys in my name wooo! :)
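For the record, the .using() gotcha rick_h_ describes comes down to unpacking the table list. A minimal sketch, with the join itself made up for illustration:

```python
from storm.expr import Join

# Illustrative join; the real query in the branch differs.
tables = [Specification, Join(Person, Person.id == Specification.assigneeID)]

# Passing the list itself makes Storm treat it as a single table spec, so
# the generated FROM clause comes out comma-separated (what produced the
# confusing LocationError above) instead of using the Join.
store.using(tables).find(Specification)

# Unpacking gives each table/Join as its own argument, as intended.
store.using(*tables).find(Specification)
```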