/srv/irclogs.ubuntu.com/2012/11/28/#launchpad-dev.txt

StevenKAnd yeah, we probably need to pull PCJ into it00:00
wgrantStevenK: Of course, you don't have to do all this in SQL, but you'll want to use at least fragments of this to not make several queries per binary00:01
StevenKwgrant: The package_name I'm already doing handles source and PCJs, so I might leave that, but yes, looping over the binaries is bad00:04
wgrantStevenK: And you really want to be avoiding doing even one query per upload, ideally.00:06
StevenKwgrant: I have to do at least one?00:08
wgrantStevenK: Why?00:08
wgrantYou can probably get away with <5 queries per batch00:08
StevenKRight00:08
wgrantEven for batch sizes larger than 5 :)00:08
* StevenK stabs Storm00:43
StevenKTypeError: Expected int, found <type 'tuple'>: (1,)00:43
=== cyclicflux is now known as Guest64285
StevenK2012-11-28 01:22:14 DEBUG2  [PopulatePackageUploadSearchableNames] Iteration 7 (size 532.2): 2.258 seconds01:22
StevenKwgrant: ^01:22
wgrantStevenK: Much better :)01:26
wgrantDare I look at the code...01:28
wgrantHeh01:29
StevenKNo, you daren't01:29
wgrantI recognise that SQL...01:29
StevenKwgrant: I was trying to avoid spending four hours converting it to Storm01:29
wgrantHuh, did that actually work? I would expect the 500 updates to take more than 2 seconds01:30
wgrantI guess because the DB is local it works01:30
StevenKI wasn't sure how to phrase it as one update01:30
wgrantThere's only one bulk update in the codebase01:31
wgrantand it's on line 63701:31
StevenKwgrant: I wasn't even clear on how to phrase it as SQL01:37
StevenKOh god, that monstrous 34 line SQL has to go into BulkUpdate, doesn't it?01:40
wgrantWell, you don't have to do the string calculation in SQL01:40
wgrantIt's probably awkward to01:40
wgrantSo you can end up just passing the final strings into BulkUpdate01:41
StevenKYes01:41
StevenKupdate_columns turns into PackageUpload.searchable_names, values=list of the names from results, where=list of ids from results?01:42
StevenKReading BulkUpdate to figure out how it works has left me more confused.01:43
wgrantSort of. The Values expression is a subquery table, so it will want to be a list of [(id, searchable_names)]01:43
StevenKSo list(results), then01:43
StevenKBy happy coinience01:44
wgrantThe where= will want to match the Values' id with PackageUpload.id, so the rows get matched with each other.01:44
wgrantAnd then updated_columns wants to set PackageUpload.searchable_names to the Values' searchable_names01:44
StevenKHow do I reference Values'?01:45
StevenKDo I need to ClassAlias?01:45
wgrant(in this case, the Values virtual table is represented to Storm as the cache_data ClassAlias. I did that since the layout of the table is similar to the real LatestPersonSourcePackageReleaseCache table, but in your case it might be best to just define a class there with two cols01:45
wgrantOr you could alias PackageUpload01:45
wgrantSince it will have the two columns that you need01:45
StevenKwgrant: http://pastebin.ubuntu.com/1393383/01:49
wgrantStevenK: Are you deliberately shadowing the Values import there?01:52
wgrantValues is a storm expression class01:52
StevenKNope, renamed to PUValues01:54
wgrantThat's still not a valid variable name01:56
wgrantOh, but I guess it's a classalias01:56
wgrantSo eeh01:56
StevenKUpdateUpload ?01:57
StevenKTrying to be short and descriptive, and failing01:57
StevenKwgrant: Wait, so updated_columns is a list of the updated names, or a Storm expr?02:00
wgrantStevenK: Updated columns is the SET dict.02:03
wgrant{some_col_to_set: the_value_expression, another_col_to_set: some_other_value_expression}02:03
wgrantIn the existing example, the value expressions just directly reference columns from the Values expression.02:04
wgrantWhich is probably sufficient for your needs too02:04
wgrantBut they could be any Storm expression02:04
StevenKProgrammingError: syntax error at or near "1"02:07
StevenKLINE 1: ...d SET searchable_names="_3".searchable_names FROM 1, mozilla...02:07
StevenKHaha02:07
wgrantsounds like you didn't actually use Values02:07
StevenKI took "The Values expression is a subquery table, so it will want to be a list of [(id, searchable_names)]" to mean values=list(results) where results is the RS from the SQL o' doom02:09
wgrantStevenK: It's not quite that simple. Look at how the LPSPRC thing does it02:12
StevenKwgrant: My attempted simplification of LPSRPCs thing is not going well03:11
wgrantStevenK: Oh?03:13
StevenKTypeError: zip argument #1 must support iteration03:14
StevenKWhich falls out of compile_bulkupdate03:14
wgrantStevenK: Diff?03:18
StevenKwgrant: http://pastebin.ubuntu.com/1393537/03:19
wgrantStevenK: Well, I'm not sure if it's the cause of this error, but your Values expression doesn't seem to contain the ID03:21
wgrantWhich isn't going to be helpful when you try to match the searchable_names up to their corresponding PackageUploads03:22
wgrantOh03:24
wgrantI see why it's crashing03:24
wgrantYour list of rows is just a list of values03:24
wgrantYou dropped a layer of list comprehensions03:24
StevenKYes, see 'attempted simplification'03:25
wgrantWell, you simplified the table to be a single column, which isn't supported because it makes no sense whatsoever :)03:25
wgrant(unless you want to choose a random row -- there's no other key)03:26
StevenKWCPGW03:26
StevenKThen I don't need the for data in ... second list comprehsion?03:28
StevenKSince I have a list of tuples already?03:28
wgrantStevenK: Since it's just an int and a string you might not need the list comps03:29
wgrantIt was required in the LPSPRC job to cope with the Reference and DateTime fields03:29
StevenKIt doesn't love that either03:30
StevenKI've not used zip very often03:31
StevenKIn fact, It's clearly not working at all03:33
wgrantWhat's the error now, and what's the code?03:34
StevenKWell, zip((PackageUpload.id, PackageUpload.searchable_names), results)03:34
StevenK=> [(<storm.properties.PropertyColumn object at 0x6cb3600>, (1, u'mozilla-firefox'))]03:34
StevenKWhich is clearly why dbify_value is choking03:35
wgrantdbify_value is for values03:35
wgrantNot columns03:35
wgrantOh03:36
wgrantI think you might be nested one level too deep now03:36
StevenK.. I am?03:36
StevenKAnd here I was thinking I needed another loop03:36
wgrantWhere did you get that zip() invocation from?03:36
StevenKTrying to munge it from LPSRPC03:37
wgrantMaybe you're still missing a list comprehension03:37
wgrantIf you look at LPSPRC, the values list comp iterates over the rows of the resultset. It zips each row with the columns, then dbifys each value.03:38
wgrantWhereas you seem to be zipping the entire resultset with the columns.03:38
StevenK0-1@SQL-main-master UPDATE PackageUpload SET searchable_names=cache_data.searchable_names FROM (VALUES (7::integer, E'netapplet-1.0.0.tar.gz'::text), (8, E'cnews'), (9, E'cnews'), (10, E'netapplet-1.0.0.tar.gz'), (11, E'mozilla-firefox'), (12, E'language-pack-de'), (13, E'commercialpackage'), (14, E'pmount'), (15, E'iceweasel')) AS cache_data(id, searchable_names) WHERE PackageUpload.id = cache_data.id03:48
* StevenK declares victory03:48
StevenKwgrant: Can I have DF's database back?03:53
wgrantHmm?03:54
wgrantI hold no locks of significance03:54
StevenK2012-11-28 03:52:52 INFO    [PopulatePackageUploadSearchableNames] Blocked on 0:40:49.687301 old xact launchpad_main@launchpad_dogfood/16564 - <IDLE> in transaction.03:54
wgrantAh03:55
wgrantThat was probably my iharness03:55
wgrantfg03:55
* StevenK waits for PPUSN to ramp up03:56
StevenKIt's only doing ~ 200 per iteration03:56
wgrantWhere's the time going?03:56
StevenK2012-11-28 03:57:07 DEBUG2  [PopulatePackageUploadSearchableNames] Iteration 39 (size 334.5): 1.334 seconds03:57
StevenK2012-11-28 03:57:24 DEBUG2  [PopulatePackageUploadSearchableNames] Iteration 43 (size 769.7): 3.574 seconds03:57
wgrantSure, but which queries are being slow?03:57
wgrant(LP_DEBUG_SQL will be helpful here)03:58
StevenKArgh03:58
StevenKLP_DEBUG_SQL=1 is not helpful03:59
wgrantWhy not?03:59
StevenKThe queries are massive03:59
wgrantAh, true, there might be a bit of scrolling :)03:59
StevenKA bit?03:59
StevenK2012-11-28 04:06:19 DEBUG2  [PopulatePackageUploadSearchableNames] Iteration 8 (size 1019.7): 2.791 seconds04:06
StevenKSeems the slowest bit is the UPDATE04:07
StevenKOr building it04:08
StevenKwgrant: Perhaps we're being slowed down by tgrm04:09
wgrantAh, that's likely indeed04:09
wgrantThe strings are large04:09
StevenKHowever, we're hovering around 850 per iteration04:10
wgrantAs long as we have a reasonable number of queries (ie. <10) and the UPDATE is the slowest bit, we are probably doing well04:10
StevenKw04:10
StevenKBleh04:10
StevenK4 queries per batch04:10
StevenKdone is doing a SELECT 1 FROM (...), then __call__ is getting a list of ids, calculating the data, and then running the bulk update04:11
wgrantSounds good :)04:13
StevenKwgrant: So should I put up the DB patch?04:14
wgrantMight as well04:15
StevenKwgrant: Do you want to review the DB patch, or wait for stub?05:12
wgrantI imagine he'll be around soon05:15
StevenKwgrant: I can't rip out most of IPackageUploadSet.getAll() due to the API :-(05:16
wgrantStevenK: What do you mean?05:27
StevenKwgrant: version and exact_match are exported via IDistroSeries.getPackageUploads()05:32
wgrantOh my god05:36
wgrantI'd forgotten what getAll looked like05:36
StevenKHeh heh05:37
wgrantStevenK: So, exact_match=True is fairly easily doable05:38
wgrantYou just need to match LIKE 'term %' OR LIKE '% term %' OR LIKE '% term'05:38
wgrantAs for version, um.05:39
wgrantWe could check that Ubuntu doesn't use it and then entirely unexport it05:40
wgrantOr we could denorm that too05:40
wgrantI'd suggest the former05:40
StevenKI was wondering about that05:40
StevenKWondering I care enough to grep appserver logs to check for getPackageUploads.*version=05:43
StevenKWe could be naughty, make version a no-op and see if anyone notices :-P05:44
wgrantThe PPR is probably running around now, so it's not a good time to be grepping05:48
wgrantBut I guess it might be worth doing05:49
wgrantgrep a week of logs or so, I guess05:49
wgrantNot sure if -security use it05:49
wgrantStevenK: So it turns out all the vocabs and person searching in general are sort of completely terrible05:51
wgrantMy performance fix comes out to -150 or so05:52
StevenKHaha05:52
wgrantAre you still abusing DF?05:52
StevenKYeah05:52
StevenKWell, garbo-hourly is on its second run05:52
StevenKwgrant: zcat -f *access*112[0-7]* | grep -E 'getPackageUploads.*version=' turned up nothing for all four appservers.06:06
wgrantStevenK: Most of the logs are bzip2ed, IIRC06:07
StevenKThey are not06:07
wgrantAlso, make sure they're actually syncd06:07
wgrantOh, maybe that finally got fixed06:07
wgrantThat would be handy06:07
StevenKAll but gac were gzip'd until I got annoyed and filed an RT06:07
wgrantso, remove it, I suppose :)06:08
StevenKwgrant: garbo-hourly is done again if you want me to stop touching DF06:10
StevenKIt's roughly half done with two runs06:12
wgrantStevenK: https://code.launchpad.net/~wgrant/launchpad/bug-1075767/+merge/13657206:14
StevenKw06:14
StevenKBleh06:14
StevenKwgrant: r=me06:19
wgrantStevenK: Thanks06:30
=== cyclicflux is now known as CyclicFlux
adeuringgood morning08:49
=== yofel_ is now known as yofel
StevenKstub: https://code.launchpad.net/~stevenk/launchpad/db-searchablenames-for-pu/+merge/136568 would love a review09:48
stubStevenK: How are you going to be searching that btw?10:10
stubDo you expect searches to return few results, or are we going to have lots of results and we need to order them by similarity or similar?10:11
StevenKstub: contains_string10:20
StevenKAnd like10:20
StevenKstub: And both, sadly10:20
StevenKDepends on if it's via a view, or the API10:21
stubSo if a search matches 100,000 results we need to return them all?10:22
StevenKI doubt it's 100,000 results10:23
StevenKPerhaps 400-50010:23
StevenKUsually nowhere near that many10:23
stubOk, that sounds sanish10:27
stubStevenK: If we needed to LIMIT, ordering my similarity, we might need a GiST index instead of GIN or perhaps an alternative data structure10:28
stubc/my/by10:28
StevenKstub: Can I get a +1 on that MP?10:50
stubStevenK: Done, but index creation needs to be in a separate patch to apply live (took > 5s on staging)10:55
StevenKAh, right11:01
StevenKI'll split it out tonight and land just ALTER TABLE tomorrow11:02
cjwatsonwgrant,StevenK: We use the version parameter in the event that somebody uses the 'queue {accept,reject} PACKAGE/VERSION' syntax, which is rather handy11:30
cjwatsonI'm sure we could do it otherwise by getting all the uploads for a given version and filtering manually, but I'm not sure that's an improvement (especially in the case of accepting from the rejected queue)11:31
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== abentley changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: abentley | Firefighting: - | Critical bugs: ~170
=== gary_poster is now known as gary_poster|away
cgoldberg)15:35
rick_h_party15:37
czajkowskirick_h_: you bringin the cake!15:38
rick_h_czajkowski: sure, my wife makes a mean homemade cake, or maybe pie15:39
czajkowskimmmm cake15:39
=== gary_poster|away is now known as gary_poster
rick_h_abentley: a review for you if you get a chance please https://code.launchpad.net/~rharding/launchpad/nonpublic_1052659/+merge/13647915:50
abentleyrick_h_: Sure.15:50
abentleyrick_h_: visible_specification_query already removes inactive products.  Line 1299.15:53
rick_h_looking15:54
rick_h_abentley: ah ok15:57
abentleyrick_h_: Could you explain why moving _cache_people is helpful?  I can see why making it static is good, but I don't see why you want to move it to SpecificationSet.15:57
rick_h_abentley: because it's used in two places15:58
rick_h_it was orignally inside of the method  _preload_specifications_people15:58
rick_h_so for me to access it outside of that context I needed to move it to its own method15:58
rick_h_I can't just use _preload_specification_people since I need to do things like limit and order the original result set15:59
abentleyrick_h_: You need to limit and order the original, not the decorated one?16:00
rick_h_abentley: right16:00
abentleyrick_h_: Why?16:00
rick_h_so if I'm understanding how the iter hook stuff works it first grabs the results of the original query, then generates a list of people ids, then does a query for all those people, and attaches them to the original result set16:01
abentleyrick_h_: Sorry, I've got to go now.  I'll ping you when I'm back.16:01
rick_h_if you only want 10 specs (the home page stuff only lists 5 or so) then you want to get that result set limited first, then run it through the person fetcher method16:01
rick_h_abentley: rgr, np16:01
rick_h_thanks16:01
czajkowskirick_h_: abentley is deryck about today ?16:04
rick_h_czajkowski: no, he's sick today16:05
czajkowski:(16:05
czajkowskirick_h_: any idea if/when documentation will be out to explain the private bugs, properetary stuff, as am getting a lot of users very confused atm16:06
rick_h_czajkowski: not at the moment. deryck was going to pull together some notes and go over the material already out there16:06
rick_h_but with a friday deadline we've been cranking on the bugs vs docs so far16:07
czajkowskinods16:07
rick_h_czajkowski: so the *if* is we will have some docs. Then *when* might not be until next week but don't quote me on it. I guess call it 'in progress'16:07
czajkowskino I understand16:07
czajkowskijust trying to point peopleat stuff to explain it16:08
rick_h_if you can do us a favor though, keep tabs on the questions/issues that come up16:08
czajkowskinp16:08
rick_h_it'll help direct us to make sure we hit points in the FAQish section or what not16:08
rick_h_and potential future bugs to make things easier/less crazy16:08
czajkowskithe terminalogy seems to be the confusing16:09
rick_h_heh, yea...16:09
abentleyrick_h_: Back.17:47
rick_h_abentley: awesome17:47
abentleyrick_h_: The iter_hook_stuff doesn't fire until you actually start to access the results.  So you don't need the resultset in final form before you create the DecoratedResultSet.17:48
rick_h_abentley: ok right, but I moved it from something defined inside the  _preload_specifications_people method17:49
abentleyrick_h_: The other call sites for _preload_specifications_people do ordering and sorting afterward, so if that was a performance issue, we'd know already.17:49
rick_h_to it's own method on the HasSpecificationsMixin17:49
rick_h_abentley: ok, gotcha, but the reason to move it was more to keep the _preload_specifications_people as is.17:50
abentleyrick_h_: Right.  I said why, you said because I need the resultset in its final form, I'm saying no you don't.17:50
rick_h_I guess what you're saying is that I could use that and append my order and limit onto the returned DecoratedResultSet?17:50
abentleyrick_h_: Why can't you use _preload_specifications_people as is?17:51
rick_h_abentley: ok, I gotcha. Because I didn't realize it could be used as it was. So let me update that then.17:51
abentleyrick_h_: Thanks.17:52
abentleyrick_h_: The other call sites are Product.specifications and Distribution.specifications, if you want to see how they do it.17:54
rick_h_abentley: will do17:54
abentleyrick_h_: Most implementations of IHasSpecification.specifications are not well-tested, so I would add tests for the old implementation, make sure they passed, then do the stormification.17:57
abentleyrick_h_: I'd like to see something similar for SpecificationSet.  You can just copy the tests from test_product.TestSpecifications, but please make sure they pass against both implementations.17:58
czajkowskiabentley: can you see -ops just running out the door and there is an issue17:59
czajkowskiplease17:59
jcsackettabentley: do you have time to review https://code.launchpad.net/~jcsackett/launchpad/private-blueprint-dependencies/+merge/136756 ?18:48
abentleyjcsackett: sure.18:49
jcsackettthanks.18:49
abentleyjcsackett: I don't think this is a view issue.  I think we want to forbid this kind of change at the model level using a storm validator.  We can catch the exception at the view to set the field error.18:52
jcsackettabentley: if we do that, we end up having to set the error and abort outside of the validation, right? that seems awkward. i can see an argument for having the model forbid it as well, but i think validate's the right place for this check.18:54
abentleyjcsackett: It's not awkward.  I do it all the time.18:55
jcsackettabentley: can you point me to an example? i've only dealt with form handling within validate.18:55
abentleyjcsackett: lp.code.browser.BranchEditFormView.change_action19:00
abentleyjcsackett: The bit where it handles BranchTargetError19:00
jcsackettabentley: huh, that's it?19:01
jcsackettok, i'll make that change.19:01
abentleyjcsackett: Yes.19:01
jcsacketti'll ping you when it's updated.19:01
abentleyjcsackett: Cool.19:02
abentleyjcsackett: The one thing you could run into is I believe self.next_url must not be set for this to work.19:10
jcsackettabentley: changes have been made; worked out quite nicely, thanks.19:30
abentleyjcsackett: Thanks.  Looking.19:30
abentleyjcsackett: Nice.  r=me.19:33
jcsackettabentley: thanks.19:34
wgrantabentley, jcsackett: It's a bit strange to use a validator there, due to artifact-specific grants. You can't assume that just because two blueprints are private the permissions are equal.21:11
jcsackettwgrant: not sure i follow you; we're not testing for equality between two blueprints. just ensuring that if the root one is public, it doesn't have a private dependency.21:12
abentleywgrant: That's a good point.21:12
abentleyjcsackett: There is also the case where both are proprietary, but one is visible to the user and the other is not.21:13
jcsackettaaah.21:13
jcsackettabentley: i think though that's a situation handled by the follow up work. the strict "no dependency" rule only really applies in the public/non-public situation.21:14
abentleyjcsackett: But do we want that rule if it won't prevent oopses?21:15
jcsackettabentley: we do, because it makes the excised dependency graph still valid information.21:15
abentleyjcsackett: Could you explain what you mean by that?21:17
wgrantIt's perfectly valid for a public blueprint to depend on a private one21:17
jcsackettabentley: i was writing out what i meant, and i may be changing my mind. gimme a sec to think through this. :-P21:17
wgrantUnlike branch stacking, a blueprint is useful without its deps21:17
jcsackettperhaps we just cease to show the dependency graph if you don't have permission to see the various parts of it.21:17
jcsackettb/c i'm not comfortable for the possibility of information leakage that anything else implies.21:18
abentleyjcsackett: Cease to show the graph completely?21:18
wgrantAs long as you don't traverse through invisible blueprints there's no possible leakage21:18
jcsackettabentley: that's becoming my preference, yes.21:18
wgrantBut your solution is easiest21:19
wgrantHm21:19
wgrantOn the other hand, hiding the graph entirely reveals that there are private things :)21:19
jcsackettwgrant: yes, and time is a factor, since we're operating under a hard deadline.21:19
abentleyjcsackett: I am okay with that.  Of course, I'm not a stakeholder, so I recommend getting deryck's buy-in.21:19
wgrantYou're well aware of how little respect I have for that deadline :)21:19
jcsackettwgrant: don'21:20
jcsackettwgrant: don't i know it. :-P21:20
jcsackettwgrant: but i don't think it's budging.21:20
wgrantYeah21:20
jcsackettabentley: i'm going to kill the ec2 land run for the branch, and we'll bring it up at standup tomorrow, i guess?21:21
abentleyjcsackett: Sounds good.21:22
jcsackettdone.21:22
=== lifeless_ is now known as lifeless
=== abentley changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: - | Firefighting: - | Critical bugs: ~170
jcsackettwgrant: how bad is doing check_permission over a sequence of items? I seem to recall that being a performance horror.22:36
wgrantjcsackett: For blueprints? It'll issue a query per object, unless you've precached a positive or negative assertion for each22:40
jcsackettso, gets bad with scale.22:44
jcsackettthe easiest way to deal with not navigating the bad nodes in dep graphs is to just filter with check_permission. as a proof of concept it works, but i think the performance question could become bad.22:46
wgrantjcsackett: Well, I'd do the filter in the initial DB query22:47
wgrantIIRC it's a WITH RECURSIVE, to which you could just add the usual access filter22:47
jcsackett're referring to modifying those attrs?22:48
jcsackett...something strange just happened...22:48
jcsacketttrying again... so the graph stuff doesn't do any querying directly. it uses the spec's "dependencies" and "blocked_dep" attrs. but i guess those might be what you're referring to.22:48
wgrantRight, Specification.all_deps and Specification.all_blocked use a WITH RECURSIVE query to determine the dependency tree in two queries.22:49
jcsackettwgrant: ok. i'll take a look at modifying those tomorrow morning. thanks.22:50
wgrantIt should be easy enough22:51
rick_h_wgrant: I'd not played with the db setting you'd mentioned there23:34
rick_h_I mainly just got stuck because a bad db query will come out as some sort of .pt 'location error'23:34
rick_h_so got really confused for a bit until I pdb'd through it and got at the real error23:34
wgrantrick_h_: Setting LP_DEBUG_SQL makes the appserver and bin/harness etc. spew DB queries (including times) to stderr, which can be handy23:34
rick_h_very cool23:35
wgrantLocationErrors are usually an issue with the Python expression, not the actual SQL23:35
wgrantSince TAL has a bad habit of suppressing AttributeErrors23:35
wgrantWho needs tracebacks anyway :)23:35
rick_h_well, it was that I did .using(tables) vs .using(*tables( so the sql generated was comma seperated through the join section :/23:35
wgrantHm, I wouldn't expect that to be a LocationError, but who knows23:36
rick_h_and that came out to the app/traceback  level as a locationerror23:36
rick_h_heh, well I do have a way of breaking things in fun ways :)23:36
wgrantHeh23:36
rick_h_I did want to say that I feel like an LP dev now. Back to back deploys in my name wooo! :)23:37

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!