[00:00] Ah [00:00] So the two tests are terrible? [00:00] Effectively, they are getUtility(IPackageUploadSet).getAll(distroseries, name=u"'") with and without exact_match=True [00:00] Huh? [00:01] No [00:01] The tests have revealed a bug in your code [00:01] If API input can cause a crash, then our code is buggy [00:02] That's not API input, that's calling the method directly [00:02] Sure [00:02] But I can pass in a name of "'" through the API [00:21] wgrant: But name = name.replace("'", "").replace('"', '') probably makes you hate me ... [00:22] That's very PHPesque of you [00:22] You should escape the string according to the various syntaxes that are involved [00:23] Not just blindly remove characters until it works :) [00:24] wgrant: IE, remove everything that isn't alphanumeric? [00:25] That's precisely the opposite of what I suggested [00:25] wgrant: Hold on [00:25] Escaping in the ' or " case doesn't work [00:25] Why not? [00:26] Because as you said earlier, it isn't a valid tsquery syntax [00:26] Confused... [00:26] A plain ' is not valid tsquery syntax [00:26] Because ' is special in tsquery syntax [00:27] Like in SQL [00:27] How do we resolve this in SQL? [00:27] (and Python, and JavaScript, and...) [00:28] * StevenK is clearly missing something [00:30] ' delimits a quoted string in Python, SQL, and a tsvector [00:31] How do we normal handle ', and other special characters, in a Python or SQL string? [00:31] By escaping them [00:31] Right [00:31] So how might we handle a ' in a tsvector string? [00:31] By escaping it [00:31] :) [00:32] '''' isn't already escaped, or it needs to be escaped twice? [00:32] How's that escaped? [00:32] '''' is the SQL equivalent of Python's "'" [00:32] Ah [00:33] It's escaped in SQL terms [01:01] wgrant: So, I have two MPs -- I'm happy to self-review the drop-garbo one, it just deletes 182 lines of code. What about the index MP? [01:03] That should only be landed once the query branch is ready [01:03] In case it requires changes [01:03] Mmmmm [01:04] I should drop my hacks on DF, add two rows to LDR and update it [01:04] (none of this sort of thing should ever land until the whole thing is confirmed to work, really) [01:04] We had confirmed it was working on Friday [01:05] We hadn't ever done a full test [01:06] * StevenK executes on his plan [01:08] wgrant: The lib/lp/registry/model/person.py cowboy on DF is yours? [01:08] And lib/lp/soyuz/model/binarypackagebuild.py, I guess [01:11] Indeed [01:11] Well, person.py is me, binarypackagebuild.py is cjwatson [01:12] wgrant: Do you want your change preserved? [01:12] No [01:52] wgrant: No objections to landing the death of garbo branch? [01:52] StevenK: As long as it's done on all four DBs [01:52] Yeah, all five -- dev, dogfood, qas, staging and prod [02:52] * StevenK stabs dogfood [02:53] Damn it, load [02:58] wgrant: DF's front page is unhappy, but +queue looks snappy [02:59] Hmmm, I think JS is still broken [03:00] Yeah [03:00] No convoy, remember [03:00] Then remove the flag? [03:00] Won't help [03:00] 3.5 won't work without convoy [03:01] Bleh [03:01] Though we still haven't removed 3.3, don't rely on that for much longer :) [03:01] Why do you want JS? [03:01] I can't see render times [03:01] And I'd like to [03:01] Ah [03:01] view source :) [03:01] At least 255 queries/external actions issued in 9.53 seconds [03:02] That doesn't sound indexed [03:02] What's the query? [03:02] However, that's 'xorg' for Quantal done [03:02] (or it could just be a really really cold cache) [03:02] Let me resubmit [03:02] At least 255 queries/external actions issued in 3.97 seconds [03:03] Right [03:03] Can you track down the main search query and see how long it took? [03:03] Cause an OOPS? [03:03] Or look at the query log that's inline in the page [03:04] (you'll probably need to poke with firebug, or view source) [03:05] SELECT DISTINCT PackageUpload.* FROM PackageUpload WHERE PackageUpload.distroseries = 110 AND PackageUpload.status IN (3) AND PackageUpload.archive IN (1, 534) AND ((searchable_names::tsvector @@ 'xorg:*')) ORDER BY PackageUpload.id DESC LIMIT 31 OFFSET 0 == 54 ms [03:07] Let me cause an OOPS, I think we're missing some preloading [03:08] OOPS-48338103ccc22a3accd4c8c2476a6107 [03:10] Hmmm, and why isn't that OOPS ID on neem? :-( [03:10] you did a grep, not a find, right ? [03:10] StevenK: DF doesn't transmit to neem [03:11] /srv/launchpad.net/dogfood-logs [03:11] The OOPS id isn't used for the file name, because that would lead to trusting untrusted data. [03:11] lifeless: No, I did a load up the website and put the id in the field [03:12] I was hoping for it to be formatted nicely, not in a file on DF :-( [03:17] lifeless: How is trusting that data problematic, as long as you verify it's safe for use as a filename? [03:17] If someone wants to collide, they already can [03:17] It searches by ID, not filename [03:18] * StevenK finishes writing a novel for CA [03:20] * StevenK glares at this OOPS on mawson [03:23] Holy crap it's slow the first time [03:23] boost is still pathetic at 8.4 seconds [03:28] But which part is slow? [03:28] If boost is slow, it probably just means that the loading of the binaries is slow [03:28] Not that the search is [03:30] Mmmm [03:31] wgrant: http://wedontsleep.org/~steven/f.html is the query log [03:31] wgrant: mmm true [03:31] There is no one query that is terribly slow [03:31] C IS FOR COOKIE [03:31] Bad StevenK [03:32] wgrant: So hurry up and load it so I can rm it [03:33] I've loaded it and erased your cookie [03:35] We can see from the log that the search query is below 50ms [03:35] So for the purposes of this change it is fine [03:35] What's not fine is... the entire rest of the page [03:37] Indeed [03:37] But we already knew that :) [03:40] wgrant: So looks like I'll be sticking to DistroSeries:+queue after this work lands ... [03:40] wgrant: Anyway, index MP? [03:40] Several decades ago I had a branch which mostly fixed DistroSeries:+queue [03:41] I'm not sure if it's still around, and it's probably changed enough now that it's not useful [03:41] I didn't propose it at the time because cachedproperty on model objects was banned [03:42] I think the first step is loading everything related in one query, rather than this looping garbage it's doing [03:42] Well [03:42] You need to preload things [03:42] It's pretty simple [03:43] There's just a number of different sorts of things you'll need to preload [03:49] wgrant: Are you satisified about the working-ness of this change? [03:49] lifeless's contribution to lp: unbanning cachedproperty. [03:50] lifeless: Yep [03:50] StevenK: Well, your app code is wrong, but otherwise it seems to be good [03:51] wgrant: What about it? [03:53] StevenK: It'll probably still go horribly wrong if I include a : in my search term, for example [03:55] The precise parsing rules don't seem to be very well documented [03:56] wgrant: I need to escape : in the same way? [03:56] Parsing rules for tsquery seem to not be documented at all [03:58] http://www.postgresql.org/docs/9.1/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES goes some way [03:59] Ah, http://www.postgresql.org/docs/9.1/static/datatype-textsearch.html is what I was looking for [04:00] So, ' ", : * look special [04:01] Without that errant comma [04:02] Well [04:02] abel was the last person to look hard at this [04:02] I'd just put the whole thing in single quotes, backslashing any ' or \ [04:02] Given we want the whole thing treated as a single token [04:03] wgrant: Which doesn't handle your : case [04:03] Doesn't it? [04:03] : isn't special when inside a quoted string [04:03] Ah [04:03] (and " and * aren't ever special) [04:04] Does that mean it turns into @@ 'foo':* for substring? [04:04] Right [04:04] Er [04:04] No [04:04] @@ '''foo'':*' [04:05] Or @@ E'\'foo\':*' if you're using that syntax [04:05] Or @@ E'\'foo\\\'\\\\\':*' if you want to search for foo'\ :) [04:06] raise StupidUserSearchError(wtf) [04:06] Haha [04:07] What is all this for btw? Someone working on the antique database text search? [04:08] stub: Searching on DistroSeries:+queue [04:08] Using tsvector because we like fast queries [04:08] And not an eight table join and four likes [04:09] Oh god, doesn' [04:09] Bleh [04:09] Oh god, doesn't this turn into .replace(r'\\', r'\\\\') [04:13] In that case, why have you got odd characters you need to match? For word search, you want to normalize the data in the index. [04:13] Oh... this is for the search [04:18] I don't know what the data looks like you are searching. But if odd characters are coming in, you have bigger problems with fuzzy matching like tsvector/tsquery such as case sensitivity, stop words, stemming [04:19] stub: Well [04:19] stub: stemming only happens if you use to_tsvector [04:20] So it's not a problem here [04:20] We just skip stemming and stopwords etc [04:20] Creating the tsvector and tsquery directly with exactly the content we desire [04:21] (I only opted to use ts2 because otherwise we'd need a custom GIN operator to do prefix search on an array of strings) [04:21] God damn I'm confused [04:21] lovely that this is all documented now. i might be able to understand that old code now. [04:21] StevenK: What's confusing? [04:22] If I paste this, I may get murdered [04:22] Let me sharpen my knives [04:22] :-( [04:22] Waiting for an error, anyway [04:23] name = name.replace("'", "\\'").replace(r'\\', r'\\\\') [04:23] name = "''%s''" % name [04:23] And then we enclose it in '' in the SQL condition [04:23] Ew [04:23] ProgrammingError: syntax error in tsquery: "''':*" [04:23] LINE 1: ...chive IN (16) AND ((searchable_names::tsvector @@ '''\''':*'... [04:23] One level at a time pls :) [04:24] Create the string as you desire ::tsquery to see it [04:24] Then wrap that in sqlvalues() or similar, to the do the SQL escaping [04:25] (also, you need to escape the backslashes first, or you'll be escaping escape characters [04:26] Hmm, that may not be helping [04:27] wgrant: The index and NOT NULL MPs are up [04:27] The branch I'm working on depends on the indexes, but the NOT NULL is optional [04:28] I think sqlvalues isn't helping me [04:31] Search string: ''unique-from-factory-py-line3348-101271'' [04:31] sqlvalues escaped: '''''unique-from-factory-py-line3348-101271''''' [04:32] Oh right [04:32] sqlvalues still uses the horrid old non-standard escaping syntax [04:33] I'm tempted to ignore it [04:33] And just build the string by hand [04:33] Why? [04:34] Mainly because it's annoying me [04:34] However, I win [04:34] What is your code? [04:35] wgrant: http://pastebin.ubuntu.com/1444536/ [04:35] Line 3 is odd [04:36] Are you sure you really want a raw string? [04:36] that'll replace \\ with \\\\ [04:36] Not \ with \\ [04:36] Ah [04:36] Now '\\', '\\\\' [04:36] I seem to not be stabbed, so that's a good thing ... [04:36] In a raw string, a backslash causes the following character to be interpreted literally, but the escaping backslash is retained in the final string [04:37] Well [04:37] You're still using sqlvalues [04:37] Which you should stop doing [04:37] But it's better than manually doing the SQL escaping [04:37] Oh? [04:37] SQL("whatever @@ ?", params=('blah',)) [04:38] Then search_string turns into "''%s''" ? [04:39] Confused [04:39] Why? [04:39] Because you had ''' in your earlier example [04:39] params handles escaping for you [04:39] That's why it exists [04:40] You give it a normal string, it will inject it into the SQL safely [04:43] wgrant: sqlvalues gutted from name and version searching [04:43] Lovely [04:43] sqlvalues is thoroughly deprecated; you should avoid it except in legacy code [04:46] wgrant: http://pastebin.ubuntu.com/1444550/ [04:52] StevenK: searchable_versions can be done with a native Storm expression [04:52] There are Array and ArrayContains (though the latter is @> rather than <@, I'm sure you can flip your args :)) bits in lp.services.database.stormexpr [04:53] And the parens in the searchable_names SQL are likely redundant [04:53] And "if name is not None and name != ''" sounds suspiciously close to "if name" [04:54] Old code [04:55] I'm an equal opportunity complainer [05:01] wgrant: http://pastebin.ubuntu.com/1444568/ [05:02] Indentation! [05:02] That looks like you're appending an ArrayContains and an Array [05:03] Fixed [05:03] Any other complaints? [05:03] * StevenK waits for the flood [05:07] Sadly not [05:08] wgrant: Should I put up an MP for this branch, then? Then you can happily nitpick more sections of the code. [05:09] yep [05:20] wgrant: That MP is now up [05:28] wgrant: Thanks. Shall I wait for stub about the index branch, or which? [05:33] Looks good, but let me just test something [05:41] wgrant: Ah, seeing if a composite helps? [05:42] Yeah [05:42] It still doesn't [05:42] :-( [05:42] It's still fast enough [05:42] Sure [05:43] And beats the pants off the 15 seconds it was :-) [05:43] (also, it's "indices", not "indicies") [05:44] Right, index branch landing [05:45] When stub reappears and has applied them both to qas and prod, I'll land the use branch [05:45] And when he approves the NOT NULL branch, I'll land that. [05:49] Right [05:50] Hmm [05:52] Hah [05:52] DRS is already used for DistroSeries:+queue [05:52] Yes [05:52] IIRC it only preloads PU* [05:52] Not everything else [05:53] BPF, SPRF, SPR [05:54] It may be trying to add in BPB [05:54] But it at least needs BPB and BP [05:54] *PB [09:27] stub: O HAI. Can you have a look at 40-2 in devel and apply the indicies to prod and qas? [09:30] * stub looks [09:31] stub: https://code.launchpad.net/~stevenk/launchpad/index-searchables-for-pu/+merge/140120 is the MP [09:32] StevenK: Backups still running, so it needs to wait a bit longer [09:32] I'll do staging and qas [09:42] StevenK: It is on qas, staging already had it from nightly updates [09:44] StevenK: you can drop my binarypackagebuild.py cowboy if you like [10:02] cjwatson: It was two lines, and didn't hurt to restore. [10:02] I certainly committed more evils to DF [10:03] stub: Surprised the backups are still going [10:03] yeah [10:08] 9 hours so far, only got as far as translationmessage [10:08] less than 30 mins into that one [10:09] oh, only one minute into translationmessage [10:10] I thought they only took six? [10:12] wildcherry's backup would normally have finished about 20 minutes ago [10:12] yeah, suspect high io. Running at just over 10% iowait atm. [10:12] It's possible that allocate-revision-karma is causing excessive load as it catches up with 15 million revisions of backlog [10:12] revisionkarma catchup [10:12] yer [10:12] It's about half done [10:13] and our io utilization has been creeping up anyway as things get optimized and we become less cpu bound. [10:13] Yeah [10:14] I've been noticing more and more IO issues in explain analyzes lately. [10:14] Blocks just don't read as quickly as they used ot [10:18] wgrant: production or dogfood? [10:18] Prod [10:19] dogfood's always been dreadful :) [10:19] Reading in cold blocks on prod is just not as fast as it probably should be [10:19] I'll blame contention on production, but we have a few knobs to twiddle if it looks like it needs it [10:20] At the moment, we are tuned so pg thinks random access is pretty darn fast due to almost always having our data in disk cache. That might be a lie now. [10:20] Yeah [10:20] With the DB this big... [10:21] its always been big, it is the size of the hot set of data that is important. [10:21] It wasn't all that long ago that the DB was 250GB [10:21] Now it's more than 350GB [10:21] We've gone from 2x RAM to 3x RAM [10:21] yeah, but ram is 128GB [11:01] Hmm, that's a bit horrible [11:02] * StevenK glares at buildbot [11:02] Anyway, this is wgrant's fault. [11:02] He told me to Storm-ify it. [11:07] StevenK: When in doubt, blame wgrant? [11:08] It works with other tests [11:08] So maybe the doctest is just bong [11:11] StevenK: The doctest is probably calling it with a str, not a unicode [11:12] Yes [11:12] wgrant: The previous call has unicode, but the one that errors doesn't. [11:13] Right [11:13] Easy fix [11:13] Yeah, landing it in a few === benji changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: benji | Firefighting: - | Critical bugs: <150 === slank_away is now known as slank [15:00] sinzui, hey. Am I right that https://code.launchpad.net/~chrisjohnston/launchpad/fdl/+merge/140216 will not need a db change, because 330 is all we have in the DB? On a related note, can you or benji get that landed soon? I don't have an LP dev environment ATM [15:02] correct [15:03] mornin sinzui [15:18] thanks sinzui [15:18] np, hi cjohnston [15:18] benji do you have an LP dev environment to land cjohnston's branch, above, sometime today? [15:19] gary_poster: I'll get it going. cc/ benji [15:19] thank you much rick_h_ [15:19] thanks rick_h_ [15:26] cjohnston: went ahead and sent it off to ec2 just in case it hits some stray tests. See you in the afternoon. [15:27] thanks [15:29] rick_h_: I spoke with gary_poster more about my issue with lpsetup, I guess it's because I'm on quantal [15:30] that's my hope anyway :-/ was working for us on precise [15:30] cjohnston: ah the lxc issue? I have lxc working on here, but I did things more the hard way to customize the setup so not sure [15:31] cjohnston: oh right, benji started to help me with some issues on quantal and I ended up just manually creating an lxc the way I wanted and doing more manual setup in there [15:31] back in budapest [15:32] Gotcha.. It would be really nice if the barrier to help for the community as low as possible === teknico_ is now known as teknico === yofel_ is now known as yofel [21:47] rick_h_: ping [21:56] cjohnston: pong [21:56] cjohnston: what's up? around just a couple min [21:57] cjohnston: reforcing the build on buildbot, spurious failure [21:57] hopefully that's what you were looking for [21:58] will check back in later tonight [21:58] yup [21:58] so its all good then [21:58] thanks [21:59] cjohnston: yea, ec2 came back clean so just need to smack build-not around a bit === benji changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On-call reviewer: - | Firefighting: - | Critical bugs: <150 [22:00] cool