[00:00] <StevenK> Ah
[00:00] <StevenK> So the two tests are terrible?
[00:00] <StevenK> Effectively, they are getUtility(IPackageUploadSet).getAll(distroseries, name=u"'") with and without exact_match=True
[00:00] <wgrant> Huh?
[00:01] <wgrant> No
[00:01] <wgrant> The tests have revealed a bug in your code
[00:01] <wgrant> If API input can cause a crash, then our code is buggy
[00:02] <StevenK> That's not API input, that's calling the method directly
[00:02] <wgrant> Sure
[00:02] <wgrant> But I can pass in a name of "'" through the API
[00:21] <StevenK> wgrant: But name = name.replace("'", "").replace('"', '') probably makes you hate me ...
[00:22] <wgrant> That's very PHPesque of you
[00:22] <wgrant> You should escape the string according to the various syntaxes that are involved
[00:23] <wgrant> Not just blindly remove characters until it works :)
[00:24] <StevenK> wgrant: IE, remove everything that isn't alphanumeric?
[00:25] <wgrant> That's precisely the opposite of what I suggested
[00:25] <StevenK> wgrant: Hold on
[00:25] <StevenK> Escaping in the ' or " case doesn't work
[00:25] <wgrant> Why not?
[00:26] <StevenK> Because as you said earlier, it isn't a valid tsquery syntax
[00:26] <wgrant> Confused...
[00:26] <wgrant> A plain ' is not valid tsquery syntax
[00:26] <wgrant> Because ' is special in tsquery syntax
[00:27] <wgrant> Like in SQL
[00:27] <wgrant> How do we resolve this in SQL?
[00:27] <wgrant> (and Python, and JavaScript, and...)
[00:28]  * StevenK is clearly missing something
[00:30] <wgrant> ' delimits a quoted string in Python, SQL, and a tsvector
[00:31] <wgrant> How do we normal handle ', and other special characters, in a Python or SQL string?
[00:31] <StevenK> By escaping them
[00:31] <wgrant> Right
[00:31] <wgrant> So how might we handle a ' in a tsvector string?
[00:31] <StevenK> By escaping it
[00:31] <wgrant> :)
[00:32] <StevenK> '''' isn't already escaped, or it needs to be escaped twice?
[00:32] <wgrant> How's that escaped?
[00:32] <wgrant> '''' is the SQL equivalent of Python's "'"
[00:32] <StevenK> Ah
[00:33] <wgrant> It's escaped in SQL terms
[01:01] <StevenK> wgrant: So, I have two MPs -- I'm happy to self-review the drop-garbo one, it just deletes 182 lines of code. What about the index MP?
[01:03] <wgrant> That should only be landed once the query branch is ready
[01:03] <wgrant> In case it requires changes
[01:03] <StevenK> Mmmmm
[01:04] <StevenK> I should drop my hacks on DF, add two rows to LDR and update it
[01:04] <wgrant> (none of this sort of thing should ever land until the whole thing is confirmed to work, really)
[01:04] <StevenK> We had confirmed it was working on Friday
[01:05] <wgrant> We hadn't ever done a full test
[01:06]  * StevenK executes on his plan
[01:08] <StevenK> wgrant: The lib/lp/registry/model/person.py cowboy on DF is yours?
[01:08] <StevenK> And lib/lp/soyuz/model/binarypackagebuild.py, I guess
[01:11] <wgrant> Indeed
[01:11] <wgrant> Well, person.py is me, binarypackagebuild.py is cjwatson
[01:12] <StevenK> wgrant: Do you want your change preserved?
[01:12] <wgrant> No
[01:52] <StevenK> wgrant: No objections to landing the death of garbo branch?
[01:52] <wgrant> StevenK: As long as it's done on all four DBs
[01:52] <StevenK> Yeah, all five -- dev, dogfood, qas, staging and prod
[02:52]  * StevenK stabs dogfood
[02:53] <StevenK> Damn it, load
[02:58] <StevenK> wgrant: DF's front page is unhappy, but +queue looks snappy
[02:59] <StevenK> Hmmm, I think JS is still broken
[03:00] <wgrant> Yeah
[03:00] <wgrant> No convoy, remember
[03:00] <StevenK> Then remove the flag?
[03:00] <wgrant> Won't help
[03:00] <wgrant> 3.5 won't work without convoy
[03:01] <StevenK> Bleh
[03:01] <wgrant> Though we still haven't removed 3.3, don't rely on that for much longer :)
[03:01] <wgrant> Why do you want JS?
[03:01] <StevenK> I can't see render times
[03:01] <StevenK> And I'd like to
[03:01] <wgrant> Ah
[03:01] <wgrant> view source :)
[03:01] <StevenK> At least 255 queries/external actions issued in 9.53 seconds
[03:02] <wgrant> That doesn't sound indexed
[03:02] <wgrant> What's the query?
[03:02] <StevenK> However, that's 'xorg' for Quantal done
[03:02] <wgrant> (or it could just be a really really cold cache)
[03:02] <StevenK> Let me resubmit
[03:02] <StevenK> At least 255 queries/external actions issued in 3.97 seconds
[03:03] <wgrant> Right
[03:03] <wgrant> Can you track down the main search query and see how long it took?
[03:03] <StevenK> Cause an OOPS?
[03:03] <wgrant> Or look at the query log that's inline in the page
[03:04] <wgrant> (you'll probably need to poke with firebug, or view source)
[03:05] <StevenK> SELECT DISTINCT PackageUpload.* FROM PackageUpload WHERE PackageUpload.distroseries = 110 AND PackageUpload.status IN (3) AND PackageUpload.archive IN (1, 534) AND ((searchable_names::tsvector @@ 'xorg:*')) ORDER BY PackageUpload.id DESC LIMIT 31 OFFSET 0 == 54 ms
[03:07] <StevenK> Let me cause an OOPS, I think we're missing some preloading
[03:08] <StevenK> OOPS-48338103ccc22a3accd4c8c2476a6107
[03:10] <StevenK> Hmmm, and why isn't that OOPS ID on neem? :-(
[03:10] <lifeless> you did a grep, not a find, right ?
[03:10] <wgrant> StevenK: DF doesn't transmit to neem
[03:11] <wgrant> /srv/launchpad.net/dogfood-logs
[03:11] <lifeless> The OOPS id isn't used for the file name, because that would lead to trusting untrusted data.
[03:11] <StevenK> lifeless: No, I did a load up the website and put the id in the field
[03:12] <StevenK> I was hoping for it to be formatted nicely, not in a file on DF :-(
[03:17] <wgrant> lifeless: How is trusting that data problematic, as long as you verify it's safe for use as a filename?
[03:17] <wgrant> If someone wants to collide, they already can
[03:17] <wgrant> It searches by ID, not filename
[03:18]  * StevenK finishes writing a novel for CA
[03:20]  * StevenK glares at this OOPS on mawson
[03:23] <StevenK> Holy crap it's slow the first time
[03:23] <StevenK> boost is still pathetic at 8.4 seconds
[03:28] <wgrant> But which part is slow?
[03:28] <wgrant> If boost is slow, it probably just means that the loading of the binaries is slow
[03:28] <wgrant> Not that the search is
[03:30] <StevenK> Mmmm
[03:31] <StevenK> wgrant: http://wedontsleep.org/~steven/f.html is the query log
[03:31] <lifeless> wgrant: mmm true
[03:31] <StevenK> There is no one query that is terribly slow
[03:31] <wgrant> C IS FOR COOKIE
[03:31] <wgrant> Bad StevenK
[03:32] <StevenK> wgrant: So hurry up and load it so I can rm it
[03:33] <wgrant> I've loaded it and erased your cookie
[03:35] <wgrant> We can see from the log that the search query is below 50ms
[03:35] <wgrant> So for the purposes of this change it is fine
[03:35] <wgrant> What's not fine is... the entire rest of the page
[03:37] <StevenK> Indeed
[03:37] <wgrant> But we already knew that :)
[03:40] <StevenK> wgrant: So looks like I'll be sticking to DistroSeries:+queue after this work lands ...
[03:40] <StevenK> wgrant: Anyway, index MP?
[03:40] <wgrant> Several decades ago I had a branch which mostly fixed DistroSeries:+queue
[03:41] <wgrant> I'm not sure if it's still around, and it's probably changed enough now that it's not useful
[03:41] <wgrant> I didn't propose it at the time because cachedproperty on model objects was banned
[03:42] <StevenK> I think the first step is loading everything related in one query, rather than this looping garbage it's doing
[03:42] <wgrant> Well
[03:42] <wgrant> You need to preload things
[03:42] <wgrant> It's pretty simple
[03:43] <wgrant> There's just a number of different sorts of things you'll need to preload
[03:49] <StevenK> wgrant: Are you satisified about the working-ness of this change?
[03:49] <lifeless> lifeless's contribution to lp: unbanning cachedproperty.
[03:50] <wgrant> lifeless: Yep
[03:50] <wgrant> StevenK: Well, your app code is wrong, but otherwise it seems to be good
[03:51] <StevenK> wgrant: What about it?
[03:53] <wgrant> StevenK: It'll probably still go horribly wrong if I include a : in my search term, for example
[03:55] <wgrant> The precise parsing rules don't seem to be very well documented
[03:56] <StevenK> wgrant: I need to escape : in the same way?
[03:56] <StevenK> Parsing rules for tsquery seem to not be documented at all
[03:58] <wgrant> http://www.postgresql.org/docs/9.1/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES goes some way
[03:59] <wgrant> Ah, http://www.postgresql.org/docs/9.1/static/datatype-textsearch.html is what I was looking for
[04:00] <StevenK> So, ' ", : * look special
[04:01] <StevenK> Without that errant comma
[04:02] <wgrant> Well
[04:02] <stub> abel was the last person to look hard at this
[04:02] <wgrant> I'd just put the whole thing in single quotes, backslashing any ' or \
[04:02] <wgrant> Given we want the whole thing treated as a single token
[04:03] <StevenK> wgrant: Which doesn't handle your : case
[04:03] <wgrant> Doesn't it?
[04:03] <wgrant> : isn't special when inside a quoted string
[04:03] <StevenK> Ah
[04:03] <wgrant> (and " and * aren't ever special)
[04:04] <StevenK> Does that mean it turns into @@ 'foo':* for substring?
[04:04] <wgrant> Right
[04:04] <wgrant> Er
[04:04] <wgrant> No
[04:04] <wgrant> @@ '''foo'':*'
[04:05] <wgrant> Or @@ E'\'foo\':*' if you're using that syntax
[04:05] <wgrant> Or @@ E'\'foo\\\'\\\\\':*' if you want to search for foo'\ :)
[04:06] <stub> raise StupidUserSearchError(wtf)
[04:06] <StevenK> Haha
[04:07] <stub> What is all this for btw? Someone working on the antique database text search?
[04:08] <StevenK> stub: Searching on DistroSeries:+queue
[04:08] <StevenK> Using tsvector because we like fast queries
[04:08] <StevenK> And not an eight table join and four likes
[04:09] <StevenK> Oh god, doesn'
[04:09] <StevenK> Bleh
[04:09] <StevenK> Oh god, doesn't this turn into .replace(r'\\', r'\\\\')
[04:13] <stub> In that case, why have you got odd characters you need to match? For word search, you want to normalize the data in the index.
[04:13] <stub> Oh... this is for the search
[04:18] <stub> I don't know what the data looks like you are searching. But if odd characters are coming in, you have bigger problems with fuzzy matching like tsvector/tsquery such as case sensitivity, stop words, stemming
[04:19] <wgrant> stub: Well
[04:19] <wgrant> stub: stemming only happens if you use to_tsvector
[04:20] <wgrant> So it's not a problem here
[04:20] <wgrant> We just skip stemming and stopwords etc
[04:20] <wgrant> Creating the tsvector and tsquery directly with exactly the content we desire
[04:21] <wgrant> (I only opted to use ts2 because otherwise we'd need a custom GIN operator to do prefix search on an array of strings)
[04:21] <StevenK> God damn I'm confused
[04:21] <stub> lovely that this is all documented now. i might be able to understand that old code now.
[04:21] <wgrant> StevenK: What's confusing?
[04:22] <StevenK> If I paste this, I may get murdered
[04:22] <wgrant> Let me sharpen my knives
[04:22] <StevenK> :-(
[04:22] <StevenK> Waiting for an error, anyway
[04:23] <StevenK>             name = name.replace("'", "\\'").replace(r'\\', r'\\\\')
[04:23] <StevenK>             name = "''%s''" % name
[04:23] <StevenK> And then we enclose it in '' in the SQL condition
[04:23] <wgrant> Ew
[04:23] <StevenK> ProgrammingError: syntax error in tsquery: "''':*"
[04:23] <StevenK> LINE 1: ...chive IN (16) AND ((searchable_names::tsvector @@ '''\''':*'...
[04:23] <wgrant> One level at a time pls :)
[04:24] <wgrant> Create the string as you desire ::tsquery to see it
[04:24] <wgrant> Then wrap that in sqlvalues() or similar, to the do the SQL escaping
[04:25] <wgrant> (also, you need to escape the backslashes first, or you'll be escaping escape characters
[04:26] <StevenK> Hmm, that may not be helping
[04:27] <StevenK> wgrant: The index and NOT NULL MPs are up
[04:27] <StevenK> The branch I'm working on depends on the indexes, but the NOT NULL is optional
[04:28] <StevenK> I think sqlvalues isn't helping me
[04:31] <StevenK> Search string: ''unique-from-factory-py-line3348-101271''
[04:31] <StevenK> sqlvalues escaped: '''''unique-from-factory-py-line3348-101271'''''
[04:32] <wgrant> Oh right
[04:32] <wgrant> sqlvalues still uses the horrid old non-standard escaping syntax
[04:33] <StevenK> I'm tempted to ignore it
[04:33] <StevenK> And just build the string by hand
[04:33] <wgrant> Why?
[04:34] <StevenK> Mainly because it's annoying me
[04:34] <StevenK> However, I win
[04:34] <wgrant> What is your code?
[04:35] <StevenK> wgrant: http://pastebin.ubuntu.com/1444536/
[04:35] <wgrant> Line 3 is odd
[04:36] <wgrant> Are you sure you really want a raw string?
[04:36] <wgrant> that'll replace \\ with \\\\
[04:36] <wgrant> Not \ with \\
[04:36] <StevenK> Ah
[04:36] <StevenK> Now '\\', '\\\\'
[04:36] <StevenK> I seem to not be stabbed, so that's a good thing ...
[04:36] <wgrant> In a raw string, a backslash causes the following character to be interpreted literally, but the escaping backslash is retained in the final string
[04:37] <wgrant> Well
[04:37] <wgrant> You're still using sqlvalues
[04:37] <wgrant> Which you should stop doing
[04:37] <wgrant> But it's better than manually doing the SQL escaping
[04:37] <StevenK> Oh?
[04:37] <wgrant> SQL("whatever @@ ?", params=('blah',))
[04:38] <StevenK> Then search_string turns into "''%s''" ?
[04:39] <wgrant> Confused
[04:39] <wgrant> Why?
[04:39] <StevenK> Because you had ''' in your earlier example
[04:39] <wgrant> params handles escaping for you
[04:39] <wgrant> That's why it exists
[04:40] <wgrant> You give it a normal string, it will inject it into the SQL safely
[04:43] <StevenK> wgrant: sqlvalues gutted from name and version searching
[04:43] <wgrant> Lovely
[04:43] <wgrant> sqlvalues is thoroughly deprecated; you should avoid it except in legacy code
[04:46] <StevenK> wgrant: http://pastebin.ubuntu.com/1444550/
[04:52] <wgrant> StevenK: searchable_versions can be done with a native Storm expression
[04:52] <wgrant> There are Array and ArrayContains (though the latter is @> rather than <@, I'm sure you can flip your args :)) bits in lp.services.database.stormexpr
[04:53] <wgrant> And the parens in the searchable_names SQL are likely redundant
[04:53] <wgrant> And "if name is not None and name != ''" sounds suspiciously close to "if name"
[04:54] <StevenK> Old code
[04:55] <wgrant> I'm an equal opportunity complainer
[05:01] <StevenK> wgrant: http://pastebin.ubuntu.com/1444568/
[05:02] <wgrant> Indentation!
[05:02] <wgrant> That looks like you're appending an ArrayContains and an Array
[05:03] <StevenK> Fixed
[05:03] <StevenK> Any other complaints?
[05:03]  * StevenK waits for the flood
[05:07] <wgrant> Sadly not
[05:08] <StevenK> wgrant: Should I put up an MP for this branch, then? Then you can happily nitpick more sections of the code.
[05:09] <wgrant> yep
[05:20] <StevenK> wgrant: That MP is now up
[05:28] <StevenK> wgrant: Thanks. Shall I wait for stub about the index branch, or which?
[05:33] <wgrant> Looks good, but let me just test something
[05:41] <StevenK> wgrant: Ah, seeing if a composite helps?
[05:42] <wgrant> Yeah
[05:42] <wgrant> It still doesn't
[05:42] <StevenK> :-(
[05:42] <StevenK> It's still fast enough
[05:42] <wgrant> Sure
[05:43] <StevenK> And beats the pants off the 15 seconds it was :-)
[05:43] <wgrant> (also, it's "indices", not "indicies")
[05:44] <StevenK> Right, index branch landing
[05:45] <StevenK> When stub reappears and has applied them both to qas and prod, I'll land the use branch
[05:45] <StevenK> And when he approves the NOT NULL branch, I'll land that.
[05:49] <wgrant> Right
[05:50] <wgrant> Hmm
[05:52] <StevenK> Hah
[05:52] <StevenK> DRS is already used for DistroSeries:+queue
[05:52] <wgrant> Yes
[05:52] <wgrant> IIRC it only preloads PU*
[05:52] <wgrant> Not everything else
[05:53] <StevenK> BPF, SPRF, SPR
[05:54] <StevenK> It may be trying to add in BPB
[05:54] <StevenK> But it at least needs BPB and BP
[05:54] <StevenK> *PB
[09:27] <StevenK> stub: O HAI. Can you have a look at 40-2 in devel and apply the indicies to prod and qas?
[09:30]  * stub looks
[09:31] <StevenK> stub: https://code.launchpad.net/~stevenk/launchpad/index-searchables-for-pu/+merge/140120 is the MP
[09:32] <stub> StevenK: Backups still running, so it needs to wait a bit longer
[09:32] <stub> I'll do staging and qas
[09:42] <stub> StevenK: It is on qas, staging already had it from nightly updates
[09:44] <cjwatson> StevenK: you can drop my binarypackagebuild.py cowboy if you like
[10:02] <StevenK> cjwatson: It was two lines, and didn't hurt to restore.
[10:02] <StevenK> I certainly committed more evils to DF
[10:03] <StevenK> stub: Surprised the backups are still going
[10:03] <stub> yeah
[10:08] <stub> 9 hours so far, only got as far as translationmessage
[10:08] <stub> less than 30 mins into that one
[10:09] <stub> oh, only one minute into translationmessage
[10:10] <StevenK> I thought they only took six?
[10:12] <wgrant> wildcherry's backup would normally have finished about 20 minutes ago
[10:12] <stub> yeah, suspect high io. Running at just over 10% iowait atm.
[10:12] <wgrant> It's possible that allocate-revision-karma is causing excessive load as it catches up with 15 million revisions of backlog
[10:12] <stub> revisionkarma catchup
[10:12] <stub> yer
[10:12] <wgrant> It's about half done
[10:13] <stub> and our io utilization has been creeping up anyway as things get optimized and we become less cpu bound.
[10:13] <wgrant> Yeah
[10:14] <wgrant> I've been noticing more and more IO issues in explain analyzes lately.
[10:14] <wgrant> Blocks just don't read as quickly as they used ot
[10:18] <stub> wgrant: production or dogfood?
[10:18] <wgrant> Prod
[10:19] <wgrant> dogfood's always been dreadful :)
[10:19] <wgrant> Reading in cold blocks on prod is just not as fast as it probably should be
[10:19] <stub> I'll blame contention on production, but we have a few knobs to twiddle if it looks like it needs it
[10:20] <stub> At the moment, we are tuned so pg thinks random access is pretty darn fast due to almost always having our data in disk cache. That might be a lie now.
[10:20] <wgrant> Yeah
[10:20] <wgrant> With the DB this big...
[10:21] <stub> its always been big, it is the size of the hot set of data that is important.
[10:21] <wgrant> It wasn't all that long ago that the DB was 250GB
[10:21] <wgrant> Now it's more than 350GB
[10:21] <wgrant> We've gone from 2x RAM to 3x RAM
[10:21] <stub> yeah, but ram is 128GB
[11:01] <StevenK> Hmm, that's a bit horrible
[11:02]  * StevenK glares at buildbot
[11:02] <StevenK> Anyway, this is wgrant's fault.
[11:02] <StevenK> He told me to Storm-ify it.
[11:07] <nigelb> StevenK: When in doubt, blame wgrant?
[11:08] <StevenK> It works with other tests
[11:08] <StevenK> So maybe the doctest is just bong
[11:11] <wgrant> StevenK: The doctest is probably calling it with a str, not a unicode
[11:12] <StevenK> Yes
[11:12] <StevenK> wgrant: The previous call has unicode, but the one that errors doesn't.
[11:13] <wgrant> Right
[11:13] <wgrant> Easy fix
[11:13] <StevenK> Yeah, landing it in a few
[15:00] <gary_poster> sinzui, hey.  Am I right that https://code.launchpad.net/~chrisjohnston/launchpad/fdl/+merge/140216 will not need a db change, because 330 is all we have in the DB?  On a related note, can you or benji get that landed soon?  I don't have an LP dev environment ATM
[15:02] <sinzui> correct
[15:03] <cjohnston> mornin sinzui
[15:18] <gary_poster> thanks sinzui
[15:18] <sinzui> np, hi cjohnston
[15:18] <gary_poster> benji do you have an LP dev environment to land cjohnston's branch, above, sometime today?
[15:19] <rick_h_> gary_poster: I'll get it going. cc/ benji
[15:19] <gary_poster> thank you much rick_h_
[15:19] <cjohnston> thanks rick_h_
[15:26] <rick_h_> cjohnston: went ahead and sent it off to ec2 just in case it hits some stray tests. See you in the afternoon.
[15:27] <cjohnston> thanks
[15:29] <cjohnston> rick_h_: I spoke with gary_poster more about my issue with lpsetup, I guess it's because I'm on quantal
[15:30] <gary_poster> that's my hope anyway :-/ was working for us on precise
[15:30] <rick_h_> cjohnston: ah the lxc issue? I have lxc working on here, but I did things more the hard way to customize the setup so not sure
[15:31] <rick_h_> cjohnston: oh right, benji started to help me with some issues on quantal and I ended up just manually creating an lxc the way I wanted and doing more manual setup in there
[15:31] <rick_h_> back in budapest
[15:32] <cjohnston> Gotcha.. It would be really nice if the barrier to help for the community as low as possible
[21:47] <cjohnston> rick_h_: ping
[21:56] <rick_h_> cjohnston: pong
[21:56] <rick_h_> cjohnston: what's up? around just a couple min
[21:57] <rick_h_> cjohnston: reforcing the build on buildbot, spurious failure
[21:57] <rick_h_> hopefully that's what you were looking for
[21:58] <rick_h_> will check back in later tonight
[21:58] <cjohnston> yup
[21:58] <cjohnston> so its all good then
[21:58] <cjohnston> thanks
[21:59] <rick_h_> cjohnston: yea, ec2 came back clean so just need to smack build-not around a bit
[22:00] <cjohnston> cool