[01:34] <timrc> Is it possible to get the Ubuntu series from the build record? I don't see an obvious way to do it :(
[01:37] <wgrant> Indeed, apparently the distroarchseries attribute is not exported
[01:41] <timrc> doh
[01:41] <timrc> current_source_publication.distro_series I think could work
[01:44] <StevenK> wgrant: This reached out and slapped me yesterday evening:
[01:44] <StevenK>      def test_displayarchs_for_copy_job_is_sync(self):
[01:44] <StevenK> -        # For copy jobs, displayarchs is "source."
[01:44] <StevenK> +        # For copy jobs, displayarchs is "sync."
[01:46] <StevenK> wgrant: Can triage https://bugs.launchpad.net/launchpad/+bug/1089615 , by the way?
[01:46] <_mup_> Bug #1089615: Source builds fail for packages with "3.0 (quilt)" format and unapplied patches <Launchpad itself:New> < https://launchpad.net/bugs/1089615 >
[01:56] <wgrant> StevenK: https://rt.admin.canonical.com/Ticket/Display.html?id=46345 will hopefully fix that
[02:00] <StevenK> wgrant: How are bzr-builder and 3.0 (quilt) related?
[02:01] <timrc> Hm, is the "Date finished" part of the description for 'date_first_dispatched' a copy-paste error? (see, https://launchpad.net/+apidoc/devel.html#build)
[02:03] <wgrant> StevenK: Because that's an SPRB
[02:03] <wgrant> So it uses bzr-builder
[02:03] <wgrant> timrc: Yes
[02:14] <timrc> wgrant, Thanks... one last question... does the time between 'datecreated' and 'date_first_dispatched' represent the time the build spends waiting to build?
[02:15] <wgrant> timrc: Usually, yes. But for private PPAs it will also include the publication latency, as the build will not be dispatched until its source is published
[02:15] <timrc> or would it be more accurate to use the package_upload.date_created timestamp in conjunction with 'date_first_dispatched'?
[02:15] <timrc> ah, that's a useful bit of info
[02:16] <wgrant> The source package_upload.date_created and build.datecreated should usually be identical for PPA packages
[02:17] <timrc> wgrant, Okay, thanks
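The wait-time calculation discussed above can be sketched as a plain timestamp subtraction. The field names follow the chat (`datecreated`, `date_first_dispatched`); the helper name `queue_wait` and the timestamp values are made up for illustration. For private PPAs the result would also include publication latency, as wgrant notes.

```python
from datetime import datetime, timezone


def queue_wait(datecreated, date_first_dispatched):
    """Return how long a build waited before its first dispatch.

    Returns None if the build has not been dispatched yet.
    """
    if date_first_dispatched is None:
        return None
    return date_first_dispatched - datecreated


# Hypothetical timestamps, not taken from a real build record.
created = datetime(2012, 12, 13, 2, 0, 0, tzinfo=timezone.utc)
dispatched = datetime(2012, 12, 13, 2, 7, 30, tzinfo=timezone.utc)
wait = queue_wait(created, dispatched)
print(wait)
```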
[02:51] <StevenK> wgrant: http://pastebin.ubuntu.com/1428915/
[02:54] <StevenK> wgrant: Have you seen buildbot's waterfall? It looked like it lost its mind
[03:00] <wgrant> StevenK: What was wrong with it?
[03:00] <wgrant> prasé had some disk issues and was offline for a while overnight
[03:01] <StevenK> wgrant: The pastebin? I don't like the line "+        if kind in ([], [u'']):"
[03:02] <wgrant> I was speaking of the waterfall
[03:02] <wgrant> Looking at the diff now
[03:02] <StevenK> Ah
[03:02] <wgrant> Why does that line exist?
[03:03] <wgrant> It should never be called with an empty string
[03:03] <StevenK> Because u''.split(' ') == [u''] :-(
[03:03] <wgrant> Ah, right
[03:03] <StevenK> wgrant: Yes, it was masking a bug
[03:03] <wgrant> kind has to be the worst name ever
[03:04] <StevenK> If I don't split and we call addBuild() twice, then searchable_names becomes 'a c f g h j newname'
[03:05] <StevenK> wgrant: s/kind/existing/ ?
[03:05] <wgrant> Yes
[03:05] <wgrant> So, you would perhaps do well to simply exclude elements that evaluate to False
[03:06] <StevenK> [u''] != False
[03:06] <StevenK> >>> u'' == False
[03:06] <StevenK> False
[03:07] <wgrant> This isn't JavaScript or PHP
[03:07] <wgrant> By "evaluate to" I mean in the context of a boolean expression
[03:07] <wgrant> ie. an if statement or clause of a list comprehension
[03:07] <StevenK> >>> bool(u'')
[03:07] <StevenK> False
[03:07] <wgrant> Yes
[03:08] <StevenK> Hmmm, can't we do that using filter() or map()?
[03:08] <wgrant> If you want, but I'd just say [x for x in existing if x]
[03:09] <wgrant> Mostly because Guido hates lambdas
[03:09] <StevenK> Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.
[03:09] <StevenK> filter(None, ...)
[03:10] <wgrant> Oh, didn't know about that special case
[03:10] <wgrant> How hideous
[03:11] <StevenK> Does that mean I can't use it? :-)
[03:11] <wgrant> You can
[03:12] <StevenK>     def _appendSearchables(self, existing, new):
[03:12] <StevenK>         return sorted(filter(None, set(existing) | set(new)))
[03:13] <StevenK> BUILDBOT
[03:15] <StevenK> wgrant: Or is that function now hideous? :-)
[03:16] <wgrant> That's fine
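The split and truthiness behaviour discussed above can be checked directly. This is a minimal sketch written for Python 3 (the chat uses Python 2 `u''` literals); `append_searchables` is a standalone stand-in for the `_appendSearchables` method quoted above, and the sample names are invented.

```python
def append_searchables(existing, new):
    """Merge two lists of names, dropping falsy entries such as ''."""
    # filter(None, ...) keeps only the elements that are true in a
    # boolean context, so the empty string is removed.
    return sorted(filter(None, set(existing) | set(new)))


# Splitting an empty string yields one empty element, not an empty list.
assert ''.split(' ') == ['']

# An empty string is not equal to False, but it is falsy.
assert ('' == False) is False
assert bool('') is False

existing = ''.split(' ')  # [''] for a row with nothing stored yet
merged = append_searchables(existing, ['newname', 'a'])
print(merged)

# The list-comprehension spelling gives the same result.
same = sorted(x for x in set(existing) | {'newname', 'a'} if x)
```

Either spelling works; the comprehension avoids relying on the `filter(None, ...)` special case.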
[03:28] <nigelb> What's the link for the LXC setup?
[03:29] <nigelb> Ah, nvm. Found it.
[03:30] <StevenK> nigelb: apt-get install lpsetup ; lp-setup install-lxc or so
[03:30] <nigelb> oh
[03:30] <nigelb> that's all?
[03:31] <lifeless> nigelb: not quite
[03:31] <lifeless> nigelb: need to add the lp ppa first I think
[03:31] <nigelb> oh right.
[03:31] <nigelb> but I was in the middle of doing it the old-fashioned way from the LXC page.
[03:32] <wgrant> I use lp-setup on my laptop. It seems to work.
[03:32] <nigelb> should I stop that and do this? or does lp-setup accomplish the same thing?
[03:32] <wgrant> They're pretty much equivalent
[03:32] <wgrant> lp-setup just sets things up slightly unconventionally and in a more automated fashion
[03:32] <wgrant> The instructions on Running/LXC are what I use on my main dev machine, so I know they work fine :)
[03:33] <nigelb> ok :)
[03:37] <StevenK> wgrant: Hmm, I am stupid and thought contains_string for searchable_versions would work :-(
[03:39] <wgrant> Oh
[03:39] <wgrant> You landed a DB patch without working out whether you could actually use it? :)
[03:39] <StevenK> Surely postgres has a function for that
[03:40] <StevenK> And then we SQL() in it
[03:41] <wgrant> 'foover' = ANY(searchable_versions) with a GIN index should work, but testing now
[03:46] <wgrant> Nope, the ANY doesn't get indexed
[03:46] <wgrant> SELECT * FROM packageupload WHERE ARRAY['1.2.3'] <@ searchable_versions;
[03:46] <wgrant> works, though
[03:46] <wgrant> (the index now exists on DF)
[03:59] <StevenK> And no trgm
[04:00] <wgrant> The trgm index has been building for a few minutes
[04:00] <wgrant> Must be nearly done
[04:02] <wgrant> There we are
[04:02] <wgrant> As expected the queries aren't exactly fast, but it's better than what we have now
[04:03]  * wgrant tries a composite
[04:29] <nigelb> The LXC is lucid.
[04:30] <StevenK> Intended behaviour
[04:30] <wgrant> Hopefully only for another month or so, though :)
[04:31] <nigelb> woo.
[04:31] <nigelb> So, I have python-virtualenv and virtualenv wrapper.
[04:31] <nigelb> They aren't in Lucid yet :)
[04:33] <nigelb> Oh, I still need to let launchpad take over my machine. just that it'll be the LXC that it'll take over.
[04:37] <wgrant> Right
[04:40] <wgrant> Hmm
[04:40] <wgrant> I wonder how big the biggest queue is
[04:40] <wgrant> Oh, <200k items
[04:43] <wgrant> So it may not be worth indexing either of the search columns at all
[04:44] <wgrant> Although the table won't be hot
[04:48] <StevenK> wgrant: Oh?
[05:00] <wgrant> Oh
[05:00] <wgrant> The tsearch2 prefix matching stuff from years ago actually got merged
[05:00]  * wgrant experiments
[05:03] <wgrant> So we may be able to ts2 this nicely rather than having to hack up a custom GIN operator
[05:03] <wgrant> (this is a shortcut that will let us quickly do prefix matching on each name, which is probably roughly as useful as the current exact_match behaviour, but much much faster)
[05:05] <StevenK> Neat
[05:08] <wgrant> (GIN indices are basically just btree indices which know how to do stuff like split an array up into multiple keys, so they can obviously internally be used for prefix searching on keys. But utilising that functionality through SQL can be a problem, and you sometimes have to end up defining your own GIN operator classes.)
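As a rough mental model of the point above (not a description of how PostgreSQL actually implements GIN), an index that splits each array into separate sorted keys can answer prefix queries with a binary search followed by a short forward scan. All names and sample rows here are hypothetical.

```python
import bisect


def build_index(rows):
    """Split each row's keys out into a sorted key list plus posting sets.

    rows: dict mapping a row id to the list of keys stored in that row.
    """
    postings = {}
    for row_id, keys in rows.items():
        for key in keys:
            postings.setdefault(key, set()).add(row_id)
    return sorted(postings), postings


def prefix_search(sorted_keys, postings, prefix):
    """Return the ids of rows with at least one key starting with prefix."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = set()
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches |= postings[key]
    return matches


# Hypothetical rows: row id -> names stored in an array column.
rows = {
    1: ['ubiquity', 'ubiquity-slideshow-ubuntu'],
    2: ['linux'],
    3: ['ubuntu-docs'],
}
keys, postings = build_index(rows)
print(prefix_search(keys, postings, 'ubi'))
```

A substring search such as 'quity' gets no help from the sort order, which is why it needs a different index type (trigrams) or a full scan.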
[05:10] <StevenK> wgrant: So, do we want to deploy?
[05:11] <StevenK> It will drop us below 150 total criticals
[05:13] <wgrant> Is the db-stable merge through?
[05:13] <wgrant> Should be by now, I guess
[05:14] <StevenK> Yeah, it is
[05:15] <wgrant>  Total runtime: 55.598 ms
[05:15] <wgrant> But does it actually work...
[05:15] <wgrant> Yes
[05:16] <wgrant> Well that makes things easy
[05:17] <wgrant> As long as cjwatson agrees we can live with just prefix matching on each name, rather than full substring matching
[05:17] <StevenK> wgrant: I have a deployment request ready, if you have no objections?
[05:18] <wgrant> StevenK: doit
[05:18] <wgrant> Oh
[05:18] <wgrant> It was only added in 8.4
[05:19] <wgrant> So I don't have to feel like a complete idiot :)
[05:19] <wgrant> StevenK: SELECT * FROM packageupload WHERE (to_tsvector('simple', searchable_names) @@ to_tsquery('simple', 'ubiquity:*')) AND distroseries = 103 AND archive IN (1, 534) AND status = 3;
[05:20] <wgrant> Is a prefix match on ubiquity
[05:20] <wgrant> Drop the :* for exact match
[05:20] <StevenK> Hmmm
[05:20] <StevenK> Neat
[05:20] <wgrant> (the 'simple' makes lexeme conversion a no-op, so no stemming and such)
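The difference between exact, prefix and substring matching on a space-separated `searchable_names` value can be emulated loosely in Python. This sketch only splits on whitespace; PostgreSQL's text search parser tokenises differently, so treat it as an illustration of the semantics rather than of the query itself. The `matches` helper and the sample value are hypothetical.

```python
def matches(searchable_names, term, mode):
    """Check a space-separated names string against a search term.

    mode is 'exact', 'prefix' or 'substring'.
    """
    names = searchable_names.split()
    if mode == 'exact':
        return term in names
    if mode == 'prefix':
        return any(name.startswith(term) for name in names)
    if mode == 'substring':
        return any(term in name for name in names)
    raise ValueError('unknown mode: %r' % mode)


# Hypothetical searchable_names value for one upload.
names = 'ubiquity ubiquity-frontend-gtk'
print(matches(names, 'ubiquity', 'exact'))
print(matches(names, 'ubiq', 'prefix'))
print(matches(names, 'frontend', 'prefix'))
print(matches(names, 'frontend', 'substring'))
```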
[05:20] <wgrant> Are the columns fully populated on DF?
[05:21] <StevenK> No
[05:21] <wgrant> :(
[05:21] <wgrant> Still, this should be rather fast
[05:21] <StevenK> I think some of the customs probably have NULL versions
[05:21] <wgrant> Ah
[05:21] <wgrant> But the names should be populated?
[05:22] <StevenK> Not fully
[05:22] <wgrant> :(
[05:22] <StevenK> ~ half the table is done
[05:22] <StevenK> I was planning to scorched earth the two of them and start it again
[05:23] <StevenK> And probably hack out the timeout
[05:24] <wgrant> I think a composite index might be more successful here
[05:24] <wgrant> Trying
[05:24] <wgrant> (prefixing the trigram index with (archive, distroseries) was not effective, simply because the index has so many entries)
[05:27] <wgrant> Hm, no, doesn't work
[05:27] <wgrant> How odd
[05:27] <wgrant> Still, fast enough without it :)
[05:28] <StevenK> Using ts2?
[05:30] <wgrant> Yes
[05:31] <wgrant>     "temp_pu5" gin (to_tsvector('simple'::text, searchable_names))
[05:31] <StevenK> And just GIN on searchable_versions?
[05:32] <wgrant> Yeah
[05:32] <wgrant>     "temp_pu" gin (searchable_versions)
[05:33] <wgrant> Its selectivity estimates are even pretty good
[05:34] <wgrant> http://paste.ubuntu.com/1429076/
[05:36] <StevenK> rows=0, though?
[05:36] <wgrant> BitmapAnd nodes lie sometimes
[05:36] <wgrant> Look at the end of the plan
[05:36] <wgrant>  Bitmap Heap Scan on packageupload  (cost=1062.97..1424.58 rows=4 width=135) (actual time=5.810..5.835 rows=7 loops=1)
[05:37] <wgrant> rows=7
[05:37] <StevenK> Yeah
[05:37] <StevenK> That looks pretty awesome
[05:38] <StevenK> exact match drops it to 3.3ms
[05:39] <wgrant> Well, yeah
[05:39] <wgrant> Once you include a version match it's going to be pretty much trivial
[05:39] <wgrant> A long enough search string on a trigram index can take a couple of seconds, but I'd be surprised if this ever exceeded 100ms
[05:42] <StevenK> wgrant: So shall I scorched earth searchable_* and garbo them up?
[05:42] <StevenK> Which I think requires the indices die
[05:43] <wgrant> These indices shouldn't have a huge impact on performance, but it's probably still worth it to nuke them
[05:43] <wgrant> garbo performance, that is
[05:43] <wgrant> They only take about 5 minutes to regen afterwards, so feel free to delete temp_*
[05:43] <StevenK> wgrant: I was going to DROP COLUMN/ADD COLUMN because it's MUCH faster
[05:44] <wgrant> Ah, right
[05:44] <wgrant> k
[05:47] <wgrant> Hmm
[05:47] <wgrant> We may actually want to skip to_tsquery and to_tsvector at this point
[05:47] <wgrant> And instead just cast directly to tsquery and tsvector
[05:47] <wgrant> Since we don't eg. want to separate lexemes by -
[05:48] <StevenK> 2012-12-13 05:48:20 DEBUG2  [PopulatePackageUploadSearchables] Iteration 4 (size 26.8): 0.658 seconds
[05:49] <wgrant> Hopefully it'll accelerate...
[05:49] <StevenK> It will, it just takes a while
[05:49] <wgrant> Just be glad we're no longer doing that inline in searches :)
[05:49] <StevenK> Haha
[06:01] <wgrant> StevenK: Hm, it seems to be pretty slow
[06:01] <wgrant> Only 15k done
[06:01] <StevenK> Yeah
[06:02] <StevenK> If last run is anything to go by, the first 800 iterations are very slow
[06:02] <StevenK> Then it speeds up massively
[06:02] <wgrant> O_o
[06:04] <StevenK> 2012-12-13 06:04:04 DEBUG2  [PopulatePackageUploadSearchables] Iteration 264 (size 144.6): 4.541 seconds
[06:08] <wgrant> Hm
[06:08] <wgrant> Still surprisingly awful
[06:08] <wgrant> But I guess it'll get there eventually.
[06:08] <wgrant> Only 4.6 million to go
[06:17] <wgrant> StevenK: Oh
[06:17] <StevenK> Oh?
[06:18] <wgrant> Should be faster now
[06:18] <wgrant> By a factor of several
[06:18] <wgrant> Can you confirm?
[06:18] <StevenK> Indeed, it just sped up
[06:18] <StevenK> 2012-12-13 06:18:37 DEBUG2  [PopulatePackageUploadSearchables] Iteration 523 (size 1284.2): 4.860 seconds
[06:18] <wgrant> Lovely
[06:18] <wgrant> That's more like it
[06:18] <StevenK> And you did what?
[06:18] <wgrant> Magic
[06:19] <wgrant> aka. ANALYZE
[06:19] <StevenK> ANALYZE packageupload ?
[06:19] <wgrant> The stats for the nullness of searchable_* were bad
[06:19] <wgrant> Yes
[06:19] <StevenK> Oh
[06:19] <StevenK> Right
[06:19] <wgrant> So it was doing a seqscan to find them
[06:19] <StevenK> Because I've been bad
[06:19] <wgrant> Rather than an index scan on id
[06:20] <wgrant> Right, that's a little bit faster
[06:20] <StevenK> Right. Happily, that will affect us not at all on prod.
[06:20] <wgrant> Um
[06:20] <wgrant> It might
[06:20] <wgrant> But a vacuum would probably have fixed it in an hour or so anyway
[06:21] <StevenK> If we deploy 40-1 tonight, the earliest the garbo job will hit prod is tomorrow morning
[06:21] <wgrant> Sure
[06:24]  * StevenK waits for staging to finish building
[09:03] <cjwatson> wgrant: hmm.  I've definitely used full non-prefix substring matching in the not too distant past.  I'm not certain whether all of those were purely workarounds for lack of SPN.
[09:05] <wgrant> I can't think of many circumstances in which it would actually be useful, assuming all the names are searchable
[09:11] <cjwatson> Mostly surprising because all other non-exact matches in LP (AFAIK) are substring, not prefix
[09:14] <wgrant> There aren't very many non-exact matches in LP
[09:14] <wgrant> (and most of those that do exist should not :))
[09:14] <wgrant> There's no real use-case for getPublishedSources(exact_match=False), for example
[09:15] <wgrant> Full substring matches are extremely expensive
[09:17] <cjwatson> So, we'd have to document it, and it's a tad surprising, but we can live without full substring matches given the other improvements
[09:19] <wgrant> Mmm
[09:19] <wgrant> It's not really surprising
[09:19] <wgrant> Only because people didn't think at all before introducing substring matching in the first place
[09:20] <cjwatson> It's potentially surprising to people who've been using it
[09:20] <cjwatson> Which I don't think is often, but now and again
[09:20] <wgrant> Right
[09:20] <cjwatson> Mostly in "oh, I need to reject these three packages and they happen to be the only ones containing 'splotnib'" kinds of ways, probably
[09:21] <wgrant> Hopefully everyone will now ask themselves "is this in any way not stupid" when designing their next webservice API :)
[09:22] <wgrant> 'cause in any database context, substring matching is pretty much entirely impractical and not all that useful :(
[09:28] <wgrant> I dream that one day the argument can be removed from getPublishedSources and getPublishedBinaries, perhaps letting those API methods actually be possibly sustainable.
[11:15] <jtv> jam: got a moment for a maas-hardware-db question?
[11:16] <jtv> Want to know how you pull the lshw output out of the maas database.  Is it all in the maas codebase, or do you have something else somewhere?
[11:17] <jtv> Oh, is it the special case for "01-lshw" that's in the metadataserver api.py?
[11:17] <jtv> allenap: think I found the answer to your question ^  :)
[11:18] <jtv> It's in the diff.
[11:30] <allenap> jtv: Ta!
[11:33] <mgz> jtv: it's in maas, basically there's an api for getting the lshw output, which exists as the db is on the region controller but we want the cluster controllers to do the work
[11:35] <jtv> mgz: I guess it's the bit in the metadata API then, where the node calls "signal" and the API has a special-case check for "is the name of the file 01-lshw.out?"
[11:36] <mgz> that's for putting it *into* the db
[11:59] <jtv> We may be talking about different "the" dbs.
[11:59]  * jtv resists the temptation to get pedantic about the definition of Database
[12:00] <jtv> (Would have been a cheap dig at mongo)
[14:02] <timrc> sinzui, Hi, out of curiosity, do proprietary and embargoed projects hide membership details now as well?
[14:04] <sinzui> timrc, teams are not proprietary/embargoed
[14:05] <sinzui> timrc, if you mean Private teams, no one can see them by default. If they are placed in a public relationship like a bug subscription, only the Launchpad id, name, and icons are revealed
[14:06] <sinzui> so members are never shown to non-private-team-members
[14:09] <timrc> sinzui, ok
[14:15] <timrc> sinzui, so if a project is set to proprietary or embargoed and the client attempting to access it is not authenticated or authorized will it 404?
[14:15] <sinzui> It will 404...it does not exist to non-shared users
[14:19] <timrc> sinzui, good to know, that's what I'd expect
[16:50] <sinzui> oh sweet, Lp will call focus() on <input type="hidden"/>
[17:23] <jml> sinzui: :D
[19:39] <timrc> It's known (or maybe it isn't) that our use of PPAs is a little unorthodox.  Specifically we really only use them to build packages.  When the package is built and published, it gets funneled into another repository which in turn is used to build Ubuntu images.
[19:40] <timrc> One architectural problem with this design is that a user has the ability to upload or copy-rebuild the same package in multiple PPAs.  If they do this, it will cause a package collision "downstream", as the archive they're going to has a shared pool, and so only one copy of the package can exist there.
[19:41] <timrc> What would be nice, for us at least, is if we could disable copy-rebuild.  That does not completely solve the problem, but it solves the most common cause for us.
[19:41] <timrc> Does anyone have outright objections to doing this?
[21:59] <lifeless> sinzui: radical honesty in bugs? zomg :)
[22:00] <sinzui> yes. the number is true
[22:00] <lifeless> sinzui: oh, I know. I'm just waiting for the slashdot now :)
[22:01] <sinzui> I doubt the 90 days remaining to get the list to zero though. Maybe I am too pessimistic. I fixed 7 last week, so there are still 1-day bug fixes
[22:12] <timrc> sinzui, So, we have Bug sharing policy set to "Proprietary, can be public" for a project and they want to use the 'also affects another project' option but can't... this seems odd if the bugs are configured to share the same way for both projects... what am I missing?
[22:13] <timrc> hm actually the other project has the Bug sharing policy as "Proprietary" not "Proprietary, can be public"
[22:15] <sinzui> timrc, I think you found the problem
[22:20] <timrc> sinzui, thanks and sorry for disturbing ya :)
[22:21] <maxb> timrc: Re your PPA thing above.... surely this comes down to ensuring the users understand the tools?
[22:21] <timrc> maxb, you'd think
[22:22] <timrc> maxb, it's been an ongoing problem for us for the last 2 years... it turns out people don't like to read documentation and / or periodic PSAs on our lists are not enough
[22:23] <timrc> it is sort of a weird thing to consider when using a PPA, so I don't exactly blame people for doing it... also the default is to rebuild, so the opportunity for human error is greater
[22:25] <maxb> Copying packages between PPAs isn't usually something novice packagers do - what's different about your workflow that makes it more likely that people do it without the experience to understand the consequences?
[22:25] <maxb> (Though I do agree that the default is probably the wrong way around)
[22:41] <sinzui> wgrant, StevenK, Do you want to deal with this: https://answers.launchpad.net/launchpad/+question/216665
[23:04] <StevenK> wgrant: SELECT regexp_matches(json_data, 'package_version": "([^"]+)') FROM packagecopyjob;
[23:13] <StevenK> SELECT regexp_matches(json_data, 'package_version": "([^"]+)"') FROM packagecopyjob;
[23:13] <StevenK> Now we can scare wallyworld_
[23:13] <StevenK> SELECT regexp_matches(json_data, '"package_version": "([^"]+)"') FROM packagecopyjob;
[23:13] <wallyworld_> i'm not scared yet
[23:14] <StevenK> wallyworld_: regex matches on a JSON field in a SQL statement? :-)
[23:14] <wallyworld_> is regexp matching new?
[23:14] <StevenK> Nope
[23:14] <StevenK> Was around in 8.3, at least
[23:14] <wallyworld_> does lp use it much?
[23:15] <StevenK> Not at all, I think.
[23:15] <wallyworld_> but it does now :-)
[23:15] <StevenK> For a short time. That is going into a garbo job.
[23:24] <wgrant> SELECT (regexp_matches(json_data, '"package_version": "([^"]+)"')::debversion[])[1] FROM packagecopyjob;
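The same extraction can be tried outside the database with Python's `re` module. This is a sketch under assumptions: the pattern is copied from the chat, but the sample JSON documents and the `extract_version` helper are invented, and the cast to `debversion` has no equivalent here.

```python
import json
import re

# Same pattern as the final query above, including the leading quote.
PACKAGE_VERSION = re.compile(r'"package_version": "([^"]+)"')
# The earlier attempt without the leading quote, for comparison.
LOOSE_VERSION = re.compile(r'package_version": "([^"]+)"')


def extract_version(json_data, pattern=PACKAGE_VERSION):
    """Return the first captured version string, or None if absent."""
    match = pattern.search(json_data)
    return match.group(1) if match else None


# Hypothetical job metadata; json.dumps emits '"key": "value"' spacing.
sample = json.dumps({'package_name': 'ubiquity', 'package_version': '2.1.0'})
print(extract_version(sample))

# A made-up document where another key merely ends in package_version.
tricky = '{"old_package_version": "1.0", "package_version": "2.1.0"}'
print(extract_version(tricky))
print(extract_version(tricky, LOOSE_VERSION))
```

With the made-up `tricky` document, the leading quote is what stops the pattern from latching onto a longer key that merely ends in `package_version`.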
[23:37] <wgrant> StevenK: Indices done
[23:37] <wgrant> EXPLAIN ANALYZE SELECT * FROM packageupload WHERE (searchable_names::tsvector @@ 'ubiquity:*') AND distroseries = 103 AND archive = 1 AND status = 3 AND searchable_versions @> ARRAY['2.1.0'];