[00:41] <lifeless> I wonder if my with patch will be past buildbot in time for the deploy
[00:41]  * lifeless has a cunning plan to add as many things as possible
[00:41] <wgrant> lifeless: Will we have an mthaddon in time?
[00:42] <lifeless> also a good question
[00:42] <wgrant> I was assuming not, otherwise I would have been landing stuff more aggressively today.
[00:42] <lifeless> staging takes 20 minutes
[00:42] <lifeless> if he shows up at ~8 as normal
[00:42] <lifeless> then yes
[00:42] <lifeless> otherwise no
[00:43] <wgrant> Hmm, the docs say the tree should be prepped 2.5 hours before :)
[00:43] <lifeless> yes
[00:43] <lifeless> and we have such a tree
[00:43] <wgrant> But I guess you could try to get another pushed out, and just use the old one if it's not there in time.
[00:43] <lifeless> as long as we don't bork it we're fine
[00:43] <lifeless> the 2.5 hours - if you are reading the losa docs - is time for 2 staging attempts + discussion
[00:43] <wgrant> Oh.
[00:44] <wgrant> staging as in staging, not staging, right. I was wondering why you were talking about staging, and how you'd managed to get it to update in 20 minutes.
[00:46]  * wgrant lunches.
[00:46] <StevenK> But it isn't even midday yet?
[00:47] <StevenK> Another day, another firefox update.
[01:07] <thumper> StevenK: it is past midday
[01:07] <thumper> well past :)
[01:08] <StevenK> thumper: Timezone fail :-)
[01:08] <wgrant> StevenK: Shh.
[01:09] <StevenK> wgrant: Do you have dinner at 4pm, too?
[01:09] <wgrant> StevenK: No :(
[01:11] <wgrant> thumper: Is the EnumChoiceWidget suitable for the bugtask table too?
[01:12] <thumper> wgrant: it should be, perhaps with a little tweaking
[01:12] <thumper> wgrant: in the same way the InlineEditPickerWidget should be for selecting a person
[01:12] <wgrant> thumper: Why does one have InlineEdit and the other not?
[01:13] <thumper> wgrant: because I never got around to renaming it, and that is what it was originally called
[01:13] <wgrant> Ah, good. Was hoping I wasn't missing some difference.
[01:16] <thumper> nah...
[01:16] <thumper> I'm just fixing a few tests on my blueprint-magic branch
[01:16] <thumper> which widgetizes the blueprint page as a proof of concept
[01:16] <thumper> or I should say
[01:16] <wgrant> Excellent.
[01:16] <thumper> another proof of use
[01:16] <StevenK> Awww. Here I was hoping it removed blueprints.
[01:17] <thumper> I like blueprints
[01:17] <StevenK> Blueprints are annoying
[01:17] <thumper> personally I think merging blueprints and bugs is wrong
[01:44]  * thumper runs blueprint-magic through ec2
[02:46] <wgrant> lifeless: what benefit does colocation provide us?
[02:55] <lifeless> wgrant: we need to support the protocol: more metadata, streaming fetch of N branches at once, multiple heads etc
[02:55] <lifeless> wgrant: + [possibly] get rid of stacking and massively simplify things
[02:55] <wgrant> I guess.
[02:56] <StevenK> Get rid of stacking?
[02:57] <lifeless> wgrant: think about ideal loom behaviour
[02:57] <lifeless> wgrant: pushing 200 vim patches == pain; pushing 1 collection of branches == nice
[02:58] <lifeless> StevenK: stacking is the source of many bugs and slowdowns in bzr
[03:00] <StevenK> I like it for LP development
[03:03] <lifeless> StevenK: you like the performance
[03:04] <lifeless> StevenK: if it was faster than it is now, would you really whine?
[03:05] <StevenK> lifeless: TBH, with stacking I don't mind bzr push performance, and I'm happy about the disk space win for crowberry. If it was faster without losing the win for crowberry, that would be awesome.
[03:08] <lifeless> it has the potential to be smaller
[03:08] <lifeless> we don't delete branches
[03:09] <lifeless> and the minimum size for a stacked branch is the size of one inventory - which doesn't compress well
[03:09] <lifeless> if all those branches were combined, the incremental overhead per branch could be a lot lower
[03:09] <lifeless> the question is whether the baseline overhead would be more or less
[03:09] <StevenK> But we can't do that today, right?
[03:10] <lifeless> no
[03:10] <lifeless> it's a nontrivial discussion
[03:10] <lifeless> and we have other fish to fry
[03:10] <StevenK> Right.
[03:11] <wgrant> lifeless: Why does it need a full inventory? Because it starts a new compression group?
[03:12] <wgrant> (my knowledge of 2a and above is sorely lacking)
[03:12] <StevenK> And above?
[03:12] <StevenK> There's another format after 2a?
[03:13] <wgrant> development-subtree, for one. But it's not exactly very different.
[03:19] <lifeless> wgrant: it has to be able to generate a delta
[03:19] <lifeless> wgrant: for anything in it
[03:19] <StevenK> Sigh. Minutes after I say I'm happy with push performance, I'm stuck waiting for it.
[03:20] <wgrant> lifeless: And it can't use just a delta on top of the stacked-on CHK tree?
[03:20] <wgrant> I should probably read how CHK actually works :)
[03:20] <lifeless> wgrant: fetch operations are single repo always
[03:20] <lifeless> wgrant: consider: client A, servers B and C with a firewall between B and C
[03:21] <lifeless> wgrant: if sftp to B and C worked but bzr+ssh didn't it would be unpleasant
[03:21] <lifeless> wgrant: so the way we did it is to say that a repository must:
[03:21] <lifeless>  - for any rev R it has:
[03:22] <lifeless>    - be able to return the content of the texts in R [in a repo specific format - e.g. fulltext, delta against some ancestor, whatever]
[03:22] <lifeless>    - be able to describe the content of R as a delta against the immediate ancestors of R on all sides
[03:23] <lifeless>  - on pushes the server says "I am missing the parents of revisions X,Y,Z"
[03:24] <wgrant> Oh, right.
[03:25] <lifeless> we could, in theory, have a partial CHK tree for a given rev
[03:25] <lifeless> so far we haven't implemented that
[03:25] <lifeless> hmm, new timeout
[03:25] <lifeless> SourcePackage:+index
[03:26] <wgrant> What's the bad query?
[03:26] <lifeless> dunno yet
[03:27] <wgrant> Ah, there.
[03:27] <StevenK> wgrant: https://code.launchpad.net/~stevenk/launchpad/derive-common-ancestor/+merge/52796 -- given it's our work, I'm not asking for a reviewer, but look it over?
[03:28] <wgrant> lifeless: Ouch, 9s in 3 repeated queries.
[03:28] <wgrant> Rather one triplicated query.
[03:29] <wgrant> It seems to be exactly the same query.
[03:29] <wgrant> (looking at https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1894B1679, queries 20, 40 and 41)
[03:30] <lifeless> and there is AND SourcePackagePublishingHistory.status IN (2)
[03:30] <lifeless> on q 19
[03:30] <wgrant> So there is.
[03:30] <lifeless> so maybe 4 calls
[03:31] <wgrant> I would normally say that two of them are probably TAL guarding the display of the third... but these are full queries.
[03:31] <wgrant> Might try to get tracebacks from DF.
[03:31] <wgrant> Or go TAL-diving, but that is more tedious.
[03:32] <lifeless> I'm doing some houseworky stuff atm
[03:32] <lifeless> but if you wanted to identify the call sites, that would be awesome
[03:32] <wgrant> Sure, just trying not to collide with you.
[03:33] <StevenK> We can't get full tracebacks from qas?
[03:33] <wgrant> StevenK: I have a patch to give OOPSes a traceback for every query.
[03:33] <wgrant> But it makes them unparseable, so it's not really usable on qas.
[03:34] <wgrant> So I pretend that Julian doesn't exist and use it on mawson.
[03:34] <StevenK> Haha
[03:34]  * StevenK ponders ringing Subaru
[03:34] <wgrant> Oh?
[03:35] <StevenK> They've had my car for 5 and a half hours now. Surely they're done servicing it.
[03:36] <wgrant> :(
[03:36] <StevenK> wgrant: Can haz opinion on MP? Or is it on your list?
[03:36] <wgrant> StevenK: Looking.
[03:37] <wgrant> mawson will take about forever to update.
[03:37] <StevenK> Just one or two eons
[03:37] <wgrant> +                'derived': dervied_changelog,
[03:37] <wgrant> typo
[03:37] <StevenK> Sigh
[03:38] <StevenK> All instances fixed, thanks.
[03:38] <wgrant> You want to factor out the madness into something like get_ancestry(SPR)
[03:39] <wgrant> Otherwise that looks good.
[03:39] <StevenK> I do? I figured _updateBaseVersion() was self-contained enough.
[03:40] <wgrant> You're duplicating the set(Changelog(spr.changelog.read()).versions)
[03:41] <wgrant> Also, what does debian.changelog do if the changelog is unparsable?
[03:42] <StevenK> Returns an empty list
[03:42] <StevenK> Which is fine by me
[03:43] <wgrant> It's not going to raise exceptions in any case?
[03:43] <StevenK> wgrant: I'm happy to write a test for that case.
[03:43] <wgrant> That would be handy.
[03:44] <StevenK> wgrant: get_ancestry in DSD or SPR?
[03:44] <wgrant> StevenK: SPR is big enough already.
[03:44] <wgrant> Keep it in DSD until we need it elsewhere, I think.
[03:54] <StevenK> wgrant: http://pastebin.ubuntu.com/578173/
[03:55] <wgrant> StevenK: Looks reasonable.
[03:58] <StevenK> No manual entry for subunit-stats
[03:59] <StevenK> Subunit has to be the most poorly documented set of scripts ever
[04:04] <lifeless> StevenK: I give you perl
[04:04] <lifeless> StevenK: seriously, subunit-stats --help.
[04:04] <StevenK> help2man, kthxbye
[04:04] <lifeless> patches appreciated kthxdeal
[04:05] <wgrant> lifeless: So, I've found the sources of the queries.
[04:06] <wgrant> lifeless: They are very fast on DF when caches are hot.
[04:06] <wgrant> Well, not very fast, but <200ms.
[04:06] <StevenK> But very fast on DF is still 2 seconds.
[04:07] <wgrant> lifeless: SourcePackage.summary and SourcePackage.published_by_pocket. A couple of calls to each.
[04:07] <wgrant> Can you get a plan from a staging?
[04:07] <lifeless> sure
[04:09] <wgrant> lifeless: Thanks for the review.
[04:09] <wgrant> I hope next week that the OOPS counts will be low enough that I can sensibly go through and tear out the old OOPS reporting stuff.
[04:09] <wgrant> And then clean up the exception handling.
[04:10] <lifeless> I'm using query 11 from https://lp-oops.canonical.com/oops.py/?oopsid=1894B1679#statementlog
[04:10] <wgrant> Now that we know (hopefully) everywhere it's needed.
[04:10] <lifeless> note that it's not a hot/cold issue because it's consistently slow in the oops
[04:11] <wgrant> Indeed, I noticed that.
[04:11] <lifeless> sadly, tis fast on qas
[04:11] <wgrant> Hmmmmmmm.
[04:11] <lifeless> want me to check staging ?
[04:11] <wgrant> Worth a try, I guess :/
[04:12] <lifeless> oh but
[04:12] <lifeless> there is also the deserialisation overhead
[04:12] <lifeless> same results on staging
[04:12] <wgrant> :(
[04:13] <lifeless> no, not that
[04:13] <lifeless> 72 rows
[04:14] <lifeless> tagged it dba
[04:14] <wgrant> Thanks.
[04:14] <lifeless> we need to start capturing db hostnames
[04:15] <lifeless> still, we should fix
[04:15] <lifeless> no need to do 3 lookups
[05:02] <StevenK> Huzzah, I have my car back
[05:23] <lifeless> stub: hi
[05:23] <stub> yo
[05:23] <stub> lifeless: https://dev.launchpad.net/Database/LivePatching
[05:23] <lifeless> stub: I saw - looking good
[05:23] <lifeless> stub: I am drafting a 'ReliabileDBDeployments' LEP too
[05:24] <lifeless> stub: which will frame whatever work we need to invest in this
[05:24] <stub> Ok. Do you want to incorporate what I put together?
[05:24] <wgrant> I was very pleased to see LivePatching this morning.
[05:25] <lifeless> stub: I think the are complementary - the L-P page is how and implementation strategies
[05:25] <lifeless> stub: the LEP will be what, goals, constraints, requirements
[05:25] <lifeless> s/the/they/
[05:25] <stub> Ok. I'll update that wiki document if I think of anything new or get feedback then.
[05:26] <lifeless> excellent
[05:26] <lifeless> stub: we have some queries running slow on prod slaves, but fast on qastaging/staging
[05:27] <lifeless> stub: two so far that I know of : the duplicate bug detection FTI queries (the ones you said we can't do realtime) and the one in https://bugs.launchpad.net/launchpad/+bug/732398
[05:27] <_mup_> Bug #732398: SourcePackage:+index timeout <dba> <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/732398 >
[05:27] <stub> wgrant: So I'm worried the extra overhead (sometimes needing 3x as many db patches, extra code to support 'old' and 'new' schemas) could deter devs. You disagree and think you would make use of the process?
[05:27] <wgrant> stub: We can push things out more quickly and without hideous amounts of downtime.
[05:27] <stub> Sounds like we need to pull some RAM out of prod....
[05:27] <stub> ;)
[05:28] <wgrant> Why wouldn't people make use of it, even if it is slightly more cumbersome?
[05:28] <wgrant> (I note that the page doesn't define what a light patch is, though.)
[05:28] <lifeless> stub: if you could get an explain analyze on all three db's for the query in https://bugs.launchpad.net/launchpad/+bug/732398/comments/1 - that would be awesome
[05:28] <_mup_> Bug #732398: SourcePackage:+index timeout <dba> <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/732398 >
[05:29] <stub> wgrant: I'm just a born devil's advocate. There is more overhead in this process, and I'm interested in if the extra overhead will overcome the desire to get stuff rolled out 'now' rather than 'next cycle'
[05:29] <lifeless> bah, brb
[05:30] <lifeless> am I back?
[05:30] <stub> lifeless: you're back
[05:30] <lifeless> cool
[05:30] <wgrant> You're not gone.
[05:30] <lifeless> so - can has analyze ?
[05:31] <lifeless> stub: and then, I'd like to talk fti briefly, possibly voice, possibly here
[05:31] <wgrant> Interestingly enough, as soon as I said "you're not gone", freenode lagged for 30s.
[05:31] <lifeless> stub: (short story, I want to know what I'm missing on the query - its plan and behaviour on staging seem totally fine)
[05:32] <lifeless> wgrant: had to bounce wifi to stop openid trashing all my open tabs when I restarted chromium
[05:32] <wgrant> Hah.
[05:32] <lifeless> and I had to restart chromium because it had forgotten about a popup window which was permanently stuck in the foreground
[05:33] <cody-somerville> Software sucks :(
[05:33] <lifeless> (not a browser window, right mouse context window)
[05:34] <stub> Cold, that query ran in 500ms or less on all prod servers
[05:34] <lifeless> stub: argh
[05:34] <lifeless> stub: so, we have a diagnostic challenge
[05:34] <wgrant> Because it took 3s hot.
[05:34] <stub> lifeless: Somehow get more information about locks
[05:34] <lifeless> wgrant: lets be precise
[05:35] <lifeless> wgrant: our timeline, which records query serialisation, queuing, deserialisation and upcasting to objects and any time given to another worker thread, showed 3 seconds.
[05:35] <wgrant> True.
[05:36] <stub> Interestingly, the fastest one (launchpad_prod_1) had a slightly different plan
[05:36] <stub> Sorry - launchpad_prod_2
[05:36] <lifeless> perhaps we should capture the plan for any query over 1 second
[05:36] <lifeless> into the timeline
[05:38] <lifeless> there is already a per thread timeline bug
[05:38] <lifeless> bug 243554
[05:38] <_mup_> Bug #243554: oops report should record information about the running environment <lp-foundations> <oops-infrastructure> <Launchpad itself:Triaged> <OOPS Tools:Triaged> < https://launchpad.net/bugs/243554 >
[05:39]  * lifeless retitles
[05:40] <lifeless> stub: so the same query is run three times in that page
[05:40] <stub> In this case, I think the plan is a red herring and just an artifact of different statistics. The costs of the different parts of the plans are close enough to identical.
[05:40] <lifeless> stub: https://launchpad.net/ubuntu/lucid/+source/chromium-browser/+index - and its timing out now
[05:41] <lifeless> OOPS-1895M373
[05:41] <lifeless> triggering an lpnet sync
[05:42] <stub> Just reran the previous query - slowest was 318ms
[05:42] <wgrant> Is that with or without status IN (2)?
[05:43] <lifeless> stub: would lock contention explain the same query being slow 3 times in a row ?
[05:43] <stub> no
[05:44] <stub> Well... maybe.
[05:44] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=1894B1679#repeatedstatements
[05:44] <lifeless> 4th row
[05:44] <lifeless> 3 calls to it, average time 2911ms
[05:44] <stub> If it is a slow process like the publisher it could be locking rows in the same set returned by the slow query in different transactions
[05:45] <lifeless> ok, so lets see the times for these oopses
[05:45] <stub> Oh... 3 queries in one transaction, no - not lock contention
[05:45] <lifeless> they are spread over the day
[05:47] <stub> locks shouldn't be blocking selects anyway.
[05:48] <lifeless> time of day for the oopses: 1937 1420 1937 0846 1201 1335 1258 1056
[05:48] <lifeless> wgrant: what time range does the publisher run in ?
[05:48] <wgrant> lifeless: Primary archive? Normally 03-40
[05:49] <lifeless> ok, not that then
[05:49] <wgrant> But it should release locks a good 10-15 minutes before it finishes.
[05:49] <stub> High replication load possibly - look for corresponding lag spikes
[05:49] <lifeless> all the oopses have exactly the same pattern
[05:49] <lifeless> stub: where is the replication lag graph ?
[05:49]  * stub is looking for it
[05:49] <lifeless> stub: and do we have one right now ?
[05:50] <lifeless> https://launchpad.net/ubuntu/lucid/+source/chromium-browser/+index is the page I hit to generate an oops
[05:50] <stub> Not lagged atm.
[05:51] <lifeless> stub: then thats likely not it, cause its still timing out on prod :>
[05:51] <stub> Graph is here anyway: https://lpstats.canonical.com/graphs/ProductionDBReplicationLag/
[05:51] <wgrant> Seems to time out on the master too.
[05:51] <wgrant> OOPS-1895ED372
[05:52] <stub> So the same query with different parameters from the bug is still not having any problems.
[05:52] <stub> Anyone have the actual query currently timing out handy yet?
[05:53] <lifeless> stub: the one I linked is the one timing out
[05:53] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1895M373
[05:54] <lifeless> http://pastebin.com/f8Vdi5p0
[05:54] <lifeless> 3.7 seconds in prod
[05:55] <stub> When I run it, slowest 536ms.
[05:55]  * stub checks the pastebin matches
[05:56] <stub> yup
[05:56] <lifeless> ok thats strange
[05:56] <lifeless> stub: try this
[05:56] <lifeless> stub: run it, *not explain*, and press 'end'
[05:56] <lifeless> make sure you have \timing on
[05:56] <stub> The query is returning 5k rows and building 10k objects - would this be appserver time hidden from our metrics?
[05:57] <lifeless> hidden from an explain analyze anyhow
[05:57] <stub> That will give crap results atm.... I'll open some local psql shells.
[05:57] <stub> erm... remote shells
[05:58] <stub> Ooh... perhaps there are some silly large text fields in there?
[05:58] <wgrant> SPR is pretty fat.
[05:58] <wgrant> Because of SPR.copyright.
[05:59] <wgrant> There shouldn't be that many rows, though :/
[05:59] <wgrant> I'd expect a few dozen at most.
[05:59] <lifeless> count(*) says 72 rows for me
[05:59] <lifeless> doing select count(*) from (SE...) as _tmp;
[06:00] <lifeless> stub: how are you measuring the 5K ?
[06:00] <stub> Sorry - I was looking at the estimate, not the actual returned count.
[06:00] <wgrant> Ahh
[06:00] <wgrant> I was scared that our kernel developers were even more insane than I thought.
[06:01] <lifeless> wgrant: can you put the functions into the bug ?
[06:02] <wgrant> lifeless: Sure.
[06:03] <lifeless> so
[06:03] <lifeless> theory is the analyse is not showing us the cost of a file processing of the rows in some fashion
[06:04] <stub> It certainly is taking a lot longer to get the results to the client than it is to get the query plan.
[06:05] <lifeless> yeah
[06:05] <lifeless> one mystery solved
[06:05] <lifeless> another item for the db performance tips page
[06:06] <stub> Just seeing bug queries causing large temporary files in the logs
[06:06] <wgrant> Oh hah.
[06:06] <wgrant> I guess linux might have an enormous copyright file.
[06:07] <wgrant> As well as a few uploads.
[06:09] <wgrant> We should really do away with that column.
[06:09] <wgrant> stub: Can you see how big a column is across an entire table?
[06:09] <stub> I'm about to do that.
[06:10] <stub> max(length(foo)) stuff
[06:10] <wgrant> It could well be a few times the size of the rest of the table.
[06:10] <wgrant> As a bonus, the data is not used by anything yet.
[06:11] <lifeless> I selected into a temp table
[06:11] <lifeless> \dt+ foo2
[06:11] <lifeless>                     List of relations
[06:11] <lifeless>    Schema   | Name | Type  | Owner | Size  | Description
[06:12] <lifeless> ------------+------+-------+-------+-------+-------------
[06:12] <lifeless>  pg_temp_30 | foo2 | table | ro    | 96 kB |
[06:12] <lifeless> its only 96kB apparently
[06:12] <wgrant> Did you select all the SPR changelogs into a temp table?
[06:12] <wgrant> Or is that the result of the problematic select?
[06:13] <lifeless> thats the result of the problematic select
[06:13] <wgrant> I find that difficult to believe.
[06:13] <wgrant> But I guess it's possible.
[06:13] <lifeless> with id as id2, component as c2 etc etc
[06:13] <lifeless> launchpad_qastaging=> select count(*) from foo2;
[06:13] <lifeless>  count
[06:13] <lifeless> -------
[06:13] <lifeless>     72
[06:14] <stub> 1.1 mb is the largest text field in the current slow query
[06:14] <stub> (the one from the last oops)
[06:14]  * stub checks the entire table.
[06:14] <lifeless> stub: whats the sum of that field ?
[06:15] <wgrant> DF is being a bit slow at summing all the lengths.
[06:15] <stub> 81MB...
[06:15] <wgrant> Ouch. Which SPR is that?
[06:16] <lifeless> stub: whats the length() function return ?
[06:16] <stub> That is the sum of copyright... so multiple
[06:16] <lifeless> I got noddy small values
[06:16] <wgrant> Hah.
[06:16] <wgrant> 2.5GB of linux copyright files on DF.
[06:16] <lifeless> but the text is big
[06:16] <stub> sum of the length? 84466050
[06:16] <lifeless> no
[06:16] <stub> 81MB
[06:16] <lifeless> I mean the function
[06:16] <lifeless> select max(length(changelog_entry)) from foo2;
[06:16] <lifeless>  max
[06:16] <lifeless> ------
[06:16] <lifeless>  3418
[06:17] <stub> 1.1MB
[06:17] <lifeless> what does the 3418 mean ?
[06:17] <lifeless> is that pages? sectors?
[06:17] <stub> Its bytes
[06:17] <wgrant> changelog_entry?
[06:17] <stub> Sorry - characters
[06:17] <wgrant> You mean changelog?
[06:17] <lifeless> wgrant: changelog is an int
[06:17] <wgrant> changelog_entry is different.
[06:17] <wgrant> Er.
[06:17] <wgrant> copyright, not changelog
[06:17] <stub> So UTF-8 might theoretically be 4x as many bytes?
[06:18] <wgrant> changelog_entry is always going to be small; it's only the latest.
[06:18] <stub> I've checked all text fields on the table. The problem is copyright as you suspected.
[06:18] <lifeless> select max(length(copyright)) from foo2;
[06:18] <lifeless>    max
[06:18] <lifeless> ---------
[06:18] <lifeless>  1126214
[06:18] <lifeless> (1 row)
[06:18] <lifeless> ok, confusion sorted
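The characters-versus-bytes point above can be checked with a small Python sketch. PostgreSQL's length() on a text value counts characters (octet_length() counts bytes), and UTF-8 uses between 1 and 4 bytes per character, so the byte size is only bounded by 4x the character count. The sample strings and the helper name utf8_size are illustrative, not taken from the discussion.

```python
# Compare character counts with UTF-8 byte counts for a few sample strings.
# utf8_size and the samples are hypothetical names chosen for illustration.

def utf8_size(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))

samples = {
    "ascii": "copyright",   # 1 byte per character
    "latin": "caf\u00e9",    # the accented letter takes 2 bytes
    "cjk": "\u8457\u4f5c\u6a29",  # 3 bytes per character
    "astral": "\U0001d11e",  # a single character outside the BMP, 4 bytes
}

for name, text in samples.items():
    print(name, "chars:", len(text), "utf8 bytes:", utf8_size(text))
```

For a mostly ASCII copyright file the two numbers are close, so 1126214 characters is roughly 1.1MB either way.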
[06:18] <stub> http://paste.ubuntu.com/578202/
[06:19] <lifeless> right
[06:19] <lifeless> dropping copyright fixes it
[06:19] <lifeless> what do we have this in there for ?
[06:19] <wgrant> Nothing at all.
[06:19] <wgrant> It's populated, but not used.
[06:19] <lifeless> drop the column definition from launchpad ?
[06:19] <wgrant> We could possibly drop it from the class immediately, and just set it on upload, and then migrate it out of the DB later.
[06:19] <lifeless> that will stop storm querying it
[06:19] <wgrant> Right.
[06:20] <stub> So yeah, that column needs to be split into a separate table. Code only fix might be to remove that field from the main Storm class, and have a separate Storm class with the extra column and use that only where necessary.
[06:20] <wgrant> stub: s/table/librarian/, I think.
[06:20] <lifeless> query can show all the rows in 300ms with that field removed.
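The effect of leaving the wide column out of the select can be sketched with Python's sqlite3 module. This is not the Launchpad schema or Storm; the table spr, its columns, and the row sizes are assumptions made up for the illustration. It only shows that the same 72 rows carry vastly more data to the client when a large text column is included, which a row count or a query plan does not reveal.

```python
import sqlite3

# Hypothetical stand-in for a table with one very wide text column.
def build_demo(rows=72, copyright_chars=10_000):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spr (id INTEGER PRIMARY KEY, version TEXT, copyright TEXT)"
    )
    conn.executemany(
        "INSERT INTO spr (version, copyright) VALUES (?, ?)",
        [(f"1.0-{i}", "x" * copyright_chars) for i in range(rows)],
    )
    return conn

def fetched_chars(conn, columns):
    """Fetch every row and count the characters that reach the client.

    columns is a trusted, hard-coded column list (demo only).
    """
    total = 0
    for row in conn.execute(f"SELECT {columns} FROM spr"):
        total += sum(len(str(value)) for value in row)
    return total

conn = build_demo()
wide = fetched_chars(conn, "id, version, copyright")
narrow = fetched_chars(conn, "id, version")
print("with copyright:", wide, "without:", narrow)
```

Both selects return 72 rows; only the amount of data pulled per row differs.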
[06:20] <stub> wgrant: If there is no need for it to be in the DB, sure.
[06:21] <lifeless> wgrant: want to do this one?
[06:21] <stub> wgrant: If it is being shown inline on the page, I guess that is a performance call (pull it from the librarian and render it will be slower than from the db... unless it is ajax, and then search engines won't see it)
[06:21] <wgrant> lifeless: I'll drop the column from the Storm definition now.
[06:21] <lifeless> wgrant: cool
[06:21] <lifeless> stub: its not used at all
[06:21] <lifeless> stub: premature optimisation years ago
[06:21] <wgrant> stub: It may plausibly be displayed in the page.
[06:21] <stub> k
[06:21] <wgrant> But it isn't yet.
[06:21] <wgrant> And if someone wants to, they can damn well pull it from the librarian.
[06:22] <lifeless> iframe embedding ftw
[06:22] <wgrant> And if someone wants to display multiple on a single page, they are probably wrong.
[06:22] <wgrant> So we don't have to consider that case.
[06:23] <stub> So assuming ascii text, and Python decoding it into Unicode, that is 324MB of RAM needed before it even gets past psycopg2. This sort of thing will certainly be driving our memory footprint.
[06:23] <lifeless> or they can just hand out librarian urls
[06:23] <wgrant> I wonder if the uploader can update SPR. I guess I'm about to find out.
[06:23] <stub> Given Python won't clean up, and that is per thread.
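The memory estimate above can be reproduced as a short calculation. It takes the sum of lengths quoted earlier (84466050 characters) and assumes 4 bytes per character once decoded, as on a wide (UCS-4) Python 2 build; that per-character width is an assumption, not something stated in the discussion.

```python
# Rough memory estimate for decoding the copyright text into unicode objects.
SUM_OF_LENGTHS = 84_466_050   # characters, from the sum of lengths quoted above
BYTES_PER_CHAR_DECODED = 4    # assumption: wide (UCS-4) unicode representation

def mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)

as_ascii = mib(SUM_OF_LENGTHS)                             # about 81 MiB on the wire
as_unicode = mib(SUM_OF_LENGTHS * BYTES_PER_CHAR_DECODED)  # about 322 MiB decoded

print(f"ascii: {as_ascii:.1f} MiB, decoded: {as_unicode:.1f} MiB")
```

Rounding the ASCII size up to 81MB before multiplying by four gives the 324MB figure.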
[06:24] <lifeless> stub: ok, next one up is fti
[06:24] <wgrant> I've been concerned about SPR.copyright for well over a year now, but never had the data to confirm that it was a problem.
[06:24] <lifeless> wgrant: data is wonderful isn't it
[06:24] <stub> I prefer a bucket of sand
[06:24] <stub> ;)
[06:25] <stub> nah... nah... fti... can't hear you... nah... nah
[06:25] <jtv> stub: I just put up a security.py cleanup for your review that also gets it to run under 3 seconds.
[06:25] <stub> So the query I looked at the other day had 32 lines of multiple boolean operations going on. I'm surprised it doesn't timeout just generating the plan for that, let alone get around to running the query
[06:26] <stub> jtv: Ta.
[06:26] <lifeless> stub: the plan is simple :)
[06:26] <lifeless> stub: its an 800ms query on qastaging
[06:26] <jtv> stub: Just testing if you really couldn't hear anyone.  :)  https://code.launchpad.net/~jtv/launchpad/faster-security/+merge/52804
[06:28] <lifeless> stub: http://pastebin.com/7DuF11Z8
[06:31]  * stub finds the bug 726175
[06:31] <_mup_> Bug #726175: DistributionSourcePackage:+filebug Timeout trying to file bug due to FTI timeout <dba> <timeout> <Launchpad itself:Triaged> < https://launchpad.net/bugs/726175 >
[06:33] <stub> And 3.6 seconds on production
[06:34] <lifeless> stub: doing the same show-the-results test?
[06:34] <stub> By accident, yes....
[06:34] <lifeless> :)
[06:34]  * stub waits for his terminal to return to normal
[06:34] <lifeless> so 3.6 is << 15
[06:38] <stub> So width isn't the problem here - no insane targetnamecaches or statusexplanation
[06:39] <lifeless> stub: right, and it performs tolerably interactively
[06:40] <stub> 65% of the index is data, which is a little bloated but not bad (launchpad_prod_2 here)
[06:40] <lifeless> using package git not installed failed to install upgrade ErrorMessage: subprocess installed post-installation script returned error exit status 1 as a search term the search completes right now on prod
[06:41] <lifeless> at /ubuntu/+filebug
[06:41] <lifeless> huwshimi: hey
[06:41] <lifeless> huwshimi: I have a challenge for you
[06:42] <lifeless> huwshimi: when an ajax/api request is made, I'd like to put the time in the top right - you know where - into a journal of items
[06:42] <stub> The limit isn't helping at all - all the rows are going to be pulled to calculate the rank so things can be ordered.
[06:44] <lifeless> sure
[06:45] <lifeless> it would be nice to set a cap on the relevance but I don't think our tsearch setup is ready for that
[06:45] <huwshimi> lifeless: so you'd like to figure out how to get the time for the roundtrip?
[06:46] <lifeless> huwshimi: and glue it all together :)
[06:46] <lifeless> huwshimi: its not urgent
[06:46] <lifeless> huwshimi: but it's the next step in visibility to devs of performance
[06:46] <stub> http://paste.ubuntu.com/578204/ is the plan I'm looking at btw.
[06:46] <lifeless> stub: yeah, thats what i see too
[06:47] <lifeless> stub: except mine is a little faster
[06:47] <lifeless>  Sort  (cost=41.08..41.09 rows=1 width=650) (actual time=554.699..554.699 rows=1 loops=1)
[06:47] <stub> lifeless: So wtf is the query doing for the first 1.4 seconds?
[06:47] <stub> The fti stuff starts 1412 ms in.
[06:48] <lifeless> stub: search me
[06:49] <stub> Oh.... remember I mentioned those big temporary tables in the pg logs?
[06:49] <lifeless> stub: I think you may have *meant* to, but I don't remember a discussion
[06:49] <lifeless> stub: or you did tell me and I've lost it ;)
[06:49] <wgrant> He mentioned them.
[06:49] <huwshimi> lifeless: I'm not sure about YUI, but with other ajax frameworks you can add hooks to every ajax event. And then it's just a matter of recording the time the ajax event fires and then comparing it to the time when the ajax event completes. I'll take a look some time
[06:49] <stub> (13:06:15) stub: Just seeing bug queries causing large temporary files in the logs
[06:50] <wgrant> Gnargh gina.
[06:50] <lifeless> stub: ah thanks
[06:50] <StevenK> wgrant: ?
[06:50] <lifeless> stub: you think this is it ?
[06:50] <huwshimi> lifeless: The one thing I've learnt though is that everything is an order of magnitude harder with YUI than other frameworks :)
[06:51] <lifeless> huwshimi: what framework should we be using?
[06:51] <wgrant> StevenK: I'm removing SPR.copyright.
[06:51] <jtv> stub: I may be missing the point since I just jumped in, but all it's got at the point where the 1.4 seconds are lost is 1 tuple
[06:51] <jtv> (See all the way at the bottom of the plan)
[06:51] <huwshimi> lifeless: Do you really want to go into that? :)
[06:51] <StevenK> wgrant: Why? I think some people want it.
[06:51] <lifeless> huwshimi: sure, why not?
[06:51] <wgrant> StevenK: I'm removing it from the class for now. It will be moved into the librarian later.
[06:52] <lifeless> stub: launchpad_qastaging=> SELECT BugTask. ... assignee |  bug   | bugwatch | date_assigned | date_closed | date_confirmed | date_fix_committed | date_fix_released | date_incomplete | date_inprogress | date_left_closed | date_left_new | date_triaged |        datecreated         | distribution | distroseries |   id   | importance | milestone |  owner  | product | productseries | sourcepackagename | status | statusexplanation | t
[06:52] <lifeless> ----------+--------+----------+---------------+-------------+----------------+--------------------+-------------------+-----------------+-----------------+------------------+---------------+--------------+----------------------------+--------------+--------------+--------+------------+-----------+---------+---------+---------------+-------------------+--------+-------------------+-----------------
[06:52] <lifeless>           | 715778 |          |               |             |                |                    |                   |                 |                 |                  |               |              | 2011-02-09 14:06:41.463232 |              |              | 836597 |          5 |           | 3439478 |   24742 |               |                   |     10 |                   | Linaro GCC
[06:52] <lifeless> (1 row)
[06:52] <lifeless> Time: 861.816 ms
[06:52] <lifeless> jtv: thats a relative offset, not absolute
[06:52] <lifeless> jtv: because its in the nested loop
[06:53] <lifeless> jtv: so its at 1413.054 + 0.040 and takes 0.042
[06:53] <jtv> lifeless: it's in the same nested loop where the other part starts at 1.4 seconds
[06:53] <jtv> Don't they both count from the start of the loop?
[06:53] <lifeless> no
[06:54] <huwshimi> lifeless: OK, I think there are a bunch of issues with YUI. Its development is really slow and can not keep up with the pace of other frameworks. With the way things are going at Yahoo I also worry that there will get to a point where it may stop being developed by Yahoo at all.
[06:54] <lifeless> or at least, I've only been able to make sense from plans if I assume the answer is no
[06:55] <huwshimi> lifeless: The community around YUI is tiny and I don't think a serious community effort would be made to keep it going (I could be wrong and by the community taking it over it might help drastically)
[06:55]  * jtv retouches his fading formerly-yellow note saying "read actual documentation"
[06:55] <lifeless> huwshimi: so thats a bunch of risks
[06:55] <huwshimi> lifeless: All of this means that there are a lot of half baked features in YUI.
[06:56] <lifeless> not really an argument for where to aim *at*
[06:56] <stub> jtv: From the plan, it seems we start looking at the fti index at 1.4 seconds in. Before that, the only thing that was done was an index scan on bugtask_product__bug__key that completed less than 1ms in.
[06:56] <huwshimi> lifeless: Some things people seem to really like. Like the testing framework
[06:56] <stub> (looking at the start times in actual time=)
[06:57] <stub> I can't see it being a temporary file issue since the plan is reporting memory sorting.
[06:57] <huwshimi> lifeless: But for many things it seems that you write more code and spend more time getting around YUI's drawbacks than you should.
[06:58] <jtv> stub: so… some weird combinatorial behaviour in the planner for those booleans?  Try removing one and see if the time halves.  :)
[06:58] <huwshimi> lifeless: Also, I'm not a YUI expert, from discussions with sinzui, he has very similar feelings about a lot of this and he might be a better person to talk to.
[06:58] <stub> It might be spending 1.4seconds working out wtf that obscene boolean calculates down to...
[06:58] <stub> I can time that...
[06:59] <huwshimi> lifeless: Another side of YUI having such a small community is that the plugin ecosystem is tiny.
[06:59] <lifeless> stub: that last line is a per-found-bug inner loop
[06:59] <lifeless> stub: it can /only/ execute after the fti starts spitting out rows
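(Editorial sketch, not part of the log.) The plan-reading discussion above turns on how to interpret `actual time=` for a node inside a nested loop. The PostgreSQL documentation states that for a node executed more than once, the reported `actual time` and `rows` are per-loop averages, and that multiplying by `loops` gives the total time spent in the node. A minimal Python sketch of that arithmetic follows; the sample plan line, the index name `example_idx`, and the helper name `node_timing` are made up for illustration and are not taken from the paste being discussed.

```python
import re

# Matches the "(actual time=STARTUP..TOTAL rows=N loops=K)" part of a plan line.
ACTUAL_RE = re.compile(
    r"actual time=(?P<startup>\d+(?:\.\d+)?)\.\.(?P<total>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+) loops=(?P<loops>\d+)"
)


def node_timing(plan_line):
    """Return per-loop and whole-query timings for one EXPLAIN ANALYZE line."""
    match = ACTUAL_RE.search(plan_line)
    if match is None:
        return None
    per_loop_ms = float(match.group("total"))
    loops = int(match.group("loops"))
    return {
        "startup_ms": float(match.group("startup")),
        "per_loop_ms": per_loop_ms,
        "loops": loops,
        # Per-loop average times the number of executions of the node.
        "total_ms": per_loop_ms * loops,
    }


# Hypothetical plan line, for illustration only.
SAMPLE = (
    "->  Index Scan using example_idx on example  "
    "(cost=0.00..8.27 rows=1 width=4) "
    "(actual time=0.040..0.042 rows=1 loops=50)"
)
timing = node_timing(SAMPLE)
```

Read this way, a small per-loop figure on an inner node says nothing about when in the query that node first ran, which is why the startup time of the outer node matters in the discussion above.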
[06:59] <huwshimi> lifeless: We would save a lot of development time if we didn't have to reinvent the wheel for a lot of stuff.
[07:00] <lifeless> huwshimi: so, there is a slightly larger discussion to have if we decide that a different framework would be better
[07:01] <huwshimi> lifeless: And most of the plugins that do exist were developed a number of years ago and are not maintained or don't work with new versions of YUI
[07:01] <lifeless> huwshimi: but I want to be clear that it is a discussion we can have if you want
[07:01] <huwshimi> lifeless: I would be *very* open to that discussion.
[07:02] <lifeless> huwshimi: then I suggest you start it up :) - talk to sinzui, talk to rockstar
[07:02] <lifeless> huwshimi: take it to the canonical-rhinos list if those two folk are agreeable
[07:02] <lifeless> feel free to cc me
[07:02] <huwshimi> lifeless: my concern is that we have a lot of existing YUI dependent code
[07:02] <lifeless> huwshimi: we had a lot of slow code 6 months ago
[07:03] <huwshimi> lifeless: Haha
[07:03] <lifeless> huwshimi: but we're down to 2.7seconds for our 99th percentile request time
[07:06] <lifeless> huwshimi: so big things can be done
[07:06] <lifeless> huwshimi: and we've much less yui code than slow code
[07:07] <LPCIBot> Project windmill build #32: FAILURE in 1 hr 8 min: https://hudson.wedontsleep.org/job/windmill/32/
[07:07] <wgrant> Heh. It hasn't failed in more than a week, and it fails while we are discussing its demise...
[07:07] <wgrant> Although it looks like it could be a real failure.
[07:07] <huwshimi> lifeless: Is there a particular reason you're bringing this up (I'm just kind of surprised that you've raised the topic)?
[07:08] <lifeless> huwshimi: you whinged
[07:08] <huwshimi> lifeless: Haha ok
[07:08] <lifeless> huwshimi: my job can be summarised as:
[07:08] <lifeless>  - figure out what makes delivering on our goals hard for our devs
[07:08] <lifeless>  - and arrange for it to be fixed
[07:10] <lifeless> stub: to make sure we are looking at the same thing
[07:10] <lifeless> http://paste.ubuntu.com/578212/
[07:11] <huwshimi> lifeless: I'm very aware that just because I have opinions on (or prefer) things it doesn't mean that I'm right. I wouldn't want to duplicate a bunch of effort in rewriting/training etc. without it being worth it.
[07:11] <lifeless> huwshimi: indeed, so thats part of the discussion that you need to have
[07:13] <huwshimi> lifeless: As we're moving to a more javascript heavy version of the site this is probably a good time to discuss it
[07:14] <stub> lifeless: So the difference between the qastaging query plan and the production query plan is the qastaging query plan starts doing the fti index scan 230ms in, and the production query plan starts doing the fti index scan 1412ms in. That seems to account for most of the difference.
[07:15] <lifeless> stub: ok
[07:15] <lifeless> stub: how do we figure out the 15000seconds case ?
[07:16] <stub> So on #postgres, seems unanimous you see this when PG is waiting for disk
[07:20] <lifeless> stub: -> #postgres then
[07:35] <wgrant> Hah.
[07:36] <wgrant> The test suite breaks if you have debian-keyring installed.
[07:36] <lifeless> \o/
[07:36] <wgrant> Because dpkg-source then has keys to verify some of the packages in the gina test archive.
[07:36] <wgrant> It by default allows an unverifiable signature, but refuses to unpack a bad one.
[07:36] <lifeless> win
[07:43] <lifeless> \o/ the with query works
[07:44] <lifeless> wgrant: https://bugs.launchpad.net/launchpad/+bug/221938
[07:44] <_mup_> Bug #221938: Email interface crashes when an attachment file name contains a slash <email> <lp-bugs> <oops> <qa-needstesting> <Launchpad itself:Fix Committed by thumper> < https://launchpad.net/bugs/221938 >
[07:44] <wgrant> lifeless: Currently fixing Windmill breakage (real bug, but it won't affect any production data).
[07:44] <lifeless> bah, no wally
[08:10] <wgrant> :(
[08:10] <wgrant> ec2 mail fails if your default bzr email address isn't @canonical.com.
[08:11] <StevenK> My instances mail me at my debian.org address just fine
[08:11] <wgrant> You're probably not using the canonical.com SMTP server, though.
[08:11] <StevenK> I am, and my from address is @ubuntu.com
[08:11] <StevenK> So it's fine
[08:12] <wgrant> Hmm.
[08:12] <wgrant> It stopped working for me today, and I changed my default email address last night...
[08:13] <StevenK> wgrant: Happy to share headers if it will help.
[08:14] <wgrant> StevenK: bazaar.conf would be handy.
[08:17] <StevenK> Hmm, when did smtp.c.c start being youngberry ...
[08:17] <stub> debian.org might be special cased given the background of our admins...
[08:18] <StevenK> wgrant: https://pastebin.canonical.com/44501/ is the relevant bit
[08:19] <wgrant> StevenK: Your default bzr email address looks a lot like ubuntu.com
[08:19] <StevenK> Which is what I said my From address was ...
[08:19] <wgrant> Oh, mail you *at* your debian.org address.
[08:19] <wgrant> Right.
[08:19] <wgrant> Fail.
[08:20] <StevenK> wgrant: You can't talk via smtp.c.c if your From address isn't canonical.com or ubuntu.com. If you want to use something else as your From address, use a different SMTP server.
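(Editorial sketch, not part of the log.) The relay rule StevenK describes can be checked locally before sending. The Python sketch below assumes the allowed From domains are exactly the two he names (canonical.com and ubuntu.com); the real server policy may differ, and the function name `relay_would_accept` is hypothetical.

```python
from email.utils import parseaddr

# Assumption: taken from the statement in the log, not from server config.
ALLOWED_FROM_DOMAINS = {"canonical.com", "ubuntu.com"}


def relay_would_accept(from_header):
    """Guess whether a From header would pass the relay's domain rule."""
    _name, addr = parseaddr(from_header)
    if "@" not in addr:
        return False
    domain = addr.rsplit("@", 1)[1].lower()
    return domain in ALLOWED_FROM_DOMAINS
```

Note that this is about the From address only; the recipient address (for example a debian.org mailbox) is not restricted by the rule as described.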
[08:21] <wgrant> StevenK: Sure. But my email address is @canonical.com for my branches... ec2 must not use locations.conf, or it uses some other path.
[08:22] <StevenK> I didn't think your bazaar config was copied to the instances ...
[08:23] <wgrant> It does some evil stuff.
[08:23] <wgrant> Not copying the file directly, but some of its values.
[08:28] <lifeless> wgrant: what are you fighting
[08:28] <wgrant> lifeless: Hm?
[08:29] <lifeless> js
[08:29] <lifeless> whining
[08:29] <lifeless> whats up
[08:29] <wgrant> Bug #732442
[08:29] <_mup_> Bug #732442: disable_existing_builds compares series name to display name <recipe> <regression> <ui> <Launchpad itself:Triaged> < https://launchpad.net/bugs/732442 >
[08:29] <wgrant> Given up for now, will ask wallyworld tomorrow.
[08:29] <lifeless> stub: hey, so the analyze might have helped ?
[08:29] <lifeless> wgrant: oh right
[08:29] <lifeless> wgrant: ;(
[08:30] <wgrant> I say it's qa-ok, although that is sort of cheating.
[08:31] <wgrant> lifeless: https://code.launchpad.net/~wgrant/launchpad/unuse-spr-copyright/+merge/52808 may interest you.
[08:38] <lifeless> stub: how often do we do full backups ?
[08:41] <stub> lifeless: daily dumps
[08:41] <stub> lifeless: No, analyze didn't change anything.
[08:45] <stub> lifeless: So waiting a while, I still see the initial query with a high startup time and a query immediately after a lower startup time. So I think we must be seeing the effects of shuffling data between disk cache and the pg shared memory area.
[08:46] <lifeless> stub: thats plausible
[08:46] <stub> lifeless: Currently, the shared memory area is set to 5GB on all the production boxes, which is the high side of best practice and when people start seeing degraded performance.
[08:46] <lifeless> stub: can we change this during this downtime ?
[08:46] <lifeless> stub: ah
[08:46] <lifeless> stub: how can we test to see whether we would suffer
[08:47] <stub> lifeless: I'm considering bumping it up to 7GB on one of the slaves. Even though we will be going 'too high' in most people's opinions, we do have an unusual load so best practice might not apply here.
[08:47] <stub> We have run with 8GB before, no particular ill effects, so I don't think it will hurt. And we can then check out differences in performance between the two slaves.
[08:47] <lifeless> +1
[08:48] <stub> But still, even 8GB will be lower than the hotset of data.
[08:48] <lifeless> whats it set to on staging?
[08:48] <stub> probably about 3...
[08:48]  * stub checks
[08:49] <stub> Of course, it could be counter intuitive and lowering the value might work better ;)
[08:49] <wgrant> Hah
[08:49] <stub> 2GB on staging
[08:49] <adeuring> good morning
[08:49] <wgrant> Hm, no.
[08:49] <wgrant> stupid ec2.
[09:00] <jam> morning all
[09:01] <wgrant> Morning jam.
[09:02] <jam> man, living in a country where you don't speak the language causes all sorts of web confusion
[09:02] <jam> every site wants to default me to Dutch
[09:02] <jam> stupid geoip :)
[09:05] <jam> maybe google is just on crack. Because I manage to get to the "Settings" page, set my language as English, looks ok
[09:06] <jam> go back to account settings, and everything is back in dutch
[09:06] <bigjools> yeah that's annoying when travelling to sprints
[09:09] <jam> signing out and back in again seemed to help in the end
[09:09] <jam> but yeha
[09:09] <jam> yeah
[09:09] <jam> wgrant: are you watching the upgrade? I'll be happy to start testing/monitoring once things are up
[09:10] <wgrant> jam: I'm watching.
[09:10]  * wgrant opens the graph.
[09:27] <jam> wgrant: looks like bzr-sftp still isn't wanting to shut down cleanly. getting the "cannot shutdown reactor that isn't running"
[09:28] <jam> Maybe that is the code that wants to always cleanly shut down, waiting for the last connection to exit
[09:28] <wgrant> jam: There were 5 connections remaining when it was forcibly killed.
[09:28] <jam> wgrant: k, where do you get this info? Maybe I'm in the wrong channels?
[09:30] <wgrant> jam: #launchpad-ops is where it happens, and you seem to be there.
[09:30] <wgrant> But I checked the connection count myself.
[09:30] <jam> how did you check it?
[09:30] <jam> Yeah, I'm there, but I haven't been following as much as I should :)
[09:32] <wgrant> jam: Certainly machines can see bazaar.launchpad.net:8022.
[09:33] <jam> wgrant: "certain" machines ?
[09:33] <wgrant> Um, yes, that.
[09:33] <jam> wgrant: devpad doesn't appear to be one of them
[09:38] <jam> wgrant: do you know the rsync module for the crowberry-sftp-log subdir?
[09:39] <wgrant> jam: I did... let me check.
[09:39] <wgrant> Could be sftp-logs
[09:39] <wgrant> logs-sftp
[09:39] <jam> crowberry::sftp-logs/ => unknown module
[09:40] <wgrant> 20:39:26 < wgrant> logs-sftp
[09:41] <jam> yeah, just found that myself
[09:41] <wgrant> Great.
[09:43] <henninge> Hi adeuring!
[09:44] <adeuring> hi henninge
[09:44] <wgrant> jam: Looking OK?
[09:44] <henninge> Did you change anything on the branch since you last pushed it?
[09:44] <henninge> adeuring: ^
[09:44] <jam> wgrant: so far, haven't gotten to the end. the sftp service started slightly before the forking service, and was already getting connection requests
[09:44] <adeuring> henninge: I just merged devel yesterday evening. nothing else yet
[09:45] <henninge> yes, I saw that
[09:45] <wgrant> jam: We do have slight ordering issues in both directions :(
[09:45] <henninge> adeuring: You used "sharing status" throughout while I had been talking about "sharing details".
[09:45] <jam> wgrant: as long as both are up when we consider the site 'live' I'm not really worried :)
[09:46] <wgrant> Response times are still fine from here.
[09:46] <henninge> adeuring: any particular reason for that naming?
[09:46] <wgrant> No graphs yet.
[09:46] <adeuring> henninge: no particular reason. I'll change it to details
[09:46] <henninge> adeuring: I can do that.
[09:46] <adeuring> henninge: ok
[09:47] <henninge> adeuring: I finished the dummy template.
[09:47] <adeuring> henninge: sounds great
[09:47] <henninge> adeuring: you will have to add the conditions and such from the view.
[09:47] <adeuring> henninge: ok
[09:48] <henninge> adeuring: I will push that when I am done with the renaming.
[09:48]  * allenap assumes abentley is no longer reviewing.
[09:51] <StevenK> In a slightly related topic, is Firefox history editable?
[09:51] <StevenK> (I have a whole bunch of edge URLs there that need to die. Slowly.)
[09:52] <StevenK> allenap: https://code.launchpad.net/~stevenk/launchpad/derive-common-ancestor/+merge/52796 (and thanks!)
[09:52] <allenap> StevenK: Got it.
[09:53] <StevenK> allenap: afk for dinner, if you have questions queue them up here and I'll answer them when I can.
[09:54] <allenap> StevenK: Cool.
[09:57] <jam> wgrant: is it just that your machine is allowed through the firewall to :8022?
[09:58] <wgrant> jam: I guess it must be. I presumed carob could too, but I guess not.
[09:58] <jam> wgrant: well 'w3m http://bazaar.launchpad.net:8022' didn't do much for me
[09:58] <jam> could be a proxy issue
[09:58] <wgrant> Oh, it's certainly not going to work externally :)
[09:59] <jam> no, from carob
[09:59] <jam> but I can do wget just  fine
[09:59] <jam> so good enough, I guess
[09:59] <jam> 134 conn
[09:59] <jam> still says "unavailable" though, which is odd
[09:59] <jam> I thought spiv had a fix for that
[10:00] <jam> wgrant: I'm a bit surprised about IP addresses that have 10+ connections active
[10:00] <jam> (wget | sort)
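(Editorial sketch, not part of the log.) The `wget | sort` check for addresses holding many connections could be done in Python along these lines. The output format of the :8022 status page is not shown in the log, so this sketch assumes one connection per line with the client IP as the first whitespace-separated field; the function names and the threshold of 10 (from the "10+ connections" remark) are illustrative.

```python
from collections import Counter


def connections_per_ip(status_text):
    """Count connections per client, assuming the IP is the first field."""
    counts = Counter()
    for line in status_text.splitlines():
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def busy_clients(counts, threshold=10):
    """Return the addresses holding at least `threshold` connections."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

For example, `busy_clients(connections_per_ip(page_text))` would list the addresses worth a closer look, where `page_text` is whatever the status page returned.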
[10:01] <lifeless> jam: spiv couldn't reproduce it
[10:03] <jam> just jumped to 246 conns...
[10:05] <jam> wgrant: but I'm not seeing any failures, yet
[10:05] <jml> https://code.launchpad.net/~jml/launchpad/what-is-in-the-web-ui/+merge/52594 up for review
[10:05] <jam> note that conns includes ones that aren't authenticated, IIRC
[10:05] <wgrant> jam: 246? That's not good.
[10:08] <jam> not many access failures in the log, though
[10:09] <jam> wgrant: 259
[10:09] <jam> but no failures in the other logs
[10:09] <henninge> adeuring: I will have to wait for the roll-out to finish before I can push my changes. :(
[10:09] <adeuring> henninge: no problem
[10:09] <wgrant> henninge: Hm? It should all be back now.
[10:09] <henninge> oh, already?
[10:09] <henninge> cool
[10:10] <wgrant> henninge: Codehosting has been back for like 25 minutes.
[10:10] <wgrant> jam: How are your connection times? Up to 8s here ;/
[10:10] <henninge> oh, I just didn't try. thanks wgrant
[10:12] <henninge> adeuring: pushed
[10:12] <adeuring> henninge: I'll look
[10:13] <henninge> adeuring: "pull" is the right term ;-)
[10:48] <jam> wgrant: https://code.launchpad.net/~jameinel/lp-production-configs/disable-forking/+merge/52818
[10:49] <wgrant> jam: LOSAs review those.
[10:52] <jam> wgrant: https://lpstats.canonical.com/graphs/CodehostingPerformance/20110310/20110311/nocache/
[10:52] <jam> the 900s spike is going to kill the graph for a long time to come... :(
[10:52] <wgrant> Heh, yes.
[11:05] <StevenK> allenap: Thank you for the review!
[11:05] <allenap> StevenK: You're welcome :)
[11:07] <StevenK> allenap: How did you find the trailing whitespace?
[11:07] <allenap> StevenK: I load the diff into my editor, and I've set it to show trailing whitespace as big red blocks. I can't miss it :)
[11:08] <StevenK> allenap: Is your editor vim?
[11:08] <allenap> StevenK: The other one.
[11:08] <StevenK> Heh
[11:48] <jam> I'm trying to set up lp on a new virtual host, but it is giving me failures during "make-lp-user".
[11:48] <jam> https://pastebin.canonical.com/44511/
[11:48] <jam> any ideas?
[11:50] <wgrant> Ah,.
[11:50] <wgrant> Well then.
[11:51] <wgrant> Lucid?
[11:51] <jam> yes
[11:51] <wgrant> Did you use rocketfuel-setup?
[11:51] <jam> yep
[11:52] <jam> and then tweaked after for being in a VM
[11:52] <jam> but not much changed there
[11:52] <wgrant> You ran launchpad-database-setup?
[11:52] <wgrant> make schema should have failed without it, but it's possible you did enough manually to unbreak it.
[11:53] <wgrant> There should be "Launchpad configuration" bits at the end of postgresql.conf.
[11:53] <leonardr> my internet connection is up and down right now
[11:54] <jam> wgrant: running it directly
[11:54] <jam> I did very little manually
[11:54] <jam> but I didn't see a Launchpad configuration bit at the end.
[11:55] <jam> and I did have earlier problems connecting to postgres
[11:55] <wgrant> jam: Try running it again, I guess.
[11:55] <wgrant> It is meant to change the default search path.
[11:55] <jam> yeah, running it now, but have to wait for "make schema" to finish
[11:56] <jam> wgrant: looks like that did it. Thanks!
[11:57] <wgrant> jam: Great.
[11:59] <wgrant> 2011-03-10 09:40:34+0000 [SSHChannel session (0) on SSHService ssh-connection on ProtocolWrapper,36,80.11.180.42] Forking returned pid: 18870, path: /tmp/lp-forking-service-child-kwWa4B
[11:59] <wgrant> echan, but yeah.
[12:00] <jam> wgrant: that was the first death?
[12:01] <wgrant> jam: It's the first one that's still alive at the end.
[12:01] <jam> wgrant: if you grep around there, you can see what user it was
[12:02] <wgrant> Indeed, but that was less than a minute after the service started...
[12:03] <jam> wgrant: yay, even have codehosting serving on 5022. Though I wonder if there is a way to do that without hacking the source code.
[12:03] <jam> so I don't have to worry about accidentally committing that
[12:04] <wgrant> jam: You could possibly create a config overlay and run it with that.
[12:04] <jam> wgrant: wouldn't integrate well with 'make run_codehosting' I imagine
[12:05] <wgrant> jam: 'LPCONFIG=mycustomconfig make run_codehosting' should work.
[12:06] <deryck> Morning, all.
[12:31] <wgrant> leonardr: Hi.
[12:32] <leonardr> wgrant, hey
[12:32] <wgrant> leonardr: What is the recommended way to use a keyringed launchpadlib from a cron job?
[12:33] <leonardr> wgrant: you need to store the credential in an unencrypted file, and pass it in as credentials_file
[12:33] <leonardr> see section 5 of https://lists.launchpad.net/launchpad-users/msg06239.html
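(Editorial sketch, not part of the log.) leonardr's advice for cron jobs is to keep the credential in an unencrypted file and pass it as `credentials_file`. A minimal Python sketch of that follows, assuming launchpadlib is installed and the credential file was created earlier by an interactive login. The helper names, the permission check, and the choice of the "production" service root are additions for illustration; only `Launchpad.login_with` and its `credentials_file` argument come from launchpadlib itself.

```python
import os
import stat


def credentials_file_is_private(path):
    """True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def login_from_cron(app_name, credentials_path, service_root="production"):
    """Log in without a keyring, using an unencrypted credentials file."""
    # Imported lazily so the permission helper works without launchpadlib.
    from launchpadlib.launchpad import Launchpad

    if not credentials_file_is_private(credentials_path):
        raise RuntimeError("credentials file is accessible to other users")
    return Launchpad.login_with(
        app_name, service_root, credentials_file=credentials_path)
```

Because the file is unencrypted, the permission check is a cheap guard against leaving the token readable by other accounts on the cron host.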
[12:34] <wgrant> :(
[12:34] <wgrant> OK.
[12:53] <LPCIBot> Project windmill build #33: STILL FAILING in 1 hr 10 min: https://hudson.wedontsleep.org/job/windmill/33/
[13:19] <vila> Hey all, what's the update frequency for the downloads stats ?
[13:20] <vila> as in https://launchpad.net/bzr/+download for example
[13:34] <bac> hi mrevell
[13:35] <mrevell> hi bac
[13:52] <jam> anyone else here use Empathy? I'm trying to use it, since it integrates with the OS (and is the suggested default)
[13:52] <jam> but it only really shows about 10-20 lines of actual content per page
[13:52] <jam> with all the extra formatting
[13:52] <jam> Is there a way to make all the bubbles smaller?
[14:00] <deryck> henninge, adeuring -- ping for standup
[14:01] <wgrant> bigjools: Hi.
[14:05] <jml> allenap: thanks for the review. I had posted some updates as you were doing it (to docs, mostly) and have now fixed the zcml problem
[14:06] <allenap> jml: Cool, I'll have a look.
[14:08] <allenap> jml: Did you consider parsing the zcml directly?
[14:08] <jml> allenap: to what end?
[14:09] <jml> allenap: I mean, yes I did, but then I thought that all I'd be doing is constructing objects much like the adapters that Zope provides in the first place.
[14:09] <allenap> jml: Because it might end up being less hacky... I haven't given it much thought, but wondered if you had.
[14:10] <jml> allenap: it might fix some of the hackiness in format_page_adapter, but not much else. It would come at a cost of parsing ZCML myself
[14:10] <allenap> jml: I guess the only thing you might avoid is filtering through the mro to find the real view.
[14:10] <jml> allenap: yeah, exactly.
[14:10] <jml> hmm.
[14:11] <allenap> jml: Yeah, and processing includes. Okay. I don't think the mro thing is too bad actually.
[14:11] <jml> another approach would be overriding the browser:page handler
[14:11] <jml> but I'm not sure that would be less hacky, since our custom handler is specified in zcml also
[14:12] <jml> so I'd have to find that ZCML and somehow exclude it.
[14:13] <jml> also, Launchpad has 462 different types of page.
[14:20] <deryck> henninge, I'm wondering about this "deactivate translation imports on a per package basis" card.  Is there a bug for this?
[14:24] <leonardr> allenap or jcsackett, could you take a look at https://code.launchpad.net/~leonardr/lazr.restful/operation-must-be-versioned/+merge/52858?
[14:24] <jtv> And I have one that's particularly relevant to allenap: https://code.launchpad.net/~jtv/launchpad/bug-730460-job-class/+merge/52857
[14:24] <jcsackett> allenap: i'll take leonardr, you take jtv?
[14:24] <allenap> Sounds good.
[14:25] <jcsackett> leonardr: this the one that's been causing you and sinzui so much pain?
[14:26] <henninge> deryck: hm ...
[14:26] <henninge> oh!
[14:26] <henninge> deryck: no, there is not.
[14:27] <henninge> deryck: It came up in a discussion and is something that still needs to be done before the feature can really go live.
[14:27] <deryck> henninge, why do we need per-package disabling?
[14:27] <henninge> deryck: but it should be quite simple
[14:27] <henninge> we just need to find the place in the code where it needs to be done.
[14:27] <deryck> henninge, can't you already disable by changing the template name?
[14:28] <henninge> deryck: once a sourcepackage has translation sharing set up, imports from package uploads must stop.
[14:28] <henninge> deryck: it should only import the template from then on.
[14:28] <henninge> deryck: I don't know what you mean about changing the template name.
[14:28] <henninge> deryck: I will file a bug and explain there.
[14:29] <deryck> henninge, ah!  I get you now.  I misunderstood.
[14:29] <deryck> henninge, yes, please file a bug and tag it with the story.
[14:35] <jml> wgrant: I thought you said PQM was way faster now.
[14:39] <henninge> deryck: bug 732612
[14:39] <deryck> henninge, thanks!
[14:39] <henninge> deryck: updated card
[14:40] <deryck> henninge, thanks, again! :-)
[14:40] <leonardr> jcsackett: no, i'm only working a half day today so i'm not even touching that one
[14:41] <jcsackett> leonardr: sounds wise. :-)
[14:42] <leonardr> oops, forgot to push the latest version
[14:42] <leonardr> it's pushed now
[14:44] <deryck> abentley, bug 719521 is fix released, yes?
[14:44] <jcsackett> leonardr: i gather this branch is part of helping us not randomly bork the 1.0 version of the webservice?
[14:45] <leonardr> jcsackett: exactly
[14:45] <abentley> deryck: I don't think so.
[14:45] <deryck> abentley, no?  The card is in done-done.
[14:45] <abentley> deryck: I can move it back if you'd like :-)
[14:45] <deryck> abentley, heh. well, I don't know. :-)  What's left to do?
[14:46] <abentley> deryck: the code is in place, but the configs need updating and the cron script needs to be configured before unlinking will actually split translations.
[14:47] <deryck> abentley, ok, the cron script is the same for linking, right?
[14:47] <abentley> deryck: right.
[14:47] <jcsackett> leonardr: r=me.
[14:47] <deryck> abentley, ok, so I will close the bug as fix released.  Since the coding is done.  We have a card for the configs.  and the cronscript is an RT away.
[14:48] <leonardr> jcsackett: just in time!
[14:48] <jcsackett> :0()
[14:48] <abentley> deryck: from an end-user perspective, no fix is released, so we could get pushback.
[14:48] <jcsackett> replace that with ":-)"
[14:49] <abentley> deryck: but of course it's your call.
[14:49] <deryck> abentley, thanks, and I can live with that risk.  also, bug 696009 is fix released, too?
[14:50] <jtv> I suppose jml was thinking låünchpäd
[14:50] <abentley> deryck: yes.
[14:50] <deryck> abentley, great, many thanks for the chat.
[14:50] <jml> jtv: huh?
[14:50] <abentley> deryck: np
[14:51] <jtv> jml: wasn't that you?  About sticking umlauts on the name?
[14:51] <jml> jtv: oh, right.
[14:51] <jtv> "It's not bad, for a two-umlaut site"
[14:51] <jml> :D
[14:52] <jtv> Låüñčhpäđ
[14:52] <deryck> abentley, sorry, one more.  bug 706005 is released?
[14:52] <abentley> jtv: lauñçħpad?
[14:53] <jtv> Funny how most of the diacritics I can think to stick on there are actually more or less appropriate…  the first "a" really is like "å," the "n" really sounds like "ñ," and so on.
[14:53] <jtv> abentley: what's the "h-like" letter?
[14:53] <abentley> deryck: yes, but not deployed, like 719521.
[14:54] <jtv> Ah, forgot one… Łåüñčħpäđ
[14:54] <deryck> abentley, gotcha. thanks.
[14:54] <abentley> jtv: No idea, I've just seen it in names.
[14:55] <jtv> You clearly hang out at cooler clubs than I do.
[14:56] <abentley> deryck: should I be using YUI attributes or normal Javascript object properties by default?
[14:56] <jtv> allenap: I'm going to have to call it a day… can we continue the review offline?
[14:56] <jtv> And by "we" I mean you.
[14:57] <allenap> jtv: Sure, no worries. Have a good evening :)
[14:57] <jtv> Thanks. :)
[14:58] <deryck> abentley, YUI attrs.  Since it makes it obvious when you're doing get and set on an attr.
[14:58] <deryck> and not silently adding something that didn't previously exist.
[15:00] <abentley> deryck: coming from python, the risk of silently adding something that didn't previously exist doesn't seem very serious.
[15:02] <deryck> abentley, yeah, maybe it's not in javascript, too.  but behavior is not always clear.  do I check for undefined or null if I want to be sure, for example?  Using YUI attrs offers a consistent API, among other benefits.
[15:06] <deryck> henninge, adeuring -- I filed bug 732633 about your work, and made it team assigned.  please link branches there.
[15:06] <adeuring> ok
[15:06] <jml> wgrant: never mind, I'm behind an lp-production-configs branch.
[15:08] <deryck> gah.  can't save two cards with the same bug id again.  I thought this was fixed.
[15:08] <deryck> adeuring, because of bug ^^ your card lacks the bug link.  sorry.
[15:10] <adeuring> 732633
[15:12] <deryck> abentley, and I made bug 732639 for the js work if you want to link branches there.
[15:13] <abentley> deryck: okay.
[15:19] <bigjools> WOA, stop adding tests to Launchpad.  "13373 tests run..."  :-)
[15:21] <rvba> sinzui: when we were talking about modifying distroseries (to clean up this registrant/owner thing), you said I had two options: migrate the field owner to registrant or add a registrant field that would return the content of the owner field.
[15:21] <rvba> sinzui: At first I thought the most clean way to do this was to migrate the field but after looking at the code I'm not so sure because this object would then differ from most of the others ... any thought on this?
[15:23] <sinzui> rvba: firstly, distroseries and productseries, being series should be the same in this case, I suspect milestones and releases should be the same. 1, the real owner is always the project/distro owner. The creator is the registrant. and has no power. So all four objects use owner when we mean registrant
[15:25] <sinzui> ^ well the 'firstly' and '1' never got answered in that sentence
[15:35] <rvba> sinzui: should I be waiting for a secondly or a 2. then :-) ?
[15:35] <sinzui> sorry, did I miss a message? I had display issues so I left the channel for a moment
[15:37] <sinzui> rvba I think the distroseries issue is larger than you started with. You are looking at a group of objects that could have different owners in 2005, but, by 2007, the project/distro became the owner in the permissions rules, so we reused the owner field as the registrant
[15:38] <rvba> sinzui: so I guess your advice is to fix the interface then
[15:38] <rvba> creating a registrant field returning the content of owner
[15:39] <rvba> sinzui: or do you think I should engage in refactoring the schemas to create a proper registrant field
[15:39] <sinzui> rvba: storm does not require that the field name be the same as the db column. Several objects differ. The simple fix might be to rename the owner => registrant in the interface and model for the objects, and ensure that column=owner
[15:40]  * sinzui looks for example
[15:40] <rvba> sinzui: I get it, seems like a very simple fix
[15:40] <sinzui> yes. I might have done it years ago if I had this conversation.
[15:41] <rvba> once this is done, it's really nothing to migrate the data properly and rename the field in the database for consistency
[15:42] <rvba> well, I guess :-)
[15:44] <sinzui> great, the models look good they all specify dbname="owner". rvba: I think you can change the interface and models attribute names owner => registrant of distroseries, productseries, and productrelease
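(Editorial sketch, not part of the log.) The fix sinzui describes relies on the model attribute name being independent of the database column name. The plain-Python sketch below illustrates that idea with a hand-written descriptor; it is a stand-in and deliberately does not use the Storm or SQLObject APIs. The `Column` class, its `dbname` argument, and the `_row` dict are hypothetical names chosen to echo the `dbname="owner"` remark above.

```python
class Column:
    """Descriptor that stores an attribute under a differently named column."""

    def __init__(self, dbname):
        self.dbname = dbname

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._row[self.dbname]

    def __set__(self, obj, value):
        obj._row[self.dbname] = value


class DistroSeries:
    # Code says "registrant"; the stored column is still called "owner".
    registrant = Column(dbname="owner")

    def __init__(self, registrant):
        self._row = {}  # stands in for one database row
        self.registrant = registrant
```

With this shape, callsites and templates switch to `registrant` while the stored data stays where it is, so the column rename can happen later or not at all.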
[15:44] <abentley> henninge: do you expect the sharing details page to control translation permissions or translation group?
[15:45] <rvba> sinzui: all right ... so I think it's most clean to migrate the db column as well
[15:45] <sinzui> rvba: So you will need to update all the callsites and templates, but since this information is not very important, you will not need to change many.
[15:46] <henninge> abentley: we need to set the permission to "Closed" if the project is unmaintained.
[15:46] <sinzui> rvba: you certainly can change the column name
[15:46] <rvba> sinzui: ok, that is pretty much what I did for the distributions so I think I'll manage thx
[15:46] <abentley> henninge: how would we do that from the sharing details page?
[15:47] <rvba> sinzui: you'll get to be the reviewer on this though :)
[15:48] <sinzui> okay
[15:49] <henninge> abentley: actually, thinking about it, I am not sure that we need to do it through ajax.
[15:49] <rvba> sinzui: thanks a lot
[15:49] <henninge> abentley: it's just that projects are created with "OPEN" permissions by default and that needs to be changed if the project is not using Launchpad.
[15:50] <sinzui> I am happy to help
[15:50] <abentley> henninge: so if a project is unmaintained, then it's owned by Registry Admins, and the logged-in user probably can't change the setting.
[15:50] <henninge> abentley: so it's "if product.translation_usage != LAUNCHPAD: permission =CLOSED"
[15:50] <henninge> true.
[15:51] <henninge> so it's probably something that we need to change in the product creation itself.
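(Editorial sketch, not part of the log.) henninge's one-line rule, applied at product creation time, might look roughly like this in Python. The enum classes and member values here are simplified stand-ins invented for the example and are not Launchpad's real definitions; only the rule itself (CLOSED unless translations are done in Launchpad, OPEN being the current default) comes from the conversation.

```python
from enum import Enum


class ServiceUsage(Enum):
    # Hypothetical stand-in for the product's translation_usage values.
    UNKNOWN = "unknown"
    LAUNCHPAD = "launchpad"
    EXTERNAL = "external"


class TranslationPermission(Enum):
    # Hypothetical stand-in; only the two values mentioned in the log.
    OPEN = "open"
    CLOSED = "closed"


def default_translation_permission(translation_usage):
    """Pick the permission a newly created product should start with."""
    if translation_usage is not ServiceUsage.LAUNCHPAD:
        return TranslationPermission.CLOSED
    return TranslationPermission.OPEN
```

Setting the value at creation sidesteps the ownership problem raised above, since no later edit by a user who may lack permission is needed.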
[15:51] <sinzui> has leonardr been about? I really hope not
[15:51] <abentley> henninge: oh dear.  You add the card :-P
[15:52] <henninge> sinzui: did anybody ever answer your question about reviewing translation imports?
[15:52] <henninge> abentley: gee, thanks ;)
[15:53] <sinzui> abentley: henninge about the previous conversation. Users often ask for the project back or ask me to change settings once they realise that giving the project to ~registry excludes them from configuring it...
[15:54] <henninge> maybe we can do it in a way that it is owned by the creating user until all is done?
[15:55] <sinzui> abentley: henninge: I proposed that we let anyone or trusted users be permitted to configure unconfigured projects or those owned by ~registry. The conversation went into audit logs and new showing permissions in the UI. So that was way out of scope. I stopped working on the bug
[15:55] <jam> well, I at least managed to track down the cause of one of my critical failures today (bug #732481). Always funny when releasing code X triggers a bug in code Y.
[15:56] <jam> anyway, EOD for now. Maybe I'll be back on to finish it later tonight.
[15:56] <sinzui> henninge: as to my question, no, it was not answered. I am trying to answer the translations questions, but I am very slow at providing answers
[15:56] <jam> mthaddon: if you are still around, can you post what you did to make HAPROXY use a GET request instead of a HEAD request, we probably want to revert it once we get bug #732481 fixed.
[15:56] <henninge> sinzui: ok, let me see if I can shed some light on that
[15:57] <abentley> henninge: If we did that, it would cover the case where the upstream project is new.
[15:58] <henninge> abentley: yes, that is the one I am mostly thinking of. In the other case they'd need the help of the project's owner.
[15:58] <abentley> henninge: which might be ~registry.
[15:58] <henninge> oh
[15:58] <sinzui> abentley: henninge: we did want project registration to be guided, take the user through all the configuration screens. Such a process should make the ~registry the owner at the end
[15:59] <henninge> abentley: in that case we'd have to make the current user the owner temporarily.
[16:00] <henninge> but how to know when "temporarily" ends?
[16:00] <abentley> henninge: I don't know.
[16:01] <abentley> henninge: we should be able to write a script to ensure all the ~registry-owned projects already have the right settings.
[16:15] <bigjools> gary_poster: if a tarball in dists is not mentioned in versions.cfg, I can completely remove it right?  CherryPy has a zip file but is not in versions.cfg.
[16:16] <gary_poster> bigjools, that *should* be true, but fwiw, versions.cfg is about enforcing the versions, not about enforcing their presence or absence.  One sec, there is an easy way to doublecheck...
[16:18] <gary_poster> bigjools, there may be a nicer way, but ./bin/buildout -vvv | grep 'Getting required' will get you a list of everything that buildout wants.  you could grep the same output for CherryPy if you like
[16:19] <bigjools> excellent, thanks
[16:19] <gary_poster> and fwiw, no, we don't use it
[16:19] <gary_poster> (having just tasted my own medicine)
[16:19] <bigjools> :)
[16:20] <bigjools> I was gonna leave 2 versions of everything "just in case", but bugger it, I am going to be ruthless.
[16:20] <bigjools> bzr is used for a reason
[16:20] <gary_poster> :-) being ruthless will require being careful, but I admire the idea
[16:21] <gary_poster> and now, just after a rollout, is a good time to do it
[16:21] <bigjools> as long as both db-devel and trunk build locally, I'm happy
[16:21] <bigjools> exactly!
[16:22] <bigjools> gary_poster: here's a weird one - Jinja 2.2 is required in versions.cfg but it doesn't exist
[16:23] <bigjools> Jinja2 even
[16:23] <bigjools> oh bugger
[16:23] <bigjools> yes it does :)
[16:23] <gary_poster> bigjools, as I said, versions.cfg is just about enforcing version numbers.  presence or absence is a dependency tree based on setup.py declarations.  So it sounds like you can remove it from versions.cfg.  buildout will complain if it doesn
[16:24] <gary_poster> oh well there you go :-)
[16:24] <bigjools> 2.4.1 and 2.5.5 are there though, I wonder why they're not used
[16:24] <gary_poster> someone was unduly optimistic, I'm guessing :-)
[16:25] <bigjools> very likekly :)
[16:26] <bigjools> and other words that are spelled right
[16:26] <gary_poster> heh
[16:32] <sinzui> henninge: should ~rosetta-admins really own translation teams? I see from many teams listed on the +subscribe page that I own https://launchpad.net/gedit-plugins/+subscribe
[16:33] <sinzui> henninge: When I see odd teams like this, they are often because someone made ~registry own a team, which is a no-no. So I either delete the team or make someone else an owner.
[16:34] <henninge> sinzui: no, I don't think we should be the owner.
[16:34] <henninge> they should be owned by someone from that team
[16:35] <sinzui> okay, I will put that on my todo list
[16:35] <sinzui> thank your
[16:35] <sinzui> thank you!
[16:36] <jml> who has filed the most bugs that are still open? http://paste.ubuntu.com/578421/ (yeah, I'm easily distractable today)
[16:37] <sinzui> before I look, I think it is mpt
[16:37] <henninge> sinzui: you win
[16:37] <LPCIBot> Project windmill build #34: STILL FAILING in 1 hr 9 min: https://hudson.wedontsleep.org/job/windmill/34/
[16:49] <rvba> sinzui: """mars 08 16:46:20 <sinzui> It is not possible BTW to change a series owner. We removed the field from edit forms""" ... but the /ubuntu/hoary/+reassign is still there (and tested) ... is there something I don't understand or is this wrong?
[16:53] <bigjools> jcsackett, the difference is that the upload processor is the lp_queue user and the ftp server lp_upload
[16:53] <maxb> So basically we are questioning what exactly getUtility(IGPGHandler).getVerifiedSignatureResilient() verifies
[16:53] <jcsackett> but you earlier had the gpg conf copied from one to the other, right? so our simplest solution is hosed? :-P
[16:54] <bigjools> jcsackett: lp_upload had no gpg.conf at all
[16:54] <bigjools> we copied lp_queue's
[16:54] <bigjools> maxb: yes
[17:01] <jcsackett> okay, i don't see anything obvious in getVerifiedSignature. there's a fair bit of cruft in there XXX notice-wise, but nothing obviously weird.
[17:02] <bigjools> in lib/lp/archiveuploader/dscfile.py look for its usage in there - it's no different to the ftp server
[17:02]  * bigjools scratches head
[17:02] <adeuring> abentley: how can I figure out if a translation sharing job is pending for a source package?
[17:03] <bigjools> sinzui: can you think of anything that would make IGPGHandler.getVerifiedSignatureResilient return an error in one account but not another?
[17:03] <maxb> bigjools: It couldn't possibly be simply a difference in where stderr points in the two processes, could it?
[17:04] <bigjools> maxb: maybe, why would that have an effect?
[17:04] <maxb> The only problem here is an unnecessary warning, right?
[17:04] <maxb> Could that just be harmlessly disappearing into an ignored logfile in the upload processor?
[17:05] <abentley> adeuring: You'll need to write a new query.
[17:05] <adeuring> abentley: ok... so, other question: where are these jobs created? (I need a straing point to look around ;)
[17:05] <adeuring> ...starting point..
[17:06] <bigjools> maxb: the stderr goes to the log in both, AFAIK
[17:06] <abentley> adeuring: Are you sure you don't want me to point out similar queries instead?
[17:06] <adeuring> abentley: well, that would be fine too :)
[17:07] <maxb> hrm. So I tried running a verify locally in an interactive python interpreter and got none of that output
[17:08] <bigjools> maxb: the pastebin output is from dput though
[17:08] <abentley> adeuring: see registry.model.packagingjob.PackagingJob.iterReady() and translations.model.translationpackagingjob.TranslationPackagingJob.iterReady for inspiration.
[17:08] <bigjools> on the server side, it needs to see getVerifiedSignatureResilient throwing exceptions to return errors down the ftp session
[17:09] <adeuring> abentley: thanks
[17:09] <maxb> *blink*
[17:09] <maxb> bigjools: Oh, I've completely misunderstood the issue :-)
[17:09] <bigjools> maxb: yeah :)
[17:09] <abentley> adeuring: But instead of checking whether the jobs are ready_jobs, you want to check whether their status is "pending".
[17:09] <adeuring> ok
[17:10] <abentley> adeuring: Actually, it's JobStatus.WAITING
[17:11] <adeuring> thanks
[17:11] <maxb> bigjools: gah, ok. So we need to trap whatever the exception is that is supposedly bubbling out of PoppyFileWriter.close() and breaking things?
[17:11] <bigjools> maxb: yup.
[17:11] <abentley> adeuring: you probably also want to look for running jobs.
[17:11] <maxb> And that's not being logged anywhere useful at all currently?
[17:12] <adeuring> right
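
A minimal sketch of the check abentley describes above, assuming a hypothetical in-memory list of jobs rather than the real Launchpad query. Only JobStatus.WAITING is named in the discussion; the other enum members, the Job tuple, and has_unfinished_job are illustrative names, not Launchpad's API.

```python
from collections import namedtuple
from enum import Enum


class JobStatus(Enum):
    # WAITING is named in the discussion; the other members are assumed.
    WAITING = 0
    RUNNING = 1
    COMPLETED = 2


# Hypothetical stand-in for a translation sharing job record.
Job = namedtuple("Job", ["sourcepackage", "status"])


def has_unfinished_job(jobs, sourcepackage):
    """Return True if a waiting or running job exists for the package."""
    unfinished = {JobStatus.WAITING, JobStatus.RUNNING}
    return any(
        job.sourcepackage == sourcepackage and job.status in unfinished
        for job in jobs
    )
```

Checking both states covers the case abentley raises last: a job that has already started running is no longer waiting but has not finished either.
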
[17:12] <bigjools> trying to get to the logs right now, they're a bit out of date...
[17:12] <sinzui> jcsackett: bigjools: the keyserver issue I was looking at was that keys from keyserver.ubuntu.com were taking 24 hours to get to canonical's keyserver
[17:12] <jcsackett> dig. not related.
[17:12] <bigjools> yeah
[17:15] <sinzui> rvba, mars was confused at the time. remember that the owner field on series has no power, users think it does so they want to change it. It is really the registrant. Some users think the registrant has power, but it does not. We should never let users edit the historical record of who registered it.
[17:16] <rvba> sinzui: I'm sorry "mars = march" in french, this was a quote from what you told me yesterday.
[17:16] <sinzui> rvba: just in case you ask, the driver of a series is called a release manager. That user can edit the series and can create milestones. The project owner delegates power over the series to the RM so that the user/team can accomplish their task
[17:17] <rvba> sinzui: so for the distroseries, productseries, and productrelease it should not be possible to reassign them right?
[17:18] <sinzui> correct
[17:18] <rvba> sinzui: all right ... I'll have a few tests to refactor then ;-) ...
[17:19] <rvba> I meant I'll have _quite_ a few tests to refactor
[17:23] <sinzui> rvba: do we have tests that show changing the owner of a series?
[17:24] <rvba> sinzui: ./lib/lp/registry/stories/distroseries/xx-reassign-distroseries.txt
[17:25]  * sinzui looks
[17:25] <sinzui> rvba: that ancient test can be deleted
[17:26] <rvba> sinzui: so I figured if series don't have an owner ;-)
[17:28] <sinzui> rvba: a story test is a user acceptance test. The narrative is written from the user perspective and shows a simple path to accomplish a task. That test does not state who the user is or what his task is, or show how he accomplishes it and how he knows he is done. When you see stories like this, or testing error conditions, there is a good chance you can delete instead of refactor
[17:29] <rvba> ok
[17:31] <jml> allenap: could you please take a look at https://code.launchpad.net/~jml/launchpad/reported-by-me-121646/+merge/51148
[17:31] <jml> allenap: would appreciate someone familiar w/ bugs taking a look at it.
[17:31] <rvba> sinzui: I'll have to go someplace now ... but I think I'm ok to continue, I've made the structural changes (interface, model, sql patch) and then I correct things bit by bit as tests fail. Good to know that I can delete tests also ;-).
[17:32] <sinzui> Have a good evening
[17:34] <rvba> thanks a lot for your support. I think I'll have something for you to review when you log in tomorrow.
[17:39] <bigjools> sinzui: hi, did you get my email about the ftp server bug?
[17:54] <sinzui> bigjools: I did and I subscribed to the bug
[17:55] <bigjools> sinzui: great, not sure out of you and tim who's best suited to fix it
[17:55] <sinzui> we will talk about it in a few hours
[17:55] <bigjools> something is arsed up with the logging which causes gpg verification to fail
[17:55] <bigjools> jml might be able to help
[19:16] <jml> what
[19:16] <jml> I don't help, I *strategize*
[19:16] <lifeless> thats very strategic of you
[19:17] <jml> lifeless: hi
[19:17] <lifeless> ola
[19:17] <jml> there are a bunch of critical bugs for loggerhead that are fix committed
[19:18] <jml> what has to happen to get them fix released?
[19:18] <lifeless> I'm not sure
[19:18] <jml> hmm.
[19:18] <lifeless> we could do what we do with lp
[19:18] <lifeless> and say 'once lp has deployed it, we're done'
[19:19] <lifeless> encourage the packagers of loggerhead to package trunk
[19:19] <lifeless> or we could do what we do with e.g. lazr.restful
[19:19] <lifeless> and cut releases regularly
[19:21] <jml> lifeless: we have a regular release for lazr.restful?
[19:21] <lifeless> jml: we make a release when we fix something
[19:21] <jml> oh right.
[19:22] <Ronnie> if someone files a private bug to a project where im administrator of, why cant i view the bug?
[19:22] <lifeless> Ronnie: are you the bug supervisor as well ?
[19:22] <Ronnie> lifeless: how can i see that?
[19:23] <lifeless> its on the front page for the project IIRC
[19:24] <Ronnie> lifeless: the project-group where im admin of, is maintainer and driver
[19:24] <lifeless> Ronnie: whats the project group and project in question
[19:24] <Ronnie> https://staging.launchpad.net/ubuntu-nl-artwork
[19:24] <jcsackett> sinzui: i think you and i got irritated by pqm at the same time.
[19:25] <Ronnie> and bug: https://api.staging.launchpad.net/1.0/bugs/728920
[19:25] <Ronnie> the bug is created with a script we're testing
[19:27] <sinzui> jcsackett: have you submitted a fix
[19:28] <lifeless> Ronnie: https://bugs.staging.launchpad.net/ubuntu-nl-artwork
[19:28] <jcsackett> sinzui: not yet submitted, but https://code.launchpad.net/~jcsackett/launchpad/resolve-conflicts has no text conflicts and the conflicted test file passes.
[19:28] <lifeless> Ronnie: see the bug supervisor is a team, not you
[19:29] <jcsackett> i'm not 100% certain it's a good resolution, as i wasn't entirely certain what some of the changes were doing.
[19:29] <lifeless> Ronnie: have you added yourself to the team ?
[19:29] <Ronnie> https://staging.launchpad.net/~ubuntu-nl-artwork/+members (name=Ronnie)
[19:29] <sinzui> jcsackett: if db-devel does not conflict with stable, someone posted a fix without saying so
[19:30] <jml> oh
[19:30] <Ronnie> im also the owner of the team
[19:30] <jml> I'm fixing that too
[19:30] <lifeless> Ronnie: I know it shows you as an admin, but https://staging.launchpad.net/~ronnie.vd.c/+participation
[19:30] <jcsackett> sinzui: no, there were conflicts when i merged stable into a checkout of db-devel.
[19:30] <jml> jcsackett: I'll look
[19:30] <jcsackett> jml: you mean pqm conflicts?
[19:30] <lifeless> Ronnie: you may not actually be in the team
[19:30] <jml> jcsackett: yeah. I'll take a look at your branch.
[19:30] <lifeless> Ronnie: can other folk in that team see the bug ?
[19:30] <jcsackett> jml: sinzui: okay. it's the garbo file i'm not sure i cleaned up right, but again, test_garbo passes.
[19:31] <jml> I got distracted while make schema was running
[19:31] <jml> jcsackett: yeah. I'll check to see if your fixes match mine
[19:31] <jcsackett> jml: cool.
[19:31] <jml> jcsackett: did you just union the changes?
[19:31] <jcsackett> jml: yeah, i just whacked all the conflict gunk.
[19:31] <jcsackett> most naive fix possible, but tests pass, so...
[19:31] <jml> jcsackett: ok. that's what I did. And, as you say, the tests pass.
[19:31] <sinzui> jcsackett: sorry for misunderstanding. I am struggling to concentrate this week. Too many issues in my head
[19:32] <jml> jcsackett: go ahead and land it, r=jml
[19:32] <jcsackett> jml: dig.
[19:32] <jcsackett> jml: skip ec2?
[19:32] <jml> jcsackett: I would.
[19:32] <jml> jcsackett: thanks for resolving it, those mails are a pain.
[19:33] <sinzui> jcsackett: yes, submit to pqm
[19:34] <sinzui> okay, I will now fix leonardr's bugtask branch
[19:34] <lifeless> http://webnumbr.com/.join(launchpad-oops-bugs.all,launchpad-timeout-bugs.all,launchpad-critical-bugs.all)
[19:36] <sinzui> \o/ no critical bugs
[19:36]  * sinzui opens the champagne.
[19:36] <lifeless> sinzui: heh, not yet :)
[19:37] <Ronnie> lifeless: another member of that team cant access the bug
[19:38] <lifeless> ok
[19:38] <lifeless> then its probably your script thats at fault
[19:38] <lifeless> the rules for visibility on private bugs are:
[19:38] <lifeless>  - if you are subscribed, you can see it
[19:38] <lifeless>  - if you aren't, you can't.
[19:39] <lifeless> the default for private bugs filed through the web ui is to subscribe the security team or failing that the bug maintenance team
[19:39] <Ronnie> new_bug = launchpad.bugs.createBug(title=mail['Subject'], description=message, private=True, target=forum_project, tags=tags)
[19:40] <Ronnie> lifeless: thats the line that creates the bug
[19:40] <sinzui> lifeless: Ronnie: has the bug supervisor changed? Lp really does not let you change it once set. Changing the bug supervisor will not change the permission on existing bugs. eg, the old supervisor still has access, and the new one does not
[19:40] <sinzui> This scenario is a common way users shoot themselves in the foot
[19:41] <Ronnie> sinzui: i didnt change the team settings for months now
[19:41] <Ronnie> and we're testing the script today
[19:41] <lifeless> so first step
[19:41] <lifeless> I'll file a bug for you manually
[19:41] <lifeless> see if you can see it
[19:42] <lifeless> bug 728921
[19:42] <Ronnie> lifeless: nope, private
[19:43] <Ronnie> ow wait, i used the api link, moment
[19:43] <Ronnie> lifeless: i can see the bug you made
[19:43] <lifeless> great
[19:43] <lifeless> so its the api made bug not having subscribers
[19:48] <sinzui> really! surely such an issue would have been reported years ago
[19:48]  * sinzui looks
[19:50] <sinzui> Ronnie: lifeless bug 398846 says it is low. I think that is insane since you can make it private at the same time
[19:50] <sinzui> lifeless: High at least, do you think it is critical
[19:50] <sinzui> https://bugs.launchpad.net/launchpad/+bug/398846
[19:51] <sinzui> I see I glanced at the problem from a permissions perspective
[19:53] <Ronnie> is there some way to create the bug non private, add a bug supervisor and then make the bug private?
[19:55] <lifeless> Ronnie: the usercode running the script can subscribe the security team
[19:57] <lifeless> I've commented on the bug
[19:58] <lifeless> bug.subscribe(person=yoursecurityteam)
[19:58] <lifeless> you can probably get the security team from the api
[19:58] <Ronnie> lifeless: ill try
[20:04] <allenap> jcsackett: Do you need a mentored review for jml's branch from earlier?
[20:05] <jcsackett> allenap: no, i'm a graduate. :-)
[20:05] <allenap> jcsackett: Woohoo :)
[20:07] <jml> allenap: it's already in EC2, but I would appreciate a quick glance if you've got the time
[20:07] <allenap> jml: Sure.
[20:07] <jml> allenap: mostly because I'm not familiar with searchTasks and maybe there are booby traps.
[20:11] <lifeless> jml: jcsackett: before you land that patch
[20:11] <lifeless> are you aware you are going to cause timeouts ?
[20:11] <allenap> lifeless: The beauty of that portlet is that few people will notice ;)
[20:12] <lifeless> allenap: /everyone/ will notice
[20:12] <jml> lifeless: no, I'm not.
[20:12] <lifeless> allenap: and we should care deeply about it
[20:13] <jml> lifeless: why is it going to cause timeouts?
[20:13] <lifeless> jml: because 'reported by me' is an expensive query that times out regularly on Person:+bugs
[20:13] <allenap> lifeless: I'm being silly. But in this case, the portlet with links is rendered in the initial request, and the stats are calculated and slotted into place after the fact. So, only the people who read the numbers will notice. But, with my serious hat on, yes, we should care.
[20:14] <lifeless> allenap: folk will notice, and think less of LP; folk will notice and ask in #launchpad; we'll have added another bug to the critical pile we're trying to reduce.
[20:14] <jml> lifeless: the search never times out
[20:14] <jml> lifeless: at least, not for me.
[20:15] <lifeless> bug 421901
[20:15] <lifeless> jml: you're adding the overhead of the search to the overhead of the bug counts portlet
[20:16] <jml> lifeless: it's a different query
[20:18] <Ronnie> lifeless: that worked:    new_bug.subscribe(person=project.owner_link)
[20:19] <jcsackett> \o/ pqm success.
[20:19] <lifeless> Ronnie: cool
[20:19] <Ronnie> thx all
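
A minimal sketch of the workaround Ronnie ends up with, combining the createBug line and the subscribe call quoted above. The helper name file_private_bug and its parameters are hypothetical; the launchpad and project arguments are assumed to be launchpadlib objects obtained elsewhere, and only the createBug and subscribe calls shown in the conversation are used.

```python
def file_private_bug(launchpad, project, title, description, tags):
    """Create a private bug, then subscribe the project owner to it."""
    new_bug = launchpad.bugs.createBug(
        title=title,
        description=description,
        private=True,
        target=project,
        tags=tags,
    )
    # Per the discussion above, a private bug created through the API has
    # no subscribers, so nobody in the project can see it until one is added.
    new_bug.subscribe(person=project.owner_link)
    return new_bug
```

Subscribing project.owner_link is what worked for Ronnie; lifeless's earlier suggestion was to subscribe the security team instead, which would be a one-line change to the person argument.
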
[20:20] <lifeless> jml: reported by in ubuntu is up to 4 seconds of overhead.
[20:20] <jml> well that was a huge waste of time
[20:21] <lifeless> jml: you've added 14 bugs in ubuntu, so for you the query will be fast
[20:21] <lifeless> at the other end of the scale, keybuk has filed nearly 2000 bugs in ubuntu
[20:21] <lifeless> jml: I may be wrong
[20:22] <lifeless> jml: we will need to qa carefully though - particularly by taking over canonical-scott on qastaging and making sure that the portlet doesn't time out
[20:22] <lifeless>  bug 711071
[20:22] <lifeless> is the existing bug about the portlet timing out
[20:22] <jml> too late. I've cancelled the branch. I've got too much on at the moment to go through that for what was an opportunistic bug fix.
[20:23] <lifeless> jml: I understand
[20:23] <lifeless> jml: to help you assess things in future - if you add work to a page, assume its nontrivial.
[20:24] <lifeless> jml: that includes adding links to person objects we don't already show, aggregate and non aggregate stats, tables, and branding
[20:26] <lifeless> jcsackett: ^ when reviewing - you need to also think about the performance impact of a change : not /really really deeply/ - but just : 'has performance been considered? If not, what could go wrong?'
[20:27] <jcsackett> lifeless: yeah, i saw searchTasks and thought about it, but didn't think that was an expensive query.
[20:27] <jcsackett> i'll remember next time.
[20:27] <lifeless> jcsackett: 4 seconds worst case isn't /terrible/
[20:27] <lifeless> jcsackett: but the portlet in the distro context is 10 seconds already.
[20:27] <allenap> jml: Is it worth leaving the link in, without the stats?
[20:27] <lifeless> 10+4 ...
[20:27]  * jcsackett nods.
[20:27] <jcsackett> yeah, that's hitting our timeout threshold.
[20:27] <jcsackett> not to mention sort of sucking. :-P
[20:28] <lifeless> its a shame that we have things like the portlet which are already so slow
[20:28] <lifeless> we're making progress though
[20:29] <jml> allenap: for me, it would be an incremental improvement over having to type my nick in on the advanced search page every time I want to do it
[20:30] <allenap> lifeless: The non-personal bug counts could be cached quite readily, and refreshed out of band.
[20:30] <jml> allenap: but visually, it would look like a bug
[20:30] <jml> allenap: like we forgot to add the count.
[20:30] <lifeless> allenap: indeed, as long as we inject them rather than doing on-miss
[20:30] <lifeless> allenap: -but- we can make it massively faster just using aggregate queries
[20:30] <lifeless> allenap: see what I did for series + tag counts
[20:30] <allenap> jml: Yeah, that's a good point.
[20:31] <lifeless> allenap: we should make the miss case faster before going out of band: its more efficient for when we do need out of band / denormalisation in the future
[20:31] <jml> lifeless: when we lower the timeout, do we automatically add exceptions for the pages that are still over it?
[20:32] <lifeless> jml: if we have things go completely out of whack
[20:32] <lifeless> jml: we don't default to doing that
[20:32] <jml> lifeless: hmm ok.
[20:33] <jml> lifeless: the kernel team remonstrated with me about how our timeout lowering is blocking them from doing critical things
[20:34] <lifeless> possibly https://bugs.launchpad.net/launchpad/+bug/732398 ?
[20:35] <lifeless> which wgrant is landing a branch for today
[20:35] <jml> lifeless: maybe. they are mostly from scripts they run, I gather.
[20:35] <lifeless> if they have a particular thing that has /stopped/ working they should pop into #launchpad and we'll see what we can do
[20:36] <jml> lifeless: they feel quite strongly that it is wrong to stop a fraction of users from doing their work so we can lower our timeouts.
[20:38] <lifeless> that isn't the intention, and if they come talk to us when a particular thing stops working, we'll look and see whats up
[20:38] <lifeless> yesterdays oops report shows 531 timeouts
[20:38] <lifeless> and 6.3Million non-monitoring web pages served
[20:39] <lifeless> thats 0.009% failure rate
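
The arithmetic behind that figure can be checked directly from the two numbers lifeless quotes (531 timeouts, 6.3 million pages). This is only a worked calculation with illustrative variable names; the result comes out at about 0.0084%, the same order as the rounded 0.009% given above.

```python
timeouts = 531
pages_served = 6_300_000

# Share of served pages that ended in a timeout, as a percentage.
failure_rate_percent = 100 * timeouts / pages_served
print(f"{failure_rate_percent:.4f}%")  # prints 0.0084%
```
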
[20:40] <jml> lifeless: that includes API calls?
[20:41] <lifeless> yes
[20:41] <jml> lifeless: hmm. so either they are very unlucky or were speaking from stale data.
[20:41] <lifeless> the page ids with # in them
[20:41] <lifeless> https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=timeout
[20:41] <lifeless> are api calls
[20:42] <lifeless> jml: they need to come communicate when the problem is happening. its /cheap/ for us to workaround *if* its a simple 'this is on the edge'
[20:42] <lifeless> jml: in the past things have been taking 30+ seconds to complete.
[20:42] <jml> lifeless: OK. I'll pass that on.
[20:42] <lifeless> jml: which -regretfully- there is no sensible way for us to work around
[20:42] <Ronnie> lifeless: another question, which characters are allowed for tags?
[20:42] <jml> lifeless: they are working on a two week cadence now
[20:43] <lifeless> jml: I hold the failure rate below 0.05%
[20:43] <jml> lifeless: if they do come to us for help and you aren't around, is it clear what we should do?
[20:43] <lifeless> Ronnie: lowercase
[20:43] <Ronnie> lowercase and dash
[20:44] <lifeless> jml: the CHR should evaluate the problem and decide what to do
[20:44] <lifeless> jml: I don't think there is a cut and dried precanned answer
[20:44] <lifeless> jml: in previous discussions with e.g. andy, they have not considered the costs of us leaving the timeout high
[20:44] <jml> lifeless: ok. so that's a "no", at least for the moment.
[20:45] <lifeless> jml: I have pretty huge overlap with the bulk of the kernel team - them being mostly US-based AFAIK
[20:45] <jml> lifeless: yeah. it's understandable. having the timeout high is a cost borne by the commons
[20:45] <lifeless> jml: and by then - slower limits = choppier queuing = more latency on their own requests
[20:45] <lifeless> s/then/them/
[20:48] <jml> lifeless: yeah, tragedy of the commons.
[20:49] <lifeless> jml: I've said in previous mails to the list that I'm happy for any dev to request a timeout exception - and for losas to Just Do Them
[20:50] <lifeless> jml: all I ask is that it be <= 20 seconds, and there must be a critical timeout bug associated with it.
[20:51] <jml> lifeless: ok, thanks.
[20:51] <lifeless> I'm happy for losas to add them off their own bat as well, of course
[20:51] <jml> lifeless: fwiw, apw reported that the site feels faster in the morning.
[20:51] <lifeless> its very much a 'look after the site' kindof thing - exactly what ops should be doing
[20:51] <lifeless> jml: naturally, we deployed nearly a weeks worth of timeout fixes
[20:52] <lifeless> jml: its the downtick on http://webnumbr.com/.join(launchpad-oops-bugs.all,launchpad-timeout-bugs.all,launchpad-critical-bugs.all)
[20:52] <jml> lifeless: not quite what I meant. During mornings in general.
[20:52] <lifeless> oh
[20:52] <jml> lifeless: before the US awakes.
[20:52] <lifeless> thats probably in dc queuing
[20:52] <jml> lifeless: that was my guess.
[20:52] <lifeless> we are out of the terrible phase there
[20:52] <lifeless> but still have a ways to go
[20:52] <jml> (insofar as anecdata needs explanations)
[20:52] <lifeless> we're bringing up 50 more appserver instances
[20:53] <lifeless> probably not next week, as we're still a losa short
[20:53] <lifeless> maybe even two
[20:53] <jcsackett> jml: i officially love the word "anecdata"
[20:54] <jml> jcsackett: it's one of my favourites.
[20:55] <lifeless> thumper: Revision 12563 can not be deployed: needstesting
[20:55] <thumper> lifeless: which one is that?
[20:56] <lifeless> Strip any path information off email attachments when storing in the librarian
[20:56] <thumper> lifeless: I thought I marked that as untestable
[20:56] <thumper> https://bugs.launchpad.net/launchpad/+bugs?field.tag=qa-needstesting doesn't show it
[20:57] <lifeless> thumper: ah, just recently
[20:58]  * lifeless refreshes the report
[20:58] <thumper> lifeless: well... in the last half hour
[20:58] <lifeless> thumper: yeah, I'm just seeing if we can fix things for oem
[20:58] <thumper> 9:17 according to the email
[20:59] <sinzui> thumper: abentley: I am looking at https://answers.launchpad.net/launchpad/+question/148614 . I have seen this before. I think the svn repo is corrupt and we cannot complete the import. I do not see this kind of problem described in https://help.launchpad.net/VcsImportRequests
[21:00] <thumper> sinzui: oh god
[21:01] <thumper> sinzui: get them to request another import
[21:01] <thumper> sinzui: as this one uses CSCVS
[21:01] <thumper> sinzui: bzr-svn is much more robust
[21:01] <thumper> that'll most likely fix it
[21:01] <sinzui> oh, that's right. I was surprised to see cscvs but did not think getting a new one was the right thing to do
[21:01] <sinzui> noted
[21:01] <sinzui> thanks thumper
[21:07] <wgrant> lifeless: Does garbo-hourly run regularly on qas?
[21:08] <lifeless> wgrant: we did start it running, I don't know if its still running
[21:08] <lifeless> wgrant: check with losa
[21:09] <abentley> sinzui: you can simulate getting a new one using bzr-svn on your local machine, if you like.
[21:09] <sinzui> abentley: good to know
[21:10] <wgrant> lifeless: Can you do a quick 'SELECT COUNT(*) FROM sourcepackagerelease WHERE changelog IS NULL' on qas, so we can check?
[21:10] <lifeless> not exactly quick.
[21:11] <wgrant> Really?
[21:11] <lifeless> 586610
[21:11] <wgrant> Thanks.
[21:15] <wgrant> This seems like it deserves an incident-level response...
[21:15] <LPCIBot> Project windmill build #35: STILL FAILING in 1 hr 14 min: https://hudson.wedontsleep.org/job/windmill/35/
[21:15] <wgrant> Particularly since we can't fix it today.
[21:15] <wgrant> Because ELOSA.
[21:16] <lifeless> wgrant: I agree
[21:16] <wgrant> This should have been a drop-everything situation 8 hours ago :/
[21:16] <wgrant> I realise that we have no EU maintenance teams, but...
[21:18] <lifeless> benji: I filled out the deploy report
[21:18] <benji> thanks!
[21:55] <jml> huwshimi: hi
[21:55] <huwshimi> jml: Hey there. Sorry I got distracted :(
[21:56] <jml> huwshimi: np
[21:56] <jml> huwshimi: skype?
[21:56] <huwshimi> jml: yes
[21:59] <jml> huwshimi: http://pastebin.ubuntu.com/578349/
[22:04] <LPCIBot> Project windmill build #36: STILL FAILING in 48 min: https://hudson.wedontsleep.org/job/windmill/36/
[22:04] <wgrant> wallyworld: Hi.
[22:05] <wallyworld> hello
[22:05] <wgrant> wallyworld: Bug #732442
[22:05] <wgrant> That's the Windmill failure.
[22:05] <wgrant> It's a real bug, but doesn't affect production data.
[22:06] <wgrant> wallyworld: I looked at it last night, but it's non-trivial.
[22:07] <wallyworld> wgrant: thanks. that was on my todo list to follow up. i'll fix it.
[22:07] <wallyworld> good that the production issue is fixed
[22:17] <thumper> wallyworld: hi, mumble?
[22:17] <wallyworld> thumper: ok
[22:33] <wallyworld> thumper: https://code.launchpad.net/~wallyworld/launchpad/inline-recipe-distro-series-edit
[22:34] <wallyworld> thumper: https://code.launchpad.net/~wallyworld/launchpad/inline-recipe-distro-series-edit/+merge/52940
[22:36] <wallyworld> thumper: http://people.canonical.com/~ianb/distroseries-checkbox.png
[22:45] <wallyworld> thumper: https://code.launchpad.net/~wallyworld/launchpad/inline-multicheckbox-widget/+merge/52943
[22:57] <wallyworld> thumper: http://people.canonical.com/~ianb/distroseries-popup.png
[22:58] <lifeless> wallyworld: does it need a popup?
[22:58] <lifeless> wallyworld: why not just edit in-place?
[22:59] <wallyworld> lifeless: because you need to show all the choices
[22:59] <wallyworld> lifeless: the html only shows the selected ones
[22:59] <thumper> lifeless: just a design decision
[22:59] <lifeless> wallyworld: hmm, I guess I'm thinking down the track
[23:00] <thumper> lifeless: we aren't using this pattern anywhere else
[23:00] <lifeless> wallyworld: e.g. youtubes 'save this video' ui
[23:00] <thumper> lifeless: if we decide it is worth a change, we can look at it then
[23:00] <lifeless> thumper: of course
[23:00] <thumper> lifeless: it should be implemented as a widget, so easy to fix
[23:04] <lifeless> benji: I'm volunteering to be the next 'rm'
[23:07] <elmo> -fr
[23:07] <lifeless> elmo: yes, thats my plan
[23:16] <lifeless> gary_poster: with_ is deployed and working.
[23:16] <lifeless> gary_poster: we're likely to change the api to make storm upstream happy, but changing a few callsites will be easy enough
[23:25] <thumper> lunch is calling