[00:02] <thumper> bet that farmer is pissed
[00:02] <thumper> how's he going to have a straight line of trees now?
[00:03] <lifeless> wgrant: anyhow, I'm not asking you to decide, I'm just going for as much input as I can get
[00:03] <wgrant> lifeless: I know.
[00:03] <wgrant> I don't really like the idea of rolling out with a known hole.
[00:04] <wgrant> Plus it's easily CPable to the single host once the request path is fixed.
[00:04] <wgrant> So I don't see the urgency, since the appserver code is fine.
[00:05] <wgrant> (librarian updates can be done without downtime now, yeah?)
[00:12] <lifeless> handwave
[00:12] <lifeless> in principle yes
[00:12] <lifeless> spm: do you have a few minutes?
[00:13] <lifeless> spm: I want to talk RT 41202 and some related bits
[00:14] <spm> ha! no. not atm. need about 30+ mins. 11+ reds to deal with.
[00:14] <lifeless> ok
[00:14] <lifeless> I'll list out the bits here for when you get time
[00:14] <lifeless>  - check the cert order can be done
[00:15] <lifeless>  - check host headers getting to the backend librarian
[00:15] <lifeless>  - check we can deploy librarian updates w/out downtime
[00:15] <lifeless> and separately
[00:15] <lifeless>  update edge once we figure out what's up with bb again
[00:16] <wgrant> So, I suggest the following:
[00:17] <wgrant>  - Hook in the new view, but initially FF'd out.
[00:17] <wgrant>  - Get the hack in r11506 controlled by FF.
[00:17] <wgrant>  - Release.
[00:18] <wgrant>  - Get request path sorted.
[00:18] <wgrant>  - Get domain match fix CPed to librarian.
[00:18] <wgrant>  - Test.
[00:18] <wgrant>  - Flip the two FF flags
[00:18] <lifeless> same flag, surely ?
[00:18] <wgrant> Could be.
[00:19] <lifeless> it hits the same view
[00:20] <lifeless> so when the view is controlled by the ff, it will affect both, no ?
[00:20] <lifeless> bbiab
[00:21] <lifeless> spm: OOPS reporting on edge is naffed; just needs a redeploy. Add to the bottom of your reds list ?
[00:21]  * lifeless awols
[00:21] <spm> snort
[00:21] <wgrant> lifeless: Yeah, but I'd like to have a way out.
[00:21] <wgrant> Or Platform might kill someone :)
[00:26] <wgrant> lifeless: Is IBuilder:+history featuring on the timeout reports?
[01:08] <lifeless> wgrant: well I haven't filed a bug for it yet
[01:08] <lifeless> wgrant: it's gotta be close though
[01:09] <lifeless> 17.68 99% completion time
[01:09] <lifeless> wasn't in the top oops report in the weekend though
[01:09] <wgrant> lifeless: Yeah, I've had a few reports of it today.
[01:09] <lifeless> ok
[01:10] <lifeless> did anyone give you an OOPS?
[01:10] <lifeless> not that they are any use till edge is redeployed
[01:10] <wgrant> OOPS-1709A1140 is one.
[01:10] <lifeless> wgrant: FWIW https://bugs.edge.launchpad.net/launchpad-project/+bugs?field.tag=timeout is my master list
[01:11] <lifeless> wgrant: no such oops, well not yet anyhow
[01:12] <wgrant> Really?
[01:12] <wgrant> It should exist by now, surely.
[01:12] <wgrant> It was 8 hours ago.
[01:13] <lifeless> got any from different server?
[01:13] <wgrant> OOPS-1709F1169
[01:14] <lifeless> that works
[01:14] <lifeless> so, 'A' may not be rsyncing right
[01:16] <lifeless> wgrant: https://bugs.edge.launchpad.net/soyuz/+bug/631206
[01:16] <_mup_> Bug #631206: Builder:+history timeouts <timeout> <Soyuz:New> <https://launchpad.net/bugs/631206>
[01:17] <wgrant> It's not fixed on edge.
[01:17] <wgrant> OOPS-1709EB1925, for example.
[01:20] <lifeless> wgrant: if you want to let me know what constants should be in there, or how to determine them, I'll do an explain analyze on staging
[01:21] <lifeless> poolie: hi
[01:21] <poolie> hi there lifeless
[01:21] <lifeless> poolie: feature flags; whats the best place to look for a howto ?
[01:21] <poolie> the docstring
[01:21] <poolie> which i think is now up on the web
[01:22] <lifeless> [and I hope you had a good weekend etc etc]
[01:23] <poolie> you too
[01:32] <lifeless> it was interesting
[01:33] <lifeless> poolie: I'm trying to match up with what sinzui did in commit 11470 and the docstring
[01:33] <poolie> r11470 of devel?
[01:33] <lifeless> poolie: lp.services.features.__init__ is the docstring I'm looking at
[01:33] <lifeless> poolie: yes
[01:34] <poolie> heh, that diff is interesting as an example of the kind of tests we should make it easier to write
[01:34] <poolie> it's not all that bad, but it could be smaller
[01:35] <poolie> i'm really sorry it caused disruption
[01:35] <lifeless> poolie: meh, don't worry about it
[01:35] <lifeless> our process makes lp fragile like that
[01:37] <poolie> +        def in_scope(value):
[01:37] <poolie> +            return True
[01:37] <poolie> +
[01:37] <poolie> +        return FeatureController(in_scope).getFlag('gmap2') == 'on'
[01:37] <poolie> this is a bit strange
[01:37] <poolie> i would have just used the default controller, and relied on only trusting the 'default' scope for now
[01:38] <poolie> it may be that api wasn't there in the initial landing to db-devel, which only had the sql and model code
[01:38] <lifeless> http://itsnotthecoffin.blogspot.com/2010/09/christchurch-earthquake-new-zealand.html
[01:38] <poolie> i mean, two orthogonal things:
[01:38] <poolie> 1- that might have been the api skew that bit him
[01:38] <poolie> 2- it's not wrong but it's not how i would have written it
[01:38] <lifeless> how would you have written it? (So I can do the same)
[01:39] <wgrant> lifeless: SELECT id FROM builder WHERE name='adare';
[01:39] <wgrant> That's the first one.
[01:40] <lifeless> 3
[01:40] <poolie> lifeless, return getFeatureFlag('gmap2') == 'on'
[01:40] <poolie> also perhaps we should add a thing for casting to bools, etc, so we don't have "== 'on'", "== 'yes'" etc all over
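poolie's suggestion above could be sketched as a small helper; note that `getBoolFlag`, the stubbed `getFeatureFlag`, and the accepted truthy spellings here are all illustrative assumptions, not the real `lp.services.features` API:

```python
# Hypothetical sketch of a bool-casting helper: centralise the cast so
# call sites don't scatter "== 'on'" / "== 'yes'" comparisons.
TRUTHY = frozenset(['on', 'yes', 'true', '1'])

# Stand-in flag store; the real lookup would hit the feature controller.
_flags = {'gmap2': 'on'}

def getFeatureFlag(name):
    """Return the raw string value of a flag, or None if unset."""
    return _flags.get(name)

def getBoolFlag(name):
    """Cast a raw flag value to a bool."""
    value = getFeatureFlag(name)
    return value is not None and value.strip().lower() in TRUTHY

print(getBoolFlag('gmap2'))     # True
print(getBoolFlag('nonesuch'))  # False
```

With this in place, `return getFeatureFlag('gmap2') == 'on'` would become `return getBoolFlag('gmap2')`.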
[01:41] <wgrant> lifeless: The two for archive privacy are false.
[01:41] <wgrant> lifeless: And use just about anyone for the person.
[01:41] <poolie> lifeless, i'm going to ask spm to send the error output from process-mail.py to a log file synced to devpad
[01:41] <wgrant> I forget my ID.
[01:41] <poolie> rather than as at present being sent as mail and being ignored
[01:42] <poolie> can i get a t-a rs for this?
[01:42] <lifeless> poolie: there's an ongoing effort to rearrange things along those lines
[01:42] <lifeless> I understand it to need code changes
[01:42] <lifeless> rather than being a deployment issue
[01:43] <poolie> code changes might help
[01:43] <lifeless> my understanding from mthaddon is that the goal is:
[01:43] <lifeless>  - must-see remain in email
[01:43] <poolie> but here we just have a cron job and i want to put a '>file' into it
[01:43] <lifeless>  - activity, details, etc go to a log
[01:43] <lifeless> and we have the script-last-run mechanism to cope with as well
[01:44] <lifeless> I haven't looked into how that works yet
[01:45] <poolie> hm
[01:45] <poolie> i wonder if anyone is reading them now
[01:45] <lifeless> I was speaking with thumper about it this morning
[01:45] <poolie> i suspect they get a mail for every mail received by launchpad
[01:46] <poolie> so should i file this, or not?
[01:46] <lifeless> poolie: anyhow, I'm ok in principle with any change to make the thing more diagnosable and useful; I have two concerns here: there are automated things that look for scripts running/not running and I don't know if changing mails will affect that
[01:46] <lifeless> secondly I don't know where mthaddons change to make only errors be emailed is at either
[01:47] <lifeless> oh, and thirdly disk space - in the current setup the logs are archived in the mail archive system, devpad isn't setup to scale to huge numbers of logs (e.g. we have to prune OOPS reports to conserve space)
[01:47] <lifeless> poolie: I'll happily put my stamp to a change once those things are looked into
[01:48] <poolie> maybe i'll just ask spm to pull things out of mail for me one-off then
[01:48] <lifeless> I wish I knew this bit of the system better to be able to just say 'yeah, doit', but sadly I don't
[01:49] <lifeless> I wonder if you can get a filtered mbox from mailman
[01:51] <wgrant> lifeless: Hm, edge has 40x more non-SQL time?
[01:51] <wgrant> It'd be handy if the timeline showed when we were GIL-blocked :(
[01:52] <lifeless> wgrant: that's extremely hard to do
[01:53] <wgrant> lifeless: Of course.
[01:54] <lifeless> wheee that query is slow
[01:55] <lifeless> 22 seconds
[01:55] <wgrant> Ow.
[01:57] <lifeless> plan in the bug
[01:57] <wgrant> Yep, already reading.
[01:57] <lifeless> 171MB disk merge.
[01:57] <lifeless> \o/
[01:59] <wgrant> I wonder why it decided to do that.
[02:03] <wgrant> I can't see where the final 20s went, though.
[02:03] <wgrant> Unless it was in that disk merge, and didn't show up in the Sort times.
[02:05] <lifeless> qTime: 1811.852 ms
[02:05] <lifeless> thats my rearranged one
[02:05] <wgrant> What did you do?
[02:06] <lifeless> put the condition in the join
[02:06] <lifeless> rather than bringing back a metric tonne of unrelated data and filtering
[02:06] <wgrant> Interesting.
[02:06] <lifeless> no guarantee I got it right
[02:07] <lifeless> but the first few rows certainly look the same
[02:07] <lifeless> conceptually we want:
[02:07] <lifeless> oh and here
[02:07] <wgrant> That is slightly confusing.
[02:08] <lifeless> I had one bit I wasn't totally happy with
[02:08] <lifeless> tweaking now
[02:10] <lifeless> so the archive left join team-p thing brings back one row from team participation per archive, not *all the rows of the team*
[02:10] <wgrant> Ah, good.
[02:10] <lifeless> because (person, team) is unique in teamparticipation
[02:11] <lifeless> archive.owner will be correlated against archive
[02:11] <lifeless> but the planner has some choice there
[02:11] <lifeless> we end up with an archive, teamparticipation table with one row per archive
[02:11] <lifeless> and we then join that to packagebuild where packagebuild has an archive set
[02:15] <lifeless> hmm, I think we can cut a left join here
[02:16] <wgrant> Where?
[02:16]  * wgrant looks.
[02:16] <lifeless> packagebuild - archive
[02:16] <lifeless> we always want to look at archive
[02:16] <lifeless> if bfj.packagebuild is set
[02:16] <wgrant> We do.
[02:16] <wgrant> But how does that let us eliminate a left join?
[02:17] <lifeless> (packagebuild inner join archive)
[02:17] <wgrant> Oh, one of your compound joiny thingies which I haven't seen widely applied before.
[02:17] <wgrant> True.
[02:19] <lifeless> 2.6 seconds. heh
[02:19] <lifeless> fiddling at this level is risky
[02:20] <lifeless> ahh
[02:20] <lifeless> get rid of both, its much happier
[02:20] <lifeless> 930ms
[02:21] <lifeless> spm: how goes the meltdown ?
[02:22] <lifeless> wgrant: left outer join means you have to iterate both sides rather than iterating one side and doing specific lookups on the other side
[02:22] <spm> lifeless: more in a semi solid state atm; just kicked off a u1 staging DB whatsits; so should be able to spare you some attention shortly
[02:22] <lifeless> getting rid of the packagebuild query may help a great deal
[02:22] <lifeless> spm: ok, say in 40 ?
[02:23] <spm> lifeless: that should be fine; hopefully sooner... but I was ever optimistic.
[02:29] <lifeless> brb
[02:31] <lifeless> wgrant: you may wish to read http://www.postgresql.org/docs/8.4/static/explicit-joins.html
[02:33] <wgrant> lifeless: Ah.
[02:45] <lifeless> basically the goal of the planner is to do the most effective work first
[02:45] <lifeless> as soon as we explicitly constrain things it can't
[02:45] <lifeless> and left outer joins explicitly constrain things simply by being used.
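The rewrite discussed above — collapsing two stacked LEFT JOINs into a single left join onto `(packagebuild INNER JOIN archive)`, which is safe because a packagebuild always has an archive — can be sketched against throwaway sqlite3 tables. The toy schema here is an assumption, not Launchpad's actual one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE buildfarmjob (id INTEGER PRIMARY KEY, packagebuild INTEGER);
CREATE TABLE packagebuild (id INTEGER PRIMARY KEY, archive INTEGER NOT NULL);
CREATE TABLE archive (id INTEGER PRIMARY KEY);
INSERT INTO buildfarmjob VALUES (1, 10), (2, NULL);  -- job 2: no package build
INSERT INTO packagebuild VALUES (10, 100);
INSERT INTO archive VALUES (100);
""")

# Flat form: two left joins, each of which the planner must treat as
# optional, constraining the join order it may choose.
flat = cur.execute("""
    SELECT bfj.id, a.id
    FROM buildfarmjob bfj
    LEFT JOIN packagebuild pb ON pb.id = bfj.packagebuild
    LEFT JOIN archive a ON a.id = pb.archive
    ORDER BY bfj.id
""").fetchall()

# Compound form: packagebuild always has an archive, so join them with
# an inner join and make only the (packagebuild, archive) pair optional.
compound = cur.execute("""
    SELECT bfj.id, a.id
    FROM buildfarmjob bfj
    LEFT JOIN (packagebuild pb JOIN archive a ON a.id = pb.archive)
        ON pb.id = bfj.packagebuild
    ORDER BY bfj.id
""").fetchall()

print(flat == compound)  # True: same rows, fewer outer joins
```

The planner point from the PostgreSQL docs link applies the same way: the fewer outer joins in the tree, the more freedom the planner has to reorder.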
[03:14] <lifeless> spm: hi
[03:14] <lifeless> spm: thats 40 and a bit :)
[03:19] <lifeless> wgrant: if you're tweaking soyuz queries
[03:19] <lifeless> wgrant: bug 629921 may be of interest
[03:19] <_mup_> Bug #629921: Archive:+packages with empty name search does like '%%' search. <timeout> <Soyuz:Triaged> <https://launchpad.net/bugs/629921>
[03:23] <lifeless> spm: tap tap tap
[03:23] <spm> :-)
[03:27] <spm> lifeless: so aiui, 41202 is predominantly a GSA thing. Also; you've set the pri to 89, but haven't given us any indication of timing needs around this? do you need this for the rollout this week? or later this month? or Mid next. ??
[03:31] <lifeless> spm:
[03:31] <lifeless> blah
[03:31] <lifeless> as soon as we can, before rollout if possible
[03:32] <lifeless> spm: but first, can you please trigger an edge redeploy
[03:32] <lifeless> spm: to fix OOPSes
[03:33] <wgrant> lifeless: So, what do you think of the plan I outlined?
[03:33] <wgrant> For the librarian stuff.
[03:33] <wgrant> And has kees looked at the thing yet?
[03:33] <lifeless> wgrant: sure, something like that
[03:34] <lifeless> no response from him yet
[03:34] <wgrant> :(
[03:34] <spm> lifeless: does that mean we *need* it for this rollout? we're operating *really* short staffed this week, so "I want" vs "I *need*" is really necessarily separated atm.
[03:36] <lifeless> critical path is knowing that *we can get them*
[03:36] <lifeless> can't land the code till that's acked.
[03:36] <lifeless> The code is intended to be able to be turned on at will, so if we get the certs a few days later, thats fine.
[03:37] <lifeless> When I say can't I mean 'it would be a bit odd to land something we're really not sure if we can do'
[03:38] <lifeless> spm: the code proposal https://code.edge.launchpad.net/++oops++/~lifeless/launchpad/private-librarian/+merge/31020
[03:39] <lifeless> spm: so, short story - this has had plenty of eyeballs.
[03:39] <lifeless> spm: there's still hair and fine tuning, but less than we had live 3 months back for bug attachments
[03:40] <lifeless> spm: requests for a file will allocate a token; the token will expire; folk can copy the token to e.g. wget if they want, content secured in this way is partitioned off from all other content by browser security rules
[03:41] <lifeless> spm: to make this happen we need:
[03:41] <lifeless>  - the certs
[03:41] <lifeless>  - to check Host headers reach the librarian (we need to cross-check the domain)
[03:42] <lifeless>  - various small code changes over and above the current patch
[03:44] <lifeless> spm: the first two need your assistance
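The token scheme lifeless describes (a per-file token allocated on request, expiring after a fixed period, copyable into e.g. wget) might look roughly like the sketch below. The HMAC-signed token format and all names here are assumptions for illustration — the real implementation could equally store tokens server-side:

```python
# Illustrative sketch of a time-limited access token for a librarian
# file alias. issue_token/check_token are hypothetical names.
import hashlib
import hmac
import os
import time

SECRET = os.urandom(32)          # server-side signing key
TOKEN_LIFETIME = 24 * 60 * 60    # "current code says 1 day"

def issue_token(alias_id, now=None):
    """Mint a token granting access to one file alias until expiry."""
    now = int(time.time()) if now is None else int(now)
    expiry = now + TOKEN_LIFETIME
    msg = b'%d:%d' % (alias_id, expiry)
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return '%d:%d:%s' % (alias_id, expiry, sig)

def check_token(alias_id, token, now=None):
    """True only for an unexpired, untampered token for this alias."""
    now = int(time.time()) if now is None else int(now)
    try:
        tid, expiry, sig = token.split(':')
        tid, expiry = int(tid), int(expiry)
    except ValueError:
        return False
    expected = hmac.new(SECRET, b'%d:%d' % (tid, expiry),
                        hashlib.sha256).hexdigest()
    return (tid == alias_id and expiry > now
            and hmac.compare_digest(sig, expected))

tok = issue_token(1234)
print(check_token(1234, tok))  # True while the token is fresh
```

Expiry is checked at use time, so a copied token (wget or otherwise) stops working once the lifetime passes.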
[03:49] <lifeless> spm: so (ping)
[03:50] <lifeless> spm: I get GSA on the certs; should I ask in is, or just wait :)
[03:50] <spm> yup. been looking at the mp
[03:50] <lifeless> spm: on the host headers side
[03:50] <lifeless> we need to figure out if the requests squid is making to the librarian (for launchpadlibrarian.net requests) are preserving the Host header.
[03:50] <lifeless> I'm not sure how to do that offhand ;)
[03:50] <spm> I'll chase the vg and see if we can get some traction
[03:51] <lifeless> thank you
[03:51] <spm> hmm. except I'm not sure who the vg is today...
[03:51] <lifeless> '-'
[03:51] <lifeless> brb
[03:52] <spm> gawd. I'm yak shaving again. need email to figure that; but home server (which holds email) is kaput. Need monitor on that server to WTF it; but desk is full of other crud and needs (mild) cleaning for room for a monitor. sigh.
[04:00] <lifeless> :|
[04:04] <wallyworld_> thumper: you having trouble with an unresolvable z3c lib? - "Getting distribution for 'z3c.recipe.scripts==1.0.1'"
[04:05] <wallyworld_> thumper: this project doesn't seem to exist in launchpad? i tried to view the revision history of buildout.cfg but am getting a bzr error:
[04:05] <wallyworld_> bzr: ERROR: exceptions.TypeError: 'bzrlib._known_graph_pyx._MergeSortNode' object is not iterable
[04:05] <lifeless> \o/
[04:05] <wallyworld_> this was after running utilities/update-source-code
[04:06] <spiv> Ooh, I haven't seen that one before.
[04:08] <jtv> hi folks
[04:12] <lifeless> hi
[04:14] <spm> heya jtv
[04:14] <jtv> hi lifeless, hi spm
[04:14] <lifeless> \o/ oops are sane now
[04:14] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=1710ED237
[04:14] <jtv> lifeless: thanks for fixing that oops problem in the weekend!
[04:14] <lifeless> jtv: de nada
[04:14] <lifeless> now we has sensible oopses
[04:15] <lifeless> with librarian stats :)
[04:15] <jtv> de algo… it was above and beyond.
[04:15] <lifeless> thumper: ^ have a look at that one
[04:15] <lifeless> thumper: items 98 and 99 in the 'sql' log
[04:15]  * thumper has to run kids to art class
[04:15] <lifeless> thumper: later will do, it's user created
[04:16] <lifeless> no idea why the analyser thinks the librarian stuff is repeated though
[04:16] <jtv> Unfortunately I screwed up a little bit in my TranslationGroup fix.  I prefetched a lot of objects in queries that I moved to the slave store, when they're supposed to come from the default store.  So fetch the page from the master store and you've got lots of icon queries back.
[04:16] <lifeless> something is naffed there
[04:16] <lifeless> jtv: ahh
[04:16] <jtv> That's why we still got a timeout on edge.  :-(
[04:17] <lifeless> Store.of() might help too
[04:17] <jtv> Well yes, whatever gets the default store.
[04:17] <jtv> Or I use ISlaveStore(object).icon instead of object.icon
[04:18] <jtv> lifeless: so librarian interaction is now also logged in the oops?  I took the page from 1050 db queries to 303 actions, and if those 303 actions actually count more than just queries, that's extra-great.
[04:18] <lifeless> jtv: librarian download connects and reads are now logged yes
[04:24] <lifeless> wgrant: I have an Idea
[04:26] <lifeless> when you get back, tell me what you think:
[04:26] <lifeless>  - i123.restricted... - must match domain and LFA
[04:27] <lifeless>  - https?://launchpad-librarian.net/....  - also supports tokens
[04:27] <lifeless> the tests will work
[04:27] <lifeless> I can test once, directly, that when restricted is in the domain they must match
[04:27] <lifeless> this will prevent injecting content into the security context of a restricted file
[04:28] <jtv> lifeless: we've got a pretty annoying problem—there's a db column that was supposed to become obsolete ages ago but was still in use.  We landed a branch that stops initializing it and stops using it.  Can you guess the problem?
[04:29] <lifeless> other than db-devel going red because of conflicts?
[04:29] <lifeless> or skew between two things modelling the same content
[04:31] <jtv> edge vs production.
[04:32] <lifeless> prod reads it back
[04:32] <lifeless> with the old queries
[04:33] <jtv> yup
[04:33] <jtv> edge produces them without the data
[04:38] <jtv> *I* *hate* so-called single-sign on for oopses!
[04:39] <jtv> Open a dozen oopses in as many tabs: each and every one needs to go through the "single" sign-on page, and only the first one succeeds.  The rest just forwards to a different "single" sign-on page, which just loops back to itself.
[04:39] <lifeless> please file a bug on that
[04:39] <lifeless> and/or an RT
[04:39] <jtv> Oh, and then there's a few that just fail with "invalid transaction"
[04:39] <thumper> lifeless: what exactly about that oops should I be looking at?
[04:39] <jtv> Yes, I will thanks.
[04:39] <thumper> lifeless: apart from the general inefficiencies
[04:40] <lifeless> thumper: search for librarian in it
[04:40] <lifeless> or go to the row index in the sql statements that I pointed out to you
[04:40] <thumper> lifeless: ah, nice
[04:41] <thumper> 0ms  	librarian-read
[04:41] <thumper> that's pretty fast
[04:41] <lifeless> thumper: it will be in the socket buffer
[04:41] <lifeless> thumper: so effectively a noop
[04:44] <wallyworld_> solved: TypeError: 'bzrlib._known_graph_pyx._MergeSortNode' object is not iterable
[04:44] <wallyworld_> historycache plugin is bad
[05:19] <wallyworld_> thumper: figured out the other problem - had to explicitly update the download cache. builds working again :-)
[05:22] <thumper> wallyworld_: :)
[05:24] <wallyworld_> thumper: bzr 101 question - can't recall, but i'm sure i did a bzr merge (and maybe also update) at the top level. is it sop to have to explicitly update the download cache?
[05:25] <thumper> wallyworld_: the download cache is likely to be a heavyweight checkout
[05:25] <thumper> wallyworld_: so it is the only thing you do bzr update one
[05:25] <thumper> s/one/on/
[05:25] <wallyworld_> thumper: thanks
[05:25] <thumper> wallyworld_: *I* do an explicit update
[05:25] <thumper> but I don't use the rocketfuel scripts
[05:26] <wallyworld_> thumper: i don't use them either atm. is there anything in the workflow to indicate when one should update the d/l cache or should it be done say once per day?
[05:27] <thumper> wallyworld_: I update the download cache when I pull devel/db-devel;
[05:29] <wallyworld_> thumper: cool, i'll do the same. i was thinking one could also look for changes in the buildout.cfg file or other 3rd party dependency cfg file
[05:50] <lifeless> wgrant: and its pushing
[05:50] <lifeless> wgrant: I'm happy with this now, moving onto polish and integration
[05:57] <jtv> spm: I'm landing an RC fix that's only needed on edge, and only until the rollout.  What can I do to ensure that it hits edge soon?
[05:58] <thumper> jtv: cowboy hat?
[05:58] <spm> jtv: get it landed in the normal manner asap. we're really unkeen on cowboying patches onto edge (aka prod-lite) without an incident report
[05:58] <thumper> jtv: how do I find a product series that has a translation link for the branch?
[05:59] <thumper> jtv: I want to test deletion (on staging ofcourse)
[05:59] <jtv> thumper: Translations takes an interest in 2 branch links on a productseries:
[05:59] <wgrant> lifeless: I'm still a little wary that we're allowing users to shoot themselves in the foot without noticing.
[05:59] <thumper> lifeless: any idea how to record in oopses what other app server threads were doing?
[05:59] <wgrant> lifeless: If launchpadlibrarian.net itself doesn't work, then people know not to use it.
[06:00] <thumper> lifeless: I'm thinking of oopses caused by other long running requests
[06:00] <jtv> thumper: 1: the development branch (it imports files that appear there, but it can also fire off build farm jobs based on changes there)
[06:00] <thumper> lifeless: causing database locks
[06:00] <jtv> thumper: 2: the translations_branch, which is where it can write snapshots of the series' translations.
[06:00] <thumper> jtv: I guess I could just do a staging query now I have the power
[06:00] <jtv> thumper: Finding cases of 2 is easy: WHERE ProductSeries.translations_branch IS NOT NULL
[06:00] <thumper> mwa ha ha
[06:01]  * thumper waits for staging to come back up
[06:01]  * jtv looks up the enum for 1
[06:01] <wgrant> lifeless: What's the benefit of allowing it?
[06:01] <wgrant> lifeless: If you have to craft a URL to test, you might as well craft it to the restricted librarian.
[06:02] <jtv> thumper: you find cases of 1 with WHERE ProductSeries.translations_autoimport_mode <> 1 AND branch IS NOT NULL
[06:03] <jtv> spm: I'm landing in the normal manner asap.  It undoes one line of change from an earlier branch.
[06:14] <thumper> how the hell am I supposed to submit someone else's work to pqm without a local copy of it?
[06:16] <wallyworld_> lifeless: bug 631010? any eta on a fix to allow lp tests to run again? i've upgraded to maverick and can't run any tests
[06:16] <_mup_> Bug #631010: ProgrammingError: operator does not exist: text = bytea <database> <maverick> <storm> <Launchpad Foundations:New> <https://launchpad.net/bugs/631010>
[06:22] <thumper> phew
[06:22] <thumper> finally
[06:31] <wgrant> wallyworld_: You could try downgrading to Lucid's python-psycopg2
[06:32] <wallyworld_> wgrant: thanks, i'll give that a try. been having a few issues with maverick and kde :-(
[06:32] <wgrant> I should upgrade this week.
[06:32] <wgrant> I normally upgrade around alpha 1...
[06:32] <wallyworld_> you running kde or gnome?
[06:33] <wgrant> GNOME.
[06:33] <wallyworld_> they skipped the alpha this time i think
[06:33] <wgrant> No, there were the usual alphas.
[06:33] <wallyworld_> i mean they skipped the last one?
[06:33] <wallyworld_> went to beta early
[06:33]  * thumper off to get munchkins
[06:34] <wallyworld_> i really hope kde gets fixed with maverick. me and gnome don't get on very well :-(
[06:40] <lifeless> hi
[06:40] <lifeless> wgrant: I do craft the right url
[06:40] <wallyworld_> wgrant: thanks, downgrading to lucid's psycopg2 fixes it for now
[06:40] <wgrant> wallyworld_: Great.
[06:40] <lifeless> wgrant: the benefit is that we don't need wildcard dns on developers machines with https certs and -all that stack-
[06:41] <wgrant> lifeless: We can't make the dev config use restricted librarian URLs?
[06:41] <lifeless> wallyworld_: no info on the eta for it
[06:41] <lifeless> wgrant: restricted librarian urls are different
[06:41] <wgrant> Or at least only activate the tokens-on-launchpadlibrarian.net mode in the dev config?
[06:41] <wallyworld_> lifeless: as per wgrant suggestion i downgraded to lucid's copy and it seems to be ok for now. thanks
[06:42] <lifeless> wgrant: we do want to delete the proxy code
[06:42] <lifeless> wgrant: so that isn't a tenable goal; as a stop gap maybe, but I don't see that its better or worse
[06:42] <lifeless> wgrant: as for people foot-bulleting themselves; who are you thinking of ?
[06:43] <wgrant> lifeless: What's not a tenable goal?
[06:43] <lifeless> having the dev environment run the current mode
[06:43] <wgrant> Not the current mode.
[06:43] <wgrant> Either linking directly to the restricted librarian (which is presumably accessible from localhost...), or having the local public librarian accept tokens on its primary name.
[06:44] <lifeless> wgrant: the latter is what I've implemented
[06:44] <wgrant> But only in dev mode.
[06:44] <wgrant> Not on launchpadlibrarian.net.
[06:44] <lifeless> I can add an if to turn it off, but I don't understand why
[06:44] <wgrant> I can't think of a good reason to allow it.
[06:45] <wgrant> And if there's not a good reason to allow access to private data through a second mechanism, can we please not do it?
[06:45] <wgrant> Same-origin is a useful safetynet.
[06:45] <lifeless> right ...
[06:46] <wgrant> We probably want private files to be protected by it.
[06:46] <lifeless> you're not joining the dots here; its the same mechanism
[06:46] <wgrant> So let's not introduce a way to work around it.
[06:46] <wgrant> Even if we can't immediately think of any attacks.
[06:46] <lifeless> we'll want the apache front ends to be filtering tokens on http anyway
[06:46] <lifeless> no harder to have them filter on subdomains too
[06:46] <wgrant> Hm?
[06:46] <lifeless> but adding another conditional in that code adds to the complexity there for no good reason
[06:47] <lifeless> is anyone looking at the testfix ?
[06:48] <wgrant> It's a workaround that's only required for dev installations. It is probably a single line of code to restrict it to that context, and it means that private content is forced to live in its own domain. That has to be a good thing.
[06:48] <wgrant> Despite the slight increase in complexity.
[06:48] <lifeless> so lets not add the workaround
[06:49] <lifeless> that seems simpler to me
[06:49] <wgrant> Then dev systems break.
[06:50] <lifeless> The vector you are talking about is 'public content happens to know the url and token for some private content'...
[06:50] <lifeless> they can just damn well load that directly
[06:51] <wgrant> Or someone notices that omitting the 'i3532523.' works.
[06:51] <wgrant> They proceed to do that.
[06:51] <wgrant> There's nothing telling them that it's dangerous.
[06:51] <lifeless> I just said above the frontends can enforce that if we want
[06:51] <lifeless> trivially
[06:51] <wgrant> Ah, I see.
[06:51] <wgrant> I guess.
[06:52] <lifeless> we have to have the front ends enforce httpS anyway
[06:52] <lifeless> because if someone is going to disclose a private file it shouldn't be us. :)
[06:52] <wgrant> True.
[06:52] <wgrant> OK, well, as long as they're changed to do that, my objection is retracted.
[06:52] <wgrant> And the plan seems good.
[06:53] <lifeless> wgrant: folk can't omit the ix bit anyway
[06:53] <wgrant> Why not?
[06:53] <lifeless> wgrant: because the only way they get urls to use is via the appserver proxy service.
[06:53] <wgrant> True.
[06:53] <lifeless> they can, for the short period a token is active, manually edit and change the url
[06:53] <lifeless> but honestly, really?
[06:53] <wgrant> What is the limit?
[06:53] <wgrant> An hour?
[06:53] <wgrant> I forget.
[06:54] <lifeless> current code says 1 day
[06:54] <wgrant> I think paranoia is appropriate here.
[06:54] <lifeless> thats arbitrary; no reasoning at all has gone into it.
[06:54] <wgrant> But OK.
[06:54] <lifeless> I wanted it to be longer than an ISO download to south africa
[06:55] <wgrant> Right.
[07:01]  * wallyworld_ off to pick up Martin Pool from the airport
[08:44] <adeuring> good morning
[08:57] <jtv> hi adeuring
[08:58] <jtv> adeuring: to start the week off with a happy note, most buildbots are broken and even though you're probably innocent, you're on the blamelist.  :)
[08:58] <jtv> Me, I suspect lifeless of landing python2.6 code before all buildbots support it
[09:01] <adeuring> jtv: thanks for the heads up ;)
[09:08] <lifeless> adeuring: hi
[09:08] <adeuring> hi lifeless
[09:08] <lifeless> jtv: I'm -very- sure I haven't landed any 2.6 only code
[09:09] <lifeless> jtv: for starters ec2 is still 2.5, and I run through ec2 religiously
[09:09] <jtv> lifeless: ok, just trying to get your attention for this problem.  :)
[09:09] <lifeless> adeuring: lp:~lifeless/launchpad/private-librarian is much closer to being done
[09:09] <jtv> A lot of the failures are 2.6-isms afaict
[09:10] <adeuring> lifeless: yes, I've seen your mp
[09:10] <lifeless> adeuring: I mean 2 minutes ago :P I just pushed up more
[09:10] <adeuring> lifeless: sounds great!
[09:10] <lifeless> jtv: mmmm, not the best way to get my attention.
[09:11] <lifeless> jtv: anyhow, I can see you and noodles775 looking into it; you're both good folk, I'm sure it will come good rapidly.
[09:11] <jtv> lifeless: I wouldn't trust me—I merely spotted a 2.6-ism in propertycache breaking one of the builds
[09:12] <jtv> I wouldn't have any idea how to fix it
[09:12] <lifeless> what line is it ?
[09:12] <jtv> lifeless: it'll take me a moment to look that up, but it was a "next(counter)"
[09:13] <lifeless> in devel ?
[09:13] <jtv> lifeless: yes, it was in the "lp" buildbot log
[09:15] <noodles775> jtv: I thought its only the lucid_db_lp buildbot that is critical (and it's a different error)
[09:15] <noodles775> (by critical, I mean, stopping landings)
[09:16] <jtv> noodles775: I just found I'm well out-of-date with what breaks what; we don't have hardy servers any more then?
[09:16] <lifeless> noodles775: any of the main buildbots failing is testfix
[09:16] <lifeless> we have hardy servers
[09:16] <lifeless> if 'lp' breaks testfix is triggered
[09:16] <noodles775> OK. I'm looking at the lucid_db_lp one then.
[09:18] <wgrant> lifeless: The hostname restriction code is... um... But apart from that, your branch looks fine to me now.
[09:18] <lifeless> oh, in the doctest
[09:18] <lifeless> wgrant: buggy?
[09:19] <wgrant> lifeless: Ugly.
[09:19] <lifeless> wgrant: patches considered
[09:19] <wgrant> I'm not sure there's a better way.
[09:19] <wgrant> I just know that this way is ugly :)
[09:22] <wgrant> lifeless: Why use a regex rather than just "hostname == 'i%d.restricted.%s' % (self.aliasID, netloc)"?
[09:23] <lifeless> wgrant: good point, I should
[09:23] <lifeless> I realised half way through that there was an attack
[09:23] <lifeless> I was going to be fairly relaxed on the check
[09:23] <lifeless> but
[09:24] <wgrant> I also don't really get what you're doing with netloc and :. Shouldn't urlparse already do that?
[09:24] <lifeless> i1234.restricted.iunderattack.restricted.launchpad-librarian.net would be bad
[09:24] <wgrant> Oh, port.
[09:24] <wgrant> Right.
[09:25] <wgrant> lifeless: I wouldn't mind a comment saying that you're stripping off the port there.
[09:25] <lifeless> jtv: so, if next(iterator) advances it
[09:25] <wgrant> It's not blindingly obvious.
[09:25] <wgrant> Or maybe I'm just tired.
[09:25] <lifeless> wgrant: I'll add one
[09:25] <wgrant> Thanks.
[09:25] <jtv> lifeless: python2.5 doesn't seem to know about "next"
[09:26] <wgrant> Now the only bad bit is the '.restricted.' check, but there's not much that can be done about that yet.
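wgrant's exact-match check, with the port stripped from netloc as discussed above, might look roughly like this; the function and parameter names are illustrative stand-ins, not the actual librarian code:

```python
from urllib.parse import urlparse  # the urlparse module on Python 2


def is_valid_restricted_host(url, alias_id, netloc):
    parsed = urlparse(url)
    # urlparse keeps any :port suffix in netloc, so strip it off first.
    hostname = parsed.netloc.split(':')[0]
    # An exact comparison defeats crafted hostnames such as
    # i1234.restricted.iunderattack.restricted.launchpad-librarian.net,
    # which a relaxed regex check could let through.
    return hostname == 'i%d.restricted.%s' % (alias_id, netloc)
```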
[09:27] <lifeless> jtv: next(i) == i.next()
[09:28] <lifeless> jtv: whats the next glitch
[09:28] <lifeless> jtv: the difference is the ability to say "next(i, 42)"
[09:28] <jtv> lifeless:     NameError: global name 'next' is not defined
[09:28] <jtv> the line was         return next(counter)
[09:28] <lifeless> right
[09:29] <lifeless> replace it with counter.next()
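The 2.5-vs-2.6 spelling lifeless explains can be shown with a small sketch (runnable on modern Python; the `counter` name mirrors the failing code):

```python
from itertools import count

counter = count()

# next(counter) is the builtin spelling, added in Python 2.6;
# on Python 2.5 the equivalent is counter.next()
# (spelled counter.__next__() on Python 3).
first = next(counter)
second = next(counter)

# The builtin also takes a default, returned when the iterator is
# exhausted -- the "next(i, 42)" form, which the method lacks.
fallback = next(iter([]), 42)
```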
[09:30] <jtv> But it's only one of so many failures that I'm now trying to figure out what bigger picture I'm missing.  I'm told the failures on lp are not the cause of the testfix mode; lucid_lp_db is broken with a different failure.
[09:30] <lifeless> its 8:30 here; I need to go remind my wife what I look at.
[09:31] <lifeless> s/at/like/
[09:31] <lifeless> I'm sure there will be multiple failures
[09:31] <lifeless> I'm also sure that lp failing will cause testfix, because bb is watching *both*
[09:31] <lifeless> (or all 6 actually, anyhow)
[09:32] <lifeless> jtv: I suggest you, or someone you get to agree to it, puts together a branch to fix all the known devel issues; send that to devel (labelled testfix). then look at db-devel.
[09:32] <jtv> lifeless: noodles775 has been looking at db-devel already, and I've been trying to do the other thing for the past half day
[09:33] <jtv> so yeah, I agree with the approach basically :)
[09:33] <jtv> Go show your face to your family.  :)
[09:39] <lifeless> jtv: doing so; at least to the extent of not staring balefully at the laptop
[09:39]  * jtv would reply if it weren't for the expected consequence of a certain person staring at laptop even longer
[09:47] <lifeless> adeuring: stub is reviewing the branch, but I don't know if all tests pass (the ones for code I've been directly changing do, of course)
[09:48] <lifeless> adeuring: so perhaps you'd like to : throw it at ec2; start dealing with any fallout it has, and I'll finish up tomorrow with whatever you push (as long as you tell me what you've pushed ;P)
[09:48] <adeuring> lifeless: ok, sounds good
[09:49] <lifeless> adeuring: it is in principle feature complete though
[09:49] <adeuring> ok
[09:49] <lifeless> all cleanups etc deferred until we have successfully migrated
[09:49] <lifeless> gmb: there is another release-critical thing up for you
[09:50] <gmb> lifeless, Go ahead
[09:50] <lifeless> gmb: its in your mail already
[09:50] <lifeless> gmb: this is high risk high reward
[09:50]  * gmb looks at his mail client
[09:50]  * gmb sees a greyed-out window
[09:50] <gmb> Hmm.
[09:50]  * gmb switches to the web interface
[09:59] <gmb> lifeless, Okay, I get - and like - the rewards. Spell out the risks for me since I'm under-caffeined this morning.
[09:59] <lifeless> gmb: its a change to a fairly magical part of the system
[09:59] <lifeless> anything could happen
[09:59] <lifeless> we'll be able to QA that private bugs are no -worse- on staging
[10:00] <lifeless> it will be hard to check the new stuff until we get the certs (but not impossible)
[10:01] <gmb> lifeless, Do we have any alternative solutions that will fix the private attachment problem and will be ready before tomorrow evening (UK time)?
[10:01] <lifeless> in principle the private attachment problem is fixed for a limited time via the firewall hole (though I may be out of date)
[10:01] <gmb> adeuring, Can you confirm lifeless's statement ^^?
[10:02] <lifeless> but we need to deliver a permanent fix fairly promptly
[10:02] <adeuring> gmb: yes, the temporary fix for the retracers works
[10:02] <adeuring> but we should get rid of it quite soon
[10:03] <gmb> lifeless, I agree. Since adeuring's temporary fix is in place, go ahead and land yours - I'm less worried about backing it out if we've got a kludge for the problem already.
[10:03] <gmb> I'll update the merge proposal.
[10:03] <wgrant> And this solution is the first one that doesn't make me cry :)
[10:03] <gmb> Well, naturally we wouldn't want that.
[10:04] <wgrant> I mean, it is actually a good, effective long-term solution this time. Which is really nice.
[10:04] <gmb> Agreed.
[10:05] <lifeless> well, we can hope.
[10:07] <wgrant> Well, there are no obvious fatal flaws in this one.
[10:07] <wgrant> So I think it should be good.
[10:08] <wgrant> bigjools: Is there a known issue with non-virt dispatches failing sometimes?
[10:08] <wgrant> bigjools: I had some complaints about amd64 distro builders doing it this morning.
[10:08] <wgrant> (the build restarted a few times)
[10:09] <bigjools> have you got examples so I can check the log?
[10:10] <wgrant> bigjools: kdeedu on amd64, reported a little under 10 hours ago.
[10:10] <wgrant> Not sure when it actually happened, though.
[10:10] <wgrant> Looks like it was not long before it was reported.
[10:10] <wgrant> So look around 10 hours ago.
[10:11] <bigjools> it looks like temporary network brownouts
[10:12] <bigjools> or at least a lack of response from builders
[10:13] <wgrant> Hmm.
[10:13] <bigjools> ah I see what it is
[10:13] <bigjools> slow reset
[10:13] <wgrant> But it's non-virt.
[10:13] <wgrant> There is no reset.
[10:13] <bigjools> good point
[10:14] <wgrant> Otherwise, yes, slow reset is the obvious thing.
[10:14] <bigjools> well, it's still timing out anyway
[10:14] <bigjools> the log has "User timeout caused connection failure."
[10:14] <bigjools> followed by "reset failure"
[10:14] <bigjools> which is odd
[10:14] <bigjools> (for crested)
[10:14] <wgrant> Er.
[10:14] <wgrant> WTF?
[10:15] <bigjools> I'd say that it's all working fine from a software PoV, it does what it's supposed to under the conditions
[10:15] <bigjools> that log message is a little odd though
[10:15] <wgrant> Except that it shouldn't actually be trying to reset a non-virt builder.
[10:15] <wgrant> Or is the log lying?
[10:16] <bigjools> I would not worry too much
[10:16] <bigjools> I've completely re-written the failure handling for the next release
[10:17] <wgrant> Yeah, I saw that.
[10:17] <wgrant> Looks good.
[10:17] <bigjools> it won't catch build failures, just dispatch failures
[10:17] <bigjools> we need to add that feature at some point to stop bad jobs jumping around builders
[10:19] <bigjools> my blood levels are dangerously high in my caffeine stream, I need to go fix that
[10:20] <noodles775> voidspace: re. your oops - one of the LP registry team will be interested to look at fixing the bug, but for what its worth, the exception is being raised while it is trying to tell you that the email address is already in use.
[10:20] <wgrant> For which we have ShipIt to blame :(
[10:21] <wgrant> I suspect.
[10:21] <noodles775> wgrant: Ah, does it create email addresses where email.person is None?
[10:21] <wgrant> noodles775: ShipIt does Accounts, not Persons.
[10:21] <wgrant> So, yes :(
[10:22] <wgrant> And for no very good reason it still haunts the same DB as LP.
[10:46] <lifeless> adeuring: bunch of incremental changes from Stuart's review pushed now; I expected librarian.txt to fail, fixing that now.
[10:46] <lifeless> adeuring: anything other than that failing is unexpected (but sadly probably predictable :P)
[10:46] <adeuring> lifeless: ok, I'll see what ec2 will tell us ;)
[11:01] <lifeless> night y'all
[11:02] <wgrant> Night.
[12:22] <thumper> gmb: sha'ping
[12:23] <thumper> gmb: I'm aware of the outstanding QA for code
[12:23] <thumper> gmb: but staging has been futzed all day
[12:23] <thumper> gmb: so I'll be looking at it tomorrow morning
[12:23] <thumper> gmb: just letting you know
[12:24] <gmb> thumper, Yep, I expected as much. No worries, and thanks for the update.
[12:24]  * thumper -> cuppa
[12:24] <thumper> gmb: np
[12:33] <bigjools> wgrant: got a sec to help work out WTF is going on with bug 629835
[12:33] <_mup_> Bug #629835: cannot delete 'linux' from the ubuntu-security-proposed ppa <oops> <Soyuz:New> <https://launchpad.net/bugs/629835>
[12:44] <wgrant> bigjools: Sure.
[12:44] <wgrant> Is that the delayed copy one?
[12:44] <wgrant> Yes.
[12:45] <wgrant> bigjools: You need to look for the initial error.
[12:45] <wgrant> It's possible that that's it.
[12:45] <wgrant> But I doubt it.
[12:45] <bigjools> wgrant: it's allowed a 2nd copy of the same thing somewhere, somehow
[12:45] <bigjools> which is scary
[12:46] <wgrant> bigjools: Or it's just continuing to process the same copy, because it failed somehow the first time.
[12:46] <wgrant> This has happened once before.
[12:46] <wgrant> We couldn't work out what it was.
[12:46] <bigjools> hmmm
[12:46] <bigjools> he's done a pocket copy in the same archive
[12:46] <wgrant> I wish we tracked where they were copied from.
[12:47] <wgrant> Can you see when that delayed copy was initially processed?
[12:47] <bigjools> I wonder if the delayed copy checks are broken
[12:47] <bigjools> I can grep logs
[12:47] <wgrant> There is a race in the delayed copy mechanism.
[12:47] <wgrant> But it's pretty unlikely.
[12:49] <bigjools> logs don't go back far enough :/
[12:49] <bigjools> it's been doing this since 1st Sep
[12:49] <wgrant> Er.
[12:49] <wgrant> What?
[12:49] <bigjools> at least
[12:49] <wgrant> Yeah, it's been doing it since the 26th.
[12:49] <bigjools> we only keep 5 days of publisher logs
[12:49] <wgrant> ...
[12:49] <wgrant> 26th, 23:01Z
[12:49] <wgrant> Er, 25th, 23:01Z.
[12:50] <wgrant> Why so short?
[12:50] <bigjools> that's the default log rotation
[12:50] <wgrant> Baaah.
[12:50] <wgrant> Can you check for any other delayed copies of that source?
[12:51] <bigjools> sigh, loads of PPA publisher OOPSes getting logged but not reported
[12:51] <bigjools> PoolFileOverwriteError
[12:51] <wgrant> Pool file overwrites, mostly, I guess?
[12:51] <wgrant> Yeah.
[12:51] <bigjools> the same file, over and over
[12:51]  * bigjools wonders how that can happen
[12:52] <wgrant> I used to know.
[12:53]  * wgrant hunts.
[12:53] <bigjools> so, the inconsistent state error first happened 2010-08-25T23:05:39.797695+00:00
[12:53] <bigjools> which tallies
[12:54] <wgrant> Ah, you have oopses?
[12:54] <bigjools> yes
[12:54] <wgrant> Awesome.
[12:54] <bigjools> lots of 'em
[12:54] <wgrant> So, that's interesting.
[12:54] <bigjools> all the same as the one I put in the bug
[12:55] <wgrant> There's nothing four minutes earlier?
[12:55] <wgrant> That's when it was processed first.
[12:55] <wgrant> I expect a different error.
[12:56]  * bigjools hunts
[12:57] <wgrant> bigjools: Are any of the PoolFileOverwriteErrors for files published after May?
[12:57] <wgrant> Before then the copy checker didn't actually check for contents conflicts.
[12:58] <bigjools> hard to tell
[12:58] <wgrant> (devel r10701)
[12:58] <wgrant> We can query for that reasonably easily.
[12:59] <wgrant> Find Pending PPA publications older than not very old.
[13:01] <bigjools> there are two
[13:01] <bigjools> 2010-07-27 and 2010-08-25
[13:02] <wgrant> bigjools: Actually, can you tell (from the repetitive PFOE oopses?) if 23:01 was the same publisher run as 23:05?
[13:02] <bigjools> and 9 more from 2010-09-03
[13:02] <wgrant> Hm...
[13:02] <wgrant> These are Pending, in undisabled PPAs?
[13:02] <wgrant> Er, undisabled, and with the published flag set, too.
[13:02] <bigjools> ACCEPTED, not sure about the status, hang on
[13:02] <wgrant> Oh, these are the delayed copies of that source?
[13:03] <bigjools> all enabled apart from the 07-27 one
[13:03] <wgrant> Are these the delayed copy PUs?
[13:04] <wgrant> If so, that's an awful lot.
[13:05] <bigjools> 2 of them are delayed
[13:05] <bigjools> one is the disabled PPA, one is u-s-p
[13:06] <bigjools> dated 2010-08-25
[13:06]  * bigjools scratches head
[13:07] <wgrant> Can you tell by the oopses if the 23:00 publisher finished in time, making the 23:05 OOPS the second run?
[13:08] <bigjools> I can't tell
[13:09] <wgrant> Not even by looking for the repeated PFOEs?
[13:14] <bigjools> oh I am looking at staging which is out of date, which explains the load of 09-03
[13:16] <wgrant> Ah.
[13:20] <bigjools> I'm going to set that upload to rejected
[13:20] <bigjools> to remove this OOPS
[13:21] <wgrant> Sounds reasonable.
[13:22] <wgrant> Can you also increase log retention, or make the OOPSes less easy to ignore?
[13:22] <bigjools> I'm working on the latter
[13:22] <bigjools> the former is a good idea
[13:23] <bigjools> I'd love to know how this happens though :/
[13:23] <wgrant> We'll find out next time :)
[13:23] <bigjools> Jamie said he copied it from their private PPA in hardy to maverick in the public one
[13:23] <wgrant> I don't think I've tried a cross-series delayed copy.
[13:23] <bigjools> I wonder if the change in series has anything to do with it
[13:24] <bigjools> and then he deleted it
[13:24] <bigjools> could have been before or after it was published in the new archive
[13:24] <wgrant> You can't tell?
[13:31] <bigjools> wgrant: http://pastebin.ubuntu.com/489219/
[13:31] <bigjools> ummm interesting
[13:32] <wgrant> The one that was never published is a little odd.
[13:32] <wgrant> I noticed it before, but thought little of it.
[13:33] <wgrant> However, there's something really odd there.
[13:33] <bigjools> that's the most recent one
[13:33] <wgrant> No, I mean the second one in that list.
[13:33] <bigjools> oh right, missed that
[13:33] <wgrant> Looks like he deleted it 25 seconds after it was published.
[13:33] <wgrant> Which seems unlikely.
[13:34] <wgrant> However, that's still well after everything initially broke.
[13:35] <bigjools> deleting 25 seconds after sounds reasonable to me
[13:35] <wgrant> After an out-of-band delayed copy?
[13:35] <bigjools> it's not delayed at that point
[13:35] <wgrant> It isn't?
[13:36] <bigjools> the first one will make it instant for future copies
[13:36] <wgrant> Wasn't that publication just created by the retry?
[13:36] <wgrant> Yes.
[13:36] <wgrant> But AIUI p-a is doing the copying.
[13:36] <bigjools> retry?
[13:36] <bigjools> only for delayed copies
[13:36] <bigjools> aegh, I wonder if this is the "hit copy twice" bug
[13:37] <bigjools> I bet it is
[13:37] <wgrant> There'd be a DONE PU in that case.
[13:37] <wgrant> There appears to not be.
[13:38] <wgrant> Yeah.
[13:38] <wgrant> That source has only ever been in Maverick in that PPA.
[13:38] <wgrant> And there's only one delayed copy to Maverick.
[13:38] <wgrant> And it's the Accepted one.
[13:38] <wgrant> 2069696
[13:40] <wgrant> So.
[13:40] <wgrant> The bug suggests that the delayed copy keeps being reprocessed.
[13:40] <wgrant> But the OOPS suggests that it's failing.
[13:41] <wgrant> I wonder if it stops OOPSing when the source is deleted.
[13:42] <wgrant> Oh look, yes it does.
[13:42] <wgrant> So there will be some publisher runs where it doesn't OOPS.
[13:42] <wgrant> That's immediately after jdstrand deleted the last publication.
[13:42] <wgrant> So the next p-a reprocesses it successfully, creating new publications.
[13:42] <wgrant> But somehow still fails to set the status.
[13:43] <wgrant> Maybe custom uploads.
[13:43] <wgrant> There must be more OOPSes there that we are missing.
[13:43] <wgrant> If you can't see them, delete it again and watch the next p-a run.
[13:43] <wgrant> Something has to show up.
[13:43] <bigjools> yeah
[13:43] <bigjools> I'll grab jamie later, I can't delete it
[13:44]  * bigjools is desperate for food now
[13:44] <wgrant> bigjools: The last publication is from the third.
[13:44] <wgrant> 00:12Z
[13:44] <wgrant> You should have logs for then.
[13:44] <wgrant> The copy was processed, but not set to Done.
[13:44] <bigjools> are you querying on the API or using my sql output?
[13:45] <wgrant> API and web UI.
[13:45] <bigjools> k
[13:46] <bigjools> wgrant: so it started publishing it again at 2010-09-03 00:00:45
[13:47] <bigjools> the publisher run at 00:10 has no errors
[13:47] <bigjools> the next one goes bang
[13:47] <wgrant> Did 00:10 actually run at all?
[13:47] <bigjools> yes
[13:48] <wgrant> Or was it skipped because 00:00 overran twice?
[13:48] <wgrant> Hmm.
[13:48] <bigjools> oh wait
[13:48] <bigjools> no
[13:48] <bigjools> damn this log file
[13:49] <bigjools> the next run was at 00:15 and it failed on the same queue item
[13:49] <bigjools> 2010-09-03 00:00:45 DEBUG   Publishing source linux/2.6.24-28.77 to ubuntu/maver
[13:50] <bigjools> so it ignored the bad PU for that run only, it fails for the run before and after
[13:50] <wgrant> bigjools: What's the extent of its logging of the 00:00 publication?
[13:50] <wgrant> Any errors?
[13:50] <wgrant> Any success?
[13:50] <wgrant> I expect to see something there.
[13:51] <wgrant> Preferably an error.
[13:51] <bigjools> argh
[13:51] <bigjools> I didn't scroll down enough
[13:51] <wgrant> Heh.
[13:51] <bigjools> so it published it for ubuntu/maverick/amd64
[13:51] <bigjools> then failed for that queue item again
[13:53] <wgrant> Hmmm.
[13:53] <wgrant> I wonder.
[13:53] <wgrant> I wonder.....
[13:55] <wgrant> bigjools: There's nothing about lpia?
[13:55] <wgrant> Publishing that build could blow everything up.
[13:55] <wgrant> Badly.
[13:55] <bigjools> nup
[13:56] <wgrant> Or hppa?
[13:56] <wgrant> I expect a NotFoundError when publishing either of those.
[13:57] <wgrant> Ahem.
[13:57] <wgrant> That would explain why amd64 is the only thing published there, if it's doing it alphabetically, which is plausible.
[13:57] <bigjools> yep
[13:58] <bigjools> then bails out with an exception
[13:58] <wgrant> Can you see anything like that in the log?
[13:59] <bigjools> like what?
[13:59] <bigjools> NFE?
[14:00] <bigjools> it does amd64, then bails out on the dodgy PU
[14:00] <wgrant> Well, I want to see where it tries to accept the hppa build.
[14:00] <bigjools> it doesn't get that far
[14:00] <wgrant> Huh.
[14:00] <wgrant> Is there only one PUB on the PU?
[14:03] <wgrant> Hm, no.
[14:03] <wgrant> It has all of them, AFAICT.
[14:03] <wgrant> Including hppa, which is probably killing it.
[14:03] <wgrant> But it should be logged!
[14:03] <wgrant> Dammit.
[14:03] <bigjools> oO
[14:04] <bigjools> huh?
[14:04] <wgrant> So. The 00:00 publisher. Can you show me the lines between where it starts publishing the source, and where it starts publishing the next item?
[14:05] <wgrant> I'm now reasonably sure that it's dying due to an NFE when attempting to realise the hppa build.
[14:05] <wgrant> But it should be logging that somewhere.
[14:07] <bigjools> this is your lot: http://pastebin.ubuntu.com/489241/
[14:07]  * bigjools has to run for 30 mins
[14:08] <wgrant> What's OOPS-1707PPA1?
[14:08] <wgrant> I doubt it's the usual.
[14:08] <wgrant> OK.
[14:08]  * bigjools checks quickly
[14:09] <bigjools> WTF
[14:09] <bigjools> FatalUploadError: Signing key C51BC47D4C80ED00E96B4DC1839AEABCCF5C0A1F not registered in launchpad.
[14:09] <wgrant> Uh.
[14:09] <wgrant> I doubt it, really.
[14:10] <wgrant> Must be a different location.
[14:10] <bigjools> that can't be right
[14:10] <wgrant> It's not.
[14:10] <wgrant> They're throwing oopses in different directories.
[14:10] <wgrant> So we have conflicts.
[14:10] <wgrant> Yayyyyy.
[14:11] <bigjools> the reporting tool is crack
[14:11] <bigjools> the real error is:
[14:11] <bigjools> NotFoundError: u'Unknown architecture hppa for ubuntu maverick'
[14:11] <bigjools> :)
[14:11] <wgrant> AHA
[14:11] <wgrant> As expected.
[14:11] <wgrant> Phew.
[14:11] <bigjools> so, we've copied builds that can't be published
[14:11] <wgrant> Exactly.
[14:11]  * bigjools high-5s wgrant
[14:11] <wgrant> So it was the cross-seriesness.
[14:12] <wgrant> But with added complexity.
[14:13] <bigjools> thanks for helping wgrant
[14:16] <wgrant> I am glad we no longer have a copy corruption mystery.
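The root cause the two converge on (builds copied cross-series that reference architectures the target series lacks) can be sketched as follows; the architecture set and function are hypothetical stand-ins for the Soyuz publisher, not its real code:

```python
class NotFoundError(Exception):
    """Stand-in for Launchpad's NotFoundError."""


# Assumption for illustration: maverick dropped hppa and lpia,
# which hardy still had.
MAVERICK_ARCHES = {'amd64', 'i386'}


def publish_builds(arch_tags, published):
    # Builds are handled in alphabetical order, so amd64 gets
    # published before the run dies on hppa -- matching the log,
    # where amd64 was the only architecture ever published.
    for tag in sorted(arch_tags):
        if tag not in MAVERICK_ARCHES:
            raise NotFoundError(
                "Unknown architecture %s for ubuntu maverick" % tag)
        published.append(tag)
```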
[15:11] <wgrant> bigjools: Just to be sure, can you confirm that the NotFoundError also occurs at the time of the initial publication?
[15:11] <wgrant> Would be nice to be absolutely positive that it was the root.
[15:11] <bigjools> ok
[16:14] <benji> gmb: good afternoon; I noticed that one of my branches was reverted with the commit message of "Revert the merge of the check-in-WADL branch, which was causing the build to break."  I'm looking for more info on the breakage so I can fix it.  Any hints?
[16:18] <stub> benji: Something about breaking the launchpad/stable -> launchpad/db-devel auto merge, because it would always conflict.
[16:19] <gmb> benji, Sure, let me dig out the failure messages. There's also an ML thread about it between lifeless and I.
[16:19] <benji> hmm
[16:19] <gmb> benji, "WADL test will break db-devel merges regularly (was) Re: buildbot failure in Launchpad on lucid_db_lp" is the thread you're looking for.
[16:20] <gmb> benji, And this is the build that failed: https://lpbuildbot.canonical.com/builders/lucid_db_lp/builds/176
[16:21] <benji> gmb: thanks; what list was that thread on?  I don't think I saw it.
[16:21] <gmb> benji, canonical-launchpad.
[16:21] <gmb> benji, I'll find the thread in the archive for you.
[16:21] <benji> much appreciated
[16:22] <gmb> benji, https://lists.ubuntu.com/mailman/private/canonical-launchpad/2010-September/060017.html is the top message in the thread; everything else is under that.
[16:22] <benji> thanks
[19:53] <lifeless> adeuring1: hi
[19:54] <lifeless> benji: hi
[19:54] <benji> hello there
[19:58] <adeuring1> hi lifeless
[20:05] <lifeless> adeuring1: how did it go?
[20:05] <lifeless> benji: see the thread, I think we covered all the salient points there
[20:05] <adeuring1> there were basically two failures, one in test_db
[20:05] <adeuring1> (fixed)
[20:05] <lifeless> benji: there was one extra thing not really covered, which is that as a way of preventing API incompatibilities, I think it will need to be ----way---- easier.
[20:05] <lifeless> adeuring1: \o/
[20:06] <lifeless> adeuring1: so its landed?
[20:06] <adeuring1> lifeless: no, I could not figure out how to fix the other one, in lib/canonical/launchpad/ftests/../doc/librarian.txt
[20:06] <lifeless> gmb: up still ?
[20:06] <adeuring1> lifeless: AttributeError: 'thread._local' object has no attribute 'features'
[20:06] <lifeless> adeuring1: ok, I'll fix the other. where is your branch with the db fix ?
[20:07] <lifeless> adeuring1: that will be because a test has no request object live but is using a view.
[20:07] <benji> lifeless: as soon as I get access to the list archives, I'm sure your comments will make total sense :)
[20:07] <adeuring1> lifeless: lp:~adeuring/launchpad/private-librarian
[20:07] <lifeless> benji: ah -
[20:07] <lifeless> benji: ok so
[20:08] <adeuring1> lifeless: I also moved some ZCML stuff for SafeStreamOrRedirectLibraryFileAliasView to l/c/l/zcml/librarian.zcml
[20:08] <lifeless> adeuring1: great stuff
[20:11] <gmb> lifeless: Yes, I'm still around.
[20:11] <lifeless> rc stampy stampy needed
[20:11] <lifeless> https://code.edge.launchpad.net/~gary/launchpad/bug627442/+merge/34701
[20:11] <gmb> lifeless: Ah, yes. Looking at that now.
[20:11] <lifeless> thanks
[20:11] <gmb> gary_poster, lifeless : rc=me
[20:12] <gary_poster> thank you gmb
[20:12] <gmb> np
[20:12] <lifeless> gary_poster: the new librarian stuff should land today
[20:12] <gary_poster> awesome!
[20:13] <lifeless> gary_poster: stub did a full review, there were two test failures abel found overnight, and he fixed one, I'm fixing the other now.
[20:13] <gary_poster> fanstastic
[20:13] <lifeless> I'll queue up the RT tickets later and get flumper to hi-prio them
[20:15] <gary_poster> :-) k
[20:15] <lifeless> gary_poster: also the memcache/email/librarian client stuff seems happy
[20:16] <gary_poster> I didn't see that, but from the list I'm guessing that's timeout related?
[20:16] <lifeless> logging those actions in OOPS
[20:16] <lifeless> as part of the request timeline
[20:16] <gary_poster> ah ok
[20:16] <gary_poster> yeah I did see that actually
[20:17] <gary_poster> cool
[20:17] <lifeless> oopstools are getting a -little- confused on some of them
[20:17] <lifeless> so we've got some fine tuning to do
[20:18] <lifeless> but nothing major, probably just a stray \n or something
[20:18] <gary_poster> k
[20:18] <lifeless> hah
[20:18] <lifeless> and the 99999 thing
[20:18] <lifeless> overflows oopstools :P
[20:18] <gary_poster> :-(
[20:19] <lifeless> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1710EC1430
[20:19] <lifeless> I'll change it to 0
[20:19] <lifeless> actually
[20:19] <lifeless> I'll change it to consider it finished 'now' for that code path.
[20:19] <lifeless> if you look in that oops, its all sensible
[20:20] <lifeless> the last one is (6x9)ms as expected
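The change lifeless describes (treating an unfinished action as ending 'now' rather than writing a sentinel duration like the 99999 that overflowed oopstools) could look like this sketch; the names are assumptions, not the Launchpad timeline code:

```python
import time


def action_duration_ms(start, end=None):
    # Unfinished actions used to be recorded with a 99999 sentinel,
    # which overflowed oopstools; instead, treat an action with no
    # recorded end as finishing now.
    if end is None:
        end = time.time()
    return (end - start) * 1000.0
```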
[20:20] <gary_poster> hee hee, I do like the overflow
[20:20]  * gary_poster hasn't had lunch yet, and is starting to feel a bit...out-of-body
[20:20] <lifeless> shoo
[20:20]  * gary_poster should rectify that
[20:20] <gary_poster> biab
[20:35]  * gmb -> afk for a while; will check back periodically
[20:43] <lifeless> man, this race condition on librarian startup is really annoying
[21:06] <jelmer> lifeless: TacException: Unable to start /home/jelmer/lp/daemons/librarian.tac. Content of /var/tmp/librarian.log: ?
[21:06] <lifeless> jelmer: ?
[21:07] <jelmer> lifeless: Is that one of the symptoms of that race condition?
[21:07] <lifeless> no, I don't think so
[21:07] <lifeless> running just a librarian-using test and having them deadlock is
[21:07] <lifeless> the client has sent the request, the librarian is still reading it
[21:07] <lifeless> or something
[21:08] <jelmer> ok, that's different from what I've seen that
[21:08] <jelmer> *then
[21:09] <lifeless> it may be
[21:10] <lifeless> I've seen that too, but very rarely
[21:10] <jelmer> it's quite rare here as well, I'd say about once every two dozen test runs
[21:26] <lifeless> \o/
[21:26] <lifeless> now I just need an incremental review
[21:33] <lifeless> hi poolie
[21:33] <lifeless> I imagine you're running out now; will you be working today?
[21:37] <thumper> gmb: ping when you are ready
[21:38] <gmb> Hi thumper; Do you want to have a call or just a chat?
[21:38] <thumper> gmb: call is fine
[21:38] <thumper> my skype is running
[21:38] <thumper> mumble doesn't like nz
[21:38] <lifeless> those things used to be synonyms
[21:38] <gmb> thumper: Okay, give me a minute to shift locations.
[21:38] <thumper> lifeless: there is a different local isp here that offers SDSL
[21:39] <thumper> lifeless: who doesn't get affected by telecom's shaping
[21:39] <lifeless> thumper: ooo
[21:39] <lifeless> thumper: who?
[21:39] <thumper> lifeless: I'm wanting to try mumble with it
[21:39] <thumper> wicked networks
[21:39] <lifeless> I signed up for a year with telecom, just to get settled, its tolerable
[21:39] <thumper> I know an office that has it all set up
[21:39] <thumper> they offered for me to go and try it out
[21:39] <lifeless> but yeah, I would totally pay as much as I was paying in .au for /actual internet/
[21:40] <thumper> it is only 2 meg up/down
[21:40] <thumper> as opposed to 8 down and .5 up that I get now
[21:40] <lifeless> same average :P
[21:40] <thumper> yeah, but I get a lot more down than up
[21:41] <gmb> thumper: Okay, calling now.
[22:12] <mwhudson> my up/down ratio at home is stupid
[22:13] <mwhudson> 18 meg up, 0.5 down
[22:13] <lifeless> nice
[22:13] <lifeless> I'm getting 4MB here
[22:13] <lifeless> but I suspect its the line quality as much as anything
[22:15] <mwhudson> yeah, i think i'm very close to the nearest cabinet
[22:15] <ajmitch> thumper: we use that isp at work, it seems OK most of the time. sometimes has some poor international connectivity, but generally not too bad
[22:15] <lifeless> mwhudson: there are 5 phone points in the house
[22:18] <lifeless> benji: did you get clarity?
[22:18] <lifeless> benji: or would you like me to expound on the issue here?
[22:18] <benji> not yet; I haven't gotten a response on my subscription request to the list
[22:19] <benji> if you know who to pester, I'd appreciate to know
[22:19] <lifeless> we really should move that to LP
[22:19] <lifeless> uhm, its possibly out of date; #is would be a good place to ask - the gsas.
[22:19] <lifeless> anyhow, let me recap
[22:19] <lifeless> we have two branches
[22:20] <lifeless> db-devel and devel both feeding into a single tree - the 'db-devel buildbot test tree'
[22:20] <lifeless> if both branches have had API changes made directly on them, then every single time that devel change the API, the merge to db-devel will alter - correctly - the WADL, but the test will fail.
[22:21] <lifeless> as gmb and I understand the goal of the test to be fixing the apidoc issue primarily, we rolled it back as the simplest way to unblock the release.
[22:22] <lifeless> benji: but there are a couple of extra quirks to bear in mind when retackling it
[22:22] <lifeless>  - as a way of checking for API regressions, examining a big bytestring is very human intensive. I fully expect the human response to be 'oh, I need to run X and commit it' - that is, it won't prevent incompatible changes at all.
[22:26] <lifeless>  - its generally a bad idea to check in the output of build processes; VCS's can interact badly with that, and its wasteful to store the duplicate/derived data in the VCS.
[22:26] <benji> I didn't intend it as a way of checking for API regressions; I had the goal of speeding up make.  The tests were a way of being sure the files checked in were current.  The tests also provide a guard against unintentional API changes (if you make a change and the WADL changes and you didn't know you were impacting the web service, then you've been warned).
[22:30] <lifeless> benji: so, the pragmatic issue is: this must reliably work when two branches with API changes are merged, with no human intervention.
[22:30] <lifeless> benji: *I* strongly suspect that means that making wadl generation faster is an easier approach.
[22:31] <lifeless> naively, I don't see why it would be more than a second or so's processing.
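A make-style alternative to committing the generated WADL is to regenerate it only when an input is newer than the output; a sketch with hypothetical names (the lookup functions are injectable only so the rule is testable without touching the filesystem):

```python
import os


def wadl_needs_rebuild(source_paths, wadl_path,
                       getmtime=os.path.getmtime,
                       exists=os.path.exists):
    # The usual make dependency rule: rebuild when the output is
    # missing or any source file is newer than it.
    if not exists(wadl_path):
        return True
    built = getmtime(wadl_path)
    return any(getmtime(p) > built for p in source_paths)
```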
[22:32] <thumper> lifeless: do you know if we test against the ubuntu-bug script?
[22:32] <thumper> lifeless: or which version of the api it uses?
[22:32] <lifeless> thumper: there is something done with apport yes.
[22:33] <lifeless> 1.0 I believe, just because the distro did an audit-and-sweep for beta->1.0 a while back.
[22:48] <gary_poster> hey poolie.  I'd like to briefly show off bzr in a talk I'm giving.  I was having trouble with bzr-git until I just upgraded to the new packages (https://launchpad.net/~bzr/+archive/ppa)--my example works now, yay!  why isn't bzr-hg in that ppa though?  In a perfect world, I'd show that too.
[22:49] <thumper> lifeless: I'd like to talk to you at some stage to talk about how to understand some oopses I'm seeing
[22:50] <mwhudson> gary_poster: bzr-hg is not nearly as polished as bzr-git
[22:50] <mwhudson> i don't know if that's why
[22:50] <lifeless> thumper: sure thing
[22:50] <gary_poster> mwhudson: ok, thank you.
[22:50] <lifeless> thumper: skype?
[22:51] <thumper> lifeless: ack
[22:51] <thumper> https://lp-oops.canonical.com/oops.py/?oopsid=1698XMLP108
[22:52] <thumper> https://lp-oops.canonical.com/oops.py/?oopsid=1698XMLP110
[22:52] <thumper> https://lp-oops.canonical.com/oops.py/?oopsid=1698XMLP115
[22:52] <thumper> https://lp-oops.canonical.com/oops.py/?oopsid=1698XMLP118
[22:54] <gary_poster> mwhudson, I've actually come to trust bzr-svn so much that I use it to commit to public svn repos.  I didn't always feel that way.  Do you happen to know if bzr-git is similarly polished?
[22:55] <mwhudson> gary_poster: it's pretty close
[22:55] <gary_poster> cool
[22:55] <gary_poster> thanks again
[22:55] <wgrant> Apart from the whole push vs dpush thing.
[22:55] <mwhudson> it's less polished than bzr-svn probably, but it's a bit less of a model change so its job is a bit easier
[22:55] <mwhudson> oh right yeah, and it doesn't roundtrip
[22:56] <jelmer> wgrant: roundtripping support is on the way and will be in the next non-bugfix release
[22:56] <james_w`> \o/
[22:56] <gary_poster> awesome :-)
[22:57] <gary_poster> I see this page with caveats, and links to more http://doc.bazaar.canonical.com/migration/en/foreign/bzr-on-git-projects.html
[22:57] <wgrant> jelmer: Ooh, nice.
[22:57] <wgrant> jelmer: Where are you storing the data?
[22:58] <jelmer> wgrant: a special file in the tree and the commit message
[22:58] <wgrant> jelmer: The file stores file IDs?
[23:00] <jelmer> wgrant: yes, for file ids introduced by bzr
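[Editor's note: jelmer describes bzr-git's planned roundtripping scheme as storing bzr-assigned file ids, for files introduced on the bzr side, in a special in-tree file. The snippet below is a minimal illustration of that idea only; the function names and the tab-separated format are hypothetical, not bzr-git's actual serialization.]

```python
# Hypothetical sketch of the idea jelmer describes: git has no native
# notion of bzr file ids, so ids for bzr-introduced files are written to
# a sidecar file in the tree and read back on the return trip.

def dump_file_ids(ids):
    """Serialize {path: file_id} as one 'path<TAB>file_id' line per entry,
    sorted so the output is deterministic across commits."""
    return "".join("%s\t%s\n" % (path, fid) for path, fid in sorted(ids.items()))

def load_file_ids(text):
    """Parse the serialized form back into a {path: file_id} dict."""
    return dict(line.split("\t", 1) for line in text.splitlines())
```

As the surrounding discussion notes, any in-tree file like this is itself subject to merge conflicts when several people introduce files via bzr independently.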
[23:00] <lifeless> thumper: https://devpad.canonical.com/~stub/ppr/lpnet/latest-daily-pageids.html
[23:01] <wgrant> jelmer: Isn't that going to be merge conflict fun?
[23:11] <jelmer> wgrant: yes, there might be some issues in that regard
[23:12] <wgrant> rockstar: FWIW, the lp-buildd chroots have nothing to do with security.
[23:12] <jelmer> wgrant: fortunately it will only happen when people merge from multiple others who have used bzr to introduce new files
[23:12] <wgrant> chroots are useless when you have root.
[23:12] <wgrant> jelmer: Yep.
[23:12] <wgrant> Still not ideal, but such is git...
[23:12] <jelmer> wgrant: It's the best I could come up with that was scalable, reliable and practical.
[23:25] <Snorlax> Anyone here, that has deployed a launchpad install on a private server?
[23:25] <lifeless> jelmer: wgrant: merge is hookable. Hook and win.
[23:26] <jelmer> lifeless: we were talking about git though
[23:26] <lifeless> it's hookable too :P
[23:26] <jelmer> lifeless: fair enough :-)
[23:26] <wgrant> So Git really doesn't have any custom metadata support?
[23:27] <wgrant> *At all*?
[23:27] <jelmer> wgrant: well, there is notes, which are basically branches with metadata about e.g. commits
[23:27] <wgrant> But I hear they don't propagate.
[23:27] <jelmer> they are used to make after-the-fact modifications to existing fields in git
[23:28] <jelmer> so we can't use them - their contents would still show up in the git ui
[23:28] <wgrant> Hah.
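[Editor's note: the exchange above, about git notes not propagating, can be demonstrated with plain git. Notes live under `refs/notes/*` and are not carried by an ordinary push or fetch; sharing them requires explicit refspecs. The commands below are a self-contained sketch in a throwaway repository; the remote-syncing commands are shown as comments since no remote exists here.]

```shell
# git notes attach metadata to existing commits without rewriting them,
# but they are stored on separate refs (refs/notes/*) that a plain
# push/pull does not touch.
repo=$(mktemp -d)
cd "$repo" && git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git notes add -m "extra metadata" HEAD
git notes show HEAD
# Sharing notes needs explicit refspecs, e.g.:
#   git push origin 'refs/notes/*'
#   git fetch origin 'refs/notes/*:refs/notes/*'
```

This is also why jelmer rules notes out for roundtripping metadata: besides not propagating by default, their contents surface in the git UI (e.g. in `git log`).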
[23:39] <lifeless> mars: around ?
[23:39] <lifeless> OOPS-1710EC915, OOPS-1710ED880
[23:50] <Ursinha> lifeless, I guess he's out for Labor Day
[23:52] <lifeless> ah yes