[00:00] <mwhudson> mbarnett: ok, that's not what i expected
[00:00] <mwhudson> mbarnett: which machines did you try it on?
[00:00] <lifeless> sinzui: browser/team.py has an interesting thing
[00:00] <lifeless> it looks up the action from the form
[00:01] <mbarnett> mwhudson: galapagos, pear, russkaya
[00:01] <lifeless> but it doesn't seem to check that the action is *for* that row
[00:02] <mwhudson> !?
[00:03] <mwhudson> mbarnett: sorry if this is tedious, but can you pastebin the ~importd/.bazaar/bazaar.conf files from pear and russkaya?
[00:05] <spm> mwhudson: I'll sort that; mbarnett needs to go and en-cake-enate
[00:05] <sinzui> Lifeless yeah. I am staring at the template. I think the message is adapted to a widget and it builds ids from the message id...we automatically discard duplicate message ids
[00:05] <maxb> jelmer: Did the bzr-svn on launchpad somehow just not ever try "discovering revprop revisions" before the last rollout?
[00:05] <jelmer> maxb: it's always done that
[00:05] <maxb> huh
[00:05] <mbarnett> yes yes, cake levels dangerously low!
[00:05] <jelmer> maxb: But we didn't do KDE imports until recently
[00:06] <maxb> Yes, but the KDE imports which ran before the rollout didn't "discover revprop revisions"
[00:06] <maxb> oh, wait, I'm getting my dates wrong.
[00:07] <lifeless> sinzui: anyhow all tests are passing, except for
[00:07] <lifeless> >>> find_tag_by_id(admin_browser.contents, 'batchnav_first')['class']
[00:07] <sinzui> Yes that will be an issue for a few more moments. I have a patch
[00:08] <lifeless> sinzui: and I don't think the bug with POST there is any better or worse due to batching; if it's buggy it's always been buggy.
[00:08] <sinzui> I agree
[00:11] <mwhudson> spm: cool, and good morning
[00:11] <sinzui> lifeless: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/637654 has a proposed patch. It fixes the most common case of upper and lower. I imagine a page with two sets of BNs will continue to be broken
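The id scheme sinzui's patch moves toward can be sketched like this -- a toy illustration, not the patch itself: giving each rendered batch-nav instance a position prefix keeps the DOM ids unique when the same widget appears both above and below a listing.

```python
# Hypothetical sketch (not Launchpad's actual code): when one batch
# navigation widget renders twice on a page (above and below a listing),
# prefixing each instance's ids with its position avoids duplicate ids
# in the DOM.
def make_nav_ids(position, links=("first", "previous", "next", "last")):
    """Build DOM ids for one batch-nav instance; position is 'upper' or 'lower'."""
    return ["%s-batch-nav-batchnav-%s" % (position, link) for link in links]

upper = make_nav_ids("upper")
lower = make_nav_ids("lower")

# No id appears twice across the two instances.
assert not set(upper) & set(lower)
print(upper[0])  # prints upper-batch-nav-batchnav-first
```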
[00:12] <spm> mwhudson: hmm. you may have hit the nail. pear doesn't have that file
[00:12] <lifeless> sinzui: is this likely to cause other tests to fail ?
[00:12] <spm> mwhudson: https://pastebin.canonical.com/37119/ <== russkaya
[00:12] <lifeless> (I mean, will there be other fixups to do with that patch)
[00:13] <lifeless> sinzui: could you commit that patch somewhere and push it? I'll pull it in
[00:13] <sinzui> I bet not since there are no tests reporting that we have rubbish ids. I reported it as a separate bug because I think this issue is a separate concern from your branch now
[00:13]  * sinzui does
[00:13] <lifeless> sinzui: it is a separate concern, but either I leave the test out or I include your branch
[00:14] <mwhudson> spm: gar
[00:14] <spm> mwhudson: I assume C&P from one t'other?
[00:14] <mwhudson> spm: does /home/importd/.bazaar/sign-vcs-import exist on pear?
[00:14] <spm> nope
[00:15] <mwhudson> did pear get reinstalled at some point?
[00:15] <mwhudson> spm: can you look at  /home/importd/.bazaar/sign-vcs-import on russkaya?
[00:15] <spm> mwhudson: I'd assume it being a new machine; some things may have been missed :-(
[00:15] <mwhudson> it probably references something ridiculous like ~importd/hoover/keys/key.gpg
[00:16] <spm> holy dooly
[00:16] <mwhudson> https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/SetUpCodeImportSlave -> "# Make sure ~importd/.bazaar/ and ~importd/botslave look like they do on a working slave. "
[00:16] <mwhudson> :(
[00:16] <spm> mwhudson: exec /usr/bin/gpg --no-default-keyring --keyring /home/importd/botslave/gpg/vcs-imports@canonical.com.pub --secret-keyring /home/importd/botslave/gpg/vcs-imports@canonical.com.secret --default-key A60FA0E1 "$@"
[00:17] <mwhudson> spm: i was sure this was working at some point on pear :(
[00:17] <mwhudson> maybe not
[00:17] <mwhudson> it only affects cscvs imports i guess....
[00:17] <spm> I'm surprised it's working on russkaya....
[00:22] <sinzui> lifeless: lp:~sinzui/launchpad/batch-ids-0 There are no tests for these links. Nor can I find any tests for the existence of the upper or lower navs. All the tests for the BN are getting the Next link using TestBrowser
[00:22] <lifeless> pulling
[00:23] <sinzui> I checked for uses of the default BN. Subclasses like root search and bug do their own layouts
[00:24] <lifeless> sinzui: so 'batchnav_first' is not what to look for
[00:27] <sinzui> lifeless try 'upper-batch-nav-batchnav-first' but keep in mind the nav will not be rendered if there is only one batch. We need 6 messages in the testrunner env or we add ?batch=1 to the url we are testing to set the size to 1 message
[00:30] <lifeless> and for the bottom lower-batch-nav-batchnav-first ?
[00:30] <sinzui> yep
[00:31] <lifeless> ?batch=1 doesn't do it
[00:32] <lifeless> http://paste.ubuntu.com/493359/
[00:33] <lifeless> hmm
[00:33] <lifeless> 1 is too small
[00:34] <lifeless> moving it lower down
[00:37] <lifeless> sinzui: doesn't appear to be rendering the nav links to me
[00:39] <lifeless> Continue to hold the message, deferring\n          your decision until later.</li>\n        </ul>\n      </div>\n\n        <table class="listing">\n
[00:39] <sinzui> lifeless sorry, my screen was locked for a moment
[00:39] <lifeless> sinzui: ^ thats with batch=1 and 2 messages
[00:39] <wgrant> So, we have an issue with the OpenID identifier migration last week, causing incorrect accounts to be linked together... can someone poke around on staging to work out WTF is going on?
[00:39] <lifeless>  2\n        \n        <span>\n        messages have</span>\n        been posted to
[00:39] <lifeless> wgrant: sure, once I'm finished here.
[00:40] <wgrant> lifeless: Thanks.
[00:40] <lifeless> sinzui: only one message is shown
[00:40] <lifeless> so the batch param worked
[00:40] <lifeless> it's the navigation bit that isn't
[00:41] <sinzui> I am a bad advisor. you are on the first batch. We should be checking for upper-batch-nav-batchnav-next
[00:42] <lifeless> no, you're fine
[00:42] <lifeless> it's not there
[00:42] <sinzui> :(
[00:42] <lifeless> after the advice
[00:42] <lifeless> Discard</strong> - Throw the message aw
[00:42] <lifeless> etc
[00:42] <lifeless> your decision until later.</li>\n        </ul>\n      </div>\n\n        <table class="listing">\n        <thead><tr>\n          <th>Message detail
[00:42] <lifeless> is whats in the browser.contents
[00:43] <lifeless> the navigation bit is just awol
[00:43] <lifeless> it should be after that </div>
[00:44] <sinzui> We suppress rendering of the lower if there are no additional batches, so maybe that template fragment is wrong
[00:45] <lifeless> well there is an additional batch - batch=1 and two messages to moderate
[00:45] <lifeless> the count on the page shows '2' so we know there are two there
[00:45] <lifeless> and there is only one "approve" in the output, so we know only one got shown
[00:47] <sinzui> lifeless, the upper template must have rendered since there is clearly a batch. the view guards the rendering with this: ``if self.context.currentBatch():``
[00:47]  * sinzui is looking at canonical/launchpad/webapp/batching.py
[00:48] <lifeless> what is the context object going to be - held_messages ?
[00:49] <spm> mwhudson: pear now has those dirs/files setup per russkaya
[00:49] <mwhudson> spm: great, thanks
[00:50] <sinzui> lifeless: yes held_messages. we are adapting a BN
[00:50] <lifeless> so I did this
[00:50] <lifeless> +++ lib/canonical/launchpad/webapp/batching.py  2010-09-13 23:50:20 +0000
[00:50] <lifeless> @@ -40,7 +40,7 @@
[00:50] <lifeless>      def render(self):
[00:50] <lifeless>          if self.context.currentBatch():
[00:50] <lifeless>              return LaunchpadView.render(self)
[00:50] <lifeless> -        return u""
[00:50] <lifeless> +        return u"not rendered"
[00:50] <lifeless> not rendered was not included in the output
[00:51] <sinzui> ?
[00:51]  * sinzui checks zcml 
[00:51] <lifeless> I wanted to see if that code path was shortcircuiting or something
[00:54]  * sinzui checks other batches with the hacked template
[00:58] <lifeless> and this:
[00:58] <lifeless> +++ lib/canonical/launchpad/webapp/batching.py  2010-09-13 23:52:07 +0000
[00:58] <lifeless> @@ -38,6 +38,7 @@
[00:58] <lifeless>      css_class = "upper-batch-nav"
[00:58] <lifeless>  
[00:58] <lifeless>      def render(self):
[00:59] <lifeless> +        return u" fooo "
[00:59] <lifeless>          if self.context.currentBatch():
[00:59] <lifeless>              return LaunchpadView.render(self)
[00:59] <lifeless>          return u""
[00:59] <lifeless> also doesn't show up in the output
[00:59] <sinzui> right. I am not seeing my template change when testing https://blueprints.launchpad.dev/firefox?batch=2
[01:00] <sinzui> Or I could run the instance that I made the change in instead
[01:00] <lifeless> and to cap it off, when I make that raise an Exception I don't get an error
[01:01] <sinzui> oh.. I wonder. I see >>> admin_browser.reload() which has a history of being buggy
[01:01] <lifeless> have a look at lib/lp/blueprints/templates/person-specworkload.pt
[01:02] <lifeless> sinzui: I'm sure it's not that, I made the view crash and the page rendered
[01:02] <sinzui> I see my hack in specs now that I am running the right branch
[01:06] <lifeless> I'm thoroughly confused
[01:07] <lifeless> is there a sample data team w/list ?
[01:08] <sinzui> lifeless me too, this always just works. Can you humour me by adding this before we do the call to find_tag_by_id
[01:08] <sinzui>     >>> admin_browser.open(
[01:08] <sinzui>     ...     'http://launchpad.dev/~guadamen/+mailinglist-moderate')
[01:08] <lifeless> of course
[01:09] <sinzui> lifeless there are no mls in data. I have a make harness note about making them after a request is made in the UI
[01:09] <lifeless>     >>> admin_browser.open(
[01:09] <lifeless>     ...     'http://launchpad.dev/~guadamen/+mailinglist-moderate?batch=1')
[01:09] <lifeless>     >>> find_tag_by_id(admin_browser.contents, 'upper-batch-nav-batchnav-first')['class']
[01:09] <lifeless>     first
[01:09] <lifeless>     >>> admin_browser.contents
[01:09] <lifeless> that's what the story does
[01:09] <sinzui> :(
[01:10] <lifeless> what's weird
[01:10] <lifeless> I added a string literal and I can't see it
[01:11] <lifeless> it's almost like those divs are eaten
[01:11] <lifeless> when I put a literal above it works
[01:11] <lifeless> in the list of action descriptions
[01:12] <lifeless> but when I add another div at the place we have the navigation ones it disappears
[01:12] <sinzui> lifeless as a desperate act to verify this we could add size=1 to the BN instantiation in the view to be certain that the URL is not being ignored
[01:12] <lifeless> I'm certain its not
[01:12] <lifeless> because only one "approve" action is in the contents
[01:12] <sinzui> Ah, yes, that is what I did to be certain something showed up in my env
[01:13] <lifeless> I suspect the metal:form stuff
[01:14] <lifeless> I'm positive it's simply not evaluating things without metal:fill-slot in that container
[01:14] <lifeless> I think if we add a div around it it will work, moving the widgets slot up
[01:15] <sinzui> we are adapting the message in the same manner that we want to adapt the BN
[01:15] <lifeless> yes, but the metal interpreter isn't evaluating things without slots
[01:15] <sinzui> We can certainly move the navs out of the form to be sure it works
[01:16] <lifeless> bet you that that is it
[01:16] <lifeless> is it
[01:16] <lifeless> yes
[01:16] <lifeless> it was
[01:16] <lifeless> I have this now
[01:16] <sinzui> \o/
[01:16] <lifeless> <a class="next" rel="next"\n           href="http://launchpad.dev/%7Eguadamen/+mailinglist-moderate?start=1&amp;batch=1"\n           id="upper-batch-nav-batchnav-next">
[01:16] <lifeless> -      <table class="listing" metal:fill-slot="widgets">
[01:16] <lifeless> +      <div metal:fill-slot="widgets">
[01:16] <lifeless> +      <tal:navigation
[01:16] <lifeless> +        replace="structure view/held_messages/@@+navigation-links-upper" />
[01:16] <lifeless> +
[01:16] <lifeless> +      <table class="listing">
[01:16] <lifeless> that's the key
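The root cause lifeless identifies: under metal:use-macro, METAL discards any content that is not inside a metal:fill-slot, so the navigation markup sitting next to the widgets slot never reached the output. A simplified illustration of the before/after (the macro path here is a placeholder, not the real Launchpad macro):

```xml
<div metal:use-macro="view/macros/form">  <!-- placeholder macro path -->

  <!-- DROPPED: outside any fill-slot, METAL silently discards this -->
  <tal:navigation
    replace="structure view/held_messages/@@+navigation-links-upper" />

  <!-- RENDERED: the fix nests the navigation inside the 'widgets' slot -->
  <div metal:fill-slot="widgets">
    <tal:navigation
      replace="structure view/held_messages/@@+navigation-links-upper" />
    <table class="listing">
      <!-- message rows ... -->
    </table>
  </div>
</div>
```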
[01:17] <sinzui> 1.5h of confusion and 1 minute to fix with insight.
[01:17] <lifeless> we must be programming
[01:18] <lifeless> Thank you for this; I'll push up and propose for merge
[01:19] <sinzui> Thanks
[01:19] <lifeless> and I'll write a mail to the list with a) a howto and b) asking for where it should go
[01:20] <lifeless> does anyone remember the wiki page for the bug sprinty thing at the end of the year?
[01:24] <lifeless> sinzui: https://code.edge.launchpad.net/~lifeless/launchpad/registry/+merge/35354
[01:40] <wgrant> lifeless: So, I worked out what was up with the broken accounts.
[01:40] <wgrant> Sadly more will likely break soon.
[01:40] <lifeless> wgrant: ok cool
[01:40] <lifeless> I learnt how to batch stuff
[01:40] <lifeless> and to hate metal:form
[01:40] <wgrant> Heh.
[01:41] <wgrant> Who owns our OpenID consumer these days?
[01:42] <lifeless> consumer? foundations
[02:43] <lifeless> lunchtime
[01:53] <lifeless> thumper: does transaction time == scan time ?
[01:53] <thumper> lifeless: luckily, no
[01:54] <thumper> lifeless: 5.5 minutes to get the ancestry from bzrlib :(
[01:57] <lifeless> wheee
[01:58] <lifeless> I think your idea of decoupling the tip change may not be enough
[01:58] <lifeless> I'd start with autocommit
[01:58] <lifeless> IMBW
[01:58] <lifeless> but it seems like low effort for big return
[02:15] <lifeless> thumper: can I borrow your eyeballs
[02:22] <thumper> lifeless: no, they're mine
[02:22] <thumper> lifeless: what do you need?
[02:22]  * mwhudson is reminded of the end of hotshots
[02:23] <lifeless> thumper: a review
[02:23] <lifeless> it's small, it will fix mailing list moderation (or make it fixable by further tuning)
[02:43] <MTecknology> Just on the wild off chance... Is there anyone that knows very basic accounting principles in here?   I know there are very smart people in here and hoping one might be able to help me out..
[02:47] <lifeless> MTecknology: I do, enough to say 'run run away'
[02:48] <lifeless> spm: hey, don't suppose in the losa wiki you have a sql fragment to report on locks in the db ?
[02:48] <lifeless> spm: I know I wrote one up years ago ...
[02:48] <spm> yup sure do
[02:48] <lifeless> thumper needs it.
[02:48] <spm> it's a tad obscure to find tho. lp howto, troubleshooting from memory
[02:48] <MTecknology> lifeless: How about enough to help me figure out this problem that's driving me absolutely bonkers? I have the book - but the book doesn't cover the material.
[02:48] <thumper> lifeless: did you mute?
[02:49] <spm> thumper: https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/BlockedProcessesDBLocks as a general
[02:49] <spm> https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/PostgresOldQueries is also vaguely relevant
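The wiki SQL itself isn't quoted in the channel; as a rough stand-in (NOT the actual wiki fragment), a lock report on PostgreSQL usually joins pg_locks against pg_stat_activity to show which sessions are waiting. Column names here follow recent PostgreSQL (pid/query) -- 2010-era servers used procpid/current_query instead, so the exact columns are an assumption.

```python
# Rough stand-in for the wiki page's lock-reporting SQL (not the real
# fragment): list ungranted locks alongside the waiting session's query.
LOCK_QUERY = """
SELECT a.pid,
       a.query,
       l.locktype,
       l.mode,
       l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
ORDER BY a.pid;
"""

# Execute with any DB-API driver, e.g. cur.execute(LOCK_QUERY)
print(LOCK_QUERY.strip().splitlines()[0])  # prints "SELECT a.pid,"
```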
[02:50] <spm> MTecknology: google isn't helping find it? if they're basic accounting principles, there should be a heap of online references that explain them??
[02:50] <MTecknology> spm: My issue is understanding the basics of what I'm even reading online
[02:52] <spm> MTecknology: being quite serious here (I've got a few in the series): Perhaps "Accounting for Dummies"? serious suggestion, the dummies series are excellent for explaining the basic concepts. ??
[02:54] <MTecknology> spm: might be worth buying from ya - any chance you could try to help me in a query with this one?
[02:54] <MTecknology> or else there's a Barnes & Noble here if you weren't offering to sell
[02:55] <spm> MTecknology: accounting? hell no. I never studied it at school or uni. wouldn't have a clue. I just have a few Dummies books that I've found excellent for explaining early concepts in the topics in question. :-)
[02:55] <MTecknology> oh
[02:55] <MTecknology> BTW - This is what I'm fighting.  Pearson Brothers recently reported an EBITDA of $13.5 million and net income of $2.6 million. It had $2.0 million of interest expense, and its corporate tax rate was 35%. What was its charge for depreciation and amortization?
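As an aside for anyone following along, the arithmetic behind that textbook question works out from the standard income-statement identities (a sketch, figures in $ millions):

```python
# Worked sketch of the problem quoted above. Work upward from net income:
#   pre-tax income = net income / (1 - tax rate)
#   EBIT           = pre-tax income + interest expense
#   D&A            = EBITDA - EBIT
ebitda, net_income, interest, tax_rate = 13.5, 2.6, 2.0, 0.35

pretax_income = net_income / (1 - tax_rate)    # 2.6 / 0.65 = 4.0
ebit = pretax_income + interest                # 4.0 + 2.0 = 6.0
depreciation_and_amortization = ebitda - ebit  # 13.5 - 6.0 = 7.5

print(depreciation_and_amortization)  # prints 7.5 (i.e. $7.5 million)
```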
[02:56] <spm> http://www.amazon.com/Accounting-Dummies-John-Tracy/dp/0764550144 fwiw
[02:57] <MTecknology> cheap, and probably much more useful than this $150 unbound see through sheets of paper book I have
[02:59] <spm> probably :-)
[03:10] <cr3> lifeless: hi there, sorry I couldn't answer you earlier. still around?
[03:13] <lifeless> yes
[03:15] <cr3> lifeless: so, regarding test runs, do you also feel that's the best way to describe a group of test results run at a point in time in a given context?
[03:15] <cr3> lifeless: typically, I prefer to name things with one word, like submission instead of test run, but I think the latter might be clearer
[03:17] <lifeless> I commented on that in #testrepository
[03:17] <lifeless> sorry otp now
[03:19] <cr3> lifeless: heh, you seem to have been on the phone all day :)
[03:22] <cr3> on an unrelated topic, I have a question about defining interfaces: if a class implements IBugTarget which inherits from IHasBugs, using bugs as an example, then that class typically defines a createBug method.
[03:23] <cr3> however, why not have the class have a bugs attribute which returns a IBugSet which, in turn, implements a create method
[03:24] <cr3> in other words, the difference is like product.createBug compared to product.bugs.create, does this make sense to anyone?
[03:25] <lifeless> thumper: https://dev.launchpad.net/LEP/FeatureFlags#preview
[03:28] <lifeless> thumper: if features.getFeature('code.incrementaldiff') == 'on':
[03:28] <lifeless> in templates, you do view/features/code.incrementaldiff
[03:28] <lifeless> or something like that
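A toy sketch of the lookup lifeless describes -- not Launchpad's real feature-flag implementation; the class and mapping access here are invented for illustration of the two access styles (Python code vs template traversal):

```python
# Hypothetical feature-flag store: getFeature() for code, plus mapping
# access so a template path like view/features/code.incrementaldiff
# could traverse into it.
class FeatureFlags:
    def __init__(self, flags):
        self._flags = dict(flags)

    def getFeature(self, name, default=None):
        """Return the flag's value, or `default` when the flag is unset."""
        return self._flags.get(name, default)

    def __getitem__(self, name):
        # Mapping access mirrors template traversal.
        return self.getFeature(name)

features = FeatureFlags({"code.incrementaldiff": "on"})

if features.getFeature("code.incrementaldiff") == "on":
    print("incremental diffs enabled")
```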
[03:35] <lifeless> spm: how many cpus on the master db?
[03:36] <spm> lifeless: 16
[03:57] <lifeless> cr3: hi
[03:57] <lifeless> uhm
[03:58] <lifeless> in reverse order, I don't know, IBugSet really isn't the specific code I'd use if sketching it that way, and don't forget that all calls to SQL are ~ 1000 times slower than python.
[03:59] <lifeless> I don't have a brilliant name for the result of running many tests other than 'test run'
[04:08] <wgrant> lifeless: Any idea how to debug the +filebug issue?
[04:10] <lifeless> hmm
[04:10] <lifeless> spm: we really do need a hand
[04:10] <lifeless> spm: when you can, its approx the top timeout on prod
[04:10] <spm> lifeless: sure, was on a call, earlier hence the terse reply. sup?
[04:11] <lifeless> spm: +filebug gives an apache/haproxy error, reliably, on staging and prod
[04:11] <lifeless> wgrant has been looking at it
[04:11] <lifeless> we need to know a bit more about whats actually going on.
[04:11] <wgrant> Bug 636801
[04:11] <spm> um. since when? I happily filed a bug earlier?
[04:11] <lifeless> or for someone to make the request to a naked appserver
[04:11] <wgrant> I guess we need someone to watch staging Apache and see why it errors.
[04:11] <lifeless> or something
[04:12] <wgrant> spm: Only when filing with lots of apport attachments.
[04:12] <lifeless> spm: with apport on a package with 20+ subscribers?
[04:12] <spm> no, just a soyuz one. qed. ;-)
[04:12]  * wgrant kicks mup.
[04:12] <lifeless> mup has mastered the fine art of silence.
[04:12] <spm> mup appears to have left the channel
[04:12] <wgrant> spm: WRT that Soyuz one, it seems to be a general problem. I've received complaints that lots of builds are dispatching repeatedly.
[04:13] <lifeless> spm: _mup_
[04:13] <cr3> lifeless: the interface question was mostly related to something containing other objects. put another way, I could have projects['bzr'].create_test_run() or projects['bzr'].test_runs.create()
[04:13] <spm> ahh. it hides under a new name.
[04:14] <lifeless> cr3: is this python or LP API's ?
[04:14] <lifeless> cr3: if its LP API's you probably want to design to the wire protocol, given how round-trip-happy it is.
[04:15] <cr3> lifeless: ok, so every dot is a roundtrip potentially
[04:15] <wgrant> Not just potentially :(
[04:15] <lifeless> if by potentially you mean 'almost guaranteed'
[04:15] <lifeless> and by dot you mean 'python method invocation'
[04:15] <lifeless> (which includes __getattr__ aka '.')
[04:15] <cr3> I thought that perhaps launchpadlib could potentially cache information on the client side, sometimes avoiding a roundtrip
[04:16] <lifeless> cr3: optimise for cold cache :)
[04:16] <lifeless> (it can, under very limited circumstances)
[04:17] <lifeless> which I suspect we'll be limiting to about 2.5 hours in the near future
[04:17] <cr3> ok, that answers my question and provides good guidelines for the future. thanks!
[04:20] <cr3> lesson learned, now time for bed. cheerio folks!
[04:23] <wgrant> Bah, no staging.
[04:29] <spm> lifeless: so, been doing some log snarfing and head scratching. not finding any errors in apache - but if a POST, and timing out; tbh I wouldn't expect to. :-( If this can be reliably repeated, I'd suggest 'a' way forward would be to sniff the traffic at the client end when doing such a thing. even tho the connection'd be ssl'd, I'd betcha we'd get useful info out of the flow.
[04:31] <lifeless> stub: https://code.edge.launchpad.net/~stub/launchpad/cronscripts/+merge/35279 reviewedish
[04:31] <spm> wgrant: ref soyuz; yeah, I'm sure I'd seen comments around this bug before; but didn't have enough "knowledge" to find 'em. So figured a new one with some detailed timing info may help Julian. Being a private build I had to be a tad circumspect in what I put in unf. :-/
[04:31] <lifeless> spm: we don't see the response on the client, that's the point.
[04:31] <lifeless> spm: client -> server, pause, 'could not connect to launchpad'
[04:32] <spm> lifeless: the tcp connection stays open forever until it gets client killed?
[04:32] <lifeless> spm: so we don't get an oops, don't get zip
[04:32] <lifeless> spm: no we get the haproxy/apache lalala page
[04:32] <spm> after what time period, repeatedly?
[04:32] <lifeless> wgrant: please tell spm how to make it happen, then he'll see
[04:32] <spm> same time period? longer? shortly? varies by moon phase and tides?
[04:32] <lifeless> 10seconds sometimes apparently, though I think that was during the overload
[04:33] <lifeless> 30 normallyish, I think.
[04:33] <spm> different browsers to make a diff?
[04:33] <lifeless> spm: don't think we've tested, because the browser is working fine.
[04:34] <spm> just wondering if it's an internal browser timeout that's then kicking the server error
[04:34] <lifeless> I don't even know how to parse that
[04:34] <spm> ie. are packets actually flowing and then dying.
[04:34] <spm> or no packets flowing at all
[04:35] <lifeless> spm: it's http - request/response model
[04:35] <lifeless> spm: and apport does preuploading of the bugs, so it's not a big post.
[04:35] <spm> for sure; I'm looking at the tcp layer to get clues for wtf is happening at the http layer.
[04:35] <lifeless> wgrant: whats a package that this has happened to ?
[04:36] <lifeless> spm: I don't think it's an http problem myself, I think it's appserver lalalalala land time genuinely, but we don't see the oops clearly
[04:36] <spm> fwiw, it should be pretty simple in staging: intranettertubers -> apache -> appserver. no squid, no haproxy.
[04:36] <lifeless>  ok
[04:37] <wgrant> spm, lifeless: I was trying to prepare a case, but staging is borked.
[04:37] <lifeless> so we're seeing the apache 'server fail' message
[04:37] <spm> actually there's a point. wonder if the oops are being generated; we're just not seeing 'em. looks...
[04:37] <lifeless> OOPS-1717E1745, OOPS-1717G1716, OOPS-1717H1810, OOPS-1717K1882, OOPS-1717L1760
[04:37] <lifeless> OOPS-1717E1218, OOPS-1717E1837, OOPS-1717K1949, OOPS-1717M1234, OOPS-1717N1211
[04:37] <lifeless> OOPS-1717D703, OOPS-1717G778, OOPS-1717K884, OOPS-1717K885
[04:37] <lifeless> they are listed as soft timeouts on +filebug
[04:38] <lifeless> we also have some 'OffsiteFormPostError'
[04:38] <lifeless> OOPS-1717M14
[04:38] <spm> process-apport-blobs.log is remarkably unhelpful
[04:38] <wgrant> process-apport-blobs is fine.
[04:38] <lifeless> that happens async
[04:39] <lifeless> its all in the appserver at this point
[04:39]  * spm is doing the Sherlock Holmes method of debug - eliminate the working, to discover the not ;-)
[04:40] <lifeless> :)
[04:40] <wgrant> Can we expect staging to return at some point?
[04:40] <wgrant> It's been down a lot lately...
[04:40] <lifeless> wgrant: theres about 6 queries per attachment
[04:40] <wgrant> lifeless: Really?
[04:41] <spm> launchpad-trace.log has zip with 'filebug' in it. orsum
[04:41] <lifeless> INSERT INTO BugAttachment (message, bug, libraryfile, type, title) VALUES (%s, %s, %s, %s, %s) RETURNING BugAttachment.id
[04:41] <wgrant> Message, BugAttachment, BugNotification, FUCKLOADS * BugNotificationRecipient
[04:41] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname
[04:41] <lifeless> SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released
[04:41] <spm> wgrant: for some reason, staging is being updated 'continuously' regardless of need. haven't had a chance to chase. yet.
[04:41] <lifeless> SELECT StructuralSubscription.blueprint_notification_level, StructuralSubscription.bug_notification_level, StructuralSubscription.date_created, StructuralSubscription.date_last_updated
[04:41] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,
[04:41] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname
[04:41] <wgrant> lifeless: 'ugh' comes to mind.
[04:41] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,
[04:42] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,
[04:42] <lifeless> SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released,
[04:42] <lifeless> SELECT StructuralSubscription.blueprint_notification_level, StructuralSubscription.bug_notification_level, StructuralSubscription.date_created, StructuralSubscription.date_last_updated,
[04:42] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,
[04:42] <lifeless> SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname,
[04:42] <lifeless> SELECT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1 FROM LibraryFileContent WHERE LibraryFileContent.id = %s LIMIT 1
[04:42] <lifeless> INSERT INTO BugActivity (oldvalue, datechanged, whatchanged, message, newvalue, bug, person) VALUES (%s, CURRENT_TIMESTAMP AT TIME ZONE 'UTC', %s, %s, %s, %s, %s) RETURNING
[04:42] <lifeless> INSERT INTO Message (datecreated, owner, subject, rfc822msgid) VALUES (CURRENT_TIMESTAMP AT TIME ZONE 'UTC', %s, %s, %s) RETURNING Message.id
[04:43] <lifeless> I'm going to stop there
[04:43] <lifeless> 'lots'
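The query log lifeless pasted shows the classic N+1 shape: one Person SELECT per notification recipient. A self-contained sketch of that pattern and the bulk alternative -- sqlite3 stands in for PostgreSQL, and the table/column names are invented, not Launchpad's schema:

```python
import sqlite3

# Toy stand-in for the Person table seen in the query log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, displayname TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])

recipient_ids = [1, 2, 3]

# N+1 style: one SELECT per recipient, as in the log above.
one_by_one = [conn.execute(
    "SELECT displayname FROM person WHERE id = ?", (pid,)).fetchone()[0]
    for pid in recipient_ids]

# Bulk style: a single SELECT ... WHERE id IN (...) fetches them all.
placeholders = ",".join("?" * len(recipient_ids))
bulk = [row[0] for row in conn.execute(
    "SELECT displayname FROM person WHERE id IN (%s) ORDER BY id"
    % placeholders, recipient_ids)]

assert one_by_one == bulk
print(bulk)  # prints ['alice', 'bob', 'carol']
```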
[04:44] <lifeless> wgrant: mailed you, its an open package, normal person
[04:44] <wgrant> lifeless: Is this from a hidden OOPS?
[04:45] <wgrant> Hm, that's only 500 queries.
[04:45] <wgrant> Is this a soft timeout?
[04:45] <lifeless> yes
[04:45] <wgrant> Ah.
[04:45] <lifeless> but we've no reason to assume that this is unrelated ;)
[04:46] <wgrant> I'm more concerned with the bad error than the fact that there's an error.
[04:46] <wgrant> We know why it's timing out.
[04:46] <wgrant> We don't know why it's timing out like this.
[04:47] <lifeless> oh it concerns me too
[04:47] <lifeless> wgrant: are you doing some perf stuff today? It is tuesday...
[04:49] <wgrant> Are we going to be able to work this out on staging soon, or should we do it on prod now?
[04:49] <lifeless> prod it up
[04:51] <wgrant> OK.
[04:51] <wgrant> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a breaks pretty repeatedly.
[04:51] <lifeless> wgrant: whats your ip for spm to look in apache logs
[04:52] <wgrant> Isn't the token in the URL sufficient?
[04:52] <spm> "trust me, I'm a sysadmin"
[04:52] <wgrant> Otherwise 122.108.38.217
[04:52]  * lifeless looks in shock at spm
[04:52] <jtv> wgrant: heya
[04:52] <wgrant> jtv: Morning.
[04:53] <jtv> wgrant: may or may not be related but last night at least, we had some edge breakage where one of the edge instances reported the wrong revision.
[04:53] <jtv> So it'd say it was at r11532 but actually seemed to be stuck at r11522 like the rest.
[04:53] <wgrant> jtv: Unrelated -- this has been going on since ~10.09.
[04:53] <jtv> Oh ok
[04:54] <jtv> nm that then :)
[04:54] <wgrant> Heh.
[04:56] <spm> bleh. apache says '502'
[04:56] <wgrant> Nothing useful in the error log?
[04:56] <wgrant> Or does that mean the appserver said 502?
[04:57] <lifeless> spm: now the question is, did an oops get generated
[04:57] <spm> [14/Sep/2010:04:50:42 +0100]
[04:57]  * wgrant stabs BST in the face.
[04:58] <spm> indeed
[04:59] <spm> [Tue Sep 14 04:50:55 2010] [error] [client 122.108.38.217] (70014)End of file found: proxy: error reading status line from remote server localhost, referer: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a
[04:59] <spm> [Tue Sep 14 04:50:55 2010] [error] [client 122.108.38.217] proxy: Error reading from remote server returned by /ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a, referer: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+filebug/5ca89d78-bfa3-11df-905e-0025b3df357a
[04:59] <spm> ^^ errorlog
[04:59] <wgrant> Aha.
[04:59] <wgrant> So, the appserver died.
[04:59] <wgrant> Or otherwise closed the connection.
[05:00] <spm> hmm. haproxy/squid are in there somewhere. there may be in ter est ing comp li ca tions
[05:00] <wgrant> I thought they were on the other side.
[05:00] <wgrant> But I could well be wrong.
[05:03] <wgrant> So I guess you need to go through all the layers :(
[05:03] <lifeless> apache -> ha -> appserver
[05:04] <wgrant> With Squid in front of Apache?
[05:04] <spm> apache -> (squid)? -> ha -> app; POsts don't go via squiddly
[05:04] <wgrant> Ahh
[05:04] <wgrant> Handy.
[05:04] <lifeless> nor do authenticated requests IIRC
[05:05] <spm> correct
[05:05] <lifeless> thumper: https://bugs.edge.launchpad.net/launchpad-code/+bug/637758 please put the code walkthrough we did in there, for gary's info when he sees the other bug I'm filing :)
[05:06] <thumper> ok
[05:08] <lifeless> spm: how many appservers for lpnet ?
[05:09] <spm> 15
[05:09] <poolie> how's it going, wallyworld?
[05:09] <wgrant> I was a bit surprised to see O oopses over the weekend.
[05:09] <lifeless> so 60 threads
[05:09] <wgrant> I didn't realise there were quite that many.
[05:11] <lifeless> thumper: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/637761
[05:13] <poolie> wgrant: because the counter was broke, or because we actually had 0?
[05:13] <poolie> remarkably good if os
[05:13] <poolie> *so
[05:13] <mwhudson> does that 15 include the login and shipit servers?
[05:13] <wgrant> poolie: O != 0
[05:14] <poolie> haha
[05:14] <poolie> O meaning some particular category?
[05:14] <lifeless> poolie: server ID in the oops code
[05:14] <lifeless> A, B, C, ...
[05:14] <spm> mwhudson: no, those are extras
[05:14] <mwhudson> wow
[05:14] <mwhudson> lots of hardware
[05:14] <lifeless> some machines have multiple instances
[05:14] <lifeless> but yes.
[05:15] <poolie> aren't some higher letters used for something other than a machine id?
[05:15] <poolie> or maybe that's a different field
[05:15] <lifeless> its an arbitrary string
[05:15] <lifeless> e.g. XML
[05:15] <lifeless> date before, serial after
[05:15] <lifeless> thumper: rt 41361 if you want to high-pri it
[05:16] <thumper> lifeless: ack, dealing with a user on #launchpad right now
[05:16] <wgrant> The appservers are single letters. Others are longer strings (eg. CW, FTPMASTER, PPA)
[05:16] <spm> woo. progress. Sep 13 23:26:16 localhost haproxy[15039]: 127.0.0.1:39282 [13/Sep/2010:23:26:00.844] lpnet-app lpnet-app/potassium_lpnet_5 0/0/0/-1/15230 502 1184 - - SH-- 67/38/38/2/0 0/0 "POST /ubuntu/+source/linux/+filebug/8dc224d8-bf85-11df-806b-0025b3df357a HTTP/1.1"
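The haproxy line spm pasted packs the diagnosis into its timer and termination-state fields. A small sketch of how to decode them (field meanings per the haproxy log-format docs; this is an illustration, not a parser the LOSAs actually used):

```python
# Decode the timer block (Tq/Tw/Tc/Tr/Tt) and termination state of an
# haproxy HTTP log line like the one above. Illustrative only.

def decode_haproxy(timers, term_state):
    tq, tw, tc, tr, tt = (int(x) for x in timers.split("/"))
    decoded = {
        "request_headers_ms": tq,   # Tq: time to receive the client request
        "queue_wait_ms": tw,        # Tw: time spent queued in haproxy
        "connect_ms": tc,           # Tc: time to connect to the backend
        "response_ms": tr,          # Tr: server response time (-1 = headers never arrived)
        "total_ms": tt,             # Tt: total session duration
    }
    # First flag char: what ended the session; second: the phase it was in.
    causes = {"S": "server aborted or refused", "C": "client aborted",
              "s": "server-side timeout", "c": "client-side timeout", "-": "normal"}
    phases = {"H": "waiting for response headers", "R": "waiting for request",
              "D": "transferring data", "-": "normal"}
    decoded["cause"] = causes.get(term_state[0], "?")
    decoded["phase"] = phases.get(term_state[1], "?")
    return decoded

info = decode_haproxy("0/0/0/-1/15230", "SH--")
# Tr of -1 with 'SH' reads as: the appserver closed the connection after
# ~15 s without ever returning complete response headers.
```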
[05:16] <lifeless> I wonder how hard it would be to port storm & zope to stackless
[05:17] <lifeless> or jython
[05:17] <wgrant> spm: But what does it mean?
[05:17] <lifeless> wgrant: it means SH--
[05:17] <wgrant> Heh.
[05:17] <lifeless> wgrant: one thing it tells us
[05:17] <spm> lpnet 5 on potassium did the "work"
[05:17] <lifeless> potassium should have the oops
[05:18] <wgrant> If there was an OOPS.
[05:18] <wgrant> I was hoping it would tell us on what terms the response ended.
[05:20] <thumper> lifeless: I'm thinking that we are seeing other xmlrpc problems from the bzr client
[05:20] <thumper> lifeless: as it does lp name resolution lookups
[05:21] <lifeless> wgrant: divorced
[05:21] <spm> lifeless: wgrant: it also tells us, this timed out after 15 seconds; some other logs around there have 300secs, so ... funky.
[05:21] <lifeless> thumper: sorry, can you expand on that please.
[05:22] <spm> or succeeded after 270 seconds; so I'd suggest this is *unlikely* (but not ruled out) to be a timeout issue directly.
[05:22] <wgrant> spm: Does potassium have an opinion?
[05:22] <lifeless> the 270 seconds will be a file attachment
[05:22] <lifeless> wgrant: it likes water
[05:22] <wgrant> spm: Also, it didn't time out after 15 seconds.
[05:23] <wgrant> I don't think.
[05:23] <wgrant> Because I get that error in less than 14 seconds.
[05:23] <spm> 15230 <== ms, ~ 15 secs
[05:23] <lifeless> :23:26:00 -> 23:26:16
[05:23] <wgrant> (now, at least -- not sure about that request)
[05:23] <spm> :-)
[05:23] <wgrant> So it's not a pure timeout.
[05:24] <spm> I'd be inclined to rule out an apache/haproxy/squid timeout, not exclude, but look elsewhere.
[05:24] <lifeless> spm: is that url in the zope logs on potassium?
[05:29] <spm> huh. not the *same* url, but related: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1718F124
[05:31] <lifeless> spm: its odd for the request to dispatch and not be in the access log
[05:31] <lifeless> spm: wouldn't you say?
[05:31] <spm> which log? the lp one?
[05:31] <lifeless> lpnet_5
[05:31] <spm> oh yes.
[05:32] <lifeless> isn't there an access log for it?
[05:32] <spm> I've found it in the trace log
[05:32] <lifeless> \o/
[05:32] <spm> but not in launchpad-access5.log-20100914
[05:32] <lifeless> ok that OOPS is also a soft timeout
[05:32] <lifeless> I wonder if there is something special going on
[05:32] <wgrant> There is clearly something special going on.
[05:32] <lifeless> wgrant: I have an experiment you could do
[05:32] <wgrant> Sure.
[05:33] <spm> lifeless: https://pastebin.canonical.com/37129/
[05:33] <lifeless> configure launchpad.dev to have a 1 second soft timeout and 1.2 second hard timeout
[05:33] <lifeless> point apport at it and have fun
[05:33] <lifeless> wgrant: theory: hard timeouts are breaking in this case
[05:33] <lifeless> and - boom - I think I know why
[05:33] <thumper> lifeless: could have been something else, don't worry about it
[05:33] <wgrant> Let's see...
[05:33] <lifeless> requesttimeline stuff we saw yesterday.
[05:34] <wgrant> Yeah, I wondered if that was related.
[05:34]  * wgrant finds the timeouts.
[05:34] <lifeless> spm: is there an OverlappingActionError in the lpnet5 appserver log ?
[05:34] <lifeless> spm: (not the trace log)
[05:35] <wgrant> lifeless: I only see soft_request_timeout
[05:35] <wgrant> No hard_request_timeout.
[05:35] <lifeless> db_statement_timeout
[05:35] <wgrant> Oh.
[05:35] <wgrant> The comment on that is misleading.
[05:35] <lifeless> really?
[05:35]  * thumper looks at the daily timeout candidates
[05:35] <wgrant> # SQL statement timeout in milliseconds. If a statement
[05:35] <wgrant> # takes longer than this to execute, then it will be aborted.
[05:35] <wgrant> # A value of 0 turns off the timeout. If this value is not set,
[05:35] <wgrant> # PostgreSQL's default setting is used.
[05:35] <thumper> what is BranchSet:CollectionResource:#branches ?
[05:35] <lifeless> thumper: an API call
[05:35] <wgrant> thumper: API branch collection.
[05:35] <thumper> yeah but which?
[05:35] <lifeless> I've got a bug open on the clarity for that
[05:36] <lifeless> thumper: you have to look at the oops
[05:36] <wgrant>  /branches
[05:36] <lifeless> which is why i have a bug open
[05:36] <lifeless> wgrant: *any* collection.
[05:36] <wgrant> lifeless: It says BranchSet
[05:36] <lifeless> ah true, damn your eyes.
[05:36] <wgrant> But yes, normally it's stupid.
[05:40] <spm> lifeless: not that I can find. afaict, that "request" doesn't exist in the lpnet5 access log. even looking at the full 10 min period around
[05:40] <lifeless> wgrant: http://pastebin.com/naNudsp3
[05:40] <lifeless> spm: check nohup
[05:40] <spm> hrm. point
[05:40] <wgrant> Proxy Error
[05:40] <wgrant> BOOM
[05:40] <lifeless> look for OverlappingActionError
[05:40] <wgrant> OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0xe91ebac>, <lp.services.timeline.timedaction.TimedAction object at 0xe71ebec>)
[05:40] <lifeless> wgrant: you reproduced ?
[05:40] <wgrant> You win.
[05:41] <lifeless> wgrant: \o/
[05:41] <wgrant> Question is... which are they...
[05:41] <spm> 1st problem. nohup doesn't log times.
[05:41] <lifeless> apply my patch
[05:41] <wgrant> Ah!
[05:41] <lifeless> spm: never mind, my WAG was spot on
[05:41] <spm>     raise OverlappingActionError(self.actions[-1], result)
[05:42] <spm> OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0x2b034a6f7990>, <lp.services.timeline.timedaction.TimedAction object at 0x1338fd90>)
[05:42] <spm>     raise OverlappingActionError(self.actions[-1], result)
[05:42] <spm> OverlappingActionError: (<lp.services.timeline.timedaction.TimedAction object at 0x13ff2f90>, <lp.services.timeline.timedaction.TimedAction object at 0x147b1f50>)
[05:42] <spm> lifeless: cool, fwiw tho ^^
[05:42] <lifeless> spm: thanks
[05:42] <wgrant> Uh.
[05:42] <wgrant> OverlappingActionError: (<TimedAction SQL-launchpad-main-master[UPDATE Bug SET heat_]>, <TimedAction SQL-session[ UPDATE ]>)
[05:42] <wgrant> How?
[05:42] <lifeless> grah
[05:42] <wgrant> Oh.
[05:42] <wgrant> Maybe when it times out it doesn't close the action?
[05:42] <lifeless> see the comment in errorlog about this
[05:43] <wgrant> Which?
[05:43] <wgrant> Ah.
[05:43] <wgrant> I see.
[05:43] <lifeless> sorry
[05:43] <lifeless> in logTuple
[05:43] <lifeless> storm tracers are not a stack.
[05:45] <lifeless> our having a timeout tracer and a log tracer doesn't work as well as it should in theory.
[05:45] <lifeless> I think I'm going to create a stack-lock tracer that delegates to two other tracers and combine them.
[05:45] <lifeless> long term.
[05:45] <lifeless> for now, lets get fugly, lets get fugly.
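The failure mode just diagnosed can be sketched in a few lines: a timeline that rejects a new action while the previous one is still open, and a timed-out statement that never gets its finish() call, so the *next* statement trips the error. Class shapes are simplified from lp.services.timeline, not the real implementation:

```python
# Simplified model of the timeline bookkeeping being debugged above.

class OverlappingActionError(Exception):
    pass

class TimedAction:
    def __init__(self, category, detail):
        self.category = category
        self.detail = detail
        self.finished = False

    def finish(self):
        self.finished = True

class Timeline:
    def __init__(self):
        self.actions = []

    def start(self, category, detail):
        result = TimedAction(category, detail)
        if self.actions and not self.actions[-1].finished:
            # Two tracers that aren't a stack: the previous action was
            # never closed, so starting another one explodes.
            raise OverlappingActionError(self.actions[-1], result)
        self.actions.append(result)
        return result

timeline = Timeline()
action = timeline.start("SQL-main-master", "UPDATE Bug SET heat_...")
# Simulate a db_statement_timeout abort that skips action.finish():
try:
    timeline.start("SQL-session", "UPDATE ...")
except OverlappingActionError:
    pass  # the failure mode seen in the appserver tracebacks
```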
[05:46]  * spm decides now would be a good time to run away for lunch
[05:47] <lifeless> spm: I have a cowboy
[05:47] <lifeless> spm: when you return
[05:47] <lifeless> wgrant: what was the bug for this?
[05:47] <spm> lifeless: cool; I assum by then you'll also have an incident report to go with? ;-)
[05:47] <wgrant> Bug 636801
[05:51] <lifeless> spm: I can make one
[05:52] <lifeless> wgrant: please confirm that http://pastebin.com/iPnkpPpF fixes it
[05:55] <wgrant> lifeless: Great success.
[06:02] <lifeless> spm: we have the thing to cowboy
[06:22] <wgrant> stub: Hi.
[06:22] <stub> yo
[06:22] <wgrant> stub: The multiple OpenID identifiers stuff has had some interesting consequences.
[06:23] <wgrant> stub: In particular the bit where it respects email address linkage more than identifier linkage.
[06:23] <wgrant> Which results in people being logged in as the wrong person, and the real person OOPSing because they no longer have an identifier.
[06:24] <wgrant> I guess the users can fix it by merging the accounts... but I'm not sure that respecting the email address in the first place is a good idea.
[06:24] <lifeless> I don't get not respect.
[06:24] <wgrant> Hm?
[06:24] <lifeless> playing with words
[06:25] <wgrant> Hah.
[06:25] <lifeless> badly
[06:26] <stub> I don't understand what the problem case is. If you are logging into the OP using an email address, you want to login as the Launchpad Person attached to that email address.
[06:28] <stub> I suspect the cases that are broken were broken already, caused when LP accounts were merged (the main bug this change was supposed to tackle)
[06:28] <lifeless> spm: when you return : the thing to cowboy is https://code.launchpad.net/~lifeless/launchpad/cp/+merge/35364
[06:29] <lifeless> spm: its going through the motions now to get into prod-devel, and I'll request a normal reroll tomorrow or so with it in it, but we should fix it now.
[06:29] <lifeless> wgrant: so, care to work on filebug ?
[06:29] <lifeless> wgrant: -huge- room for improvement.
[06:32] <lifeless> spm: and for edge, we need https://code.launchpad.net/~lifeless/launchpad/oops/+merge/35363
[06:32] <lifeless> again, its in the pipe to be done the normal way
[06:32] <wgrant> stub: In the cases I know of, the user had changed their LP email address to blah+launchpad@some.domain
[06:33] <wgrant> stub: A package or translations import then recreated blah@some.domain
[06:33] <wgrant> So the next time they log in, they land in a different account.
[06:33] <stub> I see.
[06:34] <wgrant> What is the purpose of the email address match?
[06:34] <stub> Because people can change their email details in the OpenID Provider.
[06:35] <stub> Edit your emails, create a new account with the old email, be unable to log into Launchpad.
[06:36] <wgrant> Huh?
[06:36] <wgrant> If the LP person was tied to an identifier, then email addresses don't matter.
[06:36] <wgrant> It could be a little confusing in some cases, until the OpenID associations are listed clearly... but it wouldn't do strange things like this.
[06:38] <lifeless> -> shops. If issues, ring me
[06:38] <lifeless> oh, this is needed primarily on appservers, so just them for now.
[06:38] <stub> Create foo@example.com in the OP. Log into Launchpad. Edit the account to be bar@example.com. Create a new account for foo@example.com. Now if you log in as foo@example.com, you can't log into Launchpad as your email address in Launchpad is associated with a different identifier.
[06:39] <stub> Although the way we really triggered this was account merging.
[06:39] <wgrant> lifeless: ECHAN?
[06:39] <wgrant> stub: Ah, I see.
[06:39] <wgrant> Hmm.
[06:39] <stub> People had multiple accounts in the OP, and a Launchpad person with multiple email addresses. They had to log into the OP using the email address that happens to be linked to the correct Person.
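The linkage order under discussion can be sketched as: match the OP-asserted identifier first, then fall back to the email claim as repair. That fallback is exactly what lets a recreated address capture someone else's login. A simplification to show the failure mode, not the actual SSO/Launchpad code:

```python
# Sketch of login-time person resolution: identifier first, email as a
# repair fallback. All names and data here are illustrative.

def resolve_person(identifier, email, by_identifier, by_email):
    person = by_identifier.get(identifier)
    if person is not None:
        return person
    # Fallback: trust the email claim and attach the identifier to
    # whichever person currently holds that address.
    person = by_email.get(email)
    if person is not None:
        by_identifier[identifier] = person
    return person

by_identifier = {}
by_email = {"blah@some.domain": "imported-person"}  # recreated by a package import
# The real user changed their LP address to blah+launchpad@some.domain,
# so their identifier is unlinked and the email fallback binds them to
# the freshly imported person instead.
who = resolve_person("openid-abc123", "blah@some.domain", by_identifier, by_email)
```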
[06:41] <wgrant> I wonder what should be done here.
[06:42] <wgrant> I cannot see a good solution.
[06:42] <stub> Because we can now link multiple identifiers to a Person, and because person merge does the right thing now, we might be able to drop some of the repair work login does now if the solution is causing worse problems.
[06:43] <wgrant> Does the use case you provided above have any legitimate reason for occurring?
[06:43] <stub> If it does, it is pretty obscure.
[06:43] <wgrant> Yes...
[06:43] <wgrant> So I wonder if the repair is useful.
[06:43] <wgrant> Or if it should just tell you that you are doing bad things.
[06:46] <stub> We are in a half way stage to becoming a real OpenID consumer. I think the problems go away if we stop trusting the Canonical SSO and instead implement the work flow for attaching OpenID identifiers to Launchpad accounts. But there is a fair bit of work that needs doing first (shipit and our test infrastructure makes this more complicated)
[06:46] <wgrant> Right, this is what I was thinking.
[06:46] <wgrant> Except for the test infrastructure.
[06:46] <wgrant> What's the issue with that?
[06:46] <wgrant> Does it do something stupid like using basic auth?
[06:47] <stub> The OpenID our tests use is the old Launchpad OP code. It uses the same underlying database tables, so it is all tied up in knots.
[06:47] <wgrant> Ah, right.
[06:57] <wgrant> stub: So, what should be done? Advise the users to merge?
[06:58] <stub> At the moment, yes.
[06:59] <wgrant> Thanks.
[07:00] <stub> That might be the preferred solution too, as we ensure the SSO database and Launchpad database remain in sync. I'm not sure.
[07:00]  * mwhudson eods
[07:00] <wgrant> The separation needs to eventually be far more obvious.
[07:15] <lifeless> wgrant: not echan, info for spm
[07:15] <spm> hm?
[07:15] <lifeless> spm: the cowboy
[07:15] <spm> ah
[07:15] <lifeless> spm: see all the backlog
[07:16] <lifeless> spm: we're missing out on many OOPS at the moment, its rather important
[07:16] <spm> yarp; just getting it all together atm
[07:16] <spm> hrm. complex patch that one
[07:17] <lifeless> hah
[07:50] <stub> lifeless: by 'fail closed' do you mean if we can't load or parse the config file we should default to enabled?
[07:50] <stub> I can argue that either way. losas might have an opinion.
[07:51] <lifeless> I mean we should not run unless permitted to
[07:51] <spm> losas have lots of opinions, some of them are even relevant
[07:51] <lifeless> if theres something wrong, running is likely to add to the problem
[07:51] <lifeless> (as a default, for most cronscripts)
[07:54] <stub> ok. My reasoning for the current behaviour is a stuffup in the config mechanism (Apache dead, syntax error) shouldn't bring the Launchpad systems down. And your reasoning is just as valid :-) It does get noisy if things fail, but that is about it.
[07:55] <lifeless> its just cronscripts now isn't it ?
[07:55] <stub> I could make it a config option and make it somebody elses problem ;)
[07:55] <stub> Yes - this is just cronscripts.
[07:56] <lifeless> so if all the cronscripts are down, when apache is down, I don't think it's really a problem :)
[07:56] <lifeless> I mean, at that point, apache is down :)
[07:56] <stub> I think a typo or mistake is more likely - this config file is being edited by humans.
[07:57] <lifeless> so there are two places that can occur
[07:57] <lifeless> the lazr config providing the url
[07:58] <lifeless> and the referenced ini file
[07:58] <lifeless> for the former, it should change nearly never
[07:59] <lifeless> for the latter, we could provide a small lint-and-update tool
[08:00] <lifeless> spm: what do you think
[08:00] <spm> I don't think. sysadmin.
[08:00] <lifeless> and if someone paid you to ? :P
[08:01] <spm> with chocolate? I'd pretend to think REAL hard.
[08:01] <lifeless> hhha
[08:02] <spm> not sure I fully follow the issue? Q&D summary? something about not running cronscripts if parts of LP are borked?
[08:02] <lifeless> ok
[08:02] <lifeless> so there is a new ini file coming in
[08:02] <lifeless> it will disable all cronscripts in one hit, no need to touch cron
[08:03] <lifeless> if there is an error obtaining it (over http) and parsing it, what should happen: should the scripts run, or not run
[08:04] <spm> where "one hit" is ~ 26 easily accessible servers and 4 difficult ones?
[08:04] <lifeless> yes
[08:04] <spm> cool, just ensuring we appreciate what "one hit" means :-)
[08:04] <lifeless> as long as they can access it over http inside the network.
[08:04] <spm> Oh! I see. Right! thats.... funky.
[08:05] <lifeless> you don't like?
[08:05] <spm> No I do like
[08:05] <spm> the scripts should do whatever the last invocation was. which sucks, because now you're maintaining state as well.
[08:06] <spm> ie. network hiccups *will* occur. we don't want to clobber LP by such a transient
[08:07] <lifeless> spm: so tcp syn will retry three times anyway
[08:07] <spm> as a thought: you'd have 2 checks. the official "check http"; with a secondary, check local/state. We can script update the state if necessary - eg apache update
[08:07] <spm> even so. soyuz used to barf badly all the time on funky network woes.
[08:08] <spm> I'm thinking more resilient than what just tcp et al give.
[08:08] <spm> does that make sense? crackful?
[08:08] <lifeless> thinking
[08:09] <spm> lifeless: that should be done on PROD btw
[08:09] <lifeless> so, for daily and hourly crons
[08:09] <lifeless> wgrant: please break it
[08:09] <lifeless> wgrant: prod
[08:09] <wgrant> lifeless: OK.
[08:09] <spm> not edge, haven't done edge yet
[08:10] <lifeless> spm: for daily and hourly cronscripts, not running is fairly significant
[08:10] <lifeless> spm: OTOH daily and hourly things are background tasks mainly, and oops reporting etc is separate and not driven by this
[08:10] <wgrant> lifeless: Success.
[08:10] <wgrant> OOPS-1718H415
[08:10] <lifeless> spm: \o/
[08:11] <spm> right, which is why I'd want them to fail as safely as possible - via a local state "what did I do last time?" <== but state is also likely to shared (maybe??) so likely updated more frequently.
[08:11] <spm> \o/
[08:11] <lifeless> so, personal opinion, if things are fucked royally I'd rather have the cronscripts not running to facilitate recovery.
[08:11] <spm> I'd probably suggest shying away from per-job state; go for a global
[08:12] <lifeless> as long as when they don't run they log it, if we find tha network transients are an issue, we can iterate.
[08:12] <wgrant> As long as it keeps attempting to retrieve it frequently...
[08:12] <spm> and if things are that bad; humans are involved. and we can set the state file; or 'script roll out' an updated state file
[08:12] <stub> I understand the 'what I did last time' argument, but the extra complexity makes diagnosis complex. I'd say keep it as simple as we can.
[08:12] <lifeless> stub: I agree.
[08:12] <spm> if that case, fail quiet; don't run.
[08:12] <stub> But if that means maintaining state, we can do that (the cronscript can remember its last invocation on the file system, in /var/run or somewhere).
[08:13] <lifeless> spm: I'd fail closed, don't run, and log the failure.
[08:13] <lifeless> spm: why do you say fail quiet ?
[08:14] <stub> spm: At the moment, if the config file cannot be found (404) we emit a DEBUG and enable. Any other errors, including syntax errors, we emit an ERROR traceback and enable.
[08:14] <spm> probably via some mechanism that makes it easy to nagios alert
[08:14] <spm> lifeless: don't cronspam bombard == quiet
[08:14] <lifeless> spm: thats in tension with 'diagnosable'
[08:14] <lifeless> spm: logging to a file would be ok?
[08:15] <lifeless> spm: also remember that this is on 404s and syntax errors
[08:15] <lifeless> spm: so things are messed up if its happening at all
[08:15] <spm> lifeless: you're talking > 260 cron tasks. if we get a global fail, that's a LOT of spam to wade thru
[08:16] <spm> logging to disk is ok
[08:16] <lifeless> so, if we said:
[08:16] <lifeless>  - on failure to get ini and parse, don't run.
[08:16] <lifeless>  - log that to disk, not stderr
[08:16] <lifeless>  - nagios should be monitoring those log files
[08:16] <lifeless> would that make sense to you?
[08:16] <spm> yup
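The fail-closed policy just agreed can be sketched as: fetch the kill-switch ini over HTTP; on any fetch or parse error, log to a file on disk (not stderr, to avoid cronspam) and refuse to run. The URL scheme, section, and option names below are assumptions for illustration:

```python
# Sketch of a fail-closed cronscript kill switch, per the policy above.
# Section/option names ("control"/"enabled") are made up.

import configparser
import logging
import urllib.request

log = logging.getLogger("cronscript-control")

def cronscripts_enabled(url, timeout=10):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            text = response.read().decode("utf-8")
        parser = configparser.ConfigParser()
        parser.read_string(text)
        return parser.getboolean("control", "enabled")
    except Exception:
        # Fail closed: if the config is unreachable or malformed,
        # something is already wrong, so don't add to the problem.
        # The log file on disk is what nagios would then watch.
        log.exception("Could not fetch/parse %s; not running", url)
        return False
```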
[08:17] <lifeless> stub: to you ?
[08:17] <spm> * log that to disk, not stderr: like oops' perhaps in a vague handwavy way. known dir; date time stamped; we can nagios alert on files between 0-60 mins old type of thing
[08:18] <spm> setup a red button "archive and zot the cron logs" so the "all fixed" is not as painful
[08:18] <stub> Seems a little fragile relying on nagios like this. It is looking for an error rather than checking something is reacting correctly.
[08:18] <spm> stub: how so?
[08:19] <stub> We are relying on the cronscript to log things correctly and ...
[08:19] <stub> oh.. hang on.
[08:19] <spm> being wary of assumptions: I'd assume the apache/configs setup is monitored
[08:19] <stub> We already alert if scripts stop running.
[08:19] <spm> only via scriptactivity, but yes.
[08:20] <spm> it's arguable if that should be nagios'd. atm I'd be vehemently against it.
[08:20] <stub> So we could just disable silently (DEBUG or INFO - whatever) and rely on the existing checks to beep if things remain screwed up for too long.
[08:20] <lifeless> works for me
[08:20] <spm> that works
[08:20] <lifeless> as long as someone coming along to look can look at a file on disk to see whats up.
[08:20] <spm> I'm only aware of one script that really should have a nagios check against it - branch-merge-proposals
[08:21] <lifeless> wgrant:
[08:21] <lifeless> SQL time: 10701 ms
[08:21] <lifeless> Non-sql time: 4505 ms
[08:21] <lifeless> Total time: 15206 ms
[08:21] <lifeless> Statement Count: 536
[08:21] <spm> it's timely, and is also requiring human intervention on fail
[08:22] <stub> I think the 'too much spam to wade through' indicates too many scripts are emitting their logs via email. Perhaps they should log to file instead of stdout/email and losas look on disk when the alerts ping.
[08:23] <lifeless> stub: mthaddon wants that
[08:23] <lifeless> stub: for all basically
[08:23]  * StevenK kicks webservice.get()
[08:23] <spm> I think pretty much every LP script logs to STDERR by default, which is known as Doing It Wrong
[08:23] <StevenK> First call works, second call returns AttributeError: 'thread._local' object has no attribute 'interaction' :-(
[08:23] <stub> spm: There is --logfile available right now to log elsewhere.
[08:23] <lifeless> StevenK: login()
[08:24] <spm> stub: I'm sure we've used that elsewhere and it doesn't quite work as described...
[08:24] <StevenK> lifeless: But why does the first call to webservice.get() work fine?
[08:24] <lifeless> you're already logged in
[08:24] <StevenK> And how does that log me out?
[08:24] <lifeless> end of the request
[08:24] <lifeless> feel free to clean this up
[08:24] <stub> spm: That would be a bug (not surprising as nobody ever used --logfile after it got implemented 5 years ago)
[08:25] <spm> :-)
[08:25] <stub>   --log-file=LOG_FILE  Send log to the given file, rather than stderr.
[08:26] <spm> Ahh. That's not helpful as is. what you want is all "normal output" to go to stdout, or the above option. any *real* errors that *REQUIRE* manual intervention get sent to STDERR.
[08:26] <spm> atm we get craploads of "INFO" messages or "CRITICAL ERRORS" that are nothing of the sort, sent to STDERR
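The split spm is asking for can be sketched with two logging handlers: routine INFO goes to stdout (or a --log-file), and only records that genuinely need human attention reach stderr, which is what cron mails. Thresholds and the logger name are illustrative:

```python
# Sketch: route normal output to stdout/logfile and only real errors to
# stderr, per the convention spm describes. Names are assumptions.

import logging
import sys

def setup_script_logging(logfile=None):
    logger = logging.getLogger("lp-script")
    logger.setLevel(logging.INFO)
    # Normal chatter: stdout or the log file, never stderr.
    normal = (logging.FileHandler(logfile) if logfile
              else logging.StreamHandler(sys.stdout))
    normal.setLevel(logging.INFO)
    # Problems requiring manual intervention: cron mails stderr,
    # so keep it quiet otherwise.
    alarms = logging.StreamHandler(sys.stderr)
    alarms.setLevel(logging.ERROR)
    logger.addHandler(normal)
    logger.addHandler(alarms)
    return logger
```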
[08:26] <lifeless> spm: icing on edge is borked
[08:26] <poolie> stub: yeah we were just talking about this
[08:26] <poolie> lifeless: me too
[08:27] <lifeless> spm: we haven't got that part of the deployment right yet
[08:27] <stub> spm: ok. That is a change, but we could do it globally for all scripts at once. The desired behaviour will need to be documented (bug report?)
[08:27] <spm> lifeless: bleh. edge is autodeploying atm
[08:27] <lifeless> yes
[08:27] <lifeless> spm: need a new RT ?
[08:27] <poolie> i'm also getting a 'The following errors were encountered:
[08:27] <poolie> Server error, please contact an administrator.
[08:27] <poolie> OK
[08:27] <poolie> '
[08:27] <spm> stub: i think this is a bug I logged about 2 years ago... :-)
[08:27] <poolie> in an ajax thing
[08:27] <spm> poolie: I can't do anything until you contact me per the above error. sorry.
[08:27] <stub> spm: Yes - I was expecting production scripts to all be run with -q
[08:27] <lifeless> poolie: thats possibly/probably the thing we're deploying to fix
[08:28] <poolie> :)
[08:28] <poolie> like a finely-oiled machine
[08:28]  * spm watches poolie hop on his bike to drive down here and slap me upside the head....
[08:28] <poolie> i would but it's a bit cold and wet
[08:28] <poolie> and probably doubly so down there
[08:29] <spm> "horrible" <== and not just saying to keep you away
[08:30] <lifeless> spm: so now I have 11542 with no icing
[08:30] <lifeless> spm: can you check the apache ?
[08:30] <spm> yarp
[08:31] <spm> hrm. supposedly we *are* 11542 everywhere
[08:31] <lifeless> yes
[08:31] <lifeless> but the icing ain't
[08:31] <spm> isn't this the build farkup we saw the other week?
[08:31] <lifeless> https://bugs.edge.launchpad.net/+icing/rev11542/combo.css
[08:31] <lifeless> spm: thats meant to be static, from apache
[08:31] <spm> le sigh
[08:31] <StevenK> lifeless: With webservice.get(), login(), webservice.get() I get "newInteraction called while another interaction is active" for the second .get
[08:32] <lifeless> StevenK: odd
[08:32] <lifeless> StevenK: perhaps the get() isn't what was throwing.
[08:32] <lifeless> StevenK: perhaps you're actually trying to access something in between the ( and )
[08:32] <StevenK> lifeless: Of course I am
[08:32] <lifeless> don't
[08:32] <lifeless> calculate the url outside of the function call
[08:33] <lifeless> because ...
[08:33] <lifeless> accessing objects requires a participation
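lifeless's advice can be sketched with stand-ins for the Zope security machinery: attribute access on a secured object needs an active interaction, and the webservice call tears the interaction down when the request ends, so evaluate the attribute *before* the call (or log in again first). Everything below is a toy model, not launchpadlib or Zope itself:

```python
# Toy model of the interaction-lifetime bug StevenK hit. All classes and
# functions here are illustrative stand-ins.

_interaction = {"active": False}

def login():
    _interaction["active"] = True

def end_interaction():
    _interaction["active"] = False

class Secured:
    """Stand-in for a security-proxied model object."""
    def __init__(self, url):
        self._url = url

    @property
    def url(self):
        if not _interaction["active"]:
            # Roughly the observed error: no participation/interaction.
            raise AttributeError("no interaction")
        return self._url

def webservice_get(url):
    result = "GET %s -> 200" % url
    end_interaction()  # request teardown effectively logs you out
    return result

login()
bug = Secured("/bugs/1")
url = bug.url            # evaluate while still logged in...
webservice_get(url)      # first call works, and ends the interaction
login()                  # ...so log in again before touching bug.* once more
webservice_get(bug.url)  # without that login(), this access would raise
```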
[08:36] <spm> oh awesome. that file doesn't exist on edge.
[08:36] <lifeless> say what ?
[08:36] <stub> lifeless: I'll land the branch I have with the abspath and maybe the timeout if it is simple, as it is still an improvement, and open bugs and kanban tickets on the next set of changes.
[08:36] <lifeless> stub: thanks
[08:36] <spm> lifeless: exactly that. the folder is there, that particular file (and possibly some small number of others) aren't.
[08:36] <lifeless> spm: they are built by 'make compile'
[08:37] <lifeless> or possibly make build
[08:37] <spm> apparently not in this case....
[08:39] <spm> oh ffs. the make build blewup again.
[08:39] <lifeless> spm: does the deploy script abort when that happens ?
[08:39] <wgrant> Not my fault, this time, though!
[08:39] <spm> lifeless: https://pastebin.canonical.com/37132/
[08:40] <spm> the script can't - the make is continuing, so the deploy script doesn't know it's aborted
[08:40] <spm> (AIUI, IMBW)
[08:40] <lifeless> spm: filing an RT - it has to.
[08:40] <spm> No. I lie. It does see the error.
[08:40] <lifeless> make: *** [compile] Error 1
[08:40] <lifeless> Error 2 running ssh launchpad@banana make -C /srv/edge.launchpad.net/edge/launchpad build LPCONFIG=edge1
[08:40] <lifeless> Running ssh launchpad@banana "rm -rf /srv/edge.launchpad.net/edge/launchpad && ln -s /srv/edge.launchpad.net/edge/launchpad-rev-11542 /srv/edge.launchpad.net/edge/launchpad"
[08:40] <lifeless> its not halting !
[08:40] <spm> yeah...
[08:41] <spm> yeah x 3
[08:41] <spm> ok. later problem. reverting.
[08:43] <lifeless> spm: why was it 11542 that rolled out ?
[08:43] <spm> no idea atm
[08:43] <lifeless> ok, thats tip of stable
[08:43] <lifeless> fair enough (but wtf with the error)
[08:43] <spm> oki, apaches rolled back; doing the app servers
[08:43] <adeuring> good morning
[08:44] <spm> truly. it's supposed to abort on errors. we use this logic all over the place. And it works on other systems :-(
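The abort-on-error behaviour the deploy script should have had can be sketched as: run each remote step with a hard check, so a failing `make build` raises before the symlink is flipped to the new tree. Host and path names are copied from the log above purely as illustration:

```python
# Sketch: halt the deploy on any failing step instead of ploughing on to
# the symlink flip, as happened above. Hosts/paths are illustrative.

import subprocess

def run_step(argv):
    # Raises CalledProcessError on a non-zero exit, halting the deploy.
    subprocess.run(argv, check=True)

def deploy(revision):
    run_step(["ssh", "launchpad@banana",
              "make", "-C",
              "/srv/edge.launchpad.net/edge/launchpad-rev-%s" % revision,
              "build", "LPCONFIG=edge1"])
    # Only reached if the build succeeded:
    run_step(["ssh", "launchpad@banana",
              "ln", "-sfn",
              "/srv/edge.launchpad.net/edge/launchpad-rev-%s" % revision,
              "/srv/edge.launchpad.net/edge/launchpad"])
```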
[08:44] <lifeless> spm: can you do 11538 which I think was previous with the patch applied; and we may need to stop other edge rollouts till we fix.
[08:44] <spm> heya adeuring
[08:44] <spm> lifeless: that I have/am
[08:44] <spm> lifeless: launchpad@banana:/srv/edge.launchpad.net/edge$ rm launchpad ; ln -s /srv/edge.launchpad.net/edge/launchpad-rev-11538 launchpad
[08:46] <lifeless> spm: I bet its python2.5
[08:46] <spm> lifeless: shrug, I just blame wgrant for everything. faster, easier, if less accurate
[08:46] <lifeless> spm: can you do this on a machine thats still 2.5 ?
[08:46] <wgrant> But it *was* me (and python2.5) last time this happened.
[08:46] <wgrant> So it's quite accurate.
[08:47] <spm> haha. lets not let FACTS get in the way here!!!!
[08:47] <lifeless> spm: find . -name 'potemplate.py'
[08:47] <spm> let me finish getting the apps restarted :-)
[08:47] <lifeless> spm: then, for each reported file, cd to that dir and run python -c 'import potemplate'
[08:47] <spm> edge3 done
[08:48] <spm> 2010-09-14 07:48:01 WARNING SIGTERM failed to kill launchpad (7487). Trying SIGKILL <== yay. it's back! wooo!
[08:49] <spm> edge4 coming back
[08:50] <spm> edge1 coming back
[08:51] <stub> Is network syslog loathed by IS?
[08:51] <spm> edge2 on the way back
[08:51] <spm> stub: i don't mind it; no idea about others tho
[08:52] <spm> edge1 & 4 being difficult and not working
[08:53] <spm> edge2 is fine
[08:54] <spm> edge1 being really painful and needing to be manually killed.
[08:54] <spm> edge1 stabbing successful; it lives
[08:54] <spm> retrying edge4...
[08:56] <spm> edge5 coming back
[08:56] <spm> edge4 lives
[08:56] <spm> edge5 lives; should be done. verifying.
[08:58] <spm> lifeless: have you logged a bug on this essplosion?
[08:58] <lifeless> spm: urls like this: https://bugs.edge.launchpad.net/+icing/rev11542/combo.css - how are they served.
[08:58] <lifeless> spm: I RT'd it
[08:59] <lifeless> spm: for the explosion part
[08:59] <spm> that's the continue, but not the root cause?
[08:59] <lifeless> spm: waiting on your python2.5 test for confirmation
[08:59] <spm> ah k
[09:01] <spm> lifeless: so to recap a bit back - CP'd to prod; not to edge.
[09:02] <spm> *cowboyed* not CP'd.
[09:02] <lifeless> spm: right, can we do edge 11538 cowboy, not 11542 cowboy ?
[09:02] <spm> I'll throw that to Tom I suspect
[09:02] <lifeless> ok
[09:03] <lifeless> rev 11542 looks like the bust one
[09:04] <lifeless> which is jtv's patch
[09:04] <jtv> ?
[09:05] <lifeless> jtv: you appear to have used python 2.6 in lib/lp/translations/browser/potemplate.py line 971
[09:05]  * jtv looks
[09:05] <lifeless> jtv: look at spm's pastebin
[09:06] <jtv> Wonder what's wrong with it…
[09:06] <jtv> ah
[09:08] <jtv> lifeless: want me to write up a quick patch?
[09:10] <mrevell> Hello
[09:10] <jtv> hi mrevell
[09:10] <jtv> lifeless, spm: I'd fix it thusly: http://paste.ubuntu.com/493508/
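This is the class of breakage in play: syntax added in Python 2.6 compiles fine on the Lucid (2.6) test hosts, but is a SyntaxError for a 2.5 interpreter, so byte-compilation during `make build` blows up. The log doesn't show the actual construct at potemplate.py line 971; `except ... as ...` below is just a representative example:

```python
# Representative (not the actual) 2.6-only construct: on python2.5 the
# compile() below raises SyntaxError, which is how `make build` fails.

SOURCE_26 = """
try:
    risky()
except ValueError as exc:   # 2.6+; python2.5 requires `except ValueError, exc:`
    handle(exc)
"""

# Fine on 2.6+ interpreters; a SyntaxError under 2.5.
code = compile(SOURCE_26, "<potemplate-sketch>", "exec")
```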
[09:12] <lifeless> spm: ping
[09:13] <spm> lifeless: yo
[09:13] <lifeless> spm: need you to try applying jtv's patch to a 11542 dir (one of the failed ones)
[09:13] <lifeless> and see if 'make build' will then work.
[09:13] <lifeless> jtv: could you please do a few things for me, its getting on here.
[09:13] <lifeless>  - file a bug that this is broken,
[09:13] <jtv> lifeless: speak
[09:13]  * jtv files bug
[09:13] <lifeless>  - put your branch up for review etc etc - r=me to apply it, if spm confirms it works.
[09:14] <lifeless>  - arrange for someone to let the LOSAs know when it lands in stable, so that edge updates can be reenabled.
[09:14] <lifeless>  - (e.g. yourself, or your delegate)
[09:14] <jtv> On the way.
[09:14] <lifeless> I will mail the list about the process issue
[09:15] <spm> trying atm...
[09:15] <spm> try 2 with right config...
[09:16] <jtv> bug 637868
[09:17] <lifeless> jtv: did you ec2 your test ?
[09:17] <spm> lifeless: I'd suggest that cowboy get's rolled in with jtv's fix and just a regular edge rollout rolled.
[09:17] <jtv> lifeless: not yet, not yet
[09:17] <lifeless> jtv: sorry, let me be more clear
[09:17] <lifeless> jtv: the patch that broke; did you land it:
[09:18] <lifeless>   - by running all tests locally + pqm
[09:18] <jtv> ec2 land.
[09:18] <lifeless>  - by ec2 land
[09:18] <lifeless>  - ...
[09:19] <spm> jtv: lifeless: https://pastebin.canonical.com/37134/ looks good
[09:25] <jtv> spm: still looking good?
[09:26] <jtv> The codebase, I mean, not you.  You'll always look good.
[09:27] <lifeless> spm: that looks good; thanks. jtv your patch fixes it.
[09:27] <jtv> BTW it's odd that this passed PQM, what with the pagetests exercising it.
[09:27] <lifeless> jtv: pqm doesn't run tests.
[09:27] <jtv> Sorry, buildbot.
[09:27] <lifeless> jtv: ec2 runs them, but it's probably running lucid
[09:27] <wgrant> buildbot is Lucid.
[09:27] <lifeless> buildbot has two separate jobs.
[09:27] <jtv> Well, we have lucid and hardy buildbot slaves.
[09:28] <lifeless> jtv: see my mail
[09:28]  * jtv will see mail
[09:28] <lifeless> I don't think bb requires *both* to be ok.
[09:28] <lifeless> but that's what we probably need
[09:29] <jtv> BTW should I MP the fix for stable or for devel?
[09:29] <lifeless> devel
[09:30] <jtv> OK.  I branched off stable though just to be sure.
[09:30] <lifeless> now that edge rollouts are blocked, there's no panic (no reason to delay, but no panic).
[09:31] <jtv> lifeless: the MP is at https://code.launchpad.net/~jtv/launchpad/bug-637868/+merge/35373
[09:31] <wgrant> lifeless: ec2 has been running Lucid for a long time.
[09:32] <wgrant> I think this is about the third time things have broken.
[09:32] <lifeless> wgrant: definitely second.
[09:33] <lifeless> wgrant: bug 637854
[09:34] <wgrant> _mup_, I am disappoint.
[09:34] <wgrant> lifeless: At least it tries to prejoin.
[09:34] <lifeless> wgrant: ugh!
[09:35] <jtv> lifeless: so now I can land on devel as normal and just wait for the fix to percolate?
[09:35] <lifeless> yes
[09:36] <jtv> (I would appreciate a click on the button from you btw, to prove I didn't invent your approval :)
[09:36] <lifeless> I did
[09:36] <jtv> Oh!  The MP just timed out for me is all
[09:36] <jtv> Thanks.
[09:36] <lifeless> you need to let mthaddon know when its good to go on stable
[09:36] <lifeless> jtv: interesting, what OOPS id ?
[09:36] <jtv> I already ran the applicable pagetests through ec2… guess a full EC2 run makes no sense here.
[09:37] <jtv> lifeless: I don't know; focused on fixing my bug, so just reloaded
[09:52] <bigjools> http://www.workswithu.com/2010/09/07/measuring-the-value-of-canonicals-launchpad/
[09:53] <bigjools> there's a certain person posting comments on that one
[09:53] <wgrant> Let me guess...
[09:53] <wgrant> remarkable!
[09:55] <spiv> wgrant: you're psychic, clearly.
[09:59] <wgrant> He does have some good points, as usual.
[10:04] <lifeless> and they are clearly unbiased
[10:04] <lifeless> which is refreshing
[10:06] <bigjools> shame it's the same sound of that grinding axe
[10:10] <jtv> Speaking of grinding axes…
[10:11] <jtv> The builds-list.pt template is supposed to work for any BuildFarmJob but it tries to access build/dependencies.  :(
[10:12] <jml> "No longer needed: Python 2.5"
[10:12] <wgrant> jtv: I'm glad you're completing the generalisation for us :P
[10:12] <jtv> wgrant: remember that axe I mentioned just now?  Kindly insert it into one of your feet.
[10:13] <wgrant> Ow.
[10:14] <jtv> Good.
[10:15] <jtv> And thank you.
[10:15]  * wgrant limps away viciously.
[10:17] <bigjools> jtv: then return None
[10:17] <bigjools> you don't have any dependencies
[10:17] <wgrant> It's not on the interface.
[10:17] <wgrant> Assuming it is illegal.
[10:17] <jtv> No, that's the nasty bit.
[10:17] <jtv> It's in IPackageBuild.
[10:17] <jtv> So _implementing_ it in BuildFarmJob or BuildFarmJobDerived or my own specific buildfarmjob class isn't enough.
[10:18] <jtv> It needs to move into IBuildFarmJob, which is uglier than a Windows desktop.
[10:18] <bigjools> at least Windows has sound that works
[10:18] <jtv> To name some arbitrary example of extreme ugliness.
[10:19] <jtv> Yes, Windows often has working sound, so you can _hear_ the "I don't know this codec" error instead of just seeing it.
[10:19] <jtv> But we digress.
[10:20] <jtv> Here we are chattering about operating systems when wgrant's foot is oozing virtual blood.
[10:20] <jtv> Help the man, for God's sake!
[10:21] <wgrant> bigjools: Your sound isn't working?
[10:21] <wgrant> I think mine might be slightly crackly on Maverick.
[10:21] <wgrant> But it works mostly.
[10:21] <bigjools> maverick re-installed pulseaudio
[10:21] <jpds> jtv: He's in AU, dude.
[10:22] <bigjools> pulseaudio is a crock of shit
[10:22] <jtv> jpds: pronounced "Ow!!!"
[10:22] <wgrant> bigjools: WFM!
[10:22] <wgrant> Although I could be distracted by my poor, poor foot.
[10:23] <bigjools> it's insisting on using my laptop instead of my headset's mic
[10:23] <jtv> wgrant: nice save
[10:23] <bigjools> I've no idea how to make it do what *I* want instead of what *it* wants
[10:23] <jtv> Meanwhile, I just managed to insinuate a TranslationTemplatesBuild into a builder history!
[10:23] <wgrant> bigjools: Kubuntu doesn't have a nice control panel for it?
[10:24] <wgrant> jtv: But does it crash?
[10:24] <jtv> wgrant: no.  Not that anyone'd notice: I don't see anything in the build code that would put a BuildFarmJob into the right state to show up there.
[10:24] <wgrant> Hm?
[10:24] <bigjools> wgrant: I dunno, what is it in Gnome?
[10:25] <jtv> bigjools: a crock of shit.  Trick question.
[10:25] <wgrant> bigjools: The sound preferences thing (in the volume indicator's menu) has radio buttons for the input device.
[10:25]  * bigjools hears 2 drums and a cymbal falling off a cliff
[10:25] <allenap> Hi jml, thanks for looking into my Zope befuddlement. I think mwh's reply has now hit the nail on the head, so I'll wikify that and reply to the list.
[10:26] <jtv> AFAICS the Builder still selects and dispatches a BuildQueue object, not a BuildFarmJob.  How's it ever going to update BuildFarmJob.{status,date_started,date_finished}?
[10:27] <bigjools> finally it works
[10:27] <jml> allenap, cool.
[10:28] <wgrant> jtv: It's complicated.™
[10:28] <wgrant> But it works.
[10:28] <jtv> wgrant: well since there is absolutely currently nothing coupling my BuildFarmJobs to my BuildQueues, I don't see how it can.
[10:30] <bigjools> so, when using pulse with skype, how do I make it ring the PC speakers and not in the headphones, which I might not be wearing? :/
[10:30] <wgrant> jtv: I believe it's handled by the IBuildFarmJobBehavior.
[10:30] <wgrant> jtv: IIRC you override updateBuild_WAITING.
[10:31] <wgrant> The default calls handleStatus on the build.
[10:31] <jtv> I think we already override that.
[10:32] <wgrant> I mean you do currently.
[10:32] <wgrant> But you should probably stop.
[10:32] <jtv> Ah
[10:32] <jtv> That could hurt.
[10:33] <wgrant> It will.
[10:33] <jtv> Hand me that axe, will you?
[10:33] <jtv> Thank—eewwww, there's blood all over the blade
[10:34] <jtv> Anyway, for now <puts axe aside> I guess it's enough to get display working and then next we can focus on tying the BuildQueue and the TranslationTemplatesBuild together.
[10:46] <bigjools> wgrant: the discussion in https://bugs.launchpad.net/bugs/635103 is a little over my head at the moment, do you know why it's not working for him yet fine in Ubuntu?
[10:47] <wgrant> bigjools: He wants to not have to download and upload the whole thing.
[10:47] <wgrant> To do that we'd need an ia32-libs-specific hack, to support the conglomeration of horrid hacks that is ia32-libs.
[10:48] <wgrant> I was going to tell him to go away. But you're probably a better person to do it :P
[10:49] <bigjools> wgrant: possibly, but I don't understand what that package is doing
[10:49] <wgrant> bigjools: Do you *really* want to know?
[10:49] <bigjools> yep
[10:49] <wgrant> Well.
[10:49] <wgrant> There is a reason that the source package is 700MB.
[10:50] <wgrant> It contains approximately an awful lot of packed source packages.
[10:50] <wgrant> It builds on amd64, but builds them for i386.
[10:51] <wgrant> Er, wait, no, it includes the binaries too.
[10:51] <wgrant> So it doesn't build them.
[10:51] <bigjools> Oo
[10:51] <wgrant> It extracts the i386 binaries, and produces a big amd64 ia32-libs binary containing all of them.
[10:51] <lifeless> fooooooogly
[10:51] <wgrant> So you have this huge source package containing dozens or hundreds of sources and binaries from the archive.
[10:51] <wgrant> Er, yes.
[10:51] <bigjools> dot com
[10:52] <wgrant> It will be rendered obsolete by multiarch.
[10:52] <wgrant> But multiarch hasn't happened yet.
[10:52] <lifeless> multiarch was 'coming' when we -started-
[10:52] <bigjools> dem's de magic words
[10:52] <wgrant> lifeless: it's seriously in development now, though.
[10:52] <wgrant> Some of the work has been done in the last year.
[10:52] <wgrant> Hell, even NMSP is happening.
[10:52] <lifeless> wgrant: I know.
[10:52] <wgrant> And derivative distros.
[10:53] <wgrant> This is incredible.
[10:53] <bigjools> I can only lever so much of the planet with the team size I have
[11:02] <jml> bigjools, well, to be true to the metaphor, you just need a better place to stand
[11:03] <bigjools> jml: I'll jump higher :)
[11:06] <bigjools> jml: when are you arriving here BTW?
[11:06] <jml> bigjools, Sunday. Let me check my ticket.
[11:07] <bigjools> you bought train ticket in advance?  gosh :)
[11:07] <bigjools> insert preposition as required. Sigh.
[11:08] <jml> I think you mean "article", and yes I did.
[11:08] <jml> it's hard to get out of the habit of booking travel in advance
[11:09] <wgrant> Is this for the buildd-manager attack session?
[11:09] <jml> indeed it is.
[11:09] <wgrant> Excellent.
[11:11] <jml> bigjools, anyway, I'll be taking an afternoon train, probably the 1442
[11:13] <bigjools> jml: ok well when you get ensconced in the pub, gimme a shout and I'll pop over for a pint
[11:14] <jml> bigjools, will do.
[11:14] <wgrant> bigjools: I've received lots of complaints in the last few days that builds keep getting redispatched.
[11:14] <wgrant> Even on non-virt builders.
[11:14] <bigjools> jml: there should be taxis at the station but if there are not let me know and I'll come and pick you up
[11:15] <jml> bigjools, thanks.
[11:15] <bigjools> wgrant: the UI is a lie
[11:15] <bigjools> the early commit, is not :/
[11:15] <wgrant> Hm?
[11:15] <lifeless> jml: can you do me a favour?
[11:16] <wgrant> Even if it is committing before it confirms successful dispatch, why is the dispatch not successful?
[11:16] <jml> lifeless, quite possibly.
[11:16] <bigjools> what's happening is that we mark the build as running before it's completely dispatched.  If there's a comms error then it looks like it gets re-dispatched after the next builder picks it up
[11:16] <lifeless> jml: my pqm-landed (nonec2) branch has a test failure
[11:16] <lifeless> https://lpbuildbot.canonical.com/builders/lucid_lp/builds/139/steps/shell_7/logs/summary
[11:16] <lifeless> jml: its -extremely- shallow.
[11:16] <lifeless> jml: (add the missing tuple)
[11:16] <wgrant> bigjools: But there shouldn't comms errors :(
[11:17] <wgrant> +be
[11:17] <lifeless> jml: however the fix needs to be done to production-devel too.
[11:17] <bigjools> wgrant: we don't live in a perfect world
[11:17] <lifeless> jml: before the oops fix can be uncowboyed.
[11:17] <bigjools> routers drop out, DC engineers kick cables
[11:17] <bigjools> etc
[11:17] <lifeless> jml: yes/no ?
[11:17] <wgrant> bigjools: Over 20 minutes?
[11:17] <jml> lifeless, you'd like me to land the fix for you?
[11:17]  * gmb hates at typos in sampledata
[11:18] <gmb> 'testible' indeed.
[11:18] <lifeless> jml: yes, on devel and production-devel
[11:18] <jml> gmb, it's a pun!
[11:18] <bigjools> wgrant: yes, that's the interval I see because of the bad scaling
[11:18] <lifeless> jml: its 22:20 here, more or less
[11:18] <gmb> jml, I noticed that about half a second after pressing enter :)
[11:18] <wgrant> bigjools: Ahh, true, forgot that bit.
[11:18] <jml> lifeless, ok, will do.
[11:18] <lifeless> jml: thank you!
[11:37] <jtv> Are we going into testfix?
[11:37] <jtv> The lucid_lp buildbot just failed.
[11:37] <jtv> Is failing, rather.
[11:37] <jtv>    lib/canonical/launchpad/webapp/ftests/test_adapter.txt
[11:38] <jtv> Line 305, in test_adapter.txt Failed example:
[11:38] <jtv>      get_request_statements()
[11:38] <jtv> Differences (ndiff with -expected +actual):
[11:38] <jtv>      - []     + [(0, 0, 'SQL-launchpad-main-master', 'SELECT 2')]
[11:40] <wgrant> Is that what lifeless was talking about above?
[11:44] <jml> wgrant, yes.
[12:01] <jml> I wonder why emacs is segfaulting for me.
[12:02] <thumper> jml: because it hates you :)
[12:04] <bigjools> it's telling you to use a real editor
[12:04] <deryck> Morning, all
[12:04] <bigjools> howdy deryck
[12:04] <jml> bigjools, yeah, you're right. I've had to revert to "emacs -nw"
[12:05] <jml> deryck, hello
[12:05] <bigjools> :)
[12:05] <thumper> jml: ?? whazzat?
[12:05] <jml> thumper, try it!
[12:05] <thumper> not right now
[12:05] <jml> thumper, ok, I give up, it's emacs in a terminal
[12:05] <thumper> I'm trying to right a talk
[12:05] <thumper> ah.. no windows, it's so obvious
[12:05] <bigjools> that's either a typo or a clever play on words
[12:06] <thumper> bigjools: which one?
[12:06] <jml> thumper, "righting" a talk.
[12:06] <bigjools> "right a talk"
[12:06]  * thumper hangs head
[12:06] <bigjools> lol
[12:06] <thumper> it is a typo
[12:06] <thumper> I'd like to be more cleverer
[13:25] <jml> I'm off for lunch & errands. Back later.
[14:31] <cr3> leonardr: when you have a moment, I would have a question for you about routes when exposing a restful interface with lazr
[14:34] <leonardr> cr3, sure
[14:34] <leonardr> routes as in the url traversal code?
[14:35] <cr3> leonardr: yes, how can a collection be contextual? for example, let's say LP had /me/bugs and /project/foo/bugs, where both person and project would implement IHasBugs, how should the "bugs" part of the url be defined?
[14:36] <leonardr> cr3: that's called a scoped collection, and lazr.restful traverses from 'leonardr' to bugs or from 'mozilla' to bugs by attribute access on the person or project object
[14:36] <leonardr> so leonardr.bugs is /~leonardr/bugs
[14:37] <cr3> leonardr: ah, so it must be defined as an attribute, I thought it might be ProjectNavigation in the browser layer or perhaps even using the Bag
[14:37] <leonardr> cr3: no, once you have identified a specific object all further traversal happens through attribute access
[14:38] <cr3> leonardr: would it make sense to have the IHasBugs define a "bugs" attribute?
[14:39] <leonardr> cr3: afaik, yes
[14:40] <cr3> leonardr: but if IHasBugs has a searchBugs method already which should essentially behave like the bugs attribute, given no parameters, then wouldn't searchBugs and bugs look a lot the same?
[14:42] <cr3> leonardr: my concern is that every class implementing IHasBugs would essentially have to do something like: @property; def bugs(self): return self.searchBugs();
[14:43] <leonardr> cr3: well, you don't _have_ to put 'bugs' in IHasBugs if different implementations get the bugs differently
[14:44] <leonardr> but, i have two things to say on top of that
[14:44] <leonardr> oh, never mind, you're saying that all the IHasBugs feature 'bugs'
[14:45] <leonardr> be that as it may, /bugs is better for the end-user than ?ws.op=searchBugs
[14:45] <leonardr> however, there's nothing to be done about that for now
[14:46] <leonardr> my next project will include things like
[14:46] <cr3> leonardr: I was mostly using IHasBugs as an example for a collection which might be implemented by more than one context
[14:47] <leonardr> cr3: sure, i know you're not really talking about IHasBugs. but i'm trying to deal with the situation as you posed it
[14:47] <leonardr> my next project will include features like the ability to designate a method as being "the method you call to generate a scoped collection"
[14:47] <leonardr> so you could tag searchBugs as the generator for /bugs
[14:48] <cr3> leonardr: I was grepping through lazr for the concept of an alias, like "bugs" is aliased to searchBugs or somesuch
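(The scoped-collection pattern leonardr describes can be sketched as follows. Once lazr.restful has resolved an object like /~leonardr, further traversal to /~leonardr/bugs is plain attribute access, so exposing the no-argument search as a property gives cr3 the "alias" he was grepping for. The class and method names below follow the conversation; the wiring is an illustrative assumption, not actual Launchpad code.)

```python
# Sketch: a context object whose "bugs" attribute is the scoped
# collection, backed by the same method that handles parameterized
# searches. IHasBugs/searchBugs are names from the discussion.
class Project(object):
    def __init__(self, bug_list):
        self._bugs = bug_list

    def searchBugs(self, text=None):
        # With no parameters this returns the full collection --
        # exactly what the scoped collection needs.
        if text is None:
            return list(self._bugs)
        return [b for b in self._bugs if text in b]

    @property
    def bugs(self):
        # The alias: traversal machinery reaches the collection
        # through attribute access on the already-resolved object.
        return self.searchBugs()

project = Project(["crash on startup", "typo in docs"])
print(project.bugs)  # same result as project.searchBugs()
```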
[15:01] <henninge> sinzui: ping
[15:02] <sinzui> hi henninge
[15:02] <henninge> Hi!
[15:02] <henninge> I am a bit at a loss as to how to downgrade a package.
[15:02] <henninge> psycopg2 in this case
[15:02] <sinzui> henninge, do you have the deb?
[15:03] <henninge> sinzui: I have done that once or twice before but I could use a pointer, please ;-) ?
[15:03] <henninge> No, I was just searching for that.
[15:03]  * sinzui checks lp history
[15:04] <sinzui> henninge, download the deb from here: https://edge.launchpad.net/ubuntu/lucid/i386/python-psycopg2/2.0.13-2ubuntu2
[15:05] <sinzui> henninge, sudo dpkg -i --force-downgrade python-psycopg2_2.0.13-2ubuntu2_i386.deb
[15:06] <henninge> sinzui: thank you very much!
[15:06]  * henninge actually forgot to look on LP for the package ...
[15:06] <sinzui> henninge, I decided not to pin the version. I hold out some small hope that lp or psycopg will resolve their differences. I downgrade after every update breaks lp
[15:07] <henninge> sinzui: yes, otherwise one might forget about the pinning and hit strange errors later ...
[15:52] <gmb> rockstar: So, deryck solved our JS wizard problem :)
[15:52] <rockstar> gmb, that's because deryck is awesome.
[15:53] <rockstar> gmb, what was the issue?
[15:53] <gmb> rockstar: Two things: 1) YUI auto-generates the CSS class names based on WIDGET.name - so the hidden class was yui3-lazr-wizard-hidden, which wasn't defined anywhere.
[15:53] <gmb> Also, widgets that aren't created as hidden can never *be* hidden.
[15:54] <gmb> (At least, that's how it behaves; I suspect there's a bug there)
[15:54] <rockstar> gmb, wait, how was it never created?
[15:54] <rockstar> gmb, and it's Widget.NAME, isn't it?
[15:54] <gmb> rockstar, Yeah, sorry, bad caps.
[15:54] <gmb> rockstar: Let me just check the patch deryck gave me so that I know I'm not BSing you.
[15:55] <gmb> rockstar: http://pastebin.ubuntu.com/493666/
[15:55] <rockstar> gmb, okay, so I had it defined as Wizard.NAME = "wizard"; so I don't know where the lazr comes from either.
[15:55] <gmb> rockstar: The wizard was created but by default visible was True.
[15:55] <gmb> At least, that's how it looks based on deryck's patch.
[15:55] <rockstar> gmb, okay.
[15:56] <rockstar> gmb, I'm not sure I understand the bottom patch though, to wizard.js.
[15:56] <gmb> rockstar: Yeah. lp:~gmb/lazr-js/wizard-widget/ contains deryck's fix and some further CSS fixes.
[15:56] <rockstar> gmb, I guess he's just demonstrating that there's missing CSS somewhere?
[15:56] <gmb> I don't know if you need to do more with it.
[15:56] <rockstar> gmb, well, it needs to get finished now.  If it's firing events, it can probably start moving through steps now.
[15:56] <gmb> Well, I don't know. That seems to be related to the way the widget behaves... deryck, can you clarify exactly why your fix fixes?
[15:57] <deryck> rockstar, yeah, that's all.  The CSS in use currently assumes the widget name is "lazr-formoverlay" and it wasn't set to hide by default.
[15:57] <rockstar> deryck, okay, so we can't reuse that name, so we need to just define yui3-lazr-wizard-hidden in the CSS then?
[15:59] <deryck> rockstar, yui3-NAME-hidden, where NAME is what you define in the widget.  This is how all those CSS classes get built.
[15:59] <rockstar> deryck, yeah, so it should have been yui3-wizard-hidden
[15:59] <rockstar> deryck, and I thought I had defined that.
[15:59] <deryck> rockstar, right.  There is a yui3-NAME for every class this widget descends from.  But only the current NAME gets the hidden class added.
[16:00] <rockstar> deryck, yeah, okay.
[16:00] <deryck> rockstar, you did, but the CSS was not using it.  And you couldn't tell because you weren't hiding by default with visible: false.  So changes to the name had no effect.
[16:00] <rockstar> deryck, ah, that makes sense.
[16:00] <rockstar> So if it's not hidden by default, it can't be hidden again.
[16:01] <rockstar> That is, quite possibly, one of the silliest things I've ever heard.
[16:01] <gmb> rockstar: Yeah, I needed to mop the tea off my monitor when deryck told me.
[16:01] <rockstar> gmb, I'm digging in my junk drawer for a baby to punch as we speak.
[16:02] <gmb> ...
[16:02] <deryck> well, no, not quite accurate
[16:03] <deryck> rockstar, wizard.render() shows the widget without the hidden class.  Nothing in your code calls wizard.hide().
[16:03] <rockstar> deryck, the defaultCancel should have been doing it.
[16:03] <deryck> rockstar, I don't think so.  That's only called after UI changes, AIUI.
[16:03] <deryck> at least, my reading of the code.
[16:04] <rockstar> deryck, no, it's called when the CANCEL event is fired.  I know it was being called, because that's where the Y.log("aoeu") was.
[16:04] <rockstar> I think you and I both confirmed that the -hidden CSS class was getting set as well.
[16:04] <deryck> right, but that's not called on load.
[16:05] <rockstar> deryck, yeah, so I either have to hide by default and call .show() in the example, or call .hide() and then .show() on load.
[16:05] <deryck> CANCEL event is only fired by ESC or clicking away from the widget, no?  I never saw "aoeu" until I did some action, not on load.
[16:05] <deryck> right
[16:05] <rockstar> deryck, I saw it when I clicked the cancel button.
[16:06] <rockstar> Yeah, okay.  So what you're saying is what I'm understanding.
[16:06] <rockstar> Stupidness prevails.
[16:06] <rockstar> Thanks for sorting it out.
[16:06] <deryck> np :-)
[16:06] <rockstar> We need a page on the wiki of all the YUI gotchas.
[16:06] <deryck> yup
[16:06] <rockstar> We'll probably just forget the page exists and create it 4 more times, but whatever.
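(The class-name rule deryck describes above can be mocked up like this. The real logic lives in YUI 3's JavaScript ClassNameManager; this Python stand-in only illustrates the string construction: a widget whose NAME is "wizard" is hidden via "yui3-wizard-hidden", so if the NAME accidentally becomes "lazr-wizard", the stylesheet must define "yui3-lazr-wizard-hidden" or hiding silently does nothing.)

```python
# Illustrative mock of YUI 3 class-name generation, not real YUI code.
def widget_classes(name, ancestors=()):
    # Every class in the widget's ancestry contributes a yui3-<NAME>
    # class, but only the most-derived NAME gets the -hidden suffix.
    classes = ["yui3-%s" % a.lower() for a in ancestors]
    classes.append("yui3-%s" % name.lower())
    return classes

def hidden_class(name):
    # The class toggled by widget.hide()/show().
    return "yui3-%s-hidden" % name.lower()

print(hidden_class("wizard"))  # yui3-wizard-hidden
print(widget_classes("wizard", ancestors=("widget",)))
```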
[16:07] <salgado> jcsackett, I was going to have a second look at your unknown-blueprints-service-597738 branch but noticed there's some discussion still going on, and I'm wondering whether you're going to do any more changes to it or if it's ready for a second look
[16:19] <jcsackett> salgado: i'm still working on it.
[16:19] <jcsackett> i actually needed to add an attr to the view, so i need to write some tests as well.
[16:21] <jcsackett> salgado: i'll ping you and sinzui when i've pushed changes for round 2.
[16:22] <gmb> rockstar: So - for my own clarity - are you now going to do further work on the wizard to make it do wizardly things properly?
[16:23] <rockstar> gmb, I _can_, but it sounds like it's blocking you, and I'd like to avoid that as much as possible.
[16:24] <gmb> rockstar, Right. That sounds fair. In that case, I'll get cracking on getting it doing what what we need and ping you if there are any further issues.
[16:24] <gmb> rockstar: Is there a specific bug or LEP that your work so far is tied to? I don't want to make something that doesn't do what you need it to do.
[16:24] <rockstar> gmb, does the overall design make sense?
[16:25] <rockstar> gmb, I had a kanban card with the work on it.
[16:25] <rockstar> (because we like to track our work in many different places)
[16:25] <gmb> rockstar: Yes. In fact, it's pretty much exactly what I had in mind for my hack, although that was less elegant :)
[16:25] <gmb> rockstar: Ah, cool. I shall go and find it.
[16:26] <salgado> jcsackett, ok
[16:42] <deryck> rockstar, hey, I think all the python was added to lazr-js for the testing story.  To hook up the yui-unittest stuff with zope test runner.
[16:45] <rockstar> deryck, I'm not so sure.  Our testing story uses Java.  It can be fired up from the shell.
[16:45] <deryck> ah, ok.  Maybe not then.
[16:45] <deryck> I thought that was why.  Why all the Zope packages then?
[16:46] <deryck> and storm and lazr.restful.  good god, y'all. ;)
[16:52] <rockstar> deryck, so, the testing story does use lazr.testing, but it doesn't need to.
[16:52] <rockstar> deryck, also, it could be used for the lazr-js testing, but not have to be distributed to our projects as well.
[16:52] <deryck> right
[16:55] <deryck> rockstar, just thinking more....
[16:56] <deryck> rockstar, some of what the egg provides us is the js lint stuff.... perhaps that could be broken out into its own package.... separate the testing, python utils, and js file building stories a bit?  Smaller simpler packages?
[16:56] <rockstar> deryck, yes.
[16:56] <rockstar> deryck, the problem is that we've tied ourselves to a buoy with no anchor, so we experience pain anytime we want to change anything.
[16:57] <deryck> right
[16:57] <deryck> yeah, maybe it's not easy to do a neat and clean separation.
[16:57] <rockstar> deryck, the launchpad build system is too closely tied to the lazr-js build system, which makes it exponentially more complicated.
[16:57] <jml> clearly you should attach a sail to your buoy
[16:57] <jml> or something.
[16:57] <rockstar> jml, engineering fail.  :)
[16:58] <deryck> rockstar, it also makes adoption of lazr-js by any other web project outside Canonical difficult or impossible.
[16:58] <rockstar> deryck, exactly.
[16:58] <jml> rockstar, you're the one who's in pain and stuck to a buoy!
[16:58] <rockstar> jml, no, my boat is stuck to a buoy.
[16:59] <rockstar> jml, also, while the launchpad team does blow a lot of hot air, I think we want engines, not sails.  :)
[16:59] <jml> rockstar, ok, as long as it's an electric motor
[16:59] <rockstar> deryck, I tried to use lazr-js this weekend.  After about an hour, we gave up and used jQuery.  :)
[16:59] <jml> with batteries charged by a wind farm.
[16:59] <rockstar> jml, yeah, because we also want to be environmentally responsible.
[17:00] <rockstar> Unfortunately, Google owns the wind farms, so we have to display Google Ads on our boat the whole time.
[17:36] <jml> brb.
[18:06] <mrevell> night all
[18:54] <deryck> Some say money is the root of all evil, but it's really notifying subscribers in a web app request.
[19:01] <rockstar> deryck, :)
[19:01] <rockstar> deryck, sending mail in general...
[19:47] <rockstar> james_w, why does Recipe.__str__ need to call .parse() ?
[19:48] <james_w> rockstar: it doesn't
[19:49] <rockstar> james_w, so I could write a patch that removes it, and you'd be happy with it?
[19:49] <james_w> rockstar: but it's a good way of ensuring that we are not generating malformed recipes in __str__
[19:49] <rockstar> james_w, shouldn't tests be enough?
[19:49] <james_w> clearly you have found a case where that breaks down, but I'm not convinced that it warrants getting rid of that
[19:50] <james_w> I would remove it if I could be convinced that there are many more cases where it isn't going to be a good thing
[19:50] <james_w> rockstar: yes, they /should/ be enough
[19:52] <rockstar> james_w, I can't foresee any other issues, but if I could foresee bugs, I would fix them before they became bugs.
[19:52] <james_w> yes
[19:52] <rockstar> james_w, I think that, for reads, we should trust that it creates them properly.
[19:52] <rockstar> james_w, if we have bugs, then we can deal with them.
[19:52] <james_w> why trust when you can verify?
[19:52] <rockstar> As it is, if __str__ ever creates a bad manifest, it'll explode on a user in Launchpad.
[19:53] <rockstar> james_w, I guess the question is "Is Launchpad bzr-builder's most common use case."
[19:54] <james_w> yes
[19:54] <rockstar> james_w, I'm all for verifying if it didn't raise an exception the way it does.  I'd like it to warn and move on, but that would be difficult to look for in Launchpad.
[19:54] <james_w> bad manifest> explode how?
[19:54] <james_w> warnings> considered that, but who ever pays attention to warnings?
[19:55] <rockstar> james_w, yes, and warnings would be difficult to get out of a webapp.
[19:55] <james_w> yeah
[19:55] <james_w> well we don't have to use warn(), but still...
[19:56] <rockstar> james_w, you're verifying that functionality you wrote is working properly.  That's noble and all, but if there were a bug, RecipeParser.parse() would raise an exception that isn't really the user's fault.
[19:56] <james_w> yes
[19:58] <rockstar> james_w, I think the best course of action would be to remove the parse in __str__.  We could have different method be more strict, but having __str__ be that strict seems odd.
[19:59] <james_w> having it be sure to generate a valid recipe is odd?
[20:00] <rockstar> james_w, in __str__ I think it is.
[20:02] <rockstar> james_w, how 'bout this: Recipe.get_manifest() will always return a valid manifest or raise an exception, while __str__ just returns the manifest, valid or not.
[20:02] <james_w> why would you ever want to put an invalid manifest in the edit box?
[20:02] <rockstar> In Launchpad, we could then call Recipe.get_manifest(), and if it raises an exception, get the raw string.
[20:03] <rockstar> james_w, I think we have a valid reason to put an invalid manifest in the edit box.
[20:03] <james_w> if there is a bug that causes a round-trip to fail, then you will force the user to correct the error, then hit save, which will corrupt it again when displaying it
[20:03] <rockstar> We can catch the exception and add an error box that says "This recipe is totally broken.  Please fix it."
[20:03] <james_w> but you just said yourself that this would only happen for bugs where it isn't the users fault
[20:04] <james_w> so don't we want an OOPS if they are bugs?
[20:04] <rockstar> james_w, in this case, it's *kind of* the user's fault, but we let them save the bad data, and now they can't fix it.
[20:04] <james_w> yes
[20:05] <rockstar> james_w, but we need a better migration path than oopsing on crap data.
[20:05] <james_w> but I'm now starting to think we should call it a bug and go back and rethink the fix
[20:05] <rockstar> james_w, I've been calling it a bug the whole time.
[20:05] <rockstar> Because to Launchpad, it is a bug.
[20:05] <rockstar> And I'm trying to solve it for both Launchpad AND bzr-builder.
[20:06] <rockstar> james_w, in this case, the current bug is caused by a bzr-builder bug getting fixed.
[20:06] <rockstar> The user used . as a directory, and that never worked in building.  Now it fails earlier.
[20:07] <james_w> yes, but perhaps we should be saying that making the parser more strict without a format version bump is a bug, and should be fixed
[20:07] <rockstar> So the user entered bad data, but we accepted it.
[20:07] <rockstar> james_w, possibly.  I suggested that to abentley, and he had a point that this was always bad data.  It's just that it fails earlier now.
[20:07] <rockstar> james_w, and it never really affected users of bzr-builder itself like it did with Launchpad.
[20:08] <james_w> yes, it was always bad data
[20:08] <rockstar> james_w, it's just that now, the user has no way of getting to that data and fixing it themselves.
[20:08] <james_w> but there is a distinction between parse+build in the code, and perhaps trying to conflate them like that isn't the best idea
[20:09] <rockstar> So we need to teach Launchpad how to cope with crap data and encourage the user to fix the crap data.
[20:09] <abentley> james_w, I *like* validation.  I think we should do *more* of it.  For example, bogus revision specs.
[20:10] <rockstar> I'm not saying we shouldn't validate, but we should provide a path for coping with invalid data that doesn't require futzing with the database.
[20:10] <james_w> abentley: then please file bugs
[20:10] <abentley> I just think we should distinguish between "well-formed" and "valid", and allow parsing of recipes that are merely "well-formed".
[20:10] <abentley> james, there's already a bug about that.
[20:11] <james_w> rockstar: if the code had always been like this then the bad data could never get in the db. If we have a rule that more strict parsing would always result in a format bump then there are two ways we could get this problem in future:
[20:11] <james_w> 1. a bug that means that the parser doesn't detect the problem in the first place
[20:11] <james_w> 2. a bug in __str__ which is why the check is there
[20:11] <james_w> or I guess 3. that we want to make it stricter without a bump for some reason
[20:12] <rockstar> james_w, this change I'm proposing is STRICTLY for Launchpad sanity cases.
[20:12] <james_w> in either the first two cases then there is a bug that we want to know about
[20:12] <rockstar> james_w, have you seen https://bugs.edge.launchpad.net/launchpad-code/+bug/620868
[20:12] <rockstar> (this is the bug I'm addressing)
[20:13] <james_w> yes
[20:14] <james_w> abentley: I don't see a bug
[20:14] <rockstar> james_w, do you understand why that bug exists?
[20:14] <james_w> yes
[20:15] <rockstar> james_w, okay, there was a bug, #1 in your case.  It was fixed.
[20:15] <rockstar> It caused existing data (that never really worked anyway) to cause oopses, but not provide a way for the user to fix it.
[20:15] <james_w> It's not #1 in my case
[20:15] <abentley> james_w, https://bugs.edge.launchpad.net/launchpad-code/+bug/592821
[20:16] <james_w> they were well-formed recipes before, just ones that were never going to work
[20:16] <james_w> abentley: ah, so not on bzr-builder
[20:17] <abentley> james_w, right.  It doesn't have to be done there.
[20:17] <james_w> why not do it there?
[20:17] <rockstar> abentley, I'm not sure I see how your issue and mine go together her.
[20:18] <rockstar> s/her/here
[20:19] <abentley> james_w, because I don't know whether such a check is useful to bzr-builder.  Because if it is useful, I don't know why it's not there.
[20:19] <james_w> because I never thought of checking?
[20:19] <abentley> james_w, because the set of valid revision specs can vary, and maybe you don't want to get into that.
[20:19] <james_w> exactly
[20:20] <james_w> and maybe Launchpad shouldn't either?
[20:20] <rockstar> james_w, in the case of your #1, yes, the parser wasn't detecting the problem, and now it is (with a newer bzr-builder)
[20:20] <abentley> james_w, we can guarantee that it won't vary between our appservers and our builders if we choose to.
[20:21] <rockstar> Because it wasn't detecting the problem, the invalid recipe made it into the database.
[20:21] <rockstar> Now, with a new bzr-builder, it finds the bad data and oopses.
[20:21] <abentley> rockstar, they go together because they are both issues of validation, where an incorrect value was put into a recipe field.
[20:22] <rockstar> Launchpad needs a way for users to deal with bad data that made it into the database (however that happens) and allow them to change it.
[20:22] <rockstar> abentley, okay.
[20:23] <rockstar> james_w, so I'm proposing a change that would help Launchpad by providing a better interface to bzr-builder's Recipe class.
[20:24] <rockstar> Launchpad needs to be more robust.  We can validate 'til the cows come home, but if we don't give users a way to deal with invalid data, then we've only made things worse.
[20:24] <james_w> rockstar: I understand that, but I am looking to explore the issue in a little more depth. There's little point in asking the user to correct the problem if the problem was caused by us.
[20:25] <rockstar> james_w, the problem was caused by us, only in the fact that we let them put bad data into the database, but that would never actually succeed.
[20:26] <rockstar> james_w, format bump or whatever needs to happen, I'm happy with.  My big concern, however, is that the user's don't suddenly find out their recipe is broken by finding an oops where their recipe used to be.
[20:26]  * rockstar should probably go eat something, so he stops this egregious use of apostrophes.
[20:27] <james_w> rockstar: I agree with you that they should be able to fix bad data that they put in
[20:27] <rockstar> james_w, the patch I propose would do that.
[20:27] <james_w> rockstar: I'm arguing that doing this across the board leads to us possibly asking users to "fix" perfectly valid recipes due to bugs in bzr-builder
[20:28] <rockstar> james_w, maybe.  I'm less concerned with that at this point.
[20:28] <james_w> so I'm looking for ways for us to separate the two things such that we can ask them to fix bad data, while apologising for bugs and fixing them
[20:29] <rockstar> james_w, I will always apologize to the user.  The fact that we're just now telling them they're wrong is a no-no on our part.
[20:32] <james_w> all of the examples given so far are recipes where we can perfectly understand the intent, they just aren't going to work
[20:32] <james_w> so as Aaron said, splitting well-formed and valid might make sense
[20:33] <rockstar> james_w, which is what I'm proposing.
[20:33] <james_w> rockstar: at one level, yes, but we can push it deeper than the patch you are suggesting
[20:33] <rockstar> james_w, here's my patch: http://pastebin.ubuntu.com/493798/
[20:34] <james_w> yes, I perfectly understand the change you are proposing
[20:35] <rockstar> james_w, I don't see necessity for going any deeper than that.  If I can catch the exception and somehow say "Hm, this is what you had, but for some reason bzr-builder doesn't think it's valid anymore."  then I'm happy.
[20:36] <james_w> sure you are
[20:37] <rockstar> james_w, I'm not sure how much more apart "valid" and "well-formed" you want.
[20:37] <james_w> a way of saying to the user "your recipe is well-formed, but these things are likely to be problems:"
[20:38] <james_w> at any time
[20:40] <james_w> then we can make the parser more "strict", without causing issues like this, provide better assistance to the user, and still have validation that what we create is at least well-formed
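The well-formed/valid split sketched above could look roughly like this. A hedged illustration in Python: the function name, recipe syntax handling, and problem format are invented for this sketch, not bzr-builder's actual API. The idea is that syntax errors stay fatal, while semantic problems are collected and reported instead of raised.

```python
# Hypothetical sketch of the "well-formed vs. valid" split discussed above.
# parse_recipe and its problem list are illustrative, not bzr-builder's API.

def parse_recipe(text, known_branches=()):
    """Parse a recipe leniently: syntax errors are fatal ("not well-formed"),
    semantic problems are collected and returned rather than raised."""
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith('# bzr-builder format'):
        raise ValueError('not well-formed: missing format header')
    problems = []
    branches = []
    for line in lines[1:]:
        parts = line.split()
        branch = parts[-1] if parts[0] in ('merge', 'nest') else parts[0]
        branches.append(branch)
        # Semantic check: flag the problem, don't refuse to parse.
        if known_branches and branch not in known_branches:
            problems.append('branch %r may not exist' % branch)
    return branches, problems


branches, problems = parse_recipe(
    '# bzr-builder format 0.2 deb-version 1.0\n'
    'lp:widget\n'
    'merge fix lp:widget/fix\n',
    known_branches=['lp:widget'])
```

With this shape the UI can always render a well-formed recipe and show the problem list alongside it, which is the "your recipe is well-formed, but these things are likely to be problems" message james_w describes.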
[20:48] <rockstar> james_w, yeah, I have no opinions on the overall architecture of bzr-builder.  I would just like something that works today.  If there's a bigger picture, great.
[20:48] <james_w> this has an impact on LP too though
[20:48] <rockstar> james_w, I think it does.  I think it'd be great for user's experience.
[20:48] <james_w> it's about user-experience, so I won't let you wash your hands of it as lying outside of LP ;-)
[20:49] <rockstar> james_w, however, right now, the user's experience is "WTF? Why can't I get to my recipe?"  That's my big concern now.
[20:49] <james_w> sure, but it's only two recipes?
[20:50] <rockstar> james_w, yes, but that's two more oopses that we don't need.
[20:50] <james_w> the change you propose is a "narrow" interface to likely problems (it can only report one), and it has a poor API to use it everywhere
[20:50] <lifeless> morning
[20:50] <james_w> sure, it's just not a stop-the-line issue IMO
[20:50] <lifeless> so the OOPS comes from where?
[20:51] <james_w> so if we can we should come up with an API that nicely gives us the better experience and implement that
[20:51] <james_w> lifeless: https://bugs.edge.launchpad.net/launchpad-code/+bug/620868
[20:52] <lifeless> so maybe I'm confused
[20:52] <lifeless> we used a plain text field to store the recipe didn't we?
[21:00] <abentley> lifeless, no.
[21:00] <abentley> lifeless, we store the recipe in object form.
[21:03] <lifeless> sourcepackagerecipedata ?
[21:04] <abentley> lifeless, yes, and the SourcePackageRecipeDataInstructions that refer to it.
[21:04] <lifeless> ok, I see
[21:04] <lifeless> thanks
[21:04] <lifeless> I was wondering if it would make sense, when the recipe is invalid to still permit it to be edited
[21:04] <lifeless> until it becomes valid.
[21:05] <abentley> lifeless, that is what we want to do.
[21:05] <lifeless> then we can handle a wider range of unexpected things like this
[21:05] <lifeless> abentley: awesome
[21:05] <abentley> lifeless, The problem is that we can't stringify the invalid recipe.
[21:05] <lifeless> what do we we use stringification for ?
[21:05] <abentley> lifeless, because bzr-builder checks for validity when it stringifies a recipe.
[21:06] <abentley> lifeless, we use stringification for displaying the field to the user so that they can edit it.
[21:06] <lifeless> does this mean that we can't show the user the invalid recipe
[21:06] <abentley> lifeless, yes.
[21:06] <lifeless> I see, certainly not going to help things along ;)
[21:06] <lifeless> and it helps me understand the chat you were having - thanks.
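The trap abentley describes, where a validating `__str__` means an invalid recipe cannot even be displayed for editing, can be shown with a toy example. These classes are hypothetical stand-ins, not Launchpad's actual model; the point is the strict vs. lenient serialization paths.

```python
# Illustrative sketch: if __str__ validates, an invalid recipe can't be
# shown back to the user for editing. Classes here are hypothetical.

class InvalidRecipe(Exception):
    pass


class Recipe:
    def __init__(self, base, instructions):
        self.base = base
        self.instructions = instructions  # list of (verb, branch) pairs

    def __str__(self):
        # Strict path: refuses to serialize anything it considers invalid.
        for verb, branch in self.instructions:
            if verb not in ('merge', 'nest'):
                raise InvalidRecipe('unknown instruction %r' % verb)
        return '\n'.join(
            [self.base] + ['%s %s' % i for i in self.instructions])


def recipe_text_for_editing(recipe):
    """Lenient serializer: always produces *something* editable."""
    return '\n'.join(
        [recipe.base] + ['%s %s' % i for i in recipe.instructions])


bad = Recipe('lp:project', [('mrege', 'lp:project/fix')])  # typo'd verb
try:
    str(bad)  # strict path raises, which is where the OOPS comes from
except InvalidRecipe:
    text = recipe_text_for_editing(bad)  # lenient path still works
```

A lenient serializer like this is one way to let the edit page render the stored-but-invalid recipe so the user can fix it.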
[21:48] <rockstar> abentley, http://pastebin.ubuntu.com/493798/
[22:19] <wallyworld_> morning
[22:20] <abentley> thumper, http://pastebin.ubuntu.com/493848/
[22:23] <lifeless> Ursinha: hi
[22:23] <lifeless> https://bugs.edge.launchpad.net/launchpad-registry/+bug/615237
[22:25] <lifeless> oh, I see whats up
[22:26] <lifeless> Ursinha: the ec2land stuff
[22:26] <lifeless> that gets bugs from an MP, does it get it from the MP, or the branch ?
[22:27] <mwhudson> lifeless: the mp gets bugs from the branch, unless i'm missing context horribly
[22:27] <lifeless> mwhudson: so I use the same branch for domain-fixes
[22:27] <lifeless> mwhudson: MPs show bugs already fixed in earlier MPs
[22:28] <lifeless> mwhudson: but I need to know which -precisely- the ec2 land code uses, or where to find that code, to make it stop including fix-committed and fix-released bugs in the bugs list in the pqm mail.
[22:29] <mwhudson> ah right
[22:29] <mwhudson> i expect the ec2 land code isn't that impenetrable...
[22:30] <lifeless> indeed
[22:31] <lifeless> its blatting a growing number of bugs every time i land
[22:34] <Ursinha> lifeless, the branch
[22:34] <lifeless> Ursinha: hi
[22:34] <lifeless> https://bugs.edge.launchpad.net/launchpad-foundations/+bug/638468
[22:34]  * Ursinha looks
[22:35] <Ursinha> lifeless, don't know if that works
[22:36] <lifeless> Ursinha: 'that' ?
[22:36] <Ursinha> lifeless, sorry, let me elaborate
[22:36] <Ursinha> lifeless, problem is many times people mention bugs that weren't properly fixed, but had code landed, so bugs that are fix committed or fix released
[22:37] <Ursinha> so for that to work we'd need to ensure that all bugs that have a fix to land are !fix committed/released
[22:37] <Ursinha> otherwise we'll start missing things
[22:37] <Ursinha> my thoughts would be: create another branch
[22:38] <Ursinha> then that won't happen
[22:38] <lifeless> Ursinha: hang on a sec.
[22:38] <lifeless> Ursinha: ec2 land will error if there are no valid bugs right ?
[22:39] <Ursinha> lifeless, if there are no bugs and it's not no-qa, yes
[22:39] <lifeless> right
[22:40] <lifeless> so, if someone references only fix-committed and fix-released bugs
[22:40] <lifeless> they will get an error
[22:40] <lifeless> and in that error you could say (bug X Y and Z are also linked on the branch but are fix committed/fix released)
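The behaviour lifeless proposes here, ignore Fix Committed/Released bugs but mention them in the error so the lander knows why the landing was refused, might be sketched like this. The function name, data shapes, and message wording are invented for illustration; the real check lives inside the ec2 land tooling.

```python
# Hedged sketch of the proposed ec2-land bug filtering. linked_bugs is a
# list of (bug_number, status) pairs; this is not the real ec2 land API.

LANDED = ('Fix Committed', 'Fix Released')


def bugs_for_landing(linked_bugs):
    """Split branch-linked bugs into usable ones and already-landed ones.

    Errors out, naming the ignored bugs, if nothing usable remains."""
    usable = [(n, s) for n, s in linked_bugs if s not in LANDED]
    landed = [(n, s) for n, s in linked_bugs if s in LANDED]
    if not usable:
        raise SystemExit(
            'No open bugs linked to this branch. '
            'Already-landed bugs ignored: %s'
            % ', '.join('#%d (%s)' % b for b in landed))
    return usable


usable = bugs_for_landing([(1, 'In Progress'), (2, 'Fix Released')])
```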
[22:40] <Ursinha> lifeless, what will you do if you're trying to land a branch which is already linked to a bug which is fix committed, but that's what you want to do? are you going to unlink the bug?
[22:40] <Ursinha> I don't like this idea
[22:41] <lifeless> Ursinha: if the code I'm landing is needed to fix that bug, its not really fix committed is it ?
[22:41] <Ursinha> lifeless, it can be, but qa-bad
[22:41] <Ursinha> fix committed == there's a fix for that bug (working or not)
[22:42] <Ursinha> in progress == fix is in progress (it might have incremental fixes but the whole fix isn't committed yet)
[22:42] <Ursinha> s/committed/landed
[22:42] <lifeless> qa-bad should imply in-progress or triaged
[22:43] <lifeless> fix committed isn't 'commit in the tree' its *FIX* in the tree.
[22:43] <Ursinha> right
[22:43] <lifeless> I think that when something is bust we should make the bug be in-progress again
[22:44] <lifeless> just like --incr
[22:44] <Ursinha> lifeless, manually?
[22:44] <lifeless> we could automate it
[22:44] <lifeless> but sure, manually.
[22:45] <lifeless> qa-bad + fixreleased makes no sense.
[22:45] <lifeless> qa-bad + fixcommitted also makes no sense.
[22:45] <Ursinha> qa-bad is added by the devel; we could a) change it manually at the same time we're adding the qa-bad tag or b) make the bot check if there are qa-bad and change it to in progress again
[22:45] <lifeless> Designing other workflows around nonsensical states is not going to work well.
[22:45] <lifeless> Ursinha: I like both a) and b)
[22:45] <Ursinha> lifeless, ok. what to say about branches that have several bugs linked, and some of them are already released
[22:45] <Ursinha> ?
[22:45] <lifeless> just list the other bugs.
[22:46] <Ursinha> why are people reusing branches?
[22:46] <lifeless> Ursinha: convenience; clarity; organisation.
[22:48] <Ursinha> lifeless, well, I think we're trying to make the scripts work around a situation that could be avoided by just not reusing branches
[22:48] <Ursinha> and the problem is that the script already tries to workaround some behaviors to create consistency
[22:48] <Ursinha> I think the scripts will get more and more confusing because of that, but if you think this is really worth it, I can do that
[22:49] <Ursinha> adding a mechanism to tagger to set qa-bad bugs to in progress isn't hard
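Option (b) above, a tagger pass that flips qa-bad bugs back to In Progress, could be as small as the sketch below. Bugs are plain dicts here purely for illustration; the real tagger would go through launchpadlib and the actual bug task objects.

```python
# Minimal sketch of the qa-bad reset pass Ursinha describes. Bug records
# are hypothetical dicts, not Launchpad's real data model.

def reset_qa_bad(bugs):
    """Set any qa-bad bug that claims to be fixed back to In Progress.

    Returns the ids of the bugs that were changed."""
    changed = []
    for bug in bugs:
        if 'qa-bad' in bug['tags'] and bug['status'] in (
                'Fix Committed', 'Fix Released'):
            bug['status'] = 'In Progress'
            changed.append(bug['id'])
    return changed


bugs = [
    {'id': 1, 'tags': ['qa-bad'], 'status': 'Fix Committed'},
    {'id': 2, 'tags': ['qa-ok'], 'status': 'Fix Committed'},
]
changed = reset_qa_bad(bugs)
```

This encodes lifeless's point that qa-bad plus Fix Committed/Released is a nonsensical state, so the bot normalizes it rather than letting other workflows build on it.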
[22:49] <lifeless> It makes my dev environment a lot easier to manage.
[22:49] <lifeless> I have 'librarian' for the librarian, 'registry' for registry, 'oops' for oops etc
[22:50] <lifeless> sometimes I have bug-X branches when I have multiple things in flight : but the whole kanban + RFWTAD workflow is about removing the need for parallel-tasking.
[22:50] <Ursinha> lifeless, what are you going to do about the fix committed bugs linked to your branches, when you land a new fix?
[22:50] <lifeless> Ursinha: I don't understand the question.
[22:51] <lifeless> if its fix committed there is no more work to do on the bug.
[22:51] <lifeless> landing a new fix suggests its not fix committed.
[22:51] <Ursinha> lifeless, that's not what I see following bugmail
[22:51] <lifeless> thats a contradiction
[22:51] <lifeless> Ursinha: can you point me at some examples?
[22:52] <Ursinha> lifeless, I'd have to do some gardening in my bugmail
[22:52] <Ursinha> lifeless, we can try that. the idea is to make ec2 land ignore fix committed bugs?
[22:52] <Ursinha> or error on them?
[22:52]  * wallyworld_ off to doctor appointment
[22:54] <lifeless> Ursinha: ignore fix(committed|released) bugs
[22:54] <Ursinha> lifeless, one case that came to mind now is, bug fix released but not really fixed. not tagged qa-bad or -needstesting, but has new fix
[22:54] <Ursinha> what to do in this case?
[22:54] <Ursinha> I saw that happen a few times after releases
[22:54] <lifeless> so, something has been made better
[22:54] <lifeless> but not good enough
[22:55] <lifeless> with the QA workflow using bugs to permit commits to trunk to be deployed.
[22:55] <lifeless> we need a new bug for the QA workflow.
[22:55] <lifeless> don't we?
[22:55] <Ursinha> not sure what you mean
[22:55] <lifeless> so in this scenario:
[22:55] <Ursinha> I guess ec2 land should check all bugs, see only the !fix committed/released and if no bugs left, error
[22:55] <lifeless>  - bug X
[22:56] <lifeless>  - branch lands that 'fixes X'
[22:56] <lifeless> we QA it - bugX [qa-ok]
[22:56] <lifeless> we deploy
[22:56] <lifeless>  - bugx [FixReleased]
[22:56] <lifeless> then we realise its still timing out (for instance)
[22:56] <lifeless> thats your scenario, right ?
[22:57] <Ursinha> right
[22:57] <lifeless> now, there are two places we might find this
[22:57] <lifeless> firstly, we might notice in QA
[22:57] <lifeless> and secondly we might notice after deploy
[22:57] <lifeless> if we notice in QA, because its 'better' we don't need to stop the deploy.
[22:57] <lifeless> so any solution to this needs to cater for noticing in QA.
[22:58] <lifeless> if we notice in QA, we could set the bug to qa-incremental (is that right?)
[22:58] <Ursinha> hm
[22:58] <lifeless> so In-progress status, and 'branch is ok to land'
[22:58] <Ursinha> if we notice in qa, so it's qa-untestable
[22:59] <Ursinha> or qa-bad if devel thinks the fix is going to bork prod. if rolled out
[22:59] <lifeless> https://dev.launchpad.net/QAProcessContinuousRollouts#We can QA the branch, and it is an incremental step towards the fix of one or more bugs
[22:59] <Ursinha> lifeless, but only if you landed the first fix as --incr
[23:00] <Ursinha> if you know previously that the fix might not be the last one, then land it as incremental and all of that will be done automatically
[23:00] <lifeless> Ursinha: So I guess I'm saying 'what you describe is us realising that the fix *was* incr, even if we didn't *say it was*'
[23:00] <Ursinha> bug in progress, qa-untestable
[23:00] <Ursinha> yes, I see that
[23:00] <lifeless> so the right thing to do, whether we notice in QA, is to set the bug status in the same way.
[23:00] <Ursinha> I'm saying there's room for tweaking things :)
[23:00] <lifeless> yeah
[23:01] <lifeless> and so if we set the bug status the same way
[23:01] <Ursinha> qa-untestable, in that case
[23:01] <Ursinha> otherwise we'll be blocked
[23:01] <lifeless> we'll set it to in-progress, qa-untestable(or-qa-ok I guess for the testable-incremental-case)
[23:02] <Ursinha> doesn't matter for the script, both are "go-for-it"
[23:02] <Ursinha> I like qa-ok best
[23:02] <lifeless> right
[23:02] <lifeless> and ec2 land would correctly *include* this bug in the later landing
[23:02] <lifeless> because its in-progress
[23:02] <Ursinha> right
[23:02] <lifeless> that seems to work to me
[23:05] <Ursinha> lifeless, right. I'll update the wiki page and let you know
[23:05] <lifeless> Ursinha: thanks!
[23:16] <Ursinha> I don't like the way the theme in dev.lp.net separates the sections of the page
[23:16] <Ursinha> it's kinda hard to read
[23:17] <lifeless> yeah
[23:17] <lifeless> its rather awkward
[23:18] <lifeless> mbarnett: nevermind, just BB flakiness
[23:23] <mars> lifeless, reading backscroll, looking sadly upon the BB waterfall - did you already pass the BB restart work along?
[23:27] <lifeless> mars: restart work ?
[23:27] <mars> lifeless, re: your "BB flakiness" comment
[23:28] <lifeless> mars: well I haven't debugged deeply, for clarity
[23:28] <mbarnett> lifeless: hehe..  kk
[23:28] <lifeless> mars: but the most recent build was for an older rev
[23:28] <lifeless> mars: so it hadn't tried tip of prod-devel
[23:29] <lifeless> (and it was nearly 12 hours ago that the fix landed in prod-devel)
[23:30] <mars> lifeless, looking at the waterfall BB is completely hosed right now :(
[23:30] <mars> so I need to figure out what to tackle first
[23:30] <lifeless> mars: the machine gun ?
[23:31] <lifeless> mars: perhaps bring out the 'restart the world' card?
[23:31] <mars> lifeless, we used that card earlier today - I am worried
[23:31] <lifeless> oh
[23:31] <lifeless> that is a concern
[23:32] <mars> and lp and db_lp have been down for a week
[23:32] <lifeless> we have several CPs pending
[23:32] <mars> :(
[23:32] <lifeless> cannot rename, ubuntu bug uploads, and OOPS generation
[23:34] <mars> ok, so we need to get lucid_lp and prod up first
[23:35] <mars> mbarnett, I forced the lucid_lp build.  If the build does not start in, say, 10 minutes, you will have to restart the build slave.
[23:35] <mbarnett> mars: kk
[23:36] <mars> ok, prod_lp is offline for some reason (is it an EC2 slave?)
[23:37] <lifeless> the log shows 'substantiating'
[23:37] <lifeless> so I'd say yes
[23:37] <mars> checking master.cfg
[23:38] <mars> yes, EC2Latent
[23:38] <mars> lifeless, you said prod_lp pulled a stale tip revision for its test run?
[23:39] <lifeless> mars: no, I said the last run in 'recent builds' was ages ago and for what is now obsolete
[23:39] <lifeless> mars: and that it hasn't had a more recent run which would get a better tip
[23:39] <lifeless> I hypothesised that it hadn't detected it
[23:39] <mars> ok, I'll force a build then
[23:39] <lifeless> alternatively, if the slave doesn't come up, I bet bb doesn't report that as a failed run.
[23:40] <lifeless> mars: I forced a build
[23:40] <mars> it is still offline
[23:40] <lifeless> see 23:18:22 in the waterfall
[23:41] <mars> ugh
[23:41] <mars> have to restart the build master then
[23:43] <mars> mbarnett, could you please restart the build master?  That should get the prod_lp builder running again
[23:44] <mars> lifeless, EC2 build slaves need a master restart in order to bring them back up (lp, db_lp, prod_lp).
[23:44] <mars> I haven't seen this problem before, but "restart the world" sounds right
[23:44] <mars> use the unstoppable super weapons first
[23:44] <mars> always the unstoppable super weapons first
[23:45] <mbarnett> build master has been restarted
[23:45] <lifeless> where's the earth shattering kaboom
[23:45] <lifeless> there's meant to be an earth shattering kaboom
[23:45] <mars> heh
[23:46] <mbarnett> does not appear to be starting back up happily
[23:47]  * mars mashes F5 a few more times in desperation
[23:48] <mars> j/k, that actually speeds server death in resource-exhausted environs :)
[23:48] <mars> mbarnett, do you have a log I could look at please?
[23:50] <mbarnett> mars: sure, give me a couple
[23:51] <mars> k