[00:35] <salgado> is there a way to move a PPA from a team to another or do we need to create a new PPA and copy all the packages over?
[00:38] <lifeless> new ppa + copy
[00:42] <salgado> hm, ok. thanks lifeless
[02:36] <rick_h> StevenK: what's up?
[02:38] <StevenK> rick_h: sidnei marked my MP as Needs Fixing :-(
[02:38] <StevenK> I didn't think combo_app() was actually tested.
[02:40] <rick_h> StevenK: yea, it is
[02:40] <rick_h> the TestApp() wraps combo_app
[02:40] <rick_h> and loads it as a wsgi application that gets tested in tests/test_combo.py
[02:41] <rick_h> StevenK: not sure what he wants test-wise, the current tests make sure combo_app functions, I suppose a test mounting _application in TestApp would work
[02:41] <rick_h> StevenK: and yea, I kind of agree with him that it'd probably be best to not _ it if we're going to import it
[02:42] <StevenK> Right
[02:42] <StevenK> I also think I didn't explain it the best in the MP, if he's asking "Why do you need this?"
[02:42] <rick_h> StevenK: yea, I didn't follow it either until you showed me how we were changing our code side
[02:43] <rick_h> though I think that the reason of "combo_app should be able to be wrapped as a wsgi app" is enough reason
[02:43] <StevenK> Right, so pasting our WSGI wrapper might help
[02:43] <rick_h> StevenK: yea, at least in the MP so there's a record I guess. Technically it's the more 'correct' way anyway.
[02:43] <StevenK> rick_h: I'll look at sorting it out after lunch.
[02:44] <rick_h> StevenK: ok, thanks
[02:44] <rick_h> I didn't see his response so sorry I didn't catch him during the day with it
[02:44] <rick_h> just ping'd to let him know we needed it and ask him to take a peek
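The kind of test rick_h describes (mounting the WSGI app and exercising it) can be sketched without webtest; everything below is a hypothetical stand-in for convoy's real combo_app, not its actual code:

```python
from io import BytesIO

def combo_app(environ, start_response):
    # Hypothetical stand-in for convoy's combo_app: just echoes the query string.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ.get('QUERY_STRING', '').encode('utf-8')]

def call_wsgi(app, path='/', query=''):
    # Minimal WSGI caller, roughly what webtest's TestApp does under the hood.
    environ = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': path,
        'QUERY_STRING': query,
        'wsgi.input': BytesIO(),
        'wsgi.errors': BytesIO(),
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'http',
    }
    collected = []
    def start_response(status, headers):
        collected.append((status, headers))
    body = b''.join(app(environ, start_response))
    return collected[0][0], body

status, body = call_wsgi(combo_app, query='yui/yui-min.js')
```

Any app callable this way can equally be mounted in TestApp, which is the "combo_app should be able to be wrapped as a wsgi app" point above.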
[04:39]  * StevenK tries to work out how to run convoy's test suite
[04:46] <wgrant> Grarrrr
[04:46]  * wgrant mauls the branch scanner
[04:46]  * wgrant chainsaws the branch scanner
[04:47] <StevenK> Haha
[04:47] <StevenK> Whyfor?
[04:48] <wgrant> It is slow.
[04:48] <wgrant> It holds locks.
[04:48] <wgrant> It randomly hangs.
[04:48] <wgrant> It's like Soyuz, except more unreliable and with a simpler task.
[04:48] <StevenK> wgrant: Can't bug 910492 be closed?
[04:48] <_mup_> Bug #910492: long urls break lazr restful object representation cache <oops> <Launchpad itself:Triaged> < https://launchpad.net/bugs/910492 >
[04:49] <wgrant> StevenK: Done.
[04:49] <wgrant>         # Bug heat increases by a quarter of the maximum bug heat
[04:49] <wgrant>         # divided by the number of days since the bug's creation date.
[04:49] <wgrant> wut
[04:49] <wgrant> lifeless: Bug heat confuses me
[04:51] <lifeless> I expect there are plentiful lies by now in the code
[04:51] <lifeless> due to age
[04:51] <wgrant> Ha ha
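Taken literally, the quoted comment is simple arithmetic; a hedged sketch (function and parameter names invented here, not Launchpad's actual PL/Python code) reads:

```python
def aged_heat(heat, max_heat, days_since_creation):
    # "Bug heat increases by a quarter of the maximum bug heat
    #  divided by the number of days since the bug's creation date."
    return heat + (max_heat / 4) / max(days_since_creation, 1)
```

So a 10-day-old bug with heat 100, where the maximum heat is 400, would rise to `aged_heat(100, 400, 10) == 110.0` — presumably the behaviour the stop-this-aging MP linked below removes.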
[04:53] <wgrant> Hm
[04:54] <spm> wgrant: while you're in a chainsawing slow services mindset, may I direct your energies towards.... checkwatches?
[04:54] <spm> or have I trolled too far?
[04:54] <wgrant> Hey, it hasn't hung in weeks.
[04:55] <spm> is it running?
[04:56] <wgrant> Unfortunately.
[04:56] <spm> :-)
[05:40] <StevenK> wgrant: O hai. So given https://code.launchpad.net/~launchpad/+recipe/launchpad-convoy . If I push changes to the packaging branch does that mean it will build 0.2.1-0~19-oneiric1 again?
[05:42] <wgrant> StevenK: Yes
[05:42] <wgrant> StevenK: The packaging revno is not included in that version template.
[05:42] <wgrant> Also, why do both thumper and StevenK ask me about recipes, when it was their project :(
[05:42] <StevenK> Haha
[05:42] <StevenK> wgrant: What would you recommend?
[05:43] <StevenK> wgrant: You know, the brain suppresses bad memories ...
[05:47] <wgrant> StevenK: Either commit to trunk or include the packaging revno in the template (possibly temporarily)
[05:50] <StevenK> wgrant: I don't want the packaging in trunk
[05:52] <wgrant> StevenK: I never suggested that :)
[05:53] <StevenK> I could pull out lp:convoy, but then I/someone else has to merge lp:convoy into the packaging branch and push it
[05:53] <wgrant> What about one of the options I gave?
[05:55] <StevenK> I'm just not sure how/where to inject the packaging revno
[05:55] <StevenK> Without screwing up upgrades
[06:00] <wgrant> Let me find one of mine.
[06:01] <wgrant> StevenK: https://code.launchpad.net/~wgrant/+recipe/ivle-trunk
[06:04] <StevenK> Why +dr?
[06:04] <wgrant> debian revision
[06:04] <wgrant> It's arbitrary.
[06:04] <StevenK> Right
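The fix wgrant suggests — including the packaging revno in the version template — looks roughly like this in a recipe; the branch paths and version string here are illustrative, not the real launchpad-convoy recipe:

```
# bzr-builder format 0.3 deb-version 0.2.1-0~{revno}+dr{revno:packaging}-oneiric1
lp:convoy
merge packaging lp:~launchpad/convoy/packaging
```

`{revno:packaging}` interpolates the revno of the branch labelled `packaging`, so pushing to the packaging branch alone yields a new version instead of rebuilding 0.2.1-0~19-oneiric1.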
[06:05] <wgrant> StevenK: https://code.launchpad.net/~wgrant/launchpad/stop-this-aging-nonsense/+merge/91763
[06:06] <StevenK> It's over 9000!
[06:07] <StevenK> (IE, r=me)
[06:07] <wgrant> Thanks.
[06:09] <StevenK> I thought bug heat was already done in the DB
[06:10] <wgrant> The calculation is a PL/Python function, if that's what you mean.
[06:11] <StevenK> Right
[06:13] <nigelb> lol. stop this aging nonsense.
[06:13] <StevenK> Yes. wgrant will forever be 16.
[06:13] <nigelb> hahaha
[06:14] <nigelb> wgrant is the Australian vampire I suppose. :P
[06:14] <nigelb> StevenK: ^
[06:14] <StevenK> Haha
[06:18] <cody-somerville> He doesn't sparkle though.
[06:18] <StevenK> Spoken like a Twilight fan
[06:19] <wgrant> Heh
[07:27] <wgrant> stub: Oh, didn't know that was possible. Thanks.
[07:29] <stub> The planner gets to inline SQL functions into SQL queries. Not sure if it will help in this case.
[07:30] <wgrant> Unlikely, since the surrounding Python is terrible.
[07:30] <wgrant> But it's something.
[08:50] <adeuring> good  morning
[10:30] <bigjools> jml: who do we ping to get your updated testtools in the archive? Do you know who packages it?
[10:30] <jml> bigjools: it gets imported from Debian, lifeless is the maintainer there.
[10:31] <bigjools> jml: ok ta
[10:31] <bigjools> jml: there's an ubuntu-specific version in the archive at the moment
[10:31] <jml> oh really?
[10:31] <bigjools> jml: 0.9.11-1ubuntu1
[10:31]  * jml should really pay more attention to downstreams.
[10:31] <bigjools> I haven't looked at its local patch
[10:32] <bigjools> heh, debian has 0.9.11-1
[10:44] <jml> "  * Build using dh_python2."
[10:44] <jml> from doko
[10:44] <jml> I guess it's not much of a patch :)
[10:44] <jml> (incidentally, well done LP for making that easy to find out: https://launchpad.net/ubuntu/+source/python-testtools)
[10:46] <bigjools> \m/
[11:03] <adeuring> gmb, wgrant: could you review this MP: https://code.launchpad.net/~adeuring/launchpad/bug-829074-ui/+merge/91796?
[11:04] <gmb> adeuring: Sure thing.
[11:05] <adeuring> gmb: thanks!
[11:05] <gmb> adeuring: Just finishing another branch, but I'll get to it presently.
[11:48] <gmb> adeuring: Looks good. r=me.
[11:48] <adeuring> gmb: thanks!
[11:49] <gmb> Welcome :)
[11:52] <StevenK> adeuring: I think your change in r14748 does require QA.
[11:53] <adeuring> StevenK: yes and no -- the point is that the new features can't be used yet. The branch just reviewed by Graham will make that easier
[12:01] <rick_h> morning
[12:01] <adeuring> morning rick_h
[12:16] <wallyworld_> bigjools: wtf, just read that kubuntu is being killed :-(
[12:18] <jelmer> wallyworld_: it no longer has a dedicated canonical engineer working on it (as was the case when riddell was on rotation to Bazaar)
[12:18] <jelmer> wallyworld_: that's not quite the same as it being killed
[12:19] <wallyworld_> jelmer: the net effect will be the same i fear
[12:20] <jelmer> they seem to've done fine for 11.10 when jonathan was on rotation to bzr, and {x,edu,l}ubuntu seem to do fine with just infrastructure support too
[12:21] <jelmer> not saying it won't have a negative impact
[12:21] <stub> I think it would take much more than a single bullet to kill off kubuntu.
[12:21] <wallyworld_> yeah, maybe i'm being too pessimistic
[12:21] <wallyworld_> just a bit sad i guess
[12:21] <bigjools> wallyworld_: not killed
[12:22] <bigjools> wallyworld_: I think it's a good thing actually
[12:22] <wallyworld_> really?
[12:23] <bigjools> yeah, it means any criticism about it will need to be levelled at the community, not canonical
[12:23] <wallyworld_> true
[12:23] <bigjools> and the community will not be encumbered by anything
[12:26] <rick_h> StevenK: can the combo-url land? Or we waiting on RT?
[12:28] <StevenK> rick_h: We are waiting for the convoy MP.
[12:28] <rick_h> StevenK: ok cool.
[12:30] <StevenK> rick_h: If that lands, then I can update the convoy package, and land combo-url
[13:36] <rick_h> adeuring: do you know anyone that knows translations well?
[13:36] <adeuring> rick_h: jtv for example
[13:36] <rick_h> adeuring: I'm trying to find some way to mass download .pot files without any success in wiki/google
[13:36] <rick_h> adeuring: ok, thanks
[13:57] <rick_h> jtv: ping, got a sec for a translation question? someone is asking about mass downloading all ubuntu .pot files for spanish languages?
[13:58] <rick_h> jtv: I don't see any way to mass download from the wiki/webui. I see a lp-translations tools package that seems to do mass uploads though, but code doesn't seem to download?
[13:58] <rick_h> adeuring: so did maint. RT, questions, translations, and new projects
[13:59] <adeuring> rick_h: cool -- i sucked again :(
[13:59] <rick_h> bah, missed jtv
[14:01] <deryck> Morning, all.
[14:05] <rick_h> morning deryck
[14:10] <deryck> adeuring, rick_h -- I'd like to do a G+ hangout for our standup today.
[14:11] <rick_h> deryck: sounds like a plan
[14:16] <adeuring> deryck: gahh -- i still have no g+ account :(
[14:16] <deryck> adeuring, see my PM to you. :)
[14:33] <deryck> abentley, we're G+ hanging out today for standup.
[15:17] <abentley> jcsackett, sinzui: A branch's unique name is a well-established term.  Unique names do not include the lp: prefix.
[15:17] <sinzui> my apologies
[15:18] <abentley> sinzui: np, just let's keep the definition consistent.
[15:26] <jelmer> what's happening with bug heat?
[15:26] <jelmer> I thought it was going to be removed - is it going to stay around in some form (given it has just changed)?
[15:30] <jcsackett> abentley, sinzui: so do we need to roll that back?
[15:31] <abentley> jcsackett: No, but you should change the function name, or else change where you attach the prefix.
[15:31] <jcsackett> abentley: ok, i'll land a follow up to correct the name.
[15:31] <abentley> jcsackett: thanks.
[15:34] <abentley> jcsackett: You should also change the HTML so it's not called #branch-unique-name.
[15:34]  * jcsackett nods
[15:34] <jcsackett> seems odd unique name was ever used, as it's all about presenting the location, which incorporates the unique name but isn't the same thing at all.
[15:35] <abentley> jcsackett: I don't know where this is, but it's possible the authors thought it was about presenting the name, not the location.
[15:36] <jcsackett> abentley: fair. could be it was misappropriated for presenting the location later.
[15:38] <abentley> jcsackett: Nope, looks like it was always presenting the name and calling it the location.
[15:38] <jcsackett> right on. well, i'll be making it consistent shortly.
[15:39] <abentley> jcsackett: I guess you could stick the lp: in the template.
[15:39] <jcsackett> possibly, but it would have to exist outside of the node, since that gets set by the js.
[15:39] <jcsackett> seems a mite bit hackish.
[15:42] <abentley> jcsackett: Yes, it's treating HTML as a template language.
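The distinction abentley is drawing can be stated in two lines; the branch name here is hypothetical:

```python
def branch_location(unique_name):
    # The lp: prefix belongs to the *location* the UI presents,
    # not to the branch's unique name itself.
    return 'lp:' + unique_name

location = branch_location('~jcsackett/launchpad/some-branch')
```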
[16:18] <abentley> jelmer: https://blog.launchpad.net/general/bugheatchange
[16:20] <jelmer> abentley: ah, thanks - missed that for some reason
[16:26] <sinzui> danhg, talky talky time?
[16:47] <danhg> Hey sinzui, I'm in the middle of MaaS tests, I should be free by 18:00 GMT?
[16:47] <sinzui> okay
[17:29] <deryck> adeuring, hey, any luck on those interrupt duties? (a friendly ping from one slacker to another. ;)
[17:30] <adeuring> deryck: sigh... got distracted again by working on a branch...
[17:52] <lifeless> deryck: hey
[17:59] <deryck> hi lifeless
[18:00] <lifeless> how is your schedule today?
[18:01] <deryck> lifeless, unfortunately, my wife ninja-scheduled me for the dentist.  and I need to leave a little early today.
[18:01] <deryck> she knows I'm a baby and need her to hold my hand.
[18:04] <lifeless> deryck: I take it that that is in a few minutes time? If it was say 40-50 minutes away we could do a quick call...
[18:05] <deryck> lifeless, no, it's actually a couple hours away still.  I'm just heads down trying to finish these hanging actions from TL call.
[18:05] <deryck> lifeless, I'd really love to chat, but I don't want to be yet another day working on these "investigations" either. :)
[18:06] <deryck> lifeless, how about tomorrow post TL call?
[18:06] <lifeless> IIRC I was going to give you a hand with one of them
[18:06] <deryck> lifeless, you gave me enough of a hand, I think.  I filed a bug this morning.
[18:06] <lifeless> uhm, I have no idea, let me check (the new time has thrown out my memoised schedule)
[18:06] <deryck> let me see bug number....
[18:06] <deryck> lifeless, bug 928327
[18:06] <_mup_> Bug #928327: codebrowse hangs due to exception/oops handling <loggerhead:Triaged> < https://launchpad.net/bugs/928327 >
[18:07] <deryck> lifeless, my guess/diagnosis could easily be wrong ^^ so I appreciate you looking at the bug.
[18:07] <lifeless> gary_poster: we have a parallel testing biweekly thing conflicting with the TL new time
[18:07] <lifeless> deryck: I have a slot *before* the tl meeting; after has my 1:1 with statik
[18:08] <deryck> lifeless, that works for me better actually.  forgot about the TL call time shift.
[18:09] <lifeless> deryck: why do you think oops is implicated ?
[18:10] <deryck> the hang seems to be in oops_middleware
[18:10] <lifeless> I don't follow - oops_middleware is in the call stack yes, but its a WSGI middleware, so it will always be so.
[18:11] <lifeless> thread 11 in https://pastebin.canonical.com/59603/ is in the middle of a global GC run
[18:12] <lifeless> but the other one has no GC in it, so either different cases, or not GC.
[18:12] <deryck> lifeless, so I saw the threads that seemed stuck in sock_sendall had stuff happening in httpexceptions and oops_middleware....
[18:12] <deryck> lifeless, so I just assumed something was hanging in dealing with an oops.
[18:13] <lifeless> so this, for instance:
[18:13] <lifeless> #6 0x00000000004fa67b in sock_sendall (s=0xa8e4ba0, args=<value optimized out>) from ../Modules/socketmodule.c
[18:13] <lifeless> #7 0x00000000004a7c5e in call_function () from ../Python/ceval.c
[18:13] <lifeless> /usr/lib/python2.6/socket.py (282): flush
[18:13] <lifeless> /usr/lib/python2.6/socket.py (292): write
[18:13] <lifeless> /srv/codebrowse.launchpad.net/production/launchpad2-rev-14640/eggs/Paste-1.7.2-py2.6.egg/paste/httpserver.py (123): wsgi_write_chunk
[18:13] <lifeless> /srv/codebrowse.launchpad.net/production/launchpad2-rev-14640/eggs/oops_wsgi-0.0.8-py2.6.egg/oops_wsgi/middleware.py (131): oops_write
[18:14] <lifeless> ?
[18:15] <lifeless> deryck: ^
[18:16] <deryck> lifeless, indeed.  that's what I meant.
[18:17] <lifeless> deryck: ok, so the way wsgi works means that every layer that offers facilities will /tend/ to have its own 'write' callable that is passed down.
[18:17] <gary_poster> lifeless, yeah I noticed
[18:17] <gary_poster> lifeless, I think TL wins ;-)
[18:17] <lifeless> oops_write is the callable passed from the oops middleware to the next deeper wsgi thing
[18:18] <lifeless> and wsgi_write_chunk is the callable that was returned by the paste http server
[18:18] <gary_poster> flacoste, lifeless, should we move parallel testing to 4PM Eastern, 21 UTC?
[18:18] <lifeless> http_exceptions etc
[18:18] <gary_poster> Wed still?
[18:18] <lifeless> gary_poster: is that 1 hour later or something?
[18:18] <deryck> lifeless, ah, ok.  Didn't realize that.
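The layering lifeless describes — each WSGI middleware handing its own write callable down the stack — can be sketched in a few lines; `oops_middleware` here mimics the shape of oops_wsgi's middleware, not its real code:

```python
def oops_middleware(app):
    # Each layer wraps start_response so it can interpose its own write
    # callable around the one the server (e.g. paste's wsgi_write_chunk)
    # returns from start_response.
    def middleware(environ, start_response):
        def wrapped_start_response(status, headers):
            underlying_write = start_response(status, headers)
            def oops_write(chunk):
                # A real middleware could record OOPS data here before
                # delegating to the deeper write callable.
                return underlying_write(chunk)
            return oops_write
        return app(environ, wrapped_start_response)
    return middleware

chunks = []

def fake_start_response(status, headers):
    # Stands in for the HTTP server; its write callable just buffers.
    return chunks.append

def app(environ, start_response):
    write = start_response('200 OK', [])
    write(b'hello')  # flows through oops_write, then the server's write
    return []

oops_middleware(app)({}, fake_start_response)
```

Seeing `oops_write` above `wsgi_write_chunk` in a backtrace just reflects this chain, not a hang inside OOPS handling.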
[18:18] <gary_poster> lifeless, right
[18:18] <lifeless> gary_poster: I have a call with statik then
[18:19] <gary_poster> lifeless, doesn't parallel testing take precedence? ;-)
[18:19] <gary_poster> lifeless, ok.  I'll look at schedules and make another proposal later
[18:19] <lifeless> gary_poster: I'm about a month out what with budapest, sickness, the QBR.
[18:19] <lifeless> gary_poster: if I hadn't missed that many 1:1's I'd say sure...
[18:19] <gary_poster> lifeless, sure. np
[18:19] <lifeless> but IME if you don't pin statik down with a nailgun .. :)
[18:19] <deryck> lifeless, so it seems those pastes are pretty useless then, if I understand that right.  except for knowing we're stuck in socket send.  or am I missing something?
[18:20] <lifeless> deryck: well, we don't know we're stuck in socket send
[18:20] <deryck> ok
[18:20] <lifeless> deryck: there are lots of threads, and some of them were writing content when the core was taken
[18:20] <lifeless> we don't know how long they had been there
[18:21] <lifeless> deryck: it *may* be that that is a smoking gun indicating e.g. network issues talking to haproxy or something
[18:21] <lifeless> or it may be totally irrelevant
[18:21] <deryck> ah, gotcha.
[18:21] <lifeless> deryck: lets go through in some detail tomorrow, for now I've gardened the bug to have just the definitive data
[18:21] <lifeless> gary_poster: +1
[18:22] <gary_poster> cool
[18:22] <deryck> lifeless, ok
[18:22] <lifeless> deryck: note that sock_sendall is a python module, so it may well get involved in or mangled by GIL issues, bad locking etc
[18:22] <lifeless> deryck: we may end up spelunking into C
[18:23] <lifeless> deryck: that said, we're missing line numbers
[18:23] <deryck> sounds fun :)
[18:23] <lifeless> deryck: what command did you use to get the traces ?
[18:23] <lifeless> deryck: and did you get missing symbol errors when you fired up gdb in the chroot ?
[18:23] <deryck> lifeless, used pygdb.  and no, I don't think so.  I can look again now.
[18:24] <lifeless> deryck: if you could, with regular gdb, uhm, 'thread apply all bt' and see if you get line numbers for the C frames
[18:24] <lifeless> if you don't, then we haven't got the debug environment right
[18:25] <deryck> lifeless, ah, yes, that is better.  line numbers indeed.
[18:25] <deryck> lifeless, could have sworn I did this and didn't get anything, and then tried pystack macros which hung.
[18:26] <deryck> lifeless, but maybe the regular bt attempt was when I was running locally still, and not in the right env.
[18:27] <lifeless> deryck: not to worry - you have line numbers now ;) - could you refresh the paste links in the bug ?
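The incantation above can also live in a gdb command file; everything here (paths, corefile name) is illustrative:

```
# cmds.gdb -- run inside the chroot so debug symbols and sources resolve:
#   gdb -batch -x cmds.gdb /usr/bin/python2.6 core.27332
thread apply all bt
```

If the C frames come back without file/line info, the debug environment (symbols, source via apt-get source) isn't set up right, as noted above.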
[18:31] <abentley> gmb: are you still ocr?
[18:35] <deryck> lifeless, done.
[18:36] <lifeless> deryck: in bug 928327 ? I still see the old numbers.
[18:36] <_mup_> Bug #928327: codebrowse hangs in production <loggerhead:Triaged> < https://launchpad.net/bugs/928327 >
[18:37] <lifeless> deryck: ahha
[18:37] <lifeless> deryck: mmm, cluster lag
[18:39] <lifeless> deryck: the trick for the source is apt-get source python2.6 in the chroot
[18:40] <lifeless> deryck: so we can see that in https://pastebin.canonical.com/59625/
[18:41] <lifeless> thread 3 is in a libc call -  n = send(s->sock_fd, buf, len, flags);
[18:42] <lifeless> thread 7 is in the same libc call - send()
[18:42] <lifeless> the content being written looks like fairly inane annotated pages
[18:43] <lifeless> mysql-5.1-wl820 in thread3
[18:43] <lifeless> same branch in thread 7
[18:44] <lifeless> totally different thing in thread 8 - ~dcplusplus-team/dcplusplus/dcpp-plugins/revision/3/win32/PluginPage.h
[18:44] <lifeless> and it renders pretty snappily
[18:45] <lifeless> thread 10 is in the python zlib module
[18:45] <lifeless> I've found race conditions / bugs in it before
[18:45] <lifeless> so little alerts are going off for me
[18:45] <lifeless> note that it is in PyEval_RestoreThread
[18:46] <lifeless> see (http://docs.python.org/c-api/init.html) - in short, this is a common place for hangs
[18:46] <lifeless> it means it will be trying to get the GIL
[18:47] <lifeless> now, looking down its frames, that is in knit extraction
[18:48] <lifeless> so this should be safe as long as loggerhead isn't sharing the same objects across threads
[18:48] <lifeless> (it may be safe if the objects are being shared, but its less of an automatic assumption)
[18:49] <flacoste> gary_poster: how about at the old TL call position?
[18:49] <lifeless> threads 14,13,12 10 are all waiting on the GIL
[18:50] <lifeless> (determined by taking the GIL lock which the call to RestoreThread identifies and searching for it)
[18:51] <deryck> lifeless, interesting.  I had to read back a few times, but I follow now. I feel +2 times smarter now. :)
[18:51] <lifeless> if you check the code for PyEval_RestoreThread you can see how I got the GIL lock just from the backtrace
[18:51] <lifeless> because the only lock it tries to get is the GIL
[18:52] <lifeless> thread 11 is doing a GC
[18:52] <lifeless> this means thread 11 holds the GIL
[18:52] <lifeless> the threads that are in sock_sendall have released the GIL
[18:52] <lifeless> (line 2723 in socketmodule.c is wrapped in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS)
[18:53] <lifeless> again - see http://docs.python.org/c-api/init.html - that means they have released the GIL
[18:53] <lifeless> so thats all the threads
[18:53] <lifeless> the other threads have noise in their stack
[18:53] <lifeless> I *suspect* they are killed threads by the paste thread killing code
[18:54] <lifeless> e.g. dead but not joined yet
[18:54] <lifeless> now, if the server has been attempted to shutdown
[18:54] <lifeless> but hasn't gone
[18:54] <lifeless> this would explain why there is no main thread visible
[18:55] <lifeless> (thread 1 shows
[18:55] <lifeless> Thread 1 (Thread 27332):
[18:55] <lifeless> #0  0x00002b34765e5ebc in ?? ()
[18:55] <lifeless> #1  0x0000000000000000 in ?? ()
[18:55] <lifeless> )
[18:55] <lifeless> deryck: ^ probably need to ask webops for the exact sequence of events leading up to the core to validate that theory
[18:55] <lifeless> if we can validate the theory then we can make an interesting observation
[18:56] <lifeless> which is that the listen event loop *has* shutdown properly; what is missing is cleanup of these other threads
[18:56] <lifeless> which cannot happen until garbage collection completes
[18:56] <deryck> lifeless, right
[18:56] <lifeless> well, not properly :P
[18:57] <lifeless> there are 2 threads in sock operations, 4 threads waiting for the gil (and apparently fine otherwise) and 1 in gc with nothing sensible higher up its stack
[18:58] <lifeless> thats a total of 7, but we'd expect 10 worker threads IIRC, plus mainloop
[18:58] <lifeless> so I strongly suspect a SIGINT or something already sent
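The GIL choreography worked through above — blocking C calls such as send() release the GIL, and PyEval_RestoreThread reacquires it — can be felt from pure Python, since time.sleep blocks in C the same way; this is an illustrative sketch, not loggerhead code:

```python
import threading
import time

progress = []

def worker():
    # Pure-Python work interleaved with a blocking C-level call.
    for i in range(5):
        progress.append(i)
        time.sleep(0.01)  # releases the GIL while blocked, like send()

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)  # the main thread blocking doesn't stop the worker
t.join()
```

If blocked threads held the GIL, the worker could make no progress while the main thread slept; instead `progress` fills in completely, which is why threads sitting in sock_sendall don't by themselves explain a hang.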
[18:59] <lifeless> now, lets peek at the other core
[19:00] <lifeless> thread 4 is in sendall
[19:00] <lifeless> as is 9, same spot as they have all been
[19:00] <lifeless> thread 12 is taking a threading lock
[19:00] <lifeless> lets see
[19:00] <gary_poster> flacoste, sorry, just saw this.  The old team lead call time is fine with me, but I thought lifeless would prefer not to meet that early.  The later we make it, the less likely Europeans from my team can attend, and the easier it is for lifeless, AFAIK.
[19:01] <lifeless> gary_poster: I can attend at that time, but we'll have to boot deryck :)
[19:01] <lifeless> gary_poster: who claimed that spot like a flash, before
[19:01] <gary_poster> lifeless, heh
[19:01] <gary_poster> um
[19:02] <gary_poster> ok, I'll go look at the calendar...
[19:02] <lifeless> deryck: you'd be ok ~ this time on your thursday ?
[19:02] <deryck> I'm more Batman than Flash.  are we talking about me? :)
[19:02] <gary_poster> heh
[19:02] <deryck> lifeless, sure.
[19:02] <lifeless> ok, so deryck if you move your one +24 hours or so, and gary_poster you can move the paralleltests one an hour earlier.
[19:03] <deryck> lifeless, done!
[19:03] <gary_poster> thank you lifeless & deryck
[19:03] <lifeless> deryck: thread 12 looks like its the implementation for a threaded queue or something (haven't checked the .py source yet)
[19:04] <gary_poster> flacoste, done on calendar
[19:04] <lifeless> deryck: so its not going to be the gil that its waiting on - in fact it has just released the gil (see threadmodule.c line 46)
[19:05] <lifeless> deryck: it looks like it is waiting for another request to service, judging from the call stack
[19:06] <lifeless> thread 13 is running some code *I think*
[19:06] <lifeless> the python integration blew up though (frame 0 is the fail)
[19:06] <lifeless> so we need to check the source to see if it holds the GIL
[19:07] <lifeless> and indeed, line 943 is in the middle of the big opcode jump case statement
[19:07] <lifeless> so, thread 13 holds the gil
[19:08] <lifeless> and is processing a knit repository, which the other inventory access in the other core was doing as well
[19:08] <lifeless>  /~starbuggers/sakila-server/mysql-5.1-wl820/view/head:/plugin/java_udf/java_context_test.cc is the file
[19:10] <abentley> deryck, rick_h: Interrupt duties done in less than an hour.  Went down the whole list.
[19:10] <lifeless>  /~starbuggers/sakila-server/mysql-5.1-wl820/view/head:/plugin/java_udf/grokjni.pl was the inventory content the other core was doing
[19:10] <deryck> abentley, nice!  I'll look forward to mine in an hour then. :)
[19:11] <rick_h> abentley: rocking
[19:11] <rick_h> deryck: coat tail rider :P
[19:11] <lifeless> so, weak correlation there
[19:11] <deryck> rick_h, now you've finally figured me out.  oh no, my secret is exposed! :)
[19:12] <lifeless> thread 14 is another waiting-for-a-request
[19:12] <lifeless> as is 15,16,17
[19:12] <deryck> lifeless, ah, so dealing with the same objects in different threads.  did I understand that right?
[19:12] <lifeless> 18
[19:12] <lifeless> deryck: no, no indication of that yet; was noting that the same branch is being accessed from each core
[19:12] <lifeless> so there may be something to do with that content
[19:12] <deryck> lifeless, ok
[19:13] <lifeless> its also a 'knit' format branch which is bzr < 1.0's native format
[19:13] <lifeless> I think, or something in that general area
[19:13] <lifeless> 18 is waiting for a request
[19:13] <lifeless> 19 and 20 too
[19:14] <lifeless> 21 is waiting for the GIL
[19:14] <lifeless> and its the actual mainloop - note the serve_forever () and the PyMain at the top of the stack
[19:14] <lifeless> Py_Main I mean
[19:15] <lifeless> deryck: I think https://pastebin.canonical.com/59626/ has two different bt's in it, its a little confusing
[19:15] <lifeless> yeah, definitely does
[19:17] <lifeless> my info is ok, because I started at the bottom which was indeed the other set of bt's
[19:17] <lifeless> anyhow, what does this mean
[19:18] <lifeless> 12,14,15,16,17,18,19,20 are workers waiting for a request, 21 is the mainloop, 13 is doing work - thats 9 waiting, one mainloop and one worker working
[19:18] <lifeless> so this second core looks totally healthy and unstuck
[19:20] <lifeless> threads 9 and 4 are a little worrying - that send() behaviour
[19:20] <lifeless> but they don't hold the GIL
[19:21] <deryck> did I cut-n-paste wrong or something, to get different bt's?  I thought I just scp'ed gdb and pasted straight as is.
[19:21] <deryck> gdb.txt, I meant.
[19:21] <lifeless> there is nothing, assuming thread 13 would come alive again, stopping the healthy workers from serving more requests
[19:21] <lifeless> deryck: https://pastebin.canonical.com/59626/ and https://pastebin.canonical.com/59625/
[19:21] <lifeless> deryck: compare the first four lines
[19:21] <lifeless> deryck: and then the bottom four lines
[19:22] <lifeless> the bottom four lines of 59625 appear in the middleish of 59626
[19:23] <lifeless> deryck: so, the core with happy workers has only one real issue and thats a busy thread; its possible that that isn't releasing the GIL for some reason, but just regular bzrlib code *should* give other threads timeslices
[19:23] <lifeless> deryck: were both cores taken from hung loggerheads? How was hung determined ?
[19:24] <deryck> lifeless, that's a webops question.  not sure.  I can ask them.
[19:24] <lifeless> deryck: the mysql urls in question both render near-instantly for me
[19:25] <lifeless> http://bazaar.launchpad.net/~starbuggers/sakila-server/mysql-5.1-wl820/view/head:/plugin/java_udf/grokjni.pl and http://bazaar.launchpad.net/~starbuggers/sakila-server/mysql-5.1-wl820/view/head:/plugin/java_udf/java_context_test.cc
[19:26] <lifeless> deryck: so, you may want to copy some of this to the bug; the bad news is I see no reason for the second process to appear hung, and the first process appears to have had its mainloop killed (e.g. via the OOM killer, manual SIGINT, whatever) and that *will* stop it serving.
[19:26] <lifeless> deryck: we now need to track down more data around the state of both of the cores, to see if we can infer anything else.
[19:27] <lifeless> deryck: I hope this has helped!
[19:27] <deryck> lifeless, I really don't mind copying this to the bug.  but it's a lot of text.  Would it be better for you to just summarize this briefly there?
[19:27] <deryck> just so I don't mis-represent.
[19:28] <lifeless> one core has damaged (I suspect killed but not joined()) threads including a missing mainloop. The missing mainloop would on its own make it appear dead to haproxy.
[19:29] <lifeless> It is in gc in another thread; one possible theory is it got too big memory wise and what we are looking at is damaged fallout from some attempt to recover it
[19:29] <lifeless> the other core appears entirely healthy except for the oddness that stuff is stuck in send(); but that is normal if the OS buffer is full, which will happen if the internets are not brilliantly happy (because buffering affects the entire chain)
[19:31] <lifeless> so we need to know for the first one, as much as we can about how it got to that state - were any sysadmin interventions applied first? (if so, the core doesn't represent the failure, it represents the failure + mangling)
[19:32] <lifeless> for the second, we need to know the symptoms that were being reported
[19:32] <lifeless> deryck: I suggest putting the transcript in an attachment for folk wanting to check the workings
[19:37] <deryck> lifeless, done
[19:41] <lifeless> cool
[19:41] <lifeless> and now, breakfast.
[20:06] <barry> sinzui: is there anything we can do to fix private mailing list archive access? :(
[20:07] <lifeless> barry: isd have a fix
[20:07] <lifeless> barry: it is 'in deployment'
[20:07] <barry> lifeless: excellent, thanks
[20:08] <sinzui> lifeless, since when.
[20:10] <sinzui> barry, lifeless bug 663923 gives no indication there is a fix available
[20:10] <_mup_> Bug #663923: Cannot view list archive of private team <escalated> <mailing-lists> <ml-archive-sucks> <regression> <Apache OpenID:In Progress by mars> <Launchpad itself:In Progress by mars> < https://launchpad.net/bugs/663923 >
[20:11]  * barry subscribes
[20:11] <sinzui> I still believe grackle will be deployed and that bug will be fixed
[20:13] <barry> sinzui: what's grackle?
[20:13] <sinzui> barry, the archiver we are writing
[20:14] <barry> ah, right.  do you mean once grackle is deployed, you won't need the openid dance?
[20:14] <sinzui> correct
[20:15] <barry> cool
[20:15] <barry> that'll be nice
[20:15] <barry> heck,  i might even switch to grackle in mm3
[20:17] <sinzui> barry, possibly. I think Cassandra should be a choice rather than a requirement. We can written an almost complete memory store implementation that could be subclassed to implement a sql or simple mbox implementation
[20:17] <sinzui> s/We can written/We HAVE written/
[20:19] <barry> nice.  what's the status of it?  is code available?  is it functional yet?
[20:23] <abentley> barry: We started work on it at the Thunderdome, but I haven't been involved since.
[20:23] <barry> where are the branches? :)
[20:23] <abentley> barry: lp:grackle
[20:24]  * barry branches it for later
[20:25] <sinzui> barry, I need one more day to complete the client. We can then complete the server in a few days
[20:26] <sinzui> barry, all the code is in trunk https://code.launchpad.net/grackle
[20:26] <lifeless> sinzui: since the ISD weekly report
[20:27] <barry> thanks.  i will definitely keep my eye on it
[20:27] <mars> sinzui, it is also in my team's goals for Q4
[20:28] <sinzui> mars, thanks
[20:28] <lifeless> sinzui: are you on isd-announce?
[20:28] <sinzui> no
[20:37] <lifeless> sinzui: I'm not sure how to get you on it; but it does have a weekly summary of what ISD are up to that may be informative
[20:38] <sinzui> lifeless, I do not need to be more involved. This issue will be closed soon
[20:47] <lifeless> sinzui: bug 928391
[20:47] <_mup_> Bug #928391: ProgrammingError creating new team <oops> <Launchpad itself:Triaged> < https://launchpad.net/bugs/928391 >
[20:47] <lifeless> sinzui: I think that that might be something your squad knows aboot
[20:48] <sinzui> lifeless, I learned about it an hour ago.
[20:48] <sinzui> My team will fix it
[20:53] <lifeless> kk
[21:02] <deryck> dentist time, yuck
[21:12] <abentley> rick_h: have you closed bug #294656?
[21:12] <_mup_> Bug #294656: Every page requests two JavaScript libraries (remove MochiKit) <javascript> <lp-bugs> <lp-translations> <lp-web> <tech-debt> <Launchpad itself:Triaged> < https://launchpad.net/bugs/294656 >
[21:13] <rick_h> abentley: ah, sorry. Guess that never got linked to the branch. Yea, mochi is done and gone
[22:27] <wallyworld_> sinzui: https://pastebin.canonical.com/59655/
[22:31] <wgrant> fuuuu
[22:31] <wgrant> <unprintable Unauthorized object>
[22:31] <wgrant> From the isd team creation forbidden
[22:32] <jelmer> g'morning wallyworld_, wgrant
[22:33] <wallyworld_> jelmer: g'day
[22:37] <sinzui> wallyworld_, are you running tip? I see what looks like a fix: https://code.launchpad.net/~bzr-pqm-devel/bzr-pqm/devel
[22:38] <wallyworld_> sinzui: no, just whatever a default precise install provides. i'll try tip, thanks
[22:38] <wgrant> Morning jelmer.
[22:39] <wgrant> StevenK: Bug #928440
[22:39] <_mup_> Bug #928440: When attempting to create a new team, I'm told I am "Not allowed here" <fallout> <regression> <Launchpad itself:Triaged> < https://launchpad.net/bugs/928440 >
[22:39] <wgrant> See my comment
[22:42] <sinzui> wallyworld_, or revert to -r 80.
[22:43] <wallyworld_> sinzui: tip still breaks, so will try that rev
[22:47] <sinzui> wallyworld_, the branch is in ec2 now
[22:47] <wallyworld_> sinzui: and rev 80 breaks too. i need to see where RemoteBranch lives
[22:47] <wallyworld_> thanks for landing
[22:48] <wallyworld_> sinzui: the issue is the version of bzrlib
[22:48] <sinzui> wallyworld_, yes, I think I am using the system lib
[22:49] <wallyworld_> sinzui: makes sense. i am using the one from lp-sourcedeps
[23:16] <wgrant> lifeless: Shall I start landing my heat incineration branches?
[23:27] <lifeless> wgrant: yes
[23:27] <lifeless> no one has flamed us AFAICT
[23:27]  * thumper flames lifeless
[23:27] <wgrant> That was my thinking
[23:27]  * thumper flames wgrant and wallyworld for good measure
[23:27] <wgrant> Uhoh
[23:28]  * thumper leaves again
[23:28] <wallyworld> thumper: what have i done this time?
[23:28] <thumper> wallyworld: I'm sure you know...
[23:28] <wallyworld> thumper: well, it could be one or soooo many things
[23:28] <wallyworld> s/or/of
[23:29] <wgrant> lifeless: Does this count as removing complexity to offset disclosure? :P
[23:29] <lifeless> wgrant: I can see we're going to have fun with that
[23:29] <lifeless> disclosure is offsetting user ticket complexity and performance suckiness too
[23:29] <wgrant> It is
[23:30] <wgrant> I am trying to respect the 5s rule with bug searches.
[23:30] <wgrant> As much as I can.
[23:30] <lifeless> anyhow
[23:30] <lifeless> heat was signed off on by stakeholders including Ubuntu, for the changes effective Monday
[23:31] <lifeless> I see no reason to wait an extended period
[23:31] <wgrant> Sure.
[23:31] <wgrant> The stakeholders aren't the only stakeholders, but indeed the outcry seems to be nonexistent.
[23:31] <lifeless> wgrant: btw
[23:31] <wgrant> Which is as I expected.
[23:31] <lifeless> wgrant: you can't delete the garbo job straight away
[23:31] <wgrant> Oh?
[23:32] <wgrant> (he says, as he Ctrl+Cs the lp-landing of the garbo job removal)
[23:32] <lifeless> exercise for the reader. You will facepalm.
[23:32] <lifeless> tell me if you timeout ;)
[23:33] <lifeless> rick_h: is bug 928500 your work?
[23:33] <_mup_> Bug #928500: 'Series and Milestones' graph not loading - LPJS is not defined <graph> <javascript> <latency> <loading> <lpjs> <milestone> <series> <Launchpad itself:New> < https://launchpad.net/bugs/928500 >
[23:33] <wgrant> lifeless: I don't see the issue.
[23:33] <lifeless> wgrant: we are changing the rule for heat calculation to not include age.
[23:34] <lifeless> wgrant: what process will we use to update bugs that *are not changed* to use the new rule ?
[23:34] <wgrant> lifeless: I decided that we don't care enough.
[23:34] <wgrant> Do we?
[23:34] <lifeless> Well, if we can point any-and-all 'wtf' bug reports to you, sure.
[23:35] <lifeless> I think it's pretty cheap to let the garbo do one full scan post-heat-calculation-change, and it ensures that it is all consistent
[23:35] <wgrant> Then I'll mark the bug as affecting and then not-affecting me, and then say "wtf" back because the value is correct :)
[23:35] <wgrant> But true.
[23:36] <wgrant> So, I guess I'll put the DB patch in a separate pipe and do that first.
[23:37] <lifeless> \o/
[23:37] <wgrant> lifeless: Hm,
[23:37] <wgrant> lifeless: Except that the updater never completes.
[23:37] <wgrant> Bug #906193
[23:37] <lifeless> wgrant: I'm pretty sure it is incremental
[23:37] <wgrant> Probably better to do a one-off
[23:37] <_mup_> Bug #906193: BugHeatUpdater never completes <Launchpad itself:In Progress by wgrant> < https://launchpad.net/bugs/906193 >
[23:37] <wgrant> It's not
[23:37] <wgrant> Oh
[23:37] <wgrant> I guess it is
[23:37] <lifeless> the warning was bogus, last I looked at it
[23:37] <lifeless> it doesn't do a full scan in 1 hour
[23:38] <wgrant> Yeah, true.
[23:38] <wgrant> It probably never catches up, though.
[23:38] <wgrant> Anyway, will land the DB patch without the garbo dropping.
[23:38] <lifeless> let me check ze code
[23:39] <lifeless> wgrant: so yeah -
[23:39] <lifeless>     def _outdated_bugs(self):
[23:39] <lifeless>         outdated_bugs = getUtility(IBugSet).getBugsWithOutdatedHeat(
[23:39] <lifeless>             self.max_heat_age)
[23:39] <wgrant> But it seems to never finish.
[23:39] <wgrant> Which means it is behind.
[23:39] <wgrant> It's incremental, but probably never catches up.
[23:40] <lifeless> your new function should be cheaper
[23:40] <wgrant> I suspect it's better just to do a one-off four-line script to update everything.
[23:40] <wgrant> It is about 5 times cheaper, true.
[23:40] <lifeless> I don't have an opinion; script is fine if thats what you think is best
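The one-off script wgrant proposes would just walk the whole Bug table in id order and rewrite heat in short transactions. This sketch uses sqlite3 and an invented heat formula as stand-ins; the real script would go through Launchpad's stores and its actual heat calculation (which, per the change being discussed, no longer includes an age-decay term).

```python
import sqlite3

def calculate_heat(affected, dupes, subs):
    # Stand-in formula: with age decay removed, heat is a pure function of
    # the bug's counters, so a single full pass makes everything consistent.
    return 4 * affected + 6 * dupes + 2 * subs

def update_all_heat(conn, batch_size=1000):
    """Rewrite heat for every bug, batching by id to keep transactions short."""
    last_id = -1
    while True:
        # Keyset pagination: resume after the last id seen, never re-reading rows.
        rows = conn.execute(
            "SELECT id, affected, dupes, subs FROM bug WHERE id > ? "
            "ORDER BY id LIMIT ?", (last_id, batch_size)).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE bug SET heat = ? WHERE id = ?",
            [(calculate_heat(a, d, s), bug_id) for bug_id, a, d, s in rows])
        conn.commit()  # commit per batch; the table is large
        last_id = rows[-1][0]

# Tiny demonstration database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug (id INTEGER PRIMARY KEY, "
    "affected INT, dupes INT, subs INT, heat INT)")
conn.executemany(
    "INSERT INTO bug VALUES (?, ?, ?, ?, 0)",
    [(i, i % 3, i % 2, i % 5) for i in range(10)])
update_all_heat(conn, batch_size=4)
```

Unlike the hourly garbo job, this visits each row exactly once and then exits, so it cannot fall behind.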
[23:40] <lifeless> my reading of the code is that the heat updater will, on each run, do the lowest-id N older-than-X bugs
[23:41] <lifeless> this could always be behind but still hit everything
[23:41] <wgrant> It does, yes.
[23:41] <wgrant> Oh?
[23:41] <wgrant> What if the first hundred thousand bugs get updated regularly?
[23:41] <wgrant> The top 800000 never will
[23:41] <lifeless> so let's say it takes Y days to become stale
[23:42] <lifeless> mm, rephrase
[23:42] <lifeless> runs hourly
[23:42] <lifeless> in one hour, it does N bugs
[23:42] <lifeless> if the number of bugs *becoming stale* per hour is greater than N
[23:42] <wgrant> I wouldn't expect it to be, but it looks like isDone is never hit.
[23:42] <wgrant> Which suggests that it is.
[23:43] <lifeless> then after X days, it will have to do the first N again, and you'll have a loop over 24*Y*N bugs
[23:43] <lifeless> any bug updated for other reasons within that Y period will perturb the loop and get other bugs updated.
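lifeless's argument, that a "lowest-id N stale bugs per run" loop can be permanently behind yet still eventually touch every bug, can be checked with a toy model. All numbers here are invented; the real updater works on the Bug table with getBugsWithOutdatedHeat.

```python
# Toy model of the BugHeatUpdater's selection rule: each hourly run updates
# the N lowest-id bugs whose heat is older than STALE_AGE hours. Once a bug
# is refreshed it leaves the stale set, so the window of low ids marches
# forward instead of starving the high ids.
BUGS = 1000       # total bugs (invented)
N = 50            # bugs processed per hourly run (invented)
STALE_AGE = 24    # hours until heat is considered outdated (invented)

last_update = {bug_id: 0 for bug_id in range(BUGS)}
touched = set()

for hour in range(1, 2000):
    # Lowest-id N bugs whose heat is stale, mirroring the garbo query.
    stale = sorted(b for b, t in last_update.items() if hour - t >= STALE_AGE)
    for bug_id in stale[:N]:
        last_update[bug_id] = hour   # heat recalculated this run
        touched.add(bug_id)
    if len(touched) == BUGS:
        break

print(f"all {BUGS} bugs refreshed at least once by hour {hour}")
```

In this uniform model every bug does get visited. wgrant's worry applies when the stale-arrival rate among low ids alone exceeds N, in which case the cursor never advances past them, which is exactly why a one-off full pass is the safer companion to the rule change.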
[23:43] <lifeless> anyhoo; shrug. Like I say, choose the best use of your time w/curtis, and have fun.