=== Ursinha-bbl is now known as Ursinha
[00:07] * poolie might try to finish my dkim branch
[00:08] oh, poo, codehosting i guess is entirely off line
[00:15] yup
[00:15] * mtaylor once again annoyingly renews his objection to making launchpad completely unusable for 2 hours in the middle of the afternoon for the US west coast in the middle of the week
[00:17] i know, it's crazy
[00:18] also, i don't see any good reason why you shouldn't be able to at least read branches
[00:18] and indeed why not write to them, because they're not stored in the db
[00:18] but, it's getting better
[00:18] i have trust in lifeless and co
[00:32] so today we had a db config issue on the readonly slave; another case where schema problems have bitten us.
[00:32] (the way we do schema changes, I mean)
[00:38] lifeless, what server does PQM run on?
[00:38] prae
[00:39] pqm runs outside a chroot, but code from our branches runs inside the chroot
[00:39] ah, darnit
[00:39] ?
[00:40] darnit that we are not there yet.
[00:40] with Python 2.6
[00:40] right
[00:40] there may be other gotchas
[00:40] on prae? yes, maybe
[00:40] the simplest thing IME is to stay compatible until we have *every possible case* covered.
[00:40] no shortcuts
[00:41] well, that is going to be what happens now :)
[00:59] thumper, rockstar, stub, mwhudson, stevek, lifeless, wgrant -- Review Meeting starting soon
[01:00] bac: thumper sends his apologies, but i'll lurk if that's ok
[01:00] wallyworld__: that's great.
[01:02] lifeless: ping
=== Chex changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 4 of 10.10 | PQM is Release-Critical; devel is closed (Release manager: EdwinGrubbs) | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
[01:06] Edwin-afk: so when does devel open again? :-)
[01:07] bac: hi
[01:08] lifeless: join us in #launchpad-meeting?
[02:23] StevenK: An Ohloh reimport is in process, BTW.
[02:23] https://www.ohloh.net/p/launchpad/enlistments
[02:24] Huzzah
[02:30] mars: Do you have a few seconds for me to bend your ear?
[02:31] StevenK, sure
[02:32] mars: I keep seeing failures such as https://hudson.wedontsleep.org/job/db-devel/lastFailedBuild/testReport/junit/lp.codehosting.puller.tests.test_worker/TestWorkerProgressReporting/test_network/ keep appearing in hudson, do you have any clues?
[02:32] I think the same failures happen on ec2 too
[02:33] looking
[02:33] wgrant, this project does amazing things for one's Ohloh Kudos rank :)
[02:33] mars: If that small snippet isn't helpful, there's a full console output link on the left
[02:34] StevenK, unfortunately that tells me enough. This is Benji's error from ec2 earlier today: https://pastebin.canonical.com/38615/
[02:35] /with/ a patch I had to try and fix it
[02:35] what's funny is the thread ID is the same
[02:35] Always the 18th thread started
[02:36] mars: It does!
[02:36] It looks like this import might only take a few days.
[02:37] StevenK, I am almost to the point of desperate measures to solve this. Two thoughts: in the tearDown, enumerate all running threads, .join(3.0) on them. Give them time to halt.
[02:37] mars: There are two others linked from https://hudson.wedontsleep.org/job/db-devel/lastFailedBuild/testReport/junit/ as well
[02:37] StevenK, or run something yucky like a custom tracer via threading.settrace(), then pick out whatever the heck it is that is hanging around
[02:38] StevenK, I think it might be an intermittent race condition between the zope testrunner and our test infrastructure. I solved this once before (and forget how once again)
[02:40] StevenK, this work will probably become a priority. When it does, you will have a fix for it.
[02:40] mars: Excellent. My only other concern is why does it impact hudson and ec2, but not buildbot?
[02:41] StevenK, my theory: BB servers are faster, affecting the race
[02:42] That could be why many of us can't reproduce it locally
[02:43] yes
[02:43] mars: So it may even end up being a race in zope's testrunner, and we just happen to tickle it?
[02:44] maybe. The testrunner does no thread cleanup
[02:46] mars: That sounds like a zope fail to me
[02:46] When's devel likely to reopen?
[02:46] wgrant: It was going to be in an hour, but now it's 3.
[02:46] More seriously, RSN
[02:47] StevenK, got to run - fwiw, the testrunner could die horribly when doing a thread.join() - unjoined threads are test garbage and break isolation
[02:47] best leave their cleanup to the test itself
[02:47] same as leaking memory garbage
[02:47] later
=== thumper changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 4 of 10.10 | PQM is open | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting
[03:19] mars: StevenK: do we know what threads they are?
[03:19] https://code.edge.launchpad.net/~wgrant/launchpad/bug-655648-a-f-maverick/+merge/37820, https://code.edge.launchpad.net/~wgrant/launchpad/bug-629921-packages-empty-filter <-- can someone please land those?
[03:21] lifeless: Personally, I have no idea
[03:21] energy invested in making that automatically determined would be of great evalue
[03:22] or we could just disable the check, though I know thats not terribly popular an idea.
[03:25] lifeless: Hudson is showing, time and again, that there is a number of them that fail the same way
[03:25] Ursinha: I may be confused
[03:25] Ursinha: but I thought there was a report of oopses-received-today
[03:26] lifeless, it's the lpnet-oops.html
[03:26] same for edge and staging
[03:26] Ursinha: how often does it update?
[03:26] lifeless, hourly, I guess
[03:26] let me check
[03:26] last updated at 11pm utc, not updated since.
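The tearDown idea floated at 02:37 (enumerate the running threads, `.join(3.0)` each one) and the later point that cleanup belongs to the test itself can be sketched roughly as below. This is only an illustration, not Launchpad's test infrastructure: `join_stray_threads` and its `baseline` argument are hypothetical names, and the sketch assumes a snapshot of the live threads was taken before the test started.

```python
import threading


def join_stray_threads(baseline, timeout=3.0):
    """Give threads started since `baseline` a chance to finish.

    `baseline` is a set of Thread objects that were alive before the
    test ran (hypothetical convention). Returns the threads that are
    still alive after each has been joined for up to `timeout` seconds.
    """
    current = threading.current_thread()
    leaked = []
    for thread in threading.enumerate():
        # Never join ourselves, and ignore threads that predate the test.
        if thread is current or thread in baseline:
            continue
        thread.join(timeout)
        if thread.is_alive():
            leaked.append(thread)
    return leaked
```

A test could take `baseline = set(threading.enumerate())` in setUp and fail in tearDown when the returned list is non-empty, which keeps the cleanup in the test rather than in the test runner.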
[03:30] lifeless: Hints on where to look would be awesome
[03:30] StevenK: have the teardown that bitches introspect the thread objects
[03:30] theres a few attributes that may be useful like name
[03:31] but also whether its dameonised and if accessible the start function would be good. oh and the class, though I think we're seeing that already.
[03:32] does anyone know if there's a bug report i can associate https://code.edge.launchpad.net/~jameinel/launchpad/lp-service/+merge/37531 with?
[03:32] mwhudson: yes
[03:32] hmm, there was.
[03:32] but hell, file a new one.
[03:37] lifeless: ok, let me know the number when you have it?
[03:37] or link it, either works
[03:39] lifeless: Using threading.enumerate(): http://paste.ubuntu.com/512834/
[03:40] lifeless, we don't know what threads they are. That is why I am thinking of using the threading.settrace() method to find out. My research says threads are a black box otherwise
[03:40] unless you use the thread name well
[03:41] mars: ^
[03:42] yes, not the most helpful thread names :)
[03:42] StevenK, didnt' think of using the imp module - would that work?
[03:43] err, inspect, not imp
[03:43] StevenK, for each thread, can you see the .__class__ method?
[03:43] attribute
[03:44] bad typing tonight
[03:45] mwhudson: I oculdn't find it; want to file one ?
[03:46] lifeless: ok
[03:46] mars: not a black box
[03:47] >>> t = threading.Thread(target=lambda:None)
[03:47] >>> t._Thread__target
[03:47] <function <lambda> at 0x7f75dfdd92a8>
[03:47] t.daemon
[03:47] StevenK: print those two things as well please
[03:47] (t.name, t.daemon, t._Thread__target)
[03:51] lifeless, ah, you are accessing a private object member - clever
[03:52] >>> t = threading.Thread(target=lambda:None)
[03:52] >>> dir(t)
[03:52] and look
[03:52] we may also want _Thread__args
[03:52] but thats more likely to throw up in our face, I think.
[03:54] um
[03:54] is person search on launchpad completely horked right now?
[03:54] yes
[03:54] the huge vocab bug
[03:54] 8.4 regression
[03:54] ah ok
[03:54] once Ursinha gets back to me about lpnet-oops being stale I will raise the timeout for it via flags.
[03:56] its probably validpersoncache
[03:56] but also we've got this bizarre thing where staging is fine and prod sucks
[03:56] which reminds me, its bug fiuling time on that
[04:00] lifeless, so.. it seems oops-tools can't find any oopses
[04:00] aarrrgghh
[04:01] I'm checking devapd
[04:01] Ursinha: are there any on disk on sodium?
[04:01] devpad
[04:01] that's what I'm checking
[04:01] kk, great minds :)
[04:03] lifeless, mars: Sorry, was afk: http://paste.ubuntu.com/512844/
[04:03] lifeless, hm, I see oopses there
[04:03] lifeless, /srv/launchpad.net-logs/lpnet/gandwana/2010-10-14
[04:03] I see a bunch
[04:04] for instance
[04:04] 340
[04:04] Ursinha: ok, so oopstools is broken ?
[04:05] StevenK: so, that tells use that that one has a bzr server
[04:05] lifeless, investigating what's happening..
[04:05] StevenK: two threads; making the error print this extra info would be useful :)
[04:05] lifeless: Which I'm guessing it started and didn't tear down?
[04:05] StevenK: theres a few possibilities
[04:06] oh god
[04:06] StevenK: process_request_thread may mean that there is a half closed socket, for instance.
[04:06] mwhudson: been drinking ? :)
[04:06] i think this is because bzrlib's test http server implementation doesn't join() its thread
[04:06] lifeless: i wish
[04:06] lifeless, are you the one that's viewing src/oopstools/oops/dboopsloader.py?
[04:06] mwhudson: invoking the 'oh god' :)
[04:07] Ursinha: no
[04:07] hm
[04:07] mwhudson: Does that make it a bzrlib bug, then?
[04:07] eggs/bzr-2.2.0-py2.6-linux-x86_64.egg/bzrlib/tests/http_server.py:597 and thereabouts
[04:08] StevenK: 'difference of opinion' vs bug
[04:08] that said, i thought we joined that thread in launchpad, so maybe it's something else
[04:08] Heh, I see that, reading the comment.
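The request to have the failing teardown print `(t.name, t.daemon, t._Thread__target)` for each leftover thread could look something like the sketch below. The helper names `describe_thread` and `describe_live_threads` are made up for illustration. The log is from the Python 2.6 era, where the name-mangled private attribute is `_Thread__target`; on Python 3 the equivalent private attribute is `_target`, and it is removed once the thread's `run()` finishes, so the sketch tries both and tolerates neither being present.

```python
import threading


def describe_thread(thread):
    """Return the details asked for in the log for one thread.

    Reads private attributes, so this is a debugging aid only:
    `_Thread__target` on Python 2, `_target` on Python 3.
    """
    target = getattr(thread, "_Thread__target", None)
    if target is None:
        target = getattr(thread, "_target", None)
    return {
        "name": thread.name,
        "daemon": thread.daemon,
        "class": type(thread).__name__,
        # Prefer the function name; fall back to repr for odd callables.
        "target": getattr(target, "__name__", repr(target)),
    }


def describe_live_threads():
    """Describe every thread currently reported by threading.enumerate()."""
    return [describe_thread(t) for t in threading.enumerate()]
```

Including the output of `describe_live_threads()` in the leak error message would show at a glance whether a leftover thread belongs to something like a test HTTP server or a request handler.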
[04:08] that's not entirely unrelated
[04:08] lifeless, I see a problem here in the oops loader, have to find out how to break the lock file
[04:08] Ursinha: ok
[04:08] Ursinha: I wait with bated breath
[04:08] StevenK: you could try putting a join() in there and seeing what happens
[04:09] mwhudson, this argues for a test teardown .join() call then
[04:09] /win 2
[04:10] mars: not really
[04:11] lifeless, why not?
[04:11] mars: it argues that our test code looking for thread leaks is wrong
[04:11] mars: / unnecessarily strict
[04:11] mars: looking over the life of the whole test run should be sufficient, for instance (which is what bzrlib does, more or less)
[04:12] that said, there's no reason not to join there, that comment is *almost certainly* premature optimisation.
[04:12] lifeless, sorry, you lost me - no reason not to join where?
[04:12] Can I encourage 'you' to put a patch forward to bzr to fix this.
[04:12] mars: stop_server(self)
[04:14] Adding self._http_thread.join() to bzrlib has no effect on the output
[04:14] lifeless, oops-tools are mostly matsubara-afk 's domain, I'm not sure what to do there without possibly breaking something
[04:15] lifeless: ^
[04:16] hmm
[04:16] I guess I solved :)
[04:16] update_infestation is running, let's see if oops-tools loads all oopses now
[04:17] StevenK: we're not sure that function is being called, are we?
[04:18] hmm, stop_server is being called
[04:18] and a gc.collect() is being called
[04:18] at least, according to the test
[04:19] Indeed
[04:19] Ursinha: awesome, how long should I wait before trying
[04:20] lifeless, no idea, I'll let you know as soon as it finishes, but shouldn't take long
[04:21] lifeless, hm, something is really wrong, script output was "no infestation updated"
[04:21] it's not recognizing the oopses
[04:22] hmm
[04:23] lifeless, the line was commented out the crontab...
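Two ideas from this stretch, a `stop_server` that joins its own thread and a leak check that compares threads over the whole run rather than per test, are sketched below. `BackgroundServer` and `leaked_since` are hypothetical stand-ins, not bzrlib or Launchpad code, and in the log adding `self._http_thread.join()` to bzrlib did not change the output, so this shows the shape of the idea rather than a fix.

```python
import threading


class BackgroundServer(object):
    """Hypothetical test server that does its work in a helper thread."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = None

    def start_server(self):
        self._thread = threading.Thread(target=self._serve, name="test-server")
        self._thread.daemon = True
        self._thread.start()

    def _serve(self):
        # Stand-in for a request loop: wake up regularly to check for stop.
        while not self._stop.is_set():
            self._stop.wait(0.01)

    def stop_server(self, timeout=3.0):
        """Ask the thread to stop and wait for it; True if it exited."""
        self._stop.set()
        self._thread.join(timeout)
        exited = not self._thread.is_alive()
        self._thread = None
        return exited


def leaked_since(baseline):
    """Threads alive now that were not in the `baseline` snapshot."""
    return [t for t in threading.enumerate() if t not in baseline]
```

Taking the baseline once at the start of the run and calling `leaked_since` once at the end is the looser, whole-run check; calling it around each test is the stricter variant being questioned in the log.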
[04:23] I ran that manually and it's loading the oopses
[04:23] but not sure why that was commented
[04:23] hopefully it won't cause any trouble
[04:32] lifeless, I ran out of ideas. It loaded 7k oopses to the database but it just cannot find it for lpnet, only edge
[04:32] edge has a partial report now, but lpnet doesn't
[04:32] report generator claims there are no oopses for lpnet
[04:32] we'll have to wait for diogo
[04:34] Ursinha: thanks for trying
[04:34] stub: hey, can we have a brief skype call?
[04:35] no problem lifeless, sorry not being more helpful
[04:35] going to eat something and sleep
[04:35] Ursinha: ciao
=== Ursinha is now known as Ursinha-afk
[04:38] lifeless: sure
[04:38] lifeless: stuartbishop
[04:43] https://bugs.edge.launchpad.net/soyuz/+bug/659129
[04:43] <_mup_> Bug #659129: Distribution:+ppas timeout in PPA search
[04:46] lifeless, StevenK, I tried the .enumerate() patch, but the results are nonsense. Here is the patch, in case you see anything obviously wrong with it: http://pastebin.ubuntu.com/512867/
[04:47] lifeless, StevenK, the threads are still alive at the end of the test run, after TestCaseWithTransport.tearDown(self), but the zope testrunner doesn't complain
[04:51] mars: Throw that at ec2 and see what happens after runs the test suite?
[04:51] StevenK, it fails locally. It is bunk :(
[04:51] no point in running it through ec2
[04:59] StevenK: http://wiki.postgresql.org/wiki/Postgres-XC
[04:59] bah
[04:59] stub: http://wiki.postgresql.org/wiki/Postgres-XC
[05:17] https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1747S227
[05:17] ^
[05:17] 06:27 < bigjools> lifeless: https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1747S227
[05:17] 06:27 < bigjools> the fixed query is not the one that's timing out
[05:17] 06:28 < bigjools> I smell a problem with the ValidPersonOrTeamCache view
[05:18] https://bugs.edge.launchpad.net/launchpad-registry/+bug/655802
[05:18] <_mup_> Bug #655802: Branch:+huge-vocabulary timeout (Person and team AJAX picker fails)
[05:20] stub: https://lp-oops.canonical.com/oops.py/?oopsid=1740EC788
[05:26] "The PostgreSQL Project finally switched from CVS to Git in September 2010"
[05:26] O.o
[05:40] stub: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/660291
[05:40] <_mup_> Bug #660291: inconsistent performance between staging and prod
[06:20] stub: I've tagged all the bugs pg83; probably a little zealous, but better safe than sorry ;)
[06:33] ALL the bugs??
[06:35] "Every bug is now tagged pg83, enjoy" ? :-)
[06:44] stub: all the timeout ones
=== almaisan-away is now known as al-maisan
[07:12] So can anyone tell me why the debugging information isn't being printed at https://lpbuildbot.canonical.com/builders/lucid_prod_lp/builds/7/steps/shell_7/logs/stdio ? Relevant code is test_on_merge.py around line 80.
[07:27] Project devel build (104): STILL FAILING in 4 hr 2 min: https://hudson.wedontsleep.org/job/devel/104/
[07:27] stub: what debugging info are you expecting ?
[07:28] lifeless: The information about the open connections that test_on_merge.py should emit around line 80.
[07:29] stub: if not results:
[07:29] break
[07:30] hmm, no, that can't be triggering
[07:30] yer - I don't follow. Either it is a logic error I can't see, or buildbot is stripping the information, or it isn't on that branch.
[07:30] I suspect the latter
[07:31] that code is in prod-devel
[07:31] Given how noisy test_on_merge usually is
[07:31] hangon
[07:32] stub: thats not the code thats executing
[07:32] print 'Cannot rebuild database. There are open connections.'
[07:32] !=
[07:32] Cannot rebuild database. There are 1 open connections.
[07:32] ahh
[07:32] Can we move to Hudson now? Buildbot is annoying me.
[07:33] StevenK: you were working on reliability - hows that going?
[07:33] lifeless: I got distracted by real work
[07:33] StevenK: fair enough
[07:34] stub: I don't know what code *is* running, but its not test on merge
[07:34] I can't find it in the tree
[07:34] Maybe buildbot scripts?
[07:34] I guess
[07:34] it claims its running test_on_merge
[07:34] yer...
[07:35] perhaps a .pyc stale issue
[07:36] thats the old output
[07:36] Right. Or that branch just doesn't have my patch.
[07:36] and now bb is down ><
[07:36] stub: it does
[07:36] stub: I've pulled and checked
[07:36] rev 9848
[07:38] redoing just to be sure
[07:38] nope, its good.
[07:42] hmm, where *has* julians distribution:+ppas patch gone
[07:43] ah, its in devel
[07:44] way past eod; ciao.
[07:44] stub: build 7 was old
[07:44] https://lpbuildbot.canonical.com/builders/lucid_prod_lp/builds/9/steps/shell_7/logs/stdio was current
[07:45] till spm nuked bb
[07:45] My fault
[07:48] Could someone be convinced to EC2 my two long-approved branches?
[07:49] wgrant: Oh, sorry, I meant to do that. Link me again?
[07:50] StevenK: https://code.edge.launchpad.net/~wgrant/launchpad/+activereviews
[07:50] wgrant: Rah
[07:50] Oh?
[07:51] * StevenK blinks
[07:51] return self._mp.queue_status == 'Approved'
[07:52] AttributeError: 'Entry' object has no attribute 'queue_status'
[07:52] The URL is right?
[07:53] Oh, wait
[07:54] wgrant: Yeah, my fault
[07:55] Yay.
[07:55] EC2 doesn't hate me for the seventh time :)
[08:02] Hm
[08:02] ec2 still uses pg8.3
[08:02] Eep.
[08:03] I think we need a new machine image
[08:07] wgrant: You don't really care about the ec2 URLs, right?
[08:07] * StevenK also notes that ec2 land gets really unhappy if two copies are running
[08:22] wgrant: Both branches are in ec2
[08:42] grah we're in testfix ? :(
[08:52] lifeless: Only because buildbot sucks
[08:53] good morning
[08:59] heh, oops
[08:59] "
[08:59] Invalid stacked on location: /+branch/qbzr
[08:59] "
[08:59] So whilst the ssh server understands those new URLs, Launchpad gets irked if you use them as stacking locations
[09:04] please file a bug
[09:11] filed as bug 660358, with a bonus easter egg extra idea :-)
[09:47] lifeless: still around?
[09:50] mars: O hai -- our ec2 images still use postgres 8.3, could you prod them up to 8.4?
[10:00] StevenK: Thanks.
[10:01] * bigjools is way behind on email
[10:01] mrevell: Fancy looking at bug 660283?
[10:01] <_mup_> Bug #660283: Bug search pages should document valid search expressions
[10:01] * mrevell looks
[10:01] thanks allenap
[10:03] I'm confused. Should we have "timeout" and "oops" on timeouts? Or just oops? Or just timeout?
[10:08] bigjools: timeouts should have 'oops' and 'timeout' unless something has changed recently
[10:08] jml: I need to have a word with Rob then :)
[10:09] bigjools: is lifeless deleting tags?
[10:11] jml: a bunch of soyuz bugs had "oops timeout" turned into "timeout" IIRC and when he added the pg83 tag the oops tag got removed. I don't know if that's deliberate.
[10:11] bigjools: me either.
[10:12] Might have been removed on purpose - pg83 indicates we don't have a current valid OOPS (although a number of non-db ones have got that tag too...)
[10:13] ahh, yeah, that'll be it.
=== jtv is now known as jtv-afk
[10:13] furry muff
[10:24] bigjools: I'm told that timeout and oops tags are mutually exclusive
[10:24] bigjools: by urshina
[10:24] lifeless: that seems sub-optimal to me
[10:25] I'll talk to her and see why, thanks
[10:25] https://dev.launchpad.net/LaunchpadBugTags doesn't explain
[10:25] there was a different page
[10:26] anyhow, on my second day or so I was updating bugs with 'timeout oops' as per how I read the policy
[10:26] and urshina said that it was meant to be one or the other
[10:26] bigjools: to me one seems as good as the other, just as long as we're consistent so tools can be written
[10:26] jml: bigjools: ah, here it is https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy
[10:26] lifeless: thanks
[10:26] 'It should be tagged with either 'oops' or 'timeout' on it.
[10:27] it doesn't say why
[10:27] and I find it useful to have both tags
[10:28] bigjools: It doesn't matter to me either way as long as I don't need to remember different rules for different parts of the same project
[10:28] agreed
[10:28] bigjools: I'd be delighted to change, if you want to bring it up with urshina/the list
[10:29] my (probably faulty) recollection is that it was for qa tooling
[10:29] I find it useful to search for just one tag "oops" and get everything related. If we don't get stuff tagged with just "timeout" it's more time-consuming, at least for me, to remember to look for the other tag too.
[10:29] also, someday I'm going to make that oops/timeout graph that lifeless asked me for
[10:29] heh
[10:30] consistent tagging will be important
[10:43] * bigjools totally loves PG84's psql that tells you what other tables reference your column
[10:44] stub: regarding that sql you did in the bug comment, I vaguely remember someone saying something about there being a way to check person validity directly on Person?
[10:45] bigjools: Yer - love that. Pain in the arse backtracking that stuff before
[10:46] bigjools: You need person, emailaddress and account for our current 'valid person' rules.
[10:46] bigjools: With just person, you can't tell if their account is active or if they have a preferred email address
[10:46] ok
[10:47] I still need to order on person, so I need that extra crap :(
[10:48] So my timings on current staging seem ok. lifeless got one with a 10 second query.
[10:48] I'll make another patch to try out with that changed query
[10:51] sqlobj doesn't do LEFT OUTER JOIN does it :/
[10:53] I've blotted all that from my mind.
[10:57] this is going to be tricky to change
[10:58] It's not easily Stormifiable?
[10:58] no
[10:59] take a look at Distribution.searchPPAs
[10:59] it has fti stuff - last time I tried that in Storm it was a world of pain
[11:00] If necessary you could just SQL() that bit.
[11:00] I could, which is what I think I did
[11:00] but something else is nagging me and I can't remember what it was
[11:00] The horrible horrible string concatenation in that method?
[11:00] heh
[11:01] that's how it was done with sqlobj
[11:01] yes, but that's like so three years ago.
[11:01] I do not want to stormify this query
[11:01] not right now anyway
[11:01] Seems the fastest approach to me
[11:02] It looks easy enough to Stormify... as long as the callsites aren't braindead.
[11:02] ...
[11:02] you know what they say about assumptions
[11:02] But this should have only one callsite.
[11:02] hahaha
[11:03] Well, there's only one non-test callsite that cares about the result.
[11:05] * bigjools considers store.execute
[11:05] * stub ♥ store.execute()
[11:06] Why are you ordering by Person.name anyway?
[11:06] Seems... odd...
[11:06] because this stuff needs to appear on +ppas
[11:07] Yer, but Person.name is a very arbitrary order.
[11:07] true but it's less arbitrary than anything else
[11:07] displayname is better.
[11:08] But isn't relevance better still?
[11:08] yes
[11:08] If you pick a field in Archive to order by, you don't have to rewrite it in Storm :)
[11:08] stub: I know, don't think I hadn't considered that :)
[11:08] Oh, you order by relevance then name, I see.
[11:09] * stub changes his Person.name to 'aaaaaaaaaaaaaaaa_stub'
[11:09] I could order by relevance, then ppa name perhaps
[11:09] * bigjools thinks
[11:09] * nigelb points stub to http://uncyclopedia.wikia.com/wiki/AAAAAAAAA!
[11:10] like :)
[11:13] wgrant: actually something sensible needs to be the default for when no search term is supplied
[11:14] I like displayname to some extent
[11:14] bigjools: Person.name is probably not that
[11:14] Archive.displayname might be.
[11:14] indeed
[11:14] since it will include the person.name anyway unless they changed it :)
[11:14] Um, well, not any more.
[11:14] There is no default
[11:15] mmm true
[11:15] I think it's a better default, let's see
[11:15] We really need to fix that lack of default.
[11:15] Although it's not so bad now that the key doesn't acquire that name permanently.
[11:24]
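The ordering being discussed, relevance first and then the archive's display name, with the display name doubling as the default when no search term is given, can be illustrated in plain Python. This is only a sketch of the sort order, not the actual `Distribution.searchPPAs` query: `ArchiveHit` and its `rank` field are hypothetical, with `rank` standing in for whatever relevance score the full-text search returns.

```python
from collections import namedtuple

# Hypothetical search result: the archive's display name plus a relevance
# score (higher means a better match; all equal when there is no search term).
ArchiveHit = namedtuple("ArchiveHit", "displayname rank")


def order_hits(hits):
    """Sort by descending relevance, then case-insensitively by displayname."""
    return sorted(hits, key=lambda hit: (-hit.rank, hit.displayname.lower()))
```

When every hit has the same rank, as with an empty search, the result falls back to alphabetical order by display name, which matches the "sensible default" asked for at 11:13.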