/srv/irclogs.ubuntu.com/2010/11/23/#launchpad-dev.txt

wallyworld_thumper: should i ask sk to review those 2 daily recipe build list page branches?00:09
pooliebugclient now fails with01:04
poolieAttributeError: 'Entry' object has no attribute 'markAsDuplicate'01:04
pooliedid this recently change in the api?01:04
lifelesswhat api version are you using?01:05
lifeless?1195801:06
james_wlifeless, api/1.0> why not?01:10
lifelessjames_w: ?01:11
pooliebug 1195801:11
_mup_Bug #11958: Unable to show hidden files <nautilus (Ubuntu):Fix Released by seb128> <https://launchpad.net/bugs/11958>01:11
lifelessoh, that was a typo01:11
lifelesspasted to the wrong window01:11
james_wlifeless, <lifeless> salgado-afk: please don't put the blueprints in api/1.001:12
lifelessjames_w: right. Because 1.0 is supported for many more years.01:12
lifelessbut jml has the issue tracker thing in train for the next year.01:12
james_wwhy not stop the default being to add features to old versions then?01:13
lifelessthere's a bug open on that01:13
lifelessIIRC01:13
james_wok01:13
lifelessif there isn't, there should be - this has been discussed.01:13
lifelesswe shouldn't 'support' things we haven't finished.01:14
lifelesswe should do best effort etc01:14
lifelessbut we need to allow room for mistakes.01:14
pooliehello james_w01:15
james_whi poolie01:21
lifelesspoolie: hi, so what api version are you using?01:26
pooliei don't know01:26
lifelesspoolie: I ask because it's possible some things that were accidentally exposed have been shuffled, but we're meant to be very conservative about removing stuff.... trying to estimate whether to dig deeper or not.01:27
pooliei seem to be just taking the default?01:27
poolie<bug at https://api.edge.launchpad.net/1.0/bugs/415936>01:28
_mup_Bug #415936: Merge into new branch produces strange log <Bazaar:New> <https://launchpad.net/bugs/415936>01:28
poolieso 1.0, i guess01:28
lifelesshmm, you're using edge too :)01:29
lifelesscould you switch to LPNET_SERVICE_ROOT :)01:29
poolieheh, it's a hangover from that once being the only place to get it01:30
pooliemuch better to switch things on/off on the server01:30
pooliesure01:30
lifelesswe can't sadly01:30
lifelesslp API clients start with a POST01:30
pooliei mean in future01:30
lifelessand blow up if that's redirected or otherwise handled by non-edge.01:30
pooliesure01:31
lifelessuhm, so 1.001:31
lifelesslets see01:31
pooliewe could change lplib to make EDGE_SERVICE_ROOT == LPNET_SERVICE_ROOT01:31
pooliebut that might just cause confusion01:31
lifelesswe're going to do something like that01:31
lifelessperhaps with a deprecation warning01:31
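A minimal sketch of the aliasing discussed above, in which asking for `edge` resolves to the production root and emits a deprecation warning. The `lookup_service_root` helper and its name table are hypothetical illustrations, not launchpadlib's actual API; the point is only that a client-side alias never touches edge, so the POST-then-redirect problem lifeless mentions does not arise.

```python
import warnings

# Production API root; 'edge' is treated as a deprecated alias for it.
LPNET_SERVICE_ROOT = "https://api.launchpad.net/"

# Hypothetical table of short names to roots (illustrative only).
_SERVICE_ROOTS = {
    "production": LPNET_SERVICE_ROOT,
    "edge": LPNET_SERVICE_ROOT,  # deprecated alias
}


def lookup_service_root(name):
    """Map a short service-root name to a URL, warning when 'edge' is used."""
    if name == "edge":
        warnings.warn(
            "the 'edge' service root is deprecated; using production instead",
            DeprecationWarning,
            stacklevel=2,
        )
    try:
        return _SERVICE_ROOTS[name]
    except KeyError:
        raise ValueError("unknown service root: %r" % (name,)) from None
```

Keeping the warning in one lookup function means old scripts keep working while their authors get nudged to switch.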
pooliehm that name doesn't exist01:31
poolieok, i have it01:32
lifelessah cool01:33
lifelesswhat was it ?01:33
lifelessok, so markAsDuplicate is in beta01:34
lifelessnot in 1.0 or devel01:34
lifelessI can check the changelog but my guess is that you've moved from a launchpadlib that used beta by default to one that uses 1.0 by default01:35
lifelessthe current idiom is to use duplicate_of = xxx01:35
pooliehm01:35
pooliesomething like that01:35
lifelessyou can pass beta as the api version you want01:35
pooliethough, this is on my maverick machine, running the packaged lplib01:35
lifelessor update your code01:35
pooliei doubt that changed in the last few weeks?01:36
lifelessthat would be unusual.01:36
lifelesscould you file a bug? Probably a mistake then.01:36
LPCIBotYippie, build fixed!01:36
LPCIBotProject devel build (238): FIXED in 3 hr 45 min: https://hudson.wedontsleep.org/job/devel/238/01:36
LPCIBot* Launchpad Patch Queue Manager: [r=abentley][ui=none][no-qa] New "images" command for bin/ec2 to01:36
LPCIBotdisplay all current test images.01:36
LPCIBot* Launchpad Patch Queue Manager: [r=flacoste][ui=none][bug=677305] Downgrade bzr to 2.2.001:36
pooliehm, i see i was repeating myself and defaulting to edge in two places01:38
lifelessyou might find https://dev.launchpad.net/PolicyAndProcess/OptionalReviews interesting01:38
pooliei've followed some of the mail about it01:39
poolieok, so even on 1.0 non-edge, it still fails01:40
lifelessmarkAsDuplicate isn't in 1.001:40
poolie"stable interfaces in python are hard"01:40
lifelessperhaps it should be ;) - thus a bug is needed.01:40
pooliei'm pretty sure this was working a week or two ago01:40
pooliemaybe a bit more than this, but within the last couple of months01:41
pooliei will file01:41
pooliethe shorter Optional Reviews report: "data, bitches!"01:44
pooliei think that's great01:44
poolieto be devil's advocate01:44
pooliei think the previous experience is not so much showing everything needs review01:45
pooliebut rather that people will use these to route around a broken review process01:45
pooliecf john's mail01:45
pooliebut perhaps things have now changed01:45
poolies//probably01:45
lifelesswell01:50
lifelessI think that route around is better than stockpiling01:51
thumperwallyworld_: yes, fling them StevenK's way01:52
thumperwallyworld_: I'm just about to head and collect the girls from school01:52
wallyworld_thumper: ok01:52
thumperwallyworld_: we should have a chat after that01:52
wallyworld_thumper: i'll be here01:52
poolielifeless: i think in the first version of this code i did assign to 'duplicate_of', but..01:54
pooliethat didn't work01:54
pooliei can't remember if the change was not saved, or if it gave an error01:54
lifelesswgrant: yo02:01
lifelesspoolie: you need to call obj.lp_save()02:01
pooliei know that, and i think it wasn't enough02:02
poolieimbw02:02
poolieanyhow, bug 68033902:02
_mup_Bug #680339: 'Entry' object has no attribute 'markAsDuplicate' <Launchpad itself:New> <https://launchpad.net/bugs/680339>02:02
lifelessthank you02:03
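A minimal sketch of the 1.0-style idiom lifeless describes: assign `duplicate_of`, then call `lp_save()` so the change is sent to the server (the alternative he mentions is requesting the `beta` API version, where `markAsDuplicate` still exists). `mark_as_duplicate` is a hypothetical helper, and `bug` and `master_bug` are assumed to be launchpadlib bug entries. poolie recalls this not being enough on its own, so treat it as unverified.

```python
def mark_as_duplicate(bug, master_bug):
    """Mark `bug` as a duplicate of `master_bug` using the 1.0-style idiom.

    Setting the attribute only changes the local copy of the entry;
    lp_save() is what pushes the modification to Launchpad.
    """
    bug.duplicate_of = master_bug
    bug.lp_save()
    return bug
```

With a real client the entries would typically come from something like `launchpad.bugs[680339]`.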
wallyworld_thumper: back in 15 mins - have to drop the kid to his McJob02:16
thumperwallyworld_: ok02:17
pooliethumper, wallyworld_, we're going to do a bzr 2.2.2 release at the end of this week02:20
pooliethat should address the problems you hit in 2.2.102:20
thumperpoolie: ok02:21
pooliefeedback welcome, as always02:21
wgrantlifeless: Hi.02:22
wgrantWhat's broken?02:23
lifelesswgrant: cesium - removing cowboys02:24
lifelesswondering if you know the rev that landed the fix02:25
lifelesswgrant: how was your exam?02:25
wgrantlifeless: Exam was rather better than expected.02:25
wgrantcesium... what was the cowboy? The builder disabling thing?02:25
spmwgrant: all done? wooo!02:25
wgrantIndeed.02:26
wgrantlifeless: Unless there was more than just logging and not disabling builders, devel r11938 looks to be the one.02:30
wallyworld_thumper: call now?02:35
lifelesswgrant: there was02:37
lifelessdisabling the failure checking02:37
lifelesshttp://pastebin.com/Kr44BkbD02:38
wgrantlifeless: 11938 should render that pointless.02:39
wgrantThough I'd, er, check with someone else.02:39
lifelesswgrant: it looks complete to me02:41
lifelessthe only gap is that first hunk02:43
wgrantYeah, that's what I meant.02:44
wgrantI think.02:44
lifelessjamesh: do you remember how to tell zope to use a different thread count?03:57
jameshlifeless: there is an option in the .conf file03:57
jameshI think it might even be called thread_count03:57
jameshthis is the ZConfig .conf file03:59
lifelessis that 'launchpad.conf' ?03:59
jameshyes04:05
jameshlifeless: looking at one of the ancient launchpad trees on my system, one of the launchpad.conf files under configs/ has "threads 16" at the top level04:10
lifelessthanks!04:10
lifelessalso, terrible idea :)04:10
jameshI remembered that we had bumped the count up for the demo.launchpad.net instance04:11
jameshso checked that config04:11
jameshthat instance wasn't under heavy load, but it was easy for one user to block everyone else with the default thread count04:12
lifelessyeah,04:18
lifelesswe're in the process of (with measurement) dropping down to 1 thread per appserver04:18
poolielifeless: interesting! i don't recall you posting about it04:32
poolie(not that you have to, i suppose)04:32
lifelessin various perf tuesday mails04:32
poolieon the grounds that, if they're not wasting time, you wouldn't get more than one python thread to run anyhow?04:32
lifelesswe're seeing things that thread starvation is the best explanation for04:32
lifelessand yes04:32
lifelessthat too04:32
poolieok, i do see a mention in passing04:33
lifelessjamesh: yes, thats it04:35
lifelessjamesh: thanks for finding it04:35
lifelesspoolie: we could reasonably expect (total time / database time) threads to be a reasonable figure04:36
lifelessbut it's simpler to let the OS manage things at that point04:36
lifeless+ if we do decide to debug something only one user will be impacted04:37
lifelesspoolie: I did some more analysis in a ubuntu-one thread04:37
poolieagree about letting the OS do it04:39
poolieistm the main drawback is there are things that python knows are shareable in memory, that the OS doesn't04:40
poolielike modules04:40
poolieand for lp they tend to be large04:40
pooliebut maybe this is not a sufficiently important factor04:40
lifelessit's a couple hundred MB04:42
lifeless1.6GB on a fully loaded machine.04:42
lifelessit could be better04:42
lifelessbut we should get a win regardless, or so says the theory.04:43
lifelessyay04:46
lifelesshttps://launchpad.net/~lifeless/+commentedbugs working04:46
poolietimed out for me..04:47
lifelesshahrugle04:47
pooliein 'select count from bugtask....'04:48
pooliebut it's a lovely sounding url :)04:48
lifelesswhat revno04:48
poolie1195204:48
lifelessdamn04:48
lifelessAt least 42 queries/external actions issued in 2.16 seconds04:48
lifelessr1195204:48
lifelesspoolie: it's working consistently for me04:50
lifelesspoolie: whats the OOPS id ?04:50
pooliehttps://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1788K35004:52
poolieworks for me now04:52
poolieis this a new feature, or just a newly-faster feature?04:53
poolieworked this time04:53
lifelesstop OOPS since the monthly rollout04:53
poolieah i see04:53
pooliemaybe it was 'want an affecting bugs list' i was thinking of04:53
spmwgrant: btw. I'm bemused and entertained at your dedication. Finish final exam, hop on IRC. ISTR that post my final exam at Uni, a group of us went to the Brekky Creek Hotel in Brisvegas for a Steak lunch and afternoon of entertaining the regulars in the beer garden with (badly) Monty Python songs sung.05:59
wgrantspm: Well, a few of us finished early, went to pub, consumed beer, returned to the exam venue to see everyone else, then went back to my office.06:01
wgrantAnd then IRC, yes :P06:01
spmbwhahaha06:02
lifelessspm: hey06:16
lifelesssorry, ECHAN06:16
pooliespm, ah, the brekky creek06:42
poolieis it a known problem that all api calls against staging are giving 500 errors?06:42
poolieand indeed the web ui too06:43
stubHayfever or cold :-( Suspect today is another sick day.06:44
spmyeah, staging is borked atm. been fixing the borked prod rollout first tho06:44
poolieheh, that's probably a good choice06:45
poolie:) thanks spm06:45
=== almaisan-away is now known as al-maisan
adeuringgood morning08:41
bigjoolsmorning08:41
bigjoolshow was the exam wgrant?08:42
lifelessis there someone around that can qa https://bugs.launchpad.net/rosetta/+bug/669831 ?08:54
_mup_Bug #669831: obsolete translations exported to the branch <code-integration> <qa-needstesting> <Launchpad Translations:Fix Committed by danilo> <https://launchpad.net/bugs/669831>08:54
mrevellMorning09:09
bigjoolslifeless: I've landed all code to cover the cowboys on cesium, please roll it out (or I can)09:12
lifelessbigjools: its done09:12
bigjoolslifeless: ah great, you checked or Picarded it?09:12
lifelessspm deployed09:12
lifelessand noone has screamed yet09:12
lifelessI figured I'd chat with you briefly :)09:13
bigjoolsheh09:13
* bigjools looks at log09:13
bigjoolslifeless: I thought it was in the nodowntime set?09:14
lifelessbigjools: it was taken out when it blew up09:15
lifelesswhen things get cowboyed we remove them from nodowntime.09:15
bigjoolsso it's back in?  I thought I said I was going to approve that first ...09:16
lifelesswe may be crossing wires09:16
lifelessback at uds it wasn't in, and we discussed what it would take to be in09:16
lifelessthat you were cc'd on a discussion and gave your blessing.09:16
lifelessthen the big branch blew builders away09:17
lifelessso it was cowboy-fixed and removed [to stop deploys uncowboying it]09:17
lifelesswe cross checked that you'd landed fixes [and they were to be deployed] today, and so uncowboyed it (which implies putting it back in nodowntime]09:17
lifelessbigjools: if you had wanted a further cross-check with you, I'm really sorry - I didn't realise.09:18
bigjoolslifeless: np, I talked with Tom not you09:19
bigjoolsas it happens it's fine to roll, but that's because I had landed everything09:19
lifelessright, we looked first ;)09:19
lifelessin future, if you want a cross-check, could you note that on LPS against the cowboy, or the DeploymentException ?09:19
bigjoolsyes but one of the cowboys is not landed, but deliberately.  How did you reconcile that? :)09:20
lifelessbigjools: knowledge.09:20
bigjools:)09:20
lifelessI had discussed the fault with you.09:20
lifelessso I knew you put it in while you diagnosed the root cause.09:20
bigjoolswe need to figure out wtf the builders are taking >30 seconds to accept connections09:20
lifelessI also included that cowboy as a ready to go patch in a mail to losas09:21
lifelessso if there is a submarine present there09:21
lifelessit's at-hand to recowboy09:21
bigjoolsone of the side effects of that cowboy is that jobs will "stick" on genuinely failed builders09:22
bigjoolsI have to keep checking the log09:22
bigjoolslifeless: so, did we talk about figuring out the massive slave connection delays?09:30
lifelessnot as such09:31
henningehey jtv! ;)09:36
jtvhi henninge!09:36
henningejtv: let's be chatty ;)09:36
jtvhenninge: I don't think this connection will support voice chat.09:37
henningejtv: oic09:38
henningeyou are also not on #translations09:38
jtvjust a mo'09:38
lifelessjtv: hi09:40
jtvhi09:40
lifelesshttps://devpad.canonical.com/~lpqateam/qa_reports/deployment-stable.html09:40
jtvuh-oh09:41
lifelessare you able to qa  669831 on [qa]staging ?09:41
jtvhenninge: hang on, lifeless wants something :)09:41
lifelessno panic09:41
jtvlifeless: very slow connection, so will be a bit slow to respond09:41
lifelessbut 11960 would remove another cowboy in the datacentre09:41
lifelessjtv: like I say09:42
lifelessno panic09:42
henningejtv, lifeless: I can do that09:42
lifelessawesome09:42
lifelessok, gnight all09:42
jtvg'night09:42
henningelifeless: good night09:42
* jtv runs around a bit and screams for a while, just because he was told not to panic09:42
jtvit's the principle of the thing09:42
bigjoolshave you considered working on Soyuz?09:53
wgrantbigjools: Better than expected.10:06
bigjoolswgrant: excellent, but shouldn't you be out celebrating now10:07
wgrantDid that earlier :P10:07
wgrantBut yes, probably.10:07
bigjoolsyou party animal10:08
bigjoolsdid you get a chance to look at the expiration query?10:08
wgrantHeh.10:08
wgrantWill look now.10:08
wgrantbigjools: Where's the latest version of that query?10:11
bigjoolsthe one I pasted :)10:11
wgrantk10:11
bigjoolshttp://pastebin.ubuntu.com/535167/10:12
wgrantYup.10:12
wgrantbigjools: Are the LFC, DS and DAS joins in the EXCEPT completely useless, or am I stupid and blind?10:21
bigjoolsone sec10:22
wgrantbigjools: http://pastebin.ubuntu.com/535491/10:24
wgrantShould be equivalent, except with the retention condition fixed.10:25
wgrant(now it will exclude if the file is unremoved or Obsolete, rather than Published or Obsolete)10:25
wgrantWhich should make p-d-r less sad.10:25
wgrantHmm.10:27
wgrantbigjools: Did the domination thingy help dogfood at all? I guess it pales in comparison with the file list generation :/10:29
bigjoolswgrant: yes, file lists take ~4 hours10:29
bigjoolsfor just maverick release pocket :/10:29
bigjoolssomething has regressed a lot10:29
bigjoolsit used to take ~30 mins10:29
wgrantI will poke it in the eye in a few weeks.10:29
bigjoolsDF is slow, but ...10:30
wgrantHeh.10:30
bigjoolswgrant: sql looks good, I am trying it on DF10:43
wgrantbigjools: So, about those people who feel like they need to use obsolete PPAs...10:47
bigjoolsOEM10:47
wgrantHow did I guess :(10:48
bigjoolsnow, I am struggling to understand wtf builders sometimes take in excess of 180 seconds to accept a connection from the manager10:49
* bigjools -> caffeine10:50
wgrantbigjools: Load graphs from non-virt builders pls.10:51
wgrantIt might at least give us some idea of if that's the issue.10:51
=== matsubara-afk is now known as matsubara
bigjoolsjml: remember my dirty reactor failure in my b-m tests?11:14
bigjoolsI added debug output on the Deferreds and it's caused by distribution_mirrorprober.  There's a test isolation error... :/11:16
wgrantThe reactor is shared‽11:17
bigjoolsin tests11:17
wgrantMy interrobang stands.11:17
* jml briefly steps in from the sick room11:17
bigjoolsman flu?11:18
jmlwgrant: yeah, there's exactly one reactor.11:18
jmlwgrant: it's arguably the biggest flaw in Twisted11:18
wgrantjml: This sounds... like Zope.11:18
bigjoolsit would be kinda hard to have more than one11:19
jmlbigjools: I know about those tests. my testtools-experiment branch fixes those delayed calls.11:19
jmlbigjools: but it's blocked on landing by some weird-ass 500 from the librarian11:19
bigjoolsjml: yay.  So, what can I do about it in my branch?11:19
jmlbigjools: those tests need to be completely rewritten... let me find you my workaround11:20
bigjoolsand why are they leaking to my test?11:20
jmlbigjools: because the distributionmirror_prober tests aren't using trial11:20
bigjoolsah ...11:20
jmlbigjools: so the calls are going on to the reactor, and when your tests are cleaned up ... bang11:20
bigjoolsthat's kinda bad from a test isolation PoV :/11:21
bigjoolswgrant:11:21
bigjools total_files | space_saved11:21
bigjools-------------+--------------11:21
bigjools      184899 | 30743251643411:21
jmlbigjools: yes. the problem is the mutable global state that is the reactor11:22
bigjoolsindeed11:22
jmlbigjools: got any suggestions on how to make it better?11:22
wgrantbigjools: Mm, not too implausible.11:22
bigjoolsjml: I'd make Trial clean the reactor when starting a test?11:22
jmlbigjools: it can't do that. there might be in-process Twisted-using fixtures11:22
jmlbigjools: http://pastebin.ubuntu.com/535507/ <- should work around the problem11:23
bigjoolsjml: at the very start of the test there should be no fixtures yet though?11:23
jmlbigjools: not if they are shared between tests11:23
bigjoolsaieeee11:23
bigjoolsjml: I'll poke your workaround in, thanks11:24
jmlwgrant: incidentally, it doesn't sound like Zope to me.11:24
wgrantjml: Opaque global state.11:24
wgrantThe Zope Way.11:24
bigjoolsjml: if it's complaining about them when the test ends, how can a fixture's Deferreds never get in the way then?11:24
bigjoolsif it's shared between tests that is11:25
jmlbigjools: good point. in Trial, there's some historical crap back from when we thought setUpClass would be a good idea11:25
jml(it's a terrible idea, we got rid of it, Guido added it back to Python again, and so the circle of crap continues)11:26
bigjools:/11:26
bigjoolsso if someone uses setUpClass with a Deferred, your tests will always fail11:26
jmlbigjools: in testtools, I guess we could clean the reactor before tests11:26
jmlbigjools: no, Trial postpones checking of those things until tearDownClass runs.11:27
jmlbigjools: *that's* the historical crap.11:27
bigjoolsoy11:27
jmlas I said, clearing it out in testtools would help11:28
bigjoolsyeah11:29
jmlalthough it wouldn't help that much.11:29
jml"some test before this one was bonkers"11:29
jmlit's still an improvement over the current situation11:29
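A stdlib-only sketch of the problem jml describes, using one module-level `sched.scheduler` as a stand-in for Twisted's single shared reactor. This is an analogy, not Trial or testtools code; `schedule_probe` and `clean_pending_calls` are hypothetical names. One test schedules a call and never cancels it; clearing the shared queue before the next test restores isolation but, as jml notes, can only report that some earlier test leaked.

```python
import sched
import time

# One scheduler shared by every test, like the single reactor.
shared_scheduler = sched.scheduler(time.monotonic, time.sleep)


def schedule_probe(delay, action):
    """Code under test: schedules work on the shared queue and never cleans up."""
    return shared_scheduler.enter(delay, 1, action)


def clean_pending_calls(scheduler):
    """Cancel and return every event still queued on `scheduler`.

    Run before each test, this hands the next test an empty queue; the
    returned list is all the evidence left about which test leaked.
    """
    leftovers = list(scheduler.queue)
    for event in leftovers:
        scheduler.cancel(event)
    return leftovers
```

Cleaning before each test rather than failing after it is the trade-off jml mentions: the error message moves to a less useful place, but one misbehaving suite stops breaking unrelated tests.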
bigjoolsI won't ask why the mirrorprober tests are not using Trial then.... :)11:30
jmlI have NFI11:30
jmlif I have my way, they'll be using testtools before the year is out.11:30
bigjools\o/11:30
bigjoolshmm I need to book some holiday11:31
jmlanyway, all this excitement is threatening my delicate constitution11:31
bigjoolsjml: Berocca11:31
jmlbigjools: it's a cold, not a hangover. :P11:31
bigjoolsjml: :)  it still works11:31
bigjoolsthe fizzy stuff11:31
bigjoolsget better soon anyway, go get some rest11:32
* jml watches The Wire instead.11:32
bigjoolsa friend of mine swears his PS3 is medicinal11:32
jmlheh11:32
=== al-maisan is now known as almaisan-away
deryckMorning, all.12:04
bigjoolswgrant: so, I figured out the problem with the log parser12:23
wgrantbigjools: !!12:25
wgrantWhat is it?12:25
bigjoolswgrant: it reads in gzip files in their entirety :/12:26
bigjoolssee lp/services/apachelogparser/base.py12:26
bigjoolsget_fd_and_file_size()12:26
bigjoolsle heavy sigh12:26
wgrantbigjools: Hahaha.12:27
wgrantI thought gzip stored the uncompressed size in the header...12:28
wgrantYes, it's in the footer.12:28
wgrantNot sure how we can access that from Python, though...12:29
bigjoolsbut does any of the python module read that12:29
wgrantIt's limited to 4 bytes, so it probably doesn't.12:33
wgrantBut let's see.12:33
bigjoolshttp://stackoverflow.com/questions/1704458/get-uncompressed-size-of-a-gz-file-in-python12:33
wgrantIt looks like we probably have no choice but to read in chunks.12:34
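A minimal sketch of the chunked approach, assuming the goal is only to learn the uncompressed size without holding the whole decompressed log in memory. It is not the code in `lp/services/apachelogparser/base.py`, and `uncompressed_size_by_chunks` is a hypothetical name. It still decompresses everything, but memory use is bounded by `chunk_size`.

```python
import gzip


def uncompressed_size_by_chunks(path, chunk_size=1024 * 1024):
    """Return the uncompressed size of a gzip file, reading it in chunks.

    Unlike len(fd.read()), this never keeps more than `chunk_size`
    decompressed bytes in memory at once.
    """
    total = 0
    with gzip.open(path, "rb") as fd:
        while True:
            chunk = fd.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```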
bigjoolsdid you notice the last answer on that page :)12:36
wgrantSuggesting len(fd.read())?12:36
wgrantOh.12:37
wgrantHahahah.12:37
wgrantI rarely look at the author...12:37
bigjoolsI am tempted to grab the last 4 bytes12:38
wgrantBut 2**32...12:39
wgrantI guess we can make it explode if it ever tries to write a bytes_read greater than that.12:39
wgrantSince something is pretty broken if we have a 4GiB log file, I guess.12:39
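The "last 4 bytes" trick relies on the gzip ISIZE trailer, a little-endian 32-bit field holding the uncompressed size modulo 2**32 (for multi-member files it describes only the last member). A minimal sketch, where `gzip_isize` and the `check_bytes_read` guard wgrant suggests are hypothetical helpers, not Launchpad code:

```python
import os
import struct

# ISIZE is a 32-bit field, so it can only express sizes below 4 GiB.
ISIZE_LIMIT = 2 ** 32


def gzip_isize(path):
    """Return the ISIZE trailer of a gzip file (uncompressed size mod 2**32).

    Reliable only for single-member files smaller than 4 GiB uncompressed;
    beyond that the value silently wraps around.
    """
    with open(path, "rb") as fd:
        fd.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", fd.read(4))
    return isize


def check_bytes_read(bytes_read):
    """Hypothetical guard: fail loudly once the parser passes the ISIZE limit."""
    if bytes_read >= ISIZE_LIMIT:
        raise ValueError(
            "read %d bytes; gzip ISIZE cannot describe files this large" % bytes_read
        )
    return bytes_read
```

With logs of 1.2G and 2.9G uncompressed, the margin below 4 GiB is small, which is wgrant's worry about the limit.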
bigjoolsthey are in the region of 1.2G uncompressed12:40
wgrantOh.12:41
wgrantThat is inconvenient...12:41
wgrantAlso, that's huge.12:41
bigjoolsit's fine12:41
wgrantIt's far too close for my liking, but OK.12:42
bigjoolswe can put a limit on it12:43
wgrantYes, but a limit within a couple of orders of magnitude of the current value seems like a really bad idea.12:44
* bigjools tries dirty hack on DF12:47
bigjoolsscore13:04
wgrantIt works?13:04
bigjoolsyes13:06
wgrantSo I didn't break it after all :D13:07
bigjoolsseems so :)13:07
bigjools2010-11-23 13:03:04 INFO    Parsed 5000000 lines resulting in 16085 download stats.13:07
wgrantHmm.13:08
bigjoolsI lied, one file is 2.9G13:08
wgrantOw.13:09
wgrantbigjools: Could you sum all the BPRDC.count?13:10
wgrantJust to see that it actually handles most of the lines.13:10
wgrantAlthough I guess lots of those lines will be Packages/Sources, so it might not be too similar.13:11
bigjools688013:12
bigjoolsit's still going though13:12
bigjoolsit will take a while13:12
bigjoolsand it's hammering DF13:12
bigjoolsgood time for lunch, see you later13:13
* wgrant sleeps.13:13
wgrantThanks for fixing that.13:13
bigjoolsmy pleasure13:15
jelmer'morning benji, abentley14:00
abentleyjelmer: morning.14:00
=== almaisan-away is now known as al-maisan
benjimorning jelmer, or afternoon as it were ;)14:05
=== matsubara is now known as matsubara-lunch
jituhow to add https://launchpad.net/~falk-t-j/+archive/lucid/+build/2018840   in repository?14:10
jituanybody to help?14:10
bigjoolsjitu: follow the instructions here https://launchpad.net/~falk-t-j/+archive/lucid14:12
bigjoolswhere it says "Adding this PPA to your system"14:12
jituchecking...14:14
jitubigjools, thnx14:15
deryckmars, hi.  You around?14:19
deryckOr maybe gary_poster could help me.  gary_poster, ping?14:24
gary_posterhey deryck.  what's up14:24
deryckHi gary_poster.  Does this revno look like the right way that I would disable windmill tests to you?  http://bazaar.launchpad.net/~deryck/launchpad/rockstar-js-refresh/revision/1172614:24
* gary_poster is skeptical he will know, but is looking14:25
gary_posterderyck: yeah, that looks like a very reasonable approach to me, especially if you have evidence it works ;-) .14:25
deryckgary_poster, heh.  that's the problem, I don't. ;)  Started an ec2 test run last night that disappeared.  I assume something hung and the test was killed....14:26
gary_poster:-P14:26
deryckso I started another run, but was looking for some confirmation that the patch looked right. :-)14:27
gary_posterderyck: I can assert that I believe that should have worked14:27
deryckgary_poster, good enough.  Thanks!14:27
gary_poster:-) np14:27
deryck:-)14:27
deryckFWIW, my current test run seems to be going well.  The tests started up much faster than earlier attempts I had.14:27
marsderyck, you may need '!(MailmanLayer|WindmillLayer)' instead, but I don't know for sure.14:28
marsempirical evidence needed14:29
deryckmars, gary_poster, also, I opened Bug #680497 about missing test coverage.  If this didn't belong on Foundations, sorry.  I wasn't sure.14:29
_mup_Bug #680497: jstests for LP JavaScript client are not running automatically <Launchpad Foundations:New> <https://launchpad.net/bugs/680497>14:29
deryckmars, ok, I'll watch the run closely and see.  Thanks!14:29
deryckif I see a windmill test go, I'll kill it ASAP and try again :-)14:30
gary_posterderyck: Foundations: yeah, close enough :-) .  I might add the web group.14:30
deryckok, cool14:30
gary_posterderyck, mars: mars makes a good point.  I'm not sure if the layer thing is an and or an or.  ./bin/test --help should say.  looking14:31
deryckheh "is an and or an or" is hard to parse on two cups of coffee only14:31
deryckthe hung run suggests my patch might not have worked.14:32
gary_posterderyck, :-) mars is right.  "--layer" is a logical or.  that is, it will run include all layers that are not Windmill ORed with all layers that are not Mailman, resulting in all layers.14:32
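A toy model of why two negated `--layer` options select everything, assuming each value is a regular expression searched against the layer name, a leading `!` negates it, and a layer runs if any option selects it. This is not zope.testrunner's code; `layer_selected` is hypothetical.

```python
import re


def layer_selected(layer_name, patterns):
    """Return True if ANY pattern selects `layer_name` (logical OR).

    A pattern starting with '!' selects every layer it does NOT match.
    """
    for pattern in patterns:
        if pattern.startswith("!"):
            if not re.search(pattern[1:], layer_name):
                return True
        elif re.search(pattern, layer_name):
            return True
    return False
```

With `["!WindmillLayer", "!MailmanLayer"]`, WindmillLayer is still picked up by the second option, so every layer runs; a single `"!(MailmanLayer|WindmillLayer)"` excludes both.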
gary_posters/run include/include14:33
deryckah14:33
deryckgary_poster, so I need the form mars suggested then, right?14:33
gary_posteryes14:33
deryckok, cool. an ec2 run killin' I shall go....14:33
gary_poster:-) k14:33
deryckthanks mars and gary_poster!14:33
gary_posternp14:33
* deryck dreads seeing the ec2 bill this month14:34
bigjoolsjml: are you too ill to help a bit with the buildd-manager timeouts?  My fix hasn't worked.14:49
abentleybigjools: Has lamont installed the new buildd (rev 74)?14:51
bigjoolsabentley: you should ask him, not me ;)14:52
abentleybigjools: because if he has, that's a bad sign, but if he hasn't, it could help with the timeouts.14:53
=== salgado is now known as salgado-lunch
=== matsubara-lunch is now known as matsubara
jmlbigjools: I can try. Wassup?15:14
bigjoolsjml: that timeout stuff I added has had no effect :(15:14
bigjoolshere's an example of the sequence:15:15
jmlbigjools: ok. so maybe we misdiagnosed the problem?15:15
bigjools2010-11-23 14:46:27+0000 [QueryProtocol,client] Resuming hassium (http://hassium..15:15
bigjools2010-11-23 14:46:32+0000 [-] Asking builder on http://hassium.ppa:8221/filecache to ensure it has file chroot-ubuntu-lucid-i386.tar.bz215:15
bigjools2010-11-23 14:46:54+0000 [Uninitialized] Scanning hassium failed with: TCP connection timed out: 110: Connection timed out15:15
bigjoolsthe timeout is 22 seconds later15:15
bigjoolsnot even the default 3015:15
bigjoolsso I reckon misdiagnosis is quite probable15:16
jmlbigjools: is your timeout stuff actually on the relevant machines?15:16
bigjoolsjml: yes, I had Tom do a paranoia grep15:18
bigjoolsI'm at a bit of a loss here15:18
jmlpoking at the code now15:19
bigjoolswe probably need some more debugging logging15:19
jmlyes. and the ability to switch it on and off in runtime15:19
jmlbigjools: do you have a traceback with that error?15:20
bigjoolsjml: no15:21
jmlbigjools: isn't that unexpected?15:21
bigjoolsit's one of the "known" errors so it doesn't run the traceback.15:21
jml            BuildSlaveFailure, CannotBuild, BuildBehaviorMismatch,15:21
jml            CannotResumeHost, BuildDaemonError, CannotFetchFile):15:21
jmlwhich one?15:21
bigjoolsit'll be getting re-raised somewhere15:23
jmlbigjools: we should also remove some of the unnecessary layers of stack toot-sweet. We have to trawl through this code frequently enough that it's a noticeable cost.15:24
bigjoolswtf is a toot-sweet?15:24
jmlsorry, an expression from childhood. "very quickly"15:25
bigjools"tout de suite"15:25
bigjools:)15:25
bigjoolsbut yeah, agree15:26
jmlbigjools: I learnt it from folk with Middlesex accents.15:26
jmlanyhow15:26
bigjoolshah15:27
jmlbigjools: my reading of the code shows that it's failing in startBuild (builder.py) after the call to resumeSlaveHost succeeds15:27
jmlbecause resumeSlaveHost prepends crap to the error message, and we don't see that in the logs15:28
jmllikewise, the error can't be coming from any of the eb_foo in startBuild, because they also mutate the error message15:28
bigjoolscorrect15:28
jmlergo, it comes from resume_done15:28
jmlor something after startBuild15:28
bigjoolsit will be the first call that tries to dispatch the chroot15:29
bigjoolsbearing in mind they are >600M I wonder if that is poking a subtle bug15:29
jmlthere doesn't seem to be anything after startBuild that does anything particularly interesting15:30
jmlbigjools: do we know what type of job this is?15:30
bigjoolsbinarypackage15:30
jmlhmm15:32
bigjoolssomething is catching that error and re-raising it15:34
jmlthat's, well, odd.15:34
bigjoolsas a known exception - otherwise we'd see a traceback15:34
bigjoolsactually - I wonder if it contains any trace info15:34
jmlbecause, afaict, it's being raised by the second line in dispatchBuildToSlave in binarypackagebuildbehavior15:35
jmld = self._builder.slave.cacheFile(logger, chroot)15:35
jmlwhich has absolutely no errback added to it until _scanFailed in manager.py15:35
jml(again, my reading only, not actually tested)15:36
jmlactually, I'm going to say some stuff with triple-x markers so the conversation can be more readily grepped for actions15:36
jmlXXX: change _scanFailed to have a different prefix to the log message for unexpected errors15:37
jmlXXX: collapse some of the unnecessary indirection in builder.py (e.g. updateStatus, updateBuilderStatus; updateBuild; _dispatchBuildCandidate)15:38
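A sketch of the first XXX item, assuming a tuple of "known" errors like the one jml quotes. The two exception classes are local stand-ins named after Launchpad's, and `scan_failed` is hypothetical, not the real `_scanFailed` in manager.py. Known errors keep the one-line message seen in the log; anything else gets a distinct prefix plus its traceback.

```python
import traceback


class BuildDaemonError(Exception):
    """Stand-in for one of the 'known' builder errors."""


class CannotFetchFile(Exception):
    """Stand-in for another 'known' builder error."""


KNOWN_ERRORS = (BuildDaemonError, CannotFetchFile)


def scan_failed(builder_name, error, logger):
    """Log a scan failure, using a distinct prefix for unexpected errors."""
    if isinstance(error, KNOWN_ERRORS):
        logger.info("Scanning %s failed with: %s", builder_name, error)
    else:
        tb = "".join(
            traceback.format_exception(type(error), error, error.__traceback__)
        )
        logger.error(
            "Scanning %s failed with UNEXPECTED error: %s\n%s",
            builder_name, error, tb,
        )
```

A grep for `UNEXPECTED` then separates surprises like the TCP timeout from routine failures.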
jmlbigjools: the other big question is why 22s15:38
jmlbigjools: what are the values for timeouts in production?15:39
bigjools180s15:39
bigjoolsI simply cannot fathom where 22 comes from15:39
jmlperhaps we're using an unexpected config setting?15:39
jmlperhaps it's 20 + noise?15:39
bigjoolsthe delay on the failures in the log is not consistent either15:39
jmlbigjools: what's the smallest?15:39
bigjoolsthat's the smallest I've seen, but it's hard to grep15:39
jmlsince we've got blocking code in prod, it's to be expected that there'll be variation15:40
bigjoolsyep15:40
bigjoolsjml: those eb_ functions are not doing much any more15:41
jmlbigjools: I can believe it15:42
jmlhmm.15:42
bigjoolsthey are used when doing the resume op :)15:43
jmlbigjools: what's in the logs roughly 3 minutes before that error message?15:43
bigjoolsnot a lot15:44
jmlbigjools: do we get the same three lines every time the error happens?15:44
bigjoolssorta15:44
bigjoolsthey're obviously spread around the file15:44
bigjoolsbut it's always the chroot dispatch as far as I've seen15:44
jmlhmm.15:45
jmlgrepping twisted code reveals it's definitely a TCPTimedOutError15:45
bigjoolsI wonder if the slave is disconnecting before it's replied?15:45
* jml looks up what ETIMEDOUT means15:46
bigjoolsjml: heh, you know what, the timeout on the slave will still be 30 seconds, right?15:49
jmlbigjools: why so?15:49
bigjoolsit's also twisted15:49
jmlhmm.15:49
jmlI don't think listening works in the same way15:50
bigjoolsit depends on what stupidity it has15:50
jmlhow is the buildd launched in production?15:50
bigjoolsit's part of init.d15:51
jmlTimeout  while attempting connection.  The server may be too busy to accept new connections.  Note that for IP sockets the timeout may be very long when syncookies are enabled on the server.15:53
jmlhah16:00
bigjoolsis that a victorious hah?16:00
jmlmaybe.16:01
bigjoolsshow me the beans16:01
jmlso, if I understand correctly, the timeout passed to connectTCP does not in fact control the UNIX-level socket timeout16:01
bigjoolsit's a callLater to cancel the Deferred I think16:01
jmlexactly16:01
jmlbut the TimeoutError that generates is different to TCPTimedOutError16:02
bigjoolsyep16:02
jmlwhich is a mapping for ETIMEDOUT16:02
bigjoolsinteresting16:02
jmlIIUC, the call to socket.connect_ex() in t/internet/tcp.py is timing out16:03
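A minimal Python sketch of the distinction jml draws here, using only the standard library and not Twisted API. A deadline enforced by the caller (the role the connectTCP timeout plays via a callLater that cancels the Deferred, or socket.settimeout() in plain Python) surfaces without an errno. The kernel giving up on connect() surfaces as errno ETIMEDOUT, which is 110 on Linux. The function name classify_connect_failure is hypothetical.

```python
import errno
import os
import socket


def classify_connect_failure(exc):
    """Report which layer gave up on a TCP connection attempt.

    Assumptions (not Twisted API): a caller-side deadline raises an
    exception without an errno, while a kernel-level failure raised by
    connect() carries errno.ETIMEDOUT (connect_ex() returns that code instead).
    """
    code = getattr(exc, "errno", None)
    if code == errno.ETIMEDOUT:
        # The kernel stopped retrying the handshake; no caller timer fired.
        return "kernel: " + os.strerror(code)
    if isinstance(exc, (TimeoutError, socket.timeout)):
        # No errno: a deadline enforced in user space, e.g. socket.settimeout().
        return "caller deadline"
    return "other: " + repr(exc)
```

Applied to the error in this log, the errno check is what separates "our timer fired" from "the OS gave up".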
=== beuno is now known as beuno-lunch
jmlunfortunately, I don't know whether the timeout is being set client-side or server-side, somewhere in Twisted, somewhere in Python or somewhere deeper16:04
bigjoolsmaybe the stack trace will tell us when I get that cowboy in16:05
bigjoolsbut I suspect not :/16:05
jmlbigjools: well, the timeout is being set in a different call16:05
* jml flicks through APUE16:05
jmlnope16:12
=== deryck is now known as deryck[lunch]
=== al-maisan is now known as almaisan-away
* bigjools otp for a bit16:30
=== almaisan-away is now known as al-maisan
=== rockstar` is now known as rockstar
lifelessmorning16:58
lifelessbigjools: how did cesium go ?16:58
lifelesshenninge: hi; how did you go on 669831?16:59
lifelessbigjools: ah, found your mail.17:03
bigjoolslifeless: yeah, badly17:07
lifelessI've suggested a feature flag to you, to let us get rid of the cowboy but keep the code path easily off/on as needed.17:08
lifeless(in mail)17:08
bigjoolslifeless: +1 to the flag.  But only if I don't work out the problem this week.17:11
lifelesssure17:12
bigjoolsmaybe I'll do it along with an attempted fix actually17:12
bigjoolslifeless: the thing that we discovered earlier is that the timeouts I changed are not firing, something else is.17:12
bigjoolswe see a TCPTimedOutError17:13
bigjoolsif it were the timeout value firing it would be a Deferred cancelled error17:13
bigjoolsthe question is, what the heck is generating that17:13
lifelessdo you generate an OOPS ?17:14
lifelessif so, its backtrace might help17:15
bigjoolslifeless: there's no trace on the exception :(17:15
bigjoolsit should get logged, but there's nada17:15
lifelessgrah17:16
bigjoolswell - there IS a traceback, it's just one line17:16
lifelessis it an OOPS ?17:16
lifelessor something else17:16
bigjoolsno, I don't generate an oops17:16
lifelesswhat's logging the error for you ?17:16
bigjoolsbecause they are routine failures17:16
bigjoolsmy code!17:16
bigjoolssee _scanFailed() in lib/lp/buildmaster/manager.py17:17
lifelessso there's a more sophisticated error checker you can create17:17
lifelesshave you seen Release It!  - the book ?17:17
bigjoolsit uses failure.getTraceback()17:17
bigjoolsI have not17:17
lifelessok, and failure.getTraceback() is neutered for some reason ?17:17
bigjools2010-11-23 16:39:28+0000 [Uninitialized] Traceback (most recent call last):17:18
bigjools2010-11-23 16:39:28+0000 [Uninitialized] Failure: twisted.internet.error.TCPTimedOutError: TCP connection timed out: 110: Connection timed out.17:18
bigjoolsthat's it17:18
lifelesshow frustrating17:18
bigjoolssomewhat17:19
bigjoolsit's thrown when we get an errno.ETIMEDOUT17:20
lifelessrighto, the pattern is Circuit Breaker17:20
lifelessit's kind of the ultimate variant of what you've implemented17:20
lifelessit's not relevant right now17:21
lifelessbut I'm going to explain why I was thinking of it17:21
=== salgado-lunch is now known as salgado
lifelessit has the idea of the thing it measures being ok, being in trouble, and being dead.17:21
lifeless*if* we got the full traceback and were only logging one line17:22
lifelessI was going to suggest logging the full line on the transition from in-trouble->dead17:22
lifelesss/full line/full thing/17:22
lifelesshowever, that's not the issue we're facing, so it was just a short side discussion.17:22
bigjoolsthat's what it's trying to do, effectively17:22
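lifeless's description (a monitored thing is ok, in trouble, or dead, and the full error detail is worth logging only on the transition to dead) can be sketched as a small state machine. This is a hedged illustration of the idea as described in the chat, with hypothetical names; it is neither the Launchpad buildmaster code nor the exact Closed/Open/Half-Open formulation in the book Release It!.

```python
from enum import Enum


class Health(Enum):
    OK = "ok"
    TROUBLE = "in trouble"
    DEAD = "dead"


class HealthTracker:
    """Hypothetical per-builder tracker for the ok / in trouble / dead idea."""

    def __init__(self, dead_after=3, on_dead=None):
        self.dead_after = dead_after  # consecutive failures before "dead"
        self.failures = 0
        self.state = Health.OK
        # Called with the full failure detail, once per entry into DEAD.
        self.on_dead = on_dead or (lambda detail: None)

    def record_success(self):
        self.failures = 0
        self.state = Health.OK

    def record_failure(self, detail):
        self.failures += 1
        if self.failures < self.dead_after:
            # Routine failure: stay quiet, just remember we are in trouble.
            self.state = Health.TROUBLE
            return
        if self.state is not Health.DEAD:
            # First time crossing the threshold: hand over the full detail.
            self.on_dead(detail)
        self.state = Health.DEAD
```

Routine failures only move the tracker to TROUBLE; the expensive, verbose logging happens once, when it enters DEAD.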
lifelessyah17:24
=== beuno-lunch is now known as beuno
lifelessso17:24
lifelesswe're getting a short tb17:24
lifelessIIRC Failure does that when the error is thrown locally and the frame above is the reactor17:25
bigjoolsthere are only 2 places the Twisted code itself can throw that error17:25
lifelessso could we be dealing with non twisted code doing a regular socket call and throwing, in a callback from some twisted code.17:25
bigjoolstwisted/internet/tcp.py17:25
=== deryck[lunch] is now known as deryck
bigjoolsdoConnect() calls self.failIfNotConnected(error.getConnectError ....17:26
bigjoolswhat I don't get is that none of that seems to be asynchronous17:26
=== benji is now known as benji-lunch
bigjoolslifeless: bear in mind that this code works absolutely flawlessly on dogfood17:27
bigjoolseven with more builders added17:27
lifelessI think- I'd need to check some lower level code - but I think that that is:17:29
lifeless'read the last error from the socket'17:29
lifelessfailIfNotConnected looks like a plausible issue17:30
lifelessso getConnectError generates a stackless exception object17:32
lifelessand twisted.python.failure when given a regular Exception with no frame data bails17:36
lifeless        elif not isinstance(self.value, Failure):17:36
lifeless            # we don't do frame introspection since it's expensive,17:36
lifeless            # and if we were passed a plain exception with no17:36
lifeless            # traceback, it's not useful anyway17:36
lifeless            f = stackOffset = None17:36
lifelessthat's why you're not getting anything useful.17:36
lifeless*I think*17:37
lifelessjml: ^ plausible?17:37
lifelessthe failure.Failure construction call is passing in a simple Exception object17:38
lifelessthat has no traceback and thus doesn't get one17:39
lifelessif we passed *no* exception in, sys.exc_info() would have been called, which gets a tb object.17:39
bigjoolsthe err is an errno17:39
bigjoolsright17:40
lifelessI'm surprised at the claim that getting a stack from scratch is more expensive than what sys.exc_info does (when the exception is triggered). But perhaps it is.17:40
bigjoolsit might be worth a quick cowboy17:40
lifeless(I mean, I know it has overhead and I avoid doing it casually myself)17:40
lifelessbut this function does one and not the other which is pretty surprising to me17:41
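The behaviour lifeless describes has a close analogue in plain Python, shown in this standard-library sketch (it does not use twisted.python.failure). An exception object that is constructed but never raised, which is what building an error from an errno amounts to, has no __traceback__, so formatting it yields a single line. One that was raised and caught carries its frames. One hedged workaround, at some cost, is to record the current stack at construction time with traceback.format_stack(); make_connect_error and raised_and_caught are hypothetical helpers.

```python
import errno
import os
import traceback


def format_error(exc):
    """Format an exception with whatever traceback it actually carries."""
    return traceback.format_exception(type(exc), exc, exc.__traceback__)


def make_connect_error(code):
    """Hypothetical helper: build an error from an errno without raising it.

    The object never passes through a raise, so __traceback__ stays None.
    As a workaround we record where it was created; format_stack() walks
    the live frames, the kind of introspection Twisted calls expensive.
    """
    exc = OSError(code, os.strerror(code))
    exc.creation_stack = traceback.format_stack()
    return exc


def raised_and_caught(code):
    """The same error, but raised and caught, so Python attaches frames."""
    try:
        raise OSError(code, os.strerror(code))
    except OSError as exc:
        return exc
```

format_error(make_connect_error(errno.ETIMEDOUT)) returns just the exception line, while the raised-and-caught version includes the "Traceback (most recent call last):" header and frames.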
bigjoolswhat I need to work out is why we get ETIMEDOUT so quickly17:42
bigjoolsTCP sockets take minutes to time out (hours?) by default17:42
lifelessdepends on the state17:43
lifelessduring connection its 30 seconds (IIRC)17:43
bigjoolsI've never heard of that17:43
bigjoolsactually, hmmn17:43
persia30 seconds of retry for a packet that won't send, but much more for just an open socket without traffic.17:44
lifelessalso 30 seconds or so for a SYN that isn't acked17:44
bigjoolsright17:44
lifelessbut blackholes17:44
lifelessbigjools: http://www.faqs.org/docs/iptables/tcpconnections.html17:45
lifelessTable 4-2. Internal states17:45
lifelessbah17:45
lifelessthat's the firewall side17:46
bigjoolsSYN_SENT2 minutes17:46
lifelesswhich has timeouts set *above* what tcp needs17:46
lifelessanyhow, default syn timeout is 30 seconds17:47
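How long the kernel waits before connect() fails with ETIMEDOUT depends on the SYN retransmission settings, so it may be easier to compute the limit than to remember it. A hedged sketch, assuming the retransmission timeout starts at initial_rto seconds and doubles after each unanswered SYN (the usual exponential backoff). On Linux the retry count can be read from /proc/sys/net/ipv4/tcp_syn_retries, while the initial timeout may differ between kernel versions, so both are inputs here rather than facts about the production builders. The function names are hypothetical.

```python
def syn_give_up_after(initial_rto, retries):
    """Seconds until connect() fails with ETIMEDOUT if no SYN-ACK ever arrives.

    Assumes one original SYN plus `retries` retransmissions, with the
    wait doubling each time: initial_rto * (1 + 2 + ... + 2**retries).
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


def read_syn_retries(path="/proc/sys/net/ipv4/tcp_syn_retries"):
    """Return the kernel's SYN retry count, or None where the file is absent."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

For example, syn_give_up_after(1, 6) gives 127 seconds and syn_give_up_after(3, 5) gives 189; plugging in the values actually configured on a builder is the useful check.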
jmllifeless: yeah, the stack analysis is correct. see #twisted for some follow-up discussion.17:48
* jml gone again17:48
lifelessjml: I see discussion on the socket, not on the poor stack info17:49
lifelessjml: thanks though17:49
bigjoolslifeless, did you see the comment "it's very easy to run out of listen queue in a python server with many short-lived connections"17:52
=== al-maisan is now known as almaisan-away
lifelessbigjools: do you make multiple outstanding SYN attempts to a single build slave?17:54
lifelesss/you/we/17:54
bigjoolslifeless: the code has no knowledge of SYN attempts, it's using xmlrpc.Proxy17:56
lifelessok17:56
bigjoolsbut it will try and connect every 5 seconds17:56
lifelessdoes it make multiple outstanding xmlrpc calls ?17:56
bigjoolsactually, 15, I changed it17:56
bigjoolsno17:56
lifelessthen that comment is irrelevant17:56
lifelessif we made multiple requests at once17:57
lifelessthen we could exceed the default listen size (8)17:57
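lifeless's argument is that the listen backlog only matters if connections pile up faster than the slave accepts them, and a poller that waits for each XML-RPC reply before making the next call never has more than one connection outstanding. A hedged sketch of that check, with hypothetical names: given (start, end) times for one client's connection attempts, count the peak number outstanding and compare it with a backlog size. It models a single client only; the real backlog is shared by every connection the server has not yet accepted, and the figure of 8 comes from the chat, not from checking the buildd.

```python
def peak_outstanding(calls):
    """Largest number of connection attempts open at the same moment.

    `calls` is a list of (start, end) times in seconds. At equal times an
    end (-1) sorts before a start (+1), so back-to-back calls do not overlap.
    """
    events = []
    for start, end in calls:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def could_overflow_backlog(calls, backlog):
    """True if this client alone could have more pending connections than `backlog`."""
    return peak_outstanding(calls) > backlog
```

A serial poller every 15 seconds with 2-second calls peaks at 1; only a burst of simultaneous calls could approach the backlog.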
bigjoolsthere must be some other difference between dogfood and production17:59
bigjoolsI have not seen this problem *a single time* on DF17:59
bigjoolsand now I have to go to dinner17:59
bigjoolsI'll catch up with you again later18:00
sinzuiflacoste, mumble18:34
=== maxb_ is now known as maxb
=== benji-lunch is now known as benji
sinzuiflacoste, https://bugs.launchpad.net/launchpad-registry/+bug/62177819:21
_mup_Bug #621778: Register project from source package should include homepage URL <qa-ok> <Launchpad Registry:Triaged by jelmer> <https://launchpad.net/bugs/621778>19:21
lifelessthumper: how was pyconz19:36
lifelessfinally19:45
lifelesswe're getting back into shape19:45
lifelessTime Out Counts by Page ID19:45
lifelessHardSoftPage ID19:45
lifeless773783Archive:+index19:45
lifeless54162BugTask:+index19:45
lifeless26298Distribution:+bugs19:45
lifeless2599ProjectGroupSet:CollectionResource:#project_groups19:45
lifeless1746DistroSeries:+queue19:45
lifeless1510Person:+commentedbugs19:45
lifeless12256POFile:+translate19:45
lifeless92Person:+bugs19:45
lifeless64DistributionSourcePackage:+changelog19:45
lifeless61Person:+related-software19:45
lifeless-> hospital for allergy vaccination, back in ~3 hours.20:00
lifelesson mobile if needed20:00
=== matsubara is now known as matsubara-afk
=== salgado is now known as salgado-afk
flacostethumper: field.setTaggedValue('has_structured_doc', True)21:45
flacostewidget.context.queryTaggedValue('has_structured_doc')21:46
flacostefield = Attribute('The <a>link</a>')21:48
flacostefield.setTaggedValue('has_structured_doc', True)21:48
flacostefield = has_structured_doc(Attribute(...))21:49
flacostefield = exported(has_structured_doc(Attribute(...)))21:49
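flacoste's snippets describe marking a field's docstring as structured with a zope.interface tagged value and reading it back from the widget. Here is a runnable sketch of that pattern. has_structured_doc mirrors the name used in the chat but is written here as a hypothetical helper, doc_is_structured is likewise hypothetical, and a small stand-in class is used only if zope.interface is not installed. The key details are that the setter and the query use the same tag string, and that the helper returns the field so it can be nested inside exported(...) as in the last line above.

```python
try:
    from zope.interface import Attribute
except ImportError:
    class Attribute:
        """Stand-in with the two tagged-value methods used below."""

        def __init__(self, name, doc=""):
            self.__name__ = name
            self.__doc__ = doc
            self._tags = {}

        def setTaggedValue(self, tag, value):
            self._tags[tag] = value

        def queryTaggedValue(self, tag, default=None):
            return self._tags.get(tag, default)


# Hypothetical tag name; the setter and the query must agree on it.
STRUCTURED_DOC_TAG = "has_structured_doc"


def has_structured_doc(field):
    """Mark a field's docstring as containing markup; return the field for nesting."""
    field.setTaggedValue(STRUCTURED_DOC_TAG, True)
    return field


def doc_is_structured(field):
    """What a widget would check via widget.context.queryTaggedValue(...)."""
    return bool(field.queryTaggedValue(STRUCTURED_DOC_TAG, False))
```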
lifelessback23:08
pooliehi lifeless, flacoste23:18
lifelesshiya23:18
lifelessflacoste: question for you23:20
lifelessflacoste: if you're still around23:20
flacostelifeless: will be gone after i hit send on that email23:20
flacostebut shoot23:20
flacostehi poolie23:20
poolielifeless: i think jam would like someone (maybe not you) to state that creating bin/bzr is/isn't the most tasteful way to do it23:21
flacostepoolie: i say it is23:21
pooliejam: ^^23:21
flacostepoolie, jam: try talking to gary tomorrow for help23:21
lifelessflacoste: rt 41361 - I mailed you23:21
flacostelifeless: yes, i saw that23:21
pooliethat's what i said too :)23:21
lifelessjam: creating bin/py is the most tasteful way to do it.23:22
lifelessjam: I just don't know the machinery in that stack yet.23:22
lifelessflacoste: did I make sense? I didn't see a reply.23:22
flacostelifeless: i understand it, i'll have to see how it fits resource wise23:22
flacostesince it needs some amount of coordination on our side23:23
lifelessflacoste: very small :) - I've done the heavy lifting.23:23
flacostewell23:23
lifelessflacoste: anyhow, shoo.23:23
flacosteit requires doing measurements23:23
flacosteand saying +1 / -123:23
flacostethat's not very small in my book23:23
lifelessflacoste: that doesn't need coordination23:23
lifelessflacoste: that can be done weeks or months later, if needed.23:24
flacosteit needs somebody on the lp side to work with the losa23:24
flacostehmm, ok23:24
flacostebut23:24
flacosteactually23:24
flacosteis it that important to do just one23:24
flacosteif we don't assess and then deploy it across the board?23:24
lifelessit's important to get some headroom23:25
lifelesswe can't do this on all the servers [yet] - not enough ram23:25
thumperflacoste: why google docs???23:25
thumpermy normal gmail login can't see it23:25
lifelessif this works we can probably get headroom without more hardware.23:25
flacostethumper: i'm sending a normal email23:25
* thumper throws hands up23:25
thumperflacoste: thanks23:25
lifelessthumper: a logout/login can help.23:26
lifelessthumper: also turn on MultiSession23:26
thumperI don't know what multi session is23:26
lifelessthumper: there's a link in my facebook feed :)23:26
lifelessthumper: when I whinged about logins obliterating each other on google apps/gmail23:27
flacostelifeless: but my gut feeling is that completing RFWATD takes precedence23:27
lifelessflacoste: that's crucial as well.23:27
lifeless:)23:27
lifelessthey're all critical! :>23:27
lifelessflacoste: seriously, gnight.23:28

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!