[00:30] wgrant: the PPA quotas are precisely as arbitrary as the recipe volume quota [00:30] the PPA quota is less of a surrogate [00:32] lifeless: Well, arbitrary is perhaps not the best word. [00:32] But the PPA quota does what it's designed to do. [00:32] Places a hard, known limit. [00:32] The recipe one does not. [00:33] I'm having some trouble following your detailed description of whats wrong; perhaps you could give me a summary instead? [00:33] The PPA disk quota prevents users from using more disk than they should. [00:34] The recipe build quota is presumably to stop users from using more buildfarm time than they should. [00:34] But it limits how many times they can use it. [00:34] Not how much of it they use. [00:34] right [00:34] as I said, its a surrogate. [00:34] I'd welcome a series of bugs that would allow measuring and allocating the actual resource. [00:38] hi there [00:38] jml wrote some code to suck lp bugs into a desktopcouch db [00:38] i was thinking in the shower this could possibly even be offered as a standard facility [00:39] which would be pretty cool: cheap async (at least readonly) access [00:39] this is very blue sky of course [00:39] maybe [00:40] I'd want couch to have an OOM less sysadmin time first :) [00:40] :) [00:40] (and sadly, I'm not joking) [00:40] * poolie reinterprets that as "order of magnitude" not "out of memory" [00:40] yeah [00:41] perhaps i'll look at his client code someday === al-maisan is now known as almaisan-away [00:45] lifeless: your shoes are now available to be filled - help me find someone good? [00:46] poolie: certainly. [00:46] Have you tweeted yet ? [00:46] i can do that [00:52] http://www.google.com/instant/#utm_campaign=launch&utm_medium=van&utm_source=instant [01:00] why the GA tracking codes? (utm_* are google analytics tracking codes. fwiw) [01:01] spm: it was in my browser bar [01:01] copy-paste, you think I read these things? 
[01:01] fair enough :-) [03:08] does LoginToken:+validategpg talk to the gpg servers? [03:08] * lifeless is guessing it does [03:12] I wonder, should oauthnonces go in the sessiondb [03:13] hi folks, I created a couple projects recently without anything in them yet. would it be simpler to ask to rename them, or create another couple projects and ask to remove them? [03:13] poolie: lp in desktopcouch> standard facility where? [03:13] cr3: rename should be easy enough [03:13] cr3: unless you have mailing lists or ppas [03:13] james_w`: well, if i ever get around to it, i would look about writing an apis-couch daemon [03:14] eventually, and this is utter pie-in-the-sky, it would be cool to just have something like bugs.launchpad.net/bzr/+couch [03:14] and to through that means get a whole copy of them [03:14] apis-couch? what would that do? [03:15] james_w`: map lp into couch? [03:15] lifeless: nothing yet, so can I ask in the channel or is there a preferable avenue? [03:15] CHR should be able to help in #launchpad [03:15] failing that follow the instructions in the topic there. [03:15] :) [03:16] lifeless: well, I have a project already that doesn't need new daemons etc. [03:16] "CHR"? is that a nick or an acronym I'm not familiar with? [03:17] I'm interested in whether poolie has other ideas [03:17] lifeless: by the way, might you have a moment to chat about the progress of the results tracker? [03:17] james_w`: i have very few ideas ;) other than that having it in a local db that can sync smartly with the server would be nice [03:17] what's your project? [03:17] txrestfulclient === nigelb_ is now known as nigelb [03:18] poolie: it's almost transparent to the app whether it is talking to LP or a couchdb copy of the data. 
The non-transparent part is that if talking to couch it has extra capabilities to push stuff back to lp [03:19] really, wow [03:23] james_w`: i'll check it out [03:24] I'd love some help getting it up to standard [03:24] it's not got full coverage of the capabilities of the API yet [03:25] rolling it in to my attempt to write a twisted API library wasn't the best thing [03:27] james_w`: I'd love to see that split out [03:27] james_w`: there is a use case in LP for a txlaunchpadlib [03:27] james_w`: but we wouldn't want the couch stuff mixed in [03:28] lifeless: the code is independent at that level [03:28] lifeless: but the couch feature relies on the tx code as I have written it, so people can't play with that until they use twisted [03:29] (and couch is only one possible implementation of the required interface, it just so happens that a JSON-based document store is kind of handy for this) [03:30] lifeless: sleep time, I'll catch up with you another time about the results tracker. cheerio [03:30] cr3: ciao [03:30] the latency is so high doing it on twisted would be good [03:32] james_w`: so your code can use couch as a cache? [03:32] poolie: "cache", yes [03:33] poolie: you talk to LP and it sticks the documents it gets back in to couch before returning them to you [03:33] poolie: at any time you like you can reconfigure the client to talk directly to couch, and you will get those documents back again. [03:33] that's the read-only part [03:34] then you can make changes, and it will store the modifications, and give you the updated information if asked for it again [03:35] sweet [03:35] then you can ask it to iterate the modifications and send them back to LP, and the collision detection will naturally act to prevent problems there [03:35] so this just all works over the existing restful protocol? 
[03:35] there are still a bunch of things that need work, and I'm not sure whether the approach taken will ever get us to 100%, but it does have an elegance [03:36] poolie: yep [03:36] isn't this what someone demo'd at last UDS? [03:37] poolie: with a way to replace that restful protocol with queries in to couch [03:37] nigelb: yeah, me, very shoddily [03:37] james_w`: lol, laptop not working et al ;) [03:37] exactly [03:38] * nigelb hugs james_w` :) [03:44] james_w`: now do this for notmuch pls [03:46] mwhudson: one day [03:47] though I think I should try and add more moving parts next time [03:57] wgrant: something I don't get… I'm to keep a bfj fk in my new TranslationTemplatesBuild table—but where does it come from? I see build_farm_job being set all over the place, but that all refers to BuildFarmJobOld stuff. [04:22] * poolie tries txrestfulclient [04:22] and whacks in to bug 461356 [04:22] <_mup_> Bug #461356: desktopcouch-service crashed with ImportError in ()
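[note] The caching behaviour james_w` describes (fetch from the server and keep a copy, switch to reading the local copy, record local modifications, then push them back with collision detection) can be sketched roughly as below. This is only an illustration, not txrestfulclient's code: `FakeRemote`, `CachingClient`, the revision counter and the plain dict used as the local store are all assumptions made for the example, and a dict stands in for couch.

```python
class FakeRemote:
    """Stand-in for the remote web service; NOT the real Launchpad API."""

    def __init__(self):
        self.docs = {}  # url -> (revision, document)

    def get(self, url):
        return self.docs[url]

    def put(self, url, document, expected_revision):
        revision, _ = self.docs[url]
        if revision != expected_revision:
            # Someone else changed the document since we fetched it.
            raise RuntimeError("collision: %s changed remotely" % url)
        self.docs[url] = (revision + 1, dict(document))


class CachingClient:
    """Hypothetical client that keeps a local copy of everything it fetches."""

    def __init__(self, remote):
        self.remote = remote
        self.store = {}      # local document store (a dict here, not couch)
        self.pending = []    # urls modified locally and not yet pushed
        self.offline = False

    def get(self, url):
        # Online and unmodified: fetch, store a copy, then answer from the copy.
        if not self.offline and url not in self.pending:
            revision, document = self.remote.get(url)
            self.store[url] = (revision, dict(document))
        return dict(self.store[url][1])

    def modify(self, url, **changes):
        # Record the change locally; nothing is sent to the server yet.
        revision, document = self.store[url]
        document.update(changes)
        if url not in self.pending:
            self.pending.append(url)

    def push(self):
        # Replay local modifications; the remote rejects stale revisions.
        for url in list(self.pending):
            revision, document = self.store[url]
            self.remote.put(url, document, expected_revision=revision)
            self.store[url] = (revision + 1, document)
            self.pending.remove(url)
```

Setting `client.offline = True` corresponds to "reconfigure the client to talk directly to couch"; `push()` is the "iterate the modifications and send them back" step, and it stops at the first collision.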

[04:35] spm: suppose we could zap the first 8 months of successful-updates.txt? [04:35] sure. one sec [04:35] spm: of this year! [04:36] [I think it goes back forever, so keep a copy [04:36] but its getting a tad large] [04:36] spm: also, whats it up to? [04:38] successful-updates-2008.txt (already existed) also now have successful-updates-2009.txt [04:38] heh [04:39] gmm [04:39] is there a system load average python module already, I wonder [04:40] import system.load.averages ? [04:40] which is spm doing the equivalent of import icanfly [04:44] lifeless: os.loadavg() [04:44] er no [04:44] getloadavg [04:45] right [04:46] bug https://bugs.edge.launchpad.net/oops-tools/+bug/243554, freshly updated [04:46] <_mup_> Bug #243554: oops report should record information about the running environment
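[note] The call mwhudson lands on is `os.getloadavg()`, which is in the standard library. A minimal sketch of recording it alongside other environment details might look like this; `environment_snapshot` and the shape of the returned dict are assumptions for illustration, not anything from oops-tools.

```python
import os


def environment_snapshot():
    """Collect a few facts about the running environment (hypothetical helper)."""
    try:
        one, five, fifteen = os.getloadavg()
        loadavg = {"1min": one, "5min": five, "15min": fifteen}
    except (AttributeError, OSError):
        # os.getloadavg() is Unix-only and raises OSError when the
        # load average cannot be obtained.
        loadavg = None
    return {"pid": os.getpid(), "loadavg": loadavg}
```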

[04:48] I wonder if time.clock() is pid wide or pid wide :P [04:52] lifeless: one of those was supposed to be thread? [04:53] mwhudson: being droll about linux clarity in this area [04:53] heh [04:53] we can use clock_gettime(CLOCK_THREAD_CPUTIME_ID) though [04:54] lifeless: would you mind if I just made that Librarian change now? [04:55] jtv: I don't mind when you do it :) [04:55] lifeless: I've just pulled in the information from /proc before [04:55] lifeless: :) [04:55] jtv: if you mean ...'and sneak it in the release', that would be risky, wouldn't it? [04:56] lifeless: that would be, and it's not what I had in mind. Thinking more of avoiding being the subject of future "what flaming idiot made this horrible change!?" inquiries [04:57] jtv: I'd hope noone in the team would take that attitude following something up [04:57] jtv: and the only risk I know of, is the one I mentioned: the librarian db access is currently very tightly encapsulated; it needs to stay that way. [04:57] Well, so to speak. The thing is, I'm not 100% happy about making an API where you pass "either an object or its id." [04:57] jtv: so don't do that. [04:58] jtv: make a separate pass-the-object API (perhaps on the object :P) [04:58] Ahh [04:58] and have the current id based function delegate [04:58] Now, what is the reason for the tight encapsulation? [04:58] because its in twisted [04:58] so its called via deferToThread [04:58] * jtv likes reasons—easier to remember than rules :) [04:58] it can do DB access in the thread [04:59] it cannot outside of it, or all other requests in progress will block. [04:59] I thought it ran as a separate process? [04:59] jtv: if you aren't touching code used in the librarian /server/ this won't matter - but I don't know exactly what you're touching (and be sure to check for imports :)) [04:59] jtv: the librarian is a twistd process, yes. 
[05:00] lifeless: I'm touching stuff in canonical.launchpad.librarian and canonical.librarian.client, but nothing in server. [05:00] in the process it has a mainloop, and worker threads; the worker threads do DB access, the mainloop does all the business logic (except DB access) [05:00] jtv: the server is in canonical.launchpad.librarian [05:01] So client.py is sort of exceptional in there? [05:01] I mean, FileDownloadClient does run client-side, right? [05:02] it might be nice to have the twisted code more visually distinct (e.g. in a submodule, or move the client to lp.services.librarian.client, or something. [05:02] I don't want to make the scope bigger on you :) [05:02] Yes. [05:02] I'm not going to do anything along those lines, no. :) [05:04] so, to answer your question, I presume so, but I'd need to check. [05:04] All I'd advising is a little caution and investigation in this area, as we have two dramatically different programming models in play here, and they mix poorly. [05:04] speaking of which. [05:05] ? [05:05] spm: are were there yet ? its been an hour. [05:05] jtv: the speaking of which was a joke. [05:05] :-| [05:06] jtv: the line before wasn't. [05:06] That much was clear. The joke I still don't get. [05:06] oh, it wasn't a very good one [05:06] it was leading into the spm: line [05:07] ah [05:07] lifeless: still, better to have the other shoe dropped :) [05:13] taking a breather; I'll be back to heckle spm later [05:13] lifeless: I think there's a better solution for the librarian problem: it's a bad internal distribution of responsibilities. In _getPathForAlias, the LFA is loaded _only_ to determine that it's visible. The actual work doesn't involve the object at all. [05:14] nm; take your breather [05:15] poolie: ping [05:16] james_w`: a teeny patch for you [05:16] EdwinGrubbs: hi there [05:24] poolie: I have some questions about the preferred way to use the apport format. 
The oops currently groups the request variables together, but it seems cleaner to use email.message.Message than to use another ProblemReport to make a hierarchy, so that I don't end up with multiple Date and ProblemType fields. [05:25] poolie: I also wondered if I should use the Stacktrace field for python stacktraces, or if it would be better to only use that for stacktraces created by gdb. [05:26] EdwinGrubbs: you can look at bzrlib.crash to see what we do [05:26] we use Traceback for the python traceback [05:27] which i think is consistent with what other python programs use [05:27] i would probably have one thing RequestVariables containing all the variables [05:27] either just as they are in the url, or decoded [05:32] poolie: oh, request variables in the oops actually is all the cgi variables like HTTP_REFERER, REQUEST_METHOD, etc. and not just the query string. I saw in the apport file format pdf that some of the hierarchical variables are stored as "name=value", but it seems more consistent to me to use "name: value". I'm trying to decide whether to use email.message.Message.as_string(), or ProblemReport().write(StringIO()) to create a [05:32] hierarchy. [05:33] so you agreed with gary to do it in apport now anyhow? [05:33] i'd probably just pprint a python dict [05:43] lifeless: looks like it came back about 5 mins ago [05:44] poolie: well, I spent today determining how easy it would be to do now. I just saw Gary's email that he preferred to do it in the long term. However, my original solution, was almost identical to ProblemReport.write_mime() except that I don't base64 encode things and that it handled the special case of the request variables hierarchy. I really think we should use apport now. oops-tools will still be able to process the old oops [05:44] format, so the differences in the apport format don't cause any implementation problems. [05:49] EdwinGrubbs: so they're not really a ahierarchy, are they? 
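[note] poolie's suggestion of a single RequestVariables field holding a pretty-printed dict, or the variables as they would appear in a URL, could look something like this. The variable values and helper names below are invented for illustration; only `pprint.pformat`, `ast.literal_eval`, `urllib.parse.urlencode` and `urllib.parse.parse_qsl` are real standard-library calls.

```python
import ast
from pprint import pformat
from urllib.parse import parse_qsl, urlencode

# Example data only; a real report would take these from the request.
request_variables = {
    "HTTP_REFERER": "https://example.invalid/previous",
    "QUERY_STRING": "field=value",
    "REQUEST_METHOD": "GET",
}


def format_request_variables(variables):
    """Render the flat dict of strings as one pretty-printed field value."""
    return pformat(dict(variables))


def parse_request_variables(text):
    """Read back a value written by format_request_variables()."""
    return ast.literal_eval(text)


def encode_request_variables(variables):
    """Alternative: one urlencoded string, sorted for stable output."""
    return urlencode(sorted(variables.items()))


def decode_request_variables(text):
    """Read back a value written by encode_request_variables()."""
    return dict(parse_qsl(text))
```

Both forms keep the whole report flat: the request variables become one string-valued field rather than a nested report.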
[05:49] i mean it's just a dict of strings [06:01] poolie: right, I just meant that the whole problem report is a hierarchy, since the request variables contains multiple values. [06:05] the simplest thing that could work is to pprint an array or put them in urlencoded [06:06] istm that using email formatting would be complicated, might break, and wouldn't help [06:06] ditto nested appotr [06:10] wgrant: seems I needed to create the BuildFarmJob from the factory for my custom job type. Passing tests again. [06:22] spm: ok cool [06:22] spm: so, can we enable profiling, and make the load be good ? [06:24] lifeless: hey, what ever happened with SSL improvements? Was just reviewing past threads.. [06:28] SpamapS: theres an RT ticket open to increase the cache length [06:28] (for idle keys) [06:29] and theres another open to get me access to the DC apache front end over a VPN + HTTP; I can then test a FE SSL here [06:29] ah cool. I have used distcache for mod_ssl in the past to great effect before btw. ;) [06:30] I'm not sure if we have dual apache or not [06:30] I suspect not [06:31] jtv: uhm, doesn't the name from the the LFA too ? [06:31] wow distcache's last release was in 2004 .. man its been so long since I setup an actual SSL server .. got BigIP's to do it a while back and have just been soft on SSL ever since. ;) [06:31] lifeless: yes, as usual I saw my mistake right after I said it—but no reason to keep you at the time. ;) [06:32] lifeless: can there be thread/process boundaries in this call chain that I would not see at all? [06:33] well, the txrestfulclient hello world passes again [06:33] that's something [06:34] but also probably enough for now [06:36] jtv: deferToThread in the librarian is the call boundary [06:36] jtv: when it returns from the callable supplied to that function, it comes back across the thread. 
[06:38] lifeless: I don't see that happening anywhere in the call chain from the first fetch of the LFA to the redundant second fetch—I guess that means that it's safe to re-use the same LFA object. [06:43] lifeless: is back in profling mode [06:43] pro-fling. hrm. maybe not quite. profiling tho.... [06:45] heh [06:45] spm: and hows the load ? [06:45] dropping. 1 3 4 atm [06:46] spm: is it running with/without the patch ? [06:46] good question... [06:46] spm: and are background jobs still disabled ? [06:46] with-out [06:46] yup; just nowish. [06:46] ok thats cool [06:46] spm: have they been killed :P [06:47] oh yes. legit excuse to kill cronjobs off? opportunities like this are rare and to be enjoyed post haste! [06:47] ok, profiling things now [06:48] first off, bugtask:+index [06:49] and now ubuntu/++assignments [06:50] ok, 7 seconds rather than 16 [06:50] spm: the patch I put up the other day [06:50] hm? [06:51] spm: can we put that on again? [06:51] do you have a paste handy? [06:51] said file has been overwritten since [06:51] one sec , finding/making [06:52] http://paste.ubuntu.com/489589/ [06:56] spm: ^ [06:56] ta [06:57] sorry - was horribly distracted doing the push code for the release; and made the shocking discovery that praseodymium does NOT have sl installed. [06:57] naturally this is a critical/urgent problem and moves to rectify needed to be made. [06:58] indeed [06:58] sysadmin porn comes first, of course. [06:58] hahaha [06:58] darn, I typed that :( [06:59] :P [06:59] * SpamapS cannot disagree with that [07:00] * SpamapS prepares a petition to have sl added to the server cd seed [07:03] there. nicely taken wildly out of context. [07:03] rotfl [07:04] * spm bows at the appreciation [07:04] the fine art of context free quoting - choosing the title [07:05] damn, seems somebody beat me to it. ;) [07:05] lifeless: restarting with the patch; give it a few [07:05] thanks [07:05] spm: your title was better than mine. 
:) [07:05] heh [07:05] well done [07:06] blink. something crash nicely on the restart [07:09] wow. something is really not right here... [07:09] details? [07:09] oh ffs. it's doing a staging rollout AGAIN! aARGH [07:10] <...> [07:10] * spm grumps off and puts in the lock file on sourcherry. [07:12] i've killed the crontab entry as a savage "don't do that" for now. I'll see if I can manually get the app server on asuka back to right'n'goodness [07:15] jtv: Sorry, I'm not completely down with the latest implementation details. [07:15] wgrant: I think I've done all I know I should do… question now is: what next? [07:16] jtv: You have BuildFarmJob rows now? [07:16] Yes [07:16] I had to create them myself, which from what I see elsewhere seems to be the way. [07:16] I suggest talking to noodles. [07:17] Yeah [07:17] How much do you still depend on BranchJob? [07:18] I haven't gone through the details, but I thought it depended mainly on how much the dispatch machinery still depends on TranslationTemplatesBuildJob. [07:19] I mean, there's not much in there other than methods the build farm needs. [07:19] True. [07:19] OK, so you've probably done your bit for now, but talk to noodles. [07:19] If the build farm can start dispatching TranslationTemplatesBuilds instead of TranslationTemplatesBuildJobs, I'm either there or very close. [07:19] That's the next step. [07:19] Yes, I will definitely talk to him once he appears—he also promised me a review this morning. :) [07:20] Ah, excellent. [07:21] I'm sort of eager to start cleaning out the old stuff, and sort of not looking forward to it at the same time. :-) [07:32] spm: and the conclufsion is? [07:33] ARGH [07:33] am working on getting that to argh <== atm [07:33] ij [07:33] ok [07:33] bbs [07:33] can you kick the profile rsync in the interim? [07:34] so kicked [07:36] try to start the patched and profiling "new" version... [07:36] trying. [07:48] it's still "starting".... [07:50] lifeless: wooo. it's started. have at it. 
[07:55] spm: load is still low ? [07:55] spm: and does it have both patches, or only the query changing one? [07:58] spm: please kick the profile rsync - thanks [07:59] kicked and very low, 0.22 0.35 0.71 [07:59] thanks [07:59] spm: which patch(es) did it have? [08:00] lib/lp/blueprints/model/specification.py and the profiling on [08:00] uhm, both patches change that file :P [08:01] haha [08:01] have alook [08:01] does it change the column definitions [08:01] one sec. just trying to stop a db from faceplanting [08:01] or the query [08:01] kk [08:07] lifeless: appears to be this one at a cursory glance at the first few lines: http://paste.ubuntu.com/489589/ [08:08] ok, could you appyly the other as well ? [08:11] heh sure, you have a paste handy? [08:13] yes [08:13] hang on while I check the backlog [08:13] ta [08:14] just doing about 17 bazzilions things at once atm. [08:14] spm: Like notmal [08:14] Er, normal [08:14] .... [08:14] http://pastebin.com/E7hMnL28 [08:14] spm: ^ [08:14] ta [08:14] gimme 5-10; just need to disable.notify a bunch of things in prep for the release in 45. [08:15] sure [08:22] good morning === spm changed the topic of #launchpad-dev to: Launchpad down/read-only from 0800-1100 UTC for a code update | Launchpad Development Channel | Week 3 of 10.09 | PQM is CLOSED | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [08:41] spm: I'll check back in regularly till you say its done... I can see you're busay [08:42] ta [08:46] * lifeless starts coming up with ideas to make rollouts even shorter [08:49] lifeless: did you talk with deryrck about my theory of the cause of the still remaining problems of the apprt retracer? that the librarian is simply queueing request from the app server for too long? [08:52] adeuring: it was discussed; IIRC it was diagnosed as missing firewall rules for some app servers [08:53] spiv: intersting.... 
do you have any details? [08:54] adeuring: see the #is and/or #launchpad-code logs from about 7 hours ago [08:54] spiv: thanks! [08:57] Hi [08:58] adeuring: hi [08:59] hi lifeless [08:59] adeuring: tcp connect timeout default is 30 seconds IIRC (you need to wait for the MSS, again IIRC) [08:59] if the librarian was doing that with 5 concurrent requests we'd be sunk, also its careful to do lots of incremental bits of work [09:00] so I'd expect a-diskio * 4 peak slowness, - a second or two tops - not 30 [09:00] adeuring: which is why I looked elsewhere [09:00] adeuring: now, 4 is the number of threads our appservers have [09:01] so 5 spilling you into a new appserver was a reasonable assumption :) [09:01] right [09:01] I'm changing our firewall rules so that we REJECT rather than DROP cross-site requests to make diagnosis of this kind of thing easier, FWIW [09:02] elmo: thank you! [09:02] lifeless: just tried my scrpit with 5 concurrent requests -- looks a bit better [09:02] adeuring: a bit? [09:02] no errors [09:02] thats a lot better then :P [09:03] lifeless:i think we should do more logging of what happens on the librarian server, [09:03] lifeless: i've been reading a book ian gave me 'prefactoring' [09:04] adeuring: We should strike a balance between non and too much... note that the librarian does OOPS as of this rollout (or was it last one) [09:04] it's a bit basic but it has some nice suggestions along the lines of your guide there [09:04] I think QA haven't added it to the daily reportcard yet. [09:04] poolie: intereseting [09:04] poolie: can I borrow it @ UDS ? [09:04] if you remind me several times closer to the date :) [09:05] poolie: is this close enough? [09:05] poolie: how about now? 
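[note] The diagnostic point behind elmo's REJECT-rather-than-DROP change is that a connection attempt which is actively refused fails fast with an error, while one whose packets are silently dropped just hangs until the connect timeout. A small probe that tells the cases apart from the client side might look like this; `probe` and its return values are assumptions for illustration, and how a given firewall rule surfaces to the client depends on its configuration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connect attempt (hypothetical helper).

    Returns "open", "refused", "timeout" or "error".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer or something on the path answered with a refusal.
        return "refused"
    except socket.timeout:
        # Nothing came back before the deadline, as with dropped packets.
        return "timeout"
    except OSError:
        # Other immediate failures, e.g. host or network unreachable.
        return "error"
```

A result of "timeout" from a host that should be reachable points at packets being dropped somewhere on the path rather than at a slow server.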
[09:05] lifeless: applied that 2nd patch as well; restarting now [09:05] Haha [09:05] uh [09:05] good night :) [09:05] poolie: :P [09:05] poolie: I'll remind you just before we go [09:05] lifeless: well, I think the issue is not necessarily a bug in code or anything -- just that the librarian can't handle requsts fast enough [09:05] adeuring: I'm not aware of issues like that [09:06] adeuring: or data suggesting we have them; certainly I agree that we *need to be able to diagnose such things* [09:06] morning all [09:06] and if the logs are insufficient, we should increase them till they are. [09:06] lifeless: right [09:06] adeuring: we're now logging librarian times in the appserver for downloads; we can add uploads easily [09:07] hmm [09:07] for uploads we should also add the size, perhaps in the closing bit of downloads too [09:07] lifeless: ok, that would help. but i suspect that these connection timeouts are caused by the librarian, not the app server [09:07] adeuring: which timeouts? [09:08] adeuring: if you mean the ones apport has been having, that show up as 500 errors from the API with timeout to mizuho in them... [09:08] adeuring: they were a firewall [09:08] adeuring: the evidence I've seen suggests the librarian server is coping just fine [09:08] adeuring: why do you think otherwise? [09:08] lifeless: well, my little script causes them just fine even now [09:08] adeuring: it does? [09:09] yes, if it starts 8 concurrent requests [09:09] can you pastebin the appserver trace? [09:09] 5 seems to be better [09:09] or does it seem to be identical? [09:09] "just now", you mean during a rollout? ;) [09:09] adeuring: oh hang on. lollolllollololl [09:10] adeuring: launchpad is down. Readonly mode. Zer iz no upload possible because the librarian is switched off [09:10] or meant to be; if you're successfully uploading there is something really wrong. [09:10] lifeless: yeah, ok, that's another possible cause [09:10] but... when exactly was the rollout started? 
[09:10] About 10 minutes ago. [09:10] 11 minutes ago [09:11] ok.. hard to be sure then when exactly i ran the script again.... [09:11] ok, let's try again once the rollout is done [09:11] yes [09:11] if you can provoke a connection timeout error, its a definite bug. [09:11] My first reaction is to look elsewhere than the librarian [09:12] adeuring: so, connection timeouts are really unlikely to be a problem in the librarian server in my opinion [09:12] for a bunch of reasons. [09:12] but we can't exclude it; lets just keep the net broad. [09:12] spiv: well, we _could_ see them [09:12] e.g. today we found a concrete problem with the firewall config [09:12]