[00:30] wgrant: the PPA quotas are precisely as arbitrary as the recipe volume quota [00:30] the PPA quota is less of a surrogate [00:32] lifeless: Well, arbitrary is perhaps not the best word. [00:32] But the PPA quota does what it's designed to do. [00:32] Places a hard, known limit. [00:32] The recipe one does not. [00:33] I'm having some trouble following your detailed description of whats wrong; perhaps you could give me a summary instead? [00:33] The PPA disk quota prevents users from using more disk than they should. [00:34] The recipe build quota is presumably to stop users from using more buildfarm time than they should. [00:34] But it limits how many times they can use it. [00:34] Not how much of it they use. [00:34] right [00:34] as I said, its a surrogate. [00:34] I'd welcome a series of bugs that would allow measuring and allocating the actual resource. [00:38] hi there [00:38] jml wrote some code to suck lp bugs into a desktopcouch db [00:38] i was thinking in the shower this could possibly even be offered as a standard facility [00:39] which would be pretty cool: cheap async (at least readonly) access [00:39] this is very blue sky of course [00:39] maybe [00:40] I'd want couch to have an OOM less sysadmin time first :) [00:40] :) [00:40] (and sadly, I'm not joking) [00:40] * poolie reinterprets that as "order of magnitude" not "out of memory" [00:40] yeah [00:41] perhaps i'll look at his client code someday === al-maisan is now known as almaisan-away [00:45] lifeless: your shoes are now available to be filled - help me find someone good? [00:46] poolie: certainly. [00:46] Have you tweeted yet ? [00:46] i can do that [00:52] http://www.google.com/instant/#utm_campaign=launch&utm_medium=van&utm_source=instant [01:00] why the GA tracking codes? (utm_* are google analytics tracking codes. fwiw) [01:01] spm: it was in my browser bar [01:01] copy-paste, you think I read these things? 
[01:01] fair enough :-) [03:08] does LoginToken:+validategpg talk to the gpg servers? [03:08] * lifeless is guessing it does [03:12] I wonder, should oauthnonces go in the sessiondb [03:13] hi folks, I created a couple projects recently without anything in them yet. would it be simpler to ask to rename them, or create another couple projects and ask to remove them? [03:13] poolie: lp in desktopcouch> standard facility where? [03:13] cr3: rename should be easy enough [03:13] cr3: unless you have mailing lists or ppas [03:13] james_w`: well, if i ever get around to it, i would look about writing an apis-couch daemon [03:14] eventually, and this is utter pie-in-the-sky, it would be cool to just have something like bugs.launchpad.net/bzr/+couch [03:14] and to through that means get a whole copy of them [03:14] apis-couch? what would that do? [03:15] james_w`: map lp into couch? [03:15] lifeless: nothing yet, so can I ask in the channel or is there a preferable avenue? [03:15] CHR should be able to help in #launchpad [03:15] failing that follow the instructions in the topic there. [03:15] :) [03:16] lifeless: well, I have a project already that doesn't need new daemons etc. [03:16] "CHR"? is that a nick or an acronym I'm not familiar with? [03:17] I'm interested in whether poolie has other ideas [03:17] lifeless: by the way, might you have a moment to chat about the progress of the results tracker? [03:17] james_w`: i have very few ideas ;) other than that having it in a local db that can sync smartly with the server would be nice [03:17] what's your project? [03:17] txrestfulclient === nigelb_ is now known as nigelb [03:18] poolie: it's almost transparent to the app whether it is talking to LP or a couchdb copy of the data. 
The non-transparent part is that if talking to couch it has extra capabilities to push stuff back to lp [03:19] really, wow [03:23] james_w`: i'll check it out [03:24] I'd love some help getting it up to standard [03:24] it's not got full coverage of the capabilities of the API yet [03:25] rolling it in to my attempt to write a twisted API library wasn't the best thing [03:27] james_w`: I'd love to see that split out [03:27] james_w`: there is a use case in LP for a txlaunchpadlib [03:27] james_w`: but we wouldn't want the couch stuff mixed in [03:28] lifeless: the code is independent at that level [03:28] lifeless: but the couch feature relies on the tx code as I have written it, so people can't play with that until they use twisted [03:29] (and couch is only one possible implementation of the required interface, it just so happens that a JSON-based document store is kind of handy for this) [03:30] lifeless: sleep time, I'll catch up with you another time about the results tracker. cheerio [03:30] cr3: ciao [03:30] the latency is so high doing it on twisted would be good [03:32] james_w`: so your code can use couch as a cache? [03:32] poolie: "cache", yes [03:33] poolie: you talk to LP and it sticks the documents it gets back in to couch before returning them to you [03:33] poolie: at any time you like you can reconfigure the client to talk directly to couch, and you will get those documents back again. [03:33] that's the read-only part [03:34] then you can make changes, and it will store the modifications, and give you the updated information if asked for it again [03:35] sweet [03:35] then you can ask it to iterate the modifications and send them back to LP, and the collision detection will naturally act to prevent problems there [03:35] so this just all works over the existing restful protocol? 
[03:35] there are still a bunch of things that need work, and I'm not sure whether the approach taken will ever get us to 100%, but it does have an elegance [03:36] poolie: yep [03:36] isn't this what someone demo'd at last UDS? [03:37] poolie: with a way to replace that restful protocol with queries in to couch [03:37] nigelb: yeah, me, very shoddily [03:37] james_w`: lol, laptop not working et al ;) [03:37] exactly [03:38] * nigelb hugs james_w` :) [03:44] james_w`: now do this for notmuch pls [03:46] mwhudson: one day [03:47] though I think I should try and add more moving parts next time [03:57] wgrant: something I don't get… I'm to keep a bfj fk in my new TranslationTemplatesBuild table—but where does it come from? I see build_farm_job being set all over the place, but that all refers to BuildFarmJobOld stuff. [04:22] * poolie tries txrestfulclient [04:22] and whacks in to bug 461356 [04:22] <_mup_> Bug #461356: desktopcouch-service crashed with ImportError in () [04:35] spm: suppose we could zap the first 8 months of successful-updates.txt? [04:35] sure. one sec [04:35] spm: of this year! [04:36] [I think it goes back forever, so keep a copy [04:36] but its getting a tad large] [04:36] spm: also, whats it up to? [04:38] successful-updates-2008.txt (already existed) also now have successful-updates-2009.txt [04:38] heh [04:39] gmm [04:39] is there a system load average python module already, I wonder [04:40] import system.load.averages ? [04:40] which is spm doing the equivalent of import icanfly [04:44] lifeless: os.loadavg() [04:44] er no [04:44] getloadavg [04:45] right [04:46] bug https://bugs.edge.launchpad.net/oops-tools/+bug/243554, freshly updated [04:46] <_mup_> Bug #243554: oops report should record information about the running environment [04:48] I wonder if time.clock() is pid wide or pid wide :P [04:52] lifeless: one of those was supposed to be thread? 
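For reference, the load-average call and the process-wide versus thread-wide CPU clock question raised above can be sketched in Python. This is a minimal illustration, assuming a Unix host and Python 3.7 or later: os.getloadavg() is the standard-library call being reached for, while time.process_time() and time.thread_time() are the current process-wide and thread-wide CPU clocks (time.clock() itself was removed in Python 3.8). The helper name cpu_snapshot is made up for this example.

```python
import os
import time


def cpu_snapshot():
    """Return load averages plus process-wide and thread-wide CPU seconds."""
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    load1, load5, load15 = os.getloadavg()
    return {
        "load": (load1, load5, load15),
        # CPU time consumed by every thread in this process.
        "process_cpu": time.process_time(),
        # CPU time consumed by the calling thread only.
        "thread_cpu": time.thread_time(),
    }


if __name__ == "__main__":
    snap = cpu_snapshot()
    print("load averages: %.2f %.2f %.2f" % snap["load"])
    print("process CPU seconds: %.3f" % snap["process_cpu"])
    print("thread CPU seconds: %.3f" % snap["thread_cpu"])
```

Recording both clocks side by side is one way an OOPS report could show whether a slow request was burning CPU in its own thread or merely sharing a busy process.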
[04:53] mwhudson: being droll about linux clarity in this area [04:53] heh [04:53] we can use clock_gettime(CLOCK_THREAD_CPUTIME_ID) though [04:54] lifeless: would you mind if I just made that Librarian change now? [04:55] jtv: I don't mind when you do it :) [04:55] lifeless: I've just pulled in the information from /proc before [04:55] lifeless: :) [04:55] jtv: if you mean ...'and sneak it in the release', that would be risky, wouldn't it? [04:56] lifeless: that would be, and it's not what I had in mind. Thinking more of avoiding being the subject of future "what flaming idiot made this horrible change!?" inquiries [04:57] jtv: I'd hope noone in the team would take that attitude following something up [04:57] jtv: and the only risk I know of, is the one I mentioned: the librarian db access is currently very tightly encapsulated; it needs to stay that way. [04:57] Well, so to speak. The thing is, I'm not 100% happy about making an API where you pass "either an object or its id." [04:57] jtv: so don't do that. [04:58] jtv: make a separate pass-the-object API (perhaps on the object :P) [04:58] Ahh [04:58] and have the current id based function delegate [04:58] Now, what is the reason for the tight encapsulation? [04:58] because its in twisted [04:58] so its called via deferToThread [04:58] * jtv likes reasons—easier to remember than rules :) [04:58] it can do DB access in the thread [04:59] it cannot outside of it, or all other requests in progress will block. [04:59] I thought it ran as a separate process? [04:59] jtv: if you aren't touching code used in the librarian /server/ this won't matter - but I don't know exactly what you're touching (and be sure to check for imports :)) [04:59] jtv: the librarian is a twistd process, yes. [05:00] lifeless: I'm touching stuff in canonical.launchpad.librarian and canonical.librarian.client, but nothing in server. 
[05:00] in the process it has a mainloop, and worker threads; the worker threads do DB access, the mainloop does all the business logic (except DB access) [05:00] jtv: the server is in canonical.launchpad.librarian [05:01] So client.py is sort of exceptional in there? [05:01] I mean, FileDownloadClient does run client-side, right? [05:02] it might be nice to have the twisted code more visually distinct (e.g. in a submodule, or move the client to lp.services.librarian.client, or something. [05:02] I don't want to make the scope bigger on you :) [05:02] Yes. [05:02] I'm not going to do anything along those lines, no. :) [05:04] so, to answer your question, I presume so, but I'd need to check. [05:04] All I'd advising is a little caution and investigation in this area, as we have two dramatically different programming models in play here, and they mix poorly. [05:04] speaking of which. [05:05] ? [05:05] spm: are were there yet ? its been an hour. [05:05] jtv: the speaking of which was a joke. [05:05] :-| [05:06] jtv: the line before wasn't. [05:06] That much was clear. The joke I still don't get. [05:06] oh, it wasn't a very good one [05:06] it was leading into the spm: line [05:07] ah [05:07] lifeless: still, better to have the other shoe dropped :) [05:13] taking a breather; I'll be back to heckle spm later [05:13] lifeless: I think there's a better solution for the librarian problem: it's a bad internal distribution of responsibilities. In _getPathForAlias, the LFA is loaded _only_ to determine that it's visible. The actual work doesn't involve the object at all. [05:14] nm; take your breather [05:15] poolie: ping [05:16] james_w`: a teeny patch for you [05:16] EdwinGrubbs: hi there [05:24] poolie: I have some questions about the preferred way to use the apport format. 
The oops currently groups the request variables together, but it seems cleaner to use email.message.Message than to use another ProblemReport to make a hierarchy, so that I don't end up with multiple Date and ProblemType fields. [05:25] poolie: I also wondered if I should use the Stacktrace field for python stacktraces, or if it would be better to only use that for stacktraces created by gdb. [05:26] EdwinGrubbs: you can look at bzrlib.crash to see what we do [05:26] we use Traceback for the python traceback [05:27] which i think is consistent with what other python programs use [05:27] i would probably have one thing RequestVariables containing all the variables [05:27] either just as they are in the url, or decoded [05:32] poolie: oh, request variables in the oops actually is all the cgi variables like HTTP_REFERER, REQUEST_METHOD, etc. and not just the query string. I saw in the apport file format pdf that some of the hierarchical variables are stored as "name=value", but it seems more consistent to me to use "name: value". I'm trying to decide whether to use email.message.Message.as_string(), or ProblemReport().write(StringIO()) to create a [05:32] hierarchy. [05:33] so you agreed with gary to do it in apport now anyhow? [05:33] i'd probably just pprint a python dict [05:43] lifeless: looks like it came back about 5 mins ago [05:44] poolie: well, I spent today determining how easy it would be to do now. I just saw Gary's email that he preferred to do it in the long term. However, my original solution, was almost identical to ProblemReport.write_mime() except that I don't base64 encode things and that it handled the special case of the request variables hierarchy. I really think we should use apport now. oops-tools will still be able to process the old oops [05:44] format, so the differences in the apport format don't cause any implementation problems. [05:49] EdwinGrubbs: so they're not really a ahierarchy, are they? 
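The two simple encodings suggested above for the request variables (pretty-printing a Python dict, or urlencoding it) can be compared with a short sketch. The sample values and the helper names below are invented for illustration; only the idea of storing everything under a single RequestVariables-style field comes from the discussion.

```python
import ast
from pprint import pformat
from urllib.parse import parse_qsl, urlencode

# Made-up CGI-style variables of the kind an OOPS report might carry.
request_variables = {
    "HTTP_REFERER": "https://example.invalid/bugs",
    "REQUEST_METHOD": "GET",
    "QUERY_STRING": "field.searchtext=crash&start=0",
}


def as_pprint(variables):
    """Render the dict as a readable Python literal."""
    return pformat(variables)


def from_pprint(text):
    """Parse the literal back without evaluating arbitrary code."""
    return ast.literal_eval(text)


def as_urlencoded(variables):
    """Render the dict as one urlencoded line, sorted for stable output."""
    return urlencode(sorted(variables.items()))


def from_urlencoded(text):
    """Parse the urlencoded line back into a dict of strings."""
    return dict(parse_qsl(text, keep_blank_values=True))


if __name__ == "__main__":
    print(as_pprint(request_variables))
    print(as_urlencoded(request_variables))
```

Both forms round-trip a flat dict of strings, so neither needs a nested report or email-style formatting to keep the variables grouped.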
[05:49] i mean it's just a dict of strings [06:01] poolie: right, I just meant that the whole problem report is a hierarchy, since the request variables contains multiple values. [06:05] the simplest thing that could work is to pprint an array or put them in urlencoded [06:06] istm that using email formatting would be complicated, might break, and wouldn't help [06:06] ditto nested appotr [06:10] wgrant: seems I needed to create the BuildFarmJob from the factory for my custom job type. Passing tests again. [06:22] spm: ok cool [06:22] spm: so, can we enable profiling, and make the load be good ? [06:24] lifeless: hey, what ever happened with SSL improvements? Was just reviewing past threads.. [06:28] SpamapS: theres an RT ticket open to increase the cache length [06:28] (for idle keys) [06:29] and theres another open to get me access to the DC apache front end over a VPN + HTTP; I can then test a FE SSL here [06:29] ah cool. I have used distcache for mod_ssl in the past to great effect before btw. ;) [06:30] I'm not sure if we have dual apache or not [06:30] I suspect not [06:31] jtv: uhm, doesn't the name from the the LFA too ? [06:31] wow distcache's last release was in 2004 .. man its been so long since I setup an actual SSL server .. got BigIP's to do it a while back and have just been soft on SSL ever since. ;) [06:31] lifeless: yes, as usual I saw my mistake right after I said it—but no reason to keep you at the time. ;) [06:32] lifeless: can there be thread/process boundaries in this call chain that I would not see at all? [06:33] well, the txrestfulclient hello world passes again [06:33] that's something [06:34] but also probably enough for now [06:36] jtv: deferToThread in the librarian is the call boundary [06:36] jtv: when it returns from the callable supplied to that function, it comes back across the thread. 
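The thread boundary described above (blocking database work runs in a worker thread, and the result comes back to the main loop when the callable returns) can be illustrated with a standard-library analogue. Note the substitution: this sketch uses asyncio's run_in_executor, not Twisted's deferToThread, and blocking_db_lookup and handle_request are hypothetical names.

```python
import asyncio
import threading


def blocking_db_lookup(alias_id):
    """Pretend database call; it runs in a worker thread, not the loop."""
    return {"alias_id": alias_id, "worker_thread": threading.get_ident()}


async def handle_request(alias_id):
    """Business logic stays on the event loop; only the lookup is handed off."""
    loop = asyncio.get_running_loop()
    # The loop keeps serving other requests while the worker thread blocks.
    row = await loop.run_in_executor(None, blocking_db_lookup, alias_id)
    # Execution resumes here, back on the event loop's own thread.
    return row, threading.get_ident()


if __name__ == "__main__":
    row, loop_thread = asyncio.run(handle_request(42))
    print("lookup ran in thread", row["worker_thread"])
    print("handler resumed in thread", loop_thread)
```

The point of the analogy is the same as in the librarian: anything blocking that runs outside the hand-off would stall every other request in progress.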
[06:38] lifeless: I don't see that happening anywhere in the call chain from the first fetch of the LFA to the redundant second fetch—I guess that means that it's safe to re-use the same LFA object. [06:43] lifeless: is back in profling mode [06:43] pro-fling. hrm. maybe not quite. profiling tho.... [06:45] heh [06:45] spm: and hows the load ? [06:45] dropping. 1 3 4 atm [06:46] spm: is it running with/without the patch ? [06:46] good question... [06:46] spm: and are background jobs still disabled ? [06:46] with-out [06:46] yup; just nowish. [06:46] ok thats cool [06:46] spm: have they been killed :P [06:47] oh yes. legit excuse to kill cronjobs off? opportunities like this are rare and to be enjoyed post haste! [06:47] ok, profiling things now [06:48] first off, bugtask:+index [06:49] and now ubuntu/++assignments [06:50] ok, 7 seconds rather than 16 [06:50] spm: the patch I put up the other day [06:50] hm? [06:51] spm: can we put that on again? [06:51] do you have a paste handy? [06:51] said file has been overwritten since [06:51] one sec , finding/making [06:52] http://paste.ubuntu.com/489589/ [06:56] spm: ^ [06:56] ta [06:57] sorry - was horribly distracted doing the push code for the release; and made the shocking discovery that praseodymium does NOT have sl installed. [06:57] naturally this is a critical/urgent problem and moves to rectify needed to be made. [06:58] indeed [06:58] sysadmin porn comes first, of course. [06:58] hahaha [06:58] darn, I typed that :( [06:59] :P [06:59] * SpamapS cannot disagree with that [07:00] * SpamapS prepares a petition to have sl added to the server cd seed [07:03] there. nicely taken wildly out of context. [07:03] rotfl [07:04] * spm bows at the appreciation [07:04] the fine art of context free quoting - choosing the title [07:05] damn, seems somebody beat me to it. ;) [07:05] lifeless: restarting with the patch; give it a few [07:05] thanks [07:05] spm: your title was better than mine. 
:) [07:05] heh [07:05] well done [07:06] blink. something crash nicely on the restart [07:09] wow. something is really not right here... [07:09] details? [07:09] oh ffs. it's doing a staging rollout AGAIN! aARGH [07:10] <...> [07:10] * spm grumps off and puts in the lock file on sourcherry. [07:12] i've killed the crontab entry as a savage "don't do that" for now. I'll see if I can manually get the app server on asuka back to right'n'goodness [07:15] jtv: Sorry, I'm not completely down with the latest implementation details. [07:15] wgrant: I think I've done all I know I should do… question now is: what next? [07:16] jtv: You have BuildFarmJob rows now? [07:16] Yes [07:16] I had to create them myself, which from what I see elsewhere seems to be the way. [07:16] I suggest talking to noodles. [07:17] Yeah [07:17] How much do you still depend on BranchJob? [07:18] I haven't gone through the details, but I thought it depended mainly on how much the dispatch machinery still depends on TranslationTemplatesBuildJob. [07:19] I mean, there's not much in there other than methods the build farm needs. [07:19] True. [07:19] OK, so you've probably done your bit for now, but talk to noodles. [07:19] If the build farm can start dispatching TranslationTemplatesBuilds instead of TranslationTemplatesBuildJobs, I'm either there or very close. [07:19] That's the next step. [07:19] Yes, I will definitely talk to him once he appears—he also promised me a review this morning. :) [07:20] Ah, excellent. [07:21] I'm sort of eager to start cleaning out the old stuff, and sort of not looking forward to it at the same time. :-) [07:32] spm: and the conclufsion is? [07:33] ARGH [07:33] am working on getting that to argh <== atm [07:33] ij [07:33] ok [07:33] bbs [07:33] can you kick the profile rsync in the interim? [07:34] so kicked [07:36] try to start the patched and profiling "new" version... [07:36] trying. [07:48] it's still "starting".... [07:50] lifeless: wooo. it's started. have at it. 
[07:55] spm: load is still low ? [07:55] spm: and does it have both patches, or only the query changing one? [07:58] spm: please kick the profile rsync - thanks [07:59] kicked and very low, 0.22 0.35 0.71 [07:59] thanks [07:59] spm: which patch(es) did it have? [08:00] lib/lp/blueprints/model/specification.py and the profiling on [08:00] uhm, both patches change that file :P [08:01] haha [08:01] have alook [08:01] does it change the column definitions [08:01] one sec. just trying to stop a db from faceplanting [08:01] or the query [08:01] kk [08:07] lifeless: appears to be this one at a cursory glance at the first few lines: http://paste.ubuntu.com/489589/ [08:08] ok, could you appyly the other as well ? [08:11] heh sure, you have a paste handy? [08:13] yes [08:13] hang on while I check the backlog [08:13] ta [08:14] just doing about 17 bazzilions things at once atm. [08:14] spm: Like notmal [08:14] Er, normal [08:14] .... [08:14] http://pastebin.com/E7hMnL28 [08:14] spm: ^ [08:14] ta [08:14] gimme 5-10; just need to disable.notify a bunch of things in prep for the release in 45. [08:15] sure [08:22] good morning === spm changed the topic of #launchpad-dev to: Launchpad down/read-only from 0800-1100 UTC for a code update | Launchpad Development Channel | Week 3 of 10.09 | PQM is CLOSED | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [08:41] spm: I'll check back in regularly till you say its done... I can see you're busay [08:42] ta [08:46] * lifeless starts coming up with ideas to make rollouts even shorter [08:49] lifeless: did you talk with deryrck about my theory of the cause of the still remaining problems of the apprt retracer? that the librarian is simply queueing request from the app server for too long? [08:52] adeuring: it was discussed; IIRC it was diagnosed as missing firewall rules for some app servers [08:53] spiv: intersting....
do you have any details? [08:54] adeuring: see the #is and/or #launchpad-code logs from about 7 hours ago [08:54] spiv: thanks! [08:57] Hi [08:58] adeuring: hi [08:59] hi lifeless [08:59] adeuring: tcp connect timeout default is 30 seconds IIRC (you need to wait for the MSS, again IIRC) [08:59] if the librarian was doing that with 5 concurrent requests we'd be sunk, also its careful to do lots of incremental bits of work [09:00] so I'd expect a-diskio * 4 peak slowness, - a second or two tops - not 30 [09:00] adeuring: which is why I looked elsewhere [09:00] adeuring: now, 4 is the number of threads our appservers have [09:01] so 5 spilling you into a new appserver was a reasonable assumption :) [09:01] right [09:01] I'm changing our firewall rules so that we REJECT rather than DROP cross-site requests to make diagnosis of this kind of thing easier, FWIW [09:02] elmo: thank you! [09:02] lifeless: just tried my scrpit with 5 concurrent requests -- looks a bit better [09:02] adeuring: a bit? [09:02] no errors [09:02] thats a lot better then :P [09:03] lifeless:i think we should do more logging of what happens on the librarian server, [09:03] lifeless: i've been reading a book ian gave me 'prefactoring' [09:04] adeuring: We should strike a balance between non and too much... note that the librarian does OOPS as of this rollout (or was it last one) [09:04] it's a bit basic but it has some nice suggestions along the lines of your guide there [09:04] I think QA haven't added it to the daily reportcard yet. [09:04] poolie: intereseting [09:04] poolie: can I borrow it @ UDS ? [09:04] if you remind me several times closer to the date :) [09:05] poolie: is this close enough? [09:05] poolie: how about now? 
[09:05] lifeless: applied that 2nd patch as well; restarting now [09:05] Haha [09:05] uh [09:05] good night :) [09:05] poolie: :P [09:05] poolie: I'll remind you just before we go [09:05] lifeless: well, I think the issue is not necessarily a bug in code or anything -- just that the librarian can't handle requsts fast enough [09:05] adeuring: I'm not aware of issues like that [09:06] adeuring: or data suggesting we have them; certainly I agree that we *need to be able to diagnose such things* [09:06] morning all [09:06] and if the logs are insufficient, we should increase them till they are. [09:06] lifeless: right [09:06] adeuring: we're now logging librarian times in the appserver for downloads; we can add uploads easily [09:07] hmm [09:07] for uploads we should also add the size, perhaps in the closing bit of downloads too [09:07] lifeless: ok, that would help. but i suspect that these connection timeouts are caused by the librarian, not the app server [09:07] adeuring: which timeouts? [09:08] adeuring: if you mean the ones apport has been having, that show up as 500 errors from the API with timeout to mizuho in them... [09:08] adeuring: they were a firewall [09:08] adeuring: the evidence I've seen suggests the librarian server is coping just fine [09:08] adeuring: why do you think otherwise? [09:08] lifeless: well, my little script causes them just fine even now [09:08] adeuring: it does? [09:09] yes, if it starts 8 concurrent requests [09:09] can you pastebin the appserver trace? [09:09] 5 seems to be better [09:09] or does it seem to be identical? [09:09] "just now", you mean during a rollout? ;) [09:09] adeuring: oh hang on. lollolllollololl [09:10] adeuring: launchpad is down. Readonly mode. Zer iz no upload possible because the librarian is switched off [09:10] or meant to be; if you're successfully uploading there is something really wrong. [09:10] lifeless: yeah, ok, that's another possible cause [09:10] but... when exactly was the rollout started? 
[09:10] About 10 minutes ago. [09:10] 11 minutes ago [09:11] ok.. hard to be sure then when exactly i ran the script again.... [09:11] ok, let's try again once the rollout is done [09:11] yes [09:11] if you can provoke a connection timeout error, its a definite bug. [09:11] My first reaction is to look elsewhere than the librarian [09:12] adeuring: so, connection timeouts are really unlikely to be a problem in the librarian server in my opinion [09:12] for a bunch of reasons. [09:12] but we can't exclude it; lets just keep the net broad. [09:12] spiv: well, we _could_ see them [09:12] e.g. today we found a concrete problem with the firewall config [09:12] adeuring: no, you saw a firewall. [09:13] adeuring: we know for *some things* it was definitely *a firewall* [09:13] lifeless: what firewall issue was it? [09:13] adeuring: 2 appservers are in a different datacentre. [09:13] adeuring: it's a Twisted server that ought that ought to always be accepting connections rapidly; if connections are not being processed by that daemon in a timely fashion the only likely cause is that the librarian's host is totally swamped for disk IO [09:13] ahhh, so they could not access the librarian? [09:13] adeuring: the firewall rules for them did not include the restricted upload port, which is what the appservers connect to to upload restricted files. [09:14] adeuring: the firewall rules dropped the packets as hostile, and so at the network layer it looks like the librarian /machine/ is missing. [09:14] lifeless: ah, ok, that looks like a real problem.... [09:14] adeuring: so a failure to connect() from another machine strongly suggests problems in something other than the librarian daemon. [09:15] adeuring: but there may be other problems. 
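The diagnostic value of REJECT over DROP mentioned above shows up on the client as two different failures: silently dropped packets usually end in a connect timeout, while a rejected connection typically fails at once with "connection refused". A minimal sketch of telling them apart follows; probe and classify_connect_error are hypothetical helper names, and the exact error seen can vary with the firewall's reject type and the platform.

```python
import socket


def classify_connect_error(exc):
    """Map a connect() failure to a coarse label for diagnosis."""
    # Dropped packets (or a dead host) usually surface as a timeout.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    # An active refusal, for example a firewall REJECT or a closed port.
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    return "other"


def probe(host, port, timeout=5.0):
    """Try a TCP connect and report 'connected' or the failure label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

Run from each appserver against the librarian's upload port, a probe like this would separate "the machine looks missing from here" (timeout) from "the machine answered and said no" (refused), which is the distinction the firewall change is meant to expose.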
[09:15] adeuring: however, we *know* there was *a* problem, that would cause the symptoms seen before [09:15] That doesn't make it impossible, of course, but it does mean that assuming the daemon is the most likely source of the problem is likely to be the wrong approach. [09:16] yes, i understand [09:24] lifeless: are you familiar with the distroeries +queue page? [09:24] distroseries, even [09:24] I seem to recall it showing up on slow-pages reports [09:25] indeed [09:25] it's the bane of my life [09:25] anyhow, I'm not intimately familiar with it [09:25] but lets pretend I am [09:25] well it's mainly intended to let archive admins move uploaded packages into the accepted state [09:25] if they got held for some reason [09:25] righto, its the aa review queue [09:26] this is normally done in zopeless scripts if it's auto-accepted [09:26] when accepting packages we also close lots of bugs and email people [09:26] (potentially) [09:26] meep [09:27] and this is where the trouble arises with that page if any of the objects are private [09:27] (not to mention the query load) [09:27] and email load [09:27] that really needs to be out of appserver anyway [09:27] yes [09:27] I'm so excited [09:27] anyway, I have a bug where it's OOPSing occasionally for someone when it tries to access private email addresses [09:27] should be able to point really clearly at email perf tomorrow [09:28] we'll have failed convertToQuestions, I'm sure. [09:28] I am wondering if it's acceptable to remove the security proxy in carefully defined situations [09:28] so, why does it try to access their email address? [09:28] [clearly its ok to do that in carefully defined istuations [09:28] code exists to serve us, not the other way around; but if we can avoid it its somewhat nicer. 
[09:29] it's trying to email potentially private addresses as part of a) upload notification, b) bug notification [09:29] all done under the permission of the webapp user [09:29] now, to avoid disclosure that has to be part of the BCC right ? [09:29] or a direct mail [09:29] long term we need to jobify it of course [09:29] did I just make up a word? :) [09:30] I don't think that's the private email address problem. [09:30] IIRC it dies (possibly correctly) when trying to include it in Changed-By or Signed-By in the announcement email. [09:30] I'd suggest having some method that you pass to the Person asking it to do the bit thats private [09:30] But I said that in the bug... let's see.. [09:30] yeah, I'd say thats correct. [09:30] that would be a) as I said above [09:31] bigjools: There's no problem emailing to them, though. [09:31] It's including them in the email that's the problem. [09:31] Oh, no, other way around. [09:31] This is confusing. [09:31] So it must already rSP in places. [09:31] no [09:31] well, [09:31] I said in the bug: [09:31] I believe it only fails if it would send a notification to the private [09:31] email address; using the private email address in the email (eg. if the [09:31] person is deactivated) seems to work fine. [09:32] I would have thought any access of a private email would blow up [09:32] Yeah, it does. [09:32] So it's rSPing already. [09:32] hmmm [09:32] broad brush strokes [09:32] here is how I would tackle it. [09:32] I would ensure that Person is responsible for deciding when to/not to bypass the proxy [09:33] removeSecurityProxy does not appear in the queue.py file [09:33] so if it's used it's elsewhere [09:33] lifeless: I'm not sure it knows enough to do that does it? 
[09:33] bigjools: add methods ;) [09:33] eugh :( [09:34] Person is already bloated [09:34] multiple places may want to be able to send an email [09:34] *to* someone, with only one recipient [09:34] thats reasonable to bypass the proxy -in that case- [09:35] grabbing a private email to put into a template for announcements isn't ok though. [09:35] that was the point I was going to make [09:35] as long as folk choosing to use the method won't be confused or guided into making the wrong choice, I think its ok to do it anywhich way.. [09:36] maybe we should refuse uploads on people who have private email addresses? [09:36] s/on/by/ [09:36] why are we looking up their email (vs using the one in the changes file itself) ? [09:37] it uses the preferred email [09:37] so if I upload as noddy@example.com [09:37] but my preferred mai is fred@demo.com [09:37] the .changes file is regenerated with fred@demo.cpom ? [09:38] nothing's regenerated [09:38] ok, so why are we looking up their email? [09:38] we put email addresses on the email template [09:38] whats the template file [09:38] it will be faster than 20 questions :) [09:38] to, y'know, send it :) [09:39] man it's been 3 years since I looked at this code, hang on [09:39] bigjools: no, I don't understand why we need their email [09:39] lib/canonical/launchpad/emailtemplates/upload-accepted.txt [09:39] it uses changed-by, maintainer and signer [09:39] the approver did the approving; we need their Name; the uploaded uploaded it, we need their Name. Neither sent the mail, so we don't need their emails, and we send separate copies per recipient, so thats fine too. [09:40] lifeless: the current format was arrived at after extensive discussion with the ubuntu guys [09:40] I think we need to involve them then. [09:41] we need to re-create the problem first [09:41] I think its incompatible to both have 'you can have your email address private in launchpad' and to be putting it in mails sent to other people. 
[09:41] certainly a test case will help [09:41] yes, exactly [09:41] o/~ [09:42] I also don't understand why a preferred email address would be private [09:42] its their only email ? [09:42] actually it's all or none isn't it? [09:42] probably [09:42] I think so [09:43] so trying to hide your email address while doing public works seems...odd :) [09:43] folk are very worried about spam [09:43] changes files go to a public list. [09:43] we could put on the template [09:44] for instance, yes. [09:44] Or an LP account url, or the SSO persistent url. [09:44] but the To: can't be hidden [09:44] bigjools: the To: shouldn't be them anyway ? [09:44] err From:, sorry [09:44] actually I can't remember [09:44] I think the uploader is CCed from memory [09:44] when I read that template, it doesn't look like it would make sense to be 'from them', because its 'thanking them' [09:45] bigjools: I have a suggestion: register a UDS spec for this. [09:45] bigjools: it needs it. [09:45] well, there's quite a few people involved in a package [09:46] well, maybe it doesn't, but I think the driving forces are complex enough it will benefit from that scale of analysis and discussion. [09:46] I'm not comfortable with the analysis yet [09:46] ok [09:46] once we get a failing test case (and when I can see the oops that might help) then we can decide where to go [09:46] lets get a autoated test [09:47] agreed [09:47] if I can help further I'd be delighted to do so, [09:47] great, thank you [09:47] but I suspect that its going to run into a definitional problem very early on rather than a code problem; and for that the stakeholders... have to hold their stakes. [09:48] dipped in silver nitrate? [09:48] lol [10:23] spm: hah just saw you put th epatch on... trying [10:23] its gone already... [10:23] will try tomorrow [10:38] bigjools: I bet that https://launchpad.net/ubuntu/+search (Distribution:+search) will be your top timeout this cycle. 
[10:39] yay [10:41] anyhow, I'm going to be looking at when I get up :) [10:41] https://lpstats.canonical.com/graphs/OopsLpnetHourly [10:42] adeuring: try now [10:47] Now, let's see how badly Soyuz breaks on Lucid... [10:47] /o\ [10:47] I love your optimism [10:47] Oh, code upgrade's done already too? Nice. [10:47] whee things feel sluggish === elmo changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 3 of 10.09 | PQM is CLOSED | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [10:48] bigjools: Is optimism ever a good idea when Soyuz's fragility is involved? :) [10:49] hey it's a lot better than it used to nbe [10:50] It is. [10:50] But buildd-manager will still break if you think about it the wrong way. [10:50] Although I guess that's not technically Soyuz any more. [10:50] indeeed [10:50] I need to set up a new project for it [10:50] launchpad-buildmaster has a nice ring to it [10:52] another project? [10:52] bigjools: launchpad-buildfarm, surely? [10:53] bigjools: This reminds me... [10:53] no, buildmaster, it's specifically for the master sude [10:53] side [10:53] Have you ever seen production or DF b-m start failing to dispatch builds, complaining that the XML-RPC build command is being given too many arguments? [10:53] nope [10:54] I've seen it locally, and had similar reports from a couple of others' local setups. [10:54] And I partly tracked down why it happens. [10:54] But I don't know what triggered it... and I'd never heard about it happening anywhere prod-ish. [10:55] Sweet, codebrowse OOPSes appear to work. [10:55] (the problem is that BuilderSlave is broken now, but we only use RecordingSlave... except for in some circumstances that appear sometimes locally, which are not entirely clear) [10:56] spiv: you got one?
[10:56] Be one of the first to generate your very own codebrowse OOPS: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/annotate/head%3A/README2 [10:56] spiv: but can you see it on lp-oops? [10:56] That's a little more friendly than the old page [10:56] lifeless: I haven't tried yet [10:56] spiv: acid test man ;) [10:56] lifeless: one test run with 8 parallel uplods: one failed with a "connection timed out" [10:56] lifeless: but even that page beats "Internal server error" [10:57] adeuring: whats the oops? [10:57] wait a second... [10:57] lifeless: so far the OOPS isn't on lp-oops [10:58] (OOPS-1713CB6) [10:58] spiv: it may need some follow up [10:58] spiv: with QA [10:58] specifically: [10:58] What's the typical delay for syncing? [10:58] - needs to be added to the lpnet summaries. [10:58] - needs to be added to the list of dirs to scan for the oops db scanner [10:58] lifeless: problem is that we don't get an OOPS [10:58] spiv: 3m I think [10:58] adeuring: what do we get ? [10:59] adeuring: if its apis check the X-Launchpad-OOPS header [10:59] just the error message "connection time out", my script doesn't print it [10:59] adeuring: (I think that is where the id goes) [10:59] ok, I'll try to find it [10:59] adeuring: to debug this we need: [10:59] the backtrace [10:59] the appserver it happened on [10:59] i know [11:00] it may be a further firewall issue on the other appservers. [11:00] or something. [11:00] so, how can I figure out which app server is used? [11:01] anyhow... late here. If you can get the appserver + error, ask the GSAs if they can confirm that appserver has access to the restricted upload port [11:01] adeuring: its in the OOPS :) [11:01] ah, ok [11:01] I'm quite sure we generate one, just goes into a header from what gary was saying th eother week [11:02] adeuring: you might like to file a new private bug [11:02] adeuring: unsubscribe everyone but you [11:02] and then test on it. [11:02] lifeless: yeah... 
[11:02] gnight [11:03] if its not a firewall issue I will debug it with your script (if you supply it) and the data you gather overnight; getting the OOPS is critical path to solving it. [11:03] * lifeless waves gnight [11:04] jelmer: Hey, how's the branch? Were you able to reproduce the issues I reported? [11:04] nn lifeless [11:04] Night lifeless. [11:04] Enjoy the rest of your evening lifeless [11:04] 'night lifeless [11:05] wgrant, yeah, fixing + qa'ing at the moment [11:05] jelmer: Great. [11:05] so, my failure-detecting b-m hasn't failed anything yet. Is it wrong to want to see that happen? :) [11:06] Sorry for throwing them at you so late... I wasn't aware until yesterday that the branch was targetted at 10.09. [11:07] bigjools: Does it manage to distinguish between build and builder failures? [11:07] that's the plan, yes [11:08] wgrant: Thanks for bringing it up in the first place. You saved quite a few people the stress and extra time that would've come with a broken rollout. [11:09] jelmer: I have a few other issues with the branch from a more thorough review today, but I'm sure I've caused you enough trouble for now. [11:09] None are particularly major, I don't think. === gmb changed the topic of #launchpad-dev to: Launchpad Development Channel | Week 4 of 10.09 | PQM is open for business | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [12:04] Morning, all. === mrevell is now known as mrevell-lunch === Ursinha-afk is now known as Ursinha === matsubara-afk is now known as matsubara === mrevell-lunch is now known as mrevell === almaisan-away is now known as al-maisan [14:22] Was cocoplum upgraded to Lucid? [14:22] grep-dctrl now chokes on maverick's Sources files for reasons which I cannot entirely determine. [14:23] Ooh dear. [14:24] I think cocoplum's a-f cache might be pretty fucked.
[14:25] bigjools: ^^ [14:25] maverick's main Sources is now ~3.4MB. From an out-of-date mirror, it's ~4MB. [14:25] Some sections are truncated. [14:25] wgrant: known bug with lucid a-f [14:26] bug 633967 [14:26] <_mup_> Bug #633967: apt-ftparchive generates corrupt Sources stanzas for .dsc files without Checksums-* fields [14:26] Ah, great. [14:26] life is never too easy is it [14:27] (now, what were you saying about optimism a few hours ago? :P) [14:30] bigjools: I bet it's the case-sensitivity stuff in changes file parsing. [14:31] The code looks for Launchpad-bugs-fixed, but it's normally spelt Launchpad-Bugs-Fixed. [14:31] well remembered [14:31] The uploads referenced are fine. [14:32] A ha ha. [14:33] A test was changed to use the former capitalisation. [14:33] jelmer! [14:33] (that's one way of making the tests pass) [14:33] Heh. [14:43] wgrant: which test was changed? [14:44] bigjools: See the last hunk of the rev... [14:44] Let me find it. [14:44] which revno? [14:44] Ah, no. [14:44] The tests were already broken. [14:44] sync-source was changed to use the bad capitalisation. [14:45] it's because of the changed parser we're using now [14:45] db-devel r9741 [14:45] FML [14:45] It is, yes. [14:45] * bigjools wonders why that didn't break the test [14:46] The test packages probably use the bogus capitalisation. [14:46] oh, I see [14:46] * wgrant greps. [14:46] yes it does [14:46] YAY. [14:46] * bigjools files a bug === james_w` is now known as james_w [15:09] wgrant: btw, isn't it way past your bedtime? :) [15:12] bigjools: It's barely midnight. [15:33] Um, how would I set up an environment of db-stable instead of devel with rocketfuel-setup? I tried changing all of the lines I could find in the script and still got a "devel" directory under lp-branches. [15:35] ...or I could try reading the help output. *headdesk* [16:08] Where is the "etc/zope.conf" for disabling developer mode supposed to go, and how would one do that? 
[16:10] Also, I can log in with the test account, but how do I register new accounts? I don't see any links for that yet... [16:27] asking a possibly stupid question rather than wasting more time. i've got a totally up-to-date system and 'make' is failing with a bzrlib import error, "cannot import name SAFE_INSTRUCTIONS". any help? [16:27] gary, lifeless -^ [16:29] don't know leonardr but will try to dupe. maybe try a code team or bzr team member after that. [16:29] maybe rockstar can help? [16:30] tyarusso: register new accounts: not exposed on dev system. our openid server is responsible for that in the production/staging. [16:30] leonardr, update download-cache? [16:30] gary_poster: Oh. Well how is someone supposed to use it then? [16:30] tyarusso: etc/zope.conf: see configs/development/launchpad.conf [16:31] rockstar, up to date [16:31] tyarusso: sorry, don't understand question. dev build is for developers, which approximates production just enough to do dev style testing. We're not in charge of making new users, so we don't expose it [16:32] leonardr, oh, and sourcecode also needs to be updated. [16:32] SAFE_INSTRUCTIONS should come from bzr-builder, which can't be eggified. [16:32] gary_poster: Okay, my goal here is to set up a Launchpad instance that we could actually use for our company. What other pieces would I need to get to accomplish that? [16:33] rockstar: sourcecode/bzr-builder is up to date at revision 63 [16:33] however, revision 63 is from january. could it have moved? [16:34] leonardr, hm, it should have. abentley? [16:34] tyarusso: oh, eek. I'm sorry, that's a huge deal that we don't support. We open source the code to improve our free service. ...um, you could try asking the dev list and see if anyone there has advice? [16:35] rockstar, I have just landed my permit-commands branch. 
[16:35] rockstar: i've got the pqm-managed branch at bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/bzr-builder/trunk/ [16:35] rockstar, in that branch, the revno is 66. [16:35] leonardr, ^^ [16:36] leonardr, the revno in that branch is 66, so you aren't up to date. [16:36] gary_poster: Hmm, okay. Meanwhile let's try a different direction: 1) Is the stuff you use for your openid server all open source too? 2) Do you offer hosted services that could use our own domain names & branding? If so, what's the fee schedule like for that? [16:36] abentley: ok, i've got the new stuff [16:38] gary_poster: Oh, I should note - we'd be interested in hosting both open-source and proprietary projects. [16:38] bac, are you around to address tyarusso's question # 2 above? [16:39] ok, i think i've got a bunch of out-of-date stuff [16:39] mrevell: can you help tyarusso? [16:40] haha, support hot potato! [16:40] ah, tyarusso, i read through all of your question. to answer #2, no, we don't offer branded hosting [16:40] bac: Hmm, okay. [16:41] The normal stuff might be better than what we have now, but wouldn't let us put everything all in one place, which would be ideal. [16:42] tyarusso: for 1) the openid server is closed source. ...I keep revising my answer to cut to the chase sufficiently...in sum, you'd have to branch the code to make it work, and it would be hairy; maybe you could get some community people interested, dunno. [16:43] tyarusso: everything in one place: this isn't my part of story, and I have to run now, but (A) you could have a group that collects your projects and (B) I'm almost 100% sure it can contain both proprietary and open-source bits. [16:44] bac, mrevell, you can fix my reply if necessary :-) === beuno is now known as beuno-lunch [16:55] is there any way in the Launchpad API for search a bug? 
[16:55] bugs has only createBug and in bug there is nothing which matches searchBug or something like that [17:01] rockstar, did you ever get that widget moving with help from dav? [17:01] deryck, nope. Was hoping to get with you tomorrow about it. [17:01] deryck, also, I just landed yui 3.2 into lazr-js, and that's got some more debugging happiness in it, so I thought I'd merge. [17:01] ok, let's plan on it. I'll try to poke at your code in my evening for home work. :-) [17:01] indeed! [17:10] m4n1sh, Find the project you want to search on and then use project.searchTasks() [17:10] gmb: so this means global bug search is not possible [17:11] m4n1sh, Yes, but it's not possible in the Launchpad UI, either. [17:12] gmb: I think you can https://bugs.launchpad.net/ [17:12] you can choose "All projects" [17:12] m4n1sh, You're right; my bad. [17:13] I thought we used Google for that. [17:13] me too :) [17:13] (Launchpad Bugs developer doesn't know about Launchpad Bugs) [17:13] m4n1sh, But no, that's not available in the API at the moment, I'm afraid. [17:13] Of that I'm sure. [17:13] I could not find it in the API [17:14] actually I am presenting a talk on PyCon India 2010 on launchpadlib [17:14] so needed to know abuot this [17:14] gmb: your help is appreciated :) [17:14] m4n1sh, I think that's because the API is basically an export of our underlying object model. [17:14] And the UI sits on top of that object model, so it can do things the API can't. [17:15] m4n1sh, There's probably a bug for it (I'm on the phone now, otherwise I'd check for you) but feel free to file one if there isn't. 
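(Editor's sketch, not part of the log.) The per-project search gmb suggests above could look roughly like this with launchpadlib. It assumes `searchTasks` accepts a `search_text` argument and that each returned task exposes a `title`; the helper name `search_project_bugs` is made up for illustration, and obtaining the `lp` object is only shown in a comment because it needs network access.

```python
def search_project_bugs(lp, project_name, text):
    """Return the titles of bug tasks in one project that match `text`.

    `lp` is expected to be a launchpadlib-style root object. One way to get
    one (assumption: launchpadlib is installed and the network is reachable):

        from launchpadlib.launchpad import Launchpad
        lp = Launchpad.login_anonymously("bug-search-demo", "production")
    """
    # Look the project up by name, then search only within that project.
    project = lp.projects[project_name]
    tasks = project.searchTasks(search_text=text)
    return [task.title for task in tasks]
```

As the conversation notes, this only covers one project at a time; a search across all projects is described as not available through the API.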
[17:15] gmb: sure :) thanks [17:16] np === matsubara is now known as matsubara-lunch === benji is now known as benji-lunch [17:40] gary, rockstar: (rockstar, i know this isn't your problem, but maybe you can help anyhow) [17:40] :-) [17:41] actually, let me check the wiki really quick, since the problem is probably that my dev process has gotten out of sync with launchpad's === Ursinha is now known as Ursinha-lunch [17:54] gary, rockstar: no, i'm fairly certain i've done everything right, and i'm getting an import error from shipit [17:54] from canonical.cachedproperty import cachedproperty -> "No module named cachedproperty" [17:54] leonardr, oh, shipit? Uh, I have no knowledge in that area. :( [17:54] rockstar: yeah, i said it wasn't your problem [17:55] leonardr, I thought we moved all the shipit import stuff to somewhere under lp though. [17:55] I saw that leonardr...I think I ran utilities/update-sourcecode. You've done that too? [17:55] All right, flailing e-mail sent to the mailing list - maybe someone out there has a clue :S [17:56] gary: does rocketfuel-get not run update-sourcecode? i didn't run it specifically [17:56] :-) Good luck. It won't be a core LP dev, I strongly suspect, and I'll be surprised if someone knows what to do, but who knows. I'm sorry that the available options don't work for you. :-/ [17:56] leonardr: yeah I think so [17:57] I mean, I think it does run it [17:57] it certainly should [17:57] I'll look... [17:58] it does [17:58] when i run it manually i get a bzr repository conversion error! [17:58] in dulwich I bet [17:58] yeah? [17:58] no, in pygettextpo [17:59] oh [17:59] maybe i should just remove sourcecode and get everything again? i don't think i've run this properly for a _long_ time [17:59] yeah ,maybe so leonardr. 
I did surgery instead, myself [18:00] I found the broken ones and did a fresh branch in launchpad/lp-sourcedeps/sourcecode [18:00] of the broken ones reported by update-sourcecode [18:00] your call === al-maisan is now known as almaisan-away [18:10] Who takes care of the svn import stuff? Can someone take a look at http://launchpadlibrarian.net/54208924/vcs-imports-django-trunk.log ? === beuno-lunch is now known as beuno === deryck is now known as deryck[lunch] [18:16] cody-somerville, are there branches from this branch? [18:17] rockstar, it looks like it but most are hundreds of weeks old [18:18] cody-somerville, I suggest creating a new import, and maybe deleting or Abandoning that branch. [18:18] It's still using cscvs, which we're basically not maintaining anymore. [18:21] rockstar, https://code.edge.launchpad.net/django <-- do you see how there are two series of the same name associated with lp:django? Is that intentionally possible? [18:22] cody-somerville, whoa, that's weird. I don't know if that's intentionally possible. Maybe sinzui knows? [18:22] cody-somerville, one seems to be off the project group, and the other off of the prohect [18:22] *project [18:22] beuno, both are just normal projects [18:22] beuno, yeah, djangoproject should probably be deleted. [18:22] yea [18:22] Er, disabled. [18:23] cody-somerville, can you file a bug about that, and I'll discuss it with the team this afternoon? [18:23] that is something interesting and new then! [18:23] It seems possible to associate any branch in launchpad with a series === benji-lunch is now known as benji [18:24] Or no... if I try to edit the series in djangoproject I get an error [18:24] so I wonder how the value it has now got set... maybe via web services API if it has different error checking? 
=== matsubara-lunch is now known as matsubara [18:30] moin [18:34] rockstar, bug #634280 [18:34] <_mup_> Bug #634280: Branch of another project associated with series [18:34] cody-somerville, thanks [18:35] rockstar, Is it possible to edit the URL for svn imports yet or does that still require a LOSA? [18:35] cody-somerville, are you trying to edit the cscvs one? [18:35] no, new one [18:35] cody-somerville, link? [18:35] rockstar, https://code.edge.launchpad.net/~vcs-imports/django/trunk (I renamed the old import to old-trunk) [18:36] rockstar, It looks like maybe the trailing slash is problematic on the URL [18:36] cody-somerville, looks like the old branch needs to be deleted... [18:36] rockstar, why? [18:37] cody-somerville, the urls need to be unique. [18:37] You can't have two imports with the same URL? [18:37] cody-somerville, nope. [18:37] I just pointed the old-trunk to django/trunkDISABLED [18:38] in launchpad, an account has a identity url and a person has a an account and a name, does that mean that multiple people can share the same account? [18:38] no [18:39] lifeless: so why doesn't he person contain the identity url instead? [18:42] rockstar, it looks like it failed again... :/ [18:42] cr3: account is there for shipit [18:42] from a long time ago, nothing to do with openid [18:43] it has grown and changed since then, but we're trying to delete the account table [18:43] jelmer, see https://code.edge.launchpad.net/~vcs-imports/django/trunk - Does that make any sense? [18:43] deryck[lunch]: when you return [18:43] have a look at this: [18:43] https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713L718 [18:44] lifeless: awesome, makes sense. I was wondering when logging in with openid how it might demultiplex the various person objects in the event there was a one to many relationship [18:45] 'do not permit one' :P [18:49] gary, how many packages are in your sourcecode/? [18:49] leonardr: 20 [18:50] gary: i've got 17, but i used to have 56. 
the difference between you and me probalby reflects the continuing trend of moving things out of sourcecode [18:51] i ask because i was still getting an error, but i think 'make' might fix it [18:51] gary_poster: hi, I'd also like you to eyeball https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713L718 [18:51] yes, yay. on call leonardr, but http://pastebin.ubuntu.com/491137/ fwiw [18:51] gary_poster: don't interrupt your call; just look when you have a chance. [18:51] ack thanks lifeless [18:52] we are making -huge- numbers of memcache calls - and they appear to be capable of blocking for 20ms [19:13] rockstar: it works locally; it's hard to say much about it without the error from bzr-svn on lp [19:13] jelmer, do we need better error logging? Is that what you're saying? [19:14] rockstar: it would be nice to have the bzr log output as well [19:15] jelmer, okay. [19:15] rockstar: Actually [19:15] this looks like a launchpad-code bug [19:15] the exception is being raised from the part of the code that fetches the existing import branch [19:15] jelmer, I suspected it might be. [19:16] my initial thought was that it was failing to open the svn branch using bzr-svn === Ursinha-lunch is now known as Ursinha === deryck[lunch] is now known as deryck [19:22] hi lifeless. Looking at the OOPS report now.... [19:23] deryck: so basically we're spending - FWICT - about 2 seconds in memcache on that page [19:23] deryck: that combined with the low hit rate we're seeing makes me strongly suspect that turning off memcache in the template/tales will take 2 seconds off that page. [19:24] we need to do some improvements to the oops aggregation facilities now that we're putting more data in them [19:25] (I had to stare rather hard at the page to get a good sense of whats going on) [19:26] yeah, taking me a bit to process it too... [19:26] so interestingly [19:26] comments=all [19:26] is in the memcache key [19:27] yeah, I was noticing that. 
Seems like stupid caching in the first place. [19:27] thats an issue : it means the short and long versions of the messages won't share cache keys. [19:27] but perhaps on some pages that matters. Filing a bug on -foundations now vis-a-vis that [19:28] yeah, I'm trying to look at the template. We didn't add this caching, so I need to remind myself what it's doing. [19:29] deryck: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/634326 [19:29] <_mup_> Bug #634326: memcache cache keys interact poorly with query parameters [19:30] deryck: Please understand, when I talk about turning memcache off for things, its not because I don't like it : its because I want the best performance for any given page. [19:30] lifeless, I do understand that. I'm not against turning off bits of it or poor uses *at all* :-) I was just against ripping it our wholesale. [19:31] k [19:31] though it's more accurate to say I'm *for* learning to use it correctly and having the ability to use it correctly. [19:31] right === almaisan-away is now known as al-maisan [19:31] I have the sense though, that we haven't finished putting our house in order in the underlying layers yet :) [19:32] I completely agree [19:32] or have the same sense rather [19:32] :) [19:32] which is okay if we rapidly iterate on it. [19:32] but as is, it has some issues. [19:33] lifeless, so I don't know this spelling, I guess: "cache:authenticated,comment/rendered_cache_time,comment/index" is that part of your logging work? [19:33] hmm, for clarity by underlying layers I meant - db use; storm; tal rendering; python scheduling [19:33] my logging work puts this in [19:33] ah, ok. 
agree on that, too, though I thought you meant something different [19:33] category=memcache-get [19:33] details=$url where $url is the memcached url we're requesting [19:34] so pt:lpnet:lp/bugs/templates/bugcomment-box.pt,9760:l:1,0:MjE=,https_//bugs.launchpad.net/ubuntu/+bug/1/+index?comments=all4N2mYcTmyKyEC6n0gBJBFG [19:34] <_mup_> Bug #1: Microsoft has a majority market share < [19:34] is the url in memcache [19:34] the cache: stuff I didn't touch, thats the existing control-memcache-in-templates [19:35] hmmm, so I thought it was just "cache:type,howlong" so I need to refer to updated docs. [19:36] anyhow [19:36] if we're caching per url in the webapp thats going to go a long way to explaining the terrible hit rate. [19:36] many objects appear in many urls. [19:36] e.g person, branding, bug comments (per distrotask) etc [19:37] hmmm [19:38] create-question appears to send email via a different codepath or something. darn. [19:38] no, thats not it. I wonder [19:39] is email spooling perhaps deferred to the end of the request? [19:39] gary_poster: ^ [19:39] https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713N810 [19:39] lifeless, so about this memcache question, are you asking me about ripping this out and seeing if we save 2 seconds? [19:39] deryck: I'm speculating; I have a lot of data gathering to do now. [19:40] deryck: but bugtask is one of your pain-pages [19:40] and turning that off would be a very easy thing to do [19:40] lifeless, yeah, that was going to be my concern is that we don't know the savings (if any) vs the 2 sec. cost. [19:40] but I'm open to turn it off and see what oops appear. [19:40] I think it would be worth trying [19:41] even with the overhead of doing a CP to get it on prod, it will be a pretty cheap experiment [19:41] sure, I'm open to that. With a close eye on it. I do however wish we could feature flag it, if that works now. So if it was a bust, we could turn it on/off easy. 
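(Editor's sketch, not part of the log.) The cache-key problem lifeless describes, where the full request URL including `?comments=all` ends up in the key, can be illustrated with a toy key builder. The function `page_cache_key` and its key layout are hypothetical and much simpler than the real Launchpad key quoted above; the sketch only shows why per-URL keys reduce sharing.

```python
from urllib.parse import urlsplit, urlunsplit


def page_cache_key(template, url, keep_query=False):
    """Build a hypothetical cache key from a template name and a request URL."""
    parts = urlsplit(url)
    if not keep_query:
        # Drop the query string and fragment so different views of the
        # same object map to the same key.
        parts = parts._replace(query="", fragment="")
    return template + "," + urlunsplit(parts)


short_url = "https://bugs.launchpad.net/ubuntu/+bug/1/+index"
long_url = short_url + "?comments=all"
template = "lp/bugs/templates/bugcomment-box.pt"

# With the query kept, the two views of the same bug get different keys.
per_url_keys = {page_cache_key(template, u, keep_query=True)
                for u in (short_url, long_url)}

# With the query dropped, both views share a single cache entry.
shared_keys = {page_cache_key(template, u) for u in (short_url, long_url)}
```

Dropping the query is only safe for fragments whose rendering does not depend on it, which matches the caveat in the chat that on some pages the parameter may matter.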
[19:41] I didn't put this in, so I don't know the problems it was trying to prevent. [19:42] or the knowledge that underpinned the choice. [19:42] feature flags do work, I think you'd need to repeat the contained section though [19:43] which is a bit ugly [19:43] yeah [19:43] if we had a pageid /scope/ we could disable memcache for a pageid via a feature flag [19:43] that should be reasonably easy to do, for someone familiar with the page id logic. [19:43] I'll file a bug requesting that, and note we'd like to CP it to prod [19:43] sure, sounds good. [19:44] leave it New please, so I'll triage it and add it to the board. [19:44] forces me to add a card when I have to toggle it to triaged. [19:44] unless this is a foundations bug ;) [19:45] its a foundations arena bug, but you might find it easy enough to JDI - sec while I file it [19:47] hmm, is there a known issue with the code scanner in lp right now? [19:47] deryck: https://bugs.edge.launchpad.net/launchpad-foundations/+bug/634342 [19:47] <_mup_> Bug #634342: need a features 'scope' for page ids [19:48] like it seems to not be scanning only some branches, and sending proposal e-mails without diffs for them [19:50] lifeless, got it. I'll decide if I can JFDI that later today or not. which goes a bit against JFDI, I know. [19:50] deryck: :P [20:19] deryck: how did abel go last night [20:20] lifeless, he was worn out from all this, so worked on an easy bug. :-) But glad that we had it fixed for retracers. [20:21] deryck: I'm glad too [20:25] deryck: also on performance - https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713S81 [20:25] I grabbed that for bug 1 on staging yesterday, with profile. [20:25] <_mup_> Bug #1: Microsoft has a majority market share < [20:25] I'm going to put it in the [20:25] lifeless, today is call day, but wading through backlog. yes, email spooling is deferred to the end of the request. 
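(Editor's sketch, not part of the log.) A toy illustration of deferring mail until the end of the transaction, so that a retried transaction does not send its mail twice. This is not Launchpad's or Zope's actual mailer; `Transaction`, `ConflictError`, and `run_with_retries` are hypothetical names used only for this example.

```python
class ConflictError(Exception):
    """Stand-in for a database conflict that forces a retry."""


class Transaction:
    """Toy transaction that holds outgoing mail until commit."""

    def __init__(self):
        self._outbox = []

    def queue_mail(self, recipient, subject):
        # Nothing is sent yet; the message only joins the outbox.
        self._outbox.append((recipient, subject))

    def abort(self):
        # An aborted attempt throws its queued mail away.
        self._outbox.clear()

    def commit(self, deliver):
        # Mail goes out only once the work has succeeded.
        for recipient, subject in self._outbox:
            deliver(recipient, subject)
        sent = len(self._outbox)
        self._outbox.clear()
        return sent


def run_with_retries(work, deliver, attempts=3):
    """Run `work(txn)`, retrying on conflict; return the number of mails sent."""
    for _ in range(attempts):
        txn = Transaction()
        try:
            work(txn)
        except ConflictError:
            txn.abort()
            continue
        return txn.commit(deliver)
    raise ConflictError("gave up after %d attempts" % attempts)
```

In this toy version delivery still happens inline at commit time, so a slow `deliver` would still lengthen the request; handing the outbox to a worker is a separate design choice.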
[20:26] gary_poster: ahha [20:26] gary_poster: can you point me at the thing that does that, so I can instrument it ? [20:26] end of transaction to be precise [20:26] gary_poster: really? after request finalisation ? [20:26] yes [20:26] can we change that? [20:27] bad idea IMO. the idea is that if a transaction retries, we don't want to send multiple emails. [20:27] to be on-part-of-something like that [20:27] ok [20:27] uhm, let me describe some symptoms [20:27] k [20:27] https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713N810 [20:27] 32 second request [20:28] need to go in a few; and then will be back on calls, btw [20:28] looking [20:28] it sends 115 emails [20:28] 2pc would solve this but be a bit nasty. [20:28] doing the email spooling from a worker thread would solve it and not distort the request time [20:29] we're suposed to have the infrastructure to be doing this already. standard zope bit... [20:29] so, anyhow, 260ms per email is one useful thing to know [20:29] that seems slow. [20:29] it's supposed to put in worker thread (in* the 2pc [20:29] *in* [20:30] could I perhaps move the bug to foundations ? [20:31] I guess...it's either something for foundations or something that foundations would be happy to advise on [20:31] blue sky, does pgsql let us ask 'will this commit succeed' ? [20:31] its bug 438116, I'll update the details now we have some instrumentation. [20:31] <_mup_> Bug #438116: Timeout when converting bug into question (BugTask:+create-question) [20:32] lifeless, I don't yet see 260ms per email on that OOPS. give me a hint as to where I should be looking? [20:32] gary_poster: look for sending emails [20:33] or 'send' perhaps [20:33] anyhow, 115 rows (by subtraction) [20:33] e.g. 352. 
3617 0ms sendmail [Question #124730]: No vdpau-va-driver for amd64 in Maverick [20:33] <_mup_> Bug #124730: Feisty - Sound stops after a few seconds and various apps fail to work properly [20:33] and at the end [20:33] the request does its session update @ 32 seconds in [20:33] and it finished its sql work, which for this code is ~= to finishing completely, ~6 seconds in [20:34] 26000/115 [20:34] ah ok [20:34] approximations-R-us-ly-yrs [20:36] sure. :-) from users perspective, the request should have been sent back by 6 seconds, but agreed anyway that sending email ought to be done elsewhere. as I said, the expected pattern is that you would spool in the commit phase. well, since that's not happening, then maybe user didn't get the reply till 6 seconds in either. :-/ [20:36] didn't get the reply 6 seconds in I meant [20:36] I need to go get boys from school [20:36] then I have calls [20:37] I will look at how we are sending emails [20:37] if you happen to know the code path that would be nice [20:37] thanks, no panic - this like other timeouts is just kanban backlog [20:37] (that is, where in LP the email is sent) [20:38] right, cool [20:38] I will dig up some of that for you [20:38] thank you [20:38] biab [21:45] i guess this is the more active channel now? [21:55] this is the development channel [21:56] matsubara: ping [21:56] EdwinGrubbs, on the phone [21:56] will ping you once I'm done [21:56] ok [22:53] morning [22:59] lifeless: need to go, bug re bug 633664, when you talk about double-buffering, you mean at the load balancer and in the app server, correct? [22:59] <_mup_> Bug #633664: API file attachments are done via the appservers [22:59] s/bug re bug/but re bug/ [22:59] gary_poster: the load balancer isn't buffering [22:59] apache? [22:59] gary_poster: the double buffering is in the appserver [23:00] buffers in asyncore, then? [23:00] gary_poster: from what you're telling me, yes. [23:00] right but, I mean, where else? 
[23:00] gary_poster: there are some explicit things I know, and some things I don't. [23:01] gary_poster: we're calling addFile in the appserver, and we have a facility to retry the request if the db conflicts -> that implies a buffer [23:01] (this isn't to ignore your other concerns, but I understand them) [23:01] but why a second buffer? [23:01] ah [23:01] I think I've added confusion [23:02] what I mean is 'we're not sending it directly to where it belongs, but the design is intended to support big blobs by doing that' [23:02] ah! [23:02] got it [23:02] fair enough, understood [23:02] thanks [23:02] must run :-) [23:02] wallyworld_: morning [23:02] happy to talk another day [23:02] ciao [23:03] cool, but I understand now and agree that what we have is a problem [23:03] thumper: happy birthday [23:03] thumper: happy birthday :-) [23:03] gary_poster: cool, thanks. [23:03] thanks === matsubara is now known as matsubara-afk === Ursinha is now known as Ursinha-afk === al-maisan is now known as almaisan-away [23:53] thumper: hi [23:53] thumper: did you see the wiki page maxb added with failing bzr-svn imports summary? [23:53] jelmer: hi [23:53] no [23:53] I'm busy chasing a critical branch scanner fubar [23:53] due to bug heat [23:53] I won't bother you then :-) [23:53] jelmer: good that chicken finally imported :) [23:54] I'm sure peter is happy [23:54] jelmer: i saw a lot of code imports marked failed overnight, do you know what that's all about? [23:55] mwhudson: I retried all of the ones I thought *might* be fixed and some still failed, but a lot are now actually working [23:55] jelmer: oh ok [23:55] that's good [23:56] mwhudson, we're down to less than 100 failures in the bzr-svn/bzr-git imports, 59 of which are caused by lack of support for nested trees [23:57] jelmer: \o/ [23:59] Are nested trees actually happening at some point? [23:59] yes [23:59] Well, more than they were two years ago? :P