/srv/irclogs.ubuntu.com/2012/08/10/#launchpad-dev.txt

wgrantStevenK:   File "/srv/buildbot/slaves/launchpad/lucid-devel/build/orig_sourcecode/eggs/auditorfixture-0.0.3-py2.6.egg/auditorfixture/server.py", line 100, in _start00:00
wgrant    raise Exception("Timeout waiting for auditor to start.")00:00
wgrantException: Timeout waiting for auditor to start.00:01
StevenKwgrant: I don't think the new testbrowser sends a referer.00:02
StevenKwgrant: Hmmmmm. What? That's passed a few other buildbot runs.00:03
wgrantStevenK: Yes. It's a nice new intermittent failure.00:03
StevenKBleh00:03
StevenKFirst one, I think00:03
StevenKwgrant: It looks like most of the test failures will be sorted by fixing NoReferrerError.00:04
wgrantStevenK: There are surely more than 21 tests that use testbrowser to submit forms, so there must be something special about yours.00:07
StevenKwgrant: http://pastebin.ubuntu.com/1138745/ is the diff. 17481 tests run in 4:24:53.047311, 21 failures, 2 errors00:09
wgrantStevenK: Sure, but what's special about the tests that failed?00:09
wgrantStevenK: How goes the QA?00:35
StevenKwgrant: You offered to set up a project I could push to00:42
StevenKwgrant: All of the failing tests are doctests.00:43
StevenKSo maybe the problem is in the browser objects we toss into the doctests00:44
wgrantStevenK: But there are hundreds of doctests that work.00:51
wgrantStevenK: You now have APG for python-oops-tools/private, and branches default to private00:53
wgrant(using a sharing policy, not BVPs)00:53
wgrantBah, actually, that won't work00:54
wgrantNeed to use a BVP00:54
wgrantsec00:54
wgrantThere00:54
StevenKwgrant: So, there are 675 doctests. Only 11 use getControl(..).click()00:55
wgrantStevenK: No00:56
wgrant$ bzr grep -l 'getControl.*click()' | wc -l00:56
wgrant33100:56
wgrant$ bzr grep -l 'getControl.*click()' | grep txt$ | wc -l00:56
wgrant31100:56
StevenKfor i in $(find . -name '*.txt' | grep -E '(doc|tests)') ; do grep -l '.click()' $i; done | wc -l00:57
StevenK1100:57
wgrantpagetests live in stories00:57
wgrantnot doc or tests00:57
StevenKAh00:57
StevenKI thought I might be missing one00:58
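A stdlib sketch of the corrected search, including the stories/ directories (pagetests) that StevenK's find command missed; the directory filter and pattern are taken from the commands quoted above, everything else is illustrative:

```python
import os
import re

# Find test files that submit forms via getControl(...).click(),
# searching doc/, tests/ and stories/ directories as discussed above.
CLICK = re.compile(r'getControl.*click\(\)')

def files_using_click(root):
    hits = []
    for dirpath, _, names in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if not re.search(r'(doc|tests|stories)', rel):
            continue
        for name in names:
            if not name.endswith('.txt'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors='replace') as f:
                if CLICK.search(f.read()):
                    hits.append(path)
    return sorted(hits)
```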
StevenKwgrant:  bzr push lp://qastaging/~stevenk/python-oops-tools/foo-100:58
StevenKIt's been too long since I had to push a branch to qas.00:58
wgrantHm, I can't see it...00:59
wallyworld_sinzui: sadly there is no native css support for multi-line text truncation but i have found a really neat little yui module which works perfectly and does all the internal calcs to simply allow a requested number of lines to be specified. it also falls back to native support if only one line is required.01:00
StevenKSigh, pasted it but didn't hit enter01:00
sinzuiwallyworld_: go on?01:01
StevenKwgrant: Which gives me an error, so there has to be something wrong with that URL01:01
wgrantStevenK: What's the error? That /+branch-id/foo is not a branch?01:01
wallyworld_sinzui: i'd like to use it. the js then is Y.all('.ellipsis').ellipsis({'lines': 2})01:01
StevenKbzr: ERROR: Server sent an unexpected error: ('error', 'NotBranchError', 'Not a branch: "chroot-67089488:///+branch-id/519259/".')01:02
wgrantStevenK: Right, that's fine. The stacked-on branch doesn't exist on qas.01:02
wallyworld_sinzui: for example01:02
wgrantStevenK: The branch is created now. You can test.01:02
wallyworld_sinzui: i haven't looked, but the tooltip could also display all the text as well i think01:03
wallyworld_sinzui:  so it seems like a nice solution for a few hundred lines of 3rd party code, much like we do for sortable01:03
sinzuiwallyworld_: since you are adding it to things that use the title editor, I don't see a reason why you would hesitate01:03
wallyworld_sinzui: extra code, so thought i would check01:04
sinzuiwallyworld_: add it. I can take a look if you want01:05
StevenKwgrant: Okay, unsubscribed and no redirect.01:05
StevenKwgrant: Remove the BVP?01:05
wgrantStevenK: You mean APG?01:05
wallyworld_sinzui: ok, will tidy up the prototype and add it properly, then update the mp. no rush, since it won't be deployed till next week anyway01:05
StevenKwgrant: Er, yeah.01:06
wgrantYou still want it to be private, don't you?01:06
wgrantRight01:06
wgrantGone01:06
StevenKwgrant: Right, so now we create foo-2 which should have me with an AAG?01:06
wgrantStevenK: Or I subscribe you to foo-1 again01:07
wgrantThat's probably better01:07
wgrantStevenK: You are subscribed01:07
StevenKToo late, I pushed foo-2, but I get Forbidden on foo-101:07
* StevenK refreshes01:07
wgrantAnd +sharing confirms you have access01:07
StevenKOkay, unsubscribing01:07
StevenKwgrant: Redirected to https://code.qastaging.launchpad.net/python-oops-tools01:08
wgrantExcellent01:08
* StevenK marks as qa-ok01:08
* wgrant deploys01:08
StevenKwgrant: Oh, with the notification too, so it's excellent01:08
StevenKwgrant: Sigh, it looks like the poll doctests make use of POST01:17
StevenKwgrant: I bet that auditor failure was the usual omg-port-is-in-use-panic-and-catch-fire failure01:32
wgrantStevenK: If you haven't worked out what's broken, throw me a list of errors and I'll try to find out01:40
StevenKwgrant: I've fixed a few.01:43
wgrantStevenK: Any that were clicking submit?01:44
wgrantOr are those still proving troublesome?01:44
StevenKwgrant: Nope, I've dropped that for now.01:44
StevenKwgrant: I can scp the subunit stream to lillypilly if you wish01:44
wgrantStevenK: testr failing | utilities/paste01:44
StevenKwgrant: http://pastebin.ubuntu.com/1138836/01:45
wgrantStevenK: A lot of them seem to be posting manually01:48
wgrantAnother one calls goBack just before the failure, which is possibly relevant01:48
StevenKwgrant: Yeah, I've started on xx-productseries.txt converting it to browser.post01:50
wgrantIt looks like the actual change must be in zope.app.testing, rather than mechanize or testbrowser01:51
wgrantI think01:51
wgrantSince http() uses zope.app.testing's HTTPCaller directly.01:52
wgrantIn fact01:52
wgrantI think all the browser-based failures are immediately after a goBack01:53
lifelessanyone up for a review - https://code.launchpad.net/~lifeless/python-oops-amqp/misc/+merge/119076 ?02:48
* sinzui awards wgrant a gold ★02:50
wgrantsinzui: What have I done?02:50
wgrantlifeless: Looking02:50
sinzuimade bugs public02:50
wgrantsinzui: Ah, yeah02:50
wgrantsinzui: I also removed ~launchpad-security's subscriptions to about 400 bugs02:51
wgrantThat were public02:51
wgrantWe can hopefully do away with the team soon.02:51
sinzuiI just made qastaging launchpad/+sharing look more like I expect production to be.02:52
wgrantGreat.02:52
sinzuiWe have a lot of bots that I left in place.02:52
wgrantI'm trying to clean things up so that we can use sharing as we intend sharing to be used :)02:52
sinzuiI left many teams in place to keep subscriptions, then I looked at the bugs, saw they were closed, so I unsubscribed a lot of teams02:52
wgrantlifeless: Do you deliberately depend on both bsons?03:00
lifelesswgrant: james_w is doing a migration to the 'real' one across everything, incrementally.03:01
lifelesswgrant: so this is just dealing with that more or less03:01
lifelesswgrant: I've pushed up the fixes james_w asked for03:01
wgrantlifeless: Looks good, then.03:02
wgrantThanks.03:02
* StevenK stabs these tests03:38
wgrantStevenK: 'sup?03:44
StevenKwgrant: My switch from POST to browser.post() is not going well.03:45
wgrantAh03:45
StevenKbrowser.post() is also triggering NoReferrerError03:46
wgrantIs there a referer?03:46
StevenKbrowser.open() ... browser.post() should set one?03:47
StevenKOr is my understanding of zope.testbrowser bonkers?03:47
wgrantI'm not sure that post uses the current context03:48
wgrantIt's probably similar to open in that respect03:48
StevenKI'm not sure that sprinkling 'browser.addHeader('Referer', ...)' into the test is the right behaviour either03:49
wgrantThere's hopefully a prettier way to do that.03:50
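For reference, what "setting a Referer" amounts to at the HTTP level, sketched with the stdlib rather than zope.testbrowser; the URLs are illustrative, not real Launchpad endpoints:

```python
from urllib.request import Request

# A POST carrying an explicit Referer header, mirroring what
# browser.addHeader('Referer', ...) arranges in zope.testbrowser.
# Both URLs here are made up for illustration.
req = Request(
    'http://launchpad.test/firefox/+series/1.0/+edit',
    data=b'field.name=trunk',
    headers={'Referer': 'http://launchpad.test/firefox'},
)
```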
StevenKValueError: line 123 of the docstring for xx-productseries.txt lacks blank after ...: '  ...Support E4X in EcmaScript...'04:10
StevenKBleh04:10
StevenKWhat the heck does 'lacks blank' even mean, anyway. :-(04:13
wgrantStevenK: It thinks it's a continuation of the previous statement04:13
StevenK... how04:15
wgrantLike that, yes.04:15
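The failure mode can be reproduced with the stdlib doctest parser: an expected-output line that begins with "..." is taken as a PS2 continuation of the source, and "lacks blank" means there is no space after that prompt.

```python
import doctest

# An expected-output line starting with "..." is parsed as a PS2
# continuation line of the source; since "...Support" has no space
# after the prompt, the parser raises the "lacks blank" ValueError.
snippet = """\
>>> print('...Support E4X in EcmaScript...')
...Support E4X in EcmaScript...
"""
try:
    doctest.DocTestParser().get_doctest(snippet, {}, 'demo', 'demo', 0)
except ValueError as err:
    print('parse failed:', err)
```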
StevenK lib/lp/blueprints/stories/blueprints/xx-productseries.txt04:27
StevenK  Ran 1 tests with 0 failures and 0 errors in 4.433 seconds.04:27
StevenKHowever, that was by adding .addHeader('Referer', ...)04:28
lifelesswgrant: stub is curious about the sso link removal project status04:56
wgrantlifeless: I have an SSO branch. It works. It has no tests, and IIRC it doesn't handle failure very well, and due to SSO's view structure it makes several XML-RPC requests04:57
wgrantSo the whole thing needs a lot of refactoring04:57
wgrantSome of which landed three months after I proposed the branch04:57
wgrantBut the LP side works fine, and fundamentally the SSO side is fairly easily doable.04:57
lifelessok so some moderate work to do04:57
wgrantYes04:57
wgrantAnd I think elmo will cry less if LP is never down :)04:58
stubIs the LP side a separate service or just the appserver?05:13
stubNow I think of it, if we tear out the slony code from the appserver then I think it will happily respond to read only requests when the master is down, because it doesn't need the master to calculate lag.05:15
wgrantstub: xmlrpc-private is always master-only at present, but indeed05:16
stubMight need a little polish, like ignoring the last write timestamp in the cookie, and no master only mode if lag > 2 minutes05:16
wgrantstub: Well05:16
wgrantMaybe05:16
wgrantSlave-capable things should use the slave if possible. If the slave is lagging too much, it should fall back to the master.05:17
wgrantIf the master is unavailable, does it want to fail the request, or use the lagging slave?05:17
stubUse the lagging slave05:17
wgrantRight.05:17
wgrantThat's my suspicion.05:17
wgrantIt means we need to tweak things to only request master if they really need it05:18
stubWe are interested in using up to date data. If the master is down, the lagged slave is still by definition the most up to date data we have05:18
wgrantMost XML-RPC requests are probably OK with a slave05:18
wgrantActually05:18
wgrantHm05:18
wgrantIt might be similar to the API05:18
wgrantWhere we want to use the master if at all possible05:18
wgrantBecause why not be consistent and up to date05:19
stubWell, that is a bug05:19
wgrantSo we want really up to date data, but we don't want to fail if we can avoid it05:19
wgrantSo MasterPleaseIfAtAllPossibleDatabasePolicy :)05:19
stubBy the time you receive your data, it might be out of date. We can never guarantee consistency, even from the master05:19
wgrantTrue.05:19
wgrantAnd I guess slave lag should be pretty minimal nowadays.05:19
wgrant"Nowadays" being since Monday.05:19
spmif the master is down, will the lag result actually show as lagged? I assume yes, but...05:19
stubSo instead we just give data that is 'recent enough', which can come from a slave.05:19
wgrantspm: Yes05:20
wgrantspm: The "lag" is the age of the last WAL replayed from the master.05:20
wgrantstub: Right.05:20
stubThe cutoff we care about is 'is the lag greater than my last write', which means we need a session identifier.05:20
spmcool. just conscious that the process on the master is what does the laggy updates, aiui05:20
wgrantspm: In the old world, yeah05:20
spmoh this is new shiny? nm then.05:21
wgrantspm: (although the old stuff also wouldn't break in this case: it stored lag plus a last update time, IIRC)05:21
stubspm: Its designed that the master isn't particularly aware of the slaves, who they are or what they are doing.05:21
spmright05:21
wgrantSo if the master goes away, the last update time lags, and clients can notice05:21
wgrantstub: So I think there's a place for a MasterPlease policy, which is used for eg. recently-POSTed web sessions and xmlrpc-private and all API sessions (until we have a reliable session identifier for API clients), which uses the master unless it's disappeared.05:23
wgrantReal write requests would still use the classical Master policy05:23
wgrantSo would fail during fdt.05:23
stubyeah, sounds about right.05:24
StevenKMasterIfPossibleDatabasePolicy ?05:24
StevenKMasterWithFallbackDatabasePolicy perhaps05:24
stubWe might be able to just tweak the existing LP policy.05:26
stubIf the master is unavailable but asked for, give out a slave. If the POST or XML-RPC request or whatever attempts to UPDATE, it will fail with a read only violation. Put some lipstick on that, make it a 503 status code and we might be good.05:27
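A toy sketch of the fallback policy being discussed; the class and exception names here are invented stand-ins, not Launchpad's real database-policy classes:

```python
class MasterUnavailable(Exception):
    """Raised by the (stand-in) connector when the master is down."""

class MasterWithFallbackPolicy:
    """Prefer the master; hand out a possibly-lagged slave if it is gone.

    A write attempted against the slave store then fails with a
    read-only violation, which the appserver would render as a 503.
    """

    def __init__(self, get_master, get_slave):
        self._get_master = get_master
        self._get_slave = get_slave

    def getStore(self):
        try:
            return self._get_master()
        except MasterUnavailable:
            # Master gone: fall back to the (possibly lagged) slave.
            return self._get_slave()
```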
lifelessstub: btw, do we set the feedback setting on the slaves?05:28
stublifeless: yes, I didn't want to mess with behaviour too much just yet05:28
stublifeless: it is probably how we will keep it too.05:29
lifelesshot_standby_feedback is the one I mean; defaults off but looks like we may want it on05:30
stubit is on for us, yes05:42
lifelesscool cool05:56
adeuringgood morning07:53
=== mrevell_ is now known as mrevell
evis there anyone I need to notify if I'm going to do a large number of lplib API calls in a test?10:02
wgrantev: How many is 'large', what sort of calls, and can you do it on (qa)staging instead/first?10:13
evwgrant: apols, I just realized that was hopelessly vague. So I have 81,455 crashes. I'm going to get the package for each and all of the relevant dependency packages. I then need to call getPublishedBinaries for each of those (but I'll cache calls on a key of package + series)10:14
evand yes, I can do it on staging first. Is that woefully slow by comparison?10:14
cjwatsonIs this a one-off or in a frequently-run test suite?10:15
wgrantI forget whether getPublishedBinaries is terrible or not10:15
wgrantIt should be reasonably fast even on staging if it doesn't do any stupid substring matching10:15
evcjwatson: it will be run daily, but for right this moment it's just a one off to get some basic data10:15
evwgrant: I have exact_match set, though I do realize there could be substring matches elsewhere10:15
wgrantI think that should be most of it10:16
wgrantSo, I'd try it on staging first.10:16
wgrantIt should be fairly quick once it's warmed up10:16
evexcellent10:16
wgrantAlthough10:16
evwgrant: should I notify webops as well?10:16
wgrantAt 82k crashes10:16
cjwatsongetPublishedBinaries - which publication statuses?10:16
wgrantPresumably there's lots of deps?10:16
wgrantFor each?10:16
wgrantAh, but if you cache...10:16
lifelessev: no need to notify webops if you are doing these calls serially.10:17
lifelessev: its a less than 1% increase in traffic.10:17
evlifeless: it is serially, and hi :)10:17
lifelessev: if you're doing it in parallel, thats another matter :)10:17
lifelessev: oh hi :)10:17
cjwatsonIf it's just "Published", it would be better to just get the relevant Packages files from a mirror and parse locally ...10:17
wgrantIt's not going to be a problem, but we can probably make it faster :)10:17
wgrantYeah10:17
wgrantThat's the thing10:17
lifelessev: btw, I did get you commit access to lp:python-oopsrepository right?10:17
wgrantI don't see why you don't just use the normal indices10:17
cjwatsonIf you're trying to get historical publication information for some reason, that would be different10:18
evlifeless: yes, I've been terrible and haven't merged back yet. Will do today.10:18
cjwatsonLike when they were superseded or something10:18
lifelessev: we've got webops, u1, ca and LP all using one python-oops-tools system now10:19
lifelessev: so the interest in migrating to a cassandra backend is growing.10:19
evlifeless: though I'll throw it up as a MP first, just so people have a chance to tell me no before I merge it in10:19
evlifeless: excellent!10:19
evlifeless: yeah, I've had brief conversations with james_w about it, and you I believe :)10:19
evcjwatson: historical information. It's about creating an "ideal" crash line10:20
evthat is, crashes where every package in the dependency chain that apport lists was up to date at the time of the crash10:20
evcjwatson: I've forwarded you a mail I sent to lifeless explaining the basic idea10:21
evthe code will be something akin to this http://paste.ubuntu.com/1139323/  (at least for the test)10:22
cjwatsonCan you use created_since_date10:23
cjwatson?10:23
evsince we now have to calculate the unique users seen in the past 90-day period for the denominator, and that's not a calculation that can be done quickly, the whole thing will be calculated once a day for the day that's passed10:23
cjwatsonConsider ordered=False too, since you don't appear to need ordered results10:24
ev(with the "actual" line being total crashes divided by unique users in 90 days and the "ideal" line being total crashes that were on up to date systems divided by unique users in 90 days)10:24
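The two lines ev describes reduce to the same division with different numerators; a sketch with invented figures (only the 81,455 daily total appears earlier in the log):

```python
# "actual": all crashes reported on a day, over machines that reported
# anything in the trailing 90 days.  "ideal": only crashes from systems
# whose whole dependency chain was up to date, over the same denominator.
def crash_rate(crashes, unique_machines_90d):
    return crashes / unique_machines_90d

daily_crashes = 81455        # figure from the log
up_to_date_crashes = 30000   # hypothetical
machines_90d = 1500000       # hypothetical

actual = crash_rate(daily_crashes, machines_90d)
ideal = crash_rate(up_to_date_crashes, machines_90d)
```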
evcjwatson: created_since_date doesn't work as far as I can tell for the reason mentioned in the code comment. But maybe I'm wrong?10:24
evcjwatson: ordered=False> excellent, will do10:25
wgrantev: Would you be better served by maintaining a full set of when each (name, version, arch) first appeared?10:25
wgrantRather than querying most of Ubuntu's history every day10:25
cjwatsonYeah, surely there's some kind of inter-run caching possible here10:26
evwgrant: so cache the package name, version, and arch tuple into cassandra?10:26
cjwatsonIt's not like binary_package_version or date_published on past publications are going to change10:26
wgrantRight10:26
wgrantThe history won't change10:26
lifelessev: is that 81K distinct crash signatures?10:26
lifelessev: or 81K reports ?10:26
evyeah, sure10:26
cjwatsondate_superseded might of course, but you aren't looking at that10:26
wgrantYou can easily keep a local copy of the relevant bits of history10:26
evlifeless: 81K reports for a day period10:26
evwhich seems about average10:27
lifelessso few10:27
wgrantAnd use created_since_date to just bring in all the new records every $interval10:27
cjwatsonIt might even be worth doing one getPublishedBinaries call with created_since_date for the whole interval, rather than one per binary name?10:27
evcjwatson: right, just when it was published10:27
wgrantcjwatson: Exactly.10:27
wgrantYou keep a local database of (name, version, arch, date_published/date_created), then every $interval ask for all the new publications since the last time you asked - a bit10:28
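A minimal sketch of that local cache using stdlib sqlite3; the schema and the high-water-mark approach are assumptions for illustration, not ev's actual code:

```python
import sqlite3

# Cache of binary publications keyed on (name, version, arch); each run
# inserts only records newer than the high-water mark, which becomes
# the created_since_date for the next Launchpad query.
def open_cache(path=':memory:'):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS binary_pub (
        name TEXT, version TEXT, arch TEXT, date_created TEXT,
        PRIMARY KEY (name, version, arch))""")
    return db

def record_publications(db, rows):
    # rows: iterable of (name, version, arch, date_created) tuples.
    db.executemany(
        "INSERT OR IGNORE INTO binary_pub VALUES (?, ?, ?, ?)", rows)

def high_water_mark(db):
    # Pass this as created_since_date on the next run.
    return db.execute(
        "SELECT MAX(date_created) FROM binary_pub").fetchone()[0]
```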
cjwatsonIt's per-series so the initial setup will have some giant returned result set, but only back <six months10:28
evwhere interval is the daily run of this code to generate the totals for the ideal line for the day past, right?10:29
mptlifeless, once this publishing history discussion ^ is sorted, we have a fun question about calculation of that "ideal" line10:29
wgrantev: Well, it doesn't really have to be this code10:29
wgrantev: The update process is separate.10:29
lifelessmpt: cool10:29
wgrantev: You can rapidly query your local cache of the relevant info whenever.10:30
lifelessFWIW I don't care whether ev caches the data or not.10:30
lifelessdatastores are data stores.10:30
evwgrant: okay, but still iterating over the same data set, right? My point is that it's not building a cache for packages it's not going to care about. Just ones for the oopses and their dependencies that we've seen day by day10:30
evlifeless: you'd argue for talking directly to LP without a cache?10:31
lifelessLP can trivially handle the load; the current API may be inefficient, but its got no intrinsic reason to be so.10:31
wgrantlifeless: Datastores are datastores, but the LP API is about as inefficient as it gets.10:31
lifelessev: I would start with the simplest thing possible.10:31
wgrantCache locally => hundreds of times faster10:31
lifelessev: and add complexity only when I had to.10:31
wgrantev: How many packages is that? The closure of dependencies could be fairly large.10:32
evwgrant: I can calculate an approximation based off a days run10:32
lifelessev: e.g. start by just talking to LP; then the next step either make the LP API faster (often easy, lots of unoptimised stuff) or add a local store.10:32
evI'll just add that to the set of things to count in this sample10:33
evlifeless: okay10:33
lifelesswgrant and cjwatson may be entirely correct that doing it via LP will be terrible (and the hidden HTTP requests launchpadlib does are likely to prove them right :P)10:33
lifelessbut its still better to deliver something soon and then iterate IMNSHO10:34
evabsolutely10:34
cjwatsonI wouldn't be making these suggestions if I thought they were hard to implement :)10:35
cjwatsonFWIW10:35
lifelesscjwatson: sure, and I don't think they are necessarily wrong.10:35
cjwatsonas in, it's what I'd do and I expect writing the code for it would be quicker than waiting for the initial "easy" but slow version to complete10:36
lifelessI'm just rather aware about the political side of getting this data as soon as possible, due to that u-r thread of doom10:36
cjwatsonso I think in this case the "easy" version is a false economy10:36
lifelesscjwatson: I'm not suggesting using launchpadlib directly because its easier, but because it has less moving parts.10:37
cjwatsoneven so10:37
evI suspect I'm going to lose the thread of doom. There's no way I can get the changes to apport for a single pair of dialogs done by the end of the day. Well, I can probably have the code done, but then there's getting pitti to magically appear and review it, and it's quite deep.10:37
lifelessFWIW my initial suggestion to ev was a dedicated API to do the heavy lifting in LP.10:37
evindeed, I wasn't expecting to get to such optimization just yet, as this conversation started off discussing an initial test10:39
evso, mpt. Maths10:39
evgah, NOT A PLURAL WORD10:39
mptRoad works10:39
lifelessMathematics10:39
ev:)10:40
mptlifeless, so. The graph aims to show the average number of crashes per calendar day. (Making it per 24 hours of uptime, to eliminate the spike during weekends, is a problem we've tabled for now.)10:40
mptlifeless, to do that we take the number of errors reported each day, and divide it by an estimate of the number of machines from which errors would be reported if they happened.10:41
mptAs an estimate of "the number of machines from which errors would be reported", we use "the number of machines that reported at least one error any time in the 90 days up to that day".10:42
mptThat slightly under-counts because of machines that were active but lucky enough not to have any errors. And it slightly over-counts because of machines that were destroyed or had Ubuntu removed from them during that 90-day period.10:43
mptHopefully the under-count and over-count cancel each other out.10:43
mptAnyway.10:43
lifelessuh,10:43
lifelessit massively undercounts10:43
lifelessbut thats a different point10:44
mptok, why does it massively undercount?10:44
lifelessyou want 'size of the population of machines with error reporting turned on and users that don't always hit no'10:44
mpt"users that would usually hit yes", but yes.10:45
lifelessyou are getting '90 sliding observation of [machines with error reporting turned on and users that don't always hit no] that encountered 1 or more errors and reported them'10:45
lifelessmpt: how often does the error reporting message come up for you10:45
mptlifeless, about three or four times a week.10:46
lifelessmpt: so for me it comes up -maybe- once a month. I think twice since precise released.10:46
mptlifeless, if it turns out that the average is anywhere close to 1/90, then we'll need to increase the 90-day period to more than that.10:46
lifelessmpt: the underreport is due to all the machines that don't encounter errors at all10:46
lifelessand you can't tell how big the under report is because the sample you have is only from reporting machines.10:47
mptSo? So is the numerator.10:47
lifelessI mean, machines that are biased to report at a frequency of 1 in 90 days or greater.10:47
lifelessmpt: I don't follow how that matters10:48
lifelessmpt: you said "As an estimate of "the number of machines from which errors would be reported"10:48
mptyes10:48
lifelessmpt: I'm saying that its seems likely to me that your estimate is very low. We can test this theory.10:48
lifelessev: whats the current unique reporting machine count for the last 90 days ?10:49
mptWe're assuming the number of errors/day is a unimodal distribution (probably Poisson), and that there aren't a lot of machines that have zero errors in a 90-day period10:49
evlifeless: I don't have that yet. The query was taking more than 12 hours to back-populate so I need to come up with a quicker approach.10:50
evbut10:50
mpt(where "aren't a lot" = "are fewer than 1.1%"10:50
mpt)10:50
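Under the Poisson assumption, the share of machines invisible to the denominator (zero errors in the window) has a closed form; the rates below are illustrative figures from the conversation:

```python
import math

# Fraction of machines expected to report zero errors in a 90-day
# window if crashes arrive as a Poisson process at `per_day` per day:
# P(zero) = exp(-per_day * days).
def p_zero(per_day, days=90):
    return math.exp(-per_day * days)

# At mpt's ~3-4 crashes/week the undercount is negligible; at
# lifeless's ~1 crash/month roughly 5% of such machines go unseen.
```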
evoh, nevermind. I thought I had a quick way to get the unique machines for all releases for the past 90 days10:51
lifelessmpt: For a single individual, the distribution should be Poisson, unless the way they use their machine influences crash rates, in which case it won't be.10:51
evbut it's not as quick as I thought10:51
lifelessmpt: this is a distraction; we can investigate whether we have a underestimate or not separately.10:53
mptlifeless, I think if we are massively under-counting, then the next rollout of the graph will probably show an average of close to 0.01 errors/day, because it was dominated by machines that reported only one error in that 90-day period.10:53
mptAnyway, distraction, yes.10:54
mptFor the ideal line, we want to show the effect of people installing updates or not.10:54
lifelessmpt: that doesn't necessarily follow. We know our estimate should match some fraction of the precise userbase, where that fraction is the number of users that leave the tick box on and click continue.10:54
mptNot how quickly they are, but how much their promptness/tardiness affects Ubuntu's reliability in the wild.10:55
lifelessmpt: we can independently estimate that fraction, multiple by the separate estimate of precise desktop users, and compare to the errors.ubuntu.com estimate.10:55
lifelessmpt: if they different substantially, one or more of the estimators is wrong.10:55
ev(so I can quickly get the number of unique users that have ever report crashes, so something just over 120 days, and that's 1,975,010)10:55
lifelessmpt: 81K*90 = 7.2M, which is too low I believe.10:56
lifelessev: great.10:56
lifelessThat means we're massively underestimating :)10:56
mptlifeless, it's much less than 81K*90, because many of those machines are the same in multiple days10:56
lifelessmpt: sure, I used 81K*90 as an upper bound10:57
lifelessmpt: because if it was still too low, there is no way that any answer ev gave could be higher.10:57
mptsure10:57
lifelessok, so ideal line.10:57
lifelessso you want to show the number of crashes per day that would be saved if users updated ?10:58
mptIf we calculate it right, the "ideal" line will be like a smooth + lagged version of the "actual" line.10:58
mptwait, no, other way around.10:58
mptThe "actual" line will be like a smooth + lagged version of the "ideal" line.10:58
mptIf we issue a fix for an error that's causing 50% of the errors reported, the "ideal" line will drop down to half its previous level immediately, and the "actual" line will drift down slowly to meet it.10:59
mptConversely, if something goes wrong and we issue a really crashy update, the (now-misnamed) "ideal" line will spike up, and the "actual" line will drift up to meet it.11:00
lifelessSure11:00
lifelesss/ideal/projected/11:00
lifelesspotential11:00
lifelesspossible11:00
mptsomething like that.11:00
mpt"If all updates were installed", we call it in the page currently11:01
mptjust for clarity :-)11:01
mptNow, for this we do the same kind of division as before11:01
lifelessyeah11:01
mptThe numerator is the number of error reports on that day, for which that package and all its dependencies were up to date11:02
mptBut we're not sure what the denominator should be.11:02
mptI thought that it should be the same denominator as the "actual" line, the estimate of all machines that would typically report errors if they encountered them.11:02
mptThat passes the sanity test that if every Ubuntu machine was perfectly up to date, the "actual" and "potential" lines would be exactly the same.11:03
lifelessuhm11:03
lifelesswhy calculate it from scratch ?11:03
lifelessI mean, the way you expressed it: 'issue a fix for an error that's causing 50% of the errors reported, the "ideal" line will drop down to half its previous level'11:04
mptIf you mean the denominator, I'm not suggesting calculating it from scratch11:04
mptoh11:04
lifelesswhen everything is up to date / there are no fixes available then ideal == actual11:05
mptBecause errors are not evenly distributed over updated packages. For example, there are a bunch of machines out there (we don't know how many) that install only security updates, not other updates, and security updates may be more or less likely than average to fix reportable errors.11:06
jmlany buildout folks around?11:06
lifelessjml: passingly familiar11:08
lifelessmpt: ideally all machines install all updates right?11:09
mptlifeless, anyway, I *think* (though I'm not sure) that using the total number of 90-day-active machines may cause the "projected" line to be too low. The slower people install updates, the lower the number of errors will be from up-to-date packages, but that doesn't mean those up-to-date packages are more reliable.11:09
mptlifeless, ideally, yes.11:10
jmlwe have a Python project that comes with executable files. buildout (and pretty much anything that installs from eggs) loses the executable bit. afaict, this is due to a bug in stdlib zipfile where the external_attr part of the ZipInfo is ignored on extraction.11:10
mptlifeless, but remember, we're not trying to measure the proportion of machines that are all up to date, we're trying to measure how much reliability is affected by packages being out of date.11:11
lifelessok, so lets talk reliability estimators11:11
mptIf a package is way out of date but doesn't generate any error reports, that's fine as far as this graph is concerned.11:11
jmlmy immediate plan is to carry a temporary fork of distribute that corrects extract_zipfile in setuptools.archive_utils to chmod after extraction.11:11
lifelessjml: executable files in the package? or scripts that should be executable after install11:11
jmllifeless: the first one.11:12
lifelessjml: thats frowned upon. lintian will whinge, for instance.11:12
lifelessjml: why do you want that ?11:12
jmllifeless: I am not changing it.11:13
lifelessjml: the question is too abstract.11:13
jmllifeless: but, since you asked, it's because pkgme's interface to backends is by spawning sub-processes11:13
jmla backend is just a couple of executables11:13
jmlif those backends happen to be written in Python, then distributing them is very tedious, thanks to this bug in zipfile11:14
lifelessjml: I don't think its a bug.11:14
jmllifeless: tarfile preserves permissions11:14
cjwatsonlifeless: lintian> that rather depends on where the files in question are installed, and whether they include #!11:15
jmllifeless: why should zipfile not?11:15
lifelessjml: the interface for running things from a python package is python -m foo.bar11:15
lifelessjml: not /usr/lib/pythonx.y/dist-packages/foo/bar.py11:15
jmllifeless: it's a script11:15
lifelessjml: if its a script, the installation of it should be putting it in the right bin directory, updating the interpreter and making it executable for you.11:15
lifelesscjwatson: 'in a python package' is well defined, and #! in those files also warns IIRC.11:16
cjwatsonOh, you meant Python package, right, you just said package :)11:16
jmllifeless: pkgme works by searching a list of paths that contain backends and then running the executables it finds there.11:17
cjwatsonThough I'm not sure I believe your IIRC without proof, as I don't immediately see evidence of that in lintian11:17
lifelesscjwatson: I may well be horribly mistaken11:22
lifelessjml: so, with buildout, that won't work unless those scripts have no dependencies that buildout is supplying.11:24
lifelessjml: you need to run them via bin/py <path to python file> or bin/py -m <python module path>11:24
lifelessjml: otherwise you will get the system interpreter path.11:25
jmllifeless: yes, this is true of virtualenv too11:25
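The wrappers lifeless is describing look roughly like the stubs buildout (and virtualenv console scripts) generate: the egg paths are baked into the wrapper before the real module is imported, so invoking the installed .py file directly gets the bare system interpreter with none of those paths. A sketch with made-up paths and a hypothetical make_stub helper, not buildout's actual code:

```python
# Shape of a buildout/virtualenv-style wrapper script: the generated
# file injects the dependency paths before importing the real module.
# Template text and helper are illustrative, not buildout's own code.
STUB_TEMPLATE = """\
#!/usr/bin/python2.6
import sys
sys.path[0:0] = {egg_paths!r}

import {module}
{module}.main()
"""

def make_stub(module, egg_paths):
    """Render a wrapper that runs `module`.main() with `egg_paths` visible."""
    return STUB_TEMPLATE.format(module=module, egg_paths=egg_paths)
```

Running the target file directly skips this stub entirely, which is why running via bin/py <path> or bin/py -m <module> is needed under buildout.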
lifelessjml: To me, this makes the issue you are actually facing irrelevant.11:25
lifelessjml: Have I missed something ?11:25
jmlwe have some work-around for that atm11:26
jmlwhich I'm having trouble locating11:28
jmlah yes. it's hideous and won't work with buildout.11:29
lifelessso, we can talk about the mode bug11:30
jmlbut I know something worse that will11:30
lifelessbut I don't think it will help you will it ?11:30
lifelessjml: it's late, I need to go. I'd be happy to design something simple that will work for you, just not now.11:33
jmllifeless: ok, thanks.11:33
jamtrying to go to: https://launchpad.net/projects/+review-licenses is timing out for me. It worked for a bit yesterday (until I reviewed a bunch of projects and then reloaded)11:53
jamI submitted a bug, is there much else to do (I'm trying to take care of the review queue, etc)11:54
wgrantjam: It's working for me. Tried refreshing a couple of times?11:56
wgrantMy superpowers may be causing permission checks to be skipped though11:56
rick_h_yea, 6 loads here all timeouts11:57
rick_h_timeout backs to the is_valid_person ValidPersonCache.get and all storm from there on out11:59
StevenKSounds like preloading is in order then11:59
jamrick_h_: yeah, my OOPS shows 1.7s run in a single query, though running it on staging completes in 17ms...12:00
wgrantjam: OOPS ID?12:01
jamwgrant: https://oops.canonical.com/oops/?oopsid=OOPS-3abca09f555663402bbd26a37805e0a012:02
wgrantAh12:03
wgrantI bet it's the private team privacy adapter12:03
wgrantBut sooooo many queries12:03
wgrantIndeed12:04
wgrantThe insane private team privacy rules12:04
wgrantjam: Can you see it now?12:05
wgrantI've removed the only obvious private team from the listing12:06
jamwgrant: timeout again12:06
jamwgrant: new oops: https://oops.canonical.com/oops/?oopsid=OOPS-5c67f643d64f10768d671842a80680e012:06
wgrantAh, there's another one12:06
wgrantTry now?12:06
rick_h_loads now12:07
wgrantYeah12:07
wgrantThere were two private teams on Canonical projects12:07
jamyay12:07
jamthough there still seems to be death-by-thousand-cuts on that page12:07
rick_h_good to know, should we get a hard timeout exception for the review page for now?12:07
rick_h_so it's usable for maint?12:07
jamIf you look at the new oops, it has one query repeated 56 times.12:07
wgrantjam: Yeah, fortunately they're pretty small cuts in this case.12:07
wgrantrick_h_: Maybe. But AIUI czajkowski is back next week, so it's not maintenance's responsibility any more12:08
wgrantAnd she is immune to these issues12:08
rick_h_ah, wasn't aware she was immune12:08
wgrantThis is useful ammunition in my war against lifeless' overcomplicated private team visibility rules :)12:09
jamwgrant: weird, I see the helensburgh project in there, even though it succeeded in loading... :)12:10
wgrantjam: It's driven by a private team, not owned by one12:10
wgrantThat page only shows owners12:10
wgrant(well, and registrants)12:11
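The "one query repeated 56 times" pattern jam spotted is the classic N+1 problem that StevenK's preloading suggestion addresses: fetch the per-row data in one bulk query instead of once per row. A minimal sketch, where the fetch callables are hypothetical stand-ins rather than Launchpad APIs:

```python
def load_owners_naive(project_ids, fetch_owner):
    # One round trip per project: 56 projects means 56 copies of the
    # same query shape, which is the repetition visible in the OOPS.
    return {pid: fetch_owner(pid) for pid in project_ids}

def load_owners_preloaded(project_ids, fetch_owners_bulk):
    # One bulk query (e.g. WHERE id IN (...)) returning a mapping,
    # so every per-row lookup is then served from memory.
    return dict(fetch_owners_bulk(project_ids))
```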
rick_h_huwshimi: ping, do we have a login for the web balsmiq?12:18
rick_h_huwshimi: or did you just get the air version running?12:18
huwshimirick_h_: I think so, I'll forward the details.12:19
rick_h_huwshimi: ty12:19
cjwatsonwgrant: Care to re-review https://code.launchpad.net/~cjwatson/launchpad/archive-getallpermissions/+merge/117606 ?  I'm becoming good friends with StormStatementRecorder.12:21
wgrantcjwatson: Looks good, thanks.12:23
wgrantcjwatson: (although you might want to rename allPermissions to getPermissionsForArchive or permissionsForArchive or something)12:24
cjwatsonMm, yeah.  The almost-but-not-quite-the-same names between Archive and ArchivePermission are a tad confusing.12:25
wgrantYeah12:25
wgrantArchivePermissionSet's are wrong12:25
wgrantBecause method names are meant to be verbs12:26
wgrantBut consistency might be best there for now12:26
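The StormStatementRecorder cjwatson mentions records every statement a block of code issues so a test can assert on the count. A toy, self-contained version of the idea; Launchpad's real recorder hooks Storm's tracer API rather than wrapping an execute callable:

```python
class StatementRecorder:
    """Collect the SQL statements issued inside a `with` block."""

    def __init__(self, execute):
        # `execute` is whatever callable actually runs SQL; we wrap it
        # and note each statement that passes through.
        self._execute = execute
        self.statements = []

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def __call__(self, statement, *args):
        self.statements.append(statement)
        return self._execute(statement, *args)

    @property
    def count(self):
        return len(self.statements)
```

A test then runs the code under review inside the block and asserts the count stays under a budget, catching regressions like per-row queries sneaking back in.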
deryckrick_h_, hey, how about at 15 after?  in roughly 5 minutes, for our call?15:09
rick_h_sure thing15:10
lifelesswgrant: an unoptimised thing being slow isn't a surprise18:44
Ergo^evening19:46

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!