wgrant | StevenK: File "/srv/buildbot/slaves/launchpad/lucid-devel/build/orig_sourcecode/eggs/auditorfixture-0.0.3-py2.6.egg/auditorfixture/server.py", line 100, in _start | 00:00 |
wgrant | raise Exception("Timeout waiting for auditor to start.") | 00:00 |
wgrant | Exception: Timeout waiting for auditor to start. | 00:01 |
StevenK | wgrant: I don't think the new testbrowser does send a referer. | 00:02 |
StevenK | wgrant: Hmmmmm. What? That's passed a few other buildbot runs. | 00:03 |
wgrant | StevenK: Yes. It's a nice new intermittent failure. | 00:03 |
StevenK | Bleh | 00:03 |
StevenK | First one, I think | 00:03 |
StevenK | wgrant: It looks like most of the test failures will be sorted by fixing NoReferrerError. | 00:04 |
wgrant | StevenK: There are surely more than 21 tests that use testbrowser to submit forms, so there must be something special about yours. | 00:07 |
StevenK | wgrant: http://pastebin.ubuntu.com/1138745/ is the diff. 17481 tests run in 4:24:53.047311, 21 failures, 2 errors | 00:09 |
wgrant | StevenK: Sure, but what's special about the tests that failed? | 00:09 |
wgrant | StevenK: How goes the QA? | 00:35 |
StevenK | wgrant: You offered a project I could push to | 00:42 |
StevenK | wgrant: All of the failing tests are doctests. | 00:43 |
StevenK | So maybe the problem is in the browser objects we toss into the doctests | 00:44 |
wgrant | StevenK: But there are hundreds of doctests that work. | 00:51 |
wgrant | StevenK: You now have APG for python-oops-tools/private, and branches default to private | 00:53 |
wgrant | (using a sharing policy, not BVPs) | 00:53 |
wgrant | Bah, actually, that won't work | 00:54 |
wgrant | Need to use a BVP | 00:54 |
wgrant | sec | 00:54 |
wgrant | There | 00:54 |
StevenK | wgrant: So, there are 675 doctests. Only 11 use getControl(..).click() | 00:55 |
wgrant | StevenK: No | 00:56 |
wgrant | $ bzr grep -l 'getControl.*click()' | wc -l | 00:56 |
wgrant | 331 | 00:56 |
wgrant | $ bzr grep -l 'getControl.*click()' | grep txt$ | wc -l | 00:56 |
wgrant | 311 | 00:56 |
StevenK | for i in $(find . -name '*.txt' | grep -E '(doc|tests)') ; do grep -l '.click()' $i; done | wc -l | 00:57 |
StevenK | 11 | 00:57 |
wgrant | pagetests live in stories | 00:57 |
wgrant | not doc or tests | 00:57 |
StevenK | Ah | 00:57 |
StevenK | I thought I might be missing one | 00:58 |
StevenK | wgrant: bzr push lp://qastaging/~stevenk/python-oops-tools/foo-1 | 00:58 |
StevenK | It's been too long since I had to push a branch to qas. | 00:58 |
wgrant | Hm, I can't see it... | 00:59 |
wallyworld_ | sinzui: sadly there is no native css support for multi-line text truncation but i have found a really neat little yui module which works perfectly and does all the internal calcs to simply allow a requested number of lines to be specified. it also falls back to native support if only one line is required. | 01:00 |
StevenK | Sigh, pasted it but didn't hit enter | 01:00 |
sinzui | wallyworld_: go on? | 01:01 |
StevenK | wgrant: Which gives me an error, so there has to be something wrong with that URL | 01:01 |
wgrant | StevenK: What's the error? That /+branch-id/foo is not a branch? | 01:01 |
wallyworld_ | sinzui: i'd like to use it. the js then is Y.all('.ellipsis').ellipsis({'lines': 2}) | 01:01 |
StevenK | bzr: ERROR: Server sent an unexpected error: ('error', 'NotBranchError', 'Not a branch: "chroot-67089488:///+branch-id/519259/".') | 01:02 |
wgrant | StevenK: Right, that's fine. The stacked-on branch doesn't exist on qas. | 01:02 |
wallyworld_ | sinzui: for example | 01:02 |
wgrant | StevenK: The branch is created now. You can test. | 01:02 |
wallyworld_ | sinzui: i haven't looked, but the tooltip could also display all the text as well i think | 01:03 |
wallyworld_ | sinzui: so it seems like a nice solution for a few hundred lines of 3rd party code, much like we do for sorttable | 01:03 |
sinzui | wallyworld_: since you are adding it to things that use the title editor, I don't see a reason why you would hesitate | 01:03 |
wallyworld_ | sinzui: extra code, so thought i would check | 01:04 |
sinzui | wallyworld_: add it I can take a look if you wnt | 01:05 |
sinzui | want | 01:05 |
StevenK | wgrant: Okay, unsubscribed and no redirect. | 01:05 |
StevenK | wgrant: Remove the BVP? | 01:05 |
wgrant | StevenK: You mean APG? | 01:05 |
wallyworld_ | sinzui: ok, will tidy up the prototype and add it properly, then update the mp. no rush, since it won't be deployed till next week anyway | 01:05 |
StevenK | wgrant: Er, yeah. | 01:06 |
wgrant | You still want it to be private, don't you? | 01:06 |
wgrant | Right | 01:06 |
wgrant | Gone | 01:06 |
StevenK | wgrant: Right, so now we create foo-2 which should have me with an AAG? | 01:06 |
wgrant | StevenK: Or I subscribe you to foo-1 again | 01:07 |
wgrant | That's probably better | 01:07 |
wgrant | StevenK: You are subscribed | 01:07 |
StevenK | Too late, I pushed foo-2, but I get Forbidden on foo-1 | 01:07 |
* StevenK refreshes | 01:07 | |
wgrant | And +sharing confirms you have access | 01:07 |
StevenK | Okay, unsubscribing | 01:07 |
StevenK | wgrant: Redirected to https://code.qastaging.launchpad.net/python-oops-tools | 01:08 |
wgrant | Excellent | 01:08 |
* StevenK marks as qa-ok | 01:08 | |
* wgrant deploys | 01:08 | |
StevenK | wgrant: Oh, with the notification too, so it's excellent | 01:08 |
StevenK | wgrant: Sigh, it looks like the poll doctests make use of POST | 01:17 |
StevenK | wgrant: I bet that auditor failure was the usual omg-port-is-in-use-panic-and-catch-fire failure | 01:32 |
wgrant | StevenK: If you haven't worked out what's broken, throw me a list of errors and I'll try to find out | 01:40 |
StevenK | wgrant: I've fixed a few. | 01:43 |
wgrant | StevenK: Any that were clicking submit? | 01:44 |
wgrant | Or are those still proving troublesome? | 01:44 |
StevenK | wgrant: Nope, I've dropped that for now. | 01:44 |
StevenK | wgrant: I can scp the subunit stream to lillypilly if you wish | 01:44 |
wgrant | StevenK: testr failing | utilities/paste | 01:44 |
StevenK | wgrant: http://pastebin.ubuntu.com/1138836/ | 01:45 |
wgrant | StevenK: A lot of them seem to be posting manually | 01:48 |
wgrant | Another one calls goBack just before the failure, which is possibly relevant | 01:48 |
StevenK | wgrant: Yeah, I've started on xx-productseries.txt converting it to browser.post | 01:50 |
wgrant | It looks like the actual change must be in zope.app.testing, rather than mechanize or testbrowser | 01:51 |
wgrant | I think | 01:51 |
wgrant | Since http() uses zope.app.testing's HTTPCaller directly. | 01:52 |
wgrant | In fact | 01:52 |
wgrant | I think all the browser-based failures are immediately after a goBack | 01:53 |
lifeless | anyone up for a review - https://code.launchpad.net/~lifeless/python-oops-amqp/misc/+merge/119076 ? | 02:48 |
* sinzui awards wgrant a gold ★ | 02:50 | |
wgrant | sinzui: What have I done? | 02:50 |
wgrant | lifeless: Looking | 02:50 |
sinzui | made bugs public | 02:50 |
wgrant | sinzui: Ah, yeah | 02:50 |
wgrant | sinzui: I also removed ~launchpad-security's subscriptions to about 400 bugs | 02:51 |
wgrant | That were public | 02:51 |
wgrant | We can hopefully do away with the team soon. | 02:51 |
sinzui | I just made qastaging launchpad/+sharing look more like I expect production to be. | 02:52 |
wgrant | Great. | 02:52 |
sinzui | We have a lot of bots that I left in place. | 02:52 |
wgrant | I'm trying to clean things up so that we can use sharing as we intend sharing to be used :) | 02:52 |
sinzui | I left many teams in place to keep subscriptions, then I looked at the bugs, saw they were closed, so I unsubscribed a lot of teams | 02:52 |
wgrant | lifeless: Do you deliberately depend on both bsons? | 03:00 |
lifeless | wgrant: james_w is doing a migration to the 'real' one across everything, incrementally. | 03:01 |
lifeless | wgrant: so this is just dealing with that more or less | 03:01 |
lifeless | wgrant: I've pushed up the fixes james_w asked for | 03:01 |
wgrant | lifeless: Looks good, then. | 03:02 |
wgrant | Thanks. | 03:02 |
* StevenK stabs these tests | 03:38 | |
wgrant | StevenK: 'sup? | 03:44 |
StevenK | wgrant: My switch from POST to browser.post() is not going well. | 03:45 |
wgrant | Ah | 03:45 |
StevenK | browser.post() is also triggering NoReferrerError | 03:46 |
wgrant | Is there a referer? | 03:46 |
StevenK | browser.open() ... browser.post() should set one? | 03:47 |
StevenK | Or is my understanding of zope.testbrowser bonkers? | 03:47 |
wgrant | I'm not sure that post uses the current context | 03:48 |
wgrant | It's probably similar to open in that respect | 03:48 |
StevenK | I'm not sure that sprinkling browser.addHeader('Referer', '...') into the test is the right behaviour either | 03:49 |
wgrant | There's hopefully a prettier way to do that. | 03:50 |
StevenK | ValueError: line 123 of the docstring for xx-productseries.txt lacks blank after ...: ' ...Support E4X in EcmaScript...' | 04:10 |
StevenK | Bleh | 04:10 |
StevenK | What the heck does 'lacks blank' even mean, anyway. :-( | 04:13 |
wgrant | StevenK: It thinks it's a continuation of the previous statement | 04:13 |
StevenK | ... how | 04:15 |
wgrant | Like that, yes. | 04:15 |
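(The parser behaviour wgrant describes can be reproduced standalone; the expected-output line below is only an illustrative fragment, not the real xx-productseries.txt content.)

```python
# Minimal reproduction of the "lacks blank after ..." ValueError: an expected-
# output line that itself begins with "..." is read by doctest as a PS2
# continuation of the statement above it, and the prompt check then insists on
# a space after the dots.
import doctest

SAMPLE = '''
>>> print(browser.contents)
...Support E4X in EcmaScript...
'''

try:
    doctest.DocTestParser().get_examples(SAMPLE)
except ValueError as error:
    print(error)  # line 3 of the docstring for <string> lacks blank after ...
```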
StevenK | lib/lp/blueprints/stories/blueprints/xx-productseries.txt | 04:27 |
StevenK | Ran 1 tests with 0 failures and 0 errors in 4.433 seconds. | 04:27 |
StevenK | However, that was by adding .addHeader('Referer', ...) | 04:28 |
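(The workaround in question, as a rough sketch: the wiring of the browser to the test appserver is elided, and the URL and control name are made up.)

```python
from zope.testbrowser.browser import Browser

browser = Browser()
# Per the discussion above, the new testbrowser doesn't send a Referer on its
# own, so set one explicitly before submitting the form that otherwise trips
# NoReferrerError.
browser.addHeader('Referer', 'http://launchpad.dev/')
browser.open('http://launchpad.dev/firefox/+addseries')  # illustrative page
browser.getControl('Register Series').click()            # illustrative control
```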
lifeless | wgrant: stub is curious about the sso link removal project status | 04:56 |
wgrant | lifeless: I have an SSO branch. It works. It has no tests, and IIRC it doesn't handle failure very well, and due to SSO's view structure it makes several XML-RPC requests | 04:57 |
wgrant | So the whole thing needs a lot of refactoring | 04:57 |
wgrant | Some of which landed three months after I proposed the branch | 04:57 |
wgrant | But the LP side works fine, and fundamentally the SSO side is fairly easily doable. | 04:57 |
lifeless | ok so some moderate work to do | 04:57 |
wgrant | Yes | 04:57 |
wgrant | And I think elmo will cry less if LP is never down :) | 04:58 |
stub | Is the LP side a separate service or just the appserver? | 05:13 |
stub | Now I think of it, if we tear out the slony code from the appserver then I think it will happily respond to read only requests when the master is down, because it doesn't need the master to calculate lag. | 05:15 |
wgrant | stub: xmlrpc-private is always master-only at present, but indeed | 05:16 |
stub | Might need a little polish, like ignoring the last write timestamp in the cookie, and no master only mode if lag > 2 minutes | 05:16 |
wgrant | stub: Well | 05:16 |
wgrant | Maybe | 05:16 |
wgrant | Slave-capable things should use the slave if possible. If the slave is lagging too much, it should fall back to the master. | 05:17 |
wgrant | If the master is unavailable, does it want to fail the request, or use the lagging slave? | 05:17 |
stub | Use the lagging slave | 05:17 |
wgrant | Right. | 05:17 |
wgrant | That's my suspicion. | 05:17 |
wgrant | It means we need to tweak things to only request master if they really need it | 05:18 |
stub | We are interested in using up to date master. If the master is down, the lagged slave is still by definition the most up to date data we have | 05:18 |
wgrant | Most XML-RPC requests are probably OK with a slave | 05:18 |
stub | c/using up to date master/using up to date data/ | 05:18 |
wgrant | Actually | 05:18 |
wgrant | Hm | 05:18 |
wgrant | It might be similar to the API | 05:18 |
wgrant | Where we want to use the master if at all possible | 05:18 |
wgrant | Because why not be consistent and up to date | 05:19 |
stub | Well, that is a bug | 05:19 |
wgrant | So we want really up to date data, but we don't want to fail if we can avoid it | 05:19 |
wgrant | So MasterPleaseIfAtAllPossibleDatabasePolicy :) | 05:19 |
stub | By the time you receive your data, it might be out of date. We can never guarantee consistency, even from the master | 05:19 |
wgrant | True. | 05:19 |
wgrant | And I guess slave lag should be pretty minimal nowadays. | 05:19 |
wgrant | "Nowadays" being since Monday. | 05:19 |
spm | if the master is down, will the lag result actually show as lagged? I assume yes, but... | 05:19 |
stub | So instead we just give data that is 'recent enough', which can come from a slave. | 05:19 |
wgrant | spm: Yes | 05:20 |
wgrant | spm: The "lag" is the age of the last WAL replayed from the master. | 05:20 |
wgrant | stub: Right. | 05:20 |
stub | The cutoff we care about is 'is the lag greater than my last write', which means we need a session identifier. | 05:20 |
spm | cool. just conscious that the process on the master is what does the laggy updates, aiui | 05:20 |
wgrant | spm: In the old world, yeah | 05:20 |
spm | oh this is new shiny? nm then. | 05:21 |
wgrant | spm: (although the old stuff also wouldn't break in this case: it stored lag plus a last update time, IIRC) | 05:21 |
stub | spm: Its designed that the master isn't particularly aware of the slaves, who they are or what they are doing. | 05:21 |
spm | right | 05:21 |
wgrant | So if the master goes away, the last update time lags, and clients can notice | 05:21 |
wgrant | stub: So I think there's a place for a MasterPlease policy, which is used for eg. recently-POSTed web sessions and xmlrpc-private and all API sessions (until we have a reliable session identifier for API clients), which uses the master unless it's disappeared. | 05:23 |
wgrant | Real write requests would still use the classical Master policy | 05:23 |
wgrant | So would fail during fdt. | 05:23 |
stub | yeah, sounds about right. | 05:24 |
StevenK | MasterIfPossibleDatabasePolicy ? | 05:24 |
StevenK | MasterWithFallbackDatabasePolicy perhaps | 05:24 |
stub | We might be able to just tweak the existing LP policy. | 05:26 |
stub | If the master is unavailable but asked for, give out a slave. If the POST or XML-RPC request or whatever attempts to UPDATE, it will fail with a read only violation. Put some lipstick on that, make it a 503 status code and we might be good. | 05:27 |
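(In code terms, the policy being sketched would look something like this; the class and helper names are hypothetical, not Launchpad's actual lp.services.database.policy API.)

```python
class MasterUnavailable(Exception):
    """Raised by connect_master() when the primary cannot be reached."""


class MasterWithFallbackDatabasePolicy:
    """Prefer the master; hand out a (possibly lagging) slave if it is down."""

    def __init__(self, connect_master, connect_slave):
        self._connect_master = connect_master  # callables returning a store
        self._connect_slave = connect_slave

    def getStore(self):
        try:
            return self._connect_master()
        except MasterUnavailable:
            # Serve the lagging slave rather than fail the request; any UPDATE
            # attempted on it fails with a read-only violation, which the
            # publisher could dress up as a 503 as stub suggests.
            return self._connect_slave()
```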
lifeless | stub: btw, do we set the feedback setting on the slaves? | 05:28 |
stub | lifeless: yes, I didn't want to mess with behaviour too much just yet | 05:28 |
stub | lifeless: it is probably how we will keep it too. | 05:29 |
lifeless | hot_standby_feedback is the one I mean; defaults off but looks like we may want it on | 05:30 |
stub | it is on for us, yes | 05:42 |
lifeless | cool cool | 05:56 |
adeuring | good morning | 07:53 |
=== mrevell_ is now known as mrevell | ||
ev | is there anyone I need to notify if I'm going to do a large number of lplib API calls in a test? | 10:02 |
wgrant | ev: How many is 'large', what sort of calls, and can you do it on (qa)staging instead/first? | 10:13 |
ev | wgrant: apols, I just realized that was hopelessly vague. So I have 81,455 crashes. I'm going to get the package for each and all of the relevant dependency packages. I then need to call getPublishedBinaries for each of those (but I'll cache calls on a key of package + series) | 10:14 |
ev | and yes, I can do it on staging first. Is that woefully slow by comparsion? | 10:14 |
ev | comparison* | 10:14 |
cjwatson | Is this a one-off or in a frequently-run test suite? | 10:15 |
wgrant | I forget whether getPublishedBinaries is terrible or not | 10:15 |
wgrant | It should be reasonably fast even on staging if it doesn't do any stupid substring matching | 10:15 |
ev | cjwatson: it will be run daily, but for right this moment it's just a one off to get some basic data | 10:15 |
ev | wgrant: I have exact_match set, though I do realize there could be substring matches elsewhere | 10:15 |
wgrant | I think that should be most of it | 10:16 |
wgrant | So, I'd try it on staging first. | 10:16 |
wgrant | It should be fairly quick once it's warmed up | 10:16 |
ev | excellent | 10:16 |
wgrant | Although | 10:16 |
ev | wgrant: should I notify webops as well? | 10:16 |
wgrant | At 82k crashes | 10:16 |
cjwatson | getPublishedBinaries - which publication statuses? | 10:16 |
wgrant | Presumably there's lots of deps? | 10:16 |
wgrant | For each? | 10:16 |
wgrant | Ah, but if you cache... | 10:16 |
lifeless | ev: no need to notify webops if you are doing these alls serially. | 10:17 |
lifeless | ev: its a less than 1% increase in traffic. | 10:17 |
ev | lifeless: it is serially, and hi :) | 10:17 |
lifeless | ev: if you're doing it in parallel, thats another matter :) | 10:17 |
lifeless | ev: oh hi :) | 10:17 |
cjwatson | If it's just "Published", it would be better to just get the relevant Packages files from a mirror and parse locally ... | 10:17 |
wgrant | It's not going to be a problem, but we can probably make it faster :) | 10:17 |
wgrant | Yeah | 10:17 |
wgrant | That's the thing | 10:17 |
lifeless | ev: btw, I did get you commit access to lp:python-oopsrepository right? | 10:17 |
wgrant | I don't see why you don't just use the normal indices | 10:17 |
cjwatson | If you're trying to get historical publication information for some reason, that would be different | 10:18 |
ev | lifeless: yes, I've been terrible and haven't merged it back yet. Will do today. | 10:18 |
cjwatson | Like when they were superseded or something | 10:18 |
lifeless | ev: we've got webops, u1, ca and LP all using one python-oops-tools system now | 10:19 |
lifeless | ev: so the interest in migrating to a cassandra backend is growing. | 10:19 |
ev | lifeless: though I'll throw it up as a MP first, just so people have a chance to tell me no before I merge it in | 10:19 |
ev | lifeless: excellent! | 10:19 |
ev | lifeless: yeah, I've had brief conversations with james_w about it, and you I believe :) | 10:19 |
ev | cjwatson: historical information. It's about creating an "ideal" crash line | 10:20 |
ev | that is, crashes where every package in the dependency chain that apport lists was up to date at the time of the crash | 10:20 |
ev | cjwatson: I've forwarded you a mail I sent to lifeless explaining the basic idea | 10:21 |
ev | the code will be something akin to this http://paste.ubuntu.com/1139323/ (at least for the test) | 10:22 |
cjwatson | Can you use created_since_date | 10:23 |
cjwatson | ? | 10:23 |
ev | since we now have to calculate the unique users seen in the past 90 day period for the denominator, and that's not a calculation that can be done quickly, the whole thing will be calculated once a day for the day that's passed | 10:23 |
cjwatson | Consider ordered=False too, since you don't appear to need ordered results | 10:24 |
ev | (with the "actual" line being total crashes divided by unique users in 90 days and the "ideal" line being total crashes that were on up to date systems divided by unique users in 90 days) | 10:24 |
ev | cjwatson: created_since_date doesn't work as far as I can tell for the reason mentioned in the code comment. But maybe I'm wrong? | 10:24 |
ev | cjwatson: ordered=False> excellent, will do | 10:25 |
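(Putting those suggestions together, the lookup would look roughly like this launchpadlib sketch against staging; the application name and caching scheme are illustrative rather than the code in ev's pastebin, and attribute names are as I recall them from the publishing-history entries.)

```python
from launchpadlib.launchpad import Launchpad

lp = Launchpad.login_with('crash-ideal-line-test', 'staging')
archive = lp.distributions['ubuntu'].main_archive

_cache = {}  # ev's real cache keys on package + series; series is elided here


def published_history(binary_name):
    """Fetch (version, date_published) pairs for one binary, caching the call."""
    if binary_name not in _cache:
        pubs = archive.getPublishedBinaries(
            binary_name=binary_name, exact_match=True, ordered=False)
        _cache[binary_name] = [
            (pub.binary_package_version, pub.date_published) for pub in pubs]
    return _cache[binary_name]
```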
wgrant | ev: Would you be better served by maintaining a full set of when each (name, version, arch) first appeared? | 10:25 |
wgrant | Rather than querying most of Ubuntu's history every day | 10:25 |
cjwatson | Yeah, surely there's some kind of inter-run caching possible here | 10:26 |
ev | wgrant: so cache the package name, version, and arch tuple into cassandra? | 10:26 |
cjwatson | It's not like binary_package_version or date_published on past publications are going to change | 10:26 |
wgrant | Right | 10:26 |
wgrant | The history won't change | 10:26 |
lifeless | ev: is that 81K distinct crash signatures? | 10:26 |
lifeless | ev: or 81K reports ? | 10:26 |
ev | yeah, sure | 10:26 |
cjwatson | date_superseded might of course, but you aren't looking at that | 10:26 |
wgrant | You can easily keep a local copy of the relevant bits of history | 10:26 |
ev | lifeless: 81K reports for a day period | 10:26 |
ev | which seems about average | 10:27 |
lifeless | so few | 10:27 |
wgrant | And use created_since_date to just bring in all the new records every $interval | 10:27 |
cjwatson | It might even be worth doing one getPublishedBinaries call with created_since_date for the whole interval, rather than one per binary name? | 10:27 |
ev | cjwatson: right, just when it was published | 10:27 |
wgrant | cjwatson: Exactly. | 10:27 |
wgrant | You keep a local database of (name, version, arch, date_published/date_created), then every $interval ask for all the new publications since the last time you asked - a bit | 10:28 |
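(A sketch of that incremental cache; SQLite, the schema, and the filename are just for illustration, arch handling is omitted, and created_since_date/ordered are the parameters mentioned above.)

```python
import sqlite3


def refresh_publication_cache(archive, db_path='publications.db', since=None):
    """Pull publications created since the last run into a local table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS publication ('
        ' name TEXT, version TEXT, date_published TEXT,'
        ' PRIMARY KEY (name, version))')
    new_pubs = archive.getPublishedBinaries(
        created_since_date=since, ordered=False)
    for pub in new_pubs:
        conn.execute(
            'INSERT OR IGNORE INTO publication VALUES (?, ?, ?)',
            (pub.binary_package_name, pub.binary_package_version,
             str(pub.date_published)))
    conn.commit()
    conn.close()
```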
cjwatson | It's per-series so the initial setup will have some giant returned result set, but only back <six months | 10:28 |
ev | where interval is the daily run of this code to generate the totals for the ideal line for the day past, right? | 10:29 |
mpt | lifeless, once this publishing history discussion ^ is sorted, we have a fun question about calculation of that "ideal" line | 10:29 |
wgrant | ev: Well, it doesn't really have to be this code | 10:29 |
wgrant | ev: The update process is separate. | 10:29 |
lifeless | mpt: cool | 10:29 |
wgrant | ev: You can rapidly query your local cache of the relevant info whenever. | 10:30 |
lifeless | FWIW I don't care whether ev caches the data or not. | 10:30 |
lifeless | datastores are data stores. | 10:30 |
ev | wgrant: okay, but still iterating over the same data set, right? My point is that it's not building a cache for packages it's not going to care about. Just ones for the oopses and their dependencies that we've seen day by day | 10:30 |
ev | lifeless: you'd argue for talking directly to LP without a cache? | 10:31 |
lifeless | LP can trivially handle the load; the current API may be inefficient, but its got no intrinsic reason to be so. | 10:31 |
wgrant | lifeless: Datastores are datastores, but the LP API is about as inefficient as it gets. | 10:31 |
lifeless | ev: I would start with the simplest thing possible. | 10:31 |
wgrant | Cache locally => hundreds of times faster | 10:31 |
lifeless | ev: and add complexity only when I had to. | 10:31 |
wgrant | ev: How many packages is that? The closure of dependencies could be fairly large. | 10:32 |
ev | wgrant: I can calculate an approximation based off a days run | 10:32 |
lifeless | ev: e.g. start by just talking to LP; then the next step either make the LP API faster (often easy, lots of unoptimised stuff) or add a local store. | 10:32 |
ev | I'll just add that to the set of things to count in this sample | 10:33 |
ev | lifeless: okay | 10:33 |
lifeless | wgrant and cjwatson may be entirely correct that doing it via LP will be terrible (and the hidden HTTP requests launchpadlib does are likely to prove them right :P) | 10:33 |
lifeless | but its still better to deliver something soon and then iterate IMNSHO | 10:34 |
ev | absolutely | 10:34 |
cjwatson | I wouldn't be making these suggestions if I thought they were hard to implement :) | 10:35 |
cjwatson | FWIW | 10:35 |
lifeless | cjwatson: sure, and I don't think they are necessarily wrong. | 10:35 |
cjwatson | as in, it's what I'd do and I expect writing the code for it would be quicker than waiting for the initial "easy" but slow version to complete | 10:36 |
lifeless | I'm just rather aware about the political side of getting this data as soon as possible, due to that u-r thread of doom | 10:36 |
cjwatson | so I think in this case the "easy" version is a false economy | 10:36 |
lifeless | cjwatson: I'm not suggesting using launchpadlib directly because its easier, but because it has less moving parts. | 10:37 |
cjwatson | even so | 10:37 |
ev | I suspect I'm going to lose the thread of doom. There's no way I can get the changes to apport for a single pair of dialogs done by the end of the day. Well, I can probably have the code done, but then there's getting pitti to magically appear and review it, and it's quite deep. | 10:37 |
lifeless | FWIW my initial suggestion to ev was a dedicated API to do the heavy lifting in LP. | 10:37 |
ev | indeed, I wasn't expecting to get to such optimization just yet as this conversation started off discussing an initial test | 10:39 |
ev | so, mpt. Maths | 10:39 |
ev | gah, NOT A PLURAL WORD | 10:39 |
mpt | Road works | 10:39 |
lifeless | Mathematics | 10:39 |
ev | :) | 10:40 |
mpt | lifeless, so. The graph aims to show the average number of crashes per calendar day. (Making it per 24 hours of uptime, to eliminate the spike during weekends, is a problem we've tabled for now.) | 10:40 |
mpt | lifeless, to do that we take the number of errors reported each day, and divides it by an estimate of the number of machines from which errors would be reported if they happened. | 10:41 |
mpt | (Now I'm the one adding extra Ses.) | 10:41 |
mpt | As an estimate of "the number of machines from which errors would be reported", we use "the number of machines that reported at least one error any time in the 90 days up to that day". | 10:42 |
mpt | That slightly under-counts because of machines that were active but lucky enough not to have any errors. And it slightly over-counts because of machines that were destroyed or had Ubuntu removed from them during that 90-day period. | 10:43 |
mpt | Hopefully the under-count and over-count cancel each other out. | 10:43 |
mpt | Anyway. | 10:43 |
lifeless | uh, | 10:43 |
lifeless | it massively undercounts | 10:43 |
lifeless | but thats a different point | 10:44 |
mpt | ok, why does it massively undercount? | 10:44 |
lifeless | you want 'size of the population of machines with error reporting turned on and users that don't always hit no' | 10:44 |
mpt | "users that would usually hit yes", but yes. | 10:45 |
lifeless | you are getting '90-day sliding observation of [machines with error reporting turned on and users that don't always hit no] that encountered 1 or more errors and reported them' | 10:45 |
lifeless | mpt: how often does the error reporting message come up for you | 10:45 |
mpt | lifeless, about three or four times a week. | 10:46 |
lifeless | mpt: so for me it comes up -maybe- once a month. I think twice since precise released. | 10:46 |
mpt | lifeless, if it turns out that the average is anywhere close to 1/90, then we'll need to increase the 90-day period to more than that. | 10:46 |
lifeless | mpt: the underreport is due to all the machines that don't encounter errors at all | 10:46 |
lifeless | and you can't tell how big the under report is because the sample you have is only from reporting machines. | 10:47 |
mpt | So? So is the numerator. | 10:47 |
lifeless | I mean, machines that are biased to report at a frequency of 1 in 90 days or great. | 10:47 |
lifeless | *greater* | 10:47 |
lifeless | mpt: I don't follow how that matters | 10:48 |
lifeless | mpt: you said "As an estimate of "the number of machines from which errors would be reported" | 10:48 |
mpt | yes | 10:48 |
lifeless | mpt: I'm saying that it seems likely to me that your estimate is very low. We can test this theory. | 10:48 |
lifeless | ev: whats the current unique reporting machine count for the last 90 days ? | 10:49 |
mpt | We're assuming the number of errors/day is a unimodal distribution (probably Poisson), and that there aren't a lot of machines that have zero errors in a 90-day period | 10:49 |
ev | lifeless: I don't have that yet. The query was taking more than 12 hours to back-populate so I need to come up with a quicker approach. | 10:50 |
ev | but | 10:50 |
mpt | (where "aren't a lot" = "are fewer than 1.1%" | 10:50 |
mpt | ) | 10:50 |
ev | oh, nevermind. I thought I had a quick way to get the unique machines for all releases for the past 90 days | 10:51 |
lifeless | mpt: For a single individual, the distribution should be poisson, unless the way they use their machine influences crash rates, in which case it won't be. | 10:51 |
ev | but it's not as quick as I thought | 10:51 |
lifeless | mpt: this is a distraction; we can investigate whether we have a underestimate or not separately. | 10:53 |
mpt | lifeless, I think if we are massively under-counting, then the next rollout of the graph will probably show an average of close to 0.01 errors/day, because it was dominated by machines that reported only one error in that 90-day period. | 10:53 |
mpt | Anyway, distraction, yes. | 10:54 |
mpt | For the ideal line, we want to show the effect of people installing updates or not. | 10:54 |
lifeless | mpt: that doesn't necessarily follow. We know our estimate should match some fraction of the precise userbase, where that fraction is the proportion of users that leave the tick box on and click continue. | 10:54 |
mpt | Not how quickly they are, but how much their promptness/tardiness affects Ubuntu's reliability in the wild. | 10:55 |
lifeless | mpt: we can independently estimate that fraction, multiply by the separate estimate of precise desktop users, and compare to the errors.ubuntu.com estimate. | 10:55 |
lifeless | mpt: if they differ substantially, one or more of the estimators is wrong. | 10:55 |
ev | (so I can quickly get the number of unique users that have ever reported crashes, so something just over 120 days, and that's 1,975,010) | 10:55 |
lifeless | mpt: 81K*90 = 7.3M, which is too low I believe. | 10:56 |
lifeless | ev: great. | 10:56 |
lifeless | That means we're massively underestimating :) | 10:56 |
mpt | lifeless, it's much less than 81K*90, because many of those machines are the same in multiple days | 10:56 |
lifeless | mpt: sure, I used 81K*90 as an upper bound | 10:57 |
lifeless | mpt: because if it was still too low, there is no way that any answer ev gave could be higher. | 10:57 |
mpt | sure | 10:57 |
lifeless | ok, so ideal line. | 10:57 |
lifeless | so you want to show the number of crashes per day that would be saved if users updated ? | 10:58 |
mpt | If we calculate it right, the "ideal" line will be like a smooth + lagged version of the "actual" line. | 10:58 |
mpt | wait, no, other way around. | 10:58 |
mpt | The "actual" line will be like a smooth + lagged version of the "ideal" line. | 10:58 |
mpt | If we issue a fix for an error that's causing 50% of the errors reported, the "ideal" line will drop down to half its previous level immediately, and the "actual" line will drift down slowly to meet it. | 10:59 |
mpt | Conversely, if something goes wrong and we issue a really crashy update, the (now-misnamed) "ideal" line will spike up, and the "actual" line will drift up to meet it. | 11:00 |
lifeless | Sure | 11:00 |
lifeless | s/ideal/projected/ | 11:00 |
lifeless | potential | 11:00 |
lifeless | possible | 11:00 |
mpt | something like that. | 11:00 |
mpt | "If all updates were installed", we call it in the page currently | 11:01 |
mpt | just for clarity :-) | 11:01 |
mpt | Now, for this we do the same kind of division as before | 11:01 |
lifeless | yeah | 11:01 |
mpt | The numerator is the number of error reports on that day, for which that package and all its dependencies were up to date | 11:02 |
mpt | But we're not sure what the denominator should be. | 11:02 |
mpt | I thought that it should be the same denominator as the "actual" line, the estimate of all machines that would typically report errors if they encountered them. | 11:02 |
mpt | That passes the sanity test that if every Ubuntu machine was perfectly up to date, the "actual" and "potential" lines would be exactly the same. | 11:03 |
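(Written out, the two estimators under discussion are, restating the thread rather than quoting the errors.ubuntu.com code:)

```latex
\[
\mathrm{actual}(d) = \frac{E(d)}{M_{90}(d)}
\qquad
\mathrm{ideal}(d) = \frac{E_{\mathrm{uptodate}}(d)}{M_{90}(d)}
\]
```

where E(d) is the number of errors reported on day d, E_uptodate(d) counts only the errors whose package and dependencies were current at the time of the crash, and M_90(d) is the number of distinct machines that reported at least one error in the 90 days up to d.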
lifeless | uhm | 11:03 |
lifeless | why calculate it from scratch ? | 11:03 |
lifeless | I mean, the way you expressed it: 'issue a fix for an error that's causing 50% of the errors reported, the "ideal" line will drop down to half its previous level' | 11:04 |
mpt | If you mean the denominator, I'm not suggesting calculating it from scratch | 11:04 |
mpt | oh | 11:04 |
lifeless | when everything is up to date / there are no fixes available then ideal == actual | 11:05 |
mpt | Because errors are not evenly distributed over updates. For example, there are a bunch of machines out there (we don't know how many) that install only security updates, not other updates, and security updates may be more or less likely than average to fix reportable errors. | 11:06 |
mpt | s/over updates/over updated packages/ | 11:06 |
jml | any buildout folks around? | 11:06 |
lifeless | jml: passingly familiar | 11:08 |
lifeless | mpt: ideally all machines install all updates right? | 11:09 |
mpt | lifeless, anyway, I *think* (though I'm not sure) that using the total number of 90-day-active machines may cause the "projected" line to be too low. The slower people install updates, the lower the number of errors will be from up-to-date packages, but that doesn't mean those up-to-date packages are more reliable. | 11:09 |
mpt | lifeless, ideally, yes. | 11:10 |
jml | we have a Python project that comes with executable files. buildout (and pretty much anything that installs from eggs) loses the executable bit. afaict, this is due to a bug in stdlib zipfile where the external_attr part of the ZipInfo is ignored on extraction. | 11:10 |
mpt | lifeless, but remember, we're not trying to measure the proportion of machines that are all up to date, we're trying to measure how much reliability is affected by packages being out of date. | 11:11 |
lifeless | ok, so lets talk reliability estimators | 11:11 |
mpt | If a package is way out of date but doesn't generate any error reports, that's fine as far as this graph is concerned. | 11:11 |
jml | my immediate plan is to carry a temporary fork of distribute that corrects unpack_zipfile in setuptools.archive_util to chmod after extraction. | 11:11 |
lifeless | jml: executable files in the package? or scripts that should be executable after install | 11:11 |
jml | lifeless: the first one. | 11:12 |
lifeless | jml: thats frowned upon. lintian will whinge, for instance. | 11:12 |
lifeless | jml: why do you want that ? | 11:12 |
jml | lifeless: I am not changing it. | 11:13 |
lifeless | jml: the question is too abstract. | 11:13 |
jml | lifeless: but, since you asked, it's because pkgme's interface to backends is by spawning sub-processes | 11:13 |
jml | a backend is just a couple of executables | 11:13 |
jml | if those backends happen to be written in Python, then distributing them is very tedious, thanks to this bug in zipfile | 11:14 |
lifeless | jml: I don't think its a bug. | 11:14 |
jml | lifeless: tarfile preserves permissions | 11:14 |
cjwatson | lifeless: lintian> that rather depends on where the files in question are installed, and whether they include #! | 11:15 |
jml | lifeless: why should zipfile not? | 11:15 |
lifeless | jml: the interface for running things from a python package is python -m foo.bar | 11:15 |
lifeless | jml: not /usr/lib/pythonx.y/dist-packages/foo/bar.py | 11:15 |
jml | lifeless: it's a script | 11:15 |
lifeless | jml: if its a script, the installation of it should be putting it in the right bin directory, updating the interpreter and making it executable for you. | 11:15 |
lifeless | cjwatson: 'in a python package' is well defined, and #! in those files also warns IIRC. | 11:16 |
cjwatson | Oh, you meant Python package, right, you just said package :) | 11:16 |
jml | lifeless: pkgme works by searching a list of paths that contain backends and then running the executables it finds there. | 11:17 |
cjwatson | Though I'm not sure I believe your IIRC without proof, as I don't immediately see evidence of that in lintian | 11:17 |
lifeless | cjwatson: I may well be horribly mistaken | 11:22 |
lifeless | jml: so, with buildout, that won't work unless those scripts have no dependencies that buildout is supplying. | 11:24 |
lifeless | jml: you need to run them via bin/py <path to python file> or bin/py -m <python module path> | 11:24 |
lifeless | jml: otherwise you will get the system interpreter path. | 11:25 |
jml | lifeless: yes, this is true of virtualenv too | 11:25 |
lifeless | jml: To me, this makes the issue you are actually facing irrelevant. | 11:25 |
lifeless | jml: Have I missed something ? | 11:25 |
jml | we have some work-around for that atm | 11:26 |
jml | which I'm having trouble locating | 11:28 |
jml | ah yes. it's hideous and won't work with buildout. | 11:29 |
lifeless | so, we can talk about the mode bug | 11:30 |
jml | but I know something worse that will | 11:30 |
lifeless | but I don't think it will help you will it ? | 11:30 |
lifeless | jml: its late, I need to go. I'd be happy to design something simple that will work for you, just not now. | 11:33 |
jml | lifeless: ok, thanks. | 11:33 |
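(The chmod-after-extraction idea, as a standalone sketch rather than the actual distribute patch jml is carrying: for archives built on Unix, ZipInfo.external_attr keeps the file mode in its high 16 bits, which stock zipfile extraction ignores.)

```python
import os
import zipfile


def extract_with_modes(archive_path, dest):
    """Extract a zip archive, restoring any Unix permission bits it recorded."""
    zf = zipfile.ZipFile(archive_path)
    try:
        for info in zf.infolist():
            path = zf.extract(info, dest)
            mode = (info.external_attr >> 16) & 0o7777
            if mode:  # zero means the archive carried no Unix mode bits
                os.chmod(path, mode)
    finally:
        zf.close()
```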
jam | trying to go to: https://launchpad.net/projects/+review-licenses is timing out for me. It worked for a bit yesterday (until I reviewed a bunch of projects and then reloaded) | 11:53 |
jam | I submitted a bug, is there much else to do (I'm trying to take care of the review queue, etc) | 11:54 |
wgrant | jam: It's working for me. Tried refreshing a couple of times? | 11:56 |
wgrant | My superpowers may be causing permission checks to be skipped though | 11:56 |
rick_h_ | yea, 6 loads here all timeouts | 11:57 |
rick_h_ | timeout traces back to the is_valid_person ValidPersonCache.get and all storm from there on out | 11:59 |
StevenK | Sounds like preloading is in order then | 11:59 |
jam | rick_h_: yeah, my OOPS shows 1.7s run in a single query, though running it on staging completes in 17ms... | 12:00 |
wgrant | jam: OOPS ID? | 12:01 |
jam | wgrant: https://oops.canonical.com/oops/?oopsid=OOPS-3abca09f555663402bbd26a37805e0a0 | 12:02 |
wgrant | Ah | 12:03 |
wgrant | I bet it's the private team privacy adapter | 12:03 |
wgrant | But sooooo many queries | 12:03 |
wgrant | Indeed | 12:04 |
wgrant | The insane private team privacy rules | 12:04 |
wgrant | jam: Can you see it now? | 12:05 |
wgrant | I've removed the only obvious private team from the listing | 12:06 |
jam | wgrant: timeout again | 12:06 |
jam | wgrant: new oops: https://oops.canonical.com/oops/?oopsid=OOPS-5c67f643d64f10768d671842a80680e0 | 12:06 |
wgrant | Ah, there's another one | 12:06 |
wgrant | Try now? | 12:06 |
rick_h_ | loads now | 12:07 |
wgrant | Yeah | 12:07 |
wgrant | There were two private teams on Canonical projects | 12:07 |
jam | yay | 12:07 |
jam | though there still seems to be death-by-thousand-cuts on that page | 12:07 |
rick_h_ | good to know, should we get a hard timeout exception for the review page for now? | 12:07 |
rick_h_ | so it's usable for maint? | 12:07 |
jam | If you look at the new oops, it has one query repeated 56 times. | 12:07 |
wgrant | jam: Yeah, fortunately they're pretty small cuts in this case. | 12:07 |
wgrant | rick_h_: Maybe. But AIUI czajkowski is back next week, so it's not maintenance's responsibility any more | 12:08 |
wgrant | And she is immune to these issues | 12:08 |
rick_h_ | ah, wasn't aware she was immune | 12:08 |
wgrant | This is useful ammunition in my war against lifeless' overcomplicated private team visibility rules :) | 12:09 |
jam | wgrant: weird, I see the helensburgh project in there, even though it succeeded in loading... :) | 12:10 |
wgrant | jam: It's driven by a private team, not owned by one | 12:10 |
wgrant | That page only shows owners | 12:10 |
wgrant | (well, and registrants) | 12:11 |
rick_h_ | huwshimi: ping, do we have a login for the web Balsamiq? | 12:18 |
rick_h_ | huwshimi: or did you just get the air version running? | 12:18 |
huwshimi | rick_h_: I think so, I'll forward the details. | 12:19 |
rick_h_ | huwshimi: ty | 12:19 |
cjwatson | wgrant: Care to re-review https://code.launchpad.net/~cjwatson/launchpad/archive-getallpermissions/+merge/117606 ? I'm becoming good friends with StormStatementRecorder. | 12:21 |
wgrant | cjwatson: Looks good, thanks. | 12:23 |
wgrant | cjwatson: (although you might want to rename allPermissions to getPermissionsForArchive or permissionsForArchive or something) | 12:24 |
cjwatson | Mm, yeah. The almost-but-not-quite-the-same names between Archive and ArchivePermission are a tad confusing. | 12:25 |
wgrant | Yeah | 12:25 |
wgrant | ArchivePermissionSet's are wrong | 12:25 |
wgrant | Because method names are meant to be verbs | 12:26 |
wgrant | But consistency might be best there for now | 12:26 |
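(For reference, the pattern cjwatson is alluding to looks roughly like this; the lp.testing import paths are as I remember them and the code under test is a placeholder.)

```python
from testtools import TestCase
from testtools.matchers import Equals

from lp.testing import StormStatementRecorder      # assumed import path
from lp.testing.matchers import HasQueryCount      # assumed import path


def code_under_test():
    """Placeholder for whatever API call is being query-counted."""


class TestQueryCount(TestCase):
    def test_query_count_stays_flat(self):
        with StormStatementRecorder() as recorder:
            code_under_test()
        self.assertThat(recorder, HasQueryCount(Equals(0)))
```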
deryck | rick_h_, hey, how about at 15 after? in roughly 5 minutes, for our call? | 15:09 |
rick_h_ | sure thing | 15:10 |
lifeless | wgrant: unoptimised thing is slow isn't a surprise | 18:44 |
Ergo^ | evening | 19:46 |