[05:20] <jono> hey
[05:21] <jono> I am getting this error:
[05:21] <jono> Using saved push location: http://bazaar.launchpad.net/~jonobacon/ubuntu-accomplishments-system/adminportal/
[05:21] <jono> bzr: ERROR: Cannot lock LockDir(http://bazaar.launchpad.net/~jonobacon/ubuntu-accomplishments-system/adminportal/.bzr/branch/lock): Transport operation not possible: http does not support mkdir()
[05:21] <jono> can anyone help me resolve it?
[05:22] <bob2> pretty sure you want to push via ssh instead
[05:22] <bob2> I think it'll be misconfigured due to you using an lp: url to clone without having used bzr launchpad-login (so lp: resolves to anonymous http)
[05:23] <bob2> from my vague distant memories :)
[05:24] <jono> bob2, I just re-authed with launchpad-login and it doesnt seem to fix it
[05:24] <jono> I think the issue is that another user checked this branch out
[05:24] <jono> and I changed the owner/group on the code
[05:24] <bob2> possibly logging in will fix it, possibly you'll need to replace the http url in .bzr/checkout (maybe) by hand, or use 'bzr push --update-defaults-or-whatever ssh://'
[05:24] <bob2> what does bzr info say the url is?
[05:25] <jono> Using saved push location: http://bazaar.launchpad.net/~jonobacon/ubuntu-accomplishments-system/adminportal/
[05:25] <bob2> yeah
[05:25] <jono> so I need to set it to default to ssh?
[05:25] <spiv> If you're doing bzr push, just use "bzr push --remember lp:~jonobacon/ubuntu-accomplishments-system/adminportal" this once to remember the right location
[05:26] <jono> cd ..
[05:26] <spiv> (now that you've done 'bzr lp-login', the lp: URL will resolve to the right location)
[05:26] <jono> thanks spiv
[05:26] <jono> that fixed it
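A minimal sketch of the by-hand fix bob2 mentions, assuming bzr's usual layout: the saved push location lives in .bzr/branch/branch.conf as a plain "push_location = ..." line, so rewriting that line to a bzr+ssh:// URL has the same effect as the "bzr push --remember" spiv suggests. The path and target URL below are illustrative, not taken from the log.

    # Rewrite the remembered push location from http to bzr+ssh by hand.
    # Assumes the standard .bzr/branch/branch.conf layout; run from the
    # branch root. The target URL is illustrative.
    import re
    from pathlib import Path

    conf = Path(".bzr/branch/branch.conf")
    new_url = ("bzr+ssh://bazaar.launchpad.net/"
               "~jonobacon/ubuntu-accomplishments-system/adminportal/")
    text = re.sub(r"(?m)^push_location = .*$",
                  "push_location = " + new_url,
                  conf.read_text())
    conf.write_text(text)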
[08:09] <mgz> morning
[10:09] <james_w> I'm stopping the importer for the rollout
[10:42] <james_w> backed up databases
[10:42] <james_w> cleaning some cruft from the dbs and vacuuming
[10:46] <james_w> starting importer
[10:52] <james_w> first successful import
[11:06] <jml> what's the reason for the named NOT NULL constraints in the table definitions in udd?
[11:06] <jml> e.g. signature TEXT CONSTRAINT nonull NOT NULL
[11:06] <jml> Why not "signature TEXT NOT NULL"?
[11:08] <james_w> don't know
[11:09] <james_w> I'd say one of "sqlite refused the alternative" or "I didn't know any better"
[11:10] <jml> ok, thanks.
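For the record, both spellings create the same constraint; a quick sqlite3 check (not from the log) shows SQLite enforcing the named and the terse forms identically:

    # Both NOT NULL spellings reject NULLs the same way.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE a (signature TEXT CONSTRAINT nonull NOT NULL)")
    db.execute("CREATE TABLE b (signature TEXT NOT NULL)")
    for table in ("a", "b"):
        try:
            db.execute("INSERT INTO %s VALUES (NULL)" % table)
        except sqlite3.IntegrityError as e:
            print(table, "->", e)  # both raise a NOT NULL violation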
[11:15] <jml> hmm.
[11:15] <james_w> importer seems to be running fine so far fwiw
[11:15] <jml> james_w: yay
[11:15] <jml> james_w: some of these tables seem pretty weird, especially when creation is expressed with terser syntax
[11:16] <jml> e.g. CREATE TABLE packages_update (anupdate TIMESTAMP NOT NULL)
[11:16] <james_w> hmm, yeah
[11:17] <jml> e.g. CREATE TABLE commits (package TEXT PRIMARY KEY)
[11:38] <mikhas> hi, I am trying to use bzr-git on a git branch with a non-linear history (read: it has some merge commits)
[11:39] <mikhas> cloning the branch fails, and I get bzr: ERROR: Unknown extra fields in <Commit f45a66d9ea4e1a3043c66e4cb0ee21f56ff7dcfd>: ['mergetag', '', '', '', '', '', '', '', '', '', '', '', ''].
[11:39] <mikhas> any ideas?
[11:54] <vila> james_w: Count me in
[11:54] <james_w> vila, sorry, for what?
[11:55] <vila> james_w: watching the importer
[11:55] <james_w> vila, oh, great, thanks
[11:55] <vila> I see dead imports...
[11:55] <vila> joke aside, how did you start it ?
[11:56] <james_w> ./etc-init-mass-import start or whatever the script is called
[11:56] <vila> 674 outstanding jobs seems... unexpected
[11:56] <vila> and only 18 failures too
[11:56] <vila> sounds like.. dunno, retry-all-failures or whatever option jml added is on
[11:57] <vila> on the bright side, some imports have succeeded
[11:57]  * jml doesn't remember adding options
[11:57] <vila> oh, did you clear the failures table or something ?
[11:58] <james_w> vila, not intentionally
[11:58] <vila> jml: pi.retry_all_failed_jobs ?
[11:58] <vila> james_w: ok, may not be a problem in the end, was just checking
[11:58] <james_w> vila, it might be
[11:58] <jml> vila: I might have written it, but I honestly can't remember :)
[11:59] <vila> jml: :)
[11:59] <james_w> there's still a lot of failures in the failures table
[11:59] <james_w> so maybe a bug in the retry logic
[11:59] <james_w> it's also failing quite a lot with "something changed" on its own revisions, which shouldn't happen
[12:00] <james_w> but that seems to have been happening a fair bit recently, so I'm not sure it's related to storm
[12:00] <vila> james_w: there were quite a lot of such failures
[12:01] <vila> james_w: we have successful imports, that means most of the db updates are exercised and branches pushed
[12:01] <vila> even gtk+3.0 imported a release (I saw it succeed for several of them this morning so that's a *new* one)
[12:02] <vila> james_w: the backup you did should allow further and finer diagnosis, keep them safe ;)
[12:03] <vila> james_w: egoboo killed ?
[12:03] <vila> http://package-import.ubuntu.com/status/egoboo.html#2012-07-04%2010:30:26.488895 ?
[12:03] <vila> oh, *before* you restarted ?
[12:04] <james_w> vila, yeah, I got bored of waiting
[12:04] <vila> but that one wasn't requeued ? weird
[12:04] <james_w> sorry, should have requeued that already
[12:04] <vila> james_w: not sure, wait
[12:04] <james_w> yep
[12:04] <james_w> not doing anything :-)
[12:04] <vila> egoboo is the new nexuiz-data, it can't succeed IIUC
[12:04] <vila> so better not requeue it
[12:07] <vila> hmm, key AssertionError:<module>:main:_import_package:find_unimported_versions:check will be retried automatically ?
[12:07] <vila> something is probably wrong here... as if some data about the failure was inverted
[12:09] <vila> or the retry one
[12:13] <vila> james_w: my guess would be that something is wrong with the retry table and the code can't find what it's searching for, inverting the whole behaviour
[12:14] <james_w> vila, when considering what to automatically retry?
[12:15] <vila> at least
[12:15] <vila> all the failures being retried is already unexpected, then all failures are seen as needing a retry too
[12:17] <vila> james_w: (out of curiosity) what did you clean in the dbs ?
[12:17] <james_w> vila, old rows in the jobs table
[12:17] <james_w> and old rows in the import table
[12:18] <vila> 'old' means ?
[12:19] <james_w> vila, earlier than this month for import
[12:19] <james_w> vila, earlier than this year and active=0 for jobs
[12:20] <vila> hmm
[12:21] <vila> I mostly understand the latter (I thought all active=0 can be purged and the importer will catch up) but the former rings no bell
[12:21] <vila> ha no, the lp lag window, so not all active=0
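A sketch of the cleanup james_w describes, plus the vacuuming from earlier; the database filename and the date column name are assumptions, not the real udd schema:

    # Purge old rows as described above, then reclaim the space.
    # "meta.db" and the "date_requested" column are assumed names.
    import sqlite3

    db = sqlite3.connect("meta.db")
    # imports earlier than this month
    db.execute("DELETE FROM import WHERE date_requested < '2012-07-01'")
    # jobs earlier than this year that are no longer active
    db.execute("DELETE FROM jobs "
               "WHERE active = 0 AND date_requested < '2012-01-01'")
    db.commit()
    db.execute("VACUUM")  # must run outside a transaction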
[12:27] <vila> james_w: just got the emails about {categorise|graph}_failures
[12:28] <james_w> vila, I fixed categorise_failures
[12:28] <james_w> I haven't seen graph_failures yet
[12:28] <vila> james_w: yup, I saw the fix before the email :)
[12:28] <james_w> ah, ok :-)
[12:28] <jml> is 'jobs.id' ever used as a foreign key?
[12:29] <jml> I don't think so.
[12:29] <james_w> not that I recall
[12:29] <vila> james_w: looks like the same traceback
[12:29] <james_w> these dbs haven't ever really heard of normal form
[12:29] <vila> james_w: so most probably already fixed too
[12:29] <james_w> ok
[12:35] <vila> james_w: http://package-import.ubuntu.com/status/mlton.html#2012-07-04%2012:30:30.284526 is *definitely* something that shouldn't be retried
[12:35] <vila> james_w: this means the next lp fdt will create a bunch of failures that won't be retried
[12:36] <james_w> vila, why is that?
[12:36] <vila> well, if the retry logic is inverted, true failures are retried and transient ones are not
[12:37] <james_w> ah, I see
[12:38] <james_w> and retrying that failure would have bad consequences? or it's just something that certainly won't be fixed by a retry?
[12:39] <vila> dunno, that's the point, but I would refrain from requeuing anything until you understand what is going on as I don't know what kind of data will be inserted then
[12:41] <vila> nexuiz-data should not be retried because it can't succeed (we don't know why yet), but it's running right now (it has a special-cased timeout of 24h),
[12:41] <vila> the last time it failed (I don't remember exactly how) the associated failure was a non-transient one,
[12:41] <vila> as such it was black-listed
[12:41] <vila> now it's not
[12:42] <vila> no big deal for this one, but as a whole, the importer has now lost its knowledge about what should be retried or not
[13:16] <james_w> vila, it certainly looks like it is retrying everything
[13:16] <james_w> except for that linux failure, which is odd
[13:16] <james_w> I'll look in to that
[13:18] <vila> james_w: hmm, now that you mention it... I wonder if the linux failure wasn't marked as a transient one *by mistake* !
[13:19] <vila> james_w: and since you said the blob -> text stuff shouldn't be an issue, I wonder if the root cause may be in how signatures are created from backtraces or something...
[13:20] <james_w> yeah
[13:20] <james_w> that's what I'm thinking
[13:20] <james_w> if that changed (either type, or the content) then it could well cause this behaviour
[13:20] <james_w> though I would expect it to stop auto-retrying, rather than to auto-retry everything
[13:21] <vila> hehe, that's part of what makes bugs interesting, they surprise us :)
[13:21] <vila> wow, wow, Exception: sqlite3.OperationalError: database is locked
[13:22] <vila> not yet on the web page
[13:23] <vila> python-bsddb3 base-installer vdr-plugin-live nexuiz-data (!)
[13:23] <vila> followed by two successful imports
[13:24] <vila> web page updated
[13:24] <mikhas> is there a workaround for git mergetags? https://bugs.launchpad.net/dulwich/+bug/963525
[13:24] <mgz> not that I'm aware of.
[13:25] <mikhas> so nothing I can do then? other than swearing at bzr, of course?
[13:26] <vila> james_w: and mails from add_import_jobs and categorise_failures with the same traceback too
[13:26] <james_w> vila, database locked ones?
[13:26] <vila> yup :-/
[13:26] <james_w> ok
[13:26] <james_w> no need to panic yet
[13:26] <james_w> it's not like they were completely absent with the old code
[13:26] <vila> I don't :)
[13:26] <vila> indeed
[13:27] <mikhas> since the error message I get is "ERROR: Unknown extra fields in blah", I was wondering whether I could force bzr to simply ignore the extra fields?
[13:28] <vila> I realized today we weren't talking about the same db locked errors, you were talking about the one in your first attempt while I was talking about the existing ones, I was still hoping your change could kill two birds though :-/
[13:29] <mgz> mikhas: ask in the bug or on the list, jelmer would know best but is not around today
[13:29] <mikhas> ok
[13:33] <vila> james_w: still, the ones in the existing code fired like... 10 times in the last 2 weeks, we're at 7 in 3 hours. In some ways, it's progress, the bug is revealing itself, ready to be fought ;)
[13:33] <james_w> heh
[13:34] <vila> I don't remember categorise-failures triggering it either but I may be wrong
[13:36] <james_w> vila, it's run less frequently than add-import-jobs, so I'd expect to see it fail with that error roughly one fifth of the number of times
[13:37] <vila> james_w: unless an fdt ruined the game, and given that some very old failures seem to succeed (given the number of releases imported for some packages), I think it's worth letting the importer run a bit longer
[13:37] <vila> true
[13:37] <james_w> vila, I don't think there's a distinction between the existing errors and the ones from the last attempt
[13:37] <james_w> it was just a matter of frequency of them occurring
[13:37] <james_w> I predict we will all but abolish them by moving to postgres
[13:38] <vila> that's a risky bet ;)
[13:38] <james_w> so unless there is a significant performance degradation with the importer on the current code we should just push ahead with that
[13:38] <james_w> vila, I'll bet you a beer :-)
[13:38] <vila> hehe
[13:39] <vila> if you fix the retry stuff, the picture will be clearer,
[13:39] <vila> if we don't get too many locked errors, we can mark them as transient and continue the experiment
[13:40] <james_w> yeah
[13:40] <vila> add-import-jobs and categorise-failures won't retry unless you fix them too but again as long as they *can* run often enough, it's still worth continuing
[13:41] <vila> james_w: you still have 645 outstanding jobs to find a fix ;)
[13:48] <vila> james_w: just to confirm, in your udd use case, you don't run import-package, right? Nor need bzr-builddeb or pristine-tar, right?
[13:48] <james_w> vila, correct, we do not
[13:48] <vila> good
[13:50] <vila> hehe, oops is a package name, was wondering about its appearance in an error message ;)
[13:52] <james_w> vila, ok, found it
[13:52] <james_w> MP coming
[13:52] <vila> james_w: hmm, looking at the crontab, categorise-failures and add-import-jobs run at the same frequency...
[13:52] <james_w> vila, hmm, ok, I was misremembering then
[13:53] <james_w> it is odd that it is usually add-import-jobs then
[13:53] <vila> no worries, yeah weird
[13:53] <vila> both should run at the same time, maybe one is blocking the other in "normal" circumstances
[13:54] <vila> james_w: which reminds me of another question: you mentioned a 30 second delay for sqlite and I recently realized that's the case in your proposal, but what was it before ?
[13:55] <james_w> vila, 30 seconds
[13:55] <james_w> it has been for months/years
[13:56] <james_w> it crept up as the load went up
[13:56] <vila> you mean: implicit before and explicit in your proposal ?
[14:17] <james_w> vila, nope, it was explicit before as well
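For reference, the 30 second setting maps directly onto the sqlite3 API; either form below makes a connection wait that long on a lock before raising "OperationalError: database is locked" (the path is illustrative):

    # Wait up to 30s for a lock instead of the 5s default.
    import sqlite3

    db = sqlite3.connect("meta.db", timeout=30.0)
    # equivalently, at the SQLite level (value in milliseconds):
    db.execute("PRAGMA busy_timeout = 30000")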
[14:47] <james_w> vila, https://code.launchpad.net/~james-w/udd/fix-auto-retry/+merge/113409
[14:49] <vila> wow, just a misplaced is not None ?
[14:50] <james_w> vila, yeah
[14:50] <james_w> and missing tests :-)
[14:50] <vila> indeed :)
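Purely illustrative, not the actual udd code (that fix is in the MP above), but it shows how a single misplaced "is not None" can flip retry selection, retrying permanent failures instead of only transient ones:

    # A None-check applied to the wrong expression swallows the transient
    # flag, so every classified failure looks retriable.
    class Failure:
        def __init__(self, signature, transient):
            self.signature = signature
            self.transient = transient  # True, False, or None (unclassified)

    def should_retry_buggy(failure):
        return failure.transient is not None  # bug: ignores the flag's value

    def should_retry_fixed(failure):
        return failure.transient is not None and failure.transient

    for f in (Failure("db-locked", True), Failure("assertion", False)):
        print(f.signature, should_retry_buggy(f), should_retry_fixed(f))
    # db-locked True True
    # assertion True False  <- the buggy version retries a permanent failure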
[14:51] <vila> james_w: approved
[14:51] <james_w> thanks
[14:51] <james_w> rolling it out
[14:51] <vila> james_w: I have to go in a few minutes,
[14:52] <vila> yeah, roll out, kill whatever is in the way but keep refraining from requeuing until you have a good feeling you nailed that one good ;)
[14:52] <james_w> yeah
[14:53] <mgz> I'll be around a bit longer, so feel free to bug me instead for anything urgent
[14:58] <fullermd> mgz: My lawn needs mowed...
[14:59] <vila> james_w: the fix is convincing and rules out the data so that's encouraging
[15:01] <james_w> vila, yeah
[15:03] <james_w> vila, yeah, and testing with local dbs shows that the same data gives the correct behaviour now
[15:03] <vila> \o/
[15:07]  * mgz seeds fullermd's lawn
[16:05] <vila> james_w: back for a tiny bit, you did deploy with no down time ? ;)
[16:05] <james_w> vila, I did
[16:05] <vila> cute :)
[16:05] <james_w> though mass-import should really be restarted
[16:06] <james_w> otherwise it may start to think that LP is down when it isn't
[16:06] <vila> to detect that, yeah
[16:06] <james_w> I can do that before I go if you like?
[16:07] <vila> or even now, so it will just have to be restarted
[16:07] <vila> hmm, libreoffice...
[16:08] <vila> well, the web page shows the fix is effective; since no data was harmed, we can requeue, so mass-import stop and the killed ones should just restart IIRC
[16:16] <james_w> vila, stopping...
[16:17] <vila> I see, the driver said db locked, not sure if anything can go wrong there (I don't think so, a bit surprising though)
[17:19] <james_w> vila, finally started again after a few kills
[17:19] <james_w> I have to leave now, I'll check in later
[17:19] <vila> james_w: mass-import stop (not graceful-stop) didn't kill them ?
[17:19] <james_w> vila, it did not
[17:19] <vila> weird
[17:19] <james_w> nor a normal kill
[17:20] <vila> Weird
[17:20] <james_w> I had to get out the bigger hammer in the end