[10:27] cjwatson: I guess those PostgresConnection leaks aren't a coincidence :/ [10:29] wgrant: Yeah, although if I'm not mistaken I was seeing them before my psycopg2 upgrade as well [10:30] wgrant: Indeed, builds 897 and 898 both have a bunch [10:30] r17870 and r17871 respectively [10:31] Yeah, they happened occasionally. [10:31] But two in a row straight after an upgrade is slightly worrying. [10:31] Hopefully just a coincidence. [10:31] * cjwatson tries forcing once more for luck [10:57] LayerIsolationError: Test left uncollectable garbage [10:57] [] (referenced from [(,), [], {'_connection': , '_order': {}, '_dirty': {}, '_implicit_flush_block_count': 0, '_sequence': 0, '_event': isn't that saying that it's referenced from itself? [10:57] ... 0x15d3790c>, '__provides__': , '_lp_store_initialized': True, '_register_for_txn': False, '_cache': , '_database': , '_alive': }]) [10:57] cjwatson: Referenced from a tuple, a list, and a dict. [10:59] oh right [11:34] wgrant: no luck reproducing this locally so far [12:03] cjwatson: Hm, you've run a big chunk of the tests? [12:29] I tried running one test that happened to fail a couple of hundred times, and tried running lp.blueprints.browser. I'll try a bigger chunk [12:52] gc.get_referents(*gc.garbage) is returning [{'_closed': True, '_database': , '_event': , '_raw_co [12:52] nnection': None}, ] [13:01] so the cycle could either be via an event hook, or something in LaunchpadDatabase (but neither that nor any of its superclasses hold any non-trivial references that I can see) [16:21] ... ok, that did reproduce locally, once in the entire test suite. now to try the whole thing again with -D :-/ [16:21] (and hopefully that won't perturb it away) === BradCrittenden is now known as bac [22:19] cjwatson: Ah, I apparently actually dug into a reproducible version of this in my Product.access_policies changes a few months ago. A MutableValueVariable held a reference through _event_system, but I changed the approach used in the code rather than debugging all the way. [22:25] Curious, since MVV's hooks don't seem to do anything that should produce circularity [22:26] I mean, it only adds methods with zero args as hooks [22:27] At some point I may manage to get lucky and catch this in pdb. [22:30] cjwatson: If you want to bet that the two PostgresConnection leaks aren't unrelated, it may be worth reinstating my broken garbo job and poking at that test. [22:30] "∘ product-aps-set broke buildbot spectacularly. [22:30] ‣ garbage: {{{[]}}} [22:30] " [22:31] cjwatson: http://paste.ubuntu.com/14057795/ reproduced it consistently. [22:31] OK. I'm running --layer=DatabaseFunctionalLayer at the moment with pdb.set_trace() inserted just before the exception; if that doesn't catch it (or maybe even if it does) then I'll try that. [22:31] Specifically setting the access_policies list. [22:32] Though it would certainly be nice to have a shorter and more reliable reproducer, so yes ... [22:32] Won't get to it before tomorrow morning though [22:32] Well, the problem is that we don't know if they're actually related. [22:32] Indeed. [22:32] And we won't know until the garbo one is fixed and buildbot passes :/ [22:33] Feel free to hand over any notes you have and I'll continue poking. [22:33] I don't yet have meaningful notes other than what I've mentioned here, since my reproducer was so long and unreliable. [22:34] Just running with http://paste.ubuntu.com/14057865/ [22:34] (But I'm sure you could have figured that much out) [22:34] I've been multitasking with working on getting my git-build-recipe tests passing [22:35] Currently at 119/135 ... [22:44] cjwatson: 119/135 tests complete, or 119/135 runs succeeded/ [23:24] wgrant: 119/135 tests succeeded [23:25] lp.registry.tests.test_distro_webservice.TestGetBranchTips.test_same_results> /home/cjwatson/src/canonical/launchpad/lp-branches/more-layer-isolation-info/lib/lp/testing/layers.py(473)testTearDown() [23:25] -> BaseLayer.flagTestIsolationFailure( [23:25] oho [23:25] really [23:25] It's an interesting test to break on. It's a slightly weird method. [23:26] Hesitant to ascribe too much significance to the particular test. [23:26] Indeed. [23:28] (Pdb) p obj['_event']._hooks [23:28] {'register-transaction': set([]), 'flush': set([(>, ())])} [23:29] Oho [23:29] Very relevant, then. [23:29] So *could* be, but how's that circular [23:29] I guess that bound method refers to that variable [23:29] Yep. [23:29] Bound methods hold a reference to their object, funnily enough. [23:30] But I'm not sure why this wouldn't happen all the time, unless it sometimes gets unregistered. [23:30] (Pdb) gc.get_referents(meth.im_self) [23:30] [, , {u'include_long_descriptions': True, u'backports_not_automatic': False}, Undef, (Undef, u'{"include_long_descriptions": true, "backports_not_automatic": false}'), , ] [23:31] So I guess my publisher_options change could have tickled it? [23:31] Maybe made it more probable, at least, since DistroSeries is everywhere [23:31] It is not impossible. [23:31] I did think about that when reviewing it, but buildbot seemed happy... [23:32] The only things with __del__ are Result and Connection, unless there's something inside psycopg2 [23:34] Right, but it's probably something like PostgresConnection holding a reference to the event_system so it can fire events on commit or similar. [23:35] There'd have to be a reference back to the connection somewhere for it to be relevantly cyclic, unless I misunderstand the gc [23:36] Indeed, but the variable or event_system probably knows the store which knows the connection, or something like that. [23:36] ah [23:36] (Pdb) gc.get_referents(meth.im_self)[6]._hooks [23:36] {'stop-tracking-changes': set([(>, ())]), 'changed': set([(>, ()), (>, ({: {}, 'sequence': 1, 'primary_vars': ... [23:37] ... (,), : {}, 'invalidated': True, : {}, 'store': , : {}},))]), 'start-tracking-changes': set([(>, ())]), ... [23:37] ... 'object-deleted': set([(>, ())]), 'resolve-lazy-value': set([(>, ())]), 'flushed': set([])} [23:37] Right. [23:37] there's the store [23:37] so these must normally be unhooked or nothing would work [23:37] Yeah, maybe _stop_tracking isn't being called somewhere along the line. [23:37] * wgrant tries to revive the reproducible case. [23:39] Oh I have vague memories of this. [23:39] I could only reproduce it with a reset in some circumstances, I think. [23:42] I'm a bit suspicious of the object-deleted implementation in the C extensions. [23:43] That's at least sometimes how _stop_tracking is called. [23:44] Confirmed that r17587 plus that test I pasted reliably generates garbage [23:44] This JSONVariable is only referred to by its bound methods [23:45] Yep [23:48] I wonder if it's possible to end up with the same hook attached multiple times? [23:48] Via multiple MutableValueVariable.get calls [23:51] It indeed seems to hook twice, hmmm. [23:51] But then my unhook log isn't triggered at all. [23:52] Oh, probably unhooked once in set, which I didn't log. [23:53] cjwatson: Oh, it's a set anyway. [23:53] Unhook is never called AFAICS. [23:54] It's not called via object-deleted -> _detect_changes_and_stop -> _stop_tracking? [23:54] Nope. [23:54] Logging more and tweaking the test to not leave garbage, to see how it's called normally. [23:55] It is possible, of course, that it only leaks on reset. [23:55] And it depends on test ordering. [23:56] reset (or invalidate) could cause multiple gets by clearing the cache [23:56] Interesting. [23:56] If I remove the reset at the end of the test, it actually gets unhooked by a new set() call. [23:57] I guess it gets set to AutoReload when the transaction is aborted at the end of the test.