[01:02] <kiko> hey spiv
[01:02] <kiko> how's it going?
[02:20] <spiv> kiko: Hey.  Pretty good.
[02:21] <kiko> but pretty laggy.
[02:21] <spiv> Once I start waking up ;)
[02:21] <spiv> :)
[02:21] <kiko> any news on the sqlobject cache scare? :)
[02:21] <spiv> carlos's issue?
[02:21] <kiko> yes. or rather, YOUR issue. :)
[02:22] <spiv> I meant the one he encountered, rather than owned ;)
[02:22] <kiko> I was being cheeky
[02:27] <spiv> It's a real problem, but I don't know what the solution is yet.
[02:28] <spiv> I know enough about it to write a test case for it.
[02:28] <spiv> There's another problem I'm aware of that may be related.
[02:29] <kiko> what does the good ian bicking say of it?
[02:29] <spiv> Overall, there's too many layers involved.  SQLObject does connection pooling and object caching, sqlos hacks those, we hack those a little bit more...
[02:30] <kiko> it's not necessarily in the sqlobject layer, then?
[02:30] <kiko> ugh
[02:30] <spiv> My gut feeling is it should be about 5x simpler.
[02:30] <kiko> my gut feeling is that any tectonic movements there are going to delay us significantly.
[02:30] <spiv> Well, I think SQLObject's abstractions for connection management are partly to blame, in a design sense....
[02:31] <spiv> because that means that sqlos needs to build its own stuff, rather than work with SQLObject.
[02:31] <spiv> But I agree.
[02:31] <spiv> I don't want to do open-heart surgery on this stuff, because that will take time.
[02:31] <kiko> that's what I feel as well
[02:32] <kiko> but we need to get this resolved once and for all
[02:32] <kiko> so how about you invest some serious hours into getting down into understanding the issues
[02:32] <kiko> get some testcases going
[02:32] <spiv> I wish I could be confident that fixing these latest issues will be "once and for all"... it's been whack-a-mole so far.
[02:32] <kiko> if I can help in any way, perhaps waving sidnei into the rink if it feels good to you
[02:32] <kiko> yes.
[02:33] <kiko> is nobody else using sqlos/sqlobject and the latest set of hacks on top of it?
[02:33] <spiv> Well, certainly no-one else is using sqlos + initZopeless (i.e. the same SQLObject classes inside and outside of Zope).
[02:34] <kiko> is that where the problem comes from?
[02:34] <kiko> changes being done "behind zope's back"?
[02:34] <kiko> which happens because of the po importd?
[02:34] <spiv> More generally, changes being done behind the back of SQLObject in a particular process.
[02:35] <kiko> can you elaborate in 3 lines?
[02:35] <spiv> Actually, I'm not certain that that's what carlos's problem is, I want to investigate to be sure.
[02:36] <spiv> But ddaa has hit that with importd, I've hacked around it for now.
[02:37] <kiko> by forcing a cache refresh at critical points?
[02:37] <spiv> So, for some reason doing a commit (or an abort) via the proper high-level mechanisms, at least in initZopeless, isn't sufficient to make db updates from other processes visible.
[02:37] <ddaa> kiko: example: when I reload buildbot, I abort the current transaction and do a select() in sourcesource to catch changes made from launchpad.
[02:37] <ddaa> It used to work.
[02:37] <spiv> In that case, it appears that it's not the cache, but the actual connection objects themselves that need resetting.
[02:38] <ddaa> I seriously suspect that's a regression which was introduced by the sqlos commit/abort fix.
[02:38] <spiv> (there's no cache issues -- these objects are new, and so aren't in the cache yet)
[02:38] <kiko> connection cache?
[02:38] <ddaa> But I was not able to test that theory.
[02:38] <ddaa> (lack of time)
[02:39] <spiv> carlos's problem seems to be slightly different, but I think it may be related, at least in a design sense.
[02:39] <spiv> In his case, clearing the cache is sufficient, no need to muck about with the actual connections.
[02:40] <spiv> kiko: Btw, thanks for asking about this.  Explaining things out loud does help clarify my own thoughts :)
[02:40] <kiko> talking is good.
[02:40] <kiko> so tell me more about ddaa's issue.
[02:41] <kiko> so it doesn't seem to be the object cache, but the connection isn't seeing an updated view of the database?
[02:41] <spiv> Correct.
[02:41] <ddaa> yup
[02:41] <ddaa> spiv: in case you missed that, _it used to work_
[02:41] <spiv> The workaround is to reach behind sqlobject's back, get the actual low-level psycopg connection object, and do a .rollback and a .begin on it.
[02:41] <spiv> ddaa: I saw :)
[02:42] <stub> Would this be because a new transaction is being started as soon as the last one is committed, and transaction isolation not letting you see DB changes after that occurs?
[02:43] <spiv> stub: That's my current hypothesis, yes.
[02:44] <stub> spiv: Do you have a plan to progress, or do you want mine?
[02:45] <spiv> stub: I've already heard yours, I think?  Close/re-open the connections, let psycopg's connection pooling take care of it.  I like that plan.
[02:46] <stub> That's pretty much it. SQLOS's connection descriptor will probably even take care of reopening the connection for you.
[02:47] <spiv> Right.
[02:49] <stub> Was there any *reason* the librarian refuses to store the same file with the same filename but different mime-types? Or is that just the way it happened?
[02:50] <kiko> it's at least interesting
[02:50] <spiv> It's currently just the way it happened, mainly.
[02:50] <spiv> The minor advantage is that a file content id + filename is sufficient to know the mime type.
[02:50] <spiv> Which makes the URLs slightly simpler. 
[02:51] <stub> Hmm.... we should only ever need to use filealiases though. I'll add a low priority bug on this - the use case is where somebody uploads a file with the wrong mime type and tries to correct this situation by uploading again.
[02:52] <spiv> Hmm, yeah.
[02:52] <kiko> they'd need to use a new ID here
[02:53] <spiv> I've suspected it was a bad restriction, but I couldn't think of a use-case, so called YAGNI on myself before fixing it.
[02:53] <spiv> But yeah, that might happen.
[02:53] <stub> kiko: Indeed. The upload should work and we get a new filealias pointing to the same filecontent, just like they tried to upload it using a different filename.
[02:53] <kiko> should we nuke the original filecontent?
[02:53] <spiv> No.
[02:53] <stub> kiko: filecontent is readonly except for garbage collection (in the future)
[02:54] <stub> erm... writeonce
[02:54] <spiv> The filecontent is independent of names and mime-types (it's just a sequence of bytes).
[02:54] <kiko> I see.
[02:54] <spiv> And uploading the same content twice only writes it once -- the librarian catches dupes.
[02:54] <kiko> indeed.
[02:54] <kiko> sounds like the perfect archival mechanism -- the only thing it doesn't help us with is mirroring.
[02:55] <spiv> (properly -- it actually checks the bytes, not just the SHA digests ;)
[02:55] <kiko> the risk of a clash is that bad?
[02:55] <stub> kiko: It will only take a minor extension to support mirroring, and the only code changes need to be done to the librarian
[02:55] <kiko> the mirrors need to run the librarian as well, however, I imagine?
[02:56] <spiv> kiko: Probably not, but it wasn't significantly harder to be sure.
[02:56] <kiko> so you run a real diff?
[02:57] <stub> kiko: I listened to them too much in Oxford and believed that the minute chance of a clash was worth worrying about. I have since done some more reading and math, and believe it really isn't worth doing a byte-by-byte comparison :-) Future optimization - it is running fine atm.
[02:58] <kiko> heh. a sha-clash would be a confusing bug to follow though :)
[02:58] <stub> kiko: No - any mirror.
[02:59] <kiko> stub, and how would the downloader get the right file? or would the mirrors be structured with symlinks? or is it yet something more obvious?
[02:59] <stub> kiko: Yeah - but it would make us famous since we would be the first people to ever find one in the decade the hash has been around (and even more so since we would probably store the MD5 as well, and to clash we would have to dupe both)
[02:59] <spiv> stub: I believe the benefits of the optimisation wouldn't be worth worrying about either ;)
[03:00] <stub> kiko: We add another table, called mirroredcontents or such. We have a process that checks mirrors, checks the librarian, and adds entries to this table.
[03:01] <kiko> stub, so far so good. when somebody comes to download?
[03:01] <stub> kiko: To get a URL, launchpad et al. ask the librarian for a URL. So all the logic can be embedded there.
[03:02] <kiko> hummm
[03:02] <stub> kiko: So the librarian might just give a URL to a load balancer/mirror selector server. Or does a GeoIP lookup and gives the URL direct to the mirror. Or whatever. We can even have rules like 'only mirror for files > X bytes' or 'check this cookie for further instructions'
[03:03] <kiko> so the end-user would get a URL to somewhere else. my question is if the somewhere else would have those files with their original filenames?
[03:03] <kiko> stub, which means we would probably have to duplicate files with same content and different IDs (or use symlinks)
[03:06] <stub> kiko: The files would have to be served with the same filename and same mime-type. This also QA's our mirrors.
[03:07] <stub> (we could relax that rule if we want, but I like it)
[03:08] <kiko> indeed.
[03:08] <stub> The only files that might have multiple mime-types are the pissy little ones we don't need to worry about mirroring (text/plain, text/xml, application/xml)
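stub's scheme — a table recording which mirrors hold which content, with the librarian picking the URL and applying rules like a minimum size — could look roughly like this sketch; every name, URL, and the size threshold is an illustrative assumption, not the real schema:

```python
# Toy data structures standing in for the database tables:
#   aliases:  alias id -> (content id, filename, size in bytes)
#   mirrored: (mirror base URL, content id) pairs kept up to date by
#             the checker process stub describes.
MIN_MIRROR_SIZE = 1024 * 1024  # assumed rule: only mirror files > 1 MiB

aliases = {
    1: ("c1", "ubuntu.iso", 700 * 1024 * 1024),
    2: ("c2", "changelog.txt", 4 * 1024),
}
mirrored = {("http://mirror.example.org", "c1")}

def url_for_alias(alias_id: int) -> str:
    """Return a download URL, preferring a mirror for large files."""
    content_id, filename, size = aliases[alias_id]
    if size > MIN_MIRROR_SIZE:
        for base, cid in mirrored:
            if cid == content_id:
                # The mirror serves the same filename and mime-type,
                # which (as stub says) also QA's the mirror.
                return f"{base}/{content_id}/{filename}"
    # Small or unmirrored files come straight from the librarian.
    return f"http://librarian.example.com/{alias_id}/{filename}"

print(url_for_alias(1))  # large file: mirror URL
print(url_for_alias(2))  # small file: served by the librarian itself
```

Because callers only ever ask the librarian for a URL, policies like the size threshold or GeoIP selection can change inside this one function without touching launchpad.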
[03:32] <kiko> great!
[03:32] <kiko> stub, so you'll be going alone to SA?
[03:45] <stub> kiko: Yes. Kirsten is trying to stop herself getting distracted from the Great Work.
[03:45] <kiko> smart girl she is.
[03:46] <kiko> there's only so much field work that can be absorbed in a lifetime -- ask me one day.
[03:46] <kiko> o/' you spin me round round baby round round o/
[04:06] <stub> I think making product names unique will bite us :-( I'm just creating some products in dogfood and realizing that they are all pretty generic names, which would be just fine if (project, product) was the key.