[00:00] mwhudson: unfortunately they don't get propagated [00:00] jelmer: oh really? [00:00] how completely useless [00:01] mwhudson: I discussed them with some git developers at LCA, and we came to the conclusion that git notes wouldn't be useful for this sort of thing. [00:01] oh ok, i got the idea somehow that they were a new thing [00:01] They are relatively new, only about a year I think. [00:02] I ended up implementing bzr-git roundtripping using extra metadata in commit messages (revision properties, revision id) and a file-id table in the tree. [00:02] ah ok === almaisan-away is now known as al-maisan === al-maisan is now known as almaisan-away [00:12] jelmer: so [00:12] /home/robertc/launchpad/lp-branches/working/lib/lp/scripts/utilities/importfascist.py:187: DeprecationWarning: please use 'debian' instead of 'debian_bundle' [00:12] module = original_import(name, globals, locals, fromlist, level) [00:12] jelmer: when is that getting stabbed ? [00:12] jelmer: i see the kdebase import failed :/ [00:12] lifeless: yes that is nice to separate the policy bits of testresources from the mechanism [00:13] lifeless: I thought I already head [00:13] *had [00:14] jelmer: I [00:14] 'm on latest devel [00:14] latest sourcedeps, upgraded my packages [00:14] hmm, I'll hit apt harder there were some hold- backs [00:15] lifeless: it doesn't tell you where that DeprecationWarning is coming from? [00:15] mwhudson: yeah :-( [00:15] jelmer: no, its a little opaque [00:16] jelmer: any idea on this one? [00:16] lifeless: Have you run update-sourcecode recently? [00:16] jelmer: never ? [00:16] seems that it's some kind of remote server problem [00:16] mwhudson: yeah, it's the same one as the len(tview) != len_tview one [00:17] jelmer: oh ok [00:17] poolie: small request for you [00:18] poolie: I'd like to put a rule in the feature flags ruleset on production [00:18] ok [00:18] which will say 'beta users team membership means is_edge should evaluate true' [00:18] by all means [00:19] so keeping the same behaviour, but changing the guts to use the flags mechanism rather than being hardcoded? [00:19] yeah [00:19] yes i was thinking of doing that soon too [00:19] for now is_edge (and is_lpnet) need to union the two things [00:19] as a transition, I think. [00:19] e.g. [00:20] if the appserver is configured as edge [00:20] then is_edge is true and is_lpnet is false [00:20] if the appserver is configured as lpnet [00:20] so last thursday [00:20] then the flags rule can override that [00:20] which seems like a long time ago [00:20] i got a readonly web view of them [00:20] to say 'actually, this is edge, nyarh' [00:20] and i was trying to work out a nice way to make them editable [00:21] poolie: I quite like what sinzui has organised for jabber ids with bac [00:21] which is, you can add or delete, but not edit in place; it makes it kindof simple [00:22] I dunno - we can have something pretty crude - even a big textfield you parse and diff would do :). I don't know this part of the LP machinery as yet. [00:26] yeah, something like that [00:27] mwhudson: getting that particular bug fixed is high on my todo list, but it's non-trivial [00:27] it wasn't a blocker, that was just what i got up to before i stopped [00:27] jelmer: it's fixable on the bzr-svn end? 
[00:27] mwhudson: yeah, it's a bzr-svn bug [00:28] oh ok [00:28] mwhudson: the problem is that because of odd operations in the repo bzr-svn ends up with an invalid base text to apply the delta it receives from the server against [00:28] jelmer: is it like the hash reconstruction fun you had with bzr-git? [00:28] mwhudson: It's just for the file fulltext, doesn't depend on the particular serialization that bzr-svn uses [00:29] oh ok [00:29] sounds like fun :) [00:29] s/bzr-svn/svn/ [01:54] jelmer: still up ? [01:54] or StevenK ? [01:54] I have this happening : [01:55] robertc 6840 0.0 0.2 26224 4276 pts/1 T 10:52 0:00 \_ /usr/bin/perl -w /usr/bin/debuild --no-conf -S -k0x5D147547 [01:55] robertc 6868 0.0 0.0 5572 772 pts/1 T 10:52 0:00 \_ tee ../biscuit_1.0-4_source.build [01:55] robertc 6929 0.0 0.0 19404 1920 pts/1 T 10:52 0:00 \_ /bin/bash /usr/bin/debsign -k0x5D147547 biscuit_1.0-4_source.changes [01:55] robertc 6968 0.0 0.0 5596 772 pts/1 T 10:52 0:00 \_ stty 400:1:bf:a20:3:1c:7f:15:4:0:1:0:11:13:1a:0:12:f:17:16:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 [01:55] ^ hung test [02:21] jelmer: the stty is looping: [02:21] ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted) [02:21] --- SIGTTOU (Stopped (tty output)) @ 0 (0) --- [02:21] -forever- [02:21] --- SIGTTOU (Stopped (tty output)) @ 0 (0) --- [02:35] man, my vim in this vm is _borked_ [02:35] its the cause, whatever is going on. [03:17] mwhudson: did you have any thoughts on https://bugs.edge.launchpad.net/launchpad-foundations/+bug/618019 ? [03:17] <_mup_> Bug #618019: OOPS may be underrepresenting storm/sql time [03:19] lifeless: not particularly [03:19] lifeless: i know that object construction time can be significant [03:19] calling it 'sql time' doesn't seem quite fair though [03:20] well, I'm saying we don't know [03:20] I'd like to be able to say ORM time and sql time [03:20] but I have a sinking feeling tha time spent getting stuff out of the sql socket is being accrued as nonsql time [03:35] mwhudson: ould factory.makePerson be reusing Person db id's ? [03:35] lifeless: no [03:35] I have a -weird- interaction then [03:35] there might be db-non marked dirty issues [03:35] lifeless: details++ pls ;) [03:35] yes [03:35] typing [03:35] lp/registry/browser/tests/coc-views.txt [03:35] line 37 [03:36] makes a new person [03:36] grabs a view [03:36] and checks that the instructions are show [03:36] n [03:36] I'm seeing [03:36] - 1. Register an OpenPGP key. [03:36] - 2. Download the current Code of Conduct. [03:36] - 3. Sign it! [03:36] which means the view things the principle has signed the coc [03:36] I think [03:38] lifeless: Wouldn't that suggest the opposite? [03:38] wgrant: the - means the line is missing [03:39] Oh. [03:39] Not a bullet. [03:39] I see. [03:39] now, for hilarity [03:39] I can't find those instructions in the source [03:39] What are the changes in your branch? [03:39] wgrant: its the registry branch [03:40] The one where you're caching that field? 
[03:40] yes [03:40] lifeless: the instructions are in codeofconduct-list.pt [03:40] nothing obvious in a preview merge [03:40] oh, the threw my gre out [03:41] mwhudson: thanks [03:41] it also seems to think that they have registered an openpgpg key [03:42] which is -really weird- as I didn't cache that [03:42] ah no, its all in the not: is_ubuntu_coc_signer clause [03:42] so the symptoms are 'a new person from factory.makePerson has their is_ubuntu_coc_signer set true [03:47] yes, I added a print of the is_ubuntu_coc_signer right after makePerson [03:47] -> True [03:48] lifeless: Is it caching (eg. does it work in make harness), or is your query broken? [03:48] my query ? [03:48] wgrant: this isn't going through _all_members [03:48] wgrant: thats a -very- explicit code path; [03:48] lifeless: But you refactored the property itself. [03:48] yes [03:49] wgrant: oh, right, I see what you're asking. Thanks. [03:49] I haven't seen this AND thing before. [03:49] thats easy to tes [03:49] Ah. [03:49] There's the problem. [03:49] You're not constraining the Person... [03:50] You're just saying Person.id. [03:50] yeah, its the calculation itself [03:50] You mean self.id. [03:50] Ah, except that it's static. [03:51] yeah, the refactoring is borked [03:51] very good catch [03:53] in both cases in fact. [03:53] not enough attention to detail; fixing. [03:53] Hah. [03:53] it needs to LeftJoin on the _all_members case [03:53] to be fair though, it was the second attribute, I was still figuring things out ;) [03:54] Hm, is the _all_members case really broken? [03:54] yes [03:54] I'm not sure how Storm will SQLify that. [03:54] OK. [03:54] people that haven't signed a coc at all will be excluded [03:54] because their columns would be NULL [03:54] But it's in an Exists... [03:54] oh [03:54] I think I need more caffeine. This isn't me :) [03:55] right. square one. [03:55] Heh. [03:55] the _all_members case does look right. [03:55] I think it's OK, yes. [03:55] the property is naffed [03:55] we can't use Person.id there [03:55] so [03:55] You can. [03:55] You just have to constrain it in the property itself. [03:56] we're not querying on the Person table [03:56] and querying both tables would be nuts [03:56] True. [03:56] brb. [03:58] mwhudson: wgrant: thank you for your help. [04:04] taking a break; -> new house stuff and talking to bigjools late tonight [04:04] if you need me, SMS/ring the aussie mobile [07:07] getting there with this branch [07:11] The cache-the-world one? [07:15] cache person [07:15] got some wtf failures in soyuz tests [07:16] also in my cachedproperty branch which I'm using as a prereq now [07:37] wgrant: for instance [07:37] File "lib/lp/soyuz/browser/tests/archive-views.txt", line 1279, in archive-views.txt [07:37] Failed example: [07:37] view = create_initialized_view( [07:37] ubuntu_team.archive, name="+copy-packages", [07:37] form={ [07:37] 'field.destination_archive': '', [07:37] 'field.destination_series': '', [07:37] }) [07:37] ... [07:37] raise ComponentLookupError(objects, interface, name) [07:37] ComponentLookupError: ((None, ), , '+copy-packages') [07:38] has me scratching [08:07] lifeless: I'm also seeing a hanging test, FWIW [08:07] StevenK: in soyuz? same symptoms? how are you running it ? 
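
A hedged, self-contained illustration of the refactoring bug wgrant spots above, using made-up Storm tables and column names rather than Launchpad's real schema: comparing against the class-level column Person.id builds a clause that never mentions this particular person, so the result is effectively static for every instance; the property has to constrain on self.id.

    from storm.locals import Bool, Int, Store, Storm

    class SignedCodeOfConduct(Storm):
        # Illustrative table, not Launchpad's real schema.
        __storm_table__ = 'signedcodeofconduct'
        id = Int(primary=True)
        owner_id = Int()
        active = Bool()

    class Person(Storm):
        __storm_table__ = 'person'
        id = Int(primary=True)

        @property
        def is_ubuntu_coc_signer(self):
            store = Store.of(self)
            # Buggy refactoring:  SignedCodeOfConduct.owner_id == Person.id
            # compares two columns and is not tied to this instance at all.
            return not store.find(
                SignedCodeOfConduct,
                SignedCodeOfConduct.owner_id == self.id,
                SignedCodeOfConduct.active == True).is_empty()
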
[08:07] lifeless: I threw a branch at ec2 land [08:07] ec2test@domU-12-31-39-0E-60-31:~$ tail -n 1 /var/www/current_test.log && date [08:07] time: 2010-08-16 05:14:30.759898Z [08:08] Mon Aug 16 07:07:44 UTC 2010 [08:08] lifeless: strace is being as unhelpful as I feared [08:08] \o/ [08:09] so [08:09] what does ps tell you ? [08:09] e.g. ps fux [08:09] All that shows is the shell, the ps process and the librarian [08:10] hmm, what user are you :) [08:10] oh, no terminal [08:10] ec2test [08:10] ps faux ? [08:10] the stty process is what was hurting me [08:10] (and its a known 'feature' of stty [08:10] No stty process [08:11] what do you have? [08:12] lifeless: http://pastebin.ubuntu.com/478711/ [08:12] I'm suspecting the test runner has fallen over [08:12] well [08:12] did you see the patch from maris [08:13] shutdown stomps on shutdown [08:13] so falling over is possible [08:13] finishing and not shutting down is also possible [08:13] I seriously doubt the test suite finished [08:14] subunit-ls < test.log [08:14] sorry [08:14] Minusing the last test ran versus what time the instance started is 80 minutes. [08:14] We're not that quick :-) [08:15] subunit-stats < test.log [08:15] true [08:15] It only ran 1744 tests [08:16] One did fail, though, which is odd [08:18] may be interrupted [08:18] if the test kills the runner thats how it shows up [08:18] error: lp.soyuz.tests.test_buildpackagejob.TestBuildPackageJob.test_providesInterfaces [ [08:18] _StringException: lost connection during test 'lp.soyuz.tests.test_buildpackagejob.TestBuildPackageJob.test_providesInterfaces' [08:18] lifeless: Like so, from subunit-filter ? [08:19] yes [08:19] that tells you what nuked things [08:19] assuming no buffering [08:19] Indeed [08:20] [which is false - I *know* we have buggering in place in the test supervisor] [08:20] * StevenK smirks [08:20] s/gg/ff/ [08:21] lifeless: So, what do you suggest? Kill the test runner and run make check by hand on the instance? [08:22] well [08:22] is it repeatable? [08:23] This branch has done this twice, but I don't know which test killed it the first time since I was sleeping [08:24] running the tests while watching isn't a silly idea [08:25] you could try running just that test [08:28] Running just that test locally doesn't fail [08:28] But, like you say, I think buffering is screwing us [08:29] you can run it and the next 20 [08:29] grab any previous ec2 result [08:29] and use subunit-ls to get a list of the tests [08:29] pick that one and 20 or 30 after [08:29] put them in a file and use bin/test --load-list filename [08:29] The ordering doesn't change? [08:30] its tolerably stable [08:31] btw [08:31] 1500 line text files are not 'tests' [08:31] just saying [08:32] Wha? [08:32] * StevenK is missing contextg [08:32] s/g$// [08:33] tests/archive-views.txt [08:33] its blowing up, spectacularly, for me [08:34] File "/home/robertc/launchpad/lp-sourcedeps/eggs/zope.component-3.9.3-py2.6.egg/zope/component/_api.py", line 111, in getMultiAdapter [08:34] raise ComponentLookupError(objects, interface, name) [08:34] ComponentLookupError: ((None, ), , '+copy-packages') [08:34] line 1279 [08:34] lifeless: That test predates me by a while [08:35] sure [08:35] not blaming you :) [08:35] has anyone seen one of these sorts of failures before [08:35] and can suggest how to debug the thing ? [08:35] He can! [08:35] Um, I have, but I can't remember what I did [08:36] lifeless: normally you see that kind of error when ZCA hasn't been initialised properly... 
I'm assuming you've not modified the test so you haven't changed the layer it runs with? [08:36] * StevenK grumbles at the letter he just got from Medibank Private [08:36] noodles775: no, I haven't changed anything in soyuz [08:37] noodles775: this is my registry branch, which adds some caching to Person (and only Person) [08:37] hullo [08:37] hey bigjools [08:37] lifeless: ah, so it's only failing in your branch? I'll take a look at the MP. [08:37] hey lifeless, epic fail at getting up early I'm afraid [08:38] bigjools: thats ok [08:38] bigjools: Still flu-ey? [08:38] bigjools: Read as, "Did the weekend help?" [08:38] not quite full strength but better, thanks [08:38] noodles775: I'm pushing the latest now, but the shape is unchanged [08:39] noodles775: https://code.edge.launchpad.net/~lifeless/launchpad/registry/+merge/32067 [08:40] Ta. [08:40] its pushing now (bit slow because Lynne is eating all the wifi slots with a machine migration) [08:40] no worries... I'm just running the doc firstbefore merging anyway. [08:41] *sigh* and need to run make schema first. [08:41] the cachedproperty changes are from a different branch [08:41] which is approved and ec2ing itself [08:42] * StevenK stares at bin/test --load-list on his instance === almaisan-away is now known as al-maisan === henninge_ is now known as henninge [08:51] lifeless: Your suggestion was to run bin/test --load-list . That has set up the layers and then done nothing else [08:52] you probably want -vv there too :) [08:53] It just spat out 'Killed', so I'm now *very* curious what is going on [08:55] lifeless: Probably :-) [08:56] lifeless: the cache you've added on IPerson.archive is changing the test: http://pastebin.ubuntu.com/478724/ [08:56] (sorry, doc :) L. [08:56] noodles775: thanks [08:57] np. [08:57] (waiting for pakcets to look at the pastebin) [08:57] ec2test@domU-12-31-39-0E-60-31:~$ dmesg | grep -c 'oom-killer' [08:57] 6 [08:57] lifeless: ^ [08:57] -epic- packet loss on local wifi :( [08:57] The plot thins :-( [08:57] hah [08:58] 8693 ec2test 20 0 3575m 3.2g 11m R 100 45.1 4:35.38 /usr/bin/python [08:58] Fuuuuuuuuuun [08:58] thats less that optimal [08:58] bin/test is blaming buildd-slavescanner.txt [08:58] noodles775: so what does it /mean/ ? 'no archive found when one should be found' ? [08:59] lifeless: I'm guessing it means that ubuntu_team.archive was accessed before the doc added an archive for them, so the value of None is cached... [09:00] The line of the error you see is where it tries to adapt ubuntu_team.archive (which is None) and fails. [09:00] As the paste shows, the archive is found if you kill the cache (and the initialized view returned as expected) [09:01] So, convert it to a unit test and the problem goes away :) [09:07] noodles775: yeah, I need to find where though - so that I can fix the code to invalidate automatically (there are many more failures, this is just the first bit of fallout that was completly bizarre) [09:09] bigjools: so [09:09] Right, Python got to 6.7g resident before the oom-killer stepped in [09:10] bigjools: I don't konw what bits needed expansion in the incident report [09:10] Hello all [09:10] bigjools: so I suggest you ask me stuff ;) [09:10] lifeless: so an explanation of how you reached your conclusion would be a good start [09:10] noodles775: ArchiveSet.new() caches the value [09:10] bigjools: which conclusion ? [09:10] lifeless: aha. [09:11] noodles775: yes, aha indeed. 
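
The failure noodles775 tracks down above comes from caching a property value of None before the archive exists. A minimal sketch of that failure mode, assuming a simple instance-attribute cache (this is not Launchpad's actual cachedproperty implementation, and the dict stands in for the Archive table):

    ARCHIVES = {}  # stands in for the Archive table

    class cachedproperty(object):
        """Cache the first value the wrapped method returns, per instance."""

        def __init__(self, func):
            self.func = func
            self.attr = '_cached_' + func.__name__

        def __get__(self, obj, cls=None):
            if obj is None:
                return self
            if not hasattr(obj, self.attr):
                setattr(obj, self.attr, self.func(obj))
            return getattr(obj, self.attr)

    class Person(object):
        def __init__(self, name):
            self.name = name

        @cachedproperty
        def archive(self):
            return ARCHIVES.get(self.name)

    team = Person('ubuntu-team')
    print(team.archive)        # None, and None is now cached
    ARCHIVES['ubuntu-team'] = 'a PPA'
    print(team.archive)        # still None: the doctest's new archive is invisible
    del team._cached_archive   # explicit invalidation, roughly what the pasted fix does
    print(team.archive)        # 'a PPA'
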
[09:11] lifeless: "Soyuz went into a busy loop on the 'bohrium' builder" [09:11] noodles775: I suspect its this : if purpose == ArchivePurpose.PPA and owner.archive is not None: [09:13] lifeless: yep. Isn't this why we don't normally cache model attributes? (sorry, I've probably not caught up on some email discussion saying why what you're doing is ok). [09:13] bigjools: wgrant and cody talked about that. [09:14] bigjools: cody looked at the log on devpad and saw multi-times-per-second logged events saying that bohrium was being disabled [09:14] bigjools: I thought I linked the example log entry in the report [09:14] lifeless: that's just one snippet, it doesn't show repeated attempts at anything [09:15] noodles775: see several threads of mine about this on the list :) short story: caching is *a* way to get 'things that look cheap are cheap' more widespread, and bigger picture solutions require r&d [09:15] bigjools: it was spitting that out a lot, or so I was told [09:16] ok [09:16] bigjools: other evidence about a busy loop on bohrium is that attempts to update it from both psql and the lp webapp and airlock all failed [09:16] * wgrant was just working on what Cody said was hoppening in the logs. [09:16] lifeless: in the traceback it's calling "requestAbort" [09:16] specificaly just update builder set thing=False where name=bohrium got stuck waiting for a lock [09:17] that was a sign that *something* was busy updating that row....and updating the row.... and updating the row [09:17] we *don't know* if it was one long transaction, or many short ones. [09:17] bigjools: That codepath then immediately sets builderok=false, then commits. With no other options. [09:18] noodles775: pm [09:18] wgrant: nope, it calls slave.abort() [09:18] bigjools: True. Which fails, then invokes the exception handler which prints the log message. [09:18] which is why we get an XMLRPC fault [09:18] Which sets builderok=false, which commits. [09:18] Missed the exception handler bit, sorry. [09:18] ah ok [09:18] But we know that the handler was called, since the log entry is there. [09:19] noodles775: does that pasted thing look reasonable to you ? [09:19] it makes that entire doctest pass [09:19] In fact, I think that's just about all we *know* happened. [09:20] we also knew that other builds were not happening [09:20] True. [09:20] (or were being updated/processed so slowly that it was equivalent to not happening) [09:20] Now, I haven't seen the logs, but from what I heard there was nothing about the commit failing. [09:21] So possible something kept setting builderok=true again, or the Twisted evil is swallowing exceptions. [09:21] lifeless: so what that means is that we never got back into the reactor, so there was a loop somewhere [09:21] lifeless: was the log spewing stuff or stuck on that line? [09:22] lifeless: hrm... it looks like, yes, it would get the doctest passing, but it's adding a dependence elsewhere on the knowledge that a property is cached... would something like @cachedproperty(unless_equal=None) be silly? Hrm [09:22] bigjools: I didn't look : when I got here it had been plausibly analysed > 1 hour before, without escalation: so I discussed enough to determine that it needed (IMO) escalation, and handed off to elmo [09:23] noodles775: yes, because it would mean that /participants of a team with many people without PPAs would trigger lots of tiny queries [09:23] lifeless: Right. [09:23] lifeless: So I can't think of any other solution immediately than what you've pasted. 
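
noodles775 floats @cachedproperty(unless_equal=None) above; a rough sketch of that hypothetical variant (the keyword and helper name are illustrative, not an existing Launchpad API) makes lifeless's objection concrete: every access for an owner with no PPA skips the cache and issues another query.

    def cachedproperty_unless(sentinel):
        """Like a cached property, but never cache the sentinel value."""
        def decorator(func):
            attr = '_cached_' + func.__name__
            def getter(self):
                if hasattr(self, attr):
                    return getattr(self, attr)
                value = func(self)
                if value is not sentinel:
                    setattr(self, attr, value)
                # The sentinel (here, "no PPA") is recomputed on every access,
                # so a /participants page over a large team still issues one
                # query per member without an archive.
                return value
            return property(getter)
        return decorator
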
[09:24] archive.txt does this [09:24] * StevenK tries to figure out why a doctest is causing python to eat 6.8g of RAM [09:24] mm, does something similar [09:24] lifeless: ok so I am looking at the log now, it's repeating that log section over and over [09:26] lifeless: I have a suspiscion that it happened when the Enablement guys yanked it from the pool [09:26] assuming they did so on a Saturday [09:26] actually, Friday night [09:26] you say potato, I say pohtahto [09:27] b-m was stuck in this loop from 22:17 to 08:47 [09:27] no you don't :) [09:27] I mean tz issues :) [09:28] that's UTC [09:28] yeah [09:28] so 10am sat [09:28] therefore it was Friday night everywhere ;) [09:28] Oho. @write_transaction retries.... [09:28] forever? [09:28] Only three times, apparently. [09:28] was gonna say ... [09:28] But it's still utterly wrong to use it. [09:29] ? [09:29] What with the whole talking to the builder thing. [09:29] what do you guys think of the idea of making the buildd manager an API client [09:29] you need to expand on that [09:29] lifeless: CRACK [09:29] and removing its DB usage. [09:29] total crack [09:29] why ? [09:29] bigjools: Retrying transactions that have external effects seems... unwise. [09:29] ... [09:29] Although it's probably OK enough here. [09:30] lifeless: the API is S L O W [09:30] bigjools: not in any way that matters for the buildd manager [09:30] dude [09:30] no [09:31] it does matter [09:31] of coure performance matters [09:31] you're just making things twice as hard for yourself and putting load on appservers for no good reason [09:31] but nothing the buildd manager *does* has any reason to be slow with the API as it stands today. [09:31] no, I'm running an idea up the flagpole [09:31] if its a bad idea, fine. [09:31] the b-m is *very* busy issuing queries [09:31] but I want to understand *why* [09:32] first of all, you need to justify why you think it would be better with the API [09:32] sure [09:33] you've got a highly concurrent task [09:33] which twisted is great at [09:33] but you're doing transactions, which will - unless you are _very_ careful - last the length of a conceptual task like 'check on builder Y' [09:34] the longer transactions are, the more contention you put on busy rows in the DB, like the builder table. [09:34] secondly [09:34] first one easily fixed by moving transaction boundaries [09:35] (it already does partial commits) [09:35] our *entire* stack above the storm layer is built on the model of global, magical, transaction objects which lookup information in thread locals [09:35] [it may be easy to fix, but it needs to be done and maintained with care: using the API you'd have that for free, so its less effort *in that regard*] [09:35] back to the second angle [09:35] twisted runs everything in the reactor thread [09:36] so you have to play silly buggers with our stack to move transaction objects out of context and back in, and also, because the stores and transactions are tied to model objets, possibly do that to the model objects too [09:37] if you don't, you run a high risk of bugs where you do unrelated things in a single transaction [09:38] note that moving transaction boundaries won't work well if you want to use our regular model objects for flow control or anything like that. [09:40] lifeless: I don't see why things are moving around as you suggest [09:40] I think that those two things together make the job of writing the buildd manager in twistd harder than if it was an API client. 
Of course, I'm not the one writing it: I'm expressing an opinion and asking what you think. [09:40] the b-m would bring the appservers to their knees [09:41] why? [09:41] I'm not trying to be silly or difficult - I really don't see why it would. [09:41] because it issues shitloads of queries [09:41] ask stub about the load it generates [09:41] what is it doing with these queries ? [09:42] seeing if there's a new build on each builder, checking status of each builder etc etc [09:42] if the webservice were a thin, performant agile layer I would consider agreeing with you, but it's not [09:43] even if it were you'd still be putting tremendous load on the appservers [09:43] and I think we should use those to service external requests, not DC ones [09:43] so, in terms of its performance; I've been looking and its bad in a couple of specific ways [09:44] it batches stuff we shouldn't batch : but clients have control over that, so we can easily avoid it. [09:44] and it potato programs as you traverse objects (the exact same terrible pattern that makes some pages (and some API calls) extremely slow) [09:45] the former issue would be avoidable; the latter issue is nearly entirely irrelevant in the DC (particularly for a long lived process like the buildd manager) [09:45] in terms of appserver load [09:46] completely separately from this discussion [09:46] I want to separate out APIs and WebUI appservers [09:47] I don't think its good or appropriate to mix up human-facing work with machine-driven work: makes servicing humans well in times of overload harder. [09:47] this idea is raw: I haven't run it up the flagpole and had it examined it; its an open-concept [09:47] ok [09:47] anyhow, if that were done, the load on the appservers might be huge, but it wouldn't affect browser using users. [09:49] anyway I am trying to see reasons why the b-m would be stuck in a slave.abort() loop [09:50] it's not exiting from the Deferred at all, otherwise we'd see other builds getting dispatched in the log [09:50] Anyhow, this is a diversion: I asked your opinion, which is that its a bad idea because it would be harder to make it perform well, and even if done to perform well it has a high risk of adversely affecting the API appservers. [09:50] I may come back to this concept in the future, but I have what I wanted for now:) [09:51] noodles775: actually, hah, it didn't fix it - I've got some more digging to do. [09:54] what happens if you removeSecurityProxy on a non proxies object ? [09:54] ah, pass though. nice [09:58] right, scatter removeSecurityProxy into the caching layer, and its good. [10:01] bigjools: is there anything else I can tell you about saturday to help [10:01] danilos: hi [10:01] lifeless, hi [10:02] is there anything I can do you help with your page performance analysis ? 
your plea for help was rather heartfelt :) [10:03] lifeless, heh, thanks for the offer :) not sure right now, but I just wanted to indicate a few issues that seem as if they are staying under your radar :) [10:03] lifeless, it's basically: "here's a few things we are having problems with, I know different people are working on it, but I am sure you are the best person to keep that all in mind" [10:04] danilos: I've filed a bug - thats related - https://bugs.edge.launchpad.net/launchpad-foundations/+bug/618019 [10:04] <_mup_> Bug #618019: OOPS may be underrepresenting storm/sql time [10:04] \o/ I think my registry branch will pass now [10:04] danilos: I think this is related because if we're undercounting DB time, it would certainly explain some things ;) [10:05] lifeless: Doesn't mean it won't break anything :/ [10:05] lifeless, right, but this particular case doesn't seem to be that [10:05] danilos: do you have a profile for it ? [10:05] lifeless, for instance, we get long rendering time on local instances with that many objects where DB time is very stable (i.e. we just add 2000 rows to the DB) [10:05] lifeless, no, sorry, it's the next step we've got to take [10:06] no worries [10:06] losa ping [10:06] :) [10:06] lifeless, (I've got KCacheGrind email tagged as "important" :) [10:06] :( [10:06] bah [10:06] :) is what I meant [10:06] heh [10:07] danilos: if it is python time, there are a few things we can do to fix it, but which one will make sense will need the data for where the time is going [10:07] lifeless, also, our first suspicion was storm "objectification", but that turned out to be pretty quick [10:07] lifeless, of course, we haven't done anything other than get a gut feel about the situation [10:07] (we could do a C extension, we can cut some fat out, we can rearrange the layers) [10:07] wgrant: of course it will break something. [10:09] lifeless, gary was mentioning switching to chameleon (as a faster TAL renderer), looking into fmt:url and why it takes so long (it was around 2ms per call for us), etc. anyway, I don't want to make another guess without profiling first :) [10:10] \o/ data [10:10] bigjools: ok, I'm going to go do family time and stuff [10:11] bigjools: if there are other things I can answer to help, or if you want me to start looking at the code as another of eyeballs, just shout. [10:11] lifeless: sorry was on a call [10:12] lifeless: if you could eyeball the code that would be great. It's getting stuck inside a Deferred somehow [10:12] I don't know why it would be repeatedly calling slave.abort [10:12] well, it's calling builder.rescueIfLost() [10:12] The log says it's also repeatedly calling Builder.updateStatus... [10:12] And there's only one place that's called :/ [10:13] wgrant: I only see it calling rescueIfLost [10:13] bigjools: The traceback sucks. But grep for the error text. [10:13] wgrant: updateStatus is not in the traceback [10:14] http://pastebin.ubuntu.com/477761/ [10:14] bigjools: updateStatus delegates to updateBuilderStatus. [10:14] Which is right at the top. [10:14] d'oh [10:14] NFI why the traceback stops there, though. [10:14] right [10:15] I still feel it's something to do with Enablement yanking the builder [10:15] Very probably. [10:15] But even DB contention doesn't explain this entirely. [10:15] It explains three calls. [10:15] Not thousands. [10:15] Unless they're both mutually timing each other out. 
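
A hedged sketch of the @write_transaction behaviour discussed above, with the retry bounded at three attempts, using the generic transaction package; the real Launchpad decorator differs in detail and is presumably pickier about which errors it retries. wgrant's point is visible in the shape of it: any external side effect inside the wrapped function, such as an XML-RPC call to a builder, gets repeated on every retry.

    import transaction

    RETRIES = 3  # "Only three times, apparently."

    def write_transaction(function):
        def wrapper(*args, **kwargs):
            for attempt in range(RETRIES):
                try:
                    result = function(*args, **kwargs)
                    transaction.commit()
                    return result
                except Exception:
                    # This sketch retries on any exception; whatever the real
                    # decorator retries on, the builder RPC above it is
                    # re-executed each time around.
                    transaction.abort()
                    if attempt == RETRIES - 1:
                        raise
        return wrapper
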
[10:16] lifeless: you said that Builder:+edit is slow, I can't find an oops on bohrium in the oops reports [10:17] it was on sat [10:18] only three requests, and I think they were all elmo [10:18] lifeless: yeah I looked in the reports and I can only find one Unauthorized response [10:18] Can someone run the buildd-slavescanner.txt test on devel and see if python starts taking up gobs and gobs of RAM? [10:18] bigjools: how did you look - grep or something else ? [10:18] since I'm a buildd admin I'll play later [10:19] lifeless: I used my email client to search my OOPS reports emails [10:19] bigjools: they only report the top 25 [10:19] by volume [10:20] lifeless: ok then I leave it up to you to put the OOPS on that bug :) [10:20] hmm, elmo linked it [10:20] sec [10:20] ah cool [10:21] bigjools: OOPS-1687L750 I think [10:21] got it, ta [10:46] bigjools: Do you have a few minutes to talk ddebs? [10:46] wgrant: not now, sorry, I'm hellish busy [10:46] Heh, OK. [11:14] hello [11:15] hello [11:16] hello [11:19] lifeless, regarding the deprecation warning you were seeing earlier [11:19] lifeless, I spent some time trying to fix the opacity of the message. [11:20] lifeless, and then gave up. import handlers and warn-on-import warnings are tricky beasts. [11:23] Is the legendary lazr.importguardian any better in this respect? [11:24] wgrant, I'd be surprised if it was. [11:25] were. [11:25] :( [11:26] as far as I can tell, to do it right you'd have to monkeypatch warning.warn and a couple of other methods and change the stacklevel argument [11:26] OR [11:26] just do stack introspection in the warninghandler [11:27] Yay. [11:27] but I'm increasingly unsure what our warning handler actually wins us. [11:39] lifeless, So, are feature flags available to use now? I appear to have missed the news on this one. [11:41] gmb: yes [caveats may apply] [11:42] lifeless, !caveats ;) [11:42] gmb: the caveats are that there is no UI for admining them yet, and little polish - not many scope types etc [11:42] gmb: but that doesn't matter much [11:42] lifeless, Righto. Thanks. Good to know. [11:42] gmb: as a consumer of flags, you just write your code guarded by a flag [11:42] the admin stuff that is missing will affect QA and production only [11:42] QA to see how your patch looks on/off [11:43] Okay. [11:44] production to configure the rules you request (which you can do anytime: because it defaults off you can land your patch, tweak some more, land that, and then ask losas to turn on the feature flag) [11:48] Right === lifeless changed the topic of #launchpad-dev to: Launchpad Development Channel | week 1 of 10.09 | PQM is OPEN | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [11:49] anyone familiar with ARM know how many architecture variations there are, roughly? [11:49] do you mean specific SoC's ? [11:49] that sort of thing yeah - they each need a separate distroarchseries [11:50] lots, I reckon. [11:50] and I heard that our current set of supported architectures will balloon [11:50] #linaro might be a better place to ask. 
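
A minimal sketch of the consumer-side pattern lifeless describes to gmb above: code guarded by a flag that defaults off, so the branch can land and be tweaked, and only later be switched on by a production rule. The flag store here is a plain dict, and the helper and flag name are illustrative, not Launchpad's real feature-flag API.

    FEATURE_RULES = {}  # stands in for the production ruleset the LOSAs edit

    def get_feature_flag(name):
        return FEATURE_RULES.get(name)  # unset flags default to None, i.e. off

    def profile_page(user):
        if get_feature_flag('registry.new_profile.enabled'):
            return 'new profile page for %s' % user   # flag-guarded code path
        return 'old profile page for %s' % user       # unchanged default

    print(profile_page('gmb'))                           # old profile page
    FEATURE_RULES['registry.new_profile.enabled'] = 'on'
    print(profile_page('gmb'))                           # new profile page
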
[11:50] jml: yes, that's my approx guess too :) [11:50] * bigjools was not aware of that channel, aha [11:51] yes, 'lots' is definitely the answer [11:51] 1 or more per silicon vendor AIUI [11:55] the reason I ask is because people want arch checkboxes on the "register a distroseries" page but that's probably not possible [11:56] there is a big effort going on [11:56] to make the kernel and libc (IIUC) the only things that vary, so that it would be more pocket-like, or something. [11:56] and by vary I mean 'loadable modules' [11:58] jml: I haven't landed your patches yet. [11:58] jml: however I did release a new project :P [11:58] lifeless, I had noticed both things. [11:59] jml: and your patches are now very high up the pile [11:59] lifeless, grats on the new project. I'm looking forward to giving it a try. [11:59] I appreciate your patience and regret the time its taken to get to them [12:00] lifeless, np :) [12:00] Morning, all [12:00] hi deryck [12:08] deryck, good morning === al-maisan is now known as almaisan-away === jelmer_ is now known as jelmer [12:56] StevenK: you should email the list about this memory problem [12:56] StevenK: I'm seeing the same death on my ec2 land calls [12:56] so we're essentially in stop-the-line mode [12:57] or someone should [12:57] ec2 instances dying at 1744 tests run in the log, memory epxlosion and OOM killing [12:57] however, its mightnight, so me, I'm-a-sleeping [12:57] jml: tag, you're it. ^ :) [12:59] hmm ok. === mrevell is now known as mrevell-lunch [13:16] jml: I have tracked it down, and I can share notes, if you wish [13:17] StevenK, that'd be great, thanks. [13:17] hi [13:18] jml: The troublesome test is lib/lp/soyuz/doc/buildd-slavescanner.txt, the troublesome line in the test is 51, and I have a traceback: http://paste.ubuntu.com/478766/ [13:18] jml: Evidently, I did kill the test horribly to get that traceback, so it might not help much [13:19] StevenK, what happens when you run it locally? [13:19] poolie, hello [13:19] (just saying hi) [13:20] jml: It consumes gobs of memory before I get sufficently nervous and kill it [13:20] You know, that's interesting. It's almost the same place the hang happened over the weekend. [13:20] well, that's a good sign. [13:21] (Imagine how much worse this would be if it behaved nicely locally) [13:23] my first hunch is that it's a bug in z.testing.testrunner. [13:28] jml, + for testr failing --list [13:29] jml, any progress on the suite OOM error? Why do you suspect zc.testing.testrunner? [13:29] StevenK, I don't see the error when running on stable. [13:29] mars, because of the traceback. [13:30] I think you hacked on that code for subunit. At least it should be familiar territory [13:31] not the traceback formatter. [13:31] but familiar enough. [13:31] oddly enough, this interrupts me during some fairly deep surgery on ec2test/remote.py [13:32] jml: I've posted to -dev about it, so you can follow up when you wish [13:32] jml, StevenK, can you reproduce this on a local copy of devel? [13:32] mars: I can [13:32] mars, I'm trying that right now. [13:33] mars, but I always try stable first. [13:33] I get a lot more nervous on my desktop than a random EC2 instance [13:33] hehe [13:33] oh, awakening the OOM serial killer bit [13:33] StevenK, computers are for burning! [13:33] browsers and editors beware [13:34] For instance, I let Python consume 6.8GiB of RAM before the oom-killer stepped in, but on my desktop, I killed it myself after 2.1GiB [13:34] I *can* reproduce this on devel, it seems. 
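
lifeless asks StevenK below whether he knows about ulimit; the same cap can be set from inside Python when chasing a leak like this locally, so the runaway test dies with MemoryError instead of waking the oom-killer. A small sketch:

    import resource

    TWO_GIB = 2 * 1024 ** 3
    # Cap the process address space before running the suspect test; any
    # allocation past the limit raises MemoryError rather than grinding the
    # desktop into swap until the kernel's oom-killer steps in.
    resource.setrlimit(resource.RLIMIT_AS, (TWO_GIB, TWO_GIB))
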
[13:34] jml: Says you :-P === matsubara-afk is now known as matsubara [13:35] although please forgive me while my desktop grinds :) [13:36] this is good news! [13:36] argh - zope testrunner explicitly forbids running under PDB. PDB would catch the original exception before formatException() got to it. [13:36] jml ? [13:36] well, it means the problem is in http://paste.ubuntu.com/478825/ [13:37] and if I were a betting man, I'd say in r11345 [13:37] I have to agree [13:37] * jml verifies [13:38] ......... [13:38] Aha. [13:38] fwiw, this is the stack trace I got when I interrupted. [13:38] http://paste.ubuntu.com/478827/ [13:38] That would explain *everything*. [13:39] And why I couldn't work out what was happening over the weekend. [13:39] Taking that CP into account, the problem is obvious... [13:39] bigjools: ^^ [13:39] StevenK: you know about ulimit? [13:40] ... and science verifies it is indeed that revision. [13:41] wgrant, it's not obvious to me (other than "signals are hard") [13:41] heh [13:41] jml: Note that one of the cases results in an infinite loop. [13:41] It might not be the problem that's causing the excessive memory usage. But it is what caused the crisis over the weekend. [13:41] * jml looks atthe diff === mrevell-lunch is now known as mrevell [13:42] If you stick a try/except in a 'while True' loop, you want to ensure that all the paths have a way to escape... [13:42] I seem to remember recommending not to use an infinite loop. [13:43] poolie: I do [13:43] * bigjools did not use an infinite loop [13:43] ahh. [13:43] bigjools: Why is there an infinite loop, then? [13:43] fuck nose [13:43] Heh. [13:44] bigjools, I'm looking at the code. It's clear that the intent is not to be an infinite loop [13:44] has this revision passed buildbot yet? [13:44] but you know what they say about good intentions. [13:44] bigjools, no. [13:44] ooookay then [13:44] that explains the buildbot failures [13:44] so, the memory thing is because builder.failBuilder(str(reason)) is appending stuff to a list? [13:44] sigh [13:45] so, where's the loop coming from [13:45] I don't think it appends stuff to a list. [13:45] bigjools: The codepath that was problematic over the weekend. [13:46] After calling failBuilder... it logs, then returns to the top of the loop. [13:46] wgrant: that is now obvious [13:46] bigjools, ah, this is why two buildbot slaves fell over from Friday onwards, eh? [13:46] but how is it returning to the top of the loop [13:46] bigjools: How not? [13:46] bigjools, it just continues after the except [13:46] there's a "return" [13:46] bigjools: Only the second except and the else have a return. [13:46] bigjools, not in the failBuilder clause. [13:47] ooooooooooooooo ffffffffffffffuuuuuuuuuuuuuuuuuuuuuck [13:47] Haha. [13:47] fwiw, the loop could be written as 'for i in range(MAX_EINTR_RETRIES)' [13:47] jml: please feel free to do that. I tried and the code became less readable. [13:47] bigjools, I'll have a stab. [13:47] cheers [13:47] jml: in the meantime I'll land a fix [13:48] This explains why a look at db-devel yielded no reasonable explanation for the weekend's behaviour. [13:48] yes :/ [13:48] god DAMN [13:49] jml: It's possible that the logging goes to a StringIO... that's probably likely, in fact. [13:49] bigjools, so you will have to land a roll-back revision before we can restart the builders then [13:50] wgrant, that would make sense. [13:50] mars: eh? [13:51] bigjools, I assume the tests in the revision caused the OOM errors? And the OOM killed our CI build farm. 
So the revision needs to be backed out. [13:51] bigjools, http://paste.ubuntu.com/478837/ [13:51] mars: why can't I just submit a fix? [13:52] bigjools, or that, if you think it is fast to do [13:52] I do! [13:52] jml: thanks [13:52] bigjools, I can land that now, if you'd like. [13:52] (it also has the return fix) [13:52] jml: I'll do it [13:53] I need to CP this branch as well [13:53] ok [13:53] easier if I do the whole thing [13:53] no arguments from me :) [13:53] Is there a good reason not to just have a single return at the end? [13:54] wgrant, I reckon that would be better, yes. [13:54] Slightly less explicit. But also slightly less likely to destroy everything. [13:54] depends how close you want to keep the returns to the code that depends on them [13:54] for readability [13:55] mars, the readability issue here is that the control flow is jumpy [13:55] yep [13:56] so I can see exactly what happened on Saturday now [13:56] we got a xmlrpclib.Fault when an Enablement machine was pulled [13:56] and the loop ensued :/ [13:57] bigjools, fwiw, the test passes locally with the fix. [13:57] It's somewhat more complicated than that, I think. Since it tried to abort. [13:57] yeah [13:57] But that's the main bit of it. [13:57] yup [13:58] Now I don't have to go insane wondering what I missed :) [13:58] perhaps ec2 test should run against stable by default, rather than devel. [13:58] jml: http://pastebin.ubuntu.com/478840/ [13:58] salgado, don't worry about deactivating the Lucid EC2 AMI. The OOM test failures were unrelated to the Lucid upgrade. [13:58] jml: however with that change, test_updateBuilderStatus_catches_repeated_EINTR fails [13:59] mars, yeah, just saw the backlog. :) [13:59] meh. I shouldn't have been so quick to delete my branch with that change :) [14:02] jml: I see why. This is why I didn't use range() :) [14:03] bigjools, why? [14:03] jml: it's not running the code at the bottom of the exception any more [14:03] except in the case of reason[0] != errno.EINTR: [14:03] bigjools, the bottom of which exception? [14:03] except socket.error [14:04] oh, you mean in the case where MAX_EINTR_RETRIES is actually reached [14:04] yes [14:05] * jml tries a fix [14:05] http://pastebin.ubuntu.com/478842/ works [14:08] jml: do you see my point about it being less readable now? [14:08] but maybe http://pastebin.ubuntu.com/478843/ really is the patch with the best cleanness / robustness trade-off. [14:09] jml: that has the same problem [14:10] bigjools, there are so many problems, which one do you mean? [14:10] jml: the one where it doesn't run handleTimeout() when we hit MAX_EINTR_RETRIES [14:10] bigjools, it does. run the tests :) [14:10] jml: oh sorry I can't read [14:14] jml: ok I'll land that with your blessing? [14:14] bigjools, please. [14:14] and we need to cowboy cesium until the CP goes in [14:19] last iteration: http://pastebin.ubuntu.com/478848/ [14:27] jml: land that afterwards if you wouldn't mind, I'm already mid-process for the other change [14:33] ok buildd-manager was restarted with the patch [14:34] bigjools, sure thing. [15:12] EdwinGrubbs: btw, whatever happened with that ml oops? === almaisan-away is now known as al-maisan [15:37] barry: well, it was just spam, so I am now looking at adding the full text of the email to the oops, so that we don't have to track down a losa to see it. [15:38] EdwinGrubbs: +1. i also suggest that we track down that rt and try to work with IS to improve the incoming mta (exim i believe) anit-spam defenses. 
really, stuff like this should rarely if ever actually hit lp [15:39] s/anit/anti/ [15:54] barry: I tried searching rt for requests concerning spam or filtering for mailman, but no luck. Do you have any other information before I create a new request? [15:55] EdwinGrubbs: unfortunately no. i'm sure i no longer have a link to the rt. probably can't hurt to just open a new issue [15:55] ok [15:59] bigjools, jml, I see your change landed on devel - so are we good to restart the buildbot farm? [15:59] mars: yep, thanks [15:59] * bigjools dons brown paper bag [16:00] losa ping, could we please restart the buildmaster and the downed slaves? [16:01] losas, I have no idea what state the lucid buildslaves are in - they probably need a process tree cleanup :/ [16:01] mars, fwiw, I've split TestOnMergeRunner into three different classes [16:01] mars: erm, you mean lpbuildbot (buildmaster is the PPA/archive build master) [16:01] jml, wow, 3? [16:01] mthaddon, yep! Thanks [16:01] jml, mumble? [16:02] mars, yeah. one to handle running generic stuff in a daemon; one as an object that represents the merge request; one as the thing that knows how to run tests and gather results [16:02] that middle object simplified a lot of stuff, I think. [16:02] sinzui, sure. [16:02] mthaddon, the lpbuildbot farm has a buildmaster as well - wonderful confusion :) [16:08] mars: I often get confused by this nomenclature [16:08] I clicked on a graph earlier on and it took me 10 seconds to realised why the thing didn't make sense [16:08] lol [16:09] we should just call it the bbmaster or something [16:09] bbmaster, bbslaves [16:16] jelmer: ping [16:18] EdwinGrubbs, pong [16:21] jelmer: I was wondering if you had a chance to use the feature on edge that lets you create project from a source package. I was starting to QA it but I see there are a lot of meta packages and packages where ubuntu is the upstream, so they don't need a project. [16:23] jelmer: I'm also curious, what is the biggest benefit you get after you link a project to a package? [16:23] EdwinGrubbs: Because of the way the sorting in the needs-packaging list works all of the packages that can be linked at the beginning of the list already have been, that's why there are so many native packages there. [16:24] If you skip a few pages in the list there should be some more packages that don't have an upstream project registered. [16:24] * jelmer looks for one [16:24] e.g. https://edge.launchpad.net/ubuntu/maverick/+source/soprano [16:25] sinzui, http://www.jjg.net/elements/pdf/elements.pdf [16:25] jelmer: what's the first thing you do with a project after it has been linked to take advantage of it? [16:26] EdwinGrubbs: usually I register the upstream branch, and sometimes I add the homepage. [16:26] jelmer: and do you use the upstream branch to automate some of the package building process? [16:28] EdwinGrubbs: not necessarily, usually it's just so I can get a local copy of the upstream source code or perhaps work on a patch against upstream [16:30] jelmer: so, it just makes it easier to get the project in bzr, instead of switching to git or svn to work on a patch? Sorry if this is a lot of questions. I'm trying to improve my understanding of the process. [16:31] EdwinGrubbs: Yep, exactly. It gives me an easy way to track all of my in-progress patches (since they're all nicely listed on my branches page) independent of what project it's for. 
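
A hedged reconstruction of the shape of the buildd-manager bug fixed above, not the actual code (the method and constant names are taken loosely from the discussion): in a "while True" retry loop, an except branch that neither returns nor re-raises silently restarts the loop, which is how one xmlrpclib.Fault turned into an all-weekend spin. Writing it with range(MAX_EINTR_RETRIES), as jml suggests, bounds the damage even when a path forgets its return.

    import errno
    import socket

    MAX_EINTR_RETRIES = 10  # illustrative constant

    def update_builder_status(slave, builder, logger):
        for attempt in range(MAX_EINTR_RETRIES):
            try:
                return slave.status()
            except socket.error as reason:
                if reason.args[0] != errno.EINTR:
                    builder.failBuilder(str(reason))
                    logger.warning('Disabling builder: %s' % reason)
                    # This 'return' is the line whose absence turned the
                    # original while-True version into an infinite loop.
                    return
                # EINTR: the call was interrupted, go around and retry.
        raise socket.error(errno.EINTR, 'Retried %d times' % MAX_EINTR_RETRIES)
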
[16:32] ahhh [16:32] EdwinGrubbs: No problem; this is just how I use it though, I imagine it's quite different for other people. [16:32] jelmer: who else would be good to talk to? [16:34] EdwinGrubbs: Specifically in relation to linking upstream projects and Ubuntu source packages? [16:35] jelmer: yes [16:35] EdwinGrubbs: Ubuntu developers that use bug watches (as registering the upstream bug tracker is required to do bug watching IIUC) and forward bugs from ubuntu to upstream. [16:35] ok [16:36] EdwinGrubbs: The early adopters of daily build packages might also be good candidates, as they need both the upstream source branches and ubuntu packaging branches to build their packages. [16:37] I'm not sure if the actual link between the upstream and the ubuntu source package is of benefit to them though, just that they use both the upstream project and the packaging in ubuntu. [16:37] sinzui, hi, is the partial fix for bug 607879 really testable, or should we mark it as qa-untestable for now? [16:38] <_mup_> Bug #607879: ~person/+participation timeouts [16:38] Ursinha, it only reduced the query count [16:39] sinzui, so not really testable [16:40] Yes, We can only see query counts go down. [16:42] jelmer: thanks for the info [16:43] sinzui, I'll change the bug tag then. Thanks [16:43] thanks === salgado is now known as salgado-lunch === Ursinha is now known as Ursinha-brb [17:18] rockstar: does sourceforge let us mirror their branches? I remember this being a problem a long time ago, but I can't remember if it was resolved. [17:18] EdwinGrubbs, I believe so. === matsubara is now known as matsubara-lunch === Ursinha-brb is now known as Ursinha === salgado-lunch is now known as salgado [18:17] * jml hates writing tests after writing the code === matsubara-lunch is now known as matsubara [18:30] EdwinGrubbs: I think in the case of irda-utils you probably wanted to import /trunk rather than the root of the repository. [19:00] benji, ping [19:02] benji (or gary_poster), jtv has been working on a fake librarian for testing. He registers it using zope.component.provideUtility, but (simply by unregistering the utility it provides) fails to restore the original ILibraryFileAliasSet utility [19:03] I got stuck on this a few months ago and change my implementation since I could not solve the deregistration issue [19:03] jtv was just trying to talk to me about this when his connection died [19:06] sinzui, talking to jtv about it in private message [19:06] sinzui: thanks… sorry for dropping out there; pidgin died [19:08] jelmer: oops [19:12] moin === EdwinGrubbs is now known as Edwin-lunch [19:17] is the issue StevenK reported with ec2 runs OOMing fixed ? [19:19] lifeless, yes [19:20] has it landed in devel - if I just ec2 land, will my stuff go through (assuming my bits are good) [19:21] yes, mine just went thrrough [19:21] \o/ [19:21] time to fire up the engines boys! [19:23] hello what [19:23] lifeless, the fix has yet to reach stable, according to lpbuildbot. [19:24] I have just added a new option: "ec2 test --dont" [19:24] jml: long as its in devel [19:25] jml, 'ec2 test --dont' ? [19:25] mars, yeah. it sets up the instance ready to run the test suite, and then it doesn't. [19:25] jml, heh, I've been instructing people to run 'ec2 demo' [19:26] mars, yeah, but that does more stuff. 
[19:26] or "ec2 test -o '-t somejunk'" [19:26] "ec2 test --postmortem -o '-t somejunk'" [19:26] that leaves the instance running [19:27] mars, also, it doesn't conveniently tell you the command that ec2 would run if it were going to run the tests. [19:27] true [19:27] jml, ec2 test --not-really :) [19:27] yeah. I might rename it. [19:30] --setup-only works too [19:30] jml, in your list mail about the OOM problem, what do you mean by "Test suite failures often really do mean production failures" ? [19:31] no one has mentioned a cowboy or production issues in the thread [19:31] mars, oh. well, there was both :) [19:32] the sequence went something like: [19:32] * critical production issue A [19:32] * cowboy fix for issue A [19:33] * land fix for issue A [19:33] * critical production issue B caused by fix for issue A; critical test suite failure caused by fix for issue A [19:34] and then you can pick up the rest from there. [19:34] nice [19:36] impressive - that one block of code knocked out production and our development pipeline [19:36] I don't think that has happened before [19:38] rockstar, hi, is https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1686EA3556 really supposed to be an oops? [19:40] Ursinha, yes. If we're not catching it, then there's a bug there. [19:40] rockstar, I mean, is that supposed to fail as an oops or just be caught to display a nice error message to the user? [19:40] Ursinha, the latter. The bug is that we're not catching it. [19:41] rockstar, right. mind if I file a bug for that? [19:45] mars, well, by far the most common pattern is to run tests before applying a change to production. [19:45] jml, heh, I was just writing a reply asking about that. So why did we not do so this time? [19:46] jml, and could we have just run '-m lp.soyuz' and felt confident enough with that? [19:46] mars, I don't know exactly. If I had to guess I'd say that it would have taken too long. [19:47] mars, I'm not 100% sure that would have caught the failure. [19:47] I am just wondering if there is a natural suite partition there [19:48] mars, do you mean as code currently stands now, or as it ought to be? [19:48] as it ought to be [19:48] mars, probably. [19:49] there is no way to run 'soyuz-not-app-server-code', or 'soyuz-just-the-app-server', but we can run 'soyuz' [19:49] which should be much faster that 'soyuz+bugs+code+..." [19:49] I'll try timing it, see what happens [19:50] mars, I'm not sure if you know, but lots of the code that runs that particular production system lives in paths that do not have 'soyuz' in them at all. [19:50] jml, I did not know that [19:51] lp.buildmaster being the one that I'm most certain about. [19:52] yes, was just looking at that === al-maisan is now known as almaisan-away [19:54] dammit. testing this refactoring shows that I am basically incapable of writing Python without unit tests. [19:57] jml: :) [19:57] I just fixed my fourth simple name error. [19:57] jml, pyflakes? [19:58] mars, doesn't help with instance variables. 
[20:02] jml: interestingly I suggested a range() approach too, back when reviewing the EINTR change [20:04] lifeless, well, a little paranoia is healthy, and all that [20:04] lifeless, I notice that Twisted & bzrlib both have until_no_eintr-style helpers and that both use infinite loops [20:04] yeah [20:04] I haven't read your mail yet [20:05] I grant that the likelihood of a call generating eintr forever is pretty small [20:05] in a call with the isd guys about logging & stuff [20:05] jml: lamont argues that forever is the right answer [20:05] but something in me bristles all the same [20:07] lifeless, for the moment, I disagree. Everything needs a circuit breaker. [20:07] also, it somehow became 8pm. [20:08] one more test run... [20:11] jml: I suggested that perhaps the buildd manager become an API client [20:11] jml: bigjools felt that this was a terrible idea [20:12] lifeless, I don't think it's a terrible idea, but I'm not sure how it would help. [20:12] lifeless, or rather, the first thing that needs to happen to the buildd manager is to clean up a bunch of needlessly complex code. [20:13] jml: it would have meant that there wasn't a deadlock on the builder row in the DB, so the disabling would have worked; the b-m might still have broken, but less cascade would have happened [20:13] I worry about the thread-locals model of zope with the context model of twisted [20:13] lifeless, oh huh. [20:13] lifeless, we managed to get it quite stable with the old authserver [20:13] lifeless, but it was a trial. [20:14] lifeless, switching to internal xmlrpc helped a lot. [20:14] perhaps its only a worry [20:14] but I would be happier with the scheduler being just a thing that talks to webservices and local processes [20:16] two thoughts [20:16] 1. there are no good twisted clients for the API [20:16] didn't jamesw make one? [20:16] lifeless, it's a prototype [20:17] aren't they all? [20:17] 2. having some kind of deferred-returning calls for the db stuff makes integration points _really_ obvious, which is nice. [20:17] one of the problems is this [20:17] model objects know they are in a transaction [20:18] but we don't want long lived transactions [20:18] anyhow [20:18] * jml is off. [20:18] g'night. [20:18] gnight === lifeless changed the topic of #launchpad-dev to: Launchpad Development Channel | Performance tuesday! | Week 1 of 10.09 | PQM is OPEN | firefighting: - | https://dev.launchpad.net/ | Get the code: https://dev.launchpad.net/Getting | On-call review in irc://irc.freenode.net/#launchpad-reviews [20:39] sinzui, I see that bug 612408 is marked as Fix Released but I still see occurrences of that OOPS on lpnet [20:43] It is not fix released... [20:44] damn it, the branch was marked fix committed when we knew we needed several branches to fix this. [20:44] Ursinha, it is in progress. We are waiting for jcsackett's branch to land [20:44] I updated the status and milestone [20:45] sinzui, right. So, if you want to avoid QA bot to change this bug, just add the [incr] tag to your commit msg, or if using ec2 land, there's the --incremental option [20:45] sinzui, so it won't close your bug until you land a fix without the incremental tag [20:46] Ursinha: i think that's info more directed at me, and my apologies if i screwed up QA process. [20:46] deryck: sorry about the wrong project, the back-link seemed to leave project-wide bug filing state all confused :) [20:46] Ursinha: thanks for mentioned the incr tag. 
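
One hedged reading of the "deferred-returning calls for the db stuff" idea in the earlier lifeless/jml exchange about mixing Zope's thread-local transactions with Twisted: push each blocking transactional job onto a worker thread and hand the reactor a Deferred, so no transaction stays open across reactor callbacks. A sketch under those assumptions; run_in_transaction is a made-up helper, not Launchpad's actual buildd-manager code.

    import transaction
    from twisted.internet import threads

    def run_in_transaction(function, *args, **kwargs):
        """Run blocking, transactional work off the reactor thread.

        Returns a Deferred that fires with the function's result; the
        transaction is committed or aborted in the worker thread, never
        in the reactor thread.
        """
        def job():
            try:
                result = function(*args, **kwargs)
            except Exception:
                transaction.abort()
                raise
            transaction.commit()
            return result

        return threads.deferToThread(job)
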
[20:46] jcsackett, ah, no, my fault I haven't announced that properly, I'm afraid [20:46] lifeless, dude, no worries. Just doing CHR duties today. [20:46] Ursinha: either way, thanks for the info. :-) [20:46] jcsackett, my pleasure :) === matsubara is now known as matsubara-afk [20:48] deryck: have we had any feedback on +filebug ? [20:48] anyone come and loved us or hated us ? [20:49] lifeless, not that I know of. No feedback at all. Neither for or against. [20:49] no news is good news, I hope. [20:49] well thats something [20:49] certainly no noose is good nes [20:50] (gotta love mel brooks) [20:50] sinzui, did you update the bug? I can't see it here... [20:51] ah, just showed up [20:51] sinzui, jcsackett, thanks [20:51] Ursinha, https://bugs.edge.launchpad.net/launchpad-registry/+bug/612408 [20:51] sinzui, thanks, it just updated here. don't know why it took so long, though [20:52] I see In progress, 10.09, oops qa-bad [20:53] sinzui, now I see the same too [20:54] Ursinha: farming the oops reports ? [20:54] lifeless, yes sir [20:54] Ursinha: how are we placed for the merge workflow ? [20:55] Ursinha: anything I can help with ? [20:55] which reminds me, time to check the rt ticket status [20:55] lifeless, sure, you can take the rollback part if you want. I had no time to tackle that last week, am working on writing more tests to it [20:56] I won't today, because its performance day, but lets talk about that ~ this time tomorrow, and I'll likely make the same offer again :- but actually do it :) [20:56] lifeless, thanks [20:57] we're only waiting on the tagger and rt 40482:live-schema staging environment [20:57] AFAIK [21:03] flacoste: hi [21:04] hi lifeless [21:04] we're on, I think ? [21:04] lifeless: we are! [21:07] Later on, all. === almaisan-away is now known as al-maisan === Edwin-lunch is now known as EdwinGrubbs [21:41] sinzui: ping [21:41] sinzui: I need a slow milestones url page or bug # please :) [21:42] flacoste says landscape/+milestones ? [21:44] lifeless, yes, that is often a good one *if* you can see all their private bugs [21:44] project groups are a little different that projects and distros. [21:44] flacoste: https://devpad.canonical.com/~stub/ppr/lpnet/latest-daily-timeout-candidates.html [21:45] sinzui: I will arrange to be able to :) [21:45] jkakar: ping [21:50] morning [22:05] lifeless: https://wiki.canonical.com/Launchpad/Sprints/BugJamDecember2010 === Ursinha is now known as Ursinha-afk === Ursinha-afk is now known as Ursinha [22:11] sinzui: do you have an oops report for the landscape one ? [22:11] sinzui: or a bug number ? [22:11] lifeless: Hiya. [22:11] sinzui: performance day, flacoste thought that looking at this pain point might make you happy :) [22:11] jkakar: hiya [22:12] jkakar: going to be doing some performance work on milestone views, and apparently the landscape private bugs are a contributing factor in its pain-level [22:12] flacoste: 2010-12-13 to 2010-12-24 is a bit more than a week [22:12] jkakar: but I'd need to be able to see them to see the issues [22:12] jkakar: so, I was wondering, if I'm not already, if I could be in the relevant group for a week or two [22:13] lifeless: I'm happy to add you to the landscape team, but unfortunately you'll get a lot of mail. [22:14] jkakar: thats ok, I'll treat it the same way I treat launchpad bugs :) [22:14] lifeless: Added. [22:15] thanks [22:15] lifeless: And by "same way I treat launchpad bugs" that means we should expect merge proposals from you, right? ;b [22:15] perhaps... 
[22:16] Hehe
=== salgado is now known as salgado-afk
[22:24] https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1689EB3060 is a sample from https://edge.launchpad.net/landscape-project/+milestones
[22:24] 305 16ms statements
[22:25] 190.0ms for the longest statement
[22:25] I think I can do something for this
[22:31] hello, happy performance day
[22:33] thank you
[22:35] lifeless, there is no current bug number for the Milestone timeout. It reappeared last week after the cache was removed.
[22:38] matsubara-afk: ping
[22:39] lifeless, this is the bug that was tracking milestones with lots of private bugs: https://bugs.edge.launchpad.net/launchpad-registry/+bug/447418
[22:40] ^ it was closed after the oopses disappeared. I had changed the security on the bugs to lp.View after verifying them once.
[22:43] sinzui: thanks
[22:44] lifeless, this is also the bug where I discovered the number of assigned users is also a factor. We format links to them. Ubuntu has a lot of assignees :(
[22:45] sinzui: I will look, journal what I find, and see if I can help
[22:45] thanks
[22:45] would you like me to edit that bug
[22:45] or make a new one ?
[22:46] Let's reopen it since it has made a few occurrences this week
[22:46] ok, I'll do so
[22:48] lifeless, was pondering creating a memcached rule for person/pillar link formatters. 250 assignees is a lot of icon lookups
[22:49] damn, a failure in the cacheproperty branch >< ah well, ec2 knows best :)
[22:49] sinzui: What does an icon lookup entail ?
[22:49] librarian calls
[22:49] wha?
[22:49] we render them on the appserver?
[22:50] we look for an icon, then insert a link to the librarian icon for the link
[22:50] that's just a query then
[22:50] yes
[22:50] any reason we can't prepopulate it ?
[22:51] Since we link to users on every page, should all assignees, bug/branch/question commenters be prepopulated?
[22:53] I suspect so
[22:53] only teams have icons, only teams can be private (not rendered). Teams do not change their icons very often.
[22:53] sure
[22:53] at the lowest level though, a lookup is a lookup, and postgresql is as good as memcache at doing those very quickly
[22:53] Since we would need a separate memcache mechanism, we could build once with a knowable key that we can invalidate on change
[22:54] but unlike memcache we can do them all at once, whereas with memcache we have a serialised interface (due to our appserver structure)
[22:54] so it will be faster to do this in postgresql
[22:55] (I think)
[22:55] memcached is really slow when making lots of requests.
[22:55] oh, that reminds me. Last time I looked at the bugtask badge decoration code, we were looking for a mentoring icon. We should stop that
[22:55] yeah this page is definitely death by sql
[22:56] It is faster than lots of pg requests I am told
[22:56] SQL time: 7505 ms
[22:56] Non-sql time: 7849 ms
[22:56] Total time: 15354 ms
[22:56] Statement Count: 390
[22:56] sinzui: Yes, but so is a snail.
[22:56] sinzui: single-pg-request < many memcache requests < many pgsql requests
[22:56] I have not yet sent my email outlining the death by many menus I am seeing hiding in our oopses
[22:56] ooh, interesting
[22:57] so wgrant, going to join in performance tuesday today ?
[22:58] If beat-my-project-team-into-doing-work doesn't take up too much time.
[22:58] wgrant: \o/
[22:58] wgrant: I can fedex a bat from a local friend, if needed ?
[22:58] Did the buildd-manager stuff get sorted out last night?
[22:58] Heh.
[22:58] yes
[22:59] lifeless, I need to start a dinner. I promised my team I would send an email outlining the cost of using the existing menu implementation with a suggestion of how to address it.
[22:59] sinzui: start your dinner then :)
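(On the icon-lookup discussion above: a minimal sketch of the "prepopulate it all at once in postgresql" idea, assuming a psycopg2-style DB-API cursor; the table and column names are guesses, not the real Launchpad schema.)

    def prefetch_person_icons(cursor, person_ids):
        # One round trip for every assignee on the page, instead of an
        # icon query per formatted person/pillar link.  'Person' and
        # 'icon' are assumed names for this sketch.
        if not person_ids:
            return {}
        cursor.execute(
            "SELECT id, icon FROM Person WHERE id = ANY(%s)",
            (list(person_ids),))
        # Map person id -> icon (librarian reference); the link
        # formatter can read from this dict during rendering with no
        # further per-person queries.
        return dict(cursor.fetchall())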
[22:59] I'm a little concerned that a rev seems to have been CPd without making it through the test suite. But I guess it was urgent enough that it may have been cowboyed.
[22:59] wgrant: it was
[23:00] Ah, good.
[23:04] so, https://edge.launchpad.net/launchpad-foundations/+milestone/10.08 seems to be a regular milestone page with all the features present
=== al-maisan is now known as almaisan-away
[23:09] does create_initialized_view process the template etc?
[23:11] lifeless: what do you mean?
[23:12] lifeless: it creates the view and initializes it
[23:12] Can someone check buildd-manager logs? It's been stalled for a few minutes.
[23:12] I don't know what 'initializes it' entails
[23:12] I want to make sure that I capture all the SQL done by the view & template rendering
[23:12] what's the most tasteful way to do that?
[23:12] lifeless: it calls the initialize method
[23:12] lifeless: which sets up any fields and widgets
[23:13] lifeless: it does not render the view
[23:13] ok
[23:13] so what should I do to accomplish my goal
[23:13] lifeless: that is either __call__ or render
[23:13] not sure which is the best way
[23:13] probably to use a browser test case
[23:13] and get the test browser to load the page
[23:14] ok
[23:15] So, we now have two or three untested buildd-manager CPs, and it appears to have fallen over.
[23:15] losa ping
[23:15] Well, someone could check the logs.
[23:16] yes
[23:16] someone experienced, who knows that bit, and who can act to fix it, would be best, no ?
[23:16] True, but LOSAs are unlikely...
[23:16] Oh, I guess some might be back.
[23:17] I know some are back :)
[23:35] Well, apparently not.
[23:37] wgrant: I have one in a private channel - sorry. But they are feeling flat-out - multiple issues at once etc
[23:38] Ah.
[23:41] wgrant: trying to figure out what is going on with the builders
[23:41] wgrant: i was led to believe you might have a working theory?
[23:42] mbarnett: No idea. What are the logs saying?
[23:44] wgrant: still poking around. last i saw was a "no route to host" error
[23:44] trying to get more info now
[23:44] i think the buildd-master may be well and truly hung
[23:45] it is logging nothing, and the running process doesn't seem to be actually doing a single thing.
[23:45] Gonna give it another minute and see if it wakes up. If not, it will be time to hit it with things.
[23:46] lifeless: What's the best course of action to debug that? strace, then SIGINT and hope we get a traceback?
[23:46] strace
[23:46] well
[23:46] ps first
[23:46] see what state the process is in
[23:46] then strace, which may 'fix' it
[23:47] strace gives NOTHING
[23:47] if that doesn't, attach gdb and get a backtrace
[23:47] lp_buildd@cesium:/srv/launchpad.net/production-logs$ strace -p 10680
[23:47] Process 10680 attached - interrupt to quit
[23:47] select(39, [9 38], [], [], NULL
[23:47] that probably means it's dropped the ball on a deferred
[23:47] there was an error in the logs when the process hung
[23:47] :( All the CPs are tested.
[23:47] "no route to host"
[23:47] mbarnett: What was the full line?
[23:48] wgrant: the full line from the strace?
[23:48] mbarnett: From the 'no route to host' line.
[23:49] http://pastebin.ubuntu.com/479111/
[23:49] wgrant: ^
[23:49] Well,
[23:49] This is the same codepath that died last time.
[23:49] There's nothing after that log entry?
[23:49] awesomesauce
[23:49] nope
[23:49] that's it
[23:49] WTF.
[23:50] Does that still have the two cowboys, or is it properly CPd now?
[23:50] CP'd
[23:50] actually, let me verify
[23:50] So, we have replaced an infinite loop with a hang. Awesome.
[23:51] nope, it is cowboy'd
[23:51] with http://pastebin.ubuntu.com/478843/
[23:51] That's the only cowboy?
[23:52] there is another that is not listed on the production status page
[23:52] Should be to the same piece of code.
[23:52] * wgrant tries to reproduce locally.
[23:53] http://pastebin.ubuntu.com/479112/
[23:53] that is also on there
[23:53] * mbarnett goes to update the production status page
[23:53] Oh, wait, it isn't quite the same codepath as last time. But it's right next to it.
[23:54] \o/
[23:54] Hm, there's only a test diff?
[23:54] it isn't the code tests breaking the translation tests
[23:54] Nothing in lib/lp/buildmaster/model/builder.py itself?
[23:55] * thumper confirms by running the other half
[23:56] i completely don't see how a failure there should hang the buildd-manager
[23:57] it might stop that builder from working forever more
[23:57] wgrant: 478843 is a cowboy to builder.py
[23:57] 478843?
[23:57] wgrant: http://pastebin.ubuntu.com/478843/
[23:57] mwhudson: Of course it shouldn't.
[23:57] Oh.
[23:58] But this is buildd-manager.
[23:58] Well, I can't see what's going on. May be good to gdb out a backtrace.
[23:59] (and this time I think I'm actually taking all the cowboys into account...)
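(On the create_initialized_view question earlier: since it only calls initialize(), capturing the SQL from template rendering as well means forcing render()/__call__, or loading the page through the test browser. A minimal sketch of the first option, assuming the Launchpad test environment; the imports, helper names, and view name here are best-effort guesses rather than the exact API.)

    from lp.testing import TestCaseWithFactory
    from lp.testing.views import create_initialized_view

    class TestMilestoneIndexQueries(TestCaseWithFactory):
        # (A database test layer assignment is omitted here for brevity.)

        def test_render_runs_the_template(self):
            milestone = self.factory.makeMilestone()
            # create_initialized_view() only calls initialize(): fields
            # and widgets are set up, but the template has not run yet.
            view = create_initialized_view(milestone, name='+index')
            # Forcing render() executes the template too, so any SQL it
            # issues is included in whatever statement capture is active.
            view.render()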