[00:05] StevenK: More importantly you should check what the subscribers are
[00:06] wgrant: So I guess I should set a bug supervisor too, then
[00:07] Right
[00:08] * StevenK picks wgrant.
[00:11] wgrant: https://bugs.qastaging.launchpad.net/auditorclient/+bug/939552 looks good to me
[00:11] <_mup_> Bug #939552: Juju should support MAAS as a provider < https://launchpad.net/bugs/939552 >
[00:49] wallyworld_, wgrant: https://code.launchpad.net/~stevenk/launchpad/contains-to-match/+merge/121354
[00:50] +1 from me
[00:51] wallyworld_: Thanks
[00:52] np
[01:12] webops: Could you ppa-reset marid, please?
[01:13] It's on furud
[01:13] ta
[01:16] wgrant: hmm. oki, how does one reset a single server. ppa-reset seems to be an all or nothing affair?
[01:18] spm: I think 'ppa-reset marid' should work
[01:19] I don't have rights for that
[01:19] Ah
[01:19] You'll have to do it from alphecca, I guess
[01:19] sec
[01:19] and looking at scripts, I'm going to need gsa intervention
[01:19] Nope
[01:20] ssh -i /home/lp_buildd/.ssh/ppa-reset-builder ppa@furud.ppa ppa-reset marid
[01:20] ah yes, the builddmaster would have stab ability.
[01:20] Mon, 27 Aug 2012 01:20:18 +0000: Clearing all marid Copy-On-Write devices.
[01:20] device-mapper: create ioctl failed: Device or resource busy
[01:20] Command failed
[01:20] That's rather unpleasant of it
[01:20] that doesn't look happy.
[01:20] Sounds GSAy
[01:24] wgrant: Shall I put together a deploy?
[01:24] StevenK: Worth a try
[01:26] wgrant: according to bug 1040999, branches should always be able to be marked as security fixes, with userdata only available if branch is linked to a userdata bug. so i'm going to make this change
[01:26] <_mup_> Bug #1040999: Cannot use branch information type portlet to set type < https://launchpad.net/bugs/1040999 >
[01:26] wallyworld_: Sounds reasonable
[01:27] but proprietary cannot be done just yet i don't think?
[01:27] It can be
[01:27] It should always be shown if it's allowed
[01:27] Like Public
[01:27] Hm
[01:27] Actually
[01:27] We can't really hide userdata until nobody's using BVPs
[01:28] it's always there now, but the code comments say it should only be shown if branch linked to proprietary bug
[01:28] That's meant to apply to Public Security, Private Security and Private
[01:28] Not Proprietary
[01:28] * wgrant checks the comments
[01:29] Oh
[01:29] # Once Proprietary is fully deployed, it should be added here.
[01:29] "it" == Private there
[01:29] I must have removed a line describing why Private wasn't included
[01:30] ok, i'll fix the comment
[01:30] Thanks.
[01:30] The idea is that we need to show Private now since it's what BVPs use for privacy
[01:30] so just to confirm, userdata is to be updated once bvps go away
[01:30] But once everyone's using Proprietary, Private is no longer going to be common at all for branches
[01:30] wgrant: it's being stubborn, but looked at.
[01:31] spm: Thanks
[01:31] wgrant: The amount of cowboys is terrible :-(
[01:31] Yeah...
[01:31] All of them have landed, though
[01:33] Only one code changes
[01:33] -s
[01:33] Rest are security.cfg
[01:33] So we can ndt without a problem
[01:33] Um
[01:33] Though
[01:33] StevenK: Have you checked for new DB perms?
[01:34] I have not.
[01:34] * wgrant does so
[01:34] garbo is one I can think of, I think
[01:34] Oh
[01:34] Can't pull
[01:34] Blah
[01:34] Hahaha
[01:34] * wgrant does it manually
[01:35] wgrant: Is this going to involve a second call to Optarse?
[01:35] Already done
[01:37] Two sets of DB perms
[01:38] garbo and what else?
[01:41] wgrant: we believe that's back
[01:42] StevenK, webops: There's an SQL request at the usual place to manually apply this nodowntime's DB security changes
[01:43] Which will take us up to 5 live cowboys :)
[01:44] ./ignore wgrant
[01:45] spm: marid looks healthy again, thanks
[01:45] wgrant: Do you even have an ETA from them?
[01:46] No
[01:46] Because routing to Europe is hard or something.
[01:46] Maybe I can convince a GSA to check what the melons think of the BGP state
[01:46] L3's looking glass looks fine
[01:47] Maybe Datahop is breaking things
[01:52] urls for multi-task bugs are weird
[01:53] I entered a bug on maas, url has maas in it. Add a task for cloud-init and the url changes to one for cloud-init
[01:55] bigjools: Right, when you add a new task it sends you to the bug in that context
[03:01] wgrant: No test failure for me. :-(
[03:02] StevenK: auditor today?
[03:09] Haha
[03:10] wgrant: http://pastebin.ubuntu.com/1169187/ == no failure after make schema
[03:13] StevenK: The problem is creating the link from a --fixes
[03:18] wgrant: I thought that just linked the bug?
[03:18] StevenK: Yes
[03:19] But it's the bug linking that crashes
[03:19] Not scanning a branch with a linked bug
[03:19] You can probably reproduce by switching to the DB user before calling linkBug
[03:25] wgrant: Calling db_branch.linkBug() in the with dbuser block == no crash
[03:28] StevenK: Hm, possibly it's not calling the notify methods?
[03:28] I forget where in the traceback it died
[03:28] But I linked the OOPS in the bug
[03:30] wgrant: Ah, yes, that would likely be it.
[03:31] Actually, I think it's because the project that the bug is created has no structural subscribers.
[03:31] And maybe no notification
[03:37] That could do it too
[04:41] wgrant: Right, I'm hooking it into the bug linking bit, I needed a revprop with the right format.
[04:41] But still no failure, which is annoying.
[05:35] Oh, hah. My contains-to-match trips over test_getAllPermissions_constant_query_count
[05:46] Heh
[05:47] wgrant: Still no failure. :-(
[05:47] wgrant: psql:tmp/wg.sql:8: ERROR: cannot execute GRANT in a read-only transaction
[05:47] spm: tThat's no druk
[05:47] oh wait. nm. my bad. wrong server.
[05:47] yah
[05:47] I blame mondays.
[05:48] I blame Optus/Datahop/NTT :)
[05:48] ad that tomorrow is tuesday.
and yesterday being sunday.
[05:48] For everything
[05:48] wgrant: applied
[05:48] spm: Thank
[05:49] s
[05:50] Hopefully you can now ndt without the world burning down
[05:50] more than it already is
[05:51] spm: http://images.ucomics.com/comics/ga/1991/ga910304.gif
[06:13] woah
[06:13] ppa queue almost caught up
[06:19] Wah, still no failure.
[06:22] Ah, the job running code isn't sending a notification.
[06:26] wgrant: I think this test is being defeated by caching
[06:30] StevenK: :(
[06:30] StevenK: Then clear the cache :)
[06:31] Which doesn't help either
[06:31] Sigh
[06:36] wgrant: Store.of(obj).invalidate() only invalidates only that object? What if I want to invalidate everything?
[06:43] StevenK: That invalidates the whole store
[06:49] Then I'm not sure why notify isn't calling back into getBugNotificationRecipients :-(
[06:56] wgrant: The notify(ObjectCreatedEvent(bug_branch))
[06:56] line in linkBranch() is directly implicated in the OOPS, but my test causes it to notify noone
[06:56] You've verified that there are adequate subs to the bug?
[06:57] I've added a direct subscriber with an APG
[06:57] I'm trying to work out why that notify call is deciding to do nothing
[06:58] And failing, I might add
[07:10] wgrant: The bugchange ignores private branches. The notification code has to run before the branch is made private.
[07:16] StevenK: Why is the branch private?
[07:16] Because I was forcing it to be.
[07:17] Ah
[07:17] (and yes, it does ignore private branches -- I reported that leak a few years ago :))
[07:18] wgrant: BTF isn't in branchscanner's security.cfg, too
[07:29] StevenK: It probably inherits that
[07:29] eg. from write or something
[07:31] Yeah, from write
[07:31] Ah. I think APG is fine, and we have a test that will blow up if that check changes.
[07:43] Oh, sigh.
[07:43] I bet the scanner has cursed this branch
[07:54] * StevenK stabs the branch scanner over and over.
[07:58] I have mail
[07:58] So I take it you uncursed it
[07:59] No, rename and push again dance.
[07:59] You do love to crush my hopes and dreams.
[07:59] And putting a bloody knife in a Express Post and addressing it to celeryd@ackee
[08:00] wgrant: Well, I do need a hobby ...
[08:00] StevenK: Care to check that the bug actually gets linked?
[08:01] That's probably a good idea.
[08:11] good morning
=== adeuring changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: - | Firefighting: - | Critical bugs: 4.0*10^2
[08:13] wgrant: Done. Look again?
[08:19] StevenK: r=me
[08:19] thanks
=== almaisan-away is now known as al-maisan
[08:43] wgrant: Do you think http://paste.ubuntu.com/1169485/ will work?
[08:45] wgrant: I'm thinking this behaviour makes the improved FDT process much simpler. I just need to stop the slaves replaying WAL, disable master connections, apply patches to the master, enable master connections, disable slave connections, reenable replication, wait for sync, back to normal.
[08:46] Which I think I can do without swapping pgbouncer config files around, which seems fragile.
[08:50] stub: That was exactly the process I had in mind, but let me read the code...
[08:52] stub: I think that would work
[08:52] But we'd want to go a bit further eventually :)
[08:53] A bit of refactoring to allow generic support for fallbacks would probably make it all a bit nicer
[08:53] In what way? We can also cause master requests to get a slave, which is reinventing lp's read-only mode.
[08:54] stub: Well, webapp and API requests always use the master
[08:54] Erm
[08:54] Webapp requests in recently-POSTed sessions
[08:54] Because they want up to date data
[08:54] But then we have a little risk with scripts, as we are deliberately returning a broken result.
[08:54] But if the master's not available then they should fall back
[08:55] xmlrpc-private also usually uses the master policy
[08:55] Because it wants to be as up to date as possible
[08:55] Yeah, so we can do that for all master requests, which would be a lie. Or make a master + fallback policy for them, and switch them to using that policy.
[08:55] Right, I think we want a MasterIfYouCan policy which all those use
[08:56] So the slave policy should always allow fallback to master
[08:56] the masterifyoucan policy can always fall back to a slave
[08:56] and the master policy just fails
[08:56] Yes, slave falling back to master is documented as allowed.
[08:57] Oh right, I think it even already does that
[08:57] It must
[08:59] I don't think we do that dynamically anywhere
[08:59] So I think default_flavor = MAIN_FLAVOR becomes eg. flavours = [MAIN_FLAVOUR, SLAVE_FLAVOUR]
[08:59] stub: I thought the slave policy respected lag
[08:59] But I can't remember exactly.
[09:00] oh yes
[09:00] Only in the LaunchpadPolicy
[09:00] Indeed
[09:00] SlaveDatabasePolicy doesn't respect lag
[09:00] That's probably a bad idea
[09:01] We only choose the default based on lag.
[09:01] Right, and only in LaunchpadDatabasePolicy
[09:01] master requests still get a master if explicit, and slave requests still get a slave if explicit, no matter lag.
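(Editor's note: the fallback ordering proposed between 08:55 and 08:59 can be sketched as below. This is only an illustration under stated assumptions: the policy names, the `Flavor` enum, `FALLBACK_ORDER` and `pick_flavor` are hypothetical and are not Launchpad's actual dbpolicy code.)

```python
from enum import Enum


class Flavor(Enum):
    """Hypothetical store flavors; names chosen for this sketch only."""
    MASTER = "master"
    SLAVE = "slave"


# Ordered list of flavors each (hypothetical) policy may try, mirroring
# the chat: master just fails, master-if-you-can falls back to a slave,
# and the slave policy may fall back to the master.
FALLBACK_ORDER = {
    "master": [Flavor.MASTER],
    "master_if_you_can": [Flavor.MASTER, Flavor.SLAVE],
    "slave": [Flavor.SLAVE, Flavor.MASTER],
}


class NoStoreAvailable(Exception):
    """Raised when no flavor in the policy's fallback order is usable."""


def pick_flavor(policy, available):
    """Return the first flavor in the policy's order that is available."""
    for flavor in FALLBACK_ORDER[policy]:
        if flavor in available:
            return flavor
    raise NoStoreAvailable(policy)
```

(With only a slave reachable, `pick_flavor("master_if_you_can", {Flavor.SLAVE})` returns the slave flavor, while the plain "master" policy raises. Replication lag is not modelled here.)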
[09:01] Oh hm
[09:01] True
[09:01] That sucks
[09:01] So
[09:02] I think most of dbpolicy.py wants a bit of a rethink
[09:02] and
[09:02] most importantly
[09:02] a de-Americanisation
[09:02] :)
[09:02] Because there's not much reason to ever not respect lag
[09:02] Lag should probably be treated as failure
[09:03] Although not failed enough that it won't use it as a last resort
[09:03] I think there is plenty of stuff that is happy using a slave even if it is an hour behind, and we raise alerts when things are 5 minutes behind
[09:05] True
[09:06] So yeah
[09:06] xmlrpc
[09:06] xmlrpc-private
[09:06] recently-POSTed webapp
[09:06] and API
[09:06] probably all want to fall back to slaves
[09:06] Webapp writes shouldn't
[09:06] So we need a new policy
[09:06] Try slave first, fallback to master ;)
[09:07] Right, that's correct for webapp
[09:07] But xmlrpc probably wants the opposite
[09:07] Or at least a very low lag limit
[09:07] I'm joking there.
[09:07] I'd like to try to use the LaunchpadDatabasePolicy logic if there is a session cookie
[09:08] Right, that logic is still good
[09:08] Except that that should only influence the default
[09:08] And I'm still annoyed nobody would put in a session token to the webapi, killing its scalability. But I suspect a lot of clients give us a cookie anyway.
[09:09] I think the way forward is to try this with just slave fallback, which is the original ppa use case. Get the bugs ironed out on the production side before complicating things further.
[09:10] I'm just worried that we're complicating things unnecessarily by adding a hacky single-case fallback
[09:11] We only have 2 types of connections, 3 if you count 'DEFAULT'. We don't really need a generic framework.
[09:11] Or do you mean this shouldn't be in the BaseDatabasePolicy?
[09:12] Right, I don't think this belongs in BaseDatabasePolicy
[09:12] I'm not quite sure where is better
[09:12] I can put it in SlaveDatabasePolicy and SlaveOnlyDatabasePolicy
[09:12] (also, am I missing something or do you try to reretrieve the same store there? you don't change the flavour)
[09:13] Which means it'll just try to regrab the slave
[09:13] typo
[09:13] Not tried actually using this yet, just thinking through the idea :)
[09:14] Heh
[09:14] * wgrant foods
[09:46] wgrant: stub you guys doing anything to LP right now? getting timeouts doing the licience review
[09:46] (Error ID: OOPS-e3753aed5cfe86fe227192e43be904c1)
[09:46] nope
[09:46] hmm
[09:56] Bah. Need to rethink this, again. DBPolicy will happily hand out stores from the ZStorm cache even if they won't work, and I don't want to test if connections work every time the policy is invoked.
[10:06] stub: Hm
[10:06] stub: Well
[10:06] stub: It should be done the same way as the lag check, right?
[10:06] I forget at what stage that is done, but whatever it is it's right at the start of the request
[10:07] And for non-request-based stuff we probably just want to switch when a connection fails, maybe?
[10:07] Although that makes it harder to fail back
[10:07] We might not have a request
[10:07] I think I can ask the store if it is in a disconnected state or not before handing it out. If it is disconnected, attempt reconnection
[10:07] Yeah
[10:08] So a script running during fdt will get disconnected and need to handle that. And when it handles it, it will get the master store if it asks for the slave.
[10:08] Just need to wade through to see how to detect disconnected state, and to force a reconnection attempt.
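(Editor's note: a minimal sketch of the idea floated at 10:07-10:08, checking a cached store's connection state before handing it out and falling back if reconnection fails. `StubStore` and `hand_out` are hypothetical stand-ins invented for this note; they do not reflect Storm's or ZStorm's real API, which the participants say they still need to investigate.)

```python
class StubStore:
    """Hypothetical stand-in for a cached store; not Storm's API."""

    def __init__(self, name, connected=True, reachable=True):
        self.name = name
        self.connected = connected   # current connection state
        self.reachable = reachable   # whether a reconnect would succeed

    def reconnect(self):
        """Try to reconnect; return True if the store is usable afterwards."""
        self.connected = self.reachable
        return self.connected


def hand_out(requested, fallback):
    """Return the requested store if usable, else the fallback store.

    A disconnected store gets one reconnection attempt before the
    next candidate is considered.
    """
    for store in (requested, fallback):
        if store.connected or store.reconnect():
            return store
    raise ConnectionError("no usable store")
```

(For example, a script asking for the slave while the slave is down would receive the master: `hand_out(StubStore("slave", connected=False, reachable=False), StubStore("master"))` returns the master stub.)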
[10:11] stub: Yeah, we may just have to wait for a disconnectionerror to be raised, I suspect
[10:11] And teach stuff to deal
=== benji changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: benji | Firefighting: - | Critical bugs: 4.0*10^2
=== al-maisan is now known as almaisan-away
[13:20] abentley, adeuring, rick_h_ -- let's be ready for a longer stand-up today, to let rick_h_ lead us through the mockups he has.
[13:21] deryck: okay.
[13:21] party
[13:21] drink refill before the meeting, got it
[13:22] rick_h_: hey, don't party too hard :-)
[13:49] jelmer: I added a card to track the translations stuff, but I don't have any specific insight into it.
[13:49] My first guess is that there is an issue with a cron job that is supposed to be running.
[13:49] jtv is someone you can poke for translations background, but he doesn't necessarily know more intimate details if it is an operational issue
[13:49] wgrant is generally the person with the most ops ideas
[15:25] gary_poster: ping, hazmat ping'd me about looking over their YUI work on the juju js app/ui and deryck mentioned that since you guys were coming into that work should someone from your squad take part in the discussion
[15:25] bah
[15:28] rick_h_, hey thank you. why the bah? It would be great to be a part of it, I think, though I'll check with hazmat
[15:30] jam: jelmer jtv will be on later if that helps, in the mean time I'm going to put ana annoucement out on Twitter and places as we're getting more bugs/questions logged about it
[15:30] it's added to the topic as well in case people ask, that way you're not under as much pressure to find an answer
[15:30] abentley, ready for call?
[15:31] stand-up hangout is fine
[15:31] deryck: sure.
=== bac- is now known as bac
=== salgado is now known as salgado-lunch
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
[15:59] hiya someone help me for a momen, bug https://bugs.launchpad.net/launchpad/+bug/1041864 I wanted to change the URL for the import, but when I do i get invalid as it's being used elsewhere and I've never seen that issue before :/
[15:59] <_mup_> Bug #1041864: Badly named weston import < https://launchpad.net/bugs/1041864 >
[16:18] czajkowski: Hm.. I don't think the user wants the import URL change
[16:18] +d
[16:19] Can someone look up OOPS-eb261d3e309c39d6948f60de23422af9 for me?
[16:20] maxb: loaded, the user isn't a member of the team
[16:21] rick_h_: you're faster than I was
[16:21] maxb: http://pastebin.ubuntu.com/1170166/
[16:23] Oh, right, this is because ~vcs-imports members has slightly weird hybrid edit permissions on code imports, I remember now
=== salgado-lunch is now known as salgado
=== Beret- is now known as Beret
[17:47] czajkowski, jelmer: will be at least 8 more hours before I'm here — provided I get well enough. What's the crisis?
[17:48] jtv: translation imports seem to have stopped
[17:49] jtv: https://bugs.launchpad.net/launchpad/+bug/1041858
[17:49] <_mup_> Bug #1041858: No daily translation export anymore < https://launchpad.net/bugs/1041858 >
[17:49] Import or export?
[17:49] export sorry
[17:50] And it's not the normal exports, but the exports to branches, I see.
[17:50] Now, there's always a few exports that are skipped because the branches are locked, or there are concurrent translation updates that the exporter doesn't want to overwrite, etc.
[17:51] So there's a big difference between “several people haven't seen it work” and “it's stopped.” Do we know which it is?
[17:52] https://answers.launchpad.net/launchpad/+question/206912 https://answers.launchpad.net/launchpad/+question/206948
[17:52] jtv: I asked jelmer to look into it today
[17:52] not sure he made progress or what update he got with it
[17:53] If there's no breakthrough, chances are it's hard to diagnose — which most likely means a crash in a C-level library. Might be worth finding out if the outage coincides with an upgrade.
[17:55] Hmm the log on crowberry simply stops on the 17th. We haven't disabled the whole cron job by any chance?
[17:55] jtv: no clue :s
[17:56] jtv: but as soon as jelmer and jam come on tormorrow will get them to loook
[17:56] or ask wgrant in a bit when he arrives
[17:56] august 17th was when we did a lot of the DC move
[17:56] There's more log on taotie.
[17:57] I'll see if anything jumps out, but will leave it to the Dutch Cavalry otherwise then.
[17:57] thanks jtv
[18:10] czajkowski: I can give you my quick & vague impression… The exporter writes to branches, which triggers branch-scan jobs (to make Launchpad notice the branch changes). These seem to go into Celery now, via a RabbitMQ queue. It looks as if the connection to RabbitMQ started breaking on the 18th (possibly from a change on the 17th, since this job runs in early morning UTC) and eventually on the 21st somebody may just have killed and disabled the
[18:10] Well, not disabled exactly. Maybe the lock file from the aborted run is still there; I seem to remember that logging is a bit asymmetrical when it comes to those lock files.
[18:11] It may mean that that instance from the 21st is still hanging around trying to request a branch scan for Stellarium, and the subsequent runs are quietly giving up as they notice that.
[19:05] flacoste: o/ - just settling Cynthia a little, back soon
[19:07] lifeless_: o/
=== lifeless_ is now known as lifeless
[19:13] flacoste: ok, ready when you are.
[19:45] * deryck heads home, back online shortly
=== salgado is now known as salgado-afk
=== salgado-afk is now known as salgado
=== BradCrittenden is now known as bac
=== benji changed the topic of #launchpad-dev to: http://dev.launchpad.net/ | On call reviewer: - | Firefighting: - | Critical bugs: 4.0*10^2
[22:02] wgrant: StevenK: mumble?
=== Ursinha` is now known as Ursinha
=== jelmer_ is now known as jelmer
=== jelmer is now known as Guest90165
=== Guest90165 is now known as jelmer
[23:15] http crackheads of the world, does bug 1040689 strike you as add?
[23:15] s/add/odd/
[23:15] <_mup_> Bug #1040689: add api to refresh an existing token < https://launchpad.net/bugs/1040689 >
[23:16] lifeless: Hello replay attack?