=== poolie_ is now known as poolie
=== wgrant changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: - | Critical bugs: 232 - 0:[######=_]:256
[02:18] eyeballs requested: https://dev.launchpad.net/PolicyAndProcess/DatabaseSchemaChangesProcess
[02:21] lifeless: Have we analysed recent patches to see how many of them are doable without simultaneous changes?
[02:22] wgrant: not rigorously, but we haven't been trying to do it either, so what we've done isn't a baseline
[02:22] wgrant: we know that all the ha services out there manage
[02:24] lifeless: I think we should before we go much further, so we don't get ourselves into trouble.
[02:24] Since we're throwing away read-only (which I still find fairly unwise)
[02:25] lifeless: about your document ^^^: in the Deploying Patches section, "After successful QA on a patch, ...". I take it that is for hot patches. Cold patches are deployed once a month during our downtime deployment. Should that text say "After QA on a hot patch, ..."?
[02:25] wallyworld_: not any more
[02:25] wallyworld_: you might want to read yesterdays performance tuesday mail ;)
[02:26] wgrant: Do you think the removal of the bazaar-experts celebrity is QA-able?
[02:26] lifeless: i read it but it clearly has gone in one ear and straight out the other :-(
[02:26] StevenK: Check that you have branch access, possibly check that LOSAs do.
[02:27] wgrant: I can change the details of an owned branch, good enough?
[02:28] StevenK: Indeed.
[02:29] wallyworld_: there are now no scheduled monthly patch windows
[02:29] wallyworld_: over the next 4 weeks we'll be bringing up the process for doing short downtime multiple times a week
[02:29] wallyworld_: (short => less than 300 seconds)
[02:29] wallyworld_: and all patches landing from here on in need to be compatible with that process
[02:30] lifeless: ah, right. thanks. for that 300 seconds, lp will be in ro mode?
[02:30] this will be a fantastic change
[02:31] no, just db connections refused
[02:31] going into readonly mode and out again takes an hour
[02:31] lifeless: In the current implementation.
[02:31] There is nothing that requires that.
[02:31] wgrant: sure, and if someone wants to work on that they can
[02:32] Even in the current model, where we have to rebuild the slave because slony is crap.
[02:32] Touch file, detach slave, upgrade, remove file, rebuild slave.
[02:32] No slower than blocking connections.
[02:32] so no
[02:32] we're not detaching slaves or rebuilding them.
[02:32] Why not?
[02:32] thats a cause of about 90% of our downtime delays.
[02:32] We can easily do this without downtime...
[02:33] I don't object to a great implementation of readonly mode. But readonly mode must not make the downtime longer or riskier.
[02:33] Currently it does.
[02:33] if db connections are refused for 5 minutes, the users will see page timeouts?
[02:33] And its not at all clear to me that that can be fixed on slony.
[02:33] Can we tell SSO to GTFO of our DB and move to a sensible replication strategy?
[02:34] wgrant: not in the same timeframe as this project ;)
[02:34] wallyworld_: they will see a horrible mess in the very first cut, but we'll iterate.
[02:34] cool, just checking :-)
[02:34] lifeless: "a horrible mess" meaning a maintenance page?
[02:34] like, the appservers can show a fail-UFO page when they can't connect to the DB eventually
[02:34] * wallyworld_ was also wondering that
[02:35] Unlike slony, reloading apache isn't risky.
[02:35] wgrant: according to losas it is.
[02:35] They are wrong :)
[02:35] wgrant: they are the ones dealing with our apaches not restarting cleanly today.
[02:35] wgrant: so, I beg to differ.
[02:35] We have had a lot of rollout problems, and that is not one of them that I've ever heard of.
[02:35] So we have rollout problems that are hidden from us? Yay.
[02:35] wgrant: we don't restart apache during our rollouts.
[02:36] minor correction. we graceful apache on crowberry.
[02:36] ah, one.
[02:36] We can't just leave the appservers silently broken for 5 minutes.
[02:36] branch rewriter thingybobby
[02:37] wgrant: 5 minutes is not the target; its the absolute outer limit
[02:37] Perfect is the enemy of good, but terrible is the enemy of not looking like we are useless.
[02:37] wgrant: the target is 10-20 seconds
[02:38] wgrant: look, if you want to jump in and make the error condition look nice, that would be awesome.
[02:38] wgrant: I was clear about what I was proposing on -stakeholders
[02:38] wait
[02:38] what?
[02:38] what's the problem with reloading apache?
[02:38] That's what I said.
[02:38] How do I get (and reset) the OOPS count in a test? self.getOopsCount() or so?
[02:38] re*starting* apache can be problematic
[02:38] but reloading is just fine
[02:38] we do it *all the time*
[02:38] If we can't reload Apache, we have pretty serious problems.
[02:39] elmo: spm was telling me a month or so back that our ones don't go cleanly occasionally
[02:39] elmo: I didn't get details
[02:39] lifeless: restart yes - reload no
[02:39] and we wouldn't need a restart to put a maintenance page in place
[02:39] ok, good to know. Can a script running on wildcherry drive such a change ?
[02:39] So I may not be insane. This is good.
[02:40] lifeless: it could, sure
[02:40] ok, cool.
[02:41] lifeless: I suspect you're mixing problems there. we have a known issue where a reload won't clear a certain problem; necessitating a restart. but that's a different beastie.
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== jtv is now known as jtv-brb
=== jtv-brb is now known as jtv
=== almaisan-away is now known as al-maisan
=== jtv is now known as jtv-eat
=== allenap changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: allenap | Critical bugs: 232 - 0:[######=_]:256
=== henninge is now known as henninge-lunch
=== al-maisan is now known as almaisan-away
=== jcsackett changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: allenap, jcsackett | Critical bugs: 232 - 0:[######=_]:256
We also need better metrics on it and Rabbit
[11:20] we'll probably look at it again when DDs are done.
[11:20] and when I say "better", I mean "some" :)
[11:21] bigjools: quite a lot of stuff then
[11:21] bigjools: it'd be nice to be able to see how an external contributor could help.
[11:22] jml: the best place would be to help us get metrics since the rest of it's internal
[11:23] bigjools: got a list of the things you need & the format you need them in?
[11:23] that's a better question for lifeless
[11:23] let me dig up some bugs though
[11:24] hmm no bugs
[11:24] jml: the bugs are all linked from the LEP
[11:25] there we go
[11:25] lifeless: ok, thanks.
[11:40] Project db-devel build #720: FAILURE in 5 hr 13 min: https://lpci.wedontsleep.org/job/db-devel/720/
[11:43] is https://dev.launchpad.net/PolicyAndProcess/DatabaseSchemaChangesProcess active as of now? should I be refactoring my multiarch-translations branch to separate out the schema patch?
=== almaisan-away is now known as al-maisan
[11:54] bigjools: https://bugs.launchpad.net/launchpad/+bugs?field.tag=fastdowntime
[11:54] yes :)
[11:54] cjwatson: for your branch, you'll need to land both components on db-devel (because we're not live on the incremental deploy steps yet)
[11:54] cjwatson: but you need to land them in separate landings so we know the schema change doesn't break existing code.
[11:55] ok, makes sense
[11:56] I think I am in love with bzr switch
[11:56] how can I break this to my wife
[11:56] bigjools: offer her preferred-courtesan status
[11:56] bigjools: 'same job, better perks'
[11:57] "cour·te·san (kôr t -z n, k r -). n. A woman prostitute"
[11:57] she'll love that
[11:58] We desperately need an Ubuntu quotes db.
=== henninge-lunch is now known as hennigne
=== hennigne is now known as henninge
[13:55] It was causing an oops
[13:55] henninge: ok, then is it possible to create a test showing the oops doesn't happen anymore as part of this branch, to prove it's the fix?
[13:55] jcsackett: Sorry, I just realized I forgot to mention more context.
[13:56] henninge: no worries. just leads to me bugging you on IRC. :-)
[13:56] jcsackett: I could try but it is a test for a corner case.
[13:57] henninge: how "corner" are we talking? major pain to set up the testcase?
[13:58] jcsackett: I am not sure. There are tests that don't trigger the error so I think something is still missing to create the same situation as in production.
[13:58] jcsackett: But I am happy to give it a try.
[14:00] jcsackett: did you claim the review? deryck had offered to review it. Just to avoid double work.
[14:01] henninge: i was just looking it over. i claimed it, but can abstain in a comment and assign it to deryck, if he has more context to look at it with. :-)
[14:01] * deryck doesn't mind jcsackett taking it, but is happy to do it, too
[14:01] deryck, henninge: I'm OCR today, so i may as well finish it up. :-)
[14:01] jcsackett, henninge -- works for me. :)
[14:02] jcsackett: go ahead ;)
[14:02] Fresh eyes are nice as well.
[14:02] yup
[14:04] bigjools: i need to QA https://bugs.launchpad.net/bugs/805634
[14:04] bigjools: what do you suggest? I ask a losa to run populate-archive on qastaging and check the builds priority?
[14:05] flacoste: yes, that should work
[14:05] flacoste: or staging
[14:08] bigjools: any easy way to look at an archives builds from the UI? or should I use API, or poke the DB?
[14:08] flacoste: yes it'll appear in the UI
[14:08] flacoste: /+archives
[14:09] and you can examine the builds
[14:11] deryck: chat?
=== al-maisan is now known as almaisan-away
=== salgado is now known as salgado-lunch
=== beuno is now known as beuno-lunch
=== beuno-lunch is now known as beuno
=== allenap changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: jcsackett | Critical bugs: 232 - 0:[######=_]:256
=== salgado-lunch is now known as salgado
[17:19] Project db-devel build #721: FIXED in 5 hr 39 min: https://lpci.wedontsleep.org/job/db-devel/721/
=== matsubara is now known as matsubara-lunch
[18:04] deryck, gary_poster: a new escalated bug by ISD: https://bugs.launchpad.net/launchpad/+bug/810626
[18:04] <_mup_> Bug #810626: launchpad should mark required sreg attributes as required < https://launchpad.net/bugs/810626 >
[18:04] should be fairly shallow
[18:04] flacoste, cool, thanks.
[18:04] flacoste, cool, yeah, we were asking you about that one on ops
[18:04] but it blocks them deploying their new version as it's a potential source of problems for us
[18:05] cool
[18:05] gary_poster, what will it be? orange or yellow? rock, paper, scissors, lizard, Spock for it?
[18:05] i've only marked the one about making the fields required Critical
[18:06] since the other one, is only relevant when we handle multiple OPs
[18:06] so I marked it Low
[18:06] deryck, whoever gets to first :-) we've got a couple of escalated in progress already
[18:06] * flacoste thinks Orange should contribute to the escalated effort ;-)
[18:06] indeed we can :)
[18:07] Yellow has 3 potential fixes for the week
[18:07] * deryck didn't realize yellow had escalated bugs already
[18:07] yellow hears "escalated" and comes running! It might be...fun, or something?!
[18:08] :-)
[18:08] flacoste, gary_poster -- we've got a card on orange next lane now for it.
[18:09] cool
[18:34] bac: because your getnewcache is based on my json-serialisation, merging from devel creates criss-cross merges and screws up the diffs between my branch and yours. Could you please refrain from that in the future?
=== matsubara-lunch is now known as matsubara
[19:43] gary_poster, I did not find the log. report the bug
[19:43] thanks sinzui
[19:51] gary_poster: well it should be oopsing
[19:51] gary_poster: have you looked for an OOPS yet ?
[19:51] gary_poster: changing from stderr to a log file is an RT generally, if its a Launchpad*Script
[19:52] lifeless, looked for OOPS: how and where would I do so?
=== matsubara is now known as matsubara-afk
[22:01] s//oughtn't it?
[22:02] or is it just a case of, if they don't work hard enough to report it we don't have a special bug for it?
[22:04] its not easy to find the right one (we log oops *files* for things we don't count as OOPS. [thats a bug]).
[22:04] also, we're still swamped, so its not like we're at the point [yet] of picking up the 3 or 4 unique oopses from the daily report and driving them to zero.
[22:18] poolie_: actually, another way to put it is: we track all oopses and we'll get them all eventually; if someone wants to jump queue - manually filing a bug for us, thats fine, but then the onus is on them to help us out
[22:26] right
[22:26] that's pretty much what i though
[22:26] t
[22:35] oh wow
[22:35] sinzui: why ?
[22:35] New member:
[22:35] (Choose…)
[22:35] You can't add a team that doesn't have any active members.
[22:35] sinzui: adding ~canonical-launchpad-emeritus to https://launchpad.net/~launchpad-emeritus/+addmember
[22:36] The most recent 90 minute downtime was the last ever.
[22:36] would be nice
[22:47] Project devel build #890: FAILURE in 5 hr 36 min: https://lpci.wedontsleep.org/job/devel/890/
[23:00] lifeless, I have never seen that issue show up. I do not know why Lp wont let you
[23:00] m
[23:01] Project db-devel build #722: FAILURE in 5 hr 41 min: https://lpci.wedontsleep.org/job/db-devel/722/
[23:04] StevenK, mumble?
[23:13] sinzui: I think its a mistake, because a team can be emptied after including in another team
[23:13] sinzui: any objection to a jfdi fix ?
[23:15] please jfdi
[23:22] win 24
[23:23] fail? :)
[23:23] * StevenK kicks nigelb :-P
[23:24] * nigelb has had a sleepless night :/
[23:25] lifeless: When do you want to talk?
[23:25] when I get off the phone with allison :)
[23:26] 15?
[23:29] Sure
[23:29] Attack of the TAs?
=== wallyworld_ changed the topic of #launchpad-dev to: https://dev.launchpad.net/ | On call reviewer: wallyworld* (jtv) | Critical bugs: 232 - 0:[######=_]:256
[23:40] wgrant: it just came back now
[23:41] wgrant: could you do me a favour to help me qa? could you subscribe someone (not you or me) to a branch you own and then change the owner to launchpad and let me know which branch?
[23:41] lifeless: Do you still only do Skype?
[23:43] wallyworld_: https://code.qastaging.launchpad.net/~launchpad/launchpad/db-devel-merge-fix
[23:43] wgrant: awesome thanks
[23:44] * wallyworld_ marks his bug as qa-ok
[23:44] Thanks!
[23:44] s/bug/branch
[23:45] * wgrant goes to use canonicaladmin for a while, to acclimatise himself to crap software before using Skype.
[23:45] lol
[23:46] skype isn't that bad. it has pretty good echo cancellation
[23:46] seems to work better than mumble often times
[23:46] The echo cancellation is the only thing that is not crap.
[23:46] Well, and the NAT traversal.
[23:46] they're pretty important features, no?
[23:47] Yes, but the software itself is crap.
[23:47] Crashy, slow, ugly.
[23:48] wgrant: its easiest in a bunch of ways, if you don't mind significant feedback I can do voip
[23:48] I have Skype not crashing now.
[23:48] So we can use it.
[23:48] \o/
[23:59] Hi there. I'd like to import the GDB 7.3 release branch into Launchpad. They use CVS and have a GIT mirror. Who should I ask? (mwhudson is away)