=== Ming is now known as Guest48309
[07:57] is https://wiki.ubuntu.com/Kernel/Dev/KernelGitGuide maybe outdated / incorrect?
[07:58] the suggested git clone git://kernel.ubuntu.com/ubuntu/linux-2.6 just gives me "fatal: The remote end hung up unexpectedly"
[07:58] Correct.
[07:59] i guess it should be git://kernel.ubuntu.com/ubuntu/linux.git
[07:59] judging from gitweb, right?
[07:59] You want the bit directly above there, where it says you're after git://kernel.ubuntu.com/ubuntu/ubuntu-quantal.git
[07:59] RAOF: nah, i picked it on purpose because of the referencing
=== Ming is now known as Guest78405
[14:58] ogasawara, will the kernel uploaded yesterday (3.5.0-5.5) be the one we'll be sticking with for A3 or is there another planned drop?
[15:00] skaet: if upstream v3.5-rc8 lands, we'd like to upload it, but if not we'll likely stick with 3.5.0-5.5. I currently don't have any critical bug fixes queued to warrant another upload at this time.
[15:01] ogasawara, thanks.
=== akgraner` is now known as akgraner
[16:33] utlemming, we need to have the discussion here
[16:33] utlemming, yo ...
[16:33] howdy
[16:33] bug #1026690
[16:33] Launchpad bug 1026690 in linux-meta "3.0.0.-23.38-virtual kernel regression kills EC2 instances" [Critical,Confirmed] https://launchpad.net/bugs/1026690
[16:33] smb, ^^
[16:33] yo
[16:34] The impact is that all 32-bit EC2 instances that consume that SRU are toast
[16:34] I am trying to find out what that kernel has in and whether it's the latest
[16:34] Instance-store users will lose all data, and EBS users will have a painful recovery
[16:36] utlemming: we are looking
[16:36] bjf: thanks
[16:41] utlemming, Was the last version working? (-23.37 I think)
[16:46] smb: yes, that is the latest
[16:46] smb: we didn't see problems before .38
[16:47] 22.36 was the previous. 23.37 was a dud.
[16:47] Hm, actually would be one before because that one is a no-change upload
[16:47] yeah
[16:47] And that one only had one xen specific patch in which would filter some cpu feature bits...
[16:47] Does not sound likely to be relevant there...
[16:48] And the cmpxchg fix I thought of was in before that or with it
[16:50] Ah
[16:50] Think I may have found something, there was a fix for a fix in precise but does not seem to have made it to stable of 3.0
[16:51] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
[16:53] Should I violently pull that binary from -updates while you guys sort this out?
[16:54] infinity: +1
[16:55] bjf, herton If we could pick a77da6dcf81fb3346d3449a4925515343c9f18b9 from precise and do a rebuild?
[16:56] Removing packages:
[16:56] linux-image-3.0.0-23-virtual 3.0.0-23.38 in oneiric i386
[16:56] Comment: Broken SRU; blows up EC2 instances
[16:56] Remove [y|N]?
[16:56] ^-- Should do?
[16:56] (It'll mean people get an error from apt on upgrade, but sure beats the alternative)
[16:57] infinity: if there was a work around, I'd say leave it, but given that it kills the instance, leaving it is inviting user pain
[16:57] * infinity nods.
[16:57] kernel team, do you disagree about pulling it?
[16:58] Too late if they do.
[16:58] lol
[16:58] bjf, http://bazaar.launchpad.net/~colin-king/ecryptfs/test-fixes/revision/705
[16:59] herton, sorry 6e60d7d7667a47b1b8760a45d95547799f4df2c5
[16:59] infinity: will that be reflected in the meta-data later?
[16:59] utlemming: It'll drop out of the Packages file on the next mirror pulse.
[16:59] (And off disk)
[17:00] infinity: great. I'm just thinking of how long before it propagates to the S3 mirrors.
[17:00] tyhicks, i've got a couple of eCryptfs patches that fix up some issues we see on our automatic eCryptfs tests, can you pull or merge these sometime for us?
[17:00] utlemming, #is maintain those i think ... perhaps they can expedite
[17:01] apw: yup, I'll go over there and ask them
[17:03] IS can 403 files on the mirrors they own. That said, by the time they finish that, this pulse might happen.
[17:03] (It'll happen in about 25m)
[17:03] skaet: do you have an incident report started ?
[17:04] infinity, indeed. and thankfully it is oneiric so there should be a much lower penetration
[17:04] https://wiki.ubuntu.com/IncidentReports/2012-07-19-oneiric-kernel-regression-kills-EC2-instances
[17:04] bjf, ^ just started.
[17:07] apw: Hopefully, yeah.
[17:08] bjf: If you guys can fast-track an SRU with just this fix, I'll be happy to help babysit it through the process.
[17:08] infinity: ack
[17:09] cking: Yeah, sorry I haven't been able to get to it yet. Looking now.
[17:09] Should be able to skip most everything except a "does it boot and, like, contain useful files" regression test, and obviously testing that it fixes the EC2 breakage.
[17:12] utlemming, this issue has only been seen with O instances right ?
[17:12] bjf: And it might not be outlandish to suggest that "spin up an EC2 instance on $release and upgrade to the -proposed kernel" should be part of the standard QA process here. ;)
[17:13] gema, pgraner, ^
[17:14] bjf, I thought we did that already as part of the SRU testing, I'm waiting on hggdh to confirm
[17:14] apw: checking all the others now
[17:14] bjf, and for some reason if we don't then we will
[17:14] (low priority, feel free to ignore me during the crisis) anyone know why we don't set CONFIG_EARLY_PRINTK_DBGP in our kernel configs?
[17:15] slangasek, will look
[17:15] apw: confirming the others now.
[17:15] apw: it happens that this would be handy for me at the moment while trying to debug UEFI boot failures at Plug Fest; but I have no idea if it would be sensible to enable it generally
[17:20] apw: precise is unaffected with 3.2.0.26.28
[17:20] slangasek, ok that sounds like something that might be safe to have on if the kit is missing, and it sounds like cking may have one. is there a bug on this, i want to get cking to investigate
[17:21] utlemming, we believe we've identified the bad commit as well as the commit that fixes the issue
[17:21] apw: haven't opened a bug yet, I can do so
[17:21] utlemming, this problem seems to be unique to Oneiric kernels which only have the bad commit and not the fix
[17:21] cking: done - thanks!
[17:21] utlemming, we are working to verify all of that
[17:21] slangasek, awesome, get me the number and i'll get it looked into. efi not working in early boot is likely to be a high probability in the next cycle or two
[17:22] bjf: glad to hear. If you have a build of the fix, I'll test it too
[17:22] pgraner, bjf: we fire off EC2 from the ./current/, not from $RELEASE
[17:24] hggdh: do we do any kernel upgrade testing ?
[17:25] bjf: we fire off an existing ec2 image (from the current set), enable -proposed, and run a dist-upgrade; then reboot, and run the tests
[17:26] hggdh: On i386 as well? This was i386-specific, so an amd64 test wouldn't have caught it.
[17:26] hggdh: we caught it here: https://jenkins.qa.ubuntu.com/view/ec2%20AMI%20Testing/view/Overview/job/oneiric-server-ec2-daily/
[17:27] which tests the daily ec2 image builds
[17:27] utlemming, Can you do quick verifications when I get you a test kernel build?
[17:27] smb: yes...and gladly
[17:27] infinity: on both i386 and amd64
[17:27] hggdh: Curious that this slipped through, then, given that it appears to be universally fatal. :/
[17:28] hggdh: do you have logs of that test?
[17:28] indeed
[17:28] utlemming, ok, give me a couple of minutes.
[17:29] hggdh: Can you re-do your test by hand before the mirrors pulse the kernel out of existence, and prove/show that the test worked for you? It would be valuable to know how or why this slipped through the cracks.
[17:29] infinity, +1
[17:30] infinity, utlemming: I am re-running them, but I guess I already found the reason -- I mistakenly got an amd64 run in place of the i386
[17:32] and I did get the bad kernel on the dist-upgrade
[17:35] hggdh, sounds like we need an arch/kernelversion/machine confirmation test as the first test in every test set
[17:36] hggdh, if these were matrix tests, that might make it more difficult to have this case ?
[17:36] Not that we can test on all hardware (sadly), but at least being able to test Xen and KVM machines seems easy enough.
[17:37] bjf: sort of. I certainly see the reason to have a coverage like in the current ec2 tests (which we do not have right now)
[17:38] (meaning a mix of regions/types/etc)
[17:38] utlemming, should QA be running the ec2 tests that you found this issue with ?
[17:40] bjf: well, how we found the issue was making sure that our daily builds work, not the package set
[17:40] bjf: so if they have proposed covered, then I think not
[17:41] utlemming: actually, yes (in this case), since you grab the current (daily) kernel, and it was updated already
[17:41] but this is a post facto finding
[17:42] hggdh: right
[17:54] pgraner: who on your team should i add to the incident report ?
[17:56] bjf: I
[17:58] infinity: the idea is to expand the kernel tests as much as possible (and we finally can test on both Intel and AMD)
[17:58] for bare-metal, of course
[18:00] Oh, crap. Does anyone use the lts-backports of these images?
[18:00] linux-image-3.0.0-23-virtual | 3.0.0-23.38~lucid1 | lucid-updates | amd64, i386
[18:01] linux-image-3.0.0-23-virtual | 3.0.0-23.38 | oneiric-updates | amd64
[18:01] ^--- I should probably remove the ~lucid one as well, eh? :P
[18:01] (And we should re-do that backport)
[18:01] infinity: also -- I confirm the kernel barfs
[18:01] utlemming, http://people.canonical.com/~smb/lp1026690/
[18:01] smb: testing now
[18:01] hggdh: Good, good.
[18:02] smb / bjf: Confirm that there will be a new lts-backport fast-tracked to go with the oneiric update? (and I'll pull the mirror one)
[18:03] infinity: ack, will happen
[18:06] smb:
[18:06] dpkg: dependency problems prevent configuration of linux-image-3.0.0-23-virtual:
[18:06] linux-image-3.0.0-23-virtual depends on wireless-crda; however:
[18:06] Package wireless-crda is not installed.
[18:07] * smb looks confused
[18:07] I would have thought we never removed that dependency for O
[18:09] smb: I'm not sure what happened there...I threw that instance away and tried again, and it now installed
[18:09] without that error
[18:10] I'm rebooting now
[18:10] smb: fix confirmed
[18:10] utlemming, uname -a is showing the +102...?
[18:10] smb: Linux ip-10-110-55-78 3.0.0-23-virtual #38+1026690v1 SMP Thu Jul 19 17:33:51 UTC 2012 i686 i686 i386 GNU/Linux
[18:11] utlemming, Ok, looks good then
[18:11] smb: boot log looks good too
[18:13] hggdh: can you test smb's kernel as well to see if this fixes the issue for you as well?
[18:14] bjf: doing it now
[18:27] smb: http://pastebin.ubuntu.com/1100576/
[18:28] smb: duh, missed a file
[18:28] smb: no, did not
[18:28] :-)
[18:29] hggdh, Not sure what you or utlemming do, the crda dependency should have been there all time
[18:29] smb: this one is missing linux-headers
[18:31] hggdh, a second, building the all headers
[18:33] hggdh, ok, it's there now
[18:34] smb: thank you
[18:40] smb, bjf, infinity: it does reboot ok, running the tests now (should take ~ 1 hour)
[18:42] We'll have to re-do all the testing again after we build the proper package. Call me crazy, but "it boots properly" is probably enough to start the SRU PPA process?
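The amd64-for-i386 mix-up at 17:30 and the "arch/kernelversion/machine confirmation test" suggested at 17:35 could be sketched roughly as follows. This is only an illustration, not the QA team's actual test: `parse_uname` and `check_target` are hypothetical helper names, and the field positions assume the GNU `uname -a` layout seen in the line quoted at 18:10.

```python
from typing import List, NamedTuple, Set

# `uname -a` line quoted in the log at 18:10.
SAMPLE = ("Linux ip-10-110-55-78 3.0.0-23-virtual #38+1026690v1 SMP "
          "Thu Jul 19 17:33:51 UTC 2012 i686 i686 i386 GNU/Linux")


class KernelIdentity(NamedTuple):
    release: str  # kernel release, e.g. "3.0.0-23-virtual"
    build: str    # first token of the version string, e.g. "#38+1026690v1"
    machine: str  # machine hardware name, e.g. "i686"


def parse_uname(line: str) -> KernelIdentity:
    """Split a GNU `uname -a` line into the fields a sanity test cares about.

    Assumed layout: sysname nodename release version... machine processor
    hardware-platform operating-system.
    """
    fields = line.split()
    if len(fields) < 8 or fields[0] != "Linux":
        raise ValueError("unexpected uname -a output: %r" % line)
    return KernelIdentity(release=fields[2], build=fields[3], machine=fields[-4])


def check_target(ident: KernelIdentity, want_release: str,
                 want_machines: Set[str], want_build_tag: str = "") -> List[str]:
    """Return a list of mismatches; an empty list means the target is confirmed."""
    problems = []
    if ident.release != want_release:
        problems.append("release is %s, expected %s" % (ident.release, want_release))
    if ident.machine not in want_machines:
        problems.append("machine is %s, expected one of %s"
                        % (ident.machine, sorted(want_machines)))
    if want_build_tag and want_build_tag not in ident.build:
        problems.append("build %s lacks tag %s" % (ident.build, want_build_tag))
    return problems


if __name__ == "__main__":
    ident = parse_uname(SAMPLE)
    print(check_target(ident, "3.0.0-23-virtual", {"i686"}, "+1026690v1")
          or "target confirmed")
```

Run as the first step of a test set, a non-empty result would abort the run before any later result is attributed to the wrong architecture or kernel.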
[18:43] bjf: git://kernel.ubuntu.com/henrix/ubuntu-oneiric.git
[18:43] ups
[18:43] smb: ^
[18:43] bjf: And when that PPA upload lands, poke me, I'll make sure to score it through the roof on PPC to get it pushed through.
[18:43] henrix, looking
[18:44] henrix: Or you. ;)
[18:44] infinity: we're on it
[18:44] infinity: ack, will do.
[18:46] henrix, looks ok.
[18:46] smb: cool
[18:46] infinity, With the last version made going away, will this upload need -v to include all the changes from before?
[18:47] henrix, ^
[19:00] smb: uploaded pkgs into zinc:~henrix/for-signing
[19:02] smb: No.
[19:02] smb: -v is just for the .changes
[19:03] infinity, That is what I tried to say. But we also decided that it's probably not needed as the states of bugs and all that may already be right after the last upload
[19:03] (And no point in that, since we long ago accepted the previous version)
[19:03] This is a new bug, new version, new changelog, the -v would be pointless.
[19:04] (Not that it would hurt terribly, it would just try to re-close all the old bugs again)
[19:04] infinity, Right, and agreed
[19:06] apw: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1026761
[19:06] Ubuntu bug 1026761 in linux "Please enable CONFIG_EARLY_PRINTK_DBGP" [Undecided,New]
[19:07] henrix, should be signed now
=== yofel_ is now known as yofel
[19:13] infinity: i'm about to upload the oneiric kernel
[19:13] henrix: Kay.
[19:14] infinity: done!
[19:17] smb: Say, why are you "stephan-bader-canonical" instead of stealing ~smb?
[19:17] smb: (We could totally get that unused account renamed and given to you)
[19:17] stefan, even.
[19:24] infinity, Just because someone created that name for me (and iirc smb is already taken by someone ) and when thinking of getting the smb removed and me taking it over (because that somebody did not use it for years) I was a bit afraid of lp mess-up when transferring ids
[19:26] I read some pages about it and some said you may lose all your groups if it goes wrong
[19:26] (not so much caring about any karma )
[19:26] infinity, apw would certainly approve me changing given the amount of cursing I hear whenever he has to type the long thing
[19:27] ;)
[19:30] cking, blktrace was the package
[19:30] smb, thanks, I will look at that :-)
[19:37] smb: It all works except for published PPAs. And I suspect you wouldn't care about losing PPAs and having to recreate them.
[19:37] infinity: flash-kernel is broken
[19:37] ppisati: More broken than usual?
[19:37] infinity, No, not really... Cannot even remember when I updated them last
[19:38] smb: Yeah, we should talk to webops and do a hostile takeover of ~smb for you, then.
[19:38] smb: The only other thing that would go wonky is work item tracking, but reclaiming all your work items wouldn't be rocket science.
[19:39] infinity, Right, and if I would lose some work items... hm, actually sounds rather good... :)
[19:39] smb: Hahaha.
[19:40] infinity: http://paste.ubuntu.com/1100710/
[19:40] * smb will think about it when he gets back
[19:40] infinity: it doesn't honor any /boot/boot.script anymore
[19:41] ppisati: Special. I can look at that later, perhaps, or you can bug Oli when he's up.
[19:42] infinity: i'll open a bug and assign it to... you/Oli?
[19:42] * smb thinks it's past GBT now...
[19:44] ppisati: Open it and assign it to Oli, but give me the bug# on IRC, and I'll see if I can find time to poke it before he does.
[19:44] infinity: ack
[19:44] smb: That's missing an L.
[19:44] infinity, Hm, or a D
[19:45] infinity, Depending whether we think about the same thing
[19:47] smb: I was thinking LGBT.
[19:47] infinity, Lovely German Beer Time? :)
[19:49] smb: Yeah, not quite. ;)
[19:50] infinity, Heh, yeah I see the web disagrees with me... :)
[19:53] bug 1026780
[19:53] Launchpad bug 1026780 in flash-kernel "3.0~rc.4ubuntu4 doesn't honor bootargs from /boot/boot.script anymore" [Undecided,New] https://launchpad.net/bugs/1026780
[19:53] infinity: ^
[19:59] infinity: ah, and btw, omap4 server doesn't start any console after the first reboot
[20:00] infinity: and since it's a server installation, with no X&c, it's useful as a heater in summer time
[20:04] bjf, smb, infinity: tests passed except for seccomp
[20:04] hggdh: ack, thanks
[20:06] hggdh: Was the "except for seccomp" bit expected?
[20:07] infinity: http://pastebin.ubuntu.com/1100755/
[20:07] infinity: we have a known issue there; I do not know if it classifies as a regression
[20:07] Mmkay.
[20:07] jjohansen: ^
[20:18] infinity: are you sure you didn't score the PPC build with the wrong sign ('-' instead of a '+')? :)
[20:28] henrix: It's scored way up, it was just behind a 2h openldap build.
[20:28] henrix: The whole mess will take >5h after that anyway, so I figured it wasn't worth killing the openldap-in-progress over.
[20:28] infinity: yeah, np. just joking :)
[20:29] infinity: it always takes *ages* anyway
[20:30] We're working on the whole "new PPC hardware" thing, which should solve some of your sadness there.
[20:36] infinity, what's the schedule for that coming online?
[20:37] bjf: It needs to be purchased still, so I dunno. There was some delay on my end with research, but it's been handed off to pgraner now.
[20:37] (That is, I've said "hey, we should buy this" and washed my hands of it)
[20:38] infinity, ah, it's one of those "in 5 years ..." things :-)
[20:38] bjf: Not that it'll be a drastic improvement for the kernel use-case. The pandas are still neck-and-neck with the current PPC hardware anyway.
[20:38] But it will definitely keep PPC from being backlogged anymore.
[20:39] Moving from 2-CPU 2GHz G5s to 6-core 3.7GHz POWER7s (well, if they buy what I asked) should give PPC breathing room for another decade.
[20:39] At least, if the last decade with the current hardware is anything to go by.
[20:40] (My god, we've been at this for a while...)
[20:47] bjf, you willing to take point on the incident report? I need to run some errands, and want to make sure we have explicit handoff ;) Since you've been making most of the updates recently...
[20:48] hggdh: regression
[20:51] hggdh: test the previous kernel, IFF that has the same failure we can call it not a regression
[20:51] skaet: ack
[20:51] jjohansen: running it
[20:51] thanks bjf
[20:51] bjf: ^
[20:51] hggdh: then we can investigate why it hasn't been detected before
[21:08] smb: could you have a look at bug 1026251 regarding kerneloops?
[21:08] Launchpad bug 1026251 in kerneloops "kerneloops assert failure: *** glibc detected *** /usr/sbin/kerneloops: free(): invalid next size (normal): 0x099254a0 ***" [Medium,Triaged] https://launchpad.net/bugs/1026251
[21:09] hggdh: ack
[21:24] jjohansen, bjf: this seccomp error is present since the -17 kernel, so it is *NOT* a regression
[21:24] jjohansen, bjf: only fails on m1.small on EC2, no record of failures on KVM or bare-metal
[21:25] hggdh: okay, we will have to investigate why further, but for now let's just say it isn't a regression
[21:26] infinity: ^ not a regression
[21:27] jjohansen, bjf: I reproduced it on the -17 (taken from the official Ubuntu images on AWS); I am still looking for an earlier kernel; NOW -- the saved logs I have from the original -17 run DO NOT show this error
[21:28] bjf: shiny.
[21:28] hggdh, you have been testing oneiric i386 (non ec2) haven't you?
[21:28] bjf: yes, as I said above, on KVM and bare-metal
[21:28] apw: ogasawara: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/759913
[21:28] Ubuntu bug 759913 in linux "musb driver doesn't attach on omap3" [Medium,Fix released]
[21:29] jjohansen, bjf: so it is starting to sound as something new in the test code
[21:40] jjohansen, bjf: I found it on the original -12 test. So it was there, then vanished, then reappeared. But, again, the original -17 run does not have it, and my just-finished rerun of -17 shows it
[21:40] hggdh: okay thanks
[21:40] I dimly remember talking with kees about it at the time
[22:01] hggdh: can you file a bug with the failure and boot logs etc, so we can use it to track this issue
[22:01] jjohansen: against the QRT, right?
[22:02] ppisati: CONFIG_USB_SERIAL_SAFE_PADDED and CONFIG_USB_SISUSBVGA_CON are both =y only on omap4, care to investigate?
[22:02] hggdh: sure that works
[22:02] ppisati: it's set =n for all other flavors and arch's
[22:03] ppisati: and CONFIG_USB_USBNET
[22:20] ppisati: CONFIG_USB_INVENTRA_DMA=n on omap4, should it be =y? it's =y for omap3
[22:22] ppisati: and CONFIG_USB_MUSB_OMAP2PLUS needs investigation for omap3 and omap4
[22:22] ppisati: just fyi, this is what we're looking at -> http://kernel.ubuntu.com/~kernel-ppa/configs/quantal/reviews/Portland.html
[22:31] ogasawara: I can't be sure without deeper investigation, but CONFIG_USB_USBNET might be required for the hackish usb-boot "treat the device as a USB gadget" thing that some of these boards do.
[22:31] (Though, now that I think about it, I can't see why that would be)
[22:31] So maybe I'm just a victim of waking up grumpy and needing it to be beer o'clock.
[22:32] infinity: doing an entire day of config review is driving me to drink
[22:32] nope
[22:32] USBNET is a dependency for SMSC95XX
[22:33] ogasawara: So, it's driving you to the status quo? ;)
[22:34] USB_SISUSBVGA_CON is totally useless
[22:34] it's about VGA/EGA framebuffer console
[22:35] Huh. A USB ethernet driver including usb/usbnet.h seems like a bit of an oops to me, but there it is.
[22:36] Would take some effort to decouple it, as they use usbnet types.
[22:36] Still seems entirely wrong. :P
[22:36] USB_SERIAL_SAFE_PADDED is the "USB Secure Encapsulated Driver - Padded", wtf?!?!
[22:36] infinity: you can't have SMSC95XX without USBNET
[22:36] ppisati: I know, I can see that from reading the code.
[22:36] iirc USBNET "includes" drivers/net/usb/core
[22:37] ppisati: I'm arguing that the driver is wrong, not the kernel config.
[22:37] and all the drivers/usb/net requires it
[22:37] so
[22:37] infinity: you are a smart guy
[22:37] :)
[22:38] Or, I'm not up to date on the new and terrifying ways that people have conflated "usbnet" and "USB network drivers", if you say that others do the same.
[22:38] s/network/NIC/
[22:38] Given that usbnet didn't used to have anything to do with NICs at all and was, in fact, all about networking without a NIC.
[22:39] Oh well. Upstreams do silly things. News at 11.
[22:44] ogasawara: default USB_INVENTRA_DMA if USB_MUSB_OMAP2PLUS
[22:44] ogasawara: and omap4plus_defconfig selects USB_MUSB_OMAP2PLUS
[22:44] ogasawara: now i wonder why it's off on omap3...
[22:45] ogasawara: wait
[22:46] infinity, we talked this week here about not publishing packages to -updates/-security on late friday/weekend, jjohansen told us about lack of man power if there is any fallout on weekend (especially for security team). We agreed that publishing should not be done between 18:00UTC Fri - 21:00UTC Sun, and would like to have all archive admins to be aware of that
[22:47] herton: We tend to generally follow that rule for -updates anyway.
[22:48] herton: We've even codified it in our SRU team policy that the person on Friday duty (slangasek) doesn't do promotions, only accepts to -proposed.
[22:48] herton: Though, for kernels, I tend to do well over 90% of the AA/SRU stuff.
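The no-publish window herton describes at 22:46 (18:00 UTC Friday to 21:00 UTC Sunday) can be expressed as a small check. This is a minimal sketch under stated assumptions, not the kernel team's bot: `in_publish_freeze` is a hypothetical name, the start of the window is treated as inclusive and the end as exclusive, and the caller is assumed to pass a timezone-aware datetime.

```python
from datetime import datetime, timezone

FRIDAY, SATURDAY, SUNDAY = 4, 5, 6  # datetime.weekday() values (Monday is 0)


def in_publish_freeze(when: datetime) -> bool:
    """True if `when` falls between 18:00 UTC Friday and 21:00 UTC Sunday."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)
    day = utc.weekday()
    if day == FRIDAY:
        return utc.hour >= 18
    if day == SATURDAY:
        return True
    if day == SUNDAY:
        return utc.hour < 21
    return False


if __name__ == "__main__":
    # 2012-07-20 was a Friday (the log's uname line dates 2012-07-19 as Thu).
    for stamp in (datetime(2012, 7, 20, 17, 59, tzinfo=timezone.utc),
                  datetime(2012, 7, 20, 18, 0, tzinfo=timezone.utc),
                  datetime(2012, 7, 22, 21, 0, tzinfo=timezone.utc)):
        print(stamp.isoformat(), in_publish_freeze(stamp))
```

A promotion tool could consult such a check before marking a promote-to-updates task as ready, with an explicit override for emergency fixes like the one discussed in this log.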
[22:49] infinity, all we are doing is asking you not to work on a friday then :)
[22:50] yes, nice, so I guess that's set for the kernel then. If anyone working on it doesn't do on a Friday should be fine
[22:50] I'll try my hardest to ignore you guys on Fridays.
[22:51] The kernel we just rolled out will be an exception to that new rule. :P
[22:51] Cause it'll be friday in a few parts of the world (if not all of them) by the time we release it.
[22:51] And releasing it is the Right Thing To Do.
[22:51] (Once we get some smoketesting to endure the build didn't break in any fun ways)
[22:51] ensure, too.
[22:53] yes, this last oneiric kernel is an exception, no problem on releasing tomorrow I think. I changed the bot to also not set the promote to -updates/-security tasks to confirmed through late Friday-Sunday, as we talked about it as well and decided to do it in addition
[22:53] ogasawara: you were asking why USB_MUSB_OMAP2PLUS=y on omap3, right?
[22:53] ogasawara: that's a dependency of USB_MUSB_HDRC and that's =y on omap3
[22:54] herton: +1 to the bot change, since I tend to not operate without its go-ahead.
[22:54] ogasawara: IMO we can make USB_MUSB_HDRC=m on omap3 too (and that should turn USB_MUSB_OMAP2PLUS=m on omap3 too)
[22:58] * ppisati -> reboot
[23:04] ogasawara: x86_powernow_* are still not autoloadable? meh, I thought that was supposed to land in 3.5
[23:05] jwi: do you have a pointer to a discussion by chance?
[23:05] * ogasawara add a work item to investigate
[23:09] ogasawara: hm, 3.4 actually. fa8031aefec0cf7ea6c2387c93610d99d9659aa2 upstream
[23:10] there should be similar patches for other x86 cpu features - coretemp, microcode, crypto and stuff
[23:58] ppisati: USB_GPIO_VBUS one more to investigate
[23:58] ppisati: it's =y for omap3, =m for all other arch's and flavors
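The config review in this log keeps asking the same question: which options are set differently on one flavour than on the rest? A rough sketch of that comparison is given here. It is not the tool behind the review page linked at 22:22; `parse_config` and `differing_options` are hypothetical helpers, the "generic" flavour name is an assumption, and the sample values are illustrative ones taken from the conversation.

```python
import re
from collections import defaultdict
from typing import Dict, List

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Parse kernel .config text; '# CONFIG_X is not set' is recorded as 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def differing_options(configs: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Map each option with more than one value to {value: [flavours]}.

    Options absent from a flavour's config are simply not counted for it.
    """
    by_option = defaultdict(lambda: defaultdict(list))
    for flavour, options in configs.items():
        for name, value in options.items():
            by_option[name][value].append(flavour)
    return {name: {value: sorted(flavours) for value, flavours in values.items()}
            for name, values in by_option.items() if len(values) > 1}


if __name__ == "__main__":
    # Illustrative fragments only, not real Ubuntu configs.
    configs = {
        "omap3": parse_config("CONFIG_USB_GPIO_VBUS=y\n# CONFIG_USB_SISUSBVGA_CON is not set\n"),
        "omap4": parse_config("CONFIG_USB_GPIO_VBUS=m\nCONFIG_USB_SISUSBVGA_CON=y\n"),
        "generic": parse_config("CONFIG_USB_GPIO_VBUS=m\n# CONFIG_USB_SISUSBVGA_CON is not set\n"),
    }
    for name, values in sorted(differing_options(configs).items()):
        print(name, values)
```

Whether a flagged difference is a bug still needs the kind of Kconfig dependency reading ppisati does here, for example USB_MUSB_OMAP2PLUS following USB_MUSB_HDRC.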