=== jk-- is now known as jk-
[01:26] hi, the ubuntu-natty kernel tree doesn't seem to include the 2.6.38-10.46 abi stuff at the Ubuntu-2.6.38-10.46 tag
[01:26] is that to be expected?
[01:26] i'm very new to this sort of thing :)
[01:31] ah, it seems that the abi stuff gets added the commit *after* the tag
[01:31] (that makes some sense i guess)
=== smb` is now known as smb
[07:25] * apw waves
[07:28] * smb finishes scrollback and waves back
[07:35] reading scrollback, how dedicated :)
[07:36] Just because someone caused botty to spew out a lot of stuff and some of it hit the "bell of interest" ;)
[08:11] smb, ahh yes the c-ve bell
[09:42] apw: can you take a look at my ubuntu-oneiric ti-omap4-next branch? if it's ok, i'll pull req
[10:52] * ppisati -> out for lunch
=== _LibertyZero is now known as LibertyZero
[12:06] ppisati, will try and look it over after i've sorted this lucid kernel issue
[12:36] It seems that the apport rules for reporting a kernel bug are excessively tight:
[12:36] I was forced to reboot into an older kernel, because of a regression, all while leaving the current package pulled by linux-generic installed. now, apport refuses to let me file a bug using 'ubuntu-bug' because the currently running kernel is not the same as the latest.
[12:37] using 'ubuntu-bug linux' that is.
[12:37] apw: k
[12:38] Q-FUNK, yep its a pain in the backside
[12:38] apw: is there any way this could be fixed?
[12:39] Q-FUNK, it certainly can be fixed
[12:40] I can understand apport complaining if there is no linux-generic installed at all, but refusing to let me file a bug just because I rebooted into an older kernel that is no longer in the repository seems excessive.
[12:40] Q-FUNK, indeed seems excessive if that is the criteria
[12:40] though the bug information will be poor if you cannot file it from the right kernel anyhow
[12:41] as that will indicate the bug is in your old kernel which is false
[12:41] will it? or will the bug be filed against the current version of linux-generic that is installed?
[12:42] all of the live information will be taken from the running kernel
[12:42] so it would be even worse as mixture
[12:42] right. shouldn't there be a way to instead attach the dmesg from the previous boot with the buggy kernel, instead?
[12:44] yeah there likely should, or a way to take all the information offline or something
[12:45] that would work too.
[12:45] Q-FUNK, what happens if you ask it to file a bug against the specific binary kernel package which is broken
[12:45] don't we already same the dmesg from the previous boot?
[12:46] öö... save
[12:46] Q-FUNK, maybe so, not that i am aware of specifically, but maybe
[12:46] trying that. just a sec.
[12:47] yup. specifying the exact binary seems to work.
[12:47] Q-FUNK, well thats something, not helpful particularly but ok ..
[12:49] well, at least the hardware data remains valid. :)
[12:49] whats the regression and with which release
[12:51] since 2.6.39 there's frequent kernel paging failures that make the kernel oops. it especially happens whenever running dpkg.
[12:52] it's a non-fatal oops, but I end up having to run 'dpkg -a --configure' and re-start 'apt-get upgrade' several times in a row to complete the upgrade.
[12:52] so in lucid ?
[12:52] oneiric
[12:53] well you must be unusual as i have 10 machines in that range and none have ever shown that kind of behaviour
[12:53] you are saying if you run 2.6.39 it doesn't occur ?
[12:53] 2.6.38 works fine, but 2.6.39 and more recent have the symptoms. with 2.6.39 it happened seldom, but with 3.0 it's nearly systematically when I do a daily package upgrade.
[12:54] 32 or 64 bit ?
[12:54] 32-bit. Geode LX
[12:54] oh a geode, heh, bet its h/w specific
[12:55] not really
[12:55] well its not occurring on any of my oneiric boxes
[12:55] so its certainly not general
[12:55] however, this particular host has repeatedly shown to be a good trap for corner-case kernel bugs.
[12:57] Q-FUNK, as you have a pair of good/bad you could see if the corresponding mainline kernels have the issue
[12:57] and then use the -rc's to try and pare it down a bit
[12:58] tried that before, the last time I had a major regression on that host. in the end, while we found out around which commit the regression took place, it never got investigated.
[13:00] Q-FUNK, as you are the only one with the h/w i suspect unless you do it it'll stay broken
[13:01] I don't mind testing every -rc in the vanilla kernel folder to narrow it down to one specific release, but unless actions are taken beyond that point, it gets ridiculous.
[13:02] geode is amazingly niche hardware, and lots of people seem enthusiastic about building businesses around it without actually being willing to put in the funding or effort for making sure that the software they depend on continues to work
[13:03] for instance, the ext4 inode destroying bug I encountered on the same host from kernel 2.6.31 to 2.6.35 was never properly investigated. it was just marked as fixed the day I announced that 2.6.36 apparently fixes it.
[13:03] Q-FUNK, if we can figure out whats breaking you we'll try and fix it, but last time it basically appears to be a work around for broken cache coherency so ... its not easy to either find or fix
[13:04] apw: it could be. I'm still baffled as to how the bug occurred on that particular geode box and not on another one with a different bios.
[13:05] Q-FUNK, isn't half of the geode instruction set implemented via SMI, at which point the BIOS is part of your processor from a semantic point of view
[13:06] apw: Not so much the instruction set. Just every time you think you're touching hardware.
[13:06] mjg59, ahh ok, so its part of the h/w then even after its handed off ... just as bad
[13:06] IIRC the bios is mostly used to provide a traditional x86 abstraction (PCI bus, etc.) for a system that uses an entirely different bus architecture and there are various implementations of that abstraction layer.
[13:07] Right, any PCI accesses get handled by non-free firmware
[13:07] So who knows what it's doing?
[13:07] Q-FUNK, right but if the code is running via SMI after the kernel takes over, any bug in it ... can break things ... if they don't clean up right
[13:07] free or non-free. coreboot works quite well on those, at least for a few known configurations.
[13:08] Anyone who knew anything about how it worked appears to have vanished in a set of freak accidents
[13:08] mostly in a set of random AMD attritions and OLPC changes of mind.
[13:08] mjg59, not physical accidents i hope
[13:09] apw: Not to the best of my knowledge
[13:10] AMD used to have an extremely knowledgeable and dedicated coder who handled Geode coding for the OLPC project. he even won employee awards for his efforts. then one day, after a particularly bad quarterly, he fell the victim of random attrition.
[13:10] I suspect that if it weren't for OLPC, everyone would just have given up pretending to support Geode
[13:10] It's certainly the only reason we care
[13:10] (And we don't for RHEL)
[13:10] ahh ... shame i don't know anyone who has one
[13:11] that random attrition even left his immediate boss in usnly had to provide technical support and ongoing code development without his main guy.
[13:12] argh. friggin kernel i/o stealing my keystrokes again
[13:13] that random attrition even left his immediate boss in total limbo, because he suddenly had to provide technical support and ongoing code updates without his main guy.
[13:14] I really hate how hard-disk access has the bad habit of momentarily halting the keyboard buffer.
[13:15] Q-FUNK, never see that either, keys may be delayed for me but not lost
[13:15] not that i want them delayed of course, but thats a separate gripe
[13:16] delayed would be acceptable. half of the time, if the kernel starts swapping, I end up missing several words in the middle of a sentence.
[13:18] you could try changing your io wait method
[13:19] herton, we likely are going to have to shove a lucid kernel with a single fix in for the point release ... so hold off any lucid uploads to the PPA
[13:21] apw: we were avoiding uploading anything because of the point release too, so that's ok
[13:21] ok good stuff
[13:21] ohsix: the scheduling algorithm, you mean?
[13:22] Q-FUNK: nevermind, was thinking of something else
[14:24] sconklin, i've pushed a temporary branch to lucid, master-point which is what i am proposing for the upload
[14:30] apw: ok, sounds good
[14:43] * ogasawara back in 20
[14:57] sconklin, ok the decision from the release team is that they want a kernel spun, could you check that branch for me as the stable check script just talks crap
[14:58] heh, probably as soon as we finish the meeting
[15:33] Problem! If anyone is able to help.... running 10.04, recently upgraded to kernel 2.6.32-33, and can't boot up!! See http://ubuntuforums.org/showthread.php?t=1807978 for my ongoing thread. Any suggestions?!?!
[15:33] tgardner: heading in, see ya in 15
[15:34] ogasawara, ack
=== tgardner is now known as tgardner-afk
[16:10] * herton -> lunch
[16:18] mumble went belly up
[16:21] apw: say, did you see my email about a funky CVE in the hardy update?
[16:23] kees, not yet, which one
[16:24] oh, sorry, not hardy. 2010-4175 in linux-lts-backport-maverick
[16:24] the tracker shows "released 2.6.35-25.44~lucid1" but there is a changelog entry in 2.6.35-30.56~lucid1
[16:26] kees, seems to be applied twice
[16:26] ??
[16:26] git log --oneline origin/lts-backport-maverick | grep 'rds: Integer overflow in RDS cmsg handling'
[16:26] was it a no-change cherry-pick or something?
[16:26] 9a3798f rds: Integer overflow in RDS cmsg handling, CVE-2010-4175
[16:26] apw: Integer overflow in the rds_cmsg_rdma_args function (net/rds/rdma.c) in Linux kernel 2.6.35 allows local users to cause a denial of service (crash) and possibly trigger memory corruption via a crafted Reliable Datagram Sockets (RDS) request, a different vulnerability than CVE-2010-3865. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-4175)
[16:26] ae1d3ae rds: Integer overflow in RDS cmsg handling
[16:26] git describe --contains ae1d3ae
[16:26] Ubuntu-lts-2.6.35-25.44~33
[16:26] git describe --contains 9a3798f
[16:26] Ubuntu-lts-2.6.35-28.50~20
[16:27] so thats where the version you mention come from. from whats in the tree i'd expect my tools to take the first version number there
[16:27] and i think thats what you are saying it did
[16:27] yeah, that's certainly the correct approach, but I guess I'm wondering if we need to look closer -- how did it get applied twice (did that have a bad effect?)
[16:27] kees, both commits seem to be complete
[16:27] which should be impossible
[16:28] right :P
[16:28] apw: did you test build this?
[16:28] sconklin, i built only generic on amd64/i386
[16:29] I'm going to test build before I shove it to the PPA. Less time in the long run if it fails
[16:29] sconklin, ack, thanks
[16:29] kees, will investigate and let you know
[16:29] apw: cool, thanks
[16:30] i can only assume the first one got reverted somehow
[16:30] just ... how i don't know
[16:35] apw: does this require an ec2 respin?
[16:36] sconklin, i would expect it might now you mention it, but i am confirming now with the release team
[16:41] ppisati: yeah, mumble can't login anymore here
[16:41] here either
[16:45] kees, ok this actually is fixed in the earlier one as advertised. however the relayout of the code since then makes it inobvious so someone has applied the same fix again to the containing routine in the second place
[16:45] kees, so its doubly fixed essentially
[16:49] apw: how strange, okay. thanks!
[16:49] sconklin, ok the basic answer is no we don't need -ec2
[16:49] kees, good spot though
[16:49] ok
[16:50] 17:49:02 Daviey | it's a nice to have.. but we are only currently seeing this with euca. │ astraljava
[16:51] \o/
[16:53] Daviey, yep taking your name in vain
[16:53] * apw screams about mumble
[17:14] * jjohansen running an errand
[17:30] apw: package uploaded, It'll take 24 hours to complete, since it includes ARM arch
[17:30] sconklin, yeah
[17:30] sconklin, one reason we need it in as soon as
[17:31] apw: yep, and another reason that I did a test build (which completed OK)
[17:32] sconklin, yep all sensible and right
[17:38] Problem! If anyone is able to help.... running 10.04, recently upgraded to kernel 2.6.32-33, and can't boot up!! See http://ubuntuforums.org/showthread.php?t=1807978 for my ongoing thread. Any suggestions?!?!
[17:57] climbe2, does the previous kernel work
[17:58] as in selecting an older kernel from the grub menu
[18:03] apw, mumble seems to have reincarnated
[18:04] apw, no none will work anymore
[18:05] climbe2, ok thats indeed odd if you took a kernel update and all your old kernels stop working too
[18:05] which versions have you tried
[18:05] perhaps it was not the upgrade.... I was recently using an SDHC card through my HP printer... two different USB storage devices...
[18:06] all I know is that I can't boot up at all!
[18:06] /dev/disk/by-uuid/e3df952c-d462-4292-bab6-4965da1d567c does not exist. Dropping to a shell.
[18:07] so that implies your disk does not have the label that is expected
[18:08] at the busy box you can get the dmesg output and see if any of your disks were found
[18:08] ok, how do I do that?
[18:09] I am also on a different computer right now, so I will have to hand type it
[18:09] climbe2, 'dmesg | grep sd'
[18:09] that command sequence might give you some clues
[18:09] climbe2, you might also try 'blkid'
[18:10] if that is available it might tell you what disks it thinks it can see
[18:10] /dev/sda5: UUID="cf503727-25f2-4ecd-b0f3-2b894523bcba" TYPE="ext4"
[18:10] I can only get to the grub command line
[18:10] is there another command line I can use?
[18:10] mine has a line like that ... which matches the UUID in the error ...
[18:10] climbe2, your post has the busybox prompt in it ?
[18:11] (initramfs)
[18:11] that one
[18:11] I've been a few different places from back then...let me see
[18:12] yes, I am there
[18:13] does blkid produce any output
[18:13] and does dmesg | grep sd
[18:14] dmesg | grep sd produces hundreds of lines it seems....blkid produces my drives
[18:14] ok, and does the UUID in the error line appear ?
[18:14] blkid: /dev/sda1, 2, 5, 6 /dev/sdab1
[18:15] climbe2, is that the exact output it produced ?
[18:15] i am expecting some UUID= segments
[18:16] /dev/disk/by-uuid/e3df952c-d462-4292-bab6-4965da1d567c
[18:16] there is an extra "\" in the error message than in the drive UUID
[18:16] an extra ?
[18:16] what ?
[18:16] wait..let me see
[18:18] in grub, i press 'e' to edit... it has /dev/disk/by-uuid/e3fd.....b\ab6-... instead of bab6
[18:19] well you could try removing that \
[18:20] rather, in the edit screen, 5th line down it says linux /boot/vmlinuz 2.6.32-33-generic root=UUID=e3fd.....b\ab6...
[18:21] though i am surprised to get those full names in your grub conf
[18:21] perhaps that is an end-of-line signal...? it occurs on another line as well
[18:21] when the line breaks to continue to the next
[18:22] * apw has to run
[18:23] thanks for the help
[18:23] i'll see what i can do
[18:26] skaet, ping
[18:27] skaet, we don't need to pull the kernel re: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/788351
[18:27] Ubuntu bug 788351 in linux "xfs ioctl XFS_IOC_FSGEOMETRY_V1 clobbers kernel stack" [Undecided,New]
[18:27] skaet, the patch in question is in the current -proposed
[18:27] skaet, no need to do anything
[18:27] pgraner, thanks - was pinging around about it.
[18:27] skaet, next time just drop in here and ask
[18:28] pgraner, will do
=== Quintasan_ is now known as Quintasan
[18:59] apw: another "why is this 'released'?" question for 2011-0711. uct shows it as "released" for a kernel in -proposed (hardy)
[19:26] * jjohansen lunch
[20:37] * tgardner bounces tangerine for kernel update
=== yofel_ is now known as yofel
[21:15] !
=== kentb is now known as kentb-out
=== jeremy is now known as Guest4843
=== rsalveti` is now known as rsalveti
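
A minimal sketch of the check walked through between 18:06 and 18:16: does the UUID named in the "does not exist" boot error show up among the filesystems that blkid reports? It is in Python purely for illustration. The helper names (normalize, uuids_from_blkid, find_device) are hypothetical, the sample line is the one quoted at 18:10, and the parsing assumes blkid's usual `DEVICE: KEY="value" ...` output format.

```python
import re

# Matches one line of blkid-style output, e.g.
#   /dev/sda5: UUID="cf50..." TYPE="ext4"
# The \b keeps PARTUUID="..." from being mistaken for UUID="...".
BLKID_LINE = re.compile(r'^(?P<dev>[^:]+):.*\bUUID="(?P<uuid>[^"]+)"')

# Sample output, taken from the line quoted in the log at 18:10.
SAMPLE = '/dev/sda5: UUID="cf503727-25f2-4ecd-b0f3-2b894523bcba" TYPE="ext4"\n'


def normalize(spec):
    """Reduce 'UUID=<id>' or '/dev/disk/by-uuid/<id>' to the bare, lowercased id."""
    spec = spec.strip()
    if spec.startswith("UUID="):
        spec = spec[len("UUID="):]
    return spec.rsplit("/", 1)[-1].lower()


def uuids_from_blkid(text):
    """Map each filesystem UUID found in blkid-style output to its device node."""
    found = {}
    for line in text.splitlines():
        match = BLKID_LINE.match(line.strip())
        if match:
            found[match.group("uuid").lower()] = match.group("dev")
    return found


def find_device(wanted, blkid_text):
    """Return the device carrying the wanted UUID, or None if no disk has it."""
    return uuids_from_blkid(blkid_text).get(normalize(wanted))
```

With the sample above, the UUID from the 18:06 error is not found, which is the situation the log describes: the kernel is asked to mount a root filesystem whose UUID none of the visible partitions carries.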