=== jk-- is now known as jk-
=== jjohansen is now known as jj-afk
[02:28] Hi guys, I'm running 10.04 LTS on a group of servers and I'm having some trouble when it comes under high load
[02:29] the system becomes unstable and will sometimes go into a state where it responds to basic network access (i.e. ping, accepting connection) but nothing further and for all intents and purposes is down/dead
[02:30] I have a system in this state at the moment, and it's fairly easy to trigger the problem (it just takes a trip out to the colo to kick them in the pants when it happens).
[02:30] How would I start collecting data to file this as a bug report?
[02:31] is ubuntu-bug -p linux the best way to get underway?
[02:32] chrismsnz: Yes, that's generally a good start.
[02:32] so rebooting the server and running ubuntu-bug will pick up the old dmesg and everything?
[02:33] chrismsnz, check dmesg for error messages
[02:33] chrismsnz, no you have to run it while system is frozen
[02:34] chrismsnz, if it is a gpu hang, you should also gather intel register dumps as well, although frankly it's unlikely gpu hang fixes will get done for lucid. But who knows.
[02:34] EOD, cya.
[02:35] right
[02:37] I actually had installed a newer kernel on this particular system to see if the problem was solved - but it was not
[02:37] I will include in bug report, thanks guys
=== anubhav__ is now known as anubhav
=== smb` is now known as smb
[07:22] morning
[07:26] smb: morning
[10:59] * ppisati -> out for lunch
[11:40] apw: Could I trouble you for a kernel build with the patch from bug #753994 ?
[11:40] Launchpad bug 753994 in linux "[arrandale] Display is slanted when using 1360x768 resolution" [Medium,Confirmed] https://launchpad.net/bugs/753994
[12:01] RAOF, sure
[12:01] RAOF, for what series
[12:08] RAOF, ok found it, assuming natty, will let you know when its done
[12:18] herton: hi, i'm the guy on #793796, is this patch related to it?
https://patchwork.kernel.org/patch/837692/ it's in the -11 package that just hit proposed; i won't have time to test it for a bit
[12:20] ohsix: hi. yes, the patch is related, but it wasn't sufficient to fix your case. you need also the 0004 and 0005 patches from http://people.canonical.com/~herton/lp793796/r5/
[12:21] ok, i read the one that was in r6 and that looked like the same, but percolated up
[12:21] ohsix: upstream did another patch trying to not having to add a lot of tests for dead queues on elevator, it was the last kernel you tested, but it didn't work. I reported your test on upstream bug report, but there is no response yet
[12:22] nod, saw that
[12:22] just saw it in -11 and figured some wizard might have just bypassed it all :}
[12:23] unfortunately no. still you shouldn't see the issues on -11 kernel as we kept reverted the patches that caused the problem for you, so you can use the new kernel
[12:25] those are just reverts right? who committed the patches? shouldn't be too hard to get them to try unplugging a usb drive :]
[12:27] yep, it just reverts the commit we found on bisect, plus also reverts the followup fixes to this first commit reverted
[12:30] ohsix: the reverts are being tracked on bug 802986
[12:30] Launchpad bug 802986 in linux "Revert upstream scsi run queue changes" [Undecided,In progress] https://launchpad.net/bugs/802986
[12:34] hm, didn't know about that one, thanks
[12:34] i'm surprised i haven't heard of a thread on the lkml about it or something
[12:38] I think there was a thread only in linux-scsi, from where the patch in r6 you tested came from, and bug https://bugzilla.kernel.org/show_bug.cgi?id=38842 is in the regressions list that is frequently posted on lkml
[12:38] bugzilla.kernel.org bug 38842 in Other "panic in elv_completed_request on safe remove usb hard drive" [High,Needinfo]
[12:43] i hope people look at the lp bug for the simple reproduction steps, instead of getting the virtual machine image D:
[12:53] RAOF,
ok kernels build and published in the bug
[12:59] smb, hey where do you get old isos from
[13:00] old isos of what exactly?
[13:00] hardy for instance
[13:00] http://release.ubuntu.com
[13:00] i happen to have other releases lying around
[13:00] err
[13:00] releases
[13:02] apw, Of course that only gives you latest point release and release for lts. if that is ok
[13:02] yeah just something old enough :)
[13:03] apw, could probably ship you gutsy... ;-P
[13:06] smb, so very kind
[13:08] apw, Heh, thought you would be appreciating. Btw, where you want to install hw-wise. Usually hardy fails on NICs or wireless (if recent)
[13:08] on a netbook, any chance ?
[13:10] hm. could already be questionable as they always slam in latest stuff... it seems ok on the dell 1521 which has some broadcom...
[13:10] (not that you could call that recent)
[13:10] smb, well i guess i'll give it a shot and see what happens
[13:11] Yeah, you will notice quite early when the install complains about the network setup failed
[13:18] bugs related to -virtual kconfig: 720644, 769527, 771855, 794570, 761809, 658461
[13:27] bug #720644 bug #769527 bug #771855 bug #794570 bug #761809 bug #658461
[13:27] Launchpad bug 720644 in linux "linux-virtual kernel does not allow network configuration via kernel command line" [Undecided,Confirmed] https://launchpad.net/bugs/720644
[13:27] Launchpad bug 769527 in linux "Missing rpcsec_gss_krb5 module" [Undecided,New] https://launchpad.net/bugs/769527
[13:27] Launchpad bug 771855 in linux "reiserfs module missing in linux-image-virtual" [Undecided,New] https://launchpad.net/bugs/771855
[13:27] Launchpad bug 794570 in linux "igbvf driver is missing from virtual-flavored kernel" [Medium,Triaged] https://launchpad.net/bugs/794570
[13:27] Launchpad bug 761809 in linux "Quota modules are missing from the package" [Undecided,Incomplete] https://launchpad.net/bugs/761809
[14:00] ##
[14:00] ## Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[14:00] ##
agenda: https://wiki.ubuntu.com/KernelTeam/Meeting
[14:00] ##
[14:00] J
[14:03] bjf thanks for the reminder
[14:04] apw, and this one only once :-)
[14:04] its hard to remember its tue when its your first day of the week
[14:04] bjf: I liked the morning bug report
[14:05] bjf: sending it 50 times really made sure I looked at it :)
[14:05] ogasawara, heh
[14:06] * smb was already worried somebody sent a lot of patches...
[14:06] hey! we take our bugs seriously!
[14:06] smb: hehe, I had the same feeling for a second
[14:47] * ogasawara back in 20
[14:49] whoa, he put himself back to sleep. lets see how long this lasts...
[14:52] ogasawara, not long :)
=== cmagina_ is now known as cmagina
[15:00] short lived indeed
[15:00] * ogasawara back in 20
[15:30] herton: heya
[15:30] vanhoof: hi
[15:30] herton: quick q
[15:30] herton: bug 774947
[15:30] Launchpad bug 774947 in linux "[Lenovo Edge 11 AMD] system locks up completely running the "stress" tool" [Medium,Fix released] https://launchpad.net/bugs/774947
[15:30] that fix came through 2.6.38 -stable, and landed in last weeks 2.6.38.10 kernel
[15:31] anything I need to do with it at this point? (just looking at the most recent update that was popped in there)
[15:31] vanhoof: in fact it was applied by hand, doesn't have a buglink to a stable update tracking bug
[15:32] let me check here if it was included in a stable update
[15:33] herton: ah ok, thought it was from the dialog in the bug
[15:33] vanhoof, why have you closed the natty task for that bug ?
you say its cause 2.6.38-10 was released but that only contains 2.6.38.7
[15:34] apw: based on the testing that was done on -10, so I assumed this trickled down in 2.6.38.7
[15:35] vanhoof, but you just said some 10 lines back that it was only in last weeks 38.10
[15:35] which isn't in 38-10
[15:35] apw: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/774947/comments/31
[15:35] Ubuntu bug 774947 in linux "[Lenovo Edge 11 AMD] system locks up completely running the "stress" tool" [Medium,Fix committed]
[15:35] thats what I'm going with here
[15:35] if its in 2.6.38.8, I'll flip it back to fix committed
[15:36] vanhoof, well if the fix is in that kernel then all is good, but thats the right reason to change the state to released
[15:37] looks like it was just those fixed on-top of -10
[15:37] vanhoof: right, same fix came with 2.6.38.8, I'll update the bug there, you don't need to verify
[15:37] herton: cool, i flipped it back to fix committed then
[15:37] * vanhoof mis-interpreted the testing comment, and thought it was on the -proposed kernel
[15:38] apw@dm$ git describe --contains 9ee653dce0efc6bad29f0d68b4ac74dbed093131
[15:38] Ubuntu-2.6.38-11.47~159
[15:38] as far as i can see the commit is not yet released, it is Fix Committed only
[15:38] right
[15:38] which I just set it back to
[15:38] and as it is correctly marked
[15:38] it should close itself when it releases
[15:39] apw: we get a few bugs where it's just waiting on some fix in -stable, like this one, didn't know if it would be automagically closed out
[15:39] in any event, we're good
[15:40] vanhoof, no often it won't get picked up if its not known before we apply stable
[15:40] apw: yeah I assumed this was the case, and mis-interpreted the test results that were posted as 'fixed in -proposed' :)
[15:56] ##
[15:56] ## Kernel team meeting today @ 17:00 UTC
[15:56] ##
[16:08] i'm still confused by bug 774947 - do we need to test or not? the commit is in the -proposed kernel right?
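The `git describe --contains` output quoted above, `Ubuntu-2.6.38-11.47~159`, names the first tag that contains the commit; the `~159` suffix says the commit is an ancestor of that tag, 159 generations back. Because that tag is -11.47 while the released kernel was -10, the fix is committed but not yet released. A minimal sketch of that comparison follows. It assumes tags look like `Ubuntu-BASE-ABI.UPLOAD`; the function names are hypothetical, and the upload number used for the -10 kernel in the usage note is a placeholder, not taken from the log.

```python
import re

# Assumed tag shape: Ubuntu-<base version>-<ABI>.<upload>, optionally followed
# by a name-rev style suffix such as ~159 or ^0.
TAG_RE = re.compile(r"^Ubuntu-(\d+\.\d+\.\d+)-(\d+)\.(\d+)(?:[~^].*)?$")


def parse_describe(text):
    """Return ((major, minor, patch), abi, upload) for a describe string."""
    match = TAG_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised tag format: %r" % text)
    base, abi, upload = match.groups()
    return tuple(int(part) for part in base.split(".")), int(abi), int(upload)


def fix_is_released(describe_output, released_tag):
    """True if the first tag containing the fix is not newer than the released tag."""
    return parse_describe(describe_output) <= parse_describe(released_tag)
```

For example, `fix_is_released("Ubuntu-2.6.38-11.47~159", "Ubuntu-2.6.38-10.1")` evaluates to `False`, which matches the "Fix Committed only" conclusion in the log.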
[16:08] Launchpad bug 774947 in linux "[Lenovo Edge 11 AMD] system locks up completely running the "stress" tool" [Medium,Fix committed] https://launchpad.net/bugs/774947
[16:19] cking: you don't need to test, yes the commit is on proposed
[17:00] ##
[17:00] ## Meeting starting now
[17:00] ##
[17:30] * bjf -> dr. appt.
=== bjf is now known as bjf[afk]
=== cking is now known as cking-afk
=== manjo` is now known as manjo
[18:04] apw: none of the hardy linux CVE fixes show up in the tracker with a specific version...
[18:04] kees, got an example
[18:04] grep ^hardy_linux: CVE-2011-1170 CVE-2011-1171 CVE-2011-1172 CVE-2011-1173 CVE-2011-2534 CVE-2010-4649 CVE-2010-4073 CVE-2010-4238 CVE-2011-2484 CVE-2010-4165 CVE-2010-4249 CVE-2011-1010 CVE-2011-0711 CVE-2011-1090
[18:04] kees: net/ipv4/netfilter/arp_tables.c in the IPv4 implementation in the Linux kernel before 2.6.39 does not place the expected '\0' character at the end of string data in the values of certain structure members, which allows local users to obtain potentially sensitive information from kernel memory by leveraging the CAP_NET_ADMIN capability to issue a crafted request, and then reading the argument to the resulting modprobe process. (http://
[18:04] kees: net/ipv4/netfilter/ip_tables.c in the IPv4 implementation in the Linux kernel before 2.6.39 does not place the expected '\0' character at the end of string data in the values of certain structure members, which allows local users to obtain potentially sensitive information from kernel memory by leveraging the CAP_NET_ADMIN capability to issue a crafted request, and then reading the argument to the resulting modprobe process.
(http://c
[18:04] kees: net/ipv6/netfilter/ip6_tables.c in the IPv6 implementation in the Linux kernel before 2.6.39 does not place the expected '\0' character at the end of string data in the values of certain structure members, which allows local users to obtain potentially sensitive information from kernel memory by leveraging the CAP_NET_ADMIN capability to issue a crafted request, and then reading the argument to the resulting modprobe process. (http://
[18:04] kees: The econet_sendmsg function in net/econet/af_econet.c in the Linux kernel before 2.6.39 on the x86_64 platform allows remote attackers to obtain potentially sensitive information from kernel stack memory by reading uninitialized data in the ah field of an Acorn Universal Networking (AUN) packet. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-1173)
[18:05] aaarggh
[18:05] kees: Buffer overflow in the clusterip_proc_write function in net/ipv4/netfilter/ipt_CLUSTERIP.c in the Linux kernel before 2.6.39 might allow local users to cause a denial of service or have unspecified other impact via a crafted write operation, related to string data that lacks a terminating '\0' character. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2534)
[18:05] kees: Integer overflow in the ib_uverbs_poll_cq function in drivers/infiniband/core/uverbs_cmd.c in the Linux kernel before 2.6.37 allows local users to cause a denial of service (memory corruption) or possibly have unspecified other impact via a large value of a certain structure member.
(http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-4649)
[18:05] kees: The ipc subsystem in the Linux kernel before 2.6.37-rc1 does not initialize certain structures, which allows local users to obtain potentially sensitive information from kernel stack memory via vectors related to the (1) compat_sys_semctl, (2) compat_sys_msgctl, and (3) compat_sys_shmctl functions in ipc/compat.c; and the (4) compat_sys_mq_open and (5) compat_sys_mq_getsetattr functions in ipc/compat_mq.c. (http://cve.mitre.org/cgi-bi
[18:05] kees: The vbd_create function in Xen 3.1.2, when the Linux kernel 2.6.18 on Red Hat Enterprise Linux (RHEL) 5 is used, allows guest OS users to cause a denial of service (host OS panic) via an attempted access to a virtual CD-ROM device through the blkback driver. NOTE: some of these details are obtained from third party information. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-4238)
[18:05] kees: The add_del_listener function in kernel/taskstats.c in the Linux kernel 2.6.39.1 and earlier does not prevent multiple registrations of exit handlers, which allows local users to cause a denial of service (memory and CPU consumption), and bypass the OOM Killer, via a crafted application. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2484)
[18:05] kees: The do_tcp_setsockopt function in net/ipv4/tcp.c in the Linux kernel before 2.6.37-rc2 does not properly restrict TCP_MAXSEG (aka MSS) values, which allows local users to cause a denial of service (OOPS) via a setsockopt call that specifies a small value, leading to a divide-by-zero error or incorrect use of a signed integer.
(http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-4165)
[18:05] kees: The wait_for_unix_gc function in net/unix/garbage.c in the Linux kernel before 2.6.37-rc3-next-20101125 does not properly select times for garbage collection of inflight sockets, which allows local users to cause a denial of service (system hang) via crafted use of the socketpair and sendmsg system calls for SOCK_SEQPACKET sockets. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-4249)
[18:05] kees: Buffer overflow in the mac_partition function in fs/partitions/mac.c in the Linux kernel before 2.6.37.2 allows local users to cause a denial of service (panic) or possibly have unspecified other impact via a malformed Mac OS partition table. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-1010)
[18:05] kees: The xfs_fs_geometry function in fs/xfs/xfs_fsops.c in the Linux kernel before 2.6.38-rc6-git3 does not initialize a certain structure member, which allows local users to obtain potentially sensitive information from kernel stack memory via an FSGEOMETRY_V1 ioctl call. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0711)
[18:05] kees: The __nfs4_proc_set_acl function in fs/nfs/nfs4proc.c in the Linux kernel before 2.6.38 stores NFSv4 ACL data in memory that is allocated by kmalloc but not properly freed, which allows local users to cause a denial of service (panic) via a crafted attempt to set an ACL. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-1090)
[18:05] AAHHHHH
[18:05] apw: anyway, all of those CVEs are just "pending" without versions
[18:05] kees, hardy_linux: pending (2.6.24-29.92)
[18:06] * kees does an update....
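The complaint above is about tracker entries that read `pending` with no version, while the up-to-date entry reads `hardy_linux: pending (2.6.24-29.92)`. A small sketch of how such lines could be checked is shown here. The line format is inferred only from that one quoted entry, and the function names are hypothetical; the real tracker files may contain other fields that this does not handle.

```python
import re

# Assumed shape, inferred from the single entry quoted in the log:
#   RELEASE_PACKAGE: STATUS (VERSION)     with the version part optional.
STATUS_RE = re.compile(r"^([a-z]+)_([^:\s]+):\s*([\w-]+)(?:\s*\(([^)]*)\))?\s*$")


def parse_status_line(line):
    """Return (release, package, status, version) or None if the line does not match."""
    match = STATUS_RE.match(line)
    if match is None:
        return None
    return match.groups()


def pending_without_version(lines):
    """List (release, package) pairs marked pending with no version recorded."""
    found = []
    for line in lines:
        entry = parse_status_line(line)
        if entry is None:
            continue
        release, package, status, version = entry
        if status == "pending" and not version:
            found.append((release, package))
    return found
```

Feeding it the two variants from the discussion, `"hardy_linux: pending"` and `"hardy_linux: pending (2.6.24-29.92)"`, reports only the first one.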
[18:06] apw@dm$ grep ^hardy_linux: active/CVE-2011-1170
[18:06] apw: net/ipv4/netfilter/arp_tables.c in the IPv4 implementation in the Linux kernel before 2.6.39 does not place the expected '\0' character at the end of string data in the values of certain structure members, which allows local users to obtain potentially sensitive information from kernel memory by leveraging the CAP_NET_ADMIN capability to issue a crafted request, and then reading the argument to the resulting modprobe process. (http://c
[18:06] pending without a version should mean Fix Committed but not uploaded yet
[18:06] kees: make sure that xfs fix comes with the fix to the regression it caused
[18:07] apw: oh, ha-ha, we're out of sync with you. fixing, please ignore me...
[18:07] * bliss bows his head in shame
[18:07] kees, np
[18:07] bliss: do you have a pointer for it?
[18:09] i'll find the commit
[18:09] bliss: do you mean CVE 2011-0711 ?
[18:09] kees: The xfs_fs_geometry function in fs/xfs/xfs_fsops.c in the Linux kernel before 2.6.38-rc6-git3 does not initialize a certain structure member, which allows local users to obtain potentially sensitive information from kernel stack memory via an FSGEOMETRY_V1 ioctl call. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0711)
[18:09] gah, I can't avoid the bot
[18:09] yes, that's the bug
[18:09] my fix was broken
[18:09] i content their code was broken and my fix revealed it, but that's an argument for another day
[18:09] contend*
[18:09] bliss: heh. it's just a memset though?
[18:10] yeah
[18:10] but they cast a structure to a larger size in a function call
[18:10] so the memset caused an overflow
[18:10] *ew*
[18:10] yes, i wrote the patch, but that was clearly their fault
[18:11] that rings a bell, we may have gotten both but worth checking
[18:11] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=af24ee9ea8d532e16883251a6684dfa1be8eec29
[18:12] bliss: yup, looks like we've got it.
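The regression described above comes from zeroing through a pointer whose declared type is larger than the object it really points at: a memset sized by the larger type runs past the end of the smaller one. The sketch below only illustrates the size arithmetic with `ctypes`. The two structures and their fields are hypothetical stand-ins, not the real XFS geometry layouts, and `zero_safely` is just one way to avoid the overrun rather than a description of the upstream fix.

```python
import ctypes


class GeomV1(ctypes.Structure):
    # Hypothetical stand-in for the older, smaller structure.
    _fields_ = [("blocksize", ctypes.c_uint32),
                ("rtextsize", ctypes.c_uint32)]


class Geom(ctypes.Structure):
    # Hypothetical stand-in for the newer structure: same prefix, one extra field.
    _fields_ = [("blocksize", ctypes.c_uint32),
                ("rtextsize", ctypes.c_uint32),
                ("logsunit", ctypes.c_uint32)]


def overrun_bytes(actual_type, assumed_type):
    """Bytes that zeroing sizeof(assumed_type) would write past an actual_type object."""
    return max(0, ctypes.sizeof(assumed_type) - ctypes.sizeof(actual_type))


def zero_safely(obj):
    """Zero exactly the bytes that obj itself occupies, never more."""
    ctypes.memset(ctypes.addressof(obj), 0, ctypes.sizeof(obj))
```

With these stand-ins, `overrun_bytes(GeomV1, Geom)` is the size of the extra field, which is the amount a memset sized for `Geom` would scribble beyond a `GeomV1` living on the caller's stack.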
[18:12] those sha1s even look familiar
[18:13] yeah we have both marked in the tracker as needed, so we are good
[18:13] * kees nods
[18:38] hggdh: I have a question for you about q-r-t and updates... I want to add a test for something that hasn't been fixed yet. since we no longer fix CVEs in lock-step across all releases, I don't have a good way to avoid having the test fail. any thoughts on how to add a test that isn't a regression "yet"?
[18:40] hum
[18:40] the easy way out: let it fail -- I always verify the results
[18:41] another option, bit more involved -- set up a table of fixed releases (and versions) and compare against
[18:41] kees: ^
[18:42] hggdh: yeah, we're going to need this going forward, so probably the table is the best solution. I will ponder how to best implement it. thanks!
[18:43] the only thing is this is probably going to explode, too many CVEs against too many different packages :-(
[18:44] heh
[18:44] hggdh: well, we have some of the data in the CVE tracker, so I'm just going to ponder how to avoid duplicating it.
[18:45] if the cve tracker is API-accessible by anyone...
[18:46] we could just grab the data on run time. If not accessible, I am not sure it is a good idea
[18:54] jhunt_: hello!
[18:55] kees: hi.
[18:55] jhunt_: so, we were trying to figure out exactly what the changed behavior was so we could reproduce it.
[18:55] jhunt_: without that, we were kind of stuff. we've been looking at another set of related changes, but I couldn't find fd/-specific stuf
[18:55] *stuff
[18:56] well, to be clear, I'm not 100% sure what's happening yet. So, I'd really like to know what is expected behavior wrt a root process calling setuid and attempting to open /proc/self/fd/* directly.
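The scenario in the last message, a root process that calls setuid and then tries to open entries under /proc/self/fd directly, can be probed with a short script. This is a sketch written for illustration, not the test program from the discussion. The function names are hypothetical, and the uid and gid passed to `drop_privileges` are left to the caller because the right unprivileged account differs between systems.

```python
import errno
import os


def probe_proc_fds(fds):
    """Try to open /proc/self/fd/N for each N; map N to 'ok' or an errno name."""
    results = {}
    for fd in fds:
        path = "/proc/self/fd/%d" % fd
        try:
            # O_NONBLOCK keeps the open from stalling if the target is a FIFO.
            handle = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        except OSError as exc:
            results[fd] = errno.errorcode.get(exc.errno, str(exc.errno))
        else:
            os.close(handle)
            results[fd] = "ok"
    return results


def drop_privileges(uid, gid):
    """Permanently switch to an unprivileged user; only works when started as root."""
    os.setgroups([])  # shed supplementary groups first
    os.setgid(gid)    # change group while we still have the privilege to do so
    os.setuid(uid)    # after this call root cannot be regained


def run_probe(uid, gid):
    """Open a pipe (descriptors above 2), drop privileges, then probe the descriptors."""
    read_end, write_end = os.pipe()
    drop_privileges(uid, gid)
    return probe_proc_fds([0, 1, 2, read_end, write_end])
```

Running `run_probe` as root on each release and comparing the returned dictionaries shows which descriptors stay openable after the uid change; running `probe_proc_fds` alone, without dropping privileges, gives the baseline.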
[18:57] i'd have expected /fd/ to continue to work, as you already have those fds open
[18:57] * kees nods
[18:57] "self" should always be able to read its own fds I would imagine
[18:58] I've got a noddy test program that attempts this and on lucid, the process can only open /proc/self/fd/0 whereas on natty+oneiric, you can open /proc/self/fd/[012]. Any fd higher than 2 gives EPERM.
[18:59] apw: right. And I still think I might be going made, but the upstart session code that falls into this scenario *used* to work (on natty). however, I now cannot make the same code work on either natty or oneiric which made me think maybe a kernel or security change might be affecting things.
[18:59] s/made/mad :)
[18:59] (see, I'm mad me :)
[18:59] jhunt_, perhaps you could send us the test code so we can look it over
[19:00] will do...
[19:00] kees, this fix: UBUNTU: SAUCE: proc: hide kernel addresses via %pK in /proc//stack
[19:00] UBUNTU: SAUCE: proc: hide kernel addresses via %pK in /proc//stack
[19:00] kees, whats the gen with that
[19:00] "gen" ?
[19:01] kees, the history, the background?
[19:01] it looks like a security related change, which we have from mainline in oneiric, backport from you in natty, but not in maverick
[19:02] apw: right, so, it's an upstream improvement for continuing to lock down things that should be hidden with %pK
[19:02] apw: it doesn't need backporting as far as I'm concerned. (i.e. I already backported it to natty which was the first to have %pK)
[19:03] * bliss cheers
[19:03] kees, ah ok, fair enough, i'll drop the K in the backport before that then
[19:04] apw, kees: code should with you now.
[19:06] apw: I'm not sure what you mean?
[19:07] jhunt_: thanks
[19:07] kees: np
[19:09] kees, just trying to introduce a %pK too early is all
[19:09] apw: ah-ha, okay
=== bjf[afk] is now known as bjf
[19:22] * jjohansen -> lunch
[19:23] ogasawara, can you see if you can still get to the "site admin" page on voices.canonical.com?
i'm not able to any more so no more blog postings for me
[19:23] bjf: lemme check, just a sec
[19:24] bjf: works for me, want me to post the meeting minutes?
[19:24] ogasawara, sure, i'll pastebin them
[19:25] ogasawara, http://pastebin.ubuntu.com/647561/
[19:26] bjf what error you get ? internal server error?
[19:27] apw, You do not have sufficient permissions to access this page.
[19:27] bjf odder and odder
[19:27] bjf: cool, posted
[19:28] ogasawara, thanks
[19:28] apw: weren't you getting some error as well?
[19:28] bjf, nice ... i am still getting internal server error whenever i try
[19:28] we'll run out of people who can soon
[19:29] apw, jjohansen is the community guy now, maybe only he can :-)
[19:29] bjf he will be so pleased to hear that
=== bjf changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Natty Kernel Version: 2.6.38 || Ubuntu Kernel Team Meeting - July-26 - 17:00 UTC || If you have a question just ask, and do wait around for an answer!
[19:43] oh yeah so very pleased
[19:47] apw, kees: just seen http://bit.ly/nRONfB which suggests the problem I'm seeing cannot be entirely caused by the /proc/self/fd/ behaviour. I'll do some more debug tomorrow and keep you posted...
[19:47] jhunt_: fwiw, even maverick shows this as a perm failure
[19:50] kees: ok, thx.
[20:01] manjo: I'm sorry I snapped at you yesterday.
[20:01] sconklin, hey no worries man
[20:02] sconklin, reason I asked you the Q was ... the patch is in 3.1... I was not sure if there was a scheme to get patches down before it hit stable (which might take a few weeks)
[20:03] sconklin, I will push thro sru process once it gets in the 3.1 tree, its got acks from most maintainers now
[20:03] no, there's no scheme, and in general it's always better to just go ahead and get it out onto the mailing list instead of discussing it on IRC then having to rehash it in email
[20:04] right .. will do
[20:06] and I owe you a beer.
Now, who wants to be snapped at next ;-)
[20:06] sconklin, :) no worries.. you don't 'owe' me...
[20:44] I was just doing some maintenance, and therefore performing basic package updates on my 10.04 box. The explanation of why the kernel is being updated is "Bump ABI". Can anyone explain why I want to take my machine down as a result of this message? Does it mean something?
[20:45] hacksaw, that sounds like a bug in the changelog generator for update-manager ... whats the version you are being offered
[20:46] 2.6.32.33.39
=== yofel_ is now known as yofel
[21:20] tgardner, when do you want to get into the office?
[21:20] bjf, I should be by 0800
[21:20] be here*
[21:21] tgardner, sconklin is trying to setup a mumble meeting, you see that?
[21:21] bjf, yeah, but 0700 seems a bit early
[21:22] bjf, speaking of mumble, I'd better make sure it works on my laptop. I've been having some problems.
[21:43] Okay, apw, I'm going to assume someone will look at the potential changelog bug, and make corrections in time.
[21:51] ogasawara, we are going to try to get into the office by 8:00 a.m., that work for you?
[21:55] bjf: I'll probably come in a little later then since Kai gets up around 8am. be in likely by 9am.
[21:55] ogasawara, sounds good
=== kentb_ is now known as kentb-out