=== bladernr_afk is now known as bladernr_
=== bladernr_ is now known as bladernr_afk
[08:21] jsalisbury: it happens on hw as well, patrick tried yesterday
=== smb` is now known as smb
[09:14] bug 897506
[09:14] Launchpad bug 897506 in linux "USB3.0 host can't detect USB3.0 device" [High,In progress] https://launchpad.net/bugs/897506
[09:16] a simple question, the fix for the above bug has been merged into the 3.0 stable kernel and is in 3.0.11 already. This oneiric SRU cycle goes to the 3.0.10 kernel. Shall I post the fix to the u-k list or just wait for the next oneiric SRU cycle?
[10:16] BAH no ie
[10:16] ike
[10:32] apw, ?
[10:33] at 9:16 ike asked something, and now is not here
[10:34] ah. could not put the pieces together
[10:34] Wondered why you want internet explorer... :-P
[10:35] never!
[11:16] anyone know what this means: "Uhhuh. NMI received for unknown reason 21 on CPU 0."
[11:58] brendand, at a guess, the NMI reason is being read from port 0x61, and a reason code 0x21 does not indicate a channel or parity check error, which I would expect from that kind of error
[12:00] I'd expect a reason code >= 64 to indicate some kind of H/W detected error, see: http://wiki.osdev.org/Non_Maskable_Interrupt
[13:22] i'm not seeing the new oneiric-proposed kernel image (14.23) on a.u.c, though some other packages seem to be there..
[13:23] like linux-doc_3.0.0-14.23, but not linux-image-3.0.0-14-*
[13:25] herton: ^
[13:32] tjaalton, hmm, some of them went into universe
[13:32] (http://archive.ubuntu.com/ubuntu/pool/universe/l/linux/)
[13:33] interesting.. though I still can't apt-get it :)
[13:33] not that I would, it's enough to know where to wget it from
[13:33] tjaalton, do you have universe enabled? well, we have to report to pitti so he can fix it
[13:34] herton: yes
=== bladernr_afk is now known as bladernr_
[14:17] Hey! Has rtl_nic/rtl8168d-1.fw been dropped from Precise? Being asked for it at install time, which seems to be a regression from Oneiric.
[14:18] Daviey, lemme look
[14:19] tgardner: thanks
[14:19] Daviey, it's in the linux-firmware package for Precise.
[14:20] what does "being asked for it" mean? the installer doesn't prompt for firmware files AFAIK
[14:20] tgardner: Hmm, okay - thanks. I'll have to dig into why the installer isn't finding it..
[14:20] tgardner: well it is :)
[14:20] (d-i, not ubiquity)
[14:21] huh. /lib/firmware/rtl_nic should be the ultimate location.
[14:21] It's asking me to insert media containing that file, Yes / No
[14:21] * Daviey looks
[14:22] Daviey, I'm surprised that d-i is cranking up your wireless connection. perhaps this is new behavior ?
[14:24] tgardner: hmm.. dmesg - .. r8169 ... eth0: unable to load firmware patch rtl_nic/rtl8168d-1.fw (-2)
[14:25] interesting it's eth0
[14:25] Daviey, r8169 is not wireless, it's wired.
[14:27] Daviey, looks like we need firmware in the udeb. Are you sure this installed correctly on Oneiric ?
[14:27] tgardner: I can double check.
[14:27] Daviey, from scratch please. I don't think we've changed anything wrt r8169 firmware.
[14:27] the odd thing is, networking is working
[14:27] Hey everyone! I need to report a couple upstream bugs on the kernel, but bugzilla.kernel.org is still down :-/ what's a good place to file bugs in the meanwhile? LKML? or the per-subsystem mailing lists? what are other people doing about this?
[14:27] tgardner: thanks
=== hyperair is now known as Guest43903
[15:07] apw, hey, last we discussed some i915 hangcheck issues. I think this is what we're seeing in natty (i have garbled serial output). Was there a set of patches / bug report I should be looking at that covers this?
[15:08] * herton -> lunch
[15:13] herton - hi
[15:13] arges_, hmmm can't recall off the top of my head
[15:14] apw, ok
[15:15] arges_, might be worth asking bryceh he tends to remember better
[15:26] * ogasawara back in 20
[15:50] brendand, hey, did you get the results about the cert. regression?
[15:51] herton - not yet. it is a very tricky issue.
[15:52] brendand, please explain. you've given us almost no information other than "there may be an issue" and it's holding up a SRU
[15:52] herton - all the extra info i have now is we saw the message "Uhhuh. NMI received for unknown reason 21 on CPU 0." just before the system halted
[15:53] bjf - i totally appreciate the urgency
[15:54] **
[15:54] ** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[15:54] **
[15:55] brendand, that sounds kind of hardwarish. Is it a fault from EDAC ? (RAM single bit errors)
[15:56] tgardner, I thought port $61 would report that in bits 6-7 if that was true
[15:57] cking, by the time the system is halted I don't think we're getting anything out of it
[15:58] yeah, looks like this NMI msg may well be something RAM related, likely hardware issue
[15:58] tgardner - by hardwarish you mean it's totally a failing hardware issue? we can't reproduce this halting with the release kernel
[15:59] ah, just FWIW, it does sound like a H/W issue to me
[16:00] brendand, can you run a full cycle of the RAM test ?
[16:00] tgardner - the one in the BIOS?
[16:00] brendand, no, the one on the CDROM
[16:01] it should also be in the grub boot menu
[16:01] the mem86 test thingy
[16:02] tgardner - at the moment i only have ssh access. there will be a lab engineer in early tomorrow
[16:02] brendand, no KVM access ?
[16:03] tgardner - no, unfortunately
[16:03] tgardner - the lab engineer is in early tomorrow morning
[16:03] (i already said that)
[16:03] brendand, so wtf do you have equipment in Asia that you can't actually do anything with ?
[16:04] brendand, how many systems has this been tested on and it has passed? how many of those systems were server class systems ?
[16:06] bjf - nearly 100. probably about 60 are servers
[16:06] make that 53
[16:07] are servers
[16:07] brendand, please update the tracking bug with what information you _do_ have right now, I think we are going to suggest publishing this and addressing this possible regression in the next cycle
[16:12] tgardner - the thing that's suspicious is that two systems are doing it, both of the same make.
[16:13] bryceh, looking at some i915 hangchecks on an x220 machine. wondering if there are any open bugs or issues I should be looking at. We're seeing this on a natty kernel
[16:15] brendand, well, that's kind of an interesting factoid. what bug # is this, and do you have the machine specifics attached to it ?
=== shadeslayer_ is now known as shadeslayer
[16:23] tgardner - haven't been able to gather the machine details because of the halting
[16:23] brendand, huh? are you saying it won't even boot ?
[16:25] remember i'm working through a lab engineer here, so i'm taking his word for everything
[16:27] brendand, until you can get us some actual facts to work with, there isn't a hell of a lot we can do
[16:38] skaet, can we continue our discussion here? thanks
[16:40] bjf, sure.
[16:41] Daviey, when the server team meeting is finished, can you look at the backscroll on this?
[16:42] brendand, can you start a separate bug off to track what you're seeing as symptoms, and get more of the machine specifics in there?
[16:42] skaet: wilco, have 2 x calls directly following this but will during them
[16:43] Daviey, thanks. :)
=== chrisccoulson_ is now known as chrisccoulson
[16:44] brendand, can you give us any information at all about these servers? make, model, etc.?
[16:45] Good afternoon, I believe I have bumped into a bug in the ata_piix driver on my ICH7-M chipset board, the system is horribly slow and I can't use the harddrive, dmesg is filled with 'failed command: READ DMA EXT'. Did a bit of looking and there was an old bug where UDMA was set to 133 which is above ICH7 spec, but I see that it's being backed right down to 33, which is when the system finally unlocks
[16:45] Wipster: can you give us the bug # that you've filed ?
[16:47] bjf: yet to file, thought I'd pop in here first, will do that now
[16:48] Wipster: thanks
[16:48] bjf - i'm just raising an umbrella bug as skaet asked. it's going to be short on hard facts until our engineer gets in, but at least it will have the make and model(s)
[16:48] brendand: anything will help, thanks
[16:49] thanks brendand
[16:49] brendand: are you able to offer remote access to folks which might be able to help?
[16:49] ah, seen that in scrollback
[16:50] Daviey - yeah, it's sub-optimal unfortunately
[16:50] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/897773
[16:50] Launchpad bug 897773 in linux "[Acer AR320 F1 and Acer AR160 F1] Halting shortly after booting with Lucid (2.6.32-36.79) -proposed kernel" [Undecided,New]
[16:50] two Acers
[16:51] brendand, what was the last kernel version that was certified on these machines ?
[16:54] tgardner - 2.6.32-33.70
[16:54] tgardner - interestingly the server kernel wasn't used then though. just the -generic one
[16:55] brendand, I thought cert verified every kernel that goes into -updates ? what happened with 2.6.32-35.78 ?
[16:55] ##
[16:55] ## Kernel team meeting in 5 minutes
[16:55] ##
[16:57] tgardner: btw, http://pb.daviey.com/KFon/ see the missing firmware error
[16:57] tgardner - that should be the case. sometimes we have to miss systems out for lack of time.
[16:58] skaet, Daviey, thoughts on publishing this lucid kernel and dealing with the possible regression in the next cadence cycle ?
[16:59] brendand, one thing you could have your tech do in his morning is to start working forward from 2.6.32-33.70 on these 2 machines to see where it begins breaking.
[17:00] bjf: #897777
[17:01] Daviey, Any feel for how likely this might be to show up in the field? How prevalent are the Acers in question?
[17:02] skaet: Honestly, i don't know. I don't know of other deployments that focus on Acer servers, but that is just my experience.
[17:05] Daviey, it looks like the firmware load was ultimately successful after installing linux-firmware. There are complaints that a couple of firmware files referenced in module aliases don't exist (which is correct)
[17:06] tgardner: right, so i guess i need to discuss this with cjwatson, regarding ordering of installs?
[17:06] As in, there was a hard warning for me to skip
[17:07] Daviey, correct. it seems the installer shouldn't prompt for firmware until after installing linux-firmware
[17:07] tgardner: thanks
[17:08] oh wow, my wifi just went bonkers
[17:08] Daviey - what's your opinion on this bug then?
[17:09] brendand: as tgardner said, we don't have access to this class of hardware.. I'm really not sure what more we can do?
[17:10] bjf, concern is how likely will this result in an incident report and cause more perturbation. Would like to get a bit more data on this one from the machines in question and from the OEM team as to how prevalent they are. What is impact of waiting a day?
[17:12] skaet: low impact, i just don't want this to drag out. having to wait 24hrs. to get a single question answered is going to be a problem
[17:13] brendand, any way that engineer can join this channel when he comes on shift, and we can get the bug preloaded with questions and experiments for him to run?
[17:15] herton, I am not sure there will be a verification for bug 854050 in time, but it was verified in natty and had not the maintainer angered the gods of x86 maintenance it would be upstream. Can we except that from being reverted in Oneiric without explicit verification?
[17:15] Launchpad bug 854050 in linux "BUG at /build/buildd/linux-2.6.38/mm/swapfile.c:255" [Medium,Confirmed] https://launchpad.net/bugs/854050
[17:15] skaet - he should be online in the late evening US time (taipei is +13??)
[17:16] skaet - 9 or 10pm apparently
[17:16] Eastern
[17:16] ctf
[17:16] brendand, i usually am still around when our taiwan folks start showing up
[17:18] bjf - feel free to add any questions to the bug for him
[17:18] smb, hmm since it was ok in natty and tested, I think it could be marked as well, do you have a direct contact with the reporter or only through the bug? would be good to poke him if he can check oneiric, still there are some days left
[17:18] ogasawara, I'll take a look at why the kt-meeting script has not run since the 15th.
[17:18] jsalisbury: cool, let me know if you need some help
[17:19] ogasawara, ok, thanks. The best way to learn something is to fix it when it's broken ;-)
[17:19] jsalisbury: I should have caught that last week, seemed to have missed it.
[17:19] bjf - i'll direct him to join this channel and read the bug
[17:19] herton, Only contact is through the bug report. I'll put in a note there. Just wanted to speak up before to prevent it from getting suddenly reverted
[17:20] ogasawara, np. we moved some things over to cranberry from people while in Lexington. May be related.
[17:20] smb, ok, this is verification week for oneiric, we have until friday for it to be tested, lets hope we can get some testing, otherwise we may see what to do
[17:21] herton, Ok, I assume you would start reverting next Monday? So we could decide then before doing so?
[17:22] smb, yep, if on friday it's still not tested/no feedback on the bug, then we must decide
[17:22] brendand, i've added a comment to the bug and will monitor it. please make it a priority of your engineer
[17:23] herton, ok
[17:23] bjf - it's absolutely their first priority
[17:24] * smb tries to make a big mental note to remember this on Friday
[17:26] tgardner: I see the non-pae discussion has hit the next tech board agenda
[17:27] tgardner: https://wiki.ubuntu.com/TechnicalBoardAgenda
[17:27] I wonder who submitted it ?
[17:27] tgardner: I suspect we should wait to drop non-pae until after that meeting
[17:28] ogasawara, bug #897786 is milestoned for A2. when is that ?
[17:28] Launchpad bug 897786 in linux "Kernel is dropping non-PAE flavour" [Undecided,In progress] https://launchpad.net/bugs/897786
[17:28] tgardner: Alpha-2 is Feb 2nd. I think the tech board meeting is Dec 12.
[17:29] ok, should be plenty of time
[17:52] smb, ping
[17:52] bug 897795
[17:52] Launchpad bug 897795 in linux "-virtual kernel missing rtl8139 drivers" [High,Confirmed] https://launchpad.net/bugs/897795
[17:52] that is bad news.
[17:53] smoser, in a virt kernel? why does anyone care ?
[17:53] is it one of the emulated drivers ?
[17:54] "This driver is used in default configuration of openstack and of kvm also."
[17:54] kvm with no parameters gives you a 8139 nic.
[17:54] smoser, hmm, ok. should be an easy fix.
[17:55] there were other changes there between the 2. i can post a list really quick.
[17:55] smoser, I assume Oneiric works OK ?
[17:56] tgardner: Oneiric release did.
[17:56] nothing has changed wrt 8139 between the 2
[17:58] is the kernel team meeting now, or one hour ago?
[17:58] tgardner, yes.
[17:58] http://paste.ubuntu.com/753890/
[17:58] kamal, was one 1 hour
[17:58] ago
[17:58] doh!
[17:58] ug. we lost the e1000 driver too
[17:59] and ne2k
[17:59] all of those emulated by kvm.
[17:59] smoser, ah, they all changed locations in the source tree.
[17:59] gah, this happens /every/ release
[17:59] the ethernet drivers were all reorged according to manufacturer
[17:59] ah. and the whitelists lost them.
[18:00] smoser, yep. i'll go through and figure it out.
[18:00] tgardner: is there a test we can add that spots when drivers and firmware get moved upstream?
[18:01] we seem to spot stuff accidentally at the moment.
[18:01] well, the inclusion list for virtual should have failed the build.
[18:01] oh
[18:02] Daviey, smoser: I'll figure it out.
[18:03] if I can get LP to respond
[18:06] tgardner: good luck with that aspect :)
[18:13] tgardner: Is it reasonable to expect this to be fixed on the day of A1 release?
[18:16] rephrase that, Daviey.
[18:16] tgardner: Or rather, possible to squeeze this into A1?
[18:16] is it unreasonable to ask that a new kernel be uploaded in the next few hours?
[18:16] Daviey, if you can talk ogasawara into an upload.
[18:16] ogasawara: Hello
[18:16] Daviey: going to have to clear it with skaet and the release team.
[18:17] the best justification i can find for it is that we found this less than 24 hours after the new kernel hit the archive. and the cloud images are DOA with it.
[18:17] ogasawara: So.. Cloud images are the easiest entry point for our users to try precise at this stage
[18:17] currently they can't
[18:17] ogasawara: I am on the release team, and will pursue that - but i wanted to see if it was viable first.
[18:17] As in, can you get it all built in time etc.
[18:17] Daviey, wel.... to be fair, EC2 is the easiest.
[18:17] and that works.
[18:19] Daviey: arm takes the longest, approx 12hrs, so technically I could get it uploaded and built. but not sure of the additional impacts that has in terms of how long it takes to respin images etc.
[18:20] ogasawara: Can we have it fixed for day of A1 release?
[18:20] cloud image build process would take 4 hours after archive entry.
[18:20] maybe 5
[18:21] Daviey: indeed I could have it queued and upload immediately upon A1's release.
[18:22] ogasawara: thanks
[18:22] * Daviey bumps this to release.
[18:22] #ubuntu-release
[18:43] tgardner, do you remember any specific reason for us not applying 2.6.35.14 stable to maverick? Otherwise I'll take a look at it
[18:44] herton, likely just missed it.
[18:44] 2.6.35 is no longer officially supported by stable, right ?
[18:45] hmm I don't know. the last stable we applied was .13, and there was one more release (2.6.35.14)
[18:46] ogasawara, whats missing for virtual ?
[18:46] apw, upstream reorged the location of a bunch of the net drivers.
[18:47] tgardner, 3.1 or 3.2 ? ... though even 3.2 was in the archive long before the last upload
[18:47] is this just another case of not testing at all until freeze?
[18:48] apw, I don't think the meta package for 3.2 was uploaded until recently was it ?
[18:48] ogasawara, we had 3.2-rc2 up for some time didn't we ? as the real kernel ?
[18:49] i bumped it to -2 on monday yes, but -1 was already in
[18:49] apw: yep, although we'd held off uploading linux-meta for a little longer
[18:49] which is what would have triggered the cloud image issues
[18:49] apw: but that did get uploaded last week
[18:50] so it was in the archive from the 23rd, so they would have had bad images thursday and friday last week
[18:51] so i guess i might let them off, might
[18:51] apw, hard ass :)
[18:51] * apw is a little tired of finding we don't bother testing until its tooo late for us to spin
[18:52] apw, i update daily and i don't think i saw a 3.2 kernel until this weekend
[18:52] apw: I think we've at least settled that we won't upload the fix until after Alpha1
[18:52] apw, I'm build testing a patch to fix the inclusion issues.
[18:52] apw, but i could be off a day
[18:55] bjf, na, you aren't allowed
[18:57] arges_, search against component xserver-xorg-video-intel for bugs tagged 'natty freeze'
[18:57] bryceh, tha
[18:57] bryceh, thanks
[18:57] arges_, in fact I probably should move all those to 'linux' since they're all pretty much kernel drm bugs
[18:57] yay more bugs we can't fix
[18:58] bring em on!
[18:59] bjf, more bugs to spam :)
[19:00] more karma
[19:01] * cking --> calling it a day
[19:03] * tgardner -> lunch
[19:04] :-)
[19:18] bryceh, thanks will take a look
[19:19] still collecting more information at the moment to see if I can correlate with an existing bug first
[19:22] arges_, I've been fussing with these types of bugs for years; if you give me some more details I may be able to make some analysis suggestions
[19:23] bryceh, sure
[19:26] bryceh, <3>[80750.499041] [drm:i915_do_wait_request] *E
[19:26] RROR* i915_do_wait_request returns -11 (awaiting 794226 at 794222, next 794227)
[19:26] <7>[80750.499132] [drm:i915_error_work_func], resetting chip
[19:26] <3>[80750.502624] [drm:init_ring_common] *ERROR* render ring initialization fail
[19:26] seeing this on ubuntu-natty on an x220 lenovo
[19:27] it's completely unresponsive, can't use the mouse, can't ssh, can't even log in via serial console etc
[19:27] the person that collected this didn't grab any of the i915 debugfs stuff, so next time I asked that to be collected as well
[19:27] arges_, wow, ok so not merely a gpu lockup
[19:28] it takes a long time to reproduce and it's intermittent
[19:28] right /sys/kernel/debug/dri/0/i915_error_state is good to collect, although in this case I'm not sure it's going to be relevant
[19:29] so grepping through the code I see that mm.wedged is probably true in i915_do_wait_request
[19:29] to give us the -EAGAIN error
[19:30] arges_, the "*ERROR* render ring initialization fail" message sounds relatively unusual, it might be possible to search for it on bugs.freedesktop.org
[19:30] bryceh, the next set of tests are trying to turn dpms off and see if we hit, and now trying i915_semaphores=1 to see if we still hit errors
[19:31] ok
[19:31] arges_, there are also some drm debugging flags that can be set, although they probably wouldn't produce anything interesting in this case. drm.debug=0x04, etc.
[19:31] bryceh, yea we set drm.debug=0x04 for that log (had to get it via serial console which was a bit painful)
[19:31] arges_, have you ruled out 3d? IIRC we had a slew of gpu issues with 3d paths in natty.
[19:32] bryceh, is there any easy way to turn that off?
[19:33] arges_, generally having the user run the 2d environment and not run 3d apps is sufficient, but you can forcibly disable it in X, one sec
[19:33] ok
[19:33] Section "Extensions"
[19:33] Option "Composite" "Disable"
[19:33] EndSection
[19:33] in xorg.conf. that should be sufficient
[19:34] ok
[19:34] i'll make that suggestion
[19:35] arges_, I also have a package called xdiagnose which has a set of simple workload scripts, which I find helps speed the system to lockup. Useful for situations where the bug isn't reproducing very often.
[19:35] bryceh, awesome, is this in a ppa?
[19:36] arges_, it's in oneiric
[19:36] arges_, part of the default install
[19:36] ok
[19:36] arges_, but you can just pull from lp:xdiagnose and run it on natty
[19:37] indeed I did most of the dev work on natty so it should run without problem
[19:37] bryceh, perfect, I'll give it a shot and see what we can dig up.
[19:37] arges_, cool, good luck let me know how it goes
[19:38] bryceh, thanks
[20:43] I'm trying to find out if there is some 'post-resume' uevent generated by the kernel?
[20:43] asking because i need to reinitialize some devices (in userspace) after suspend, but since there are several ways suspend can be triggered (short of echo foo > /sys/power/state) there's no common way to do that
[21:00] ogasawara, pushed a patch for bug #897795
[21:00] Launchpad bug 897795 in linux "-virtual kernel missing rtl8139 drivers" [High,In progress] https://launchpad.net/bugs/897795
[21:01] tgardner: cool thanks. I'll make sure it's prepped for upload on Thurs.
[21:04] ideas?
[21:10] jmais, unsure if there is anything specific, you might run a udev monitor and see if it sees anything useful
=== ogasawara changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Ubuntu Kernel Team Meeting - Tues Dec 06 - 17:00 UTC || If you have a question just ask, and do wait around for an answer!
[21:59] Hi all, I'm trying my luck at debugging a kernel oops that has been ruining my day for about a week now....
[21:59] i am assuming I will need to build my own kernel with debugging symbols in order to make any headway? tips and guidance greatly appreciated.
[22:00] Is there some documentation I can start chewing on? bug 897883 btw
[22:00] Launchpad bug 897883 in linux "Kernel Oops" [Undecided,New] https://launchpad.net/bugs/897883
[22:15] awsoonn, debugging symbols for all kernels are available at ddebs.ubuntu.com
[22:16] awsoonn, are you using alsa-sink ?
[22:17] awsoonn, if this started about a week ago you may have had a kernel update in that time, it may be worth going back to an older kernel and seeing if that stops the panics
[22:18] awsoonn, you have 3.0.0-13 in your panic string so perhaps 3.0.0-12 which you should still have
[22:19] any comments on bfq vs cfg scheduling?
[22:20] more specifically whether a system would support one and not the other?
=== jmaiss_ is now known as wamty
[22:21] wamty, cfg? cfq ?
[22:25] apw I was assuming it had something to do with the update
[22:26] awsoonn, yep added some info to the bug, but basically we need to go back till you find the first stable kernel
[22:26] apw, is the 'panic string' you're referring to the "... Tainted: P C ..."
[22:26] awsoonn, and report the last good and first bad kernel, so we have an area to search in
[22:27] i am keying there off the 'IP:' lines, of which there is one per panic
[22:27] kk, i'll hop on over... I've got to do some trickery with virtualbox though. I've recently upgraded it and the dkms module is preventing me from booting the old kernel. :(
[22:29] btw, thanks for the help
[22:30] I've always wanted to get my hands into the kernel, this seemed like a good chance. :)
[23:08] I made some kernel updates using update manager and I can't access ubuntu anymore.. grub screen changed to a black screen and when asking for my login and password it does not accept them.. Does anyone know how to solve it? Could you guys help me? Thanks..