/srv/irclogs.ubuntu.com/2014/12/01/#ubuntu-kernel.txt

infinityzequence: Friendly reminder about https://bugs.launchpad.net/ubuntu/+source/linux-lowlatency/+bug/1396193 07:43
ubot5Launchpad bug 1396193 in Kernel SRU Workflow "linux-lowlatency: <version to be filled> -proposed tracker" [Medium,In progress]07:43
zequenceinfinity: Thanks. Had a busy weekend. All waiting to build in the ppa.11:06
NikThapw: Hello, if you have time check this : https://bugs.launchpad.net/linux/+bug/1386695 . Thanks14:56
ubot5Launchpad bug 1386695 in Linux "[3.16.0-23] Resume from suspend/hibernation, GPU lock - possible regression" [Medium,Confirmed]14:56
apwNikTh, wassup?  the end of that implies you are testing a -25 kernel, but the one in proposed with the fixes applied claims to be a -26, did you get a test with -26 at all ?15:24
NikThapw: No. I didn't test with -26. I didn't get the -26 update when I updated the system today. Weird. Proposed are enabled, that's for sure. 15:53
NikThapw: I just get the -26 update, tested it and unfortunately it does not work. 16:10
NikThs/get/got16:11
=== spideyman is now known as Guest68406
=== arges` is now known as arges
apwNikTh, well say that, and they will rip out the fixes for this update16:29
apwNikTh, not really sure what one is supposed to do if a bios update changes things16:29
NikThapw: This is the latest version of BIOS available. I would update it either way, someday. The thing is your first kernel now works. But not the second. In comment #33 I have listed the kernels I tested.16:35
apwNikTh, right which is the opposite of what you reported before the update, which is confusing at best16:35
NikThapw: correct. Although I didn't see any reference to ACPI or similar at release notes of BIOS. Here you can read the latest update (http://www.msi.com/support/mb/880GMAE45.html#down-bios_ 16:37
=== cmagina_ is now known as cmagina
apwNikTh, well the right thing to do is mark it verification-failed, and we'll have to start again16:46
NikThapw: start again ? all over ? with this kernel bisecting thing ? Haha, I don't have the time. I have the time to test any kernel you (or anyone else) produce, but to follow this procedure all over again..hmm, a bit difficult. 16:50
apwNikTh, well i think it is likely the patches we found before will do the trick, but we need this fix removed, as it does not work, then start again on top of the new release16:51
NikThapw:  OK, but your first kernel Works ( I have the links in comment #33). Should I remove the #verification-needed-utopic and replace it with the #verification-failed tag ?16:52
apwNikTh, verification-failed-utopic probably but yes16:53
apwbjf, ^ looks like we have a verification failure on utopic16:53
NikThverification-failed is a ready tag as I can see. But should I remove the other one or leave it as it is ? (both)16:54
bjfapw, bug #?16:55
apw https://bugs.launchpad.net/linux/+bug/1386695 16:56
ubot5Launchpad bug 1386695 in linux (Ubuntu Vivid) "[3.16.0-23] Resume from suspend/hibernation, GPU lock - possible regression" [Medium,Triaged]16:56
apwseems a bios update has changed the bug to not need the fixes found by the preceding testing16:56
apwbut to need different ones16:56
NikThI'll leave @penalver to handle the tags because it seems we edit them at the same time.. a bit confusing. :-)16:58
NikThapw: right. But I still believe for such bugs the latest BIOS update should be tested. Not an "outdated" one. 17:00
apwNikTh, oh indeed, just that it wasn't and has had an effect on update, makes all the work we did before, wrong and moot17:01
NikThapw : Indeed  17:03
apwand indeed will create a crap-load of work for stable to remove those broken fixes, and then we have to start again once we have a new base, sigh17:03
NikThapw: I thought it was easier to merge the patches from the first kernel, the one that now works (with the new BIOS) rather than start all over again. 17:06
NikThThis one works : http://people.canonical.com/~apw/lp1386695-utopic/ 17:06
apwyes, it likely is, but those are on the wrong base so i have to wait for stable to revert and respin their tree, and for that to make it out, then i can start retesting those again on top of that base, as that is where it will need to be17:07
NikThAh, so you will probably release the fix when ? With the happy new year ? :P 17:08
NikThapw: I have to go. Sorry for the extra work, but this time we (you) will fix it once and for all. There is other BIOS update available :-) 17:13
bjfapw, i'm looking at those two commits. do you feel like we should revert them. i'm leaning that way17:15
apwbjf, as they don't clearly fix the bug, i don't see how we can not revert them17:16
apwmuch as i hate to do it17:16
bjfapw, ack, i'm dealing with it. henrix, looks like i'm respinning utopic and lts-utopic17:29
henrixbjf: fun!  /me goes read backlog17:30
=== broder__ is now known as broder
=== slangase` is now known as slangasek
=== Guest72231 is now known as ypwong
=== Trevinho_ is now known as Trevinho
=== psivaa_ is now known as psivaa
binBASHtinoco arges present?21:48
argeshi21:48
binBASHhi21:48
binBASHI'm trying to help with https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1318551 21:48
ubot5Launchpad bug 1318551 in linux (Ubuntu) "Kernel Panic - not syncing: An NMI occurred, please see the Integrated Management Log for details." [High,Confirmed]21:48
binBASHSo I've read now at https://wiki.ubuntu.com/Kernel/CrashdumpRecipe how to produce dump21:49
argesyea I've just started to look at the bug, tinoco has done the most research at this point21:49
binBASHcat /sys/kernel/kexec_crash_loaded 21:50
binBASH 1 21:50
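
The check binBASH runs above reads the kernel's flag for a loaded crash kernel. A minimal shell sketch of the same check as a reusable function follows; the name `kdump_ready` and its optional path argument are illustrative and not part of the kdump tooling.

```shell
#!/bin/sh
# Report whether a crash (kdump) kernel is currently loaded.
# $1 optionally overrides the flag file; the default is the real sysfs path.
kdump_ready() {
    flag="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]; then
        echo "crash kernel loaded"
    else
        echo "crash kernel NOT loaded"
        return 1
    fi
}

# Usage: kdump_ready
# A common way to confirm that a dump is really written is to force a panic
# on a test machine (this crashes the system on purpose):
#   echo c | sudo tee /proc/sysrq-trigger
```

A value of 1 only says the crash kernel is loaded; it does not prove that a dump will be captured for a crash that happens very early in boot.
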
binBASHthe question now would be how to reproduce that bug :)21:50
tinocobinBASH: keeping machine idle21:50
binBASHas it occurs for me mostly at boot21:50
tinoco:(21:50
tinocobinBASH: if you are using intel_idle (and you are, by default probably)21:50
tinocoand HP proliant servers21:51
tinocohaving CPU idle will trigger the problem21:51
tinocobinBASH: if you are trying to reproduce this21:52
binBASHdoes governor matter?21:52
tinocobinBASH: make sure to test kdump tool before21:52
binBASHBecause I'm using powersave currently21:52
tinocobinBASH: keeping CPU idle will trigger C-states to go lower and lower21:52
tinocoP-states will go lower on C1E state.. 21:52
tinoco(frequency governor)21:52
tinocoC-states will shutdown parts of the core21:53
tinocoand that is where the problem is 21:53
binBASHOk, so I should probably test this without running X :)21:53
binBASHtinoco: thing is I saw it also on running servers in my datacenter. But mostly directly during bootup :)21:54
tinocobinBASH: we might be facing 2 separate things 21:54
tinocoon the same case.. i´ll split them if thats the case21:54
tinoconow im focused on proliant 360/380 panics due to NMI being triggered 21:55
binBASHProbably, but the question for me is, how to get crashdump when it crash at boot. Dunno if this is possible :)21:55
tinoco(original case description)21:55
tinocobinBASH: can´t u even boot ? 21:55
binBASHyup21:55
argesbinBASH: it depends on when at 'boot' it crashes. I think if you can use serial consoles and add 'debug' to the grub options it may give us more information21:56
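
arges' suggestion can be sketched as a fragment of `/etc/default/grub`. The serial port (`ttyS0`) and speed (`115200n8`) are assumptions and must match the actual hardware; `debug` raises the kernel's console log verbosity.

```shell
# /etc/default/grub (fragment); run `sudo update-grub` afterwards.
# console= is given twice so kernel messages go to both the local screen
# and the serial line. ttyS0 at 115200 baud is an assumed port and speed.
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 debug"
```

With a second machine capturing the serial line, the last lines printed before an early-boot panic survive even when no filesystem is mounted yet.
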
tinocobinBASH: curious though.. this bug is intermittent and usually does not happen @ boot21:57
tinocosince all cpus are going to be powered on and on C0 or C1E state21:57
tinocobinBASH: you can use the workaround i provided . to be started as a init script21:58
tinoco(keeping all cpus under C1E max state)21:58
tinocoand then.. turn the workaround off21:58
tinocoand wait for the dump21:58
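
The workaround tinoco refers to is not shown in this log. As a rough sketch of the same idea, assuming the standard cpuidle sysfs layout (`cpuN/cpuidle/stateM/disable`), the function below disables or re-enables every idle state deeper than a given index. The function name and arguments are hypothetical, and which index corresponds to C1E differs per machine (check each `stateM/name`).

```shell
#!/bin/sh
# set_deep_cstates VALUE MAX_INDEX [BASE]
#   VALUE     1 disables the deep states, 0 re-enables them
#   MAX_INDEX deepest state index that is left untouched
#   BASE      sysfs CPU directory (overridable for dry runs)
set_deep_cstates() {
    value="$1"
    max_index="$2"
    base="${3:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        [ -e "$f" ] || continue          # glob matched nothing
        state_dir=${f%/disable}
        idx=${state_dir##*/state}        # numeric suffix of stateM
        if [ "$idx" -gt "$max_index" ]; then
            echo "$value" > "$f"
        fi
    done
}

# Usage (as root), assuming index 2 is C1E on this machine:
#   set_deep_cstates 1 2    # hold CPUs at C1E or shallower
#   set_deep_cstates 0 2    # turn the workaround off and wait for the dump
```
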
binBASHOk, I will look into this the next days. If I can manage to get a trace/dump :)22:03
binBASHThe funny thing is tinoco, if it occurs after boot, probably after some days. There is message output to screen every few seconds22:04
binBASHwith some stacktrace :)22:04
binBASHThe boot crashes are more like russian roulette. But with only one bullet taken out :)22:06
tinocobinBASH: thats what NMI are all about22:06
tinocoand CPU general faults22:06
tinocoif you have a double/triple fault22:06
tinocoyou have a panic22:06
tinocobinBASH: do you have a stack trace example22:06
tinocoto show me ?22:06
tinocobinBASH: id like to make sure you are not suffering from x2apic bug also22:06
tinocosimilar stack trace.. 22:07
binBASHNot yet, because I don't know how to get it :)22:07
binBASHI could take camera and make pic :D22:07
tinocoif you get the beginning of stack trace22:07
tinocoworks for me :)22:08
tinocoif the stack trace is huge.. you are getting the first lines.. not the latest frames (the one i need)22:08
binBASHCould make movie as well :)22:09
tinocobinBASH: lol22:10
binBASHIf you have better idea...22:10
tinocobinBASH: are u having this on a proliant ?22:11
tinocoOR on a gigabyte based motherboard ?22:11
tinococause i really think your case is different22:11
tinoco(we talked before, right ?)22:11
binBASHI have it on multiple boards22:11
binBASHnot only Gigabyte22:11
binBASHWait, I look what is the other22:11
tinocobinBASH: ALL your machines are worked around by the use of ¨noautogroup¨ ?22:12
tinocobinBASH: can i ask you to open a different bug (if not proliant servers)22:13
tinocoand tell me the bug #22:13
tinoco?22:13
binBASHIntel S2600CP22:13
tinocobinBASH: i really think we are dealing with different cases on this bug22:13
binBASHis the other one22:13
tinocobinBASH: this way you can open a new bug and attach the core file on it22:14
tinocofor me to investigate in parallel22:14
argesbinBASH: you can use from the Ubuntu machine 'ubuntu-bug linux' to gather relevant information22:14
binBASHtinoco: the problem is, I don't know how to get the core file :D22:15
tinocobinBASH: just a sec22:16
tinocoill paste it to you22:16
tinocojust made an example22:16
tinocobinBASH: http://paste.ubuntu.com/9336310/ 22:17
tinocoan example for precise.. 22:17
tinocobut its pretty similar if not the same for trusty22:17
tinocobinBASH: could you open a new bug22:18
tinocokeep a good description on whats happening22:18
tinocoand attach the /var/crash/* after the dump was created ?22:18
binBASHsure22:18
tinocothen i´ll assign myself to it 22:18
tinocojust so i can keep cases separate 22:18
binBASHthe problem will be probably, that it occurs before file systems are mounted :)22:19
binBASHbut we'll see...22:19
binBASHbecause tinoco I have /var/crash/* stuff22:19
binBASHbut not from those boot crashes22:19
tinocobinBASH: checking if noautogroup can be changed online somehow22:20
tinocoinaddy@workstation:~$ sudo sysctl -a | grep autogrou22:21
tinocokernel.sched_autogroup_enabled = 1 22:21
tinocoyou can boot with ¨noautogroup¨22:21
tinocoand enable it online22:21
tinocoso cores are generated (for your cause)22:21
tinocoit might work22:22
tinocothis commit: https://lkml.org/lkml/2011/2/20/10 enabled it to be a runtime flag22:22
binBASHok, I've enabled it now22:22
binBASHthough no crash yet :)22:22
tinocowhile true; do ps -ef; sleep 1; done22:22
tinoco:o) to create some work22:23
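
The runtime switch tinoco describes is exposed as `kernel.sched_autogroup_enabled`, which maps to a file under `/proc/sys`. A minimal sketch of reading and flipping it follows; the helper name `autogroup` and the `AUTOGROUP_FILE` override are illustrative only, and the file exists only on kernels built with autogroup support.

```shell
#!/bin/sh
# Show or change the scheduler autogroup flag at runtime.
# AUTOGROUP_FILE can be overridden to try the function without root.
autogroup() {
    file="${AUTOGROUP_FILE:-/proc/sys/kernel/sched_autogroup_enabled}"
    case "${1:-show}" in
        show) cat "$file" ;;
        on)   echo 1 > "$file" ;;
        off)  echo 0 > "$file" ;;
        *)    echo "usage: autogroup [show|on|off]" >&2; return 2 ;;
    esac
}

# Usage (as root), after booting with noautogroup:
#   autogroup show    # expected to print 0 in that case
#   autogroup on      # same effect as: sysctl -w kernel.sched_autogroup_enabled=1
```
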
binBASHlet's hope it will trigger anything22:24
tinocobinBASH: i´m leaving now. please let me know when you open the new bug22:38
tinocoand when/if you could provide core dumps.. 22:38
tinocoin your case i think core dumps are going to be needed (not only pictures :()22:38
tinocotks ;)22:38
binBASH:-)22:38
binBASHI really would like to, but I think no way when it crashes during boot and no fs initialized :)22:39
tinocolets see if enabling this runtime works22:39
tinocowe might trigger some ¨logic¨ after sometime22:40
tinocoand for some reason it is being triggered @ the boot22:40
binBASHYeah, I've atted it now to sysctl.conf so it will be enabled by default here22:40
tinocolets see..22:40
binBASHadded22:40
tinocogood22:40
tinocoi´ll be back tomorrow 22:40
tinocolet me know when you´ve opened the other bug22:40
tinocoso i can assign myself to it22:40
tinoco;)22:40
binBASHYup, I will22:40
tinococya guys .. bb tomorrow22:40
binBASHtinoco: I mean, I'm fine with that noautogroup22:40
binBASHif it has no negative result.22:40
tinocobinBASH: yep.. i would like to investigate this22:41
tinocoif you can help us on reproducing/opening the bug22:41
tinocoit would be awesome22:41
tinocosince others can be facing this also22:41
tinocoand we don´t know the depth of this22:41
tinoco;)22:41
binBASHand the question is, why it appeared suddenly at 3.8 kernel :D22:42
binBASHanyways, sleep well tinoco and thx for additional hints22:42
tinocobinBASH: yep.. this is even better in case i have to bisect something for you22:42
tinocoif you have a ¨good¨ vs ¨no good¨22:42
tinocoi can provide bisection for you to test22:42
tinocoand give me feedback22:42
tinoco(maybe 10 kernels to be tested ? :o)22:43
binBASHwould be np22:43
tinocogreat, lets do this then.. if you can´t get a core22:43
tinocofill the bug and let me know22:43
tinocoi can start a bisection for you22:43
tinocoand you test the kernels i generate22:43
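
The estimate of about ten test kernels follows from bisection halving the candidate range on each round. A small sketch of that arithmetic is below; `bisect_steps` is an illustrative helper, and a real `git bisect` run can differ by a step or so depending on the shape of the history.

```shell
#!/bin/sh
# Rough number of kernels to build and test when bisecting N candidate
# commits: each round keeps about half of the remaining range.
bisect_steps() {
    n="$1"
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Usage: bisect_steps 1000   # prints 10
```

So a range of roughly a thousand commits between a known good and a known bad kernel needs about ten test builds.
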
binBASHOk, will do it tomorrow evening22:43
tinocogreat ;) 22:43
tinocotks binBASH, talk to you tomorrow then22:43
binBASHsleep well, bye22:43
