[07:43] zequence: Friendly reminder about https://bugs.launchpad.net/ubuntu/+source/linux-lowlatency/+bug/1396193
[07:43] Launchpad bug 1396193 in Kernel SRU Workflow "linux-lowlatency: -proposed tracker" [Medium,In progress]
[11:06] infinity: Thanks. Had a busy weekend. All waiting to build in the ppa.
[14:56] apw: Hello, if you have time check this : https://bugs.launchpad.net/linux/+bug/1386695 . Thanks
[14:56] Launchpad bug 1386695 in Linux "[3.16.0-23] Resume from suspend/hibernation, GPU lock - possible regression" [Medium,Confirmed]
[15:24] NikTh, wassup? the end of that implies you are testing a -25 kernel, but the one in proposed with the fixes applied claims to be a -26, did you get a test with -26 at all ?
[15:53] apw: No. I didn't test with -26. I didn't get the -26 update when I updated the system today. Weird. Proposed are enabled, that's for sure.
[16:10] apw: I just get the -26 update, tested it and unfortunately it does not work.
[16:11] s/get/got
=== spideyman is now known as Guest68406
=== arges` is now known as arges
[16:29] NikTh, well say that, and they will rip out the fixes for this update
[16:29] NikTh, not really sure what one is supposed to do if a bios update changes things
[16:35] apw: This is the latest version of BIOS available. I would update it either way, someday. The thing is your first kernel now works. But not the second. In comment #33 I have listed the kernels I tested.
[16:35] NikTh, right which is the opposite of what you reported before the update, which is confusing at best
[16:37] apw: correct. Although I didn't see any reference to ACPI or similar at release notes of BIOS. Here you can read the latest update (http://www.msi.com/support/mb/880GMAE45.html#down-bios_
=== cmagina_ is now known as cmagina
[16:46] NikTh, well teh right thing to do is mark it verification-failed, and we'll have to start again
[16:50] apw: start again ? all over ? with this kernel bisecting thing ? Haha, I don't have the time. I have the time to test any kernel you (or anyone else) produce, but to follow this procedure all over again..hmm, a bit difficult.
[16:51] NikTh, well i think it is likely the patches we found before will do the trick, but we need this fix removed, as it does not work, then start again on top of the new release
[16:52] apw: OK, but your first kernel Works ( I have the links in comment #33). Should I remove the #verification-needed-utopic and replace it with the #verification-failed tag ?
[16:53] NikTh, verification-failed-utopic probabally but yes
[16:53] bjf, ^ looks like we have a verification failure on utopic
[16:54] verification-failed is a ready tag as I can see. But should I remove the other one or leave it as it is ? (both)
[16:55] apw, bug #?
[16:56] https://bugs.launchpad.net/linux/+bug/1386695
[16:56] Launchpad bug 1386695 in linux (Ubuntu Vivid) "[3.16.0-23] Resume from suspend/hibernation, GPU lock - possible regression" [Medium,Triaged]
[16:56] seems a bios update has changed the bug to not need the fixes found by the preceeding testing
[16:56] but to need different ones
[16:58] I'll leave @penalver to handle the tags because it seems we edit them at the same time.. a bit confusing. :-)
[17:00] apw: right. But I still believe for such bugs the latest BIOS update should be tested. Not an "outdated" one.
[17:01] NikTh, oh indeed, just that it wasn't and has had an effect on update, makes all the work we did before, wrong and moot
[17:03] apw : Indeed
[17:03] and indeed will create a crap-load of work for stable to remove those broken fixes, and then we have to start again once we have a new base, sigh
[17:06] apw: I thought it was easier to merge the patches from the first kernel, the one that now works (with the new BIOS) rather to start all over again.
[17:06] This one works : http://people.canonical.com/~apw/lp1386695-utopic/
[17:07] yes, it likely is, but those are on the wrong base so i have to wait for stable to revert and respin their tree, and for that to make it out, then i can start retesting those again on top of that base, as that is where it will need to be
[17:08] Ah, so you will probably release the fix when ? With the happy new year ? :P
[17:13] apw: I have to go. Sorry for the extra work, but this time we (you) will fix it once and for all. There is other BIOS update available :-)
[17:15] apw, i'm looking at those two commits. do you feel like we should revert them. i'm leaning that way
[17:16] bjf, as they don't clearly fix the bug, i don't see how we can not revert them
[17:16] much as i hate to do it
[17:29] apw, ack, i'm dealing with it. henrix, looks like i'm respinning utopic and lts-utopic
[17:30] bjf: fun! /me goes read backlog
=== broder__ is now known as broder
=== slangase` is now known as slangasek
=== Guest72231 is now known as ypwong
=== Trevinho_ is now known as Trevinho
=== psivaa_ is now known as psivaa
[21:48] tinoco arges present?
[21:48] hi
[21:48] hi
[21:48] I'm trying to help with https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1318551
[21:48] Launchpad bug 1318551 in linux (Ubuntu) "Kernel Panic - not syncing: An NMI occurred, please see the Integrated Management Log for details." [High,Confirmed]
[21:49] So I've read now at https://wiki.ubuntu.com/Kernel/CrashdumpRecipe how to produce dump
[21:49] yea I've just started to look at the bug, tinoco has done the most research at this point
[21:50] cat /sys/kernel/kexec_crash_loaded
[21:50] 1
[21:50] the question now would be how to reproduce that bug :)
[21:50] binBASH: keeping machine idle
[21:50] as it occurs for me mostly at boot
[21:50] :(
[21:50] binBASH: if you are using intel_idle (and you are, by default probably)
[21:51] and HP proliant servers
[21:51] having CPU idle will trigger the problem
[21:52] binBASH: if you are trying to reproduce this
[21:52] does governor matter?
[21:52] binBASH: make sure to test kdump tool before
[21:52] Because I'm using powersave currently
[21:52] binBASH: keeping CPU idle will trigger C-states to go lower and lower
[21:52] P-states will go lower on C1E state..
[21:52] (frequency governor)
[21:53] C-states will shutdown parts of the core
[21:53] and that is where the problem is
[21:53] Ok, so I should probably test this without running X :)
[21:54] tinoco: thing is I saw at also at running servers in my datacenter. But mostly directly during bootup :)
[21:54] binBASH: we might be facing 2 separate things
[21:54] on the same case.. i´ll split them if thats the case
[21:55] now im focused on proliant 360/380 panics due to NMI being triggered
[21:55] Probably, but the question for me is, how to get crashdump when it crash at boot. Dunno if this is possible :)
[21:55] (original case description)
[21:55] binBASH: can´t u even boot ?
[21:55] yup
[21:56] binBASH: it depends on when at 'boot' it crashes. I think if you can use serial consoles and add 'debug' to the grub options it may give us more information
[21:57] binBASH: curious though.. this bug is intermittent and usually does not happen @ boot
[21:57] since all cpus are going to be powered on and on C0 or C1E state
[21:58] binBASH: you can use the workaround i provided . to be started as a init script
[21:58] (keeping all cpus under C1E max state)
[21:58] and then.. turn the workaround off
[21:58] and wait for the dump
[22:03] Ok, I will look into this the next days. If I can manage to get a trace/dump :)
[22:04] The funny thing is tinoco, if it occurs after boot, probably after some days. There is message output to screen every few seconds
[22:04] with some stacktrace :)
[22:06] The boot crashs are more like russian roulette. But with only one bullet taken out :)
[22:06] binBASH: thats what NMI are all about
[22:06] and CPU general faults
[22:06] if you have a double/triple fault
[22:06] you have a panic
[22:06] binBASH: do you have a stack trace example
[22:06] to show me ?
[22:06] binBASH: id like to make sure you are not suffering from x2apic bug also
[22:07] similar stack trace..
[22:07] Not yet, because I don't know how to get it :)
[22:07] I could take camera and make pic :D
[22:07] if you get the beginning of stack trace
[22:08] works for me :)
[22:08] if the stack trace is huge.. you are getting the first lines.. not the latest frames (the one i need)
[22:09] Could make movie as well :)
[22:10] binBASH: lol
[22:10] If you have better idea...
[22:11] binBASH: are u having this on a proliant ?
[22:11] OR on a gigabyte based motherboard ?
[22:11] cause i really think you case is different
[22:11] (we talked before, right ?)
[22:11] I have it on multiple boards
[22:11] not only Gigabyte
[22:11] Wait, I look what is the other
[22:12] binBASH: ALL your machines are workaround by the use of ¨noautogroup¨ ?
[22:13] binBASH: can i ask you to open a different bug (if not proliant servers)
[22:13] and tell me the bug #
[22:13] ?
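Editor's note: the kdump readiness check and the C-state discussion earlier in this log can be sketched as a small script. This is a minimal sketch, not the workaround tinoco refers to (which is not shown in the log). The function names `crash_kernel_status` and `list_idle_states` are hypothetical, and the optional path arguments exist only so the functions can be pointed at test files. It assumes the standard sysfs locations `/sys/kernel/kexec_crash_loaded` (quoted in the log) and `/sys/devices/system/cpu/cpu0/cpuidle` (an assumption, not mentioned in the log).

```shell
#!/bin/sh
# Report whether a crash kernel is loaded, reading the sysfs file quoted
# in the log. The optional argument overrides the path (for testing only).
crash_kernel_status() {
    file="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ ! -r "$file" ]; then
        echo "unknown"
    elif [ "$(cat "$file")" = "1" ]; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# List the idle states (C-states) the kernel exposes for one CPU.
# Prints nothing if the cpuidle directory is absent.
list_idle_states() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        [ -r "$state/name" ] || continue
        printf '%s: %s\n' "${state##*/}" "$(cat "$state/name")"
    done
}

echo "crash kernel: $(crash_kernel_status)"
list_idle_states
```

A result of "loaded" corresponds to the `1` shown in the log; the state names printed depend on the idle driver in use.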
[22:13] Intel S2600CP
[22:13] binBASH: i really think we are dealing with different cases on this bug
[22:13] is the other one
[22:14] binBASH: this way you can open a new bug and attach the core file on it
[22:14] for me to investigate in parallel
[22:14] binBASH: you can use from the Ubuntu machine 'ubuntu-bug linux' to gather relevant information
[22:15] tinoco: the problem is, I don't know how to get the core file :D
[22:16] binBASH: just a sec
[22:16] ill paste it to you
[22:16] just made an example
[22:17] binBASH: http://paste.ubuntu.com/9336310/
[22:17] an example for precise..
[22:17] but its pretty similar if not the same for trusty
[22:18] binBASH: could you open a new bug
[22:18] keep a good description on whats happening
[22:18] and attach the /var/crash/* after the dump was created ?
[22:18] sure
[22:18] then i´ll assign myself to it
[22:18] just so i can keep cases separate
[22:19] the problem will be probably, that it occurs before file systems are mounted :)
[22:19] but we'll see...
[22:19] because tinoco I have /var/crash/* stuff
[22:19] but not from that boot crashs
[22:20] binBASH: checking if noautogroup can be changed online somehow
[22:21] inaddy@workstation:~$ sudo sysctl -a | grep autogrou
[22:21] kernel.sched_autogroup_enabled = 1
[22:21] you can boot with ¨noautogroup¨
[22:21] and enable it online
[22:21] so cores are generated (for your cause)
[22:22] it might work
[22:22] this commit: https://lkml.org/lkml/2011/2/20/10 enabled it to be a runtime flag
[22:22] ok, I've enabled it now
[22:22] though no crash yet :)
[22:22] while true; do ps -ef; sleep 1; done
[22:23] :o) to create some work
[22:24] let's hope it will trigger anything
[22:38] binBASH: i´m leaving now. please let me know when you open the new bug
[22:38] and when/if you could provide core dumps..
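Editor's note: the runtime check discussed above (`sysctl -a | grep autogrou`) can also be done by reading the procfs file behind `kernel.sched_autogroup_enabled`. The following is a rough sketch under that assumption; the function name `autogroup_state` is hypothetical, and the optional path argument is there only for testing.

```shell
#!/bin/sh
# Report the scheduler autogroup setting by reading the procfs file that
# backs the kernel.sched_autogroup_enabled sysctl shown in the log.
# The optional argument overrides the path (for testing only).
autogroup_state() {
    file="${1:-/proc/sys/kernel/sched_autogroup_enabled}"
    if [ ! -r "$file" ]; then
        echo "unavailable"
        return 0
    fi
    case "$(cat "$file")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

echo "autogroup: $(autogroup_state)"

# Flipping the value at runtime needs root, for example:
#   sudo sysctl -w kernel.sched_autogroup_enabled=1
```

On a machine booted with `noautogroup`, the expectation from the discussion is that this reports "disabled" until the sysctl is set to 1.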
[22:38] in your case i think core dumps are going to be needed (not only pictures :()
[22:38] tks ;)
[22:38] :-)
[22:39] I really would like to, but I think no way when it crashs during boot and no fs initialized :)
[22:39] lets see if enabling this runtime works
[22:40] we might trigger some ¨logic¨ after sometime
[22:40] and for some reason it is being triggered @ the boot
[22:40] Yeah, I've atted it now to sysctl.conf so it will be enabled by default here
[22:40] lets see..
[22:40] added
[22:40] good
[22:40] i´ll be back tomorrow
[22:40] let me know when you´ve opened the other bug
[22:40] so i can assign myself to it
[22:40] ;)
[22:40] Yup, I will
[22:40] cya guys .. bb tomorrow
[22:40] tinoco: I mean, I'm fine with that noautogroup
[22:40] if it has no negative result.
[22:41] binBASH: yep.. i would like to investigate this
[22:41] if you can help us on reproducing/opening the bug
[22:41] it would be awesome
[22:41] since others can be facing this also
[22:41] and we don´t know the deep of this
[22:41] ;)
[22:42] and the question is, why it appeared suddenly at 3.8 kernel :D
[22:42] anyways, sleep well tinoco and thx for additional hints
[22:42] binBASH: yep.. this is even better in case i have to bisect something for you
[22:42] if you have a ¨good¨ vs ¨no good¨
[22:42] i can provide bisection for you to test
[22:42] and give me feedback
[22:43] (maybe 10 kernels to be tested ? :o)
[22:43] would be np
[22:43] great, lets do this then.. if you can´t get a core
[22:43] fill the bug and let me know
[22:43] i can start a bisection for you
[22:43] and you test the kernels i generate
[22:43] Ok, will do it tomorrow evening
[22:43] great ;)
[22:43] tks binBASH, talk to you tomorrow then
[22:43] sleep well, bye
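Editor's note: binBASH mentions adding the setting to sysctl.conf so it is applied at boot. One way that step might look is sketched below. The helper name `persist_sysctl` is hypothetical, and the demo writes to a temporary file rather than `/etc/sysctl.conf` so it can be run without root. The exact line binBASH added is not shown in the log, so the key and value here are assumptions taken from the `sysctl` output quoted earlier.

```shell
#!/bin/sh
# Append "key = value" to a sysctl configuration file unless a line for
# that key already exists. Note that the key is used as a regular
# expression, so its dots match any character; that is acceptable for a
# sketch but not strict.
persist_sysctl() {
    key="$1"
    value="$2"
    conf="${3:-/etc/sysctl.conf}"
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf" 2>/dev/null; then
        echo "already set in $conf"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
        echo "added to $conf"
    fi
}

# Demo against a scratch file, so nothing on the system is modified.
demo_conf=$(mktemp)
persist_sysctl kernel.sched_autogroup_enabled 1 "$demo_conf"
persist_sysctl kernel.sched_autogroup_enabled 1 "$demo_conf"
cat "$demo_conf"
rm -f "$demo_conf"
```

Against the real file this would need root, and `sudo sysctl -p` reloads `/etc/sysctl.conf` without a reboot.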