[00:02] dyek: no immediate ideas, it should work, I made a note to check on it
[00:03] jjohansen: OK. Thank you!
[04:39] hello, is it possible to use the ubuntu kernel build system to build kernel packages and then not start from scratch if I modify a file on when building for the 2nd time?
[04:41] the make-kpkg clean purges all the old work
[04:54] EruditeHermit: if you are using make-kpkg, then you aren't using the ubuntu kernel build system
[04:54] BenC, I am using the instructions on this page. https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[04:55] BenC, could you please advise me as to the best way to build kernels then?
[05:02] a link to instructions would be enough
[05:41] hello, could anyone point me to the current method for building kernel packages on Ubuntu?
[05:53] EruditeHermit: check the build section at http://wiki.ubuntu.com/KernelTeam/KernelMaintenance
[05:53] EruditeHermit: but, in general: fakeroot debian/rules binary-arch
[05:57] but it depends what you're trying to do: build using the ubuntu sources, or build an upstream (or other) kernel tree into a .deb ?
[05:57] jk-, I am trying to build an upstream kernel from git
[05:57] and make a deb package out of it
[05:58] ok, then make-kpkg is probably the best way then.
[05:58] (as you're doing..)
[05:58] however
[05:58] when I patch some of the files
[05:58] I don't want to rebuild the whole kernel
[05:58] just that one file
[05:59] then don't do the clean?
[05:59] hmm I think it complained
[05:59] but I will try again
[05:59] it was a while back
[05:59] so I was doing this
[05:59] https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[06:00] so you are saying I can make changes to the source
[06:00] and then just do
[06:00] CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers
[06:00] and it will work?
[06:00] it should
[06:00] ok
[06:00] i will try
[06:01] will the fakeroot debian/rules binary method also work?
[06:01] I remember that was a quick way to build non kernel packages
[06:01] but i've never tried it for the kernel
[06:03] ah debbuild too
[06:04] use make-kpkg if you're not building from an ubuntu source tree
[06:04] upstream kernels don't have a debian/ directory
[06:05] yeah
[06:05] but once they are build once with make-kpkg does it not create the debian dir?
[06:12] jk-, once they are build once with make-kpkg does it not create the debian dir?
[06:12] yeah, it does the necessary stuff
[06:14] so after one make-kpkg build can you use the fakeroot debian/rules binary method?
[06:23] EruditeHermit: probably best to stick with one method
[06:24] ok
[06:24] you seem to be right
[06:24] it seems to continue building
[06:32] worst hotel internet ever
[06:32] jk-: that bad?
[06:33] worst hotel internet I ever had was just about unusable (really slow) and would die every 5 min, and then it was a crapshoot at reacquiring
[06:34] it was completely worthless, worse than having no internet because you wasted time trying
[06:34] just keeps losing the route; i seem to have a good connection to the AP
[06:34] true :)
[06:34] i guess it's "worst" because I paid for it this time :)
[06:45] ah yeah that sucks
[07:45] jk-, hey, I get the following error, can you help me rectify it? http://pastebin.com/YwAJMAws
[07:46] I think it has something to do with the fact that I tried to compile a different branch first
[07:46] which had the 2.6.29 kernel
[07:46] but I did do a make-kpkg clean before I tried to compile the new one
[09:56] jk-, hey are you about? it doesn't recognize that I patched files and therefore doesn't create new packages
=== _ruben_ is now known as _ruben
=== sconklin-gone is now known as sconklin
[14:32] cnd, not with preemption. whats the wrinkle?
[14:33] tgardner: so there's a bug where a process is scheduling while atomic
[14:33] preempt_count is 0x10000100
[14:34] meaning 1 soft irq count + PREEMPT_ACTIVE
[14:34] so I'm thinking that a cond_resched was likely called during a softirq or tasklet
[14:34] cnd, is this in a particular driver?
[14:34] the thing is that the stack trace from the "scheduling while atomic" bug doesn't seem accurate
[14:34] it lists the swapper task
[14:35] a tasklet can reschedule, but not a softirq
[14:35] and the swapper task is in the middle of the acpi_idle_simple call
[14:35] a tasklet is run on top of a softirq...
[14:35] whats the one run as a kernel thread called?
[14:35] workqueue?
[14:36] ok, I always have to go look.
[14:36] yep, work queue, just checked
[14:36] cnd, dunno, maybe ask smb. I'm a little rusty in that stuff.
[14:37] so, I saw this thread: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01762.html (that's the last email in the thread, read previous replies as necessary)
[14:37] and they used function tracing to figure out what the stack was before the scheduling call
[14:37] but I would have thought that the stack listed in the "scheduling while atomic" bug would have been correct
[14:38] since we use 8K stacks
[14:38] cnd, good luck with that. I gotta get some other stuff done before I get wound up in that.
[14:38] heh
[14:38] tgardner: if you have deep questions like that, where would you turn to get them answered?
[14:38] apw, perhaps jj, smb
[14:39] * smb has now idea
[14:39] smb, you have no idea?
[14:40] or you now have an idea :)
[14:40] cnd gcc doesn't always record enough information to make stack back tracking possible
[14:40] cnd, Oh, yes. That is more it.
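The decoding of preempt_count = 0x10000100 as "1 soft irq count + PREEMPT_ACTIVE" can be checked mechanically. This is a minimal sketch assuming the 2.6.32-era x86 layout: 8 preempt bits at bit 0, 8 softirq bits starting at bit 8, hardirq bits above those, and PREEMPT_ACTIVE = 0x10000000. The constants are written out by hand rather than taken from kernel headers, the hardirq width of 10 is an assumption (it does not affect this particular value), and decode_preempt_count and struct preempt_fields are hypothetical names.

```c
#include <stdint.h>

/* Assumed 2.6.32-era x86 layout; other kernel versions differ. */
#define PREEMPT_BITS    8
#define SOFTIRQ_BITS    8
#define HARDIRQ_BITS    10   /* assumption; irrelevant for 0x10000100 */
#define PREEMPT_SHIFT   0
#define SOFTIRQ_SHIFT   (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT   (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define PREEMPT_ACTIVE  0x10000000u

/* Hypothetical container for the decoded fields. */
struct preempt_fields {
    unsigned preempt;   /* preempt_disable() nesting depth */
    unsigned softirq;   /* softirq count field */
    unsigned hardirq;   /* hardirq nesting depth */
    int      active;    /* nonzero if PREEMPT_ACTIVE is set */
};

/* Extract a bit field of width 'bits' starting at bit 'shift'. */
static unsigned pc_field(uint32_t v, unsigned shift, unsigned bits)
{
    return (v >> shift) & ((1u << bits) - 1u);
}

struct preempt_fields decode_preempt_count(uint32_t v)
{
    struct preempt_fields f;
    f.preempt = pc_field(v, PREEMPT_SHIFT, PREEMPT_BITS);
    f.softirq = pc_field(v, SOFTIRQ_SHIFT, SOFTIRQ_BITS);
    f.hardirq = pc_field(v, HARDIRQ_SHIFT, HARDIRQ_BITS);
    f.active  = (v & PREEMPT_ACTIVE) != 0;
    return f;
}
```

Under these assumptions, decode_preempt_count(0x10000100) yields preempt 0, softirq 1, hardirq 0 and PREEMPT_ACTIVE set, matching the reading given at 14:34.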
[14:40] That I have no idea
[14:40] everything makes sense if there's a separate task stack for soft irqs
[14:40] though i think if you turn on function tracing in ftrace (a build option) i think all functions end up with frames else you can't function trace them either
[14:40] but that's not how we have our kernel set up
[14:40] cnd: _kmalloc may be the one doing cond_resched
[14:41] I'm currently building a kernel with CONFIG_DEBUG_SPINLOCK_SLEEP, which also adds sleep checking while atomic when cond_resched or might_sleep are called
[14:42] that gives us the stack trace of current as opposed to the stack trace of rq->curr given by the "scheduling while atomic" bug
[14:42] so maybe that will give us the stack trace of the offending softirq
[14:43] apw, I can't build a new kernel through fdr binary-generic:
[14:43] cnd: why?
[14:43] dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] cnd how so?
[14:43] dpkg-deb: 1 errors in control file
[14:43] dh_builddeb: dpkg-deb -Zbzip2 -z9 --build debian/linux-image-2.6.32-16-generic .. returned exit code 2
[14:43] dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] that seems pretty clear
[14:43] did you add 'debug' to the end of your version?
[14:43] apw, isn't that something you recently changed though
[14:43] oh, I think I know why
[14:43] nope nothing i would have done
[14:44] I did ~lpspinlock-debug
[14:44] right so the - means two versions
[14:44] apw, I was thinking of your debug package renaming
[14:44] nope
[14:44] just take out the - don't think that is legal
[14:44] yeah, trying again
[14:45] jjohansen: why what?
[14:45] the debug stuff just covered with apw
[14:45] k
[14:45] we assume the format of that in the packaging and a - busts everything, don't do that, it hurts
[14:46] oh, I think I get it now
[14:46] well... I need to think some more
[14:47] * cnd is still trying to figure out why rq->curr may be a different task than what's running on the processor
[14:48] apw, I noticed that there isn't an unreleased version at the top of the ubuntu-lucid git tree last night
[14:48] should there be one? (whenever I've pulled to a new version there was one)
[14:48] cnd, he hasn't done a getabi yet
[14:48] cnd quite normal while i am waiting for the builds to complete
[14:48] ahh
[14:49] ok
[14:49] just do foo10 instead of ~foo10
[14:49] until the start new release appears
[14:49] ok
[14:55] cnd: well is it a different task though? the whole irq, softirq, tasklet stuff runs on top of the current task,
[14:55] jjohansen: yeah, that's why I'm confused as to what's going on
[14:56] if the softirq is running on top of rq->curr, we should see a meaningful stack trace in the bug
[14:56] but we aren't
[14:57] cnd: the stack trace isn't even always right when not in irq
[14:58] jjohansen: because of the way it's compiled, or because of some kernel internal stuff?
[14:59] cnd: both of those, obviously compiling without stackframes makes stack traces a lot less accurate
[15:00] but compiler optimizations can delay when you would expect the sp to be updated etc
[15:01] the stacktrace should only be considered a best effort
[15:02] I'm guessing that's why the mailing list thread I saw used function tracing to get a more accurate trace
[15:03] cnd: yeah
[15:04] jjohansen: ok, I have an idea of how to go about getting definitive information out of the bug I'm working
[15:04] thanks
[15:22] <_stink_> anyone know where I can find .debs of previous 2.6.32 versions for lucid? it seems that -16 has broken virtualbox guest additions, and i'd like to go back to -15, but it's not in the repo anymore.
[15:22] <_stink_> this install was just done today, so i don't have -15 installed locally and available via grub.
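The dpkg-deb failure at 14:43 follows from how a Debian version is split: everything after the last '-' is the Debian revision, so a local suffix such as ~lpspinlock-debug leaves the revision as just debug, which has no digits. The sketch below is a simplified model of the check implied by that error message, not dpkg's actual parser; debian_revision_ok is a hypothetical helper and the version strings used with it are made-up examples.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the Debian revision (the text after
 * the LAST '-') contains at least one digit, 0 otherwise. A version
 * with no '-' has no revision part, so there is nothing to reject. */
int debian_revision_ok(const char *version)
{
    const char *dash = strrchr(version, '-');
    if (dash == NULL)
        return 1;
    for (const char *p = dash + 1; *p != '\0'; p++)
        if (isdigit((unsigned char)*p))
            return 1;
    return 0;
}
```

With this model, a made-up version like "1.0-1~foo-debug" is rejected because its revision is "debug", while "1.0-1~foodebug" (the '-' taken out, as apw suggests) and "1.0-1~foo10" are accepted because their revisions start with "1".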
[15:27] anyone heard of any problems with the lucid livecd kernel / mdadm / kvm creating hangs / crashes?
[15:28] i was trying to install lucid from the livecd onto a md partition (all on a kvm guest machine), and after about 30-240 seconds the kernel just hard crashes..
[15:29] i looked on google / lkml and nothing obvious appeared..
[15:31] kvm crashing, yes if it was 2.6.32-15 kernel
[15:32] any suggestions? debootstrap?
[15:32] though that sounds like it was sooner than your issue
[15:32] and it was resolved in later kernels, which one did you test with
[15:33] just a second... i'll get the version, it was from 2010-03-08 iso image
[15:36] Linux ubuntu 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:23:29 UTC 2010 x86_64 GNU/Linux
[15:36] as that was -15 i would retest with a later image containing the -16 kernel, there was some bad kvm patches in that version
[15:37] well the kvm host is running off jaunty, the guest is running lucid..
[15:38] hmmm i'll try with a karmic server install and debootstrap to lucid and see how it goes...
[15:39] weird thing is there was no oops, the kvm host didn't complain.. the virt machine just died, no logs, no errors nothing..
[15:39] almost like a triple fault or something...
[16:20] cking: ping
[16:21] cnd, yo
[16:21] do you have any thoughts on the TSC issue for non-arrandale hw?
[16:21] cnd, like it can be screwed up on them too?
[16:21] cking: bug 535077
[16:21] Malone bug 535077 in linux "WARNING: at /build/buildd/linux-2.6.32/kernel/trace/ring_buffer.c:1984 rb_add_time_stamp+0x20c/0x220()" [Medium,Triaged] https://launchpad.net/bugs/535077
[16:21] a few dupes of that bug already
[16:22] booting with notsc fixed the issue
[16:22] and the ts in the bug begins with 0xfffffff
[16:22] so it looks very similar to what you found on the arrandale procs
[16:22] yep - it's an issue apparently on some CPU's after coming out of S3 - and is listed in some Errata lists
[16:23] cking: so it's not just arrandale?
[16:23] cnd, apparently
[16:23] cking: what is the impact of notsc, would it be worth it to just disable tsc by default?
[16:23] cnd, I posted a patch in the mailing list that can work around this bug - it stops an overflow
[16:24] cking, even if it stops the overflow, the register still isn't correct right?
[16:24] won't it still cause issues?
[16:24] cnd, yep, it's still get screwed. However, it may be worth seeing if a microcode update fixes it - this normally means getting a new BIOS which applies the microcode on boot
[16:25] cking: so what's the impact of notsc, and what if we just disabled it by default?
[16:25] is there some performance impact?
[16:26] cnd, I think it disables hi-res time of day measurements.
[16:26] cnd, I really cannot see much of an impact - but this needs looking into - I was thinking we should disable TSC for L+1
[16:26] tgardner, some I/O delay loop code uses it for sure
[16:26] cking, disable on all 32 bit platforms ?
[16:27] tgardner, not sure - I'd like to know which processors get the problem - it means surveying all the errata sheets :-(
[16:27] it may depend on which machines have up to date microcode too
[16:28] cking: so your patch will disable the oops message, right?
[16:29] but it's just for the oops message you've been tracking?
[16:29] cnd, the patch will fix the overflow, which causes the softlockup code to not produce stupid messages when the TSC warps to 0xffffffffxxxxxxxx
[16:29] cnd, nope, I've also been trying to see if there is a fix to the generic TSC warp issue
[16:29] (when coming out of S3)
[16:30] cking: ok, so for now what should we be doing when we see bugs that are caused by the TSC warp
[16:30] should I dupe them of some bug you are working on?
[16:31] or just note that it's harmless, but you can boot with notsc if you want to get rid of them?
[16:31] cnd, yes, duping it would be useful - and note it's harmless and boot with notsc
[16:31] cking: what bug would you like me to dupe to?
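The patch mentioned at 16:23 is only described here as stopping an overflow; its contents are not in this log. As a generic illustration of why a TSC value near 0xffffffffxxxxxxxx can overflow a cycles-to-nanoseconds conversion of the form (cyc * mult) >> shift, here is a sketch with hypothetical constants: a shift of 10 and a multiplier of 512, i.e. 0.5 ns per cycle for an assumed 2 GHz clock. The split form is one common way to avoid the intermediate overflow and is not claimed to be what the patch does.

```c
#include <stdint.h>

/* Hypothetical scale: ns = (cyc * mult) >> SCALE_SHIFT. */
#define SCALE_SHIFT 10

/* Naive form: the 64-bit product wraps once cyc is close to 2^64. */
uint64_t cyc_to_ns_naive(uint64_t cyc, uint64_t mult)
{
    return (cyc * mult) >> SCALE_SHIFT;
}

/* Split form: write cyc = q * 2^SCALE_SHIFT + r, then
 * (cyc * mult) >> SCALE_SHIFT == q * mult + ((r * mult) >> SCALE_SHIFT).
 * Neither product overflows as long as mult <= 1 << SCALE_SHIFT. */
uint64_t cyc_to_ns_split(uint64_t cyc, uint64_t mult)
{
    uint64_t q = cyc >> SCALE_SHIFT;
    uint64_t r = cyc & ((1ULL << SCALE_SHIFT) - 1);
    return q * mult + ((r * mult) >> SCALE_SHIFT);
}
```

For small counts both forms agree (1000 cycles gives 500 ns), but for cyc = 0xffffffff00000000 the naive form wraps to 0x003fffff80000000 while the split form returns 0x7fffffff80000000. As cnd points out at 16:24, avoiding the overflow does not make the warped counter value itself correct.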
[16:31] * cking looks
[16:33] cnd, how about bug 530487 - I addressed that yesterday
[16:33] Malone bug 530487 in linux "BUG: soft lockup - CPU#2 stuck for 0s! [firefox-bin:1751] (during suspend/resume)" [Low,Fix released] https://launchpad.net/bugs/530487
[16:33] cking, ok thanks
[16:33] let me modify the title of it to make it a little more generic too
[16:34] done
[16:35] cnd, since you deal with these more than me, can you get a list of CPUs which get hit by this TSC issue?
[16:35] cking, I wonder if there is a way to test for this bogosity. Perhaps drive through an S3 state with an alarm wakeup during boot.
[16:36] that'd just about wreck boot performance :)
[16:36] cking, I can try to make a list, where do you find errata for procs?
[16:36] tgardner, and it may wreck the user experience if the machines have a buggy resume
[16:38] I suppose we need to see what the ramifications of totally disabling it - is anything really that dependent on such precise timing?
[16:38] cking, how about udec delays?
[16:38] usec*
[16:39] tgardner, well, there are the traditional busy loops method, or by looping and checking the TSC to my knowledge - the former is just as valid isn't it?
[16:40] my concern is when you have a usec delay on a TSC that warps backwards over a S3 cycle - that could lead to some weirdnesses
[16:41] cking, it's been a while, but wasn't one of the ways by reading an ISA port (which assumed a constant response time) ?
[16:42] tgardner, not sure - things change and I get forgetful on these details
[16:56] cking, do you have a url for an errata for the tsc issue?
[16:58] tgardner, the delayed io operations ultimately use __udelay, which can use TSC, or a delay loop. The TSC method does RT pre-emption, so notsc does have some effect on latency issues
[16:59] cnd, only for Arrandale CPUs
[16:59] cking: that's fine for now
[16:59] I just would like to see what it says
[16:59] cking, so the mythtv guys will have a cow since most of them run 32 bit.
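The concern at 16:40 about a usec delay running on a TSC that warps backwards can be shown with a simplified model of a counter-based delay loop that spins until (now - start) >= ticks. Scripted counter readings stand in for a real TSC. This is not the kernel's __udelay or TSC delay code (which also handles things like CPU migration); delay_loop_reads is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: readings[0] is the start value; returns how many
 * further reads a "spin until (now - start) >= ticks" loop performs
 * before it exits, or n if the scripted readings run out first. */
size_t delay_loop_reads(const uint64_t *readings, size_t n, uint64_t ticks)
{
    if (n == 0)
        return 0;
    uint64_t start = readings[0];
    for (size_t i = 1; i < n; i++) {
        /* Unsigned subtraction: if the counter has warped backwards,
         * now - start wraps to a huge value and the loop exits at once. */
        if (readings[i] - start >= ticks)
            return i;
    }
    return n;
}
```

With readings {100, 110, 120, 130} and ticks = 25 the loop exits at the third read, as intended. With readings {100, 50, 60, 130}, where the counter goes backwards, it exits at the first read, so the delay is far too short.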
[17:00] tgardner: I don't think they will (I'm one of them :)
[17:00] tgardner, yep, so I think we disable tsc if it bugs users on the S3 case
[17:00] ..but on a need-to-do-so basis rather than a blanket notsc default
[17:00] I've never noticed anything get terribly better when monkeying with higher prio stuff that would do preemption like that I think
[17:01] cking, agreed
[17:02] cking: what if we just added a check for a huge out of bounds count in native_sched_clock: http://lxr.linux.no/linux+v2.6.33/arch/x86/kernel/tsc.c#L43
[17:02] cnd, well, you know, somebody may want a udelay(1) to be pre-emptied because the iodelay is sucking away too many cycles!
[17:02] if so, set tsc_disabled = 1, and get the clock from the jiffies?
[17:03] cnd, this scares me, I'm not sure if disabling it like this is a good idea on a fully running system with all CPUs in action
[17:03] worth an experiment
[17:04] cking: if the tsc warping isn't causing issues, how is doing that going to cause issues?
[17:06] cnd, I've seen it causes issues on Hardy on an Atom - a I/I delay on a hyperthread that got pushed to another CPU got confused when the TSC was skewed
[17:06] I/I delay?
[17:07] cking: if it got confused due to the tsc being off, how could it be worse if the jiffies result is off?
[17:08] cnd, jiffies don't generally warp backwards
[17:08] cking: isn't that a good thing in this case?
[17:09] we don't want the clock to warp backwards
[17:09] cnd, very true
[17:09] * cking needs to think harder
[17:10] the only issue I can see is when the switch occurs, if the jiffies calculation result would end up in a warp backwards relative to what the tsc result had been giving
[17:10] I don't know what the likelihood of that occurring is
[17:10] cnd, it happened - it caused me pain
[17:10] maybe we prevent it from switching to jiffies calculation if the tsc is unstable? (there's a var for that)
[17:11] cking: what do you mean by "it caused me pain", you didn't try this out yet did you?
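The idea floated at 17:02 (detect an out-of-bounds count, stop trusting the TSC, and take the clock from jiffies instead), together with the worry at 17:10 about a backwards step at the moment of switching, can be modelled in user space. This is a hypothetical sketch, not a patch to native_sched_clock: struct guarded_clock, guarded_clock_read and the 10 s sanity bound are all assumptions, and the fast (TSC-like) and fallback (jiffies-like) readings are passed in as arguments rather than read from hardware.

```c
#include <stdint.h>

/* Assumed sanity bound: a forward jump larger than this counts as a warp. */
#define MAX_JUMP_NS (10ULL * 1000000000ULL)

/* Hypothetical clock state. */
struct guarded_clock {
    uint64_t last_ns;   /* last value returned */
    int started;        /* 0 until the first read */
    int fast_bad;       /* latched once the fast counter is seen to warp */
};

/* Return fast_ns while it behaves; after a backwards step or an absurd
 * forward jump, switch permanently to fallback_ns. The result is
 * clamped so it never goes below the previously returned value. */
uint64_t guarded_clock_read(struct guarded_clock *c,
                            uint64_t fast_ns, uint64_t fallback_ns)
{
    uint64_t now;

    if (c->started && !c->fast_bad &&
        (fast_ns < c->last_ns || fast_ns - c->last_ns > MAX_JUMP_NS))
        c->fast_bad = 1;

    now = c->fast_bad ? fallback_ns : fast_ns;

    /* Keep the clock monotonic across the switch itself. */
    if (c->started && now < c->last_ns)
        now = c->last_ns;

    c->started = 1;
    c->last_ns = now;
    return now;
}
```

Clamping to the last returned value is one simple way to address the backwards step at switch time; the trade-off is that time appears to stand still until the fallback clock catches up.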
[17:12] cnd, I saw it happen on resume one in a few hundred cycles and figuring out what was happening w/o a console caused me pain
[17:13] cking, that's just from basing clock on tsc, I'm wondering how likely it is that switching from tsc to jiffies calculation would result in a warping
[17:14] cnd, good question - no idea. This needs some careful examination of the code
[17:15] * cking is exhausted of any more clock/TSC/delay knowledge. The devil is in the source code
[17:15] ;-)
[17:15] heh
=== bjf is now known as bjf-afk
=== Hedge|Hog is now known as Hedgehog
=== Hedgehog is now known as Guest83664
=== sconklin is now known as sconklin-away
=== bjf-afk is now known as bjf
=== fabbione is now known as fabbione_vac