[00:02] <jjohansen> dyek: no immediate ideas, it should work, I made a note to check on it
[00:03] <dyek> jjohansen: OK. Thank you!
[04:39] <EruditeHermit> hello, is it possible to use the ubuntu kernel build system to build kernel packages without starting from scratch when I modify a file and build a second time?
[04:41] <EruditeHermit> the make-kpkg clean purges all the old work
[04:54] <BenC> EruditeHermit: if you are using make-kpkg, then you aren't using the ubuntu kernel build system
[04:54] <EruditeHermit> BenC, I am using the instructions on this page. https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[04:55] <EruditeHermit> BenC, could you please advise me as to the best way to build kernels then?
[05:02] <EruditeHermit> a link to instructions would be enough
[05:41] <EruditeHermit> hello, could anyone point me to the current method for building kernel packages on Ubuntu?
[05:53] <jk-> EruditeHermit: check the build section at http://wiki.ubuntu.com/KernelTeam/KernelMaintenance
[05:53] <jk-> EruditeHermit: but, in general: fakeroot debian/rules binary-arch
[05:57] <jk-> but it depends what you're trying to do: build using the ubuntu sources, or build an upstream (or other) kernel tree into a .deb ?
[05:57] <EruditeHermit> jk-, I am trying to build an upstream kernel from git
[05:57] <EruditeHermit> and make a deb package out of it
[05:58] <jk-> ok, then make-kpkg is probably the best way then.
[05:58] <jk-> (as you're doing..)
[05:58] <EruditeHermit> however
[05:58] <EruditeHermit> when I patch some of the files
[05:58] <EruditeHermit> I don't want to rebuild the whole kernel
[05:58] <EruditeHermit> just that one file
[05:59] <jk-> then don't do the clean?
[05:59] <EruditeHermit> hmm I think it complained
[05:59] <EruditeHermit> but I will try again
[05:59] <EruditeHermit> it was a while back
[05:59] <EruditeHermit> so I was doing this
[05:59] <EruditeHermit> https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[06:00] <EruditeHermit> so you are saying I can make changes to the source
[06:00] <EruditeHermit> and then just do
[06:00] <EruditeHermit> CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers
[06:00] <EruditeHermit> and it will work?
[06:00] <jk-> it should
[06:00] <EruditeHermit> ok
[06:00] <EruditeHermit> i will try
[06:01] <EruditeHermit> will the fakeroot debian/rules binary method also work?
[06:01] <EruditeHermit> I remember that was a quick way to build non kernel packages
[06:01] <EruditeHermit> but i've never tried it for the kernel
[06:03] <EruditeHermit> ah debbuild too
[06:04] <jk-> use make-kpkg if you're not building from an ubuntu source tree
[06:04] <jk-> upstream kernels don't have a debian/ directory
[06:05] <EruditeHermit> yeah
[06:05] <EruditeHermit> but once they are built once with make-kpkg does it not create the debian dir?
[06:12] <EruditeHermit> jk-, once they are built once with make-kpkg does it not create the debian dir?
[06:12] <jk-> yeah, it does the necessary stuff
[06:14] <EruditeHermit> so after one make-kpkg build can you use the fakeroot debian/rules binary method?
[06:23] <jk-> EruditeHermit: probably best to stick with one method
[06:24] <EruditeHermit> ok
[06:24] <EruditeHermit> you seem to be right
[06:24] <EruditeHermit> it seems to continue building
[06:32] <jk-> worst hotel internet ever
[06:32] <jjohansen> jk-: that bad?
[06:33] <jjohansen> worst hotel internet I ever had was just about unusable (really slow) and would die every 5 min, and then it was a crapshoot at reacquiring
[06:34] <jjohansen> it was completely worthless, worse than having no internet because you wasted time trying
[06:34] <jk-> just keeps losing the route; i seem to have a good connection to the AP
[06:34] <jk-> true :)
[06:34] <jk-> i guess it's "worst" because I paid for it this time :)
[06:45] <jjohansen> ah yeah that sucks
[07:45] <EruditeHermit> jk-, hey, I get the following error, can you help me rectify it? http://pastebin.com/YwAJMAws
[07:46] <EruditeHermit> I think it has something to do with the fact that I tried to compile a different branch first
[07:46] <EruditeHermit> which had the 2.6.29 kernel
[07:46] <EruditeHermit> but I did do a make-kpkg clean before I tried to compile the new one
[09:56] <EruditeHermit> jk-, hey are you about? it doesn't recognize that I patched files and therefore doesn't create new packages
[14:32] <tgardner> cnd, not with preemption. what's the wrinkle?
[14:33] <cnd> tgardner: so there's a bug where a process is scheduling while atomic
[14:33] <cnd> preempt_count is 0x10000100
[14:34] <cnd> meaning 1 soft irq count + PREEMPT_ACTIVE
[14:34] <cnd> so I'm thinking that a cond_resched was likely called during a softirq or tasklet
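cnd's reading of the preempt_count value can be checked with a small decoder. This is a minimal sketch, not kernel code: the masks below are an assumption about the 2.6.32-era x86 layout (8 preempt bits, 8 softirq bits starting at bit 8, PREEMPT_ACTIVE at 0x10000000), and `decode_preempt_count` is a hypothetical helper name.

```python
# Assumed bit layout for a 2.6.32-era x86 kernel; verify against the
# headers of the kernel actually being debugged.
PREEMPT_MASK = 0x000000FF    # preempt_disable() nesting depth
SOFTIRQ_SHIFT = 8
SOFTIRQ_MASK = 0x0000FF00    # softirq nesting count
PREEMPT_ACTIVE = 0x10000000  # set while the task is being preempted


def decode_preempt_count(value):
    """Split a raw preempt_count into the fields discussed in the log."""
    return {
        "preempt_depth": value & PREEMPT_MASK,
        "softirq_count": (value & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT,
        "preempt_active": bool(value & PREEMPT_ACTIVE),
    }


# The value quoted from the "scheduling while atomic" report.
print(decode_preempt_count(0x10000100))
```

Under those assumed masks, 0x10000100 decodes to one softirq count plus PREEMPT_ACTIVE, which matches what cnd describes.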
[14:34] <tgardner> cnd, is this in a particular driver?
[14:34] <cnd> the thing is that the stack trace from the "scheduling while atomic" bug doesn't seem accurate
[14:34] <cnd> it lists the swapper task
[14:35] <tgardner> a tasklet can reschedule, but not a softirq
[14:35] <cnd> and the swapper task is in the middle of the acpi_idle_simple call
[14:35] <cnd> a tasklet is run on top of a softirq...
[14:35] <tgardner> what's the one run as a kernel thread called?
[14:35] <cnd> workqueue?
[14:36] <tgardner> ok, I always have to go look.
[14:36] <cnd> yep, work queue, just checked
[14:36] <tgardner> cnd, dunno, maybe ask smb. I'm a little rusty in that stuff.
[14:37] <cnd> so, I saw this thread: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01762.html (that's the last email in the thread, read previous replies as necessary)
[14:37] <cnd> and they used function tracing to figure out what the stack was before the scheduling call
[14:37] <cnd> but I would have thought that the stack listed in the "scheduling while atomic" bug would have been correct
[14:38] <cnd> since we use 8K stacks
[14:38] <tgardner> cnd, good luck with that. I gotta get some other stuff done before I get wound up in that.
[14:38] <cnd> heh
[14:38] <cnd> tgardner: if you have deep questions like that, where would you turn to get them answered?
[14:38] <tgardner> apw, perhaps jj, smb
[14:39]  * smb has now idea
[14:39] <cnd> smb, you have no idea?
[14:40] <cnd> or you now have an idea :)
[14:40] <apw> cnd gcc doesn't always record enough information to make stack back tracking possible
[14:40] <smb> cnd, Oh, yes. That is more it. 
[14:40] <smb> That I have no idea
[14:40] <cnd> everything makes sense if there's a separate task stack for soft irqs
[14:40] <apw> though i think if you turn on function tracing in ftrace (a build option) i think all functions end up with frames else you can't function trace them either
[14:40] <cnd> but that's not how we have our kernel set up
[14:40] <jjohansen> cnd: _kmalloc may be the one doing cond_resched
[14:41] <cnd> I'm currently building a kernel with CONFIG_DEBUG_SPINLOCK_SLEEP, which also adds sleep checking while atomic when cond_resched or might_sleep are called
[14:42] <cnd> that gives us the stack trace of current as opposed to the stack trace of rq->curr given by the "scheduling while atomic" bug
[14:42] <cnd> so maybe that will give us the stack trace of the offending softirq
[14:43] <cnd> apw, I can't build a new kernel through fdr binary-generic:
[14:43] <jjohansen> cnd: why?
[14:43] <cnd> dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] <apw> cnd how so?
[14:43] <cnd> dpkg-deb: 1 errors in control file
[14:43] <cnd> dh_builddeb: dpkg-deb -Zbzip2 -z9 --build debian/linux-image-2.6.32-16-generic .. returned exit code 2
 dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] <apw> that seems pretty clear
[14:43] <apw> did you add 'debug' to the end of your version?
[14:43] <cnd> apw, isn't that something you recently changed though
[14:43] <cnd> oh, I think I know why
[14:43] <apw> nope nothing i would have done
[14:44] <cnd> I did ~lpspinlock-debug
[14:44] <apw> right so the - means two versions
[14:44] <cnd> apw, I was thinking of your debug package renaming
[14:44] <apw> nope
[14:44] <apw> just take out the - don't think that is legal
[14:44] <cnd> yeah, trying again
[14:45] <cnd> jjohansen: why what?
[14:45] <jjohansen> the debug stuff just covered with apw
[14:45] <cnd> k
[14:45] <apw> we assume the format of that in the packaging and a - busts everything, don't do that, it hurts
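The dpkg-deb error above follows from how a Debian version string is split: everything after the last hyphen is the Debian revision. A minimal sketch of that split, assuming a made-up full version string (the log only shows the `~lpspinlock-debug` suffix) and hypothetical helper names:

```python
def split_debian_version(version):
    """Return (epoch, upstream_version, debian_revision).

    The revision is whatever follows the LAST hyphen; without a hyphen
    the revision is empty.
    """
    epoch, colon, rest = version.partition(":")
    if not colon:
        epoch, rest = "", version
    upstream, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        upstream, revision = rest, ""
    return epoch, upstream, revision


def revision_has_digit(version):
    """Mimic the check behind "Debian revision doesn't contain any digits"."""
    _, _, revision = split_debian_version(version)
    return revision == "" or any(ch.isdigit() for ch in revision)


# Hypothetical version strings for illustration only.
bad = "2.6.32-16.25~lpspinlock-debug"   # revision becomes "debug"
good = "2.6.32-16.25~lpspinlockdebug"   # revision keeps its digits

print(split_debian_version(bad), revision_has_digit(bad))
print(split_debian_version(good), revision_has_digit(good))
```

With the extra hyphen the revision collapses to `debug`, which has no digits, so dropping the hyphen as apw suggests avoids the error.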
[14:46] <cnd> oh, I think I get it now
[14:46] <cnd> well... I need to think some more
[14:47]  * cnd is still trying to figure out why rq->curr may be a different task than what's running on the processor
[14:48] <cnd> apw, I noticed that there isn't an unreleased version at the top of the ubuntu-lucid git tree last night
[14:48] <cnd> should there be one? (whenever I've pulled to a new version there was one)
[14:48] <tgardner> cnd, he hasn't done a getabi yet
[14:48] <apw> cnd quite normal while i am waiting for the builds to complete
[14:48] <cnd> ahh
[14:49] <cnd> ok
[14:49] <apw> just do foo10 instead of ~foo10
[14:49] <apw> until the start new release appears
[14:49] <cnd> ok
[14:55] <jjohansen> cnd: well is it a different task though?  the whole irq, softirq, tasklet stuff runs on top of the current task,
[14:55] <cnd> jjohansen: yeah, that's why I'm confused as to what's going on
[14:56] <cnd> if the softirq is running on top of rq->curr, we should see a meaningful stack trace in the bug
[14:56] <cnd> but we aren't
[14:57] <jjohansen> cnd: the stack trace isn't even always right when not in irq
[14:58] <cnd> jjohansen: because of the way it's compiled, or because of some kernel internal stuff?
[14:59] <jjohansen> cnd: both of those, obviously compiling without stackframes makes stack traces a lot less accurate
[15:00] <jjohansen> but compiler optimizations can delay when you would expect the sp to be updated etc
[15:01] <jjohansen> the stacktrace should only be considered a best effort
[15:02] <cnd> I'm guessing that's why the mailing list thread I saw used function tracing to get a more accurate trace
[15:03] <jjohansen> cnd: yeah
[15:04] <cnd> jjohansen: ok, I have an idea of how to go about getting definitive information out of the bug I'm working
[15:04] <cnd> thanks
[15:22] <_stink_> anyone know where I can find .debs of previous 2.6.32 versions for lucid?  it seems that -16 has broken virtualbox guest additions, and i'd like to go back to -15, but it's not in the repo anymore.
[15:22] <_stink_> this install was just done today, so i don't have -15 installed locally and available via grub.
[15:27] <tankenmate> anyone heard of any problems with the lucid livecd kernel / mdadm / kvm creating hangs / crashes?
[15:28] <tankenmate> i was trying to install lucid from the livecd onto a md partition (all on a kvm guest machine), and after about 30-240 seconds the kernel just hard crashes..
[15:29] <tankenmate> i looked on google / lkml and nothing obvious appeared..
[15:31] <apw> kvm crashing, yes if it was 2.6.32-15 kernel
[15:32] <tankenmate> any suggestions? debootstrap?
[15:32] <apw> though that sounds like it was sooner than your issue
[15:32] <apw> and it was resolved in later kernels, which one did you test with
[15:33] <tankenmate> just a second... i'll get the version, it was from 2010-03-08 iso image
[15:36] <tankenmate> Linux ubuntu 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:23:29 UTC 2010 x86_64 GNU/Linux
[15:36] <apw> as that was -15 i would retest with a later image containing the -16 kernel, there were some bad kvm patches in that version
[15:37] <tankenmate> well the kvm host is running off jaunty, the guest is running lucid..
[15:38] <tankenmate> hmmm i'll try with a karmic server install and debootstrap to lucid and see how it goes...
[15:39] <tankenmate> weird thing is there was no oops, the kvm host didn't complain.. the virt machine just died, no logs, no errors nothing..
[15:39] <tankenmate> almost like a triple fault or something...
[16:20] <cnd> cking: ping
[16:21] <cking> cnd, yo
[16:21] <cnd> do you have any thoughts on the TSC issue for non-arrandale hw?
[16:21] <cking> cnd, like it can be screwed up on them too?
[16:21] <cnd> cking: bug 535077
[16:21] <ubot3> Malone bug 535077 in linux "WARNING: at /build/buildd/linux-2.6.32/kernel/trace/ring_buffer.c:1984 rb_add_time_stamp+0x20c/0x220()" [Medium,Triaged] https://launchpad.net/bugs/535077
[16:21] <cnd> a few dupes of that bug already
[16:22] <cnd> booting with notsc fixed the issue
[16:22] <cnd> and the ts in the bug begins with 0xfffffff
[16:22] <cnd> so it looks very similar to what you found on the arrandale procs
[16:22] <cking> yep - it's an issue apparently on some CPUs after coming out of S3 - and is listed in some Errata lists
[16:23] <cnd> cking: so it's not just arrandale?
[16:23] <cking> cnd, apparently 
[16:23] <cnd> cking: what is the impact of notsc, would it be worth it to just disable tsc by default?
[16:23] <cking> cnd, I posted a patch in the mailing list that can work around this bug - it stops an overflow 
[16:24] <cnd> cking, even if it stops the overflow, the register still isn't correct right?
[16:24] <cnd> won't it still cause issues?
[16:24] <cking> cnd, yep, it still gets screwed up. However, it may be worth seeing if a microcode update fixes it - this normally means getting a new BIOS which applies the microcode on boot
[16:25] <cnd> cking: so what's the impact of notsc, and what if we just disabled it by default?
[16:25] <cnd> is there some performance impact?
[16:26] <tgardner> cnd,  I think it disables hi-res time of day measurements.
[16:26] <cking> cnd, I really cannot see much of an impact - but this needs looking into - I was thinking we should disable TSC for L+1
[16:26] <cking> tgardner, some I/O delay loop code uses it for sure
[16:26] <tgardner> cking, disable on all 32 bit platforms ?
[16:27] <cking> tgardner, not sure - I'd like to know which processors get the problem - it means surveying all the errata sheets :-(
[16:27] <cking> it may depend on which machines have up-to-date microcode too
[16:28] <cnd> cking: so your patch will disable the oops message, right?
[16:29] <cnd> but it's just for the oops message you've been tracking?
[16:29] <cking> cnd, the patch will fix the overflow, which stops the softlockup code from producing stupid messages when the TSC warps to 0xffffffffxxxxxxxx
[16:29] <cking> cnd, nope, I've also been trying to see if there is a fix to the generic TSC warp issue
[16:29] <cking> (when coming out of S3)
[16:30] <cnd> cking: ok, so for now what should we be doing when we see bugs that are caused by the TSC warp
[16:30] <cnd> should I dupe them of some bug you are working on?
[16:31] <cnd> or just note that it's harmless, but you can boot with notsc if you want to get rid of them?
[16:31] <cking> cnd, yes, duping it would be useful - and note it's harmless and boot with notsc
[16:31] <cnd> cking: what bug would you like me to dupe to?
[16:31]  * cking looks
[16:33] <cking> cnd, how about  bug 530487  - I addressed that yesterday
[16:33] <ubot3> Malone bug 530487 in linux "BUG: soft lockup - CPU#2 stuck for 0s! [firefox-bin:1751] (during suspend/resume)" [Low,Fix released] https://launchpad.net/bugs/530487
[16:33] <cnd> cking, ok thanks
[16:33] <cking> let me modify the title of it to make it a little more generic too
[16:34] <cking> done
[16:35] <cking> cnd, since you deal with these more than me, can you get a list of CPUs which get hit by this TSC issue?
[16:35] <tgardner> cking, I wonder if there is a way to test for this bogosity. Perhaps drive through an S3 state with an alarm wakeup during boot.
[16:36] <tgardner> that'd just about wreck boot performance :)
[16:36] <cnd> cking, I can try to make a list, where do you find errata for procs?
[16:36] <cking> tgardner, and it may wreck the user experience if the machines have a buggy resume
[16:38] <cking> I suppose we need to see what the ramifications of totally disabling it are - is anything really that dependent on such precise timing?
[16:38] <tgardner> cking, how about udec delays?
[16:38] <tgardner> usec*
[16:39] <cking> tgardner, well, there are the traditional busy loops method, or by looping and checking the TSC to my knowledge - the former is just as valid isn't it?
[16:40] <cking> my concern is when you have a usec delay on a TSC that warps backwards over a S3 cycle - that could lead to some weirdnesses
[16:41] <tgardner> cking, it's been a while, but wasn't one of the ways by reading an ISA port (which assumed a constant response time) ?
[16:42] <cking> tgardner, not sure - things change and I get forgetful on these details
[16:56] <cnd> cking, do you have a url for an errata for the tsc issue?
[16:58] <cking> tgardner, the delayed io operations ultimately use __udelay, which can use TSC, or a delay loop. The TSC method does RT pre-emption, so notsc does have some effect on latency issues
[16:59] <cking> cnd, only for Arrandale CPUs
[16:59] <cnd> cking: that's fine for now
[16:59] <cnd> I just would like to see what it says
[16:59] <tgardner> cking, so the mythtv guys will have a cow since most of them run 32 bit.
[17:00] <cnd> tgardner: I don't think they will (I'm one of them :)
[17:00] <cking> tgardner, yep, so I think we disable tsc if it bugs users on the S3 case
[17:00] <cking> ..but on a need-to-do-so basis rather than a blanket notsc default
[17:00] <cnd> I've never noticed anything get terribly better when monkeying with higher prio stuff that would do preemption like that I think
[17:01] <tgardner> cking, agreed
[17:02] <cnd> cking: what if we just added a check for a huge out of bounds count in native_sched_clock: http://lxr.linux.no/linux+v2.6.33/arch/x86/kernel/tsc.c#L43
[17:02] <cking> cnd, well, you know, somebody may want a udelay(1) to be preempted because the iodelay is sucking away too many cycles!
[17:02] <cnd> if so, set tsc_disabled = 1, and get the clock from the jiffies?
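cnd's proposal (detect an implausible TSC-based reading, then stop trusting the TSC and fall back to a jiffies-based clock) can be modelled in user space. This is only a sketch of the idea under stated assumptions, not the kernel's `native_sched_clock`: the class, its parameters, and the `max_step` threshold are all hypothetical, and clamping the fallback so it never runs backwards is an added assumption.

```python
class GuardedClock:
    """Hypothetical user-space model of "disable the fast clock on a warp"."""

    def __init__(self, read_fast, read_fallback, max_step):
        self.read_fast = read_fast          # stands in for the TSC-based clock
        self.read_fallback = read_fallback  # stands in for a jiffies-based clock
        self.max_step = max_step            # largest plausible jump between reads
        self.fast_disabled = False          # plays the role of tsc_disabled
        self.last = None                    # last value handed to a caller

    def read(self):
        if not self.fast_disabled:
            now = self.read_fast()
            # Accept the reading only if it moved forward by a sane amount.
            if self.last is None or 0 <= now - self.last <= self.max_step:
                self.last = now
                return now
            # Warp detected (e.g. a jump to 0xffffffffxxxxxxxx): stop using it.
            self.fast_disabled = True
        now = self.read_fallback()
        # Assumption: never report a time earlier than one already returned.
        if self.last is not None:
            now = max(now, self.last)
        self.last = now
        return now


# Simulated readings; the third fast value imitates the post-S3 warp.
fast = iter([100, 200, 0xFFFFFFFF00000000, 300])
slow = iter([250, 260])
clock = GuardedClock(lambda: next(fast), lambda: next(slow), max_step=1000)
print([clock.read() for _ in range(4)], clock.fast_disabled)
```

The clamp in the fallback path addresses the concern about the moment of the switch: if the fallback clock lags behind the last good reading, the model repeats the last value instead of stepping backwards.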
[17:03] <cking> cnd, this scares me, I'm not sure if disabling it like this is a good idea on a fully running system with all CPUs in action
[17:03] <cking> worth an experiment
[17:04] <cnd> cking: if the tsc warping isn't causing issues, how is doing that going to cause issues?
[17:06] <cking> cnd, I've seen it cause issues on Hardy on an Atom - a I/I delay on a hyperthread that got pushed to another CPU got confused when the TSC was skewed
[17:06] <cnd> I/I delay?
[17:07] <cnd> cking: if it got confused due to the tsc being off, how could it be worse if the jiffies result is off?
[17:08] <cking> cnd, jiffies don't generally warp backwards
[17:08] <cnd> cking: isn't that a good thing in this case?
[17:09] <cnd> we don't want the clock to warp backwards
[17:09] <cking> cnd, very true
[17:09]  * cking needs to think harder
[17:10] <cnd> the only issue I can see is when the switch occurs, if the jiffies calculation result would end up in a warp backwards relative to what the tsc result had been giving
[17:10] <cnd> I don't know what the likelihood of that occurring is
[17:10] <cking> cnd, it happened - it caused me pain
[17:10] <cnd> maybe we prevent it from switching to jiffies calculation if the tsc is unstable? (there's a var for that)
[17:11] <cnd> cking: what do you mean by "it caused me pain", you didn't try this out yet did you?
[17:12] <cking> cnd, I saw it happen on resume once in a few hundred cycles and figuring out what was happening w/o a console caused me pain
[17:13] <cnd> cking, that's just from basing clock on tsc, I'm wondering how likely it is that switching from tsc to jiffies calculation would result in a warping
[17:14] <cking> cnd, good question - no idea. This needs some careful examination of the code
[17:15]  * cking is exhausted of any more clock/TSC/delay knowledge. The devil is in the source code
[17:15] <cking> ;-)
[17:15] <cnd> heh