/srv/irclogs.ubuntu.com/2010/03/10/#ubuntu-kernel.txt

[00:02] <jjohansen> dyek: no immediate ideas, it should work, I made a note to check on it
[00:03] <dyek> jjohansen: OK. Thank you!
[04:39] <EruditeHermit> hello, is it possible to use the ubuntu kernel build system to build kernel packages and then not start from scratch if I modify a file when building for the 2nd time?
[04:41] <EruditeHermit> the make-kpkg clean purges all the old work
[04:54] <BenC> EruditeHermit: if you are using make-kpkg, then you aren't using the ubuntu kernel build system
[04:54] <EruditeHermit> BenC, I am using the instructions on this page. https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[04:55] <EruditeHermit> BenC, could you please advise me as to the best way to build kernels then?
[05:02] <EruditeHermit> a link to instructions would be enough
[05:41] <EruditeHermit> hello, could anyone point me to the current method for building kernel packages on Ubuntu?
[05:53] <jk-> EruditeHermit: check the build section at http://wiki.ubuntu.com/KernelTeam/KernelMaintenance
[05:53] <jk-> EruditeHermit: but, in general: fakeroot debian/rules binary-arch
[05:57] <jk-> but it depends what you're trying to do: build using the ubuntu sources, or build an upstream (or other) kernel tree into a .deb ?
[05:57] <EruditeHermit> jk-, I am trying to build an upstream kernel from git
[05:57] <EruditeHermit> and make a deb package out of it
[05:58] <jk-> ok, then make-kpkg is probably the best way then.
[05:58] <jk-> (as you're doing..)
[05:58] <EruditeHermit> however
[05:58] <EruditeHermit> when I patch some of the files
[05:58] <EruditeHermit> I don't want to rebuild the whole kernel
[05:58] <EruditeHermit> just that one file
[05:59] <jk-> then don't do the clean?
[05:59] <EruditeHermit> hmm I think it complained
[05:59] <EruditeHermit> but I will try again
[05:59] <EruditeHermit> it was a while back
[05:59] <EruditeHermit> so I was doing this
[05:59] <EruditeHermit> https://wiki.ubuntu.com/KernelTeam/GitKernelBuild
[06:00] <EruditeHermit> so you are saying I can make changes to the source
[06:00] <EruditeHermit> and then just do
[06:00] <EruditeHermit> CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers
[06:00] <EruditeHermit> and it will work?
[06:00] <jk-> it should
[06:00] <EruditeHermit> ok
[06:00] <EruditeHermit> i will try
[06:01] <EruditeHermit> will the fakeroot debian/rules binary method also work?
[06:01] <EruditeHermit> I remember that was a quick way to build non kernel packages
[06:01] <EruditeHermit> but i've never tried it for the kernel
[06:03] <EruditeHermit> ah debbuild too
[06:04] <jk-> use make-kpkg if you're not building from an ubuntu source tree
[06:04] <jk-> upstream kernels don't have a debian/ directory
[06:05] <EruditeHermit> yeah
[06:05] <EruditeHermit> but once they are built once with make-kpkg does it not create the debian dir?
[06:12] <EruditeHermit> jk-, once they are built once with make-kpkg does it not create the debian dir?
[06:12] <jk-> yeah, it does the necessary stuff
[06:14] <EruditeHermit> so after one make-kpkg build can you use the fakeroot debian/rules binary method?
[06:23] <jk-> EruditeHermit: probably best to stick with one method
[06:24] <EruditeHermit> ok
[06:24] <EruditeHermit> you seem to be right
[06:24] <EruditeHermit> it seems to continue building
[06:32] <jk-> worst hotel internet ever
[06:32] <jjohansen> jk-: that bad?
[06:33] <jjohansen> worst hotel internet I ever had was just about unusable (really slow) and would die every 5 min, and then it was a crapshoot at reacquiring
[06:34] <jjohansen> it was completely worthless, worse than having no internet because you wasted time trying
[06:34] <jk-> just keeps losing the route; i seem to have a good connection to the AP
[06:34] <jk-> true :)
[06:34] <jk-> i guess it's "worst" because I paid for it this time :)
[06:45] <jjohansen> ah yeah that sucks
[07:45] <EruditeHermit> jk-, hey, I get the following error, can you help me rectify it? http://pastebin.com/YwAJMAws
[07:46] <EruditeHermit> I think it has something to do with the fact that I tried to compile a different branch first
[07:46] <EruditeHermit> which had the 2.6.29 kernel
[07:46] <EruditeHermit> but I did do a make-kpkg clean before I tried to compile the new one
[09:56] <EruditeHermit> jk-, hey are you about? it doesn't recognize that I patched files and therefore doesn't create new packages
=== _ruben_ is now known as _ruben
=== sconklin-gone is now known as sconklin
[14:32] <tgardner> cnd, not with preemption. what's the wrinkle?
[14:33] <cnd> tgardner: so there's a bug where a process is scheduling while atomic
[14:33] <cnd> preempt_count is 0x10000100
[14:34] <cnd> meaning 1 soft irq count + PREEMPT_ACTIVE
[14:34] <cnd> so I'm thinking that a cond_resched was likely called during a softirq or tasklet
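
The reading of preempt_count above can be checked with a small decoder. This is only a sketch: the bit layout (8 preempt bits, 8 softirq bits, 10 hardirq bits, and PREEMPT_ACTIVE at 0x10000000) is an assumption based on the x86 layout of 2.6.32-era kernels, and `decode_preempt_count` is a hypothetical helper, not a kernel function.

```python
# Assumed field layout for a 2.6.32-era x86 kernel (not taken from the log).
PREEMPT_MASK = 0x000000FF    # nested preempt_disable() depth
SOFTIRQ_MASK = 0x0000FF00    # softirq nesting count
HARDIRQ_MASK = 0x03FF0000    # hardirq nesting count
PREEMPT_ACTIVE = 0x10000000  # flag set while a preemption is in progress


def decode_preempt_count(value):
    """Split a raw preempt_count value into its assumed fields."""
    return {
        "preempt": value & PREEMPT_MASK,
        "softirq": (value & SOFTIRQ_MASK) >> 8,
        "hardirq": (value & HARDIRQ_MASK) >> 16,
        "preempt_active": bool(value & PREEMPT_ACTIVE),
    }


if __name__ == "__main__":
    # The value quoted in the log.
    print(decode_preempt_count(0x10000100))
```

Under that layout, 0x10000100 decodes to a softirq count of 1 with PREEMPT_ACTIVE set and nothing else, which matches cnd's interpretation.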
[14:34] <tgardner> cnd, is this in a particular driver?
[14:34] <cnd> the thing is that the stack trace from the "scheduling while atomic" bug doesn't seem accurate
[14:34] <cnd> it lists the swapper task
[14:35] <tgardner> a tasklet can reschedule, but not a softirq
[14:35] <cnd> and the swapper task is in the middle of the acpi_idle_simple call
[14:35] <cnd> a tasklet is run on top of a softirq...
[14:35] <tgardner> what's the one run as a kernel thread called?
[14:35] <cnd> workqueue?
[14:36] <tgardner> ok, I always have to go look.
[14:36] <cnd> yep, work queue, just checked
[14:36] <tgardner> cnd, dunno, maybe ask smb. I'm a little rusty in that stuff.
[14:37] <cnd> so, I saw this thread: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01762.html (that's the last email in the thread, read previous replies as necessary)
[14:37] <cnd> and they used function tracing to figure out what the stack was before the scheduling call
[14:37] <cnd> but I would have thought that the stack listed in the "scheduling while atomic" bug would have been correct
[14:38] <cnd> since we use 8K stacks
[14:38] <tgardner> cnd, good luck with that. I gotta get some other stuff done before I get wound up in that.
[14:38] <cnd> heh
[14:38] <cnd> tgardner: if you have deep questions like that, where would you turn to get them answered?
[14:38] <tgardner> apw, perhaps jj, smb
[14:39] * smb has now idea
[14:39] <cnd> smb, you have no idea?
[14:40] <cnd> or you now have an idea :)
[14:40] <apw> cnd gcc doesn't always record enough information to make stack back tracking possible
[14:40] <smb> cnd, Oh, yes. That is more it.
[14:40] <smb> That I have no idea
[14:40] <cnd> everything makes sense if there's a separate task stack for soft irqs
[14:40] <apw> though i think if you turn on function tracing in ftrace (a build option) i think all functions end up with frames else you can't function trace them either
[14:40] <cnd> but that's not how we have our kernel set up
[14:40] <jjohansen> cnd: _kmalloc may be the one doing cond_resched
[14:41] <cnd> I'm currently building a kernel with CONFIG_DEBUG_SPINLOCK_SLEEP, which also adds sleep checking while atomic when cond_resched or might_sleep are called
[14:42] <cnd> that gives us the stack trace of current as opposed to the stack trace of rq->curr given by the "scheduling while atomic" bug
[14:42] <cnd> so maybe that will give us the stack trace of the offending softirq
[14:43] <cnd> apw, I can't build a new kernel through fdr binary-generic:
[14:43] <jjohansen> cnd: why?
[14:43] <cnd> dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] <apw> cnd how so?
[14:43] <cnd> dpkg-deb: 1 errors in control file
[14:43] <cnd> dh_builddeb: dpkg-deb -Zbzip2 -z9 --build debian/linux-image-2.6.32-16-generic .. returned exit code 2
[14:43] <apw> <cnd> dpkg-deb - error: Debian revision (`debug') doesn't contain any digits
[14:43] <apw> that seems pretty clear
[14:43] <apw> did you add 'debug' to the end of your version?
[14:43] <cnd> apw, isn't that something you recently changed though
[14:43] <cnd> oh, I think I know why
[14:43] <apw> nope nothing i would have done
[14:44] <cnd> I did ~lpspinlock-debug
[14:44] <apw> right so the - means two versions
[14:44] <cnd> apw, I was thinking of your debug package renaming
[14:44] <apw> nope
[14:44] <apw> just take out the -, don't think that is legal
[14:44] <cnd> yeah, trying again
[14:45] <cnd> jjohansen: why what?
[14:45] <jjohansen> the debug stuff just covered with apw
[14:45] <cnd> k
[14:45] <apw> we assume the format of that in the packaging and a - busts everything, don't do that, it hurts
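
Why the extra `-` produces that dpkg-deb error can be shown with a rough model of how a Debian version string is split: the epoch is whatever precedes the first colon, and the revision is whatever follows the last hyphen. The helper name and the full version strings below are hypothetical (the log only gives the `~lpspinlock-debug` suffix), so treat this as a sketch rather than dpkg's actual parser.

```python
def split_debian_version(version):
    """Split [epoch:]upstream[-revision] the way Debian versions are read.

    The revision is everything after the LAST hyphen, so a hyphen inside a
    custom suffix moves the split point.
    """
    epoch, colon, rest = version.partition(":")
    if not colon:
        epoch, rest = "", version
    upstream, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        upstream, revision = rest, ""
    return epoch, upstream, revision


if __name__ == "__main__":
    # Hypothetical version strings built around the suffix from the log.
    for v in ("2.6.32-16.25~lpspinlock-debug", "2.6.32-16.25~lpspinlockdebug"):
        epoch, upstream, revision = split_debian_version(v)
        has_digit = any(c.isdigit() for c in revision)
        print(v, "-> revision:", repr(revision), "has digit:", has_digit)
```

With the hyphenated suffix the revision becomes just `debug`, which has no digits; dropping the hyphen keeps the whole `16.25~...` string as the revision.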
[14:46] <cnd> oh, I think I get it now
[14:46] <cnd> well... I need to think some more
[14:47] * cnd is still trying to figure out why rq->curr may be a different task than what's running on the processor
[14:48] <cnd> apw, I noticed that there isn't an unreleased version at the top of the ubuntu-lucid git tree last night
[14:48] <cnd> should there be one? (whenever I've pulled to a new version there was one)
[14:48] <tgardner> cnd, he hasn't done a getabi yet
[14:48] <apw> cnd quite normal while i am waiting for the builds to complete
[14:48] <cnd> ahh
[14:49] <cnd> ok
[14:49] <apw> just do foo10 instead of ~foo10
[14:49] <apw> until the start new release appears
[14:49] <cnd> ok
[14:55] <jjohansen> cnd: well is it a different task though? the whole irq, softirq, tasklet stuff runs on top of the current task
[14:55] <cnd> jjohansen: yeah, that's why I'm confused as to what's going on
[14:56] <cnd> if the softirq is running on top of rq->curr, we should see a meaningful stack trace in the bug
[14:56] <cnd> but we aren't
[14:57] <jjohansen> cnd: the stack trace isn't even always right when not in irq
[14:58] <cnd> jjohansen: because of the way it's compiled, or because of some kernel internal stuff?
[14:59] <jjohansen> cnd: both of those, obviously compiling without stackframes makes stack traces a lot less accurate
[15:00] <jjohansen> but compiler optimizations can delay when you would expect the sp to be updated etc
[15:01] <jjohansen> the stacktrace should only be considered a best effort
[15:02] <cnd> I'm guessing that's why the mailing list thread I saw used function tracing to get a more accurate trace
[15:03] <jjohansen> cnd: yeah
[15:04] <cnd> jjohansen: ok, I have an idea of how to go about getting definitive information out of the bug I'm working
[15:04] <cnd> thanks
[15:22] <_stink_> anyone know where I can find .debs of previous 2.6.32 versions for lucid? it seems that -16 has broken virtualbox guest additions, and i'd like to go back to -15, but it's not in the repo anymore.
[15:22] <_stink_> this install was just done today, so i don't have -15 installed locally and available via grub.
[15:27] <tankenmate> anyone heard of any problems with the lucid livecd kernel / mdadm / kvm creating hangs / crashes?
[15:28] <tankenmate> i was trying to install lucid from the livecd onto a md partition (all on a kvm guest machine), and after about 30-240 seconds the kernel just hard crashes..
[15:29] <tankenmate> i looked on google / lkml and nothing obvious appeared..
[15:31] <apw> kvm crashing, yes if it was 2.6.32-15 kernel
[15:32] <tankenmate> any suggestions? debootstrap?
[15:32] <apw> though that sounds like it was sooner than your issue
[15:32] <apw> and it was resolved in later kernels, which one did you test with
[15:33] <tankenmate> just a second... i'll get the version, it was from 2010-03-08 iso image
[15:36] <tankenmate> Linux ubuntu 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:23:29 UTC 2010 x86_64 GNU/Linux
[15:36] <apw> as that was -15 i would retest with a later image containing the -16 kernel, there were some bad kvm patches in that version
[15:37] <tankenmate> well the kvm host is running off jaunty, the guest is running lucid..
[15:38] <tankenmate> hmmm i'll try with a karmic server install and debootstrap to lucid and see how it goes...
[15:39] <tankenmate> weird thing is there was no oops, the kvm host didn't complain.. the virt machine just died, no logs, no errors nothing..
[15:39] <tankenmate> almost like a triple fault or something...
[16:20] <cnd> cking: ping
[16:21] <cking> cnd, yo
[16:21] <cnd> do you have any thoughts on the TSC issue for non-arrandale hw?
[16:21] <cking> cnd, like it can be screwed up on them too?
[16:21] <cnd> cking: bug 535077
[16:21] <ubot3> Malone bug 535077 in linux "WARNING: at /build/buildd/linux-2.6.32/kernel/trace/ring_buffer.c:1984 rb_add_time_stamp+0x20c/0x220()" [Medium,Triaged] https://launchpad.net/bugs/535077
[16:21] <cnd> a few dupes of that bug already
[16:22] <cnd> booting with notsc fixed the issue
[16:22] <cnd> and the ts in the bug begins with 0xfffffff
[16:22] <cnd> so it looks very similar to what you found on the arrandale procs
[16:22] <cking> yep - it's an issue apparently on some CPUs after coming out of S3 - and is listed in some Errata lists
[16:23] <cnd> cking: so it's not just arrandale?
[16:23] <cking> cnd, apparently
[16:23] <cnd> cking: what is the impact of notsc, would it be worth it to just disable tsc by default?
[16:23] <cking> cnd, I posted a patch in the mailing list that can work around this bug - it stops an overflow
[16:24] <cnd> cking, even if it stops the overflow, the register still isn't correct right?
[16:24] <cnd> won't it still cause issues?
[16:24] <cking> cnd, yep, it still gets screwed. However, it may be worth seeing if a microcode update fixes it - this normally means getting a new BIOS which applies the microcode on boot
[16:25] <cnd> cking: so what's the impact of notsc, and what if we just disabled it by default?
[16:25] <cnd> is there some performance impact?
[16:26] <tgardner> cnd, I think it disables hi-res time of day measurements.
[16:26] <cking> cnd, I really cannot see much of an impact - but this needs looking into - I was thinking we should disable TSC for L+1
[16:26] <cking> tgardner, some I/O delay loop code uses it for sure
[16:26] <tgardner> cking, disable on all 32 bit platforms ?
[16:27] <cking> tgardner, not sure - I'd like to know which processors get the problem - it means surveying all the errata sheets :-(
[16:27] <cking> it may depend on which machines have up-to-date microcode too
[16:28] <cnd> cking: so your patch will disable the oops message, right?
[16:29] <cnd> but it's just for the oops message you've been tracking?
[16:29] <cking> cnd, the patch will fix the overflow, which causes the softlockup code to not produce stupid messages when the TSC warps to 0xffffffffxxxxxxxx
[16:29] <cking> cnd, nope, I've also been trying to see if there is a fix to the generic TSC warp issue
[16:29] <cking> (when coming out of S3)
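
The kind of overflow described above can be illustrated in user space. This is not cking's patch and not the kernel's softlockup code; it is a hedged sketch that assumes a 32-bit unsigned timestamp, and the names `naive_lockup_check`, `wrap_safe_lockup_check`, and `THRESH` are made up for the example. When the stored timestamp sits near the top of its range, adding the threshold wraps around to a small number, so an "is now past the deadline" test fires spuriously.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit unsigned timestamp width
THRESH = 60          # hypothetical lockup threshold, in seconds


def naive_lockup_check(now, touch_ts, thresh=THRESH):
    """Deadline computed by addition; the sum can wrap past 2**32."""
    deadline = (touch_ts + thresh) & MASK32
    return now > deadline


def wrap_safe_lockup_check(now, touch_ts, thresh=THRESH):
    """Elapsed time computed by modular subtraction, which survives the wrap."""
    elapsed = (now - touch_ts) & MASK32
    return elapsed > thresh


if __name__ == "__main__":
    # Timestamps near the top of the range, only 5 seconds apart.
    touch_ts, now = 0xFFFFFFF0, 0xFFFFFFF5
    print("naive:", naive_lockup_check(now, touch_ts))
    print("wrap-safe:", wrap_safe_lockup_check(now, touch_ts))
```

Only 5 seconds have elapsed in that example, yet the addition-based check reports a lockup while the subtraction-based one does not.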
[16:30] <cnd> cking: ok, so for now what should we be doing when we see bugs that are caused by the TSC warp
[16:30] <cnd> should I dupe them of some bug you are working on?
[16:31] <cnd> or just note that it's harmless, but you can boot with notsc if you want to get rid of them?
[16:31] <cking> cnd, yes, duping it would be useful - and note it's harmless and boot with notsc
[16:31] <cnd> cking: what bug would you like me to dupe to?
[16:31] * cking looks
[16:33] <cking> cnd, how about bug 530487 - I addressed that yesterday
[16:33] <ubot3> Malone bug 530487 in linux "BUG: soft lockup - CPU#2 stuck for 0s! [firefox-bin:1751] (during suspend/resume)" [Low,Fix released] https://launchpad.net/bugs/530487
[16:33] <cnd> cking, ok thanks
[16:33] <cking> let me modify the title of it to make it a little more generic too
[16:34] <cking> done
[16:35] <cking> cnd, since you deal with these more than me, can you get a list of CPUs which get hit by this TSC issue?
[16:35] <tgardner> cking, I wonder if there is a way to test for this bogosity. Perhaps drive through an S3 state with an alarm wakeup during boot.
[16:36] <tgardner> that'd just about wreck boot performance :)
[16:36] <cnd> cking, I can try to make a list, where do you find errata for procs?
[16:36] <cking> tgardner, and it may wreck the user experience if the machines have a buggy resume
[16:38] <cking> I suppose we need to see what the ramifications of totally disabling it are - is anything really that dependent on such precise timing?
[16:38] <tgardner> cking, how about udec delays?
[16:38] <tgardner> usec*
[16:39] <cking> tgardner, well, there are the traditional busy loops method, or by looping and checking the TSC to my knowledge - the former is just as valid isn't it?
[16:40] <cking> my concern is when you have a usec delay on a TSC that warps backwards over a S3 cycle - that could lead to some weirdnesses
[16:41] <tgardner> cking, it's been a while, but wasn't one of the ways by reading an ISA port (which assumed a constant response time) ?
[16:42] <cking> tgardner, not sure - things change and I get forgetful on these details
[16:56] <cnd> cking, do you have a url for an errata for the tsc issue?
[16:58] <cking> tgardner, the delayed io operations ultimately use __udelay, which can use TSC, or a delay loop. The TSC method does RT pre-emption, so notsc does have some effect on latency issues
[16:59] <cking> cnd, only for Arrandale CPUs
[16:59] <cnd> cking: that's fine for now
[16:59] <cnd> I just would like to see what it says
[16:59] <tgardner> cking, so the mythtv guys will have a cow since most of them run 32 bit.
[17:00] <cnd> tgardner: I don't think they will (I'm one of them :)
[17:00] <cking> tgardner, yep, so I think we disable tsc if it bugs users on the S3 case
[17:00] <cking> ..but on a need-to-do-so basis rather than a blanket notsc default
[17:00] <cnd> I've never noticed anything get terribly better when monkeying with higher prio stuff that would do preemption like that I think
[17:01] <tgardner> cking, agreed
[17:02] <cnd> cking: what if we just added a check for a huge out of bounds count in native_sched_clock: http://lxr.linux.no/linux+v2.6.33/arch/x86/kernel/tsc.c#L43
[17:02] <cking> cnd, well, you know, somebody may want a udelay(1) to be pre-empted because the iodelay is sucking away too many cycles!
[17:02] <cnd> if so, set tsc_disabled = 1, and get the clock from the jiffies?
[17:03] <cking> cnd, this scares me, I'm not sure if disabling it like this is a good idea on a fully running system with all CPUs in action
[17:03] <cking> worth an experiment
[17:04] <cnd> cking: if the tsc warping isn't causing issues, how is doing that going to cause issues?
[17:06] <cking> cnd, I've seen it cause issues on Hardy on an Atom - a I/I delay on a hyperthread that got pushed to another CPU got confused when the TSC was skewed
[17:06] <cnd> I/I delay?
[17:07] <cnd> cking: if it got confused due to the tsc being off, how could it be worse if the jiffies result is off?
[17:08] <cking> cnd, jiffies don't generally warp backwards
[17:08] <cnd> cking: isn't that a good thing in this case?
[17:09] <cnd> we don't want the clock to warp backwards
[17:09] <cking> cnd, very true
[17:09] * cking needs to think harder
[17:10] <cnd> the only issue I can see is when the switch occurs, if the jiffies calculation result would end up in a warp backwards relative to what the tsc result had been giving
[17:10] <cnd> I don't know what the likelihood of that occurring is
[17:10] <cking> cnd, it happened - it caused me pain
[17:10] <cnd> maybe we prevent it from switching to jiffies calculation if the tsc is unstable? (there's a var for that)
[17:11] <cnd> cking: what do you mean by "it caused me pain", you didn't try this out yet did you?
[17:12] <cking> cnd, I saw it happen on resume once in a few hundred cycles and figuring out what was happening w/o a console caused me pain
[17:13] <cnd> cking, that's just from basing clock on tsc, I'm wondering how likely it is that switching from tsc to jiffies calculation would result in a warping
[17:14] <cking> cnd, good question - no idea. This needs some careful examination of the code
[17:15] * cking is exhausted of any more clock/TSC/delay knowledge. The devil is in the source code
[17:15] <cking> ;-)
[17:15] <cnd> heh
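
The idea floated in this exchange (detect an out-of-bounds TSC reading, set a disabled flag, fall back to a jiffies-derived clock) and the worry about a backwards step at the moment of the switch can be modelled outside the kernel. Everything below is a hypothetical user-space model: `SchedClockModel`, `max_step_ns`, and the clamping are assumptions made for illustration, not what native_sched_clock does or what anyone in the channel implemented.

```python
class SchedClockModel:
    """Toy model: prefer the TSC, fall back to jiffies after a wild jump."""

    def __init__(self, max_step_ns):
        self.max_step_ns = max_step_ns  # largest plausible gap between reads
        self.tsc_disabled = False       # mirrors the flag named in the log
        self.last_ns = 0                # last value handed out

    def read(self, tsc_ns, jiffies_ns):
        if not self.tsc_disabled:
            step = tsc_ns - self.last_ns
            # A backwards step or an implausibly large forward step is
            # treated as a warp; stop trusting the TSC from here on.
            if step < 0 or step > self.max_step_ns:
                self.tsc_disabled = True
        candidate = jiffies_ns if self.tsc_disabled else tsc_ns
        # Clamp so the switch to a coarser source never goes backwards.
        self.last_ns = max(self.last_ns, candidate)
        return self.last_ns


if __name__ == "__main__":
    clock = SchedClockModel(max_step_ns=1_000_000_000)
    print(clock.read(100, 90))
    print(clock.read(200, 180))
    # TSC warps to a huge value; jiffies-based time lags slightly behind.
    print(clock.read(0xFFFFFFFF00000000, 190))
    print(clock.read(0xFFFFFFFF00000100, 260))
```

The clamp is one possible answer to the backwards-warp concern: the returned value holds still until the jiffies-based time catches up. Whether holding still is acceptable for real callers is exactly the open question in the discussion.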
=== bjf is now known as bjf-afk
=== Hedge|Hog is now known as Hedgehog
=== Hedgehog is now known as Guest83664
=== sconklin is now known as sconklin-away
=== bjf-afk is now known as bjf
=== fabbione is now known as fabbione_vac
