jjohansen | dyek: no immediate ideas, it should work, I made a note to check on it | 00:02 |
---|---|---|
dyek | jjohansen: OK. Thank you! | 00:03 |
EruditeHermit | hello, is it possible to use the ubuntu kernel build system to build kernel packages and then not start from scratch if I modify a file when building for the 2nd time? | 04:39 |
EruditeHermit | the make-kpkg clean purges all the old work | 04:41 |
BenC | EruditeHermit: if you are using make-kpkg, then you aren't using the ubuntu kernel build system | 04:54 |
EruditeHermit | BenC, I am using the instructions on this page. https://wiki.ubuntu.com/KernelTeam/GitKernelBuild | 04:54 |
EruditeHermit | BenC, could you please advise me as to the best way to build kernels then? | 04:55 |
EruditeHermit | a link to instructions would be enough | 05:02 |
EruditeHermit | hello, could anyone point me to the current method for building kernel packages on Ubuntu? | 05:41 |
jk- | EruditeHermit: check the build section at http://wiki.ubuntu.com/KernelTeam/KernelMaintenance | 05:53 |
jk- | EruditeHermit: but, in general: fakeroot debian/rules binary-arch | 05:53 |
jk- | but it depends what you're trying to do: build using the ubuntu sources, or build an upstream (or other) kernel tree into a .deb ? | 05:57 |
EruditeHermit | jk-, I am trying to build an upstream kernel from git | 05:57 |
EruditeHermit | and make a deb package out of it | 05:57 |
jk- | ok, then make-kpkg is probably the best way then. | 05:58 |
jk- | (as you're doing..) | 05:58 |
EruditeHermit | however | 05:58 |
EruditeHermit | when I patch some of the files | 05:58 |
EruditeHermit | I don't want to rebuild the whole kernel | 05:58 |
EruditeHermit | just that one file | 05:58 |
jk- | then don't do the clean? | 05:59 |
EruditeHermit | hmm I think it complained | 05:59 |
EruditeHermit | but I will try again | 05:59 |
EruditeHermit | it was a while back | 05:59 |
EruditeHermit | so I was doing this | 05:59 |
EruditeHermit | https://wiki.ubuntu.com/KernelTeam/GitKernelBuild | 05:59 |
EruditeHermit | so you are saying I can make changes to the source | 06:00 |
EruditeHermit | and then just do | 06:00 |
EruditeHermit | CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers | 06:00 |
EruditeHermit | and it will work? | 06:00 |
jk- | it should | 06:00 |
EruditeHermit | ok | 06:00 |
EruditeHermit | i will try | 06:00 |
EruditeHermit | will the fakeroot debian/rules binary method also work? | 06:01 |
EruditeHermit | I remember that was a quick way to build non kernel packages | 06:01 |
EruditeHermit | but i've never tried it for the kernel | 06:01 |
EruditeHermit | ah debbuild too | 06:03 |
jk- | use make-kpkg if you're not building from an ubuntu source tree | 06:04 |
jk- | upstream kernels don't have a debian/ directory | 06:04 |
EruditeHermit | yeah | 06:05 |
EruditeHermit | but once they are built once with make-kpkg does it not create the debian dir? | 06:05 |
EruditeHermit | jk-, once they are built once with make-kpkg does it not create the debian dir? | 06:12 |
jk- | yeah, it does the necessary stuff | 06:12 |
EruditeHermit | so after one make-kpkg build can you use the fakeroot debian/rules binary method? | 06:14 |
jk- | EruditeHermit: probably best to stick with one method | 06:23 |
EruditeHermit | ok | 06:24 |
EruditeHermit | you seem to be right | 06:24 |
EruditeHermit | it seems to continue building | 06:24 |
jk- | worst hotel internet ever | 06:32 |
jjohansen | jk-: that bad? | 06:32 |
jjohansen | worst hotel internet I ever had was just about unusable (really slow) and would die every 5 min, and then it was a crapshoot at reacquiring | 06:33 |
jjohansen | it was completely worthless, worse than having no internet because you wasted time trying | 06:34 |
jk- | just keeps losing the route; i seem to have a good connection to the AP | 06:34 |
jk- | true :) | 06:34 |
jk- | i guess it's "worst" because I paid for it this time :) | 06:34 |
jjohansen | ah yeah that sucks | 06:45 |
EruditeHermit | jk-, hey, I get the following error, can you help me rectify it? http://pastebin.com/YwAJMAws | 07:45 |
EruditeHermit | I think it has something to do with the fact that I tried to compile a different branch first | 07:46 |
EruditeHermit | which had the 2.6.29 kernel | 07:46 |
EruditeHermit | but I did do a make-kpkg clean before I tried to compile the new one | 07:46 |
EruditeHermit | jk-, hey are you about? it doesn't recognize that I patched files and therefore doesn't create new packages | 09:56 |
=== _ruben_ is now known as _ruben | ||
=== sconklin-gone is now known as sconklin | ||
tgardner | cnd, not with preemption. what's the wrinkle? | 14:32 |
cnd | tgardner: so there's a bug where a process is scheduling while atomic | 14:33 |
cnd | preempt_count is 0x10000100 | 14:33 |
cnd | meaning 1 soft irq count + PREEMPT_ACTIVE | 14:34 |
cnd | so I'm thinking that a cond_resched was likely called during a softirq or tasklet | 14:34 |
tgardner | cnd, is this in a particular driver? | 14:34 |
cnd | the thing is that the stack trace from the "scheduling while atomic" bug doesn't seem accurate | 14:34 |
cnd | it lists the swapper task | 14:34 |
tgardner | a tasklet can reschedule, but not a softirq | 14:35 |
cnd | and the swapper task is in the middle of the acpi_idle_simple call | 14:35 |
cnd | a tasklet is run on top of a softirq... | 14:35 |
tgardner | what's the one run as a kernel thread called? | 14:35 |
cnd | workqueue? | 14:35 |
tgardner | ok, I always have to go look. | 14:36 |
cnd | yep, work queue, just checked | 14:36 |
tgardner | cnd, dunno, maybe ask smb. I'm a little rusty in that stuff. | 14:36 |
cnd | so, I saw this thread: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01762.html (that's the last email in the thread, read previous replies as necessary) | 14:37 |
cnd | and they used function tracing to figure out what the stack was before the scheduling call | 14:37 |
cnd | but I would have thought that the stack listed in the "scheduling while atomic" bug would have been correct | 14:37 |
cnd | since we use 8K stacks | 14:38 |
tgardner | cnd, good luck with that. I gotta get some other stuff done before I get wound up in that. | 14:38 |
cnd | heh | 14:38 |
cnd | tgardner: if you have deep questions like that, where would you turn to get them answered? | 14:38 |
tgardner | apw, perhaps jj, smb | 14:38 |
* smb has now idea | 14:39 | |
cnd | smb, you have no idea? | 14:39 |
cnd | or you now have an idea :) | 14:40 |
apw | cnd gcc doesn't always record enough information to make stack back tracking possible | 14:40 |
smb | cnd, Oh, yes. That is more it. | 14:40 |
smb | That I have no idea | 14:40 |
cnd | everything makes sense if there's a separate task stack for soft irqs | 14:40 |
apw | though i think if you turn on function tracing in ftrace (a build option) i think all functions end up with frames else you can't function trace them either | 14:40 |
cnd | but that's not how we have our kernel set up | 14:40 |
jjohansen | cnd: _kmalloc may be the one doing cond_resched | 14:40 |
cnd | I'm currently building a kernel with CONFIG_DEBUG_SPINLOCK_SLEEP, which also adds sleep checking while atomic when cond_resched or might_sleep are called | 14:41 |
cnd | that gives us the stack trace of current as opposed to the stack trace of rq->curr given by the "scheduling while atomic" bug | 14:42 |
cnd | so maybe that will give us the stack trace of the offending softirq | 14:42 |
cnd | apw, I can't build a new kernel through fdr binary-generic: | 14:43 |
jjohansen | cnd: why? | 14:43 |
cnd | dpkg-deb - error: Debian revision (`debug') doesn't contain any digits | 14:43 |
apw | cnd how so? | 14:43 |
cnd | dpkg-deb: 1 errors in control file | 14:43 |
cnd | dh_builddeb: dpkg-deb -Zbzip2 -z9 --build debian/linux-image-2.6.32-16-generic .. returned exit code 2 | 14:43 |
apw | <cnd> dpkg-deb - error: Debian revision (`debug') doesn't contain any digits | 14:43 |
apw | that seems pretty clear | 14:43 |
apw | did you add 'debug' to the end of your version? | 14:43 |
cnd | apw, isn't that something you recently changed though | 14:43 |
cnd | oh, I think I know why | 14:43 |
apw | nope nothing i would have done | 14:43 |
cnd | I did ~lpspinlock-debug | 14:44 |
apw | right so the - means two versions | 14:44 |
cnd | apw, I was thinking of your debug package renaming | 14:44 |
apw | nope | 14:44 |
apw | just take out the - don't think that is legal | 14:44 |
cnd | yeah, trying again | 14:44 |
cnd | jjohansen: why what? | 14:45 |
jjohansen | the debug stuff just covered with apw | 14:45 |
cnd | k | 14:45 |
apw | we assume the format of that in the packaging and a - busts everything, don't do that, it hurts | 14:45 |
cnd | oh, I think I get it now | 14:46 |
cnd | well... I need to think some more | 14:46 |
* cnd is still trying to figure out why rq->curr may be a different task than what's running on the processor | 14:47 | |
cnd | apw, I noticed that there isn't an unreleased version at the top of the ubuntu-lucid git tree last night | 14:48 |
cnd | should there be one? (whenever I've pulled to a new version there was one) | 14:48 |
tgardner | cnd, he hasn't done a getabi yet | 14:48 |
apw | cnd quite normal while i am waiting for the builds to complete | 14:48 |
cnd | ahh | 14:48 |
cnd | ok | 14:49 |
apw | just do foo10 instead of ~foo10 | 14:49 |
apw | until the start new release appears | 14:49 |
cnd | ok | 14:49 |
jjohansen | cnd: well is it a different task though? the whole irq, softirq, tasklet stuff runs on top of the current task | 14:55 |
cnd | jjohansen: yeah, that's why I'm confused as to what's going on | 14:55 |
cnd | if the softirq is running on top of rq->curr, we should see a meaningful stack trace in the bug | 14:56 |
cnd | but we aren't | 14:56 |
jjohansen | cnd: the stack trace isn't even always right when not in irq | 14:57 |
cnd | jjohansen: because of the way it's compiled, or because of some kernel internal stuff? | 14:58 |
jjohansen | cnd: both of those, obviously compiling without stackframes makes stack traces a lot less accurate | 14:59 |
jjohansen | but compiler optimizations can delay when you would expect the sp to be updated etc | 15:00 |
jjohansen | the stacktrace should only be considered a best effort | 15:01 |
cnd | I'm guessing that's why the mailing list thread I saw used function tracing to get a more accurate trace | 15:02 |
jjohansen | cnd: yeah | 15:03 |
cnd | jjohansen: ok, I have an idea of how to go about getting definitive information out of the bug I'm working | 15:04 |
cnd | thanks | 15:04 |
_stink_ | anyone know where I can find .debs of previous 2.6.32 versions for lucid? it seems that -16 has broken virtualbox guest additions, and i'd like to go back to -15, but it's not in the repo anymore. | 15:22 |
_stink_ | this install was just done today, so i don't have -15 installed locally and available via grub. | 15:22 |
tankenmate | anyone heard of any problems with the lucid livecd kernel / mdadm / kvm creating hangs / crashes? | 15:27 |
tankenmate | i was trying to install lucid from the livecd onto a md partition (all on a kvm guest machine), and after about 30-240 seconds the kernel just hard crashes.. | 15:28 |
tankenmate | i looked on google / lkml and nothing obvious appeared.. | 15:29 |
apw | kvm crashing, yes if it was 2.6.32-15 kernel | 15:31 |
tankenmate | any suggestions? debootstrap? | 15:32 |
apw | though that sounds like it was sooner than your issue | 15:32 |
apw | and it was resolved in later kernels, which one did you test with | 15:32 |
tankenmate | just a second... i'll get the version, it was from 2010-03-08 iso image | 15:33 |
tankenmate | Linux ubuntu 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:23:29 UTC 2010 x86_64 GNU/Linux | 15:36 |
apw | as that was -15 i would retest with a later image containing the -16 kernel, there were some bad kvm patches in that version | 15:36 |
tankenmate | well the kvm host is running off jaunty, the guest is running lucid.. | 15:37 |
tankenmate | hmmm i'll try with a karmic server install and debootstrap to lucid and see how it goes... | 15:38 |
tankenmate | weird thing is there was no oops, the kvm host didn't complain.. the virt machine just died, no logs, no errors nothing.. | 15:39 |
tankenmate | almost like a triple fault or something... | 15:39 |
cnd | cking: ping | 16:20 |
cking | cnd, yo | 16:21 |
cnd | do you have any thoughts on the TSC issue for non-arrandale hw? | 16:21 |
cking | cnd, like it can be screwed up on them too? | 16:21 |
cnd | cking: bug 535077 | 16:21 |
ubot3 | Malone bug 535077 in linux "WARNING: at /build/buildd/linux-2.6.32/kernel/trace/ring_buffer.c:1984 rb_add_time_stamp+0x20c/0x220()" [Medium,Triaged] https://launchpad.net/bugs/535077 | 16:21 |
cnd | a few dupes of that bug already | 16:21 |
cnd | booting with notsc fixed the issue | 16:22 |
cnd | and the ts in the bug begins with 0xfffffff | 16:22 |
cnd | so it looks very similar to what you found on the arrandale procs | 16:22 |
cking | yep - it's an issue apparently on some CPU's after coming out of S3 - and is listed in some Errata lists | 16:22 |
cnd | cking: so it's not just arrandale? | 16:23 |
cking | cnd, apparently | 16:23 |
cnd | cking: what is the impact of notsc, would it be worth it to just disable tsc by default? | 16:23 |
cking | cnd, I posted a patch in the mailing list that can work around this bug - it stops an overflow | 16:23 |
cnd | cking, even if it stops the overflow, the register still isn't correct right? | 16:24 |
cnd | won't it still cause issues? | 16:24 |
cking | cnd, yep, it still gets screwed. However, it may be worth seeing if a microcode update fixes it - this normally means getting a new BIOS which applies the microcode on boot | 16:24 |
cnd | cking: so what's the impact of notsc, and what if we just disabled it by default? | 16:25 |
cnd | is there some performance impact? | 16:25 |
tgardner | cnd, I think it disables hi-res time of day measurements. | 16:26 |
cking | cnd, I really cannot see much of an impact - but this needs looking into - I was thinking we should disable TSC for L+1 | 16:26 |
cking | tgardner, some I/O delay loop code uses it for sure | 16:26 |
tgardner | cking, disable on all 32 bit platforms ? | 16:26 |
cking | tgardner, not sure - I'd like to know which processors get the problem - it means surveying all the errata sheets :-( | 16:27 |
cking | it may depend on which machines have up-to-date microcode too | 16:27 |
cnd | cking: so your patch will disable the oops message, right? | 16:28 |
cnd | but it's just for the oops message you've been tracking? | 16:29 |
cking | cnd, the patch will fix the overflow, which causes the softlockup code to not produce stupid messages when the TSC warps to 0xffffffffxxxxxxxx | 16:29 |
cking | cnd, nope, I've also been trying to see if there is a fix to the generic TSC warp issue | 16:29 |
cking | (when coming out of S3) | 16:29 |
cnd | cking: ok, so for now what should we be doing when we see bugs that are caused by the TSC warp | 16:30 |
cnd | should I dupe them of some bug you are working on? | 16:30 |
cnd | or just note that it's harmless, but you can boot with notsc if you want to get rid of them? | 16:31 |
cking | cnd, yes, duping it would be useful - and note it's harmless and boot with notsc | 16:31 |
cnd | cking: what bug would you like me to dupe to? | 16:31 |
* cking looks | 16:31 | |
cking | cnd, how about bug 530487 - I addressed that yesterday | 16:33 |
ubot3 | Malone bug 530487 in linux "BUG: soft lockup - CPU#2 stuck for 0s! [firefox-bin:1751] (during suspend/resume)" [Low,Fix released] https://launchpad.net/bugs/530487 | 16:33 |
cnd | cking, ok thanks | 16:33 |
cking | let me modify the title of it to make it a little more generic too | 16:33 |
cking | done | 16:34 |
cking | cnd, since you deal with these more than me, can you get a list of CPUs which get hit by this TSC issue? | 16:35 |
tgardner | cking, I wonder if there is a way to test for this bogosity. Perhaps drive through an S3 state with an alarm wakeup during boot. | 16:35 |
tgardner | that'd just about wreck boot performance :) | 16:36 |
cnd | cking, I can try to make a list, where do you find errata for procs? | 16:36 |
cking | tgardner, and it may wreck the user experience if the machines have a buggy resume | 16:36 |
cking | I suppose we need to see what the ramifications of totally disabling it are - is anything really that dependent on such precise timing? | 16:38 |
tgardner | cking, how about udec delays? | 16:38 |
tgardner | usec* | 16:38 |
cking | tgardner, well, there are the traditional busy loops method, or by looping and checking the TSC to my knowledge - the former is just as valid isn't it? | 16:39 |
cking | my concern is when you have a usec delay on a TSC that warps backwards over a S3 cycle - that could lead to some weirdnesses | 16:40 |
tgardner | cking, its been awhile, but wasn't one of the ways by reading an ISA port (which assumed a constant response time) ? | 16:41 |
cking | tgardner, not sure - things change and I get forgetful on these details | 16:42 |
cnd | cking, do you have a url for an errata for the tsc issue? | 16:56 |
cking | tgardner, the delayed io operations ultimately use __udelay, which can use TSC, or a delay loop. The TSC method does RT pre-emption, so notsc does have some effect on latency issues | 16:58 |
cking | cnd, only for Arrandale CPUs | 16:59 |
cnd | cking: that's fine for now | 16:59 |
cnd | I just would like to see what it says | 16:59 |
tgardner | cking, so the mythtv guys will have a cow since most of them run 32 bit. | 16:59 |
cnd | tgardner: I don't think they will (I'm one of them :) | 17:00 |
cking | tgardner, yep, so I think we disable tsc if it bugs users on the S3 case | 17:00 |
cking | ..but on a need-to-do-so basis rather than a blanket notsc default | 17:00 |
cnd | I've never noticed anything get terribly better when monkeying with higher prio stuff that would do preemption like that I think | 17:00 |
tgardner | cking, agreed | 17:01 |
cnd | cking: what if we just added a check for a huge out of bounds count in native_sched_clock: http://lxr.linux.no/linux+v2.6.33/arch/x86/kernel/tsc.c#L43 | 17:02 |
cking | cnd, well, you know, somebody may want a udelay(1) to be pre-empted because the iodelay is sucking away too many cycles! | 17:02 |
cnd | if so, set tsc_disabled = 1, and get the clock from the jiffies? | 17:02 |
cking | cnd, this scares me, I'm not sure if disabling it like this is a good idea on a fully running system with all CPUs in action | 17:03 |
cking | worth an experiment | 17:03 |
cnd | cking: if the tsc warping isn't causing issues, how is doing that going to cause issues? | 17:04 |
cking | cnd, I've seen it cause issues on Hardy on an Atom - a I/I delay on a hyperthread that got pushed to another CPU got confused when the TSC was skewed | 17:06 |
cnd | I/I delay? | 17:06 |
cnd | cking: if it got confused due to the tsc being off, how could it be worse if the jiffies result is off? | 17:07 |
cking | cnd, jiffies don't generally warp backwards | 17:08 |
cnd | cking: isn't that a good thing in this case? | 17:08 |
cnd | we don't want the clock to warp backwards | 17:09 |
cking | cnd, very true | 17:09 |
* cking needs to think harder | 17:09 | |
cnd | the only issue I can see is when the switch occurs, if the jiffies calculation result would end up in a warp backwards relative to what the tsc result had been giving | 17:10 |
cnd | I don't know what the likelihood of that occurring is | 17:10 |
cking | cnd, it happened - it caused me pain | 17:10 |
cnd | maybe we prevent it from switching to jiffies calculation if the tsc is unstable? (there's a var for that) | 17:10 |
cnd | cking: what do you mean by "it caused me pain", you didn't try this out yet did you? | 17:11 |
cking | cnd, I saw it happen on resume once in a few hundred cycles and figuring out what was happening w/o a console caused me pain | 17:12 |
cnd | cking, that's just from basing clock on tsc, I'm wondering how likely it is that switching from tsc to jiffies calculation would result in a warping | 17:13 |
cking | cnd, good question - no idea. This needs some careful examination of the code | 17:14 |
* cking is exhausted of any more clock/TSC/delay knowledge. The devil is in the source code | 17:15 | |
cking | ;-) | 17:15 |
cnd | heh | 17:15 |
=== bjf is now known as bjf-afk | ||
=== Hedge|Hog is now known as Hedgehog | ||
=== Hedgehog is now known as Guest83664 | ||
=== sconklin is now known as sconklin-away | ||
=== bjf-afk is now known as bjf | ||
=== fabbione is now known as fabbione_vac |
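
Editorial sketch for the 14:33-14:34 exchange, where cnd reads a preempt_count of 0x10000100 as "1 soft irq count + PREEMPT_ACTIVE". The Python below only reproduces that arithmetic. The mask values are assumptions taken from cnd's reading (softirq count in bits 8-15, PREEMPT_ACTIVE at 0x10000000); the real layout depends on kernel version and architecture, so check `include/linux/hardirq.h` in the tree you are debugging. The hardirq bits are left out because their width is not stated in the log.

```python
# Assumed bit layout, inferred from the log rather than from a specific tree.
PREEMPT_MASK = 0x000000FF    # bits 0-7: preempt_disable() nesting depth
SOFTIRQ_MASK = 0x0000FF00    # bits 8-15: softirq nesting count
SOFTIRQ_SHIFT = 8
PREEMPT_ACTIVE = 0x10000000  # flag bit, per cnd's reading of the value


def decode_preempt_count(value):
    """Split a raw preempt_count into the fields discussed in the log."""
    return {
        "preempt_depth": value & PREEMPT_MASK,
        "softirq_count": (value & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT,
        "preempt_active": bool(value & PREEMPT_ACTIVE),
    }


if __name__ == "__main__":
    # The value quoted by cnd at 14:33.
    print(decode_preempt_count(0x10000100))
```

Under these assumed masks, the quoted value decodes to a preempt depth of 0, a softirq count of 1, and the PREEMPT_ACTIVE flag set, which matches cnd's interpretation.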
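
Editorial sketch for the 14:43-14:45 exchange about the `dpkg-deb` error "Debian revision (`debug') doesn't contain any digits". As apw points out, a hyphen splits the version in two: Debian version strings are divided into upstream version and revision at the last hyphen, so appending `~lpspinlock-debug` makes `debug` the revision. The Python below is a rough illustration of that rule, not dpkg's actual parser. The function names and the full version strings are hypothetical; only the `~lpspinlock-debug` suffix and the `foo10` suggestion come from the log.

```python
def debian_revision(version):
    """Return the text after the last hyphen, or None if there is no hyphen."""
    _, sep, revision = version.rpartition("-")
    return revision if sep else None


def revision_has_digit(version):
    """Mimic the check behind the error quoted in the log (an approximation)."""
    revision = debian_revision(version)
    if revision is None:
        return True  # no revision part at all, so nothing to reject here
    return any(ch in "0123456789" for ch in revision)


if __name__ == "__main__":
    # Hypothetical version strings built around the suffixes from the log.
    for candidate in ("2.6.32-16.25~lpspinlock-debug", "2.6.32-16.25foo10"):
        print(candidate, debian_revision(candidate), revision_has_digit(candidate))
```

With the extra hyphen, the revision becomes `debug` and fails the digit check; dropping the hyphen, as apw suggests, keeps the digits inside the revision.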