/srv/irclogs.ubuntu.com/2010/03/10/#ubuntu-kernel.txt

jjohansen	dyek: no immediate ideas, it should work, I made a note to check on it	00:02
dyek	jjohansen: OK. Thank you!	00:03
EruditeHermit	hello, is it possible to use the ubuntu kernel build system to build kernel packages and then not start from scratch if I modify a file when building for the 2nd time?	04:39
EruditeHermit	the make-kpkg clean purges all the old work	04:41
BenC	EruditeHermit: if you are using make-kpkg, then you aren't using the ubuntu kernel build system	04:54
EruditeHermit	BenC, I am using the instructions on this page. https://wiki.ubuntu.com/KernelTeam/GitKernelBuild	04:54
EruditeHermit	BenC, could you please advise me as to the best way to build kernels then?	04:55
EruditeHermit	a link to instructions would be enough	05:02
EruditeHermit	hello, could anyone point me to the current method for building kernel packages on Ubuntu?	05:41
jk-	EruditeHermit: check the build section at http://wiki.ubuntu.com/KernelTeam/KernelMaintenance	05:53
jk-	EruditeHermit: but, in general: fakeroot debian/rules binary-arch	05:53
jk-	but it depends what you're trying to do: build using the ubuntu sources, or build an upstream (or other) kernel tree into a .deb ?	05:57
EruditeHermit	jk-, I am trying to build an upstream kernel from git	05:57
EruditeHermit	and make a deb package out of it	05:57
jk-	ok, then make-kpkg is probably the best way then.	05:58
jk-	(as you're doing..)	05:58
EruditeHermit	however	05:58
EruditeHermit	when I patch some of the files	05:58
EruditeHermit	I don't want to rebuild the whole kernel	05:58
EruditeHermit	just that one file	05:58
jk-	then don't do the clean?	05:59
EruditeHermit	hmm I think it complained	05:59
EruditeHermit	but I will try again	05:59
EruditeHermit	it was a while back	05:59
EruditeHermit	so I was doing this	05:59
EruditeHermit	https://wiki.ubuntu.com/KernelTeam/GitKernelBuild	05:59
EruditeHermit	so you are saying I can make changes to the source	06:00
EruditeHermit	and then just do	06:00
EruditeHermit	CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN` fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers	06:00
EruditeHermit	and it will work?	06:00
jk-	it should	06:00
EruditeHermit	ok	06:00
EruditeHermit	i will try	06:00
EruditeHermit	will the fakeroot debian/rules binary method also work?	06:01
EruditeHermit	I remember that was a quick way to build non kernel packages	06:01
EruditeHermit	but i've never tried it for the kernel	06:01
EruditeHermit	ah debbuild too	06:03
jk-	use make-kpkg if you're not building from an ubuntu source tree	06:04
jk-	upstream kernels don't have a debian/ directory	06:04
EruditeHermit	yeah	06:05
EruditeHermit	but once they are built once with make-kpkg does it not create the debian dir?	06:05
EruditeHermit	jk-, once they are built once with make-kpkg does it not create the debian dir?	06:12
jk-	yeah, it does the necessary stuff	06:12
EruditeHermit	so after one make-kpkg build can you use the fakeroot debian/rules binary method?	06:14
jk-	EruditeHermit: probably best to stick with one method	06:23
EruditeHermit	ok	06:24
EruditeHermit	you seem to be right	06:24
EruditeHermit	it seems to continue building	06:24
jk-	worst hotel internet ever	06:32
jjohansen	jk-: that bad?	06:32
jjohansen	worst hotel internet I ever had was just about unusable (really slow) and would die every 5 min, and then it was a crapshoot at reacquiring	06:33
jjohansen	it was completely worthless, worse than having no internet because you wasted time trying	06:34
jk-	just keeps losing the route; i seem to have a good connection to the AP	06:34
jk-	true :)	06:34
jk-	i guess it's "worst" because I paid for it this time :)	06:34
jjohansen	ah yeah that sucks	06:45
EruditeHermit	jk-, hey, I get the following error, can you help me rectify it? http://pastebin.com/YwAJMAws	07:45
EruditeHermit	I think it has something to do with the fact that I tried to compile a different branch first	07:46
EruditeHermit	which had the 2.6.29 kernel	07:46
EruditeHermit	but I did do a make-kpkg clean before I tried to compile the new one	07:46
EruditeHermit	jk-, hey are you about? it doesn't recognize that I patched files and therefore doesn't create new packages	09:56
=== _ruben_ is now known as _ruben
=== sconklin-gone is now known as sconklin
tgardner	cnd, not with preemption. what's the wrinkle?	14:32
cnd	tgardner: so there's a bug where a process is scheduling while atomic	14:33
cnd	preempt_count is 0x10000100	14:33
cnd	meaning 1 soft irq count + PREEMPT_ACTIVE	14:34
cnd	so I'm thinking that a cond_resched was likely called during a softirq or tasklet	14:34
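A minimal sketch of the decoding cnd describes, for readers following along. The bit layout here (8 preempt bits, 8 softirq bits, 10 hardirq bits, PREEMPT_ACTIVE at 0x10000000) is an assumption modelled on 2.6.32-era x86 kernels, and the helper names are hypothetical; check your own tree's headers before relying on it.

```python
# Illustrative decode of a preempt_count value.
# ASSUMPTION: field widths and PREEMPT_ACTIVE below follow a 2.6.32-era x86
# layout; they are not taken from the chat and may differ in other kernels.
PREEMPT_BITS = 8
SOFTIRQ_BITS = 8
HARDIRQ_BITS = 10  # assumed default width

PREEMPT_SHIFT = 0
SOFTIRQ_SHIFT = PREEMPT_SHIFT + PREEMPT_BITS
HARDIRQ_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS

PREEMPT_ACTIVE = 0x10000000  # assumed x86 value


def field(value, shift, bits):
    """Extract a bit field of the given width starting at the given shift."""
    return (value >> shift) & ((1 << bits) - 1)


def decode_preempt_count(value):
    """Split a preempt_count value into its assumed component counters."""
    return {
        "preempt": field(value, PREEMPT_SHIFT, PREEMPT_BITS),
        "softirq": field(value, SOFTIRQ_SHIFT, SOFTIRQ_BITS),
        "hardirq": field(value, HARDIRQ_SHIFT, HARDIRQ_BITS),
        "preempt_active": bool(value & PREEMPT_ACTIVE),
    }


# The value quoted in the channel.
decoded = decode_preempt_count(0x10000100)
print(decoded)
```

Under those assumptions the quoted value decodes to a softirq count of 1 with PREEMPT_ACTIVE set and nothing else, which matches cnd's reading.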
tgardner	cnd, is this in a particular driver?	14:34
cnd	the thing is that the stack trace from the "scheduling while atomic" bug doesn't seem accurate	14:34
cnd	it lists the swapper task	14:34
tgardner	a tasklet can reschedule, but not a softirq	14:35
cnd	and the swapper task is in the middle of the acpi_idle_simple call	14:35
cnd	a tasklet is run on top of a softirq...	14:35
tgardner	what's the one run as a kernel thread called?	14:35
cnd	workqueue?	14:35
tgardner	ok, I always have to go look.	14:36
cnd	yep, work queue, just checked	14:36
tgardner	cnd, dunno, maybe ask smb. I'm a little rusty in that stuff.	14:36
cnd	so, I saw this thread: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01762.html (that's the last email in the thread, read previous replies as necessary)	14:37
cnd	and they used function tracing to figure out what the stack was before the scheduling call	14:37
cnd	but I would have thought that the stack listed in the "scheduling while atomic" bug would have been correct	14:37
cnd	since we use 8K stacks	14:38
tgardner	cnd, good luck with that. I gotta get some other stuff done before I get wound up in that.	14:38
cnd	heh	14:38
cnd	tgardner: if you have deep questions like that, where would you turn to get them answered?	14:38
tgardner	apw, perhaps jj, smb	14:38
* smb has now idea		14:39
cnd	smb, you have no idea?	14:39
cnd	or you now have an idea :)	14:40
apw	cnd gcc doesn't always record enough information to make stack back tracking possible	14:40
smb	cnd, Oh, yes. That is more it.	14:40
smb	That I have no idea	14:40
cnd	everything makes sense if there's a separate task stack for soft irqs	14:40
apw	though i think if you turn on function tracing in ftrace (a build option) i think all functions end up with frames else you can't function trace them either	14:40
cnd	but that's not how we have our kernel set up	14:40
jjohansen	cnd: _kmalloc may be the one doing cond_resched	14:40
cnd	I'm currently building a kernel with CONFIG_DEBUG_SPINLOCK_SLEEP, which also adds sleep checking while atomic when cond_resched or might_sleep are called	14:41
cnd	that gives us the stack trace of current as opposed to the stack trace of rq->curr given by the "scheduling while atomic" bug	14:42
cnd	so maybe that will give us the stack trace of the offending softirq	14:42
cnd	apw, I can't build a new kernel through fdr binary-generic:	14:43
jjohansen	cnd: why?	14:43
cnd	dpkg-deb - error: Debian revision (`debug') doesn't contain any digits	14:43
apw	cnd how so?	14:43
cnd	dpkg-deb: 1 errors in control file	14:43
cnd	dh_builddeb: dpkg-deb -Zbzip2 -z9 --build debian/linux-image-2.6.32-16-generic .. returned exit code 2	14:43
apw	<cnd> dpkg-deb - error: Debian revision (`debug') doesn't contain any digits	14:43
apw	that seems pretty clear	14:43
apw	did you add 'debug' to the end of your version?	14:43
cnd	apw, isn't that something you recently changed though	14:43
cnd	oh, I think I know why	14:43
apw	nope nothing i would have done	14:43
cnd	I did ~lpspinlock-debug	14:44
apw	right so the - means two versions	14:44
cnd	apw, I was thinking of your debug package renaming	14:44
apw	nope	14:44
apw	just take out the - don't think that is legal	14:44
cnd	yeah, trying again	14:44
cnd	jjohansen: why what?	14:45
jjohansen	the debug stuff just covered with apw	14:45
cnd	k	14:45
apw	we assume the format of that in the packaging and a - busts everything, don't do that, it hurts	14:45
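A rough sketch of why the extra "-" trips dpkg-deb: in a Debian version string the revision is whatever follows the last hyphen, so a suffix like ~lpspinlock-debug leaves "debug" as the revision. The function names and the example version numbers are hypothetical (the full version string was not pasted in the channel); only the error wording comes from the log.

```python
def split_debian_version(version):
    """Split [epoch:]upstream[-revision] at the LAST hyphen, ignoring the epoch."""
    _epoch, colon, rest = version.partition(":")
    if not colon:
        rest = version  # no epoch present
    upstream, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        return rest, ""  # native version, no revision part
    return upstream, revision


def revision_problem(version):
    """Return an error string if the revision has no digits, else None."""
    _upstream, revision = split_debian_version(version)
    if revision and not any(ch.isdigit() for ch in revision):
        return "Debian revision (`%s') doesn't contain any digits" % revision
    return None


# HYPOTHETICAL version strings, made up for illustration.
bad = "2.6.32-16.25~lpspinlock-debug"
good = "2.6.32-16.25~lpspinlockdebug"

print(revision_problem(bad))
print(revision_problem(good))
```

Dropping the hyphen, as apw suggests, keeps the whole suffix inside a revision that still contains digits.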
cnd	oh, I think I get it now	14:46
cnd	well... I need to think some more	14:46
* cnd is still trying to figure out why rq->curr may be a different task than what's running on the processor		14:47
cnd	apw, I noticed that there isn't an unreleased version at the top of the ubuntu-lucid git tree last night	14:48
cnd	should there be one? (whenever I've pulled to a new version there was one)	14:48
tgardner	cnd, he hasn't done a getabi yet	14:48
apw	cnd quite normal while i am waiting for the builds to complete	14:48
cnd	ahh	14:48
cnd	ok	14:49
apw	just do foo10 instead of ~foo10	14:49
apw	until the start new release appears	14:49
cnd	ok	14:49
jjohansen	cnd: well is it a different task though? the whole irq, softirq, tasklet stuff runs on top of the current task,	14:55
cnd	jjohansen: yeah, that's why I'm confused as to what's going on	14:55
cnd	if the softirq is running on top of rq->curr, we should see a meaningful stack trace in the bug	14:56
cnd	but we aren't	14:56
jjohansen	cnd: the stack trace isn't even always right when not in irq	14:57
cnd	jjohansen: because of the way it's compiled, or because of some kernel internal stuff?	14:58
jjohansen	cnd: both of those, obviously compiling without stackframes makes stack traces a lot less accurate	14:59
jjohansen	but compiler optimizations can delay when you would expect the sp to be updated etc	15:00
jjohansen	the stacktrace should only be considered a best effort	15:01
cnd	I'm guessing that's why the mailing list thread I saw used function tracing to get a more accurate trace	15:02
jjohansen	cnd: yeah	15:03
cnd	jjohansen: ok, I have an idea of how to go about getting definitive information out of the bug I'm working	15:04
cnd	thanks	15:04
_stink_	anyone know where I can find .debs of previous 2.6.32 versions for lucid? it seems that -16 has broken virtualbox guest additions, and i'd like to go back to -15, but it's not in the repo anymore.	15:22
_stink_	this install was just done today, so i don't have -15 installed locally and available via grub.	15:22
tankenmate	anyone heard of any problems with the lucid livecd kernel / mdadm / kvm creating hangs / crashes?	15:27
tankenmate	i was trying to install lucid from the livecd onto a md partition (all on a kvm guest machine), and after about 30-240 seconds the kernel just hard crashes..	15:28
tankenmate	i looked on google / lkml and nothing obvious appeared..	15:29
apw	kvm crashing, yes if it was 2.6.32-15 kernel	15:31
tankenmate	any suggestions? debootstrap?	15:32
apw	though that sounds like it was sooner than your issue	15:32
apw	and it was resolved in later kernels, which one did you test with	15:32
tankenmate	just a second... i'll get the version, it was from 2010-03-08 iso image	15:33
tankenmate	Linux ubuntu 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:23:29 UTC 2010 x86_64 GNU/Linux	15:36
apw	as that was -15 i would retest with a later image containing the -16 kernel, there were some bad kvm patches in that version	15:36
tankenmate	well the kvm host is running off jaunty, the guest is running lucid..	15:37
tankenmate	hmmm i'll try with a karmic server install and debootstrap to lucid and see how it goes...	15:38
tankenmate	weird thing is there was no oops, the kvm host didn't complain.. the virt machine just died, no logs, no errors nothing..	15:39
tankenmate	almost like a triple fault or something...	15:39
cnd	cking: ping	16:20
cking	cnd, yo	16:21
cnd	do you have any thoughts on the TSC issue for non-arrandale hw?	16:21
cking	cnd, like it can be screwed up on them too?	16:21
cnd	cking: bug 535077	16:21
ubot3	Malone bug 535077 in linux "WARNING: at /build/buildd/linux-2.6.32/kernel/trace/ring_buffer.c:1984 rb_add_time_stamp+0x20c/0x220()" [Medium,Triaged] https://launchpad.net/bugs/535077	16:21
cnd	a few dupes of that bug already	16:21
cnd	booting with notsc fixed the issue	16:22
cnd	and the ts in the bug begins with 0xfffffff	16:22
cnd	so it looks very similar to what you found on the arrandale procs	16:22
cking	yep - it's an issue apparently on some CPU's after coming out of S3 - and is listed in some Errata lists	16:22
cnd	cking: so it's not just arrandale?	16:23
cking	cnd, apparently	16:23
cnd	cking: what is the impact of notsc, would it be worth it to just disable tsc by default?	16:23
cking	cnd, I posted a patch in the mailing list that can work around this bug - it stops an overflow	16:23
cnd	cking, even if it stops the overflow, the register still isn't correct right?	16:24
cnd	won't it still cause issues?	16:24
cking	cnd, yep, it still gets screwed. However, it may be worth seeing if a microcode update fixes it - this normally means getting a new BIOS which applies the microcode on boot	16:24
cnd	cking: so what's the impact of notsc, and what if we just disabled it by default?	16:25
cnd	is there some performance impact?	16:25
tgardner	cnd, I think it disables hi-res time of day measurements.	16:26
cking	cnd, I really cannot see much of an impact - but this needs looking into - I was thinking we should disable TSC for L+1	16:26
cking	tgardner, some I/O delay loop code uses it for sure	16:26
tgardner	cking, disable on all 32 bit platforms ?	16:26
cking	tgardner, not sure - I'd like to know which processors get the problem - it means surveying all the errata sheets :-(	16:27
cking	it may depend on which machines have up to date microcode too	16:27
cnd	cking: so your patch will disable the oops message, right?	16:28
cnd	but it's just for the oops message you've been tracking?	16:29
cking	cnd, the patch will fix the overflow, which causes the softlockup code to not produce stupid messages when the TSC warps to 0xffffffffxxxxxxxx	16:29
cking	cnd, nope, I've also been trying to see if there is a fix to the generic TSC warp issue	16:29
cking	(when coming out of S3)	16:29
cnd	cking: ok, so for now what should we be doing when we see bugs that are caused by the TSC warp	16:30
cnd	should I dupe them of some bug you are working on?	16:30
cnd	or just note that it's harmless, but you can boot with notsc if you want to get rid of them?	16:31
cking	cnd, yes, duping it would be useful - and note it's harmless and boot with notsc	16:31
cnd	cking: what bug would you like me to dupe to?	16:31
* cking looks		16:31
cking	cnd, how about bug 530487 - I addressed that yesterday	16:33
ubot3	Malone bug 530487 in linux "BUG: soft lockup - CPU#2 stuck for 0s! [firefox-bin:1751] (during suspend/resume)" [Low,Fix released] https://launchpad.net/bugs/530487	16:33
cnd	cking, ok thanks	16:33
cking	let me modify the title of it to make it a little more generic too	16:33
cking	done	16:34
cking	cnd, since you deal with these more than me, can you get a list of CPUs which get hit by this TSC issue?	16:35
tgardner	cking, I wonder if there is a way to test for this bogosity. Perhaps drive through an S3 state with an alarm wakeup during boot.	16:35
tgardner	that'd just about wreck boot performance :)	16:36
cnd	cking, I can try to make a list, where do you find errata for procs?	16:36
cking	tgardner, and it may wreck the user experience if the machines have a buggy resume	16:36
cking	I suppose we need to see what the ramifications of totally disabling it are - is anything really that dependent on such precise timing?	16:38
tgardner	cking, how about udec delays?	16:38
tgardner	usec*	16:38
cking	tgardner, well, there are the traditional busy loops method, or by looping and checking the TSC to my knowledge - the former is just as valid isn't it?	16:39
cking	my concern is when you have a usec delay on a TSC that warps backwards over a S3 cycle - that could lead to some weirdnesses	16:40
tgardner	cking, its been awhile, but wasn't one of the ways by reading an ISA port (which assumed a constant response time) ?	16:41
cking	tgardner, not sure - things change and I get forgetful on these details	16:42
cnd	cking, do you have a url for an errata for the tsc issue?	16:56
cking	tgardner, the delayed io operations ultimately use __udelay, which can use TSC, or a delay loop. The TSC method does RT pre-emption, so notsc does have some effect on latency issues	16:58
cking	cnd, only for Arrandale CPUs	16:59
cnd	cking: that's fine for now	16:59
cnd	I just would like to see what it says	16:59
tgardner	cking, so the mythtv guys will have a cow since most of them run 32 bit.	16:59
cnd	tgardner: I don't think they will (I'm one of them :)	17:00
cking	tgardner, yep, so I think we disable tsc if it bugs users on the S3 case	17:00
cking	..but on a need-to-do-so basis rather than a blanket notsc default	17:00
cnd	I've never noticed anything get terribly better when monkeying with higher prio stuff that would do preemption like that I think	17:00
tgardner	cking, agreed	17:01
cnd	cking: what if we just added a check for a huge out of bounds count in native_sched_clock: http://lxr.linux.no/linux+v2.6.33/arch/x86/kernel/tsc.c#L43	17:02
cking	cnd, well, you know, somebody may want a udelay(1) to be pre-empted because the iodelay is sucking away too many cycles!	17:02
cnd	if so, set tsc_disabled = 1, and get the clock from the jiffies?	17:02
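A user-space model of the idea cnd floats here, not kernel code: treat a TSC reading whose upper 32 bits are all ones (the 0xffffffffxxxxxxxx pattern cking mentions) as warped, set tsc_disabled, and derive the clock from jiffies from then on. Every name except tsc_disabled is hypothetical, and the cycles-to-nanoseconds conversion is passed in rather than modelled.

```python
NSEC_PER_SEC = 1_000_000_000
WARP_PREFIX = 0xFFFFFFFF  # upper 32 bits seen in the warped readings


def tsc_looks_warped(tsc):
    """True if the upper 32 bits of a 64-bit TSC reading are all ones."""
    return ((tsc >> 32) & 0xFFFFFFFF) == WARP_PREFIX


def sched_clock_ns(tsc, cycles_to_ns, jiffies, hz, state):
    """Model of a scheduler clock that gives up on the TSC once it warps.

    state is a dict holding the sticky "tsc_disabled" flag.
    """
    if not state["tsc_disabled"] and tsc_looks_warped(tsc):
        state["tsc_disabled"] = True  # one-way switch, as proposed in the chat
    if state["tsc_disabled"]:
        return jiffies * (NSEC_PER_SEC // hz)
    return cycles_to_ns(tsc)


# HYPOTHETICAL numbers: a 2 GHz TSC and HZ=250.
state = {"tsc_disabled": False}
to_ns = lambda cycles: cycles // 2

before = sched_clock_ns(1000, to_ns, jiffies=5, hz=250, state=state)
after = sched_clock_ns(0xFFFFFFFF00001234, to_ns, jiffies=5, hz=250, state=state)
print(before, after, state)
```

The model does nothing to keep the jiffies-derived value from landing behind the last TSC-derived value at the moment of the switch, which is the open question raised later in the conversation.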
cking	cnd, this scares me, I'm not sure if disabling it like this is a good idea on a fully running system with all CPUs in action	17:03
cking	worth an experiment	17:03
cnd	cking: if the tsc warping isn't causing issues, how is doing that going to cause issues?	17:04
cking	cnd, I've seen it cause issues on Hardy on an Atom - a I/I delay on a hyperthread that got pushed to another CPU got confused when the TSC was skewed	17:06
cnd	I/I delay?	17:06
cnd	cking: if it got confused due to the tsc being off, how could it be worse if the jiffies result is off?	17:07
cking	cnd, jiffies don't generally warp backwards	17:08
cnd	cking: isn't that a good thing in this case?	17:08
cnd	we don't want the clock to warp backwards	17:09
cking	cnd, very true	17:09
* cking needs to think harder		17:09
cnd	the only issue I can see is when the switch occurs, if the jiffies calculation result would end up in a warp backwards relative to what the tsc result had been giving	17:10
cnd	I don't know what the likelihood of that occurring is	17:10
cking	cnd, it happened - it caused me pain	17:10
cnd	maybe we prevent it from switching to jiffies calculation if the tsc is unstable? (there's a var for that)	17:10
cnd	cking: what do you mean by "it caused me pain", you didn't try this out yet did you?	17:11
cking	cnd, I saw it happen on resume once in a few hundred cycles and figuring out what was happening w/o a console caused me pain	17:12
cnd	cking, that's just from basing clock on tsc, I'm wondering how likely it is that switching from tsc to jiffies calculation would result in a warping	17:13
cking	cnd, good question - no idea. This needs some careful examination of the code	17:14
* cking is exhausted of any more clock/TSC/delay knowledge. The devil is in the source code		17:15
cking	;-)	17:15
cnd	heh	17:15
=== bjf is now known as bjf-afk
=== Hedge\|Hog is now known as Hedgehog
=== Hedgehog is now known as Guest83664
=== sconklin is now known as sconklin-away
=== bjf-afk is now known as bjf
=== fabbione is now known as fabbione_vac
