/srv/irclogs.ubuntu.com/2016/01/13/#ubuntu-kernel.txt

tsimonq2	under "What do you need help with?" on https://wiki.ubuntu.com/Kernel/GettingInvolved , it shows an outdated link and I don't know where the real link would be.	02:42
dsmythies	tsimonq2, I fixed the link. Try it now.	03:59
apw	dsmythies, it is a consequence of CONFIG_ZONE_DEVICE turning off CONFIG_ZONE_DMA yes	08:29
apw	dsmythies, not convinced the constraints are reasonable, at least for x86_64 where you don't have HIGHMEM so i think there is room for both though the config constraints prevent that	09:06
apw	dsmythies, is that card used for something we are going to notice, like qemu ?	09:06
tsimonq2	thanks dsmythies	11:38
rtg	caribou: could you have a look at bug #1528101 and tell me if kexec is functioning as designed ?	13:26
ubot5	bug 1528101 in linux (Ubuntu) "ISST-LTE: kdump failed: second kernel booting hangs after /scripts/init-bottom when large min_free_kbytes value being set" [Undecided,New] https://launchpad.net/bugs/1528101	13:26
caribou	rtg: sure	13:27
apw	caribou, not sure if we have any sane way generically to know which options are good to keep and which are not for a kdump kernel, i wonder if we almost want a separate directory of overrides for kdump.	13:28
caribou	apw: I've been running tests to try to come up with some sensible rules for the definition of crashkernel	13:29
caribou	but there are so many variables that come into play that it is kinda hard	13:30
caribou	for instance, I've been testing on VM with up to 22Gb : the default 128M works fine	13:30
apw	so we might just say that this one is a specific override to be something sane like 1%	13:31
apw	and be happy	13:31
caribou	apw: but when I test on real hardware with 128G of memory, crashkernel=128M comes short not because of the memory but because of the fact that initrd is bigger and it OOMs	13:31
apw	caribou, ok, even with your magic smaller one ?	13:32
apw	so an =dep initrd is still too big for 128M /	13:33
apw	?	13:33
caribou	apw: that was on trusty so the change was not there; I went on to rebuild with MODULES=dep & got bitten by the initramfs-tools bug that I fixed	13:33
caribou	apw: it works now with a smaller initrd	13:33
apw	ahh ok	13:33
apw	of course this bug might be moot with that fix too, hrm	13:34
caribou	apw: but in this case, the boot sequence fails before kdump even comes into play	13:34
caribou	apw: partly, on system with SSDs, the kernel hook would fail when building the smaller initrd file	13:34
apw	which we should fix i assume	13:35
apw	for this bug, they are setting min_free somewhere, presumably in /etc/sysctl.d.conf and we really should not be honouring that in the constrained environment	13:36
apw	it's not clear how one would know that though programmatically	13:36
caribou	apw: LP: #1532146 is waiting for a sponsor if you have a minute ;-)	13:36
ubot5	Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress] https://launchpad.net/bugs/1532146	13:36
apw	caribou, yeah it's on my todo list :) i have another initramfs-tools bodge to ram in, just updating aufs for a nasty hang bug	13:37
caribou	apw: that's fine, as long as it is queued to be fixed somewhere	13:37
caribou	apw: in that case (LP: #1532146), it doesn't even make it to kdump & fails before	13:38
ubot5	Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress] https://launchpad.net/bugs/1532146	13:38
caribou	apw: so you're right, we should find a way to boot the kexec kernel w/o using any tailored sysctl values	13:38
caribou	oups, wrong bug : bug #1528101	13:39
ubot5	bug 1528101 in linux (Ubuntu) "ISST-LTE: kdump failed: second kernel booting hangs after /scripts/init-bottom when large min_free_kbytes value being set" [Undecided,New] https://launchpad.net/bugs/1528101	13:39
caribou	rtg: so for your question : kexec doesn't even get activated so yes, works as designed ;-)	13:39
rtg	caribou: since I don't really understand the fine points of the discussion, I'll bug apw and make him explain it.	13:41
caribou	apw: everything in /etc/sysctl.conf + /etc/sysctl.conf.d should not be honoured when the kexec kernel boots	13:43
apw	caribou, should not as in you think they are not, or you think they are and shouldn't ?	13:44
caribou	apw: looks like they are (if indeed they are changing vm.min_free_kbytes with sysctl)	13:45
caribou	apw: so when the kexec kernel boots post-panic, the kernel should boot w/o implementing what's in sysctl.conf[.d] if possible	13:46
caribou	apw: the kexec kernel is only there to allow for the capture of the kernel dump	13:47
caribou	apw: so there is no reason to apply any user-defined customization there	13:47
caribou	am I being clear ?	13:47
rtg	caribou: that makes sense to me	13:48
caribou	apw: rtg: it should be trivial to add a check in /etc/init.d/procps to verify if we have booted in the kexec kernel & not apply the changes	13:49
rtg	caribou: how would you preserve the original sysctl values ?	13:49
rtg	nm, the new kernel would have them as part of its image.	13:50
caribou	apw: rtg: /proc/vmcore is only present if we are booted on the kexec kernel post-panic. This is how kdump-config knows that it has to capture the content	13:51
rtg	caribou: what agent applies sysctl values to the kexec kernel ? kdump ? the initrd ?	13:51
tseliot	rtg: I've just fixed fglrx in xenial. It was pretty easy this time	13:52
caribou	rtg: /usr/sbin/procps apparently	13:52
caribou	rtg: no, /etc/init.d/procps	13:52
rtg	tseliot, thanks, it was the primary show stopper for a 4.4 kernel in xenial	13:52
tseliot	good :)	13:52
ricotz	rtg, tseliot, nice :)	13:55
apw	caribou, yeah, that sounds like a sensible option, i might suggest it ought to possibly have a kdump.conf.d or something which is applied instead, just in case there is something you need	14:00
caribou	apw: makes sense; the override would only apply kdump.conf if it exists	14:01
apw	caribou, then in the unlikely event there is something you need in both you link the file into both .d's and be happy	14:02
caribou	apw: though, it does nothing for this bug as I doubt that touching something as central as procps would qualify for SRU	14:03
apw	caribou, there is nothing to say you can't change things like that, we might need to make it opt in or something	14:03
apw	caribou, so like if there is a kdump.conf we use that _rather_ than the normal ones, and they can make it empty	14:03
apw	sysctl-kdump.conf	14:04
apw	so that the default doesn't change perhaps, something like that	14:04
apw	you are our dump expert, and if you say it's wrong that kinda makes it wrong and wrong things can be fixed in sru's	14:05
caribou	apw: well the argument that it only applies in the context of the kexec post-panic kernel and never in a normal user context might be sufficient	14:05
caribou	apw: am I ???? :-D	14:05
apw	caribou, clearly :)	14:05
caribou	ok, let me run with that & see what I can propose; I'll also comment in the bug	14:06
caribou	apw: ^	14:06
rtg	caribou: would you like me to take a stab at the patch and run it by you ?	14:06
apw	rtg, or you could sponsor it for him	14:07
rtg	apw, that to	14:07
rtg	too*	14:07
apw	sweet sorted	14:07
caribou	rtg: let me try it, then I'll run it by you guys	14:07
rtg	ack	14:07
caribou	apw: rtg: is it possible to force vm.min_free_kbytes as a boot parameter (for a temporary workaround) ?	14:14
apw	caribou, it is possible yes	14:18
apw	i can't say for certain without looking though	14:18
* rtg does not see anything promising in Documentation/kernel-parameters.txt		14:20
caribou	apw: don't bother, I'll look it up; thought you might know it off the top of your head	14:20
apw	ak	14:21
apw	ack	14:21
caribou	apw: fedora fixed it a while ago : https://lists.fedoraproject.org/pipermail/kexec/2014-November/001478.html	14:25
caribou	apw: not too fond of how they fixed it though	14:25
apw	caribou, no, but then there is precedent, perhaps we can do that in SRUs and something cleverer in the new releases	14:27
caribou	apw: indeed	14:28
apw	certainly i'd not want to hard code it like that, ugg	14:28
apw	and that only fixes it if it is in /etc/sysctl.conf at that, not sysctl.d ... hrm	14:29
rtg	eeew!	14:29
caribou	tbh, I'm not sure if the fix went in but it was reported	14:30
caribou	rtg: apw: here is a question for you : does the kernel rely on sysctl settings to boot correctly (i.e. do we use that mechanism to fix some behaviors) ?	14:32
apw	caribou, well as they are applied in root, they cannot be used to fix root disk related behaviours	14:33
rtg	caribou: there might be a lot more console noise	14:33
caribou	rtg: I also see some hardening things in there we might want to keep	14:34
rtg	caribou: doesn't kexec immediately reboot ?	14:34
rtg	after acquiring the dump	14:35
caribou	rtg: yes	14:35
rtg	in which case hardening might not be so important	14:35
apw	caribou, and we don't bring up things like networking do we	14:39
caribou	apw: yes, especially if we want to do remote dumps	14:39
caribou	apw: but it is not even dependent on remote dumps : network comes up systematically	14:40
caribou	made my life soo much easier when I developed remote dumps	14:40
apw	caribou, then there are options there which may be necessary, hrm	14:42
apw	caribou, how about then having an sysctl-kdump.conf which is loaded _after_ all the others, which can set things "back"	14:43
caribou	apw: interesting; the only thing is to know _what_ to set "back"	14:44
apw	caribou, well we literally only know of one thing which should not be set, so it could start with that	14:44
apw	(if we know what a sane value is)	14:45
caribou	apw: in worst case scenario, it would be the sysadmin's responsibility to add to this file in case of problem; at least the mechanism is there	14:46
apw	caribou, yeah, something like that	14:46
caribou	apw: & we override the values for things we know would cause problems	14:46
apw	the only issue really is knowing what to put in that one	14:46
rtg	perhaps just vm.mmap_min_addr for now (which we _know_ is causing problems)	14:48
caribou	rtg: that's my idea for a starte	14:48
caribou	starter	14:48
caribou	rtg: apw: as I don't think we have an easy way to identify what was the original value in the kernel's namelist before sysctl changed it	14:49
apw	caribou, yeah, and its magic essentially	14:49
apw	asses	14:49
caribou	apw: ok let start with that	14:50
rtg	if it is never set after kexec boots, then it is by default set to the kernel value, right ?	14:50
rtg	which ought to be OK	14:50
caribou	rtg: the kexec kernel will apply the sysctl changes just like the regular kernel	14:51
caribou	rtg: just trying to find a way to avoid disabling sysctl altogether	14:51
rtg	caribou: except that you are going to modify procps to do something different (I thought)	14:51
apw	caribou, of course we know how much memory we have, it is 128M so a fixed value for that is "ok"	14:51
caribou	apw: let me run a first pass at it & see what I can come up with	14:54
apw	rtg, the point here is that sysctls can be required for networking and the like	14:54
apw	rtg, and knowing which are ok and which not is hard	14:54
rtg	understood	14:54
apw	caribou, i assume what happens now is we copy these into the initrd, apply them all there, and then again once we mount root (for non kdump boots)	14:55
apw	caribou, which is why eliding them by name works for the fedora case	14:55
caribou	apw: no, apparently the sysctl stuff is taken from the real root & not the initrd	14:56
apw	caribou, well that code is copying them into the initrd and they elide the problem one at that point	14:57
apw	anyhow have a go and see what happens	14:57
caribou	apw: sure	14:58
dsmythies	apw: I do not actually have the sound card (Creative Labs SB Audigy) and came here on behalf of someone else, after helping them isolate the issue (at least to the point where I needed help here). Yes, his AlsaMixer program isn't working, as the card isn't being detected. I also wonder if some of those other sound card differences in the kernel configuration file are for the same reason (ref: http://paste.ubuntu.com/14481736/ )	15:00
apw	dsmythies, oddness, the delta i was shown was just the one card	15:01
apw	rtg, ^ ?	15:01
apw	rtg, did we investigate whether we could remove that config snippet for CONFIG_ZONE_DEVICE ?	15:01
tjaalton	4.4 doesn't see my nvme drive.. boot fails because of that	15:03
tjaalton	mainlines at least	15:08
tjaalton	are the soon-to-hit xenial packages somewhere?	15:08
apw	tjaalton, ckt unstable ppa	15:09
rtg	apw, I have not.	15:09
tjaalton	apw: thx	15:09
apw	rtg, i think we need to resolve what we are doing about that one before we release 4.4 as well	15:11
rtg	apw, maybe have a diff setting for lowlatency ?	15:11
apw	rtg, well whether we can in fact turn on both for the case we care about, amd64	15:13
dsmythies	apw, rtg: This issue is on both lowlatency and generic, but yes amd64.	15:13
apw	rtg, obviously the config constraints don't let us, but i think on amd64 we have no HIGHMEM so we have one more spare bit, and if we do then we can have zone_dma and zone_device together	15:14
apw	that means i386 can't have both, but we're not caring about those	15:14
rtg	apw, yeah, I'm not too concerned about NVDIMM on 32 bit	15:15
dsmythies	tjaalton: Have a look at this: http://askubuntu.com/questions/709794/dell-xps-13-9350-compatibility/719947#719947 I do not know if it is true or not that CONFIG_BLK_DEV_NVME=y is needed.	15:16
manjo	I have been modifying debian/rules.d/2-binary-arch.mk to skip modules signing if the private key is not available .. is there a way to tell builds in LP not to fail if the private key is not available? or what is the correct way of building it ?	15:18
manjo	apw, ^ ?	15:20
apw	manjo, we generate the private key during the build ?	15:22
tjaalton	dsmythies: sounds like initramfs fail to me	15:23
manjo	install-%: MODSECKEY=$(builddir)/build-$*/certs/signing_key.pem	15:23
manjo	apw, ^ looks like it expects the key to be there	15:23
manjo	apw, Maybe I am missing something .. I have been doing if [[ -f "$(MODSECKEY)" ]] ; then \ round the signing code ... which I don't think is the right thing to do	15:24
apw	manjo, right, it expects it to be there because it makes it	15:25
apw	manjo, so it should be making it, why is yours not	15:25
manjo	apw, ppisati ran into the same issue with arm builds (many months ago)	15:25
apw	maybe so, if so he may remember what he was doing wrong	15:25
apw	manjo, but i say again, the build makes that key	15:25
manjo	apw, ok let me build in PPA and get it to fail... that will give me an idea what is going on ..	15:26
apw	manjo, see certs/Makefile	15:27
apw	@echo "### Now generating an X.509 key pair to be used for signing modules."	15:27
apw	it is that key which we then use to _resign_ the modules after stripping them	15:27
manjo	apw, I have been blindly carrying this patch in my PPA builds and maybe it is not needed anymore ... let me build without it and see if the keys are generated	15:28
apw	manjo, we have always built our kernels in PPAs and never needed such a thing	15:29
rtg	manjo, there is this patch for newer kernels: 'UBUNTU: [Debian] Update to new signing key type and location'	15:29
manjo	rtg, ok so that must have fixed it for me .. I will drop this mod to skip signing	15:30
tjaalton	dsmythies: 4.4.0-0 finds the drive, but cryptsetup fails with "lvm is not available"	15:30
caribou	apw: rtg: back to the vm.min_free_kbytes	15:31
rtg	hmm	15:31
apw	tjaalton, you'd think we would have noticed that lvm wasn't available in initrd ?	15:32
apw	caribou, ?	15:32
caribou	apw: rtg: how about I add an option to kdump-config that would re-apply values as found in /etc/default/kdump-tools ?	15:32
caribou	apw: rtg: this option would be called by /etc/init.d/procps if /proc/vmcore is found	15:33
apw	/etc/default/kdump-tools is in shell format right? what would you propose to encode in there ?	15:33
rtg	caribou: what would be the packaged defaults in /etc/default/kdump-tools ? Anything ?	15:34
caribou	apw: yes, I would add SYSCTL=vm.min_free_kbytes=blah and loop through each SYSCTL variable found	15:35
caribou	rtg: we can put anything we know can cause problems & sysadmin can add to it'	15:35
rtg	seems reasonable	15:35
apw	caribou, or perhaps SYSCTL_UPDATES=/etc/sysctl-kdump.conf or something and use that file	15:35
caribou	this way I restrict modifications outside of the kdump-tools realm to a minimum	15:35
apw	caribou, but i like the idea of passing the problem to kdump-config in procps	15:36
caribou	apw: the advantage of looping through the existing /etc/default/kdump-tools file is that I don't need an extra file	15:36
apw	i think making that file overly complex is a mistake because things like systemd read them for you and convert them to environment variables and the like	15:37
apw	caribou, well mangling existing defaults which are there is difficult especially with that new non-standard format	15:38
apw	by making that a different sort of entry you make it hard to migrate configs later etc	15:38
apw	caribou, also why do we even need to change procps if you do it via kdump-tools	15:38
caribou	apw: true; especially since kdump-config sources the default file at the beginning. I'll use your option	15:39
caribou	apw: because, like in that specific bug, we don't get as far as running kdump-config, it fails before that	15:39
caribou	I can reproduce it btw	15:39
caribou	apw: that was my initial idea, but it has to happen right when procps runs	15:39
apw	caribou, but can we just get in before it, by being earlier	15:40
apw	kdump-config is running in initramfs or in real root when it runs	15:40
apw	?	15:40
caribou	apw: real root, so is procps	15:40
caribou	apw: so the procps modification is only to run sysctl -p /etc/kdump-config.conf if /proc/vmcore exists	15:41
apw	caribou, i presume the issue then is that the act of setting it to a large value ooms the box immediately	15:44
apw	caribou, so you can't then set it back, you have already oomed	15:44
caribou	apw: exactly	15:44
dsmythies	apw, rtg: For my own curiosity, could you point me to "the config snippet for CONFIG_ZONE_DEVICE". So far, I haven't been able to find it.	15:45
rtg	mm/Kconfig:config ZONE_DEVICE	15:45
apw	config ZONE_DEVICE	15:45
apw	bool "Device memory (pmem, etc...) hotplug support" if EXPERT	15:45
apw	default !ZONE_DMA	15:45
apw	depends on !ZONE_DMA	15:45
apw	we might also need to fix the ZONE_SHIFT calculation, but, i can't see why this won't work	15:46
tjaalton	apw: I'll check the diff between 4.3 & 4.4 initrd's and see what's going on there	15:46
apw	tjaalton, lack of -extras perhaps ?	15:47
apw	in your testing ?	15:47
tjaalton	well this same thing is with mainline builds	15:47
apw	caribou, so are you proposing to only apply that one file if you find it, and let kdump-config generate it or ?	15:47
tjaalton	-extra is installed	15:47
apw	caribou, as we cannot apply the real changes and then apply this one back over the top, as that will likely oom us in the middle	15:47
caribou	apw: looks like it (I'm testing it at the same time)	15:48
caribou	apw: I was thinking of applying the content of /etc/kdump-config.conf over the existing	15:48
apw	i think you'll find the first will blammo on you	15:49
caribou	apw: looks like the simple fact of applying vm.min_free_kbytes triggers the OOM	15:49
apw	which makes sense when you look at the kernel code, it forcefully empties memory to make the boundary	15:49
caribou	apw: then no matter what, we need to avoid this change to even happen	15:50
apw	ugg	15:51
apw	caribou, does the kernel know it is in kdump mode ?	15:52
caribou	apw: yes, as it exposes /proc/vmcore	15:52
caribou	apw: but isn't changing the kernel because one silly setup triggers OOM a bit of an overkill	15:53
caribou	?	15:53
rtg	apw, fix this in the kernel by ignoring vm.min_free_kbyte if kexec'ed ?	15:53
apw	rtg, i am thinking about it, yes	15:54
apw	if /proc/vmcore is valid in fact, but yes	15:54
rtg	hmm	15:54
apw	as kexecing is a valid way to reboot too	15:54
apw	and in the normal case we don't care, but if we are exposing a dump file to uspace then we could ignore uspace	15:54
apw	caribou, rtg, though there is a separate way of looking at this	15:55
caribou	userland is changing a kernel setup in a limited memory context, I would rather avoid changing this value if we very well know that we're in a limited contgext	15:55
apw	that the request for memory can _never_ succeed, the request for reserve is larger than total ram	15:55
caribou	limited memory context	15:55
apw	returning -ENOWAY instead because the request is ridiculous might be sane	15:55
caribou	apw: true	15:56
rtg	instead of OOM'ing	15:56
apw	right ooming is rarely useful	15:56
rtg	apw, doing it that way means we don't have to care if /proc/vmcore is valid	15:57
apw	rtg, so the code as it stands divides the limit you specify between all the non-highmem zones you have	16:00
apw	so if you say 1000 pages and you have 10 in DMA and 90 in NORMAL	16:00
apw	it will demand 100 in DMA and 900 in NORMAL even though that is patently nuts	16:00
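
The proportional split apw describes can be checked with a few lines of arithmetic. This is only a sketch of the idea, not the kernel's actual code: the function names are hypothetical, and the zone sizes are the illustrative numbers from the discussion above.

```python
def split_min_free(pages_min, zone_pages):
    """Divide a requested reserve between zones in proportion to zone size.

    zone_pages maps a zone name to the number of pages it holds
    (non-highmem zones only, as in the discussion).
    """
    total = sum(zone_pages.values())
    return {zone: pages_min * pages // total
            for zone, pages in zone_pages.items()}


def impossible_zones(pages_min, zone_pages):
    """Return the zones asked to keep free more pages than they contain."""
    demand = split_min_free(pages_min, zone_pages)
    return sorted(zone for zone, want in demand.items()
                  if want > zone_pages[zone])


zones = {"DMA": 10, "NORMAL": 90}        # example sizes from the log
demand = split_min_free(1000, zones)     # 100 for DMA, 900 for NORMAL
```

A check like impossible_zones() is one way to express the "refuse the request instead of OOMing" idea raised earlier: a non-empty result means the request can never be satisfied.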
rtg	apw, seems like refusing the sysctl is a reasonable action	16:02
apw	rtg, though even if i make it say you need enough memory in your zones to cope there are likely to be combinations which are just numerically valid and not viable	16:03
apw	rtg, and telling those apart is tricky at best	16:03
apw	caribou, eliding specific entries in userspace falls to procps, to sysctl itself detecting /proc/vmcore and ignoring some names textually (me thinks) which is a little vile, but no viler than doing so in the kernel	16:08
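
A rough sketch of the "elide by name when /proc/vmcore exists" idea follows. It is not how procps or sysctl behave today: the deny-list, function names, and sample values are all hypothetical, and it only handles simple "key = value" lines.

```python
import os

# Hypothetical deny-list: settings known to be harmful in the small
# crashkernel environment. Only one such setting is named in the log.
SKIP_IN_KDUMP = {"vm.min_free_kbytes"}


def in_kdump_kernel(vmcore_path="/proc/vmcore"):
    """/proc/vmcore is only present when booted into the post-panic kernel."""
    return os.path.exists(vmcore_path)


def filter_sysctl_lines(lines, kdump, skip=SKIP_IN_KDUMP):
    """Drop deny-listed settings when running in the kdump kernel."""
    kept = []
    for line in lines:
        stripped = line.strip()
        # Blank lines and comments ('#' or ';') pass through untouched.
        if not stripped or stripped.startswith(("#", ";")):
            kept.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if kdump and key in skip:
            continue  # never apply this one in the constrained environment
        kept.append(line)
    return kept


conf = ["# tuning", "vm.min_free_kbytes = 319488", "net.ipv4.ip_forward=1"]
safe = filter_sysctl_lines(conf, kdump=True)
```

Filtering before anything is applied matters here, because setting the large value and then setting it back would already have triggered the OOM.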
tjaalton	apw: weird, 4.2.0-16 doesn't have sbin/lvm either, but it still works	16:09
caribou	apw: ok, I'm looking into that	16:09
caribou	apw: and at the same time, asking for the kernel to keep a minimum of 312M free and setting crashkernel=128M makes no sense, they should just increase crashkernel	16:10
apw	caribou, well they claim to be using 768M for crashkernel i think, but either way its bonkers	16:11
tjaalton	apw: oh, nvme.ko is not shipped with 4.4 initrd, because the module location changed	16:13
apw	tjaalton, i thought initramfs-tools looked at those by module name only ?	16:14
tjaalton	dunno, but it should be in kernel/drivers/nvme/host/nvme.ko now	16:15
apw	oh perhaps it is shipping all "block/*" or something	16:15
tjaalton	instead of kernel/drivers/block/nvme.ko	16:15
apw	tjaalton, can you file me a bug against initramfs-tools for that please	16:15
rtg	tjaalton, that means I should fix debian.master/control.d/generic.inclusion-list:drivers/block/nvme.ko	16:16
apw	and give me the # and i'll batch that up with the other two critical fixes	16:16
apw	rtg, needs fixing there too yes	16:16
rtg	apw, I'll take care of it	16:16
tjaalton	rtg: you'll file the bug too?	16:16
rtg	tjaalton, nope, its dev kernel. I'll just fix it	16:17
apw	i think he is saying he will fix the kernel bit, but i would love a bug for initramfs-tools for that bit	16:17
tjaalton	right but initramfs-tools needs fixing too	16:17
tjaalton	yeah I'll file that bit	16:17
apw	copy_modules_dir kernel/drivers/block	16:18
apw	yeah it copies on all block devices, and that just moved out of there	16:18
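
Why the module went missing can be shown with a simple prefix match. This is a sketch of a directory-based copy rule, not the initramfs-tools implementation; modules_under() is a hypothetical helper, and the two paths are the ones quoted above.

```python
def modules_under(module_paths, copied_dirs):
    """Return the modules that a directory-based copy rule would pick up."""
    prefixes = tuple(d.rstrip("/") + "/" for d in copied_dirs)
    return [path for path in module_paths if path.startswith(prefixes)]


old_path = "kernel/drivers/block/nvme.ko"        # location before the move
new_path = "kernel/drivers/nvme/host/nvme.ko"    # location in 4.4

# A rule that copies kernel/drivers/block only matches the old location.
picked_old = modules_under([old_path], ["kernel/drivers/block"])
picked_new = modules_under([new_path], ["kernel/drivers/block"])
```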
tjaalton	fixed in debian https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807000	16:18
ubot5	Debian bug 807000 in initramfs-tools "initramfs-tools: module nvme not included in block modules on kernel > 4.2" [Normal,Fixed]	16:18
apw	tjaalton, not sure that that fix makes a whole heap of sense in the face of the change the kernel is making, long term, but hey	16:21
tjaalton	apw: what do you mean? cryptsetup works again with that, just tested	16:25
tjaalton	apw: heh, two bugs already filed about this	16:28
tjaalton	https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1524879 https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1532146	16:29
ubot5	Launchpad bug 1524879 in initramfs-tools (Ubuntu) "initramfs-tools, Xenial is missing NVME kernel driver" [High,New]	16:29
ubot5	Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress]	16:29
tjaalton	the first already properly assigned :P	16:30
* xnox thought apw was going to sponsor that cherrypick		16:32
xnox	as asked on irc some friday back.	16:32
rtg	ppisati, I forgot to mention that Ubuntu-4.4.0-1000.6 works well with your image	16:34
apw	xnox, then i am already doing it indeed :)	16:34
ppisati	rtg: good to know	16:35
dsmythies	apw, rtg: I am not understanding the config dependency trail, or what to change in order to be able to compile a test kernel. Do I just delete that ZONE_DEVICE snippet?	16:35
rtg	ppisati, hopefully it will work with Snappy as well	16:35
apw	dsmythies, no, we want to test building with the config requirement for !ZONE_DMA removed and that turned back on	16:36
ppisati	rtg: it will	16:36
apw	and see if that builds on amd64	16:36
ppisati	rtg: though i might have found a problem with xenial/arm64	16:37
ppisati	rtg: so, no matter if i cross or native compile	16:37
rtg	ppisati, the module load problem ?	16:37
ppisati	rtg: right	16:37
ppisati	rtg: if i compile under wily is fine	16:37
rtg	ppisati, I noticed that it seems to be a toolchain issue	16:37
ppisati	rtg: while it blows in a xenial chroot	16:37
rtg	dannf found that one, right ?	16:38
ppisati	rtg: it happens if i crosscompile (xenial/amd64 -> arm64) or native (my arm64 ppa)	16:38
ppisati	rtg: i found it, and dannf confirmed he sees that too in xenial 4.3	16:38
ppisati	dannf: ^	16:38
rtg	ppisati, what is Linaro using ?	16:38
dsmythies	apw: O.K. so delete this line: "depends on !ZONE_DMA" then. (sorry for being a bit think on thick on this one).	16:39
ppisati	rtg: no idea	16:39
ppisati	rtg: you mean the toolchain	16:39
rtg	yeah, maybe Salvetti has some thoughts	16:39
dannf	ppisati, rtg: my next step was to rebuild gcc w/o linaro patches to see if that fixes it	16:40
dannf	(otp, sorry for lag)	16:40
rtg	dannf, are there patches in gcc specific to that arm64 errata ?	16:41
dsmythies	I meant to say "sorry for being a bit thick on this one".	16:44
apw	dsmythies, yes, then readd CONFIG_ZONE_DMA=y to the config and run fakeroot debian/rules updateconfigs	16:44
apw	and then see if the build builds any more, i suspect it may not	16:44
apw	but it may be resolvable	16:44
apw	(on amd64)	16:44
dannf	rtg: yeah	16:46
dsmythies	apw: I will report back, but it might be awhile (like hours).	16:47
jmux	How can I build a kernel containing a ddeb package? I just have a linux-lts-trusty-tools ddeb as an output of my build.	17:31
apw	jmux, how are you building them ?	17:32
jmux	My problem is actually http://ddebs.ubuntu.com/pool/main/l/linux-lts-trusty/, which doesn't contain debug symbols for the range from 15-Jul-2014 to 10-Nov-2015	17:32
arges	jmux: depending on the version it may be archived in launchpad	17:33
jmux	apw: sbuild -A -s -d tramp --arch=i386 --add-depends=pkg-create-dbgsym linux-lts-trusty_3.13.0-34.60~precise1.dsc	17:34
apw	right, we moved ddebs into the librarian around that time, though i would expect them to be in there	17:34
apw	jmux, if you aren't on a real builder it will optimise away the .ddebs because they take an hour to package	17:35
jmux	apw: I get the ddeb for the linux-lts-trusty-tools. I even set AUTOBUILD=1	17:36
arges	apw: they are gone: https://launchpad.net/ubuntu/+source/linux-lts-trusty/3.13.0-34.60~precise1	17:36
apw	you need something like full_build=true passed to debian/rules	17:36
arges	this is how i've done it: http://chrisarges.net/2015/10/02/building-ubuntu-kernels-with-debug-symbols.html	17:37
apw	arges, full_build=true is the approved incantation there, though the effect is similar	17:38
arges	apw:noted	17:38
jmux	Hmm missed full_build in debian/rules.d/0-common-vars.mk	17:41
apw	jmux, it isn't something that one expects (like the spanish inquisition)	17:45
jmux	Probably an other problem "man xz: Multithreaded compression and decompression are not implemented yet, so this option has no effect for now."	17:45
jmux	So even if the build finishes in 10 minutes, the compression will take ages, I guess	17:46
apw	it takes ages for sure, like 10 on our massively fast boxes, hours on others	17:46
apw	they make one cry in the main, which is why they get turned off by default for local builds	17:47
apw	did you look and see if the one you needed was in the librarian though ?	17:47
jmux	Is there a way to query librarian directly? The ddebs are in the build log…	17:48
apw	i'd say from the .changes from the buildd that the .ddebs were collected under the old system	17:49
apw	arges, does sts still keep history on those?	17:49
arges	apw: i don't think so	17:49
arges	i didn't find them in the librarian so best bet is to regenerate them.	17:49
* jmux is not sure the debug symbols will actually help with MCE like http://paste.debian.net/365584/		17:53
apw	jmux, nope, did you try what it suggests ?	17:54
jmux	Yup - but didn't help me	17:54
jmux	http://paste.debian.net/365586/	17:55
apw	jmux, what are you hoping it will tell you? that says that when we were handling a page fault the internal state of the CPU was found to be wrong and unfixable	17:56
jmux	But these machines were working flawlessly on lucid for years. No idea, why they stop working on precise with trusty stack	17:56
apw	well they may be throwing errors all the time which the old kernel doesn't understand at all (which being lucid it would not)	17:57
apw	those newer kernels almost certainly have the ability to read MCEs at all whereas the older ones did not i suspect	17:59
jmux	apw: might be. I've just started collecting information. Not sure if a backtrace contains info about the origin of the page fault.	18:00
apw	jmux, if that EIP is accurate (and it implies it is not but hey) that is the very first byte of the exception handler	18:00
apw	jmux, and is this a 32 bit kernel ?	18:01
jmux	it might be a really unnoticed HW bug - no idea yet	18:01
jmux	Yup 32bit	18:02
apw	if the thing is occurring often and reproducibly you could try removing the MCE handler (if it is a module) and see	18:04
apw	if the machine is symptomless afterwards	18:04
apw	as it was on lucid, it may be just be that they are bogus, given it never exploded before	18:04
jmux	Problem is I really don't know what triggers it. I got the exact HW of a known problematic box and it took 12 hours to trigger.	18:06
apw	jmux, it is possible nomce on the kernel command line will shut it off for that purpose	18:06
apw	of course if it is real, all bets are off in that case	18:06
jmux	This happens for random people	18:06
jmux	About once in a week	18:07
jmux	Support has actually played with Intel C states in BIOS which fixed it for some people	18:18
apw	oh lovely	18:18
jmux	That's why I still hope it can be a kernel bug, I'll also update the kernel. But for that random bugs without a real trigger…	18:20
jmux	I had the crazy idea of the new kernel using more power, probably because of new features… unstable system… all I know is Lucid worked for years and problem started with the precise updates	18:22
* jmux hopes that shipping trusty with xenial stack will help, but that's half a year away		18:23
apw	jmux, they likely would have simply not been detected before in lucid	18:36
apw	even if they were happening	18:37
jmux	And something like "MCA: Internal unclassified error: 402" doesn't help	18:38
jmux	I'll also try latest microcode updates	18:39
jmux	If I get the same error tomorrow, I have probably at least one valid data point. And there are BIOS updates.	18:40
* jmux shivers thinking about BIOS updating a few thousand machines…		18:41
apw	jmux, a job for life ... literally	18:43
jmux	Oh - and I just got the MCE error via serial null modem connection	18:43
apw	jmux, a google of the specific error shows people sending their CPUs back for replacement to good effect o.O	18:46
apw	(which if it was true would make upgrading the BIOSen look like a fun game)	18:47
jmux	apw: hmm If all machine would show this MCE, but currently they just have a blank screen. Without the null modem, I wouldn't have seen this.	18:50
apw	jmux, yeah the "kms will guarantee you can see panics" promise, not so much	18:51
jmux	Probably disabling DPMS for known machines would help here	18:51
jmux	Probably installing mcelog will also help	18:56
apw	good luck	18:56
jmux	Everything linux wise I can deploy automatically. mcelog definitely makes sense, as this would probably allow to catch the MCE errors	18:58
jmux	Hmm https://access.redhat.com/solutions/1161573 has the same HW. Guess I'll have to contact some server guys tomorrow	19:01
jmux	"Following MCE logs were collected from serial consle."	19:01
jmux	It's not page_fault but intel_idle, but otherwise…	19:03
apw	jmux, i note it talks about using "nomodeset" which if you find a box which reproduces regularly is worth a shot	19:03
jmux	The "reproduces regularly" is the problem. So actually a kernel update might help, or the solution will just state "get a new CPU"	19:06
jmux	I hope my local crash will happen again in the next 12 hours	19:06
* jmux hopes for a software solution		19:07
jmux	apw: thanks for your support. /me will leave for the cinema in a few minutes	19:08
dsmythies	apw: The build failed, and for the reason you predicted, ZONES_SHIFT, which cascades into a bunch of other errors.	22:08
dsmythies	apw: #error ZONES_SHIFT -- too many zones configured adjust calculation	22:09
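
The error above comes from a compile-time guard on the number of memory zones. The sketch below models that guard under two assumptions that are not stated in this log: that the 4.4-era thresholds allow at most four zones, and that an amd64 build has DMA, DMA32, NORMAL and MOVABLE before ZONE_DEVICE is added. Both function names are hypothetical.

```python
def zones_shift(max_nr_zones):
    """Bits needed to store a zone number (assumed 4.4-era thresholds)."""
    if max_nr_zones < 2:
        return 0
    if max_nr_zones <= 2:
        return 1
    if max_nr_zones <= 4:
        return 2
    # Mirrors the message reported in the failed build.
    raise ValueError(
        "ZONES_SHIFT -- too many zones configured adjust calculation")


def count_zones(dma=True, dma32=True, highmem=False, device=False):
    """NORMAL and MOVABLE are assumed always present; the rest are optional."""
    return 2 + int(dma) + int(dma32) + int(highmem) + int(device)


stock_amd64 = count_zones()                 # DMA + DMA32 + NORMAL + MOVABLE
with_both = count_zones(device=True)        # ZONE_DMA and ZONE_DEVICE together
```

Under these assumptions, enabling ZONE_DMA and ZONE_DEVICE together gives five zones, one more than the guard accepts, which is consistent with the build failure reported here and with the earlier remark that the ZONE_SHIFT calculation might need fixing.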
apw	dsmythies, yep thanks, will have a look at it perhaps in the morning -- if the other plates stay up	22:18
=== JanC_ is now known as JanC
