tsimonq2under "What do you need help with?" on https://wiki.ubuntu.com/Kernel/GettingInvolved , it shows an outdated link and I don't know where the real link would be.02:42
dsmythiestsimonq2, I fixed the link. Try it now.03:59
apwdsmythies, it is a consequence of CONFIG_ZONE_DEVICE turning off CONFIG_ZONE_DMA yes08:29
apwdsmythies, not convinced the constraints are reasonable, at least for x86_64 where you don't have HIGHMEM so i think there is room for both though the config constraints prevent that09:06
apwdsmythies, is that card used for something we are going to notice, like qemu ?09:06
tsimonq2thanks dsmythies 11:38
rtgcaribou: could you have a look at bug #1528101 and tell me if kexec is functioning as designed ?13:26
ubot5bug 1528101 in linux (Ubuntu) "ISST-LTE: kdump failed: second kernel booting hangs after /scripts/init-bottom when large min_free_kbytes value being set" [Undecided,New] https://launchpad.net/bugs/152810113:26
caribourtg: sure13:27
apwcaribou, not sure if we have any sane way generically to know which options are good to keep and which are not for a kdump kernel, i wonder if we almost want a separate directory of overrides for kdump.13:28
caribouapw: I've been running tests to try to come up with some sensible rules for the definition of crashkernel13:29
cariboubut there are so many variables that come into play that it is kinda hard13:30
cariboufor instance, I've been testing on VM with up to 22Gb : the default 128M works fine13:30
apwso we might just say that this one is a specific override to be something sane like 1%13:31
apwand be happy13:31
caribouapw: but when I test on real hardware with 128G of memory, crashkernel=128M comes short not because of the memory but because of the fact that initrd is bigger and it OOMs13:31
apwcaribou, ok, even with your magic smaller one ?13:32
apwso an =dep initrd is still too big for 128M ?13:33
caribouapw: that was on trusty so the change was not there; I went on to rebuild with MODULES=dep & got bitten by the initramfs-tools bug that I fixed13:33
caribouapw: it works now with a smaller initrd13:33
apw ahh ok13:33
apwof course this bug might be moot with that fix too, hrm13:34
caribouapw: but in this case, the boot sequence fails before kdump even comes into play13:34
caribouapw: partly, on system with SSDs, the kernel hook would fail when building the smaller initrd file13:34
apwwhich we should fix i assume13:35
apwfor this bug, they are setting min_free somewhere, presumably in /etc/sysctl.d.conf and we really should not be honouring that in the constrained environment13:36
apwit's not clear how one would know that though programmatically13:36
caribouapw: LP: #1532146 is waiting for a sponsor if you have a minute ;-)13:36
ubot5Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress] https://launchpad.net/bugs/153214613:36
apwcaribou, yeah it's on my todo list :)  i have another initramfs-tools bodge to ram in, just updating aufs for a nasty hang bug13:37
caribouapw: that's fine, as long as it is queued to be fixed somewhere13:37
caribouapw: in that case (LP: #1532146), it doesn't even make it to kdump & fails before13:38
ubot5Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress] https://launchpad.net/bugs/153214613:38
caribouapw: so you're right, we should find a way to boot the kexec kernel w/o using any tailored sysctl values13:38
caribouoops, wrong bug : bug #152810113:39
ubot5bug 1528101 in linux (Ubuntu) "ISST-LTE: kdump failed: second kernel booting hangs after /scripts/init-bottom when large min_free_kbytes value being set" [Undecided,New] https://launchpad.net/bugs/152810113:39
caribourtg: so for your question : kexec doesn't even get activated so yes, works as designed ;-)13:39
rtgcaribou: since I don't really understand the fine points of the discussion, I'll bug apw and make him explain it.13:41
caribouapw: everything in /etc/sysctl.conf + /etc/sysctl.conf.d should not be honoured when the kexec kernel boots13:43
apwcaribou, should not as in you think they are not, or you think they are and shouldn't ?13:44
caribouapw: looks like they are (if indeed they are changing vm.min_free_kbytes with sysctl)13:45
caribouapw: so when the kexec kernel boots post-panic, the kernel should boot w/o implementing what's in sysctl.conf[.d] if possible13:46
caribouapw: the kexec kernel is only there to allow for the capture of the kernel dump13:47
caribouapw: so there is no reason to apply any user-defined customization there13:47
caribouam I being clear ?13:47
rtgcaribou: that makes sense to me13:48
caribouapw: rtg: it should be trivial to add a check in /etc/init.d/procps to verify if we have booted in the kexec kernel & not apply the changes13:49
rtgcaribou: how would you preserve the original sysctl values ?13:49
rtgnm, the new kernel would have them as part of its image.13:50
caribouapw: rtg: /proc/vmcore is only present if we are booted on the kexec kernel post-panic. This is how kdump-config knows that it has to capture the content13:51
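The check described here can be sketched roughly as follows. This is only an illustration of the logic in Python; the change under discussion would live in the /etc/init.d/procps shell script. The function names `booted_in_kdump_kernel` and `should_apply_sysctl` are hypothetical, and the "skip everything" policy is just the first idea floated in the channel, not a settled design.

```python
import os


def booted_in_kdump_kernel(vmcore_path="/proc/vmcore"):
    """Return True when the crash-dump file is exposed.

    /proc/vmcore is only present in the kexec kernel booted after a panic,
    which is how kdump-config knows there is a dump to capture.
    """
    return os.path.exists(vmcore_path)


def should_apply_sysctl(vmcore_path="/proc/vmcore"):
    """Hypothetical policy: skip user sysctl settings in the kdump kernel."""
    return not booted_in_kdump_kernel(vmcore_path)
```

The path is a parameter only so the check can be exercised on a machine that is not actually running a crash kernel.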
rtgcaribou: what agent applies sysctl values to the kexec kernel ? kdump ? the initrd ?13:51
tseliotrtg: I've just fixed fglrx in xenial. It was pretty easy this time13:52
caribourtg: /usr/sbin/procps apparently13:52
caribourtg: no, /etc/init.d/procps13:52
rtgtseliot, thanks, it was the primary show stopper for a 4.4 kernel in xenial13:52
tseliotgood :)13:52
ricotzrtg, tseliot, nice :)13:55
apwcaribou, yeah, that sounds like a sensible option, i might suggest it ought to possibly have a kdump.conf.d or something which is applied instead, just in case there is something you need14:00
caribouapw: makes sense; the override would only apply kdump.conf if it exists14:01
apwcaribou, then in the unlikely event there is something you need in both you link the file into both .d's and be happy14:02
caribouapw: though, it does nothing for this bug as I doubt that touching something as central as procps would qualify for SRU14:03
apwcaribou, there is nothing to say you can't change things like that, we might need to make it opt in or something14:03
apwcaribou, so like if there is a kdump.conf we use that _rather_ than the normal ones, and they can make it empty14:03
apwso that the default doesn't change perhaps, something like that14:04
apwyou are our dump expert, and if you say it's wrong that kinda makes it wrong and wrong things can be fixed in sru's14:05
caribouapw: well the argument that it only applies in the context of the kexec post-panic kernel and never in a normal user context might be sufficient14:05
caribouapw: am I ???? :-D14:05
apwcaribou, clearly :)14:05
caribouok, let me run with that  & see what I can propose; I'll also comment in the bug14:06
caribouapw: ^14:06
rtgcaribou: would you like me to take a stab at the patch and run it by you ?14:06
apwrtg, or you could sponsor it for him14:07
rtgapw, that too14:07
apwsweet sorted14:07
caribourtg: let me try it, then I'll run it by you guys14:07
caribouapw: rtg: is it possible to force vm.min_free_kbytes as a boot parameter (for a temporary workaround) ?14:14
apwcaribou, it is possible yes14:18
apwi can't say for certain without looking though14:18
* rtg does not see anything promising in Documentation/kernel-parameters.txt14:20
caribouapw: don't bother, I'll look it up; thought you might know it off the top of your head14:20
caribouapw: fedora fixed it a while ago : https://lists.fedoraproject.org/pipermail/kexec/2014-November/001478.html14:25
caribouapw: not too fond of how they fixed it though14:25
apwcaribou, no, but then there is precedent, perhaps we can do that in SRUs and something cleverer in the new releases14:27
caribouapw: indeed14:28
apwcertainly i'd not want to hard code it like that, ugg14:28
apwand that only fixes it if it is in /etc/sysctl.conf at that, not sysctl.d ... hrm14:29
cariboutbh, I'm not sure if the fix went in but it was reported14:30
caribourtg: apw: here is a question for you : does the kernel rely on sysctl settings to boot correctly (i.e. do we use that mechanism to fix some behaviors) ?14:32
apwcaribou, well as they are applied in root, they cannot be used to fix root disk related behaviours14:33
rtgcaribou: there might be a lot more console noise14:33
caribourtg: I also see some hardening things in there we might want to keep14:34
rtgcaribou: doesn't kexec immediately reboot ? 14:34
rtgafter acquiring the dump14:35
caribourtg: yes14:35
rtgin which case hardening might not be so important14:35
apwcaribou, and we don't bring up things like networking do we14:39
caribouapw: yes, especially if we want to do remote dumps14:39
caribouapw: but it is not even dependent on remote dumps : network comes up systematically14:40
cariboumade my life so much easier when I developed remote dumps14:40
apwcaribou, then there are options there which may be necessary, hrm14:42
apwcaribou, how about then having an sysctl-kdump.conf which is loaded _after_ all the others, which can set things "back"14:43
caribouapw: interesting; the only thing is to know _what_ to set "back"14:44
apwcaribou, well we literally only know of one thing which should not be set, so it could start with that14:44
apw(if we know what a sane value is)14:45
caribouapw: in worst case scenario, it would be the sysadmin's responsibility to add to this file in case of problem; at least the mechanism is there14:46
apwcaribou, yeah, something like that14:46
caribouapw: & we override the values for things we know would cause problems14:46
apwthe only issue really is knowing what to put in that one14:46
rtgperhaps just vm.min_free_kbytes for now (which we _know_ is causing problems)14:48
caribourtg: that's my idea for a start14:48
caribourtg: apw: as I don't think we have an easy way to identify what was the original value in the kernel's namelist before sysctl changed it14:49
apwcaribou, yeah, and its magic essentially14:49
caribouapw: ok let start with that14:50
rtgif it is never set after kexec boots, then it is by default set to the kernel value, right ?14:50
rtgwhich ought to be OK14:50
caribourtg: the kexec kernel will apply the sysctl changes just like the regular kernel14:51
caribourtg: just trying to find a way to avoid disabling sysctl altogether14:51
rtgcaribou: except that you are going to modify procps to do something different (I thought)14:51
apwcaribou, of course we know how much memory we have, it is 128M so a fixed value for that is "ok"14:51
caribouapw: let me run a first pass at it & see what I can come up with 14:54
apwrtg, the point here is that sysctls can be required for networking and the like14:54
apwrtg, and knowing which are ok and which not is hard14:54
apwcaribou, i assume what happens now is we copy these into the initrd, apply them all there, and then again once we mount root (for non kdump boots)14:55
apwcaribou, which is why eliding them by name works for the fedora case14:55
caribouapw: no, apparently the sysctl stuff is taken from the real root & not the initrd14:56
apwcaribou, well that code is copying them into the initrd and they elide the problem one at that point14:57
apwanyhow have a go and see what happens14:57
caribouapw: sure14:58
dsmythiesapw: I do not actually have the sound card (Creative Labs SB Audigy) and came here on behalf of someone else, after helping them isolate the issue (at least to the point where I needed help here). Yes, his AlsaMixer program isn't working, as the card isn't being detected. I also wonder if some of those other sound card differences in the kernel configuration file are for the same reason (ref:  http://paste.ubuntu.com/14481736/ )15:00
apwdsmythies, oddness, the delta i was shown was just the one card15:01
apwrtg, ^ ?15:01
apwrtg, did we investigate whether we could remove that config snippet for CONFIG_ZONE_DEVICE ?15:01
tjaalton4.4 doesn't see my nvme drive.. boot fails because of that15:03
tjaaltonmainlines at least15:08
tjaaltonare the soon-to-hit xenial packages somewhere?15:08
apwtjaalton, ckt unstable ppa15:09
rtgapw, I have not.15:09
tjaaltonapw: thx15:09
apwrtg, i think we need to resolve what we are doing about that one before we release 4.4 as well15:11
rtgapw, maybe have a diff setting for lowlatency ?15:11
apwrtg, well whether we can in fact turn on both for the case we care about, amd6415:13
dsmythiesapw, rtg: This issue is on both lowlatency and generic, but yes amd64.15:13
apwrtg, obviously the config constraints don't let us, but i think on amd64 we have no HIGHMEM so we have one more spare bit, and if we do then we can have zone_dma and zone_device together15:14
apwthat means i386 can't have both, but we're not caring about those15:14
rtgapw, yeah, I'm not too concerned about NVDIMM on 32 bit15:15
dsmythiestjaalton: Have a look at this: http://askubuntu.com/questions/709794/dell-xps-13-9350-compatibility/719947#719947 I do not know if it is true or not that CONFIG_BLK_DEV_NVME=y is needed.15:16
manjoI have been modifying debian/rules.d/2-binary-arch.mk to skip modules signing if the private key is not available .. is there a way to tell builds in LP not to fail of if private key is not available? or what is the correct way of building it ? 15:18
manjoapw, ^ ? 15:20
apwmanjo, we generate the private key during the build ?15:22
tjaaltondsmythies: sounds like initramfs fail to me15:23
manjoinstall-%: MODSECKEY=$(builddir)/build-$*/certs/signing_key.pem15:23
manjoapw, ^ looks like it expects the key to be there 15:23
manjoapw, Maybe I am missing something .. I have been doing if [[ -f "$(MODSECKEY)" ]] ; then \ around the signing code ... which I don't think is the right thing to do 15:24
apwmanjo, right, it expects it to be there because it makes it15:25
apwmanjo, so it should be making it, why is yours not15:25
manjoapw, ppisati ran into the same issue with arm builds (many months ago)15:25
apwmaybe so, if so he may remember what he was doing wrong15:25
apwmanjo, but i say again, the build makes that key15:25
manjoapw, ok let me build in PPA and get it to fail... that will give me an idea what is going on .. 15:26
apwmanjo, see certs/Makefile 15:27
apw        @echo "### Now generating an X.509 key pair to be used for signing modules."15:27
apwit is that key which we then use to _resign_ the modules after stripping them15:27
manjoapw, I have been blindly carrying this patch in my PPA builds and maybe it is not needed anymore ... let me build without it and see if the keys are generated 15:28
apwmanjo, we have always built our kernels in PPAs and never needed such a thing15:29
rtgmanjo, there is this patch for newer kernels: 'UBUNTU: [Debian] Update to new signing key type and location'15:29
manjortg, ok so that must have fixed it for me .. I will drop this mod to skip signing 15:30
tjaaltondsmythies: 4.4.0-0 finds the drive, but cryptsetup fails with "lvm is not available"15:30
caribouapw: rtg: back to the vm.min_free_kbytes15:31
apwtjaalton, you'd think we would have noticed that lvm wasn't available in initrd ?15:32
apwcaribou, ?15:32
caribouapw: rtg: how about I add an option to kdump-config that would re-apply values as found in /etc/default/kdump-tools ?15:32
caribouapw: rtg: this option would be called by /etc/init.d/procps if /proc/vmcore is found15:33
apw/etc/default/kdump-tools is in shell format right?  what would you propose to encode in there ?15:33
rtgcaribou: what would be the packaged defaults in /etc/default/kdump-tools ? Anything ?15:34
caribouapw: yes, I would add SYSCTL=vm.min_free_kbytes=blah and loop through each SYSCTL variable found15:35
caribourtg: we can put anything we know can cause problems & sysadmin can add to it15:35
rtgseems reasonable15:35
apwcaribou, or perhaps SYSCTL_UPDATES=/etc/sysctl-kdump.conf or something and use that file15:35
caribouthis way I restrict modifications outside of the kdump-tools realm to a minimum15:35
apwcaribou, but i like the idea of passing the problem to kdump-config in procps15:36
caribouapw: the advantage of looping through the existing /etc/default/kdump-tools file is that I don't need an extra file15:36
apwi think making that file overly complex is a mistake because things like systemd read them for you and convert them to environment variables and the like15:37
apwcaribou, well mangling existing defaults which are there are difficult especially with that new non-standard format15:38
apwby making that a different sort of entry you make it hard to migrate configs later etc15:38
apwcaribou, also why do we even need to change procps if you do it via kdump-tools15:38
caribouapw: true; especially since kdump-config sources the default file at the beginning. I'll use your option15:39
caribouapw: because, like in that specific bug, we don't get as far as running kdump-config, it fails before that15:39
caribouI can reproduce it btw15:39
caribouapw: that was my initial idea, but it has to happen right when procps runs15:39
apwcaribou, but can we just get in before it, by being earlier15:40
apwkdump-config is running in initramfs or in real root when it runs15:40
caribouapw: real root, so is procps15:40
caribouapw: so the procps modification is only to run sysctl -p /etc/kdump-config.conf if /proc/vmcore exists15:41
apwcaribou, i presume the issue then is that the act of setting it to a large value ooms the box immediately15:44
apwcaribou, so you can't then set it back, you already oomed15:44
caribouapw: exactly15:44
dsmythiesapw, rtg: For my own curiosity, could you point me to "the config snippet for CONFIG_ZONE_DEVICE". So far, I haven't been able to find it.15:45
rtgmm/Kconfig:config ZONE_DEVICE15:45
apwconfig ZONE_DEVICE15:45
apw        bool "Device memory (pmem, etc...) hotplug support" if EXPERT15:45
apw        default !ZONE_DMA15:45
apw        depends on !ZONE_DMA15:45
apwwe might also need to fix the ZONE_SHIFT calculation, but, i can't see why this won't work15:46
tjaaltonapw: I'll check the diff between 4.3 & 4.4 initrd's and see what's going on there15:46
apwtjaalton, lack of -extras perhaps ?15:47
apwin your testing ?15:47
tjaaltonwell this same thing is with mainline builds15:47
apwcaribou, so are you proposing to only apply that one file if you find it, and let kdump-config generate it or ?15:47
tjaalton-extra is installed15:47
apwcaribou, as we cannot apply the real changes and then apply this one back over the top, as that will likely oom us in the middle15:47
caribouapw: looks like it (I'm testing it at the same time)15:48
caribouapw: I was thinking of applying the content of /etc/kdump-config.conf over the existing15:48
apwi think you'll find the first will blammo on you15:49
caribouapw: looks like the simple fact of applying vm.min_free_kbytes triggers the OOM15:49
apwwhich makes sense when you look at the kernel code, it forcefully empties memory to make the boundary15:49
caribouapw: then no matter what, we need to avoid this change to even happen15:50
apwcaribou, does the kernel know it is in kdump mode ?15:52
caribouapw: yes, as it exposes /proc/vmcore15:52
caribouapw: but isn't changing the kernel because one silly setup triggers OOM a bit of an overkill 15:53
rtgapw, fix this in the kernel by ignoring vm.min_free_kbytes if kexec'ed ?15:53
apwrtg, i am thinking about it, yes15:54
apwif /proc/vmcore is valid in fact, but yes15:54
apwas kexecing is a valid way to reboot too15:54
apwand in the normal case we don't care, but if we are exposing a dump file to uspace then we could ignore uspace15:54
apwcaribou, rtg, though there is a separate way of looking at this15:55
caribouuserland is changing a kernel setup in a limited memory context, I would rather avoid changing this value if we very well know that we're in a limited context15:55
apwthat the request for memory can _never_ succeed, the request for reserve is larger than total ram15:55
cariboulimited memory context15:55
apwreturning -ENOWAY instead because the request is ridiculous might be sane15:55
caribouapw: true15:56
rtginstead of OOM'ing15:56
apwright ooming is rarely useful15:56
rtgapw, doing it that way means we don't have to care if /proc/vmcore is valid15:57
apwrtg, so the code as it stands divides the limit you specify between all the non-highmem zones you have16:00
apwso if you say 1000 pages and you have 10 in DMA and 90 in NORMAL16:00
apwit will demand 100 in DMA and 900 in NORMAL even though that is patently nuts16:00
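The arithmetic in that example can be reproduced with a small sketch. It assumes a simple proportional split of the request across the non-highmem zones, as described above; it is not the kernel's code, and `split_min_free` and `impossible_zones` are hypothetical names used only for illustration.

```python
def split_min_free(pages_min, zone_pages):
    """Divide a min_free request across zones in proportion to their size.

    zone_pages maps zone name -> pages in that (non-highmem) zone.
    """
    lowmem_pages = sum(zone_pages.values())
    return {
        name: pages_min * pages // lowmem_pages
        for name, pages in zone_pages.items()
    }


def impossible_zones(pages_min, zone_pages):
    """List zones that are asked to reserve more pages than they contain."""
    demand = split_min_free(pages_min, zone_pages)
    return [name for name, pages in zone_pages.items() if demand[name] > pages]
```

With the numbers from the discussion, `split_min_free(1000, {"DMA": 10, "NORMAL": 90})` gives 100 for DMA and 900 for NORMAL, and `impossible_zones` flags both zones. A check along these lines is one way a clearly unsatisfiable request could be refused instead of being applied.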
rtgapw, seems like refusing the sysctl is a reasonable action16:02
apwrtg, though even if i make it say you need enough memory in your zones to cope there are likely to be combinations which are just numerically valid and not viable16:03
apwrtg, and telling those apart is tricky at best16:03
apwcaribou, eliding specific entries in userspace falls to procps, to sysctl itself detecting /proc/vmcore and ignoring some names textually (me thinks) which is a little vile, but no viler than doing so in the kernel16:08
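Eliding entries by name could look something like the sketch below. It is an assumption-laden illustration, not how procps or sysctl behave today: `KDUMP_ELIDED_KEYS` and `filter_sysctl_lines` are hypothetical, the elided set contains only the one setting known to cause trouble here, and the parser handles only plain `key = value` lines (no `-key` prefix or slash-separated names).

```python
# Assumption: vm.min_free_kbytes is the only known offender so far.
KDUMP_ELIDED_KEYS = {"vm.min_free_kbytes"}


def filter_sysctl_lines(lines, in_kdump, elided=KDUMP_ELIDED_KEYS):
    """Drop settings for elided keys when running in the kdump kernel."""
    if not in_kdump:
        return list(lines)
    kept = []
    for line in lines:
        stripped = line.strip()
        # Comments (# or ;) and blank lines pass through untouched.
        if stripped and not stripped.startswith(("#", ";")):
            key = stripped.split("=", 1)[0].strip()
            if key in elided:
                continue
        kept.append(line)
    return kept
```

Because the filtering happens before anything is written to the kernel, the problematic value is never applied at all, which matters here since applying it and then setting it back would already have triggered the OOM.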
tjaaltonapw: weird, 4.2.0-16 doesn't have sbin/lvm either, but it still works16:09
caribouapw: ok, I'm looking into that16:09
caribouapw: and at the same time, asking for the kernel to keep a minimum of 312M free and setting crashkernel=128M makes no sense, they should just increase crashkernel16:10
apwcaribou, well they claim to be using 768M for crashkernel i think, but either way its bonkers16:11
tjaaltonapw: oh, nvme.ko is not shipped with 4.4 initrd, because the module location changed16:13
apwtjaalton, i thought initramfs-tools looked at those by module name only ?16:14
tjaaltondunno, but it should be in kernel/drivers/nvme/host/nvme.ko now16:15
apwoh perhaps it is shipping all "block/*" or something16:15
tjaaltoninstead of kernel/drivers/block/nvme.ko16:15
apwtjaalton, can you file me a bug against initramfs-tools for that please16:15
rtgtjaalton, that means I should fix debian.master/control.d/generic.inclusion-list:drivers/block/nvme.ko16:16
apwand give me the # and i'll batch that up with the other two critical fixes16:16
apwrtg, needs fixing there too yes16:16
rtgapw, I'll take care of it16:16
tjaaltonrtg: you'll file the bug too?16:16
rtgtjaalton, nope, it's a dev kernel. I'll just fix it16:17
apwi think he is saying he will fix the kernel bit, but i would love a bug for initramfs-tools for that bit16:17
tjaaltonright but initramfs-tools needs fixing too16:17
tjaaltonyeah I'll file that bit16:17
apw                        copy_modules_dir kernel/drivers/block16:18
apwyeah it copies on all block devices, and that just moved out of there16:18
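Why the module went missing can be illustrated with a toy model of directory-based selection. This is only a sketch: `module_is_copied` is a hypothetical helper, not part of initramfs-tools, and the two paths are the old and new nvme.ko locations mentioned in the discussion.

```python
def module_is_copied(module_path, copied_dirs):
    """True if module_path lies under one of the copied directories."""
    return any(
        module_path.startswith(d.rstrip("/") + "/") for d in copied_dirs
    )


BLOCK_DIRS = ["kernel/drivers/block"]
OLD_NVME = "kernel/drivers/block/nvme.ko"      # location before the move
NEW_NVME = "kernel/drivers/nvme/host/nvme.ko"  # location in 4.4
```

`module_is_copied(OLD_NVME, BLOCK_DIRS)` is true and `module_is_copied(NEW_NVME, BLOCK_DIRS)` is false, so a hook that selects by directory silently drops the driver once it moves.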
tjaaltonfixed in debian https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=80700016:18
ubot5Debian bug 807000 in initramfs-tools "initramfs-tools: module nvme not included in block modules on kernel > 4.2" [Normal,Fixed]16:18
apwtjaalton, not sure that that fix makes a whole heap of sense in the face of the change the kernel is making, long term, but hey16:21
tjaaltonapw: what do you mean? cryptsetup works again with that, just tested16:25
tjaaltonapw: heh, two bugs already filed about this16:28
tjaaltonhttps://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1524879 https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/153214616:29
ubot5Launchpad bug 1524879 in initramfs-tools (Ubuntu) "initramfs-tools, Xenial is missing NVME kernel driver" [High,New]16:29
ubot5Launchpad bug 1532146 in initramfs-tools (Ubuntu) "update-initramfs fails for MODULES=dep when root is on nvme device" [Medium,In progress]16:29
tjaaltonthe first already properly assigned :P16:30
* xnox thought apw was going to sponsor that cherrypick16:32
xnoxas asked on irc some friday back.16:32
rtgppisati, I forgot to mention that Ubuntu-4.4.0-1000.6 works well with your image16:34
apwxnox, then i am already doing it indeed :)16:34
ppisatirtg: good to know16:35
dsmythiesapw, rtg: I am not understanding the config dependency trail, or what to change in order to be able to compile a test kernel. Do I just delete that ZONE_DEVICE snippet?16:35
rtgppisati, hopefully it will work with Snappy as well16:35
apwdsmythies, no, we want to test building with the config requirement for !ZONE_DMA removed and that turned back on16:36
ppisatirtg: it will16:36
apwand see if tht builds on amd6416:36
ppisatirtg: though i might have found a problem with xenial/arm6416:37
ppisatirtg: so, no matter if i cross or native compile16:37
rtgppisati, the module load problem ?16:37
ppisatirtg: right16:37
ppisatirtg: if i compile under wily is fine16:37
rtgppisati, I noticed that it seems to be a toolchain issue16:37
ppisatirtg: while it blows in a xenial chroot16:37
rtgdannf found that one, right ?16:38
ppisatirtg: it happens if i crosscompile (xenial/amd64 -> arm64) or native (my arm64 ppa)16:38
ppisatirtg: i found it, and dannf confirmed he sees that too in xenial 4.316:38
ppisatidannf: ^16:38
rtgppisati, what is Linaro using ?16:38
dsmythiesapw: O.K. so delete this line: "depends on !ZONE_DMA" then. (sorry for being a bit think on thick on this one).16:39
ppisatirtg: no idea16:39
ppisatirtg: you mean the toolchain16:39
rtgyeah, maybe Salvetti has some thoughts16:39
dannfppisati, rtg: my next step was to rebuild gcc w/o linaro patches to see if that fixes it16:40
dannf(otp, sorry for lag)16:40
rtgdannf, are there patches in gcc specific to that arm64 errata ?16:41
dsmythiesI meant to say "sorry for being a bit thick on this one".16:44
apwdsmythies, yes, then readd CONFIG_ZONE_DMA=y to the config and run fakeroot debian/rules updateconfigs16:44
apwand then see if the build builds any more, i suspect it may not16:44
apwbut it may be resolvable16:44
apw(on amd64)16:44
dannfrtg: yeah16:46
dsmythiesapw: I will report back, but it might be awhile (like hours).16:47
jmuxHow can I build a kernel containing a ddeb package? I just have a linux-lts-trusty-tools ddeb as an output of my build.17:31
apwjmux, how are you building them ?17:32
jmuxMy problem is actually http://ddebs.ubuntu.com/pool/main/l/linux-lts-trusty/, which doesn't contain debug symbols for the range from 15-Jul-2014  to 10-Nov-201517:32
argesjmux: depending on the version it may be archived in launchpad17:33
jmuxapw: sbuild -A -s -d tramp --arch=i386 --add-depends=pkg-create-dbgsym linux-lts-trusty_3.13.0-34.60~precise1.dsc17:34
apwright, we moved ddebs into the librarian around that time, though i would expect them to be in there17:34
apwjmux, if you aren't on a real builder it will optimise away the .ddebs because they take an hour to package17:35
jmuxapw: I get the ddeb for the linux-lts-trusty-tools. I even set AUTOBUILD=117:36
argesapw: they are gone: https://launchpad.net/ubuntu/+source/linux-lts-trusty/3.13.0-34.60~precise117:36
apwyou need something like full_build=true passed to debian/rules17:36
argesthis is how i've done it: http://chrisarges.net/2015/10/02/building-ubuntu-kernels-with-debug-symbols.html17:37
apwarges, full_build=true is the approved incantation there, though the effect is similar17:38
jmuxHmm missed full_build in debian/rules.d/0-common-vars.mk17:41
apwjmux, it isn't something that one expects (like the spanish inquisition)17:45
jmuxProbably an other problem "man xz: Multithreaded compression and decompression are not implemented yet, so this option has no effect for now."17:45
jmuxSo even if the build finishes in 10 minutes, the compression will take ages, I guess17:46
apwit takes ages for sure, like 10 on our massively fast boxes, hours on others17:46
apwthey make one cry in the main, which is why they get turned off by default for local builds17:47
apwdid you look and see if the one you needed was in the librarian though ?17:47
jmuxIs there a way to query librarian directly? The ddebs are in the build log…17:48
apwi'd say from the .changes from the buildd that the .ddebs were collected under the old system17:49
apwarges, does sts still keep history on those?17:49
argesapw: i don't think so17:49
argesi didn't find them in the librarian so best bet is to regenerate them.17:49
* jmux is not sure the debug symbols will actually help with MCE like http://paste.debian.net/365584/17:53
apwjmux, nope, did you try what it suggests ?17:54
jmuxYup - but didn't help me17:54
apwjmux, what are you hoping it will tell you?  that says that when we were handling a page fault the internal state of the CPU was found to be wrong and unfixable17:56
jmuxBut these machines were working flawlessly on lucid for years. No idea, why they stop working on precise with trusty stack17:56
apwwell they may be throwing errors all the time which the old kernel doesn't understand at all (which being lucid it would not)17:57
apwthose newer kernels almost certainly have the ability to read MCEs at all whereas the older ones did not i suspect17:59
jmuxapw: might be. I've just started collecting information. Not sure if a backtrace contains info about the origin of the page fault.18:00
apwjmux, if that EIP is accurate (and it implies it is not but hey) that is the very first byte of the exception handler18:00
apwjmux, and is this a 32 bit kernel ?18:01
jmuxit might be a really unnoticed HW bug - no idea yet18:01
jmuxYup 32bit18:02
apwif the thing is occurring often and reproducibly you could try removing the MCE handler (if it is a module) and see18:04
apwif the machine is symptomless afterwards18:04
apwas it was on lucid, it may be just be that they are bogus, given it never exploded before18:04
jmuxProblem is I really don't know what triggers it. I got the exact HW of a known problematic box and it took 12 hours to trigger.18:06
apwjmux, it is possible nomce on the kernel command line will shut it off for that purpose18:06
apwof course if it is real, all bets are off in that case18:06
jmuxThis happens for random people18:06
jmuxAbout once in a week18:07
jmuxSupport has actually played with Intel C states in BIOS which fixed it for some people18:18
apwoh lovely18:18
jmuxThat's why I still hope it can be a kernel bug, I'll also update the kernel. But for that random bugs without a real trigger…18:20
jmuxI had the crazy idea of the new kernel using more power, probably because of new features… instable system… all I know is Lucid worked for years and problem started with the precise updates18:22
* jmux hopes that shipping trusty with xenial stack will help, but that's half a year away18:23
apwjmux, they likely would have simply not been detected before in lucid18:36
apweven if they were happening18:37
jmuxAnd something like "MCA: Internal unclassified error: 402" doesn't help18:38
jmuxI'll also try latest microcode updates18:39
jmuxIf I get the same error tomorrow, I have probably at least one valid data point. And there are BIOS updates.18:40
* jmux shivers thinking about BIOS updating a few thousand machines…18:41
apwjmux, a job for life ... literally18:43
jmuxOh - and I just got the MCE error via serial null modem connection18:43
apwjmux, a google of the specific error shows people sending their CPUs back for replacement to good effect o.O18:46
apw(which if it was true would make upgrading the BIOSen look like a fun game)18:47
jmuxapw: hmm If all machine would show this MCE, but currently they just have a blank screen. Without the null modem, I wouldn't have seen this.18:50
apwjmux, yeah the "kms will guarantee you can see panics" promise, not so much18:51
jmuxProbably disabling DPMS for known machines would help here18:51
jmuxProbably installing mcelog will also help18:56
apwgood luck18:56
jmuxEverything linux wise I can deploy automatically. mcelog definitely makes sense, as this would probably allow to catch the MCE errors18:58
jmuxHmm https://access.redhat.com/solutions/1161573 has the same HW. Guess I'll have to contact some server guys tomorrow19:01
jmux"Following MCE logs were collected from serial consle."19:01
jmuxIt's not page_fault but intel_idle, but otherwise…19:03
apwjmux, i note it talks about using "nomodeset" which if you find a box which reproduces regularly is worth a shot19:03
jmuxThe "reproduces regularly" is the problem. So actually a kernel update might help, or the solution will just state "get a new CPU"19:06
jmuxI hope my local crash will happen again in the next 12 hours19:06
* jmux hopes for a software solution19:07
jmuxapw: thanks for your support. /me will leave for the cinema in a few minutes19:08
dsmythiesapw: The build failed, and for the reason you predicted, ZONES_SHIFT, which cascades into a bunch of other errors.22:08
dsmythiesapw: #error ZONES_SHIFT -- too many zones configured adjust calculation22:09
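The error comes from the compile-time ladder that picks how many page-flag bits encode the zone number. The sketch below mirrors that ladder in Python under two assumptions that should be checked against the tree being built: that the 4.4-era calculation handles at most four zones, and that amd64 with ZONE_DMA enabled already has four (DMA, DMA32, NORMAL, MOVABLE), so adding ZONE_DEVICE makes five. `zones_shift` is a hypothetical name for illustration.

```python
def zones_shift(max_nr_zones):
    """Bits needed in the page flags to encode the zone number."""
    if max_nr_zones < 2:
        return 0
    if max_nr_zones <= 2:
        return 1
    if max_nr_zones <= 4:
        return 2
    # Mirrors the preprocessor #error seen in the failed build.
    raise ValueError(
        "ZONES_SHIFT -- too many zones configured adjust calculation"
    )
```

Under those assumptions `zones_shift(4)` returns 2 while `zones_shift(5)` raises, so having both zones would need the ladder extended with a further branch that spends one more bit.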
apwdsmythies, yep thanks, will have a look at it perhaps in the morning -- if the other plates stay up22:18
=== JanC_ is now known as JanC
