/srv/irclogs.ubuntu.com/2017/12/20/#ubuntu-kernel.txt

dupondjeanyone could give me some pointers to debug the following?12:28
dupondje[ 2990.419420] pcieport 0000:00:1d.0: AER: Corrected error received: id=00e812:28
dupondje[ 2990.419434] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e8(Transmitter ID)12:28
dupondje[ 2990.419441] pcieport 0000:00:1d.0:   device [8086:a118] error status/mask=00001000/0000200012:28
dupondje[ 2990.419446] pcieport 0000:00:1d.0:    [12] Replay Timer Timeout  12:28
dupondjealready tested latest daily built kernel, but same errors12:29
apwdupondje, are you seeing any other symptoms with that error, as that just says it is a corrected error12:35
apwie is it a one off, does it repeat, does the machine crater after it12:36
dupondjeapw: it repeats, sometimes after 1 minute, sometimes only 1 per hour12:41
dupondjeits random12:41
dupondjebut nothing seems to hang/lock/die whatever :)12:41
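
To put a number on "it repeats, sometimes after 1 minute, sometimes only 1 per hour", the kernel timestamps can be used to measure the gap between corrected-error reports per device. The following is a minimal sketch, not a definitive tool. It assumes the log lines look like the ones pasted above, and the names LINE_RE and summarize are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches kernel log lines such as:
# [ 2990.419434] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e8(Transmitter ID)
# The pattern is an assumption based on the lines pasted in this log.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"PCIe Bus Error: severity=(?P<sev>\w+), type=(?P<layer>[^,]+),"
)


def summarize(lines):
    """Return {(device, layer): (count, [gaps in seconds])} for corrected errors."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("sev") == "Corrected":
            events[(m.group("dev"), m.group("layer"))].append(float(m.group("ts")))

    summary = {}
    for key, stamps in events.items():
        stamps.sort()
        # Gap between each report and the one before it, in seconds since boot.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        summary[key] = (len(stamps), gaps)
    return summary


if __name__ == "__main__":
    sample = [
        "[ 2990.419434] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e8(Transmitter ID)",
        "[   54.376166] nvme 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0400(Receiver ID)",
    ]
    for key, (count, gaps) in summarize(sample).items():
        print(key, count, gaps)
```

Passing the function the lines of a saved dmesg dump gives one entry per device and layer, which makes it easier to compare kernels or machines.
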
apwand i assume you didn't see them on older kernels12:42
apwif so it may well be we learned how to report them12:42
dupondjewell I saw it on stock 17.10 kernel and on daily kernel12:43
dupondjeits on a new laptop, so no idea about older kernels :)12:43
dupondjealso seeing:12:46
dupondje[   54.376166] nvme 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0400(Receiver ID)12:46
dupondje[   54.376169] nvme 0000:04:00.0:   device [1c5c:1284] error status/mask=00000081/0000e00012:46
dupondje[   54.376172] nvme 0000:04:00.0:    [ 0] Receiver Error         (First)12:46
dupondje[   54.376174] nvme 0000:04:00.0:    [ 7] Bad DLLP   12:46
dupondjebut way less frequent12:46
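
The "error status/mask" pairs in the pasted lines are bitfields, and the bracketed numbers the kernel prints ([12], [ 0], [ 7]) are the set bits of the status word. A minimal sketch of that decoding follows. The bit names are taken from the PCIe AER Correctable Error Status register layout as I understand it, and the names CORRECTABLE_BITS, decode and decode_status_mask are hypothetical, chosen only for this example.

```python
# Bit positions in the AER Correctable Error Status / Mask registers.
# Assumption: this table reflects the PCIe AER layout; verify it against
# the specification before relying on it.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode(value):
    """Return the names of the known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_mask(field):
    """Decode a 'status/mask' pair as printed by the kernel, e.g. '00001000/00002000'."""
    status_hex, mask_hex = field.split("/")
    return decode(int(status_hex, 16)), decode(int(mask_hex, 16))


if __name__ == "__main__":
    # The two pairs pasted in this log.
    for field in ("00001000/00002000", "00000081/0000e000"):
        reported, masked = decode_status_mask(field)
        print(field, "reported:", reported, "masked:", masked)
```

The first list is what the device reported and the second is what is masked from reporting. All of these are correctable errors, which is consistent with nothing hanging or locking up.
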
apwwas that nvme installed when you bought it ?12:46
apwboth of those errors are reporting that transactions on PCIE are failing, that would often imply h/w issues12:47
dupondjeyes12:47
dupondjebrand new device12:49
dupondjeit's of course possible something is wrong with it ...12:49
apwor indeed that "occasional" errors are to be expected as long as they are corrected12:52
apwyou really need to find another machine of the same type and find out if it has the issue12:52
apwwhat kind of machine is it12:52
dupondjeDell Precision 552012:53
TJ-It could be an ASPM issue12:53
dupondjecould try booting 16.04 on a liveusb12:54
dupondjeand see if it happens there also ...12:54
TJ-dupondje: is it a XPS 9560 ?12:56
TJ-oh no, sorry, you said!12:56
dupondjeTJ-: its not, but XPS 9560 is actually (exactly?) the same hardware ...12:57
dupondjeafaik12:57
TJ-dupondje: I see some reports with Dell + Hynix M.2 NVMe 12:57
dupondjeModel Number:                       PC300 NVMe SK hynix 512GB12:58
dupondje:D12:58
dupondjeTJ-: link?12:58
TJ-there's a workaround here but it sounds a bit drastic, maybe apw can comment? https://bbs.archlinux.org/viewtopic.php?id=22968212:58
TJ-dupondje: it may be the ACPI DSDT isn't correctly configuring the device13:01
dupondjebut thats a bug in the kernel? Guess I should better open some bugreport then?13:02
TJ-If it's ACPI it'll be a firmware bug. There's a common change I often recommend where there are unusual hardware issues, and it's very successful. See http://iam.tj/prototype/enhancements/Windows-acpi_osi.html13:06
dupondjestill thats a workaround then :) Needs to be fixed in the NVME firmware then?13:09
TJ-dupondje: no. if the issue is ACPI and that fixes it, the bug is in the Dell PC firmware assuming it is running with Windows and not fully configuring the system when Linux is the OS13:11
dupondjeguess I should poke Dell then :)13:14
dupondjethey deliver the laptop with Ubuntu 16.04 on it, so you would expect that it works fine13:14
TJ-Does it work without that error on 16.04? I think you said you're using 17.10 on it ?13:18
TJ-Recent kernels have tightened up the ACPI implementation so we see a lot more of this kind of issue as a result13:18
dupondjejust booted into livecd of 16.04 now13:22
dupondjebut seems to be fine on 16.04, no such errors13:25
dupondjeSo conclusion is that its a BIOS/ACPI bug that is visible only in recent kernels?13:35
TJ-dupondje: the hint is there, yes13:39
dupondjehmmmm, guess Dell doesn't have a bugtracker :P13:42
=== desole is now known as hggdh
