=== tedg_ is now known as tedg
=== kalikiana_ is now known as kalikiana
[03:08] wants to know: how can I debug apparmor for an lxc/lxd related issue?
[04:43] happyaron: https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures
[04:44] in addition there is a debug mode
[04:44] as root: echo 1 > /sys/module/apparmor/parameters/debug
[04:45] this will cause a few extra messages to be written to the kernel ring buffer (dmesg) that may help
[04:47] another one to watch for is seccomp no_new_privs, which can block apparmor from making domain transitions
[04:48] apparmor does emit a message for this, but you may need the debug setting turned on
[05:03] jjohansen: thanks! looking into that
=== DrKranz is now known as DktrKranz
=== balkamos_ is now known as balkamos
=== Elimin8r is now known as Elimin8er
=== bluesabre_ is now known as bluesabre
=== Dmitrii-Sh is now known as Dmitrii-Sh-PTO
[12:46] Hi. How is the bad bug coming along? Have we found out how to un-protect the SPI?
[13:04] jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only
[13:06] jcdutton: and would normally get unlocked by some other means when required
[13:12] ypwong: Mika is going to contact you about testing leaving FSMIE as-is
[13:15] jcdutton: you disappeared, but yes, people are working on it
[13:16] Maybe it has changed the Protected Regions?
[13:21] jcdutton: my current theory is that the SMM firmware normally takes care of either (a) clearing the protection on first access; or (b) going into lockdown when something unexpected is encountered
[13:22] jdstrand: however we are dependent on those that have access to hardware to test. And they are currently having dinner (because of the timezone) and will return to testing after a short break
[13:22] jcdutton: ^^
[13:25] I read that rebooting 3 times might help. My research is that the reboot requires a complete power off, i.e. no mains, no battery, power off, then on again
[13:27] jcdutton: yes, battery out, power off
[13:27] jcdutton: and does it?
[13:27] jcdutton: what did you discover?
[13:28] jcdutton: probably should have written 'power cycle'
[13:28] I don't have a problem laptop, but the datasheets say that the write-lock is only cleared on a complete power cycle.
[13:31] sladen, will do. If I break it I will have to go back to the ODM to unbrick it.
[13:34] ypwong: ta. What would also be useful are before/after dumps of the flash. This would help to confirm whether it is e.g. a broken checksum (not) being updated, which causes the firmware not to unlock the chip during the next boot
[13:36] ypwong: second would be the debug information that the intel-spi driver dumps out during boot, to see what the initial state of the SPI status and control registers is
[13:36] ypwong: but take your time and have a well-earned break!
[13:47] ypwong, what does the ODM do to it to unbrick it?
[13:56] jcdutton, they use a SPI flash writer to change the CMP bit from 1 to 0
[13:56] jcdutton, https://www.dediprog.com/pd/spi-flash-solution/sfdk01
[13:56] sladen, will get to that after my meeting that's 4 mins from now :)
[14:02] ypwong, sounds a bit odd to me. What would have set that to 1?
[14:02] jcdutton, that's what we are finding out
[14:02] the kernel code looks sane, but somehow that bit changed to 1
[14:05] jcdutton: several lines of inquiry; the latest speculation is related to https://github.com/torvalds/linux/commit/9d63f17661e25fd28714dac94bdebc4ff5b75f09 which, if it's not in the running kernel, might leave junk in the FIFO from the previous access
[14:07] jcdutton: though my reading of the documentation is that the flash chip comes up with CMP=1, and it needs to be explicitly cleared, or WPS=1 needs setting to use a different protection scheme
[14:09] ypwong, the problem with CMP being wrong is that next to it are write-once bits which, if they are written, will permanently brick the laptop.
[14:12] jcdutton: yup, the latest speculation is that although the code does a read-modify-write (to set something else in that register), if the 'read' returned incorrect data, then the write will also contain incorrect data
[14:13] jcdutton: https://www.winbond.com/resource-files/w25q64fw_revd_032513.pdf is the Winbond doc
[14:15] That commit is crazy code.
[14:16] jcdutton: why so?
[14:16] It does not check for len = 0
[14:18] mmm, that's certainly a good way to set all-1s
[14:22] Does it need to do write-posting on that write?
[14:27] this (risky) code could certainly do with some more error/sanity checking
[14:27] i.e. validating every single bit in what is going to be written back
[14:29] Is this code actually used for anything apart from re-flashing? If not, it should not write anything until someone actually wishes to re-flash
[14:33] sladen, that patch does fix a bug in the read statement, and it is a good fix, but I don't think it would cause the problems we are seeing.
[14:35] jcdutton: the *whole problem* is that this is collateral from the _init()/probe code... which should not be doing *anything* invasive/risky
[14:35] jcdutton: the code isn't even used
[14:36] jcdutton: so in theory "nothing" is happening
[14:39] Surely some of the writel calls in the init need write-posting?
[14:40] although it would be better if it never did a write in that init code.
[14:45] jcdutton: write-posting?
[14:52] With PCI, in order to actually write something, you have to read it afterwards, to force a PCI transaction to actually pass the data across the bus
[14:52] so a writel without a following readl is unlikely to behave as expected
[14:55] I was asking in case this chatting with the SPI is not via a PCI bus, in which case write-posting does not matter
=== rharper` is now known as rharper
[15:09] jcdutton: no, but I do have a feeling that some of this code is failing to triple-check whether the SPI is ready, and not in the middle of something else
[20:53] sladen, where in the code is it writing the CMP bit?
[21:03] jcdutton: the sr2 register in ...
[21:03] I cannot see a write to sr2 in the init function
[21:04] jcdutton: wait, trying to find it
[21:05] I see, sr1-4 is all one 32-bit reg
[21:05] jcdutton: it's in spi_nor_init(): write_sr(nor, 0);
[21:06] jcdutton: the intention is to _clear_ the protection bits
[21:06] jcdutton: however
[21:06] jcdutton: and the really worrying thing about this (as you've already noted) is that the same register has write-once fuses in it as well
[21:06] as, at the very least, that routine should mask those out
[21:07] so that it is impossible for an _init_ routine to write anything dangerous, even if the read got screwed up
[21:07] Agreed
[21:08] this is one avenue of investigation
[21:08] another is that the BIOS System Management Firmware normally clears it
[21:09] A simple AND 0xffff00ff would do it
[21:10] But that only works if QUAD and CMP should be 0. In some cases that might not be true.
[21:11] Maybe 0xffff83ff would be safer, in the sense that it would be reversible in software. Those other bits are not reversible.
[21:12] preferably ~(DANGEROUSBIT(X) | DANGEROUSBIT(Y) | DANGEROUSBIT(Z)), where the enums make it harder to screw up
[21:12] and more obvious what is happening
[21:12] think the code is trying to enable Quad mode
[21:14] I think the only real solution to this problem is an intel-spi driver with a mass of quirks in it, to undo the damage it has done.
[21:14] jcdutton: something like that is in the process of being tested...
[21:14] jcdutton: but it needs to not make the situation any worse
[21:15] it can get worse???
[21:15] (if it is only CMP getting set, it is reversible)
[21:15] and in software
[21:15] if any of the write-protect fuses get blown, then it is bad
[21:17] Maybe we need to get a sample: write a version of the driver that is 100% safe, have it dump sr2 to the syslog, and gather results. I think netboot works for everyone, so one could create a netboot image that just prints the results, without causing more damage.
[21:18] booting with 'debug' should be enough
[21:19] Let the tool offer to fix the CMP bit, but explain that if the write-once bits are set, it is a return-to-base fix.
[21:19] No-one in their right mind is going to run that intel-spi driver in its current state.
[21:20] Unless the ~dangerous code is added.
[21:20] I also found out that this is a PCI device
[21:21] so they also need to add the write-posting bits.
[21:23] knock up a proposed patch for both
[21:23] along with citations explaining the why
[21:24] given the number of impacted people (probably 1000x those who have reported it)
[21:25] it needs to be done slowly and carefully, with review
[21:25] Agreed. I offer to review the proposed fix.
[21:27] sladen, FYI, I used to be a kernel developer, but don't have a lot of time for it now.
[21:29] sladen, have there been any reports of 3 power cycles fixing the problem?
[21:30] jcdutton: no, that was my guess on the basis of (1) first time: blacklisting the driver; (2) letting the firmware come up, get upset about broken checksums etc., and clean up; (3) hopefully coming up in a better state
[21:31] jcdutton: however, if it's not the SMI disabling, but rather CMP in SR2 on the flash getting set because of an earlier corrupt FIFO read, then that is probably not going to help
[21:31] jcdutton: and the latest information points to the need to reset CMP in SR2 on the flash, in software
[21:31] jcdutton: the difficulty for me and others is the lack of hardware to test
[21:33] Agreed, particularly with the possibility of making it worse due to the write-once bits.
[21:33] You really need someone with the laptop and a flash programmer to hand, who can test and fix if needed.
[21:35] which we have
[21:35] sort-of
[21:35] and my approach to this would be working out how to reliably brick it, in order to unbrick it by not doing that
[21:37] Where is the FIFO you speak of?
[21:43] A stepping stone towards that could be a bit of code that simply reports "Software fixable. CMP flipped", or "Hardware replacement needed" if the write-once bits are set.
[21:43] and put that in a bootable image
=== ochosi_ is now known as ochosi