=== tedg_ is now known as tedg | ||
=== kalikiana_ is now known as kalikiana | ||
happyaron | wants to know: how can I debug AppArmor for an lxc/lxd-related issue? | 03:08 |
---|---|---|
jjohansen | happyaron: https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures | 04:43 |
jjohansen | in addition there is a debug mode | 04:44 |
jjohansen | as root echo 1 > /sys/module/apparmor/parameters/debug | 04:44 |
jjohansen | this will cause a few extra messages to the kernel ring buffer (dmesg) that may help | 04:45 |
jjohansen | another one to watch for is seccomp no_new_privs which can block apparmor from making domain transitions | 04:47 |
jjohansen | apparmor does emit a message for this, but you may need the debug setting turned on | 04:48 |
happyaron | jjohansen: thanks! looking into that | 05:03 |
=== DrKranz is now known as DktrKranz | ||
=== balkamos_ is now known as balkamos | ||
=== Elimin8r is now known as Elimin8er | ||
=== bluesabre_ is now known as bluesabre | ||
=== Dmitrii-Sh is now known as Dmitrii-Sh-PTO | ||
jcdutton | Hi. How is the bad bug coming along? Have we found out how to un-protect the SPI? | 12:46 |
sladen | jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only | 13:04 |
sladen | jcdutton: and would normally get unlocked by some other means when required | 13:06 |
sladen | ypwong: Mika is going to contact you about testing leaving FSMIE as-is | 13:12 |
sladen | jcdutton: you disappeared, but yes, people are working on it | 13:15 |
sladen | <sladen> jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only | 13:16 |
sladen | <sladen> jcdutton: and would normally get unlocked by some other means when required | 13:16 |
jcdutton | Maybe it has changed the Protected Regions? | 13:18 |
sladen | jcdutton: my current theory is that the SMM firmware normally takes care of either (a) clearing the protect on first access; (b) going into lockdown when something unexpected is encountered | 13:21 |
sladen | jdstrand: however we are dependent on those that have access to hardware to test. And they are currently having dinner (because of the timezone) and will return to testing after a short break | 13:22 |
sladen | jcdutton: ^^ | 13:22 |
jcdutton | I read that rebooting 3 times might help. My research is that the reboot requires a complete power off. I.e. no mains, no battery, power off, then on again | 13:25 |
sladen | jcdutton: yes, battery out, power off | 13:27 |
sladen | jcdutton: and does it? | 13:27 |
sladen | jcdutton: what did you discover? | 13:27 |
sladen | jcdutton: probably should have written 'power off | 13:28 |
sladen | jcdutton: probably should have written 'power cycle' | 13:28 |
jcdutton | I don't have a problem Laptop, but the datasheets say that the write-lock is only cleared on a complete power cycle. | 13:28 |
ypwong | sladen, will do. If I break it I will have to go back to ODM to unbrick it. | 13:31 |
sladen | ypwong: ta. what would also be useful are before/after dumps of the flash. This would help to confirm if it is eg. a broken checksum (not) being updated, and which is causing the firmware not to unlock the chip during boot the next time | 13:34 |
sladen | ypwong: second would be the debug information that the intel-spi driver dumps out during boot, to see what the initial state of the SPI status and control registers are | 13:36 |
sladen | ypwong: but take your time and have a well-earned break! | 13:36 |
jcdutton | ypwong, What do ODM do to it to unbrick it? | 13:47 |
ypwong | jcdutton, they use a spi flasher writer to change the CMP bit from 1 to 0 | 13:56 |
ypwong | jcdutton, https://www.dediprog.com/pd/spi-flash-solution/sfdk01 | 13:56 |
ypwong | sladen, will get to that after my meeting that's 4 mins from now :) | 13:56 |
jcdutton | ypwong, sounds a bit odd to me. what would have set that to 1? | 14:02 |
ypwong | jcdutton, that's what we are finding out | 14:02 |
ypwong | kernel codes look sane but somehow that bit changed to 1 | 14:02 |
sladen | jcdutton: several lines of inquiry, latest speculation is related to https://github.com/torvalds/linux/commit/9d63f17661e25fd28714dac94bdebc4ff5b75f09 which if it's not in the running kernel might leave junk in the FIFO from the previous access | 14:05 |
sladen | jcdutton: though my reading of the documentation is that the flash chip comes up with CMP=1, and needs to be explicitly cleared, or WSP=1 needs setting to use a different protection scheme | 14:07 |
jcdutton | ypwong, the problem with CMP being wrong, is that next to it are Write-Once bits, that if they are written, will perm brick the laptop. | 14:09 |
sladen | jcdutton: yup, the latest speculation is that although the code does a read-modify-write (to set something else in that register); if the 'read' was for the incorrect data, then the write will also be for the incorrect data | 14:12 |
sladen | jcdutton: https://www.winbond.com/resource-files/w25q64fw_revd_032513.pdf is the Winbond doc | 14:13 |
jcdutton | That commit is crazy code. | 14:15 |
sladen | jcdutton: why so? | 14:16 |
jcdutton | It does not check for len = 0 | 14:16 |
sladen | mmm, that's certainly a good way to set all-1s | 14:18 |
jcdutton | Does it need to do write-posting on that write? | 14:22 |
sladen | this (risky) code could certainly do with some more error/sanity checking | 14:27 |
sladen | ie. validating every single bit in what is going to be written back | 14:27 |
jcdutton | Is this code actually used for anything, apart from re-flashing ? In which case, it should not write anything until someone actually wishes to re-flash | 14:29 |
jcdutton | sladen, that patch does fix a bug in the read statement, that is a good fix, but I don't think that would cause the problems we are seeing. | 14:33 |
sladen | jcdutton: the *whole problem* is that this is collateral damage from the _init()/probe code. ...which should not be doing *anything* invasive/risky | 14:35 |
sladen | jcdutton: the code isn't even used | 14:35 |
sladen | jcdutton: so in theory "nothing" is happening | 14:36 |
jcdutton | Surely some of the writel in the init need write-posting? | 14:39 |
jcdutton | although, better if it never did a write in that init code. | 14:40 |
sladen | jcdutton: write-posting? | 14:45 |
jcdutton | With PCI, in order to actually write something, you have to read it afterwards, to force a PCI transaction to actually pass the data across the bus | 14:52 |
jcdutton | so, a writel without a following readl is unlikely to behave as expected | 14:52 |
jcdutton | I was asking, in case this chatting with the SPI is not via a PCI bus, in which case it does not matter about write-posting | 14:55 |
=== rharper` is now known as rharper | ||
sladen | jcdutton: no, but I do have a feeling that some of this code is failing to triple check whether the SPI is ready, and not in the middle of something else | 15:09 |
jcdutton | sladen, where in the code is it writing the CMP bit? | 20:53 |
sladen | jcdutton: the sr2 register in ... | 21:03 |
jcdutton | I cannot see a write to sr2 in the init function | 21:03 |
sladen | jcdutton: wait, trying to find it | 21:04 |
jcdutton | I see, sr1-4 is all one 32 bit reg | 21:05 |
sladen | jcdutton: it's in spi_nor_init() write_sr(nor, 0); | 21:05 |
sladen | jcdutton: the intention is to _clear_ the protection bits | 21:06 |
sladen | jcdutton: however | 21:06 |
sladen | jcdutton: and the really worrying thing about this (as you've already noted) is that the same register has write-once fuses in it as well | 21:06 |
sladen | as at the very least that routine should mask those out | 21:06 |
sladen | so that it is impossible for an _init_ routine to write anything dangerous, even if the reading got screwed up | 21:07 |
jcdutton | Agreed | 21:07 |
sladen | this is one avenue of investigation | 21:08 |
sladen | another is that the BIOS System Management Firmware normally clears t | 21:08 |
sladen | clears it | 21:08 |
jcdutton | A simple AND 0xffff00ff would do it | 21:09 |
jcdutton | But that only works if the QUAD and CMP should be 0. In some cases that might not be true. | 21:10 |
jcdutton | Maybe 0xffff83ff would be safer, in the sense that it would be reversible in software. Those other bits are not reversible. | 21:11 |
sladen | preferably ~(DANGEROUSBIT(X) | DANGEROUSBIT(Y) | DANGEROUSBIT(Z)) where the enums make it less easy to screw up | 21:12 |
sladen | and more obvious what is happening | 21:12 |
sladen | think the code is trying to enable the Quad mode | 21:12 |
jcdutton | I think the only real solution to this problem is an intel-spi driver with a mass of quirks in it, to undo the damage it has done. | 21:14 |
sladen | jcdutton: something like that is in the process of being tested... | 21:14 |
sladen | jcdutton: but needs to not make the situation any worse | 21:14 |
jcdutton | it can get worse??? | 21:15 |
sladen | (if it is only CMP getting set it is reversible) | 21:15 |
sladen | and in software | 21:15 |
sladen | if any of the write-protect fuses get blown, then it is bad | 21:15 |
jcdutton | Maybe we need to get a sample. write a version of the driver that is 100% safe, and have it dump the sr2 to the syslog. and gather results. I think netboot works for everyone, so one could create a netboot image that just prints the results, without causing more damage. | 21:17 |
sladen | booting with 'debug' should be enough | 21:18 |
jcdutton | Let the tool offer to fix the CMP bit, but explain that if the Write-once bits are set, it is a return to base fix. | 21:19 |
jcdutton | No-one in their right mind is going to run that intel-spi driver in its current state. | 21:19 |
jcdutton | Unless the ~dangerous code is added. | 21:20 |
jcdutton | I also found out that this is a PCI device | 21:20 |
jcdutton | so they also need to add the write-post bits. | 21:21 |
sladen | knock up a proposed patch for both | 21:23 |
sladen | along with citations explaining the why | 21:23 |
sladen | given the number of impacted people (probably 1000x those who have reported it) | 21:24 |
sladen | it needs to be done slowly and carefully with review | 21:25 |
jcdutton | Agreed. I offer to review the proposed fix. | 21:25 |
jcdutton | sladen, FYI, I used to be a kernel developer, but don't have a lot of time for it now. | 21:27 |
jcdutton | sladen, have there been any reports of 3 power cycles fixing the problem? | 21:29 |
sladen | jcdutton: no, that was my guess on the basis of (1) first time blacklisting the driver (2) letting the firmware come up and get upset and broken checksums etc + cleanup (3) hopefully come up in a better state | 21:30 |
sladen | jcdutton: however, if it's not the SMI disabling, but is indeed CMP in SR2 on the flash getting set because of an earlier corrupt FIFO read, then that is probably not going to help | 21:31 |
sladen | jcdutton: and the latest information points to the need to reset CMP in SR2 on the flash, in software | 21:31 |
sladen | jcdutton: the difficulty for me and others is the lack of hardware to test | 21:31 |
jcdutton | Agreed. particularly with the possibility of making it worse due to the write-once bits. | 21:33 |
jcdutton | You really need someone with the laptop and a flash programmer to hand, who can test and fix if needed. | 21:33 |
sladen | which we have | 21:35 |
sladen | sort-of | 21:35 |
sladen | and my approach to this would be working out how to reliably brick it, in order to unbrick it by not doing that | 21:35 |
jcdutton | Where is the fifo you speak of? | 21:37 |
jcdutton | A stepping stone towards that could be a bit of code that simply reports: "Software fixable. CMP flipped" or "Hardware replacement needed" if the write-once are set. | 21:43 |
jcdutton | and put that in a bootable image | 21:43 |
=== ochosi_ is now known as ochosi |
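
The masking idea discussed between 21:06 and 21:12 (never let an init routine write a 1 into a write-once bit, even if the preceding read returned junk) could be sketched roughly as below. This is an illustration only, not the intel-spi or spi-nor driver code. The function name `safe_sr2_update` and the constant names are hypothetical, and the bit positions are an assumption based on a common W25Q-series Status Register-2 layout; they would need checking against the Winbond datasheet linked in the log.

```python
# ASSUMPTION: bit positions inside an 8-bit SR2 value, following a common
# W25Q-series layout. Verify against the actual datasheet before relying on them.
SR2_QE = 1 << 1    # Quad Enable
SR2_LB1 = 1 << 3   # write-once security register lock bits
SR2_LB2 = 1 << 4
SR2_LB3 = 1 << 5
SR2_CMP = 1 << 6   # Complement Protect

# Named mask of the bits an init routine must never set.
SR2_WRITE_ONCE = SR2_LB1 | SR2_LB2 | SR2_LB3


def safe_sr2_update(current, set_bits=0, clear_bits=0):
    """Compute the SR2 value for a read-modify-write.

    `current` is whatever the read returned and is treated as untrusted.
    The result never has a write-once bit set, so a corrupt read
    (for example all 1s) cannot be echoed back into those bits.
    """
    if (set_bits | clear_bits) & SR2_WRITE_ONCE:
        raise ValueError("refusing to touch write-once bits")
    wanted = (current | set_bits) & ~clear_bits & 0xFF
    # Writing 0 to a write-once bit is harmless; writing 1 is permanent.
    return wanted & ~SR2_WRITE_ONCE & 0xFF
```

With these assumed bit positions, a corrupt all-1s read combined with a request to clear CMP gives `safe_sr2_update(0xFF, clear_bits=SR2_CMP) == 0x87`: CMP and the three lock bits are all written back as 0. Using a named mask rather than a bare constant such as 0xffff83ff follows sladen's point that enums make the intent harder to get wrong.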
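
The read-only diagnostic jcdutton proposes at 21:43 could look something like the following sketch. It only classifies an SR2 value that has already been read; it does not talk to hardware. The function name `classify_sr2` is hypothetical, the bit positions are the same assumed W25Q-style layout (to be verified against the datasheet), and the third message is an addition for the case the log does not spell out.

```python
# ASSUMPTION: same hypothetical SR2 bit layout as a typical W25Q-series part.
SR2_LB1 = 1 << 3
SR2_LB2 = 1 << 4
SR2_LB3 = 1 << 5
SR2_CMP = 1 << 6
SR2_WRITE_ONCE = SR2_LB1 | SR2_LB2 | SR2_LB3


def classify_sr2(sr2):
    """Map a dumped SR2 value to one of the report strings from the log."""
    if sr2 & SR2_WRITE_ONCE:
        # A write-once bit is set: not reversible from software.
        return "Hardware replacement needed"
    if sr2 & SR2_CMP:
        # Only CMP is set: reversible by rewriting the status register.
        return "Software fixable. CMP flipped"
    # Not from the log: placeholder wording for the healthy case.
    return "No protection problem detected"


if __name__ == "__main__":
    # Example values only, not real dumps.
    for value in (0x02, 0x40, 0x48):
        print(f"SR2=0x{value:02x}: {classify_sr2(value)}")
```

Checking the write-once bits first matters: if both CMP and a lock bit are set, the report should be the more severe one.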