/srv/irclogs.ubuntu.com/2017/12/21/#ubuntu-devel.txt

=== tedg_ is now known as tedg
=== kalikiana_ is now known as kalikiana
happyaronwants to know how can I debug apparmor for lxc/lxd related issue?03:08
jjohansenhappyaron: https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures04:43
jjohansenin addition there is a debug mode04:44
jjohansenas root echo 1 > /sys/module/apparmor/parameters/debug04:44
jjohansenthis will cause a few extra messages to the kernel ring buffer (dmesg) that may help04:45
jjohansenanother one to watch for is seccomp no_new_privs which can block apparmor from making domain transitions04:47
jjohansenapparmor does emit a message for this, but you may need the debug setting turned on04:48
happyaronjjohansen: thanks! looking into that05:03
=== DrKranz is now known as DktrKranz
=== balkamos_ is now known as balkamos
=== Elimin8r is now known as Elimin8er
=== bluesabre_ is now known as bluesabre
=== Dmitrii-Sh is now known as Dmitrii-Sh-PTO
jcduttonHi. How is the bad bug coming along? Have we found out how to un-protect the SPI?12:46
sladenjcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only13:04
sladenjcdutton: and would normally get unlocked by some other means when required13:06
sladenypwong: Mika is going to contact you about testing leaving FSMIE as-is13:12
sladenjcdutton: you disappeared, but yes, people are working on it13:15
sladen<sladen jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only13:16
sladen<sladen jcdutton: and would normally get unlocked by some other means when required13:16
jcduttonMaybe it has changed the Protected Regions?13:18
sladenjcdutton: my current theory is that the SMM firmware normally takes care of either  (a) clearing the protect on first access;  (b) going into lockdown when something unexpected is encountered13:21
sladenjdstrand: however we are dependent on those that have access to hardware to test.  And they are currently having dinner (because of the timezone) and will return to testing after a short break13:22
sladenjcdutton: ^^13:22
jcduttonI read that a reboot 3 times might help.  My research is that the reboot requires complete power off. I.e. No Main, no battery, power off, then on again13:25
sladenjcdutton: yes, battery out, power off13:27
sladenjcdutton: and does it?13:27
sladenjcdutton: what did you discover?13:27
sladenjcdutton: probably should have written 'power off13:28
sladenjcdutton: probably should have written 'power cycle'13:28
jcduttonI don't have a problem Laptop, but the datasheets say that the write-lock is only cleared on a complete power cycle.13:28
ypwongsladen, will do. If I break it I will have to go back to ODM to unbrick it.13:31
sladenypwong: ta.  what would also be useful are before/after dumps of the flash.  This would help to confirm if it is eg. a broken checksum (not) being updated, and which is causing the firmware not to unlock the chip during boot the next time13:34
sladenypwong: second would be the debug information that the intel-spi driver dumps out during boot, to see what the initial state of the SPI status and control registers are13:36
sladenypwong: but take your time and have a well-earned break!13:36
jcduttonypwong, What do ODM do to it to unbrick it?13:47
ypwongjcdutton, they use a spi flasher writer to change the CMP bit from 1 to 013:56
ypwongjcdutton, https://www.dediprog.com/pd/spi-flash-solution/sfdk0113:56
ypwongsladen, will get to that after my meeting that's 4 mins from now :)13:56
jcduttonypwong, sounds a bit odd to me. what would have set that to 1?14:02
ypwongjcdutton, that's what we are finding out14:02
ypwongkernel codes look sane but somehow that bit changed to 114:02
sladenjcdutton: several lines of inquiry, latest speculation is related to  https://github.com/torvalds/linux/commit/9d63f17661e25fd28714dac94bdebc4ff5b75f09  which if it's not in the running kernel might leave junk in the FIFO from the previous access14:05
sladenjcdutton: though my reading of the documentation is that the flash chip comes up with CMP=1, and needs to be explicitly cleared, or WSP=1 needs setting to use a different protection scheme14:07
jcduttonypwong, the problem with CMP being wrong, is that next to it are Write-Once bits, that if they are written, will perm brick the laptop.14:09
sladenjcdutton: yup, the latest speculation is that although the code does a read-modify-write (to set something else in that register);  if the 'read' was for the incorrect data, then the write will also be for the incorrect data14:12
sladenjcdutton: https://www.winbond.com/resource-files/w25q64fw_revd_032513.pdf  is the Winbond doc14:13
jcduttonThat commit is crazy code.14:15
sladenjcdutton: why so?14:16
jcduttonIt does not check for len = 014:16
sladenmmm, that's certainly a good way to set all-1s14:18
jcduttonDoes it need to do write-posting on that write?14:22
sladenthis (risky) code could certainly do with some more error/sanity checking14:27
sladenie. validating every single bit in what is going to be written back14:27
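[Editor's sketch] The kind of check sladen describes could look roughly like the following. This is a minimal sketch, not code from the intel-spi or spi-nor drivers: the helper name `sr_write_is_safe` and the mask `SR_PROTECTED_BITS` are hypothetical, and the bit positions are an assumption based on the masks discussed later in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask: bits an init path should never change (assumed positions). */
#define SR_PROTECTED_BITS 0x7c00u

/* Return true only if writing new_val over old_val looks safe. */
static bool sr_write_is_safe(uint16_t old_val, uint16_t new_val)
{
    /* An all-ones read usually means the read itself returned junk. */
    if (old_val == 0xffffu)
        return false;
    /* Refuse any write that would flip one of the protected bits. */
    if ((old_val ^ new_val) & SR_PROTECTED_BITS)
        return false;
    return true;
}
```

A caller would read the register, compute the new value, and skip the write-back whenever this returns false. A deliberate repair that clears CMP would need its own, separately reviewed path.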
jcduttonIs this code actually used for anything, apart from re-flashing ?  In which case, it should not write anything until someone actually wishes to re-flash14:29
jcduttonsladen, that patch does fix a bug in the read statement, that is a good fix, but I don't think that would cause the problems we are seeing.14:33
sladenjcdutton: the *whole problem* is that is collateral from the _init()/probe code. ...which should not be doing *anything* invasive/risky14:35
sladenjcdutton: the code isn't even used14:35
sladenjcdutton: so in theory "nothing" is happening14:36
jcduttonSurely some of the writel in the init need write-posting?14:39
jcduttonalthough, better if it never did a write in that init code.14:40
sladenjcdutton: write-posting?14:45
jcduttonWith PCI, in order to actually write something, you have to read it afterwards, to force a PCI transaction to actually pass the data across the bus14:52
jcduttonso, a writel without a following readl is unlikely to behave as expected14:52
jcduttonI was asking, in case this chatting with the SPI is not via a PCI bus, in which case it does not matter about write-posting14:55
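[Editor's sketch] The write-then-read-back pattern jcdutton is describing can be sketched in plain C as below. The helper name `reg_write_flush` is hypothetical; in kernel code the same idea is normally written as a `writel()` followed by a `readl()` from the same device. Posted writes are usually delivered eventually on their own, so the read-back mainly forces the write to complete before the code carries on.

```c
#include <stdint.h>

/* Write a register, then read it back so the write cannot linger in a
 * posted-write buffer while later code assumes it has taken effect. */
static inline void reg_write_flush(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;   /* the write may be posted (buffered) by a bridge */
    (void)*reg;   /* the read from the same device flushes it through */
}
```

On ordinary memory the read-back has no visible effect, which is why this only matters for memory-mapped device registers.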
=== rharper` is now known as rharper
sladenjcdutton: no, but I do have a feeling that some of this code is failing to triple check whether the SPI is ready, and not in the middle of something else15:09
jcduttonsladen, where in the code is it writing  the CMP bit?20:53
sladenjcdutton: the sr2 register in ...21:03
jcduttonI cannot see a write to sr2 in the init function21:03
sladenjcdutton: wait, trying to find it21:04
jcduttonI see, sr1-4 is all one 32 bit reg21:05
sladenjcdutton: it's in spi_nor_init()  write_sr(nor, 0);21:05
sladenjcdutton: the intention is to _clear_ the protection bits21:06
sladenjcdutton: however21:06
sladenjcdutton: and the really worrying thing about this (as you've already noted) is that the same register has write-once fuses in it as well21:06
sladenas at the very least that routine should mask those out21:06
sladenso that it is impossible for an _init_ routine to write anything dangerous, even if the reading got screwed up21:07
jcduttonAgreed21:07
sladenthis is one avenue of investigation21:08
sladenanother is that the BIOS System Management Firmware normally clears t21:08
sladenclears it21:08
jcduttonA simple AND 0xffff00ff  would do it21:09
jcduttonBut that only works if the QUAD and CMP should be 0. In some cases that might not be true.21:10
jcduttonMaybe 0xffff83ff  would be safer, in the sense that it would be reversible in software. Those other bits are not reversible.21:11
sladenpreferably   ~(DANGEROUSBIT(X) | DANGEROUSBIT(Y) | DANGEROUSBIT(Z))  where the enums make it less easy to screw up21:12
sladenand more obvious what is happening21:12
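[Editor's sketch] Combining jcdutton's 0xffff83ff mask with sladen's named-bit suggestion might look like this. The enum names and bit positions are assumptions inferred from that mask (bits 10 to 14 of the combined status register) and should be checked against the flash datasheet before use; none of this is taken from the actual driver.

```c
#include <stdint.h>

/* Assumed bit positions in the combined SR1/SR2 value (hypothetical names). */
enum sr_danger_bit {
    SR_RESERVED = 1u << 10, /* reserved, leave untouched */
    SR_LB1      = 1u << 11, /* write-once lock bit */
    SR_LB2      = 1u << 12, /* write-once lock bit */
    SR_LB3      = 1u << 13, /* write-once lock bit */
    SR_CMP      = 1u << 14, /* complement-protect bit */
};

#define SR_DANGER_MASK \
    ((uint32_t)(SR_RESERVED | SR_LB1 | SR_LB2 | SR_LB3 | SR_CMP))

/* Clear every dangerous bit from a value before it is written back. */
static inline uint32_t sr_strip_dangerous(uint32_t val)
{
    return val & ~SR_DANGER_MASK;
}
```

With these assumed positions, `~SR_DANGER_MASK` works out to the same 0xffff83ff constant quoted above, but each bit being masked is now named.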
sladenthink the code is trying to enable the Quad mode21:12
jcduttonI think the only real solution to this problem is an intel-spi driver with a mass of quirks in it, to undo the damage it has done.21:14
sladenjcdutton: something like that is in the process of being tested...21:14
sladenjcdutton: but needs to not make the situation any worse21:14
jcduttonit can get worse???21:15
sladen(if it is only CMP getting set it is reversible)21:15
sladenand in software21:15
sladenif any of the write-protect fuses get blown, then it is bad21:15
jcduttonMaybe we need to get a sample. write a version of the driver that is 100% safe, and have it dump the sr2 to the syslog. and gather results. I think netboot works for everyone, so one could create a netboot image that just prints the results, without causing more damage.21:17
sladenbooting with 'debug' should be enough21:18
jcduttonLet the tool offer to fix the CMP bit, but explain that if the Write-once bits are set, it is a return to base fix.21:19
jcduttonNo-one in their right mind is going to run that intel-spi driver in its current state.21:19
jcduttonUnless the ~dangerous code is added.21:20
jcduttonI also found out that this is a PCI device21:20
jcduttonso they also need to add the write-post bits.21:21
sladenknock up a proposed patch for both21:23
sladenalong with citations explaining the why21:23
sladengiven the number of impacted people (probably 1000x those who have reported it)21:24
sladenit needs to be done slowly and carefully with review21:25
jcduttonAgreed. I offer to review the proposed fix.21:25
jcduttonsladen, FYI, I used to be a kernel developer, but don't have a lot of time for it now.21:27
jcduttonsladen, have there been any reports of 3 power cycles fixing the problem?21:29
sladenjcdutton: no, that was my guess on the basis of (1) first time blacklisting the driver (2) letting the firmware come up and get upset and broken checksums etc + cleanup (3) hopefully come up in a better state21:30
sladenjcdutton: however, if it's not the SMI disabling, but is instead CMP in SR2 on the flash getting set because of a corrupt FIFO read before, then that is probably not going to help21:31
sladenjcdutton: and the latest information points to the need to reset CMP in SR2 on the flash, in software21:31
sladenjcdutton: the difficulty for me and others is the lack of hardware to test21:31
jcduttonAgreed. particularly with the possibility of making it worse due to the write-once bits.21:33
jcduttonYou really need someone with the laptop and a flash programmer to hand, who can test and fix if needed.21:33
sladenwhich we have21:35
sladensort-of21:35
sladenand my approach to this would be working out how to reliably brick it, in order to unbrick it by not doing that21:35
jcduttonWhere is the fifo you speak of?21:37
jcduttonA stepping stone towards that could be a bit of code that simply reports: "Software fixable. CMP flipped"  or "Hardware replacement needed" if the write-once are set.21:43
jcduttonand put that in a bootable image21:43
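[Editor's sketch] A read-only reporting helper along the lines jcdutton proposes could be as small as the following. The function name `sr_diagnose`, the bit positions, and the third message are assumptions for illustration; the first two messages are the ones quoted above. The helper only inspects a value that has already been read and never writes anything.

```c
#include <stdint.h>

/* Assumed bit positions in the combined status register value. */
#define SR_CMP        (1u << 14)
#define SR_WRITE_ONCE ((1u << 11) | (1u << 12) | (1u << 13))

/* Classify an already-read status value without touching the hardware. */
static const char *sr_diagnose(uint32_t sr)
{
    if (sr & SR_WRITE_ONCE)
        return "Hardware replacement needed";
    if (sr & SR_CMP)
        return "Software fixable. CMP flipped";
    return "No CMP or write-once bits set"; /* hypothetical third message */
}
```

The write-once check comes first, so a chip with both kinds of bit set is reported as needing hardware replacement.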
=== ochosi_ is now known as ochosi
