/srv/irclogs.ubuntu.com/2017/12/21/#ubuntu-devel.txt

=== tedg_ is now known as tedg
=== kalikiana_ is now known as kalikiana
happyaron	wants to know how can I debug apparmor for lxc/lxd related issue?	03:08
jjohansen	happyaron: https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures	04:43
jjohansen	in addition there is a debug mode	04:44
jjohansen	as root echo 1 > /sys/module/apparmor/parameters/debug	04:44
jjohansen	this will cause a few extra messages to the kernel ring buffer (dmesg) that may help	04:45
jjohansen	another one to watch for is seccomp no_new_privs which can block apparmor from making domain transitions	04:47
jjohansen	apparmor does emit a message for this, but you may need the debug setting turned on	04:48
happyaron	jjohansen: thanks! looking into that	05:03
=== DrKranz is now known as DktrKranz
=== balkamos_ is now known as balkamos
=== Elimin8r is now known as Elimin8er
=== bluesabre_ is now known as bluesabre
=== Dmitrii-Sh is now known as Dmitrii-Sh-PTO
jcdutton	Hi. How is the bad bug coming along? Have we found out how to un-protect the SPI?	12:46
sladen	jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only	13:04
sladen	jcdutton: and would normally get unlocked by some other means when required	13:06
sladen	ypwong: Mika is going to contact you about testing leaving FSMIE as-is	13:12
sladen	jcdutton: you disappeared, but yes, people are working on it	13:15
sladen	<sladen> jcdutton: the protection actually seems to be on the flash chip itself, which boots up read-only	13:16
sladen	<sladen> jcdutton: and would normally get unlocked by some other means when required	13:16
jcdutton	Maybe it has changed the Protected Regions?	13:18
sladen	jcdutton: my current theory is that the SMM firmware normally takes care of either (a) clearing the protect on first access; (b) going into lockdown when something unexpected is encountered	13:21
sladen	jdstrand: however we are dependent on those that have access to hardware to test. And they are currently having dinner (because of the timezone) and will return to testing after a short break	13:22
sladen	jcdutton: ^^	13:22
jcdutton	I read that a reboot 3 times might help. My research is that the reboot requires complete power off. I.e. no mains, no battery, power off, then on again	13:25
sladen	jcdutton: yes, battery out, power off	13:27
sladen	jcdutton: and does it?	13:27
sladen	jcdutton: what did you discover?	13:27
sladen	jcdutton: probably should have written 'power off	13:28
sladen	jcdutton: probably should have written 'power cycle'	13:28
jcdutton	I don't have a problem Laptop, but the datasheets say that the write-lock is only cleared on a complete power cycle.	13:28
ypwong	sladen, will do. If I break it I will have to go back to ODM to unbrick it.	13:31
sladen	ypwong: ta. what would also be useful are before/after dumps of the flash. This would help to confirm if it is eg. a broken checksum (not) being updated, and which is causing the firmware not to unlock the chip during boot the next time	13:34
sladen	ypwong: second would be the debug information that the intel-spi driver dumps out during boot, to see what the initial state of the SPI status and control registers are	13:36
sladen	ypwong: but take your time and have a well-earned break!	13:36
jcdutton	ypwong, What do ODM do to it to unbrick it?	13:47
ypwong	jcdutton, they use a spi flasher writer to change the CMP bit from 1 to 0	13:56
ypwong	jcdutton, https://www.dediprog.com/pd/spi-flash-solution/sfdk01	13:56
ypwong	sladen, will get to that after my meeting that's 4 mins from now :)	13:56
jcdutton	ypwong, sounds a bit odd to me. what would have set that to 1?	14:02
ypwong	jcdutton, that's what we are finding out	14:02
ypwong	kernel codes look sane but somehow that bit changed to 1	14:02
sladen	jcdutton: several lines of inquiry, latest speculation is related to https://github.com/torvalds/linux/commit/9d63f17661e25fd28714dac94bdebc4ff5b75f09 which if it's not in the running kernel might leave junk in the FIFO from the previous access	14:05
sladen	jcdutton: though my reading of the documentation is that the flash chip comes up with CMP=1, and needs to be explicitly cleared, or WSP=1 needs setting to use a different protection scheme	14:07
jcdutton	ypwong, the problem with CMP being wrong, is that next to it are Write-Once bits, that if they are written, will perm brick the laptop.	14:09
sladen	jcdutton: yup, the latest speculation is that although the code does a read-modify-write (to set something else in that register); if the 'read' was for the incorrect data, then the write will also be for the incorrect data	14:12
sladen	jcdutton: https://www.winbond.com/resource-files/w25q64fw_revd_032513.pdf is the Winbond doc	14:13
jcdutton	That commit is crazy code.	14:15
sladen	jcdutton: why so?	14:16
jcdutton	It does not check for len = 0	14:16
sladen	mmm, that's certainly a good way to set all-1s	14:18
jcdutton	Does it need to do write-posting on that write?	14:22
sladen	this (risky) code could certainly do with some more error/sanity checking	14:27
sladen	ie. validating every single bit in what is going to be written back	14:27
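
The read-modify-write concern above can be sketched in a few lines of C. This is only an illustration, not the intel-spi or spi-nor code: the function names and the bit positions (CMP at bit 14, a "wanted" bit at bit 9) are assumptions made for the example. It shows how a junk read (for instance all-1s) is written straight back by a naive read-modify-write, and how a minimal plausibility check on the read value could refuse to write.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout for illustration only: CMP at bit 14 of the status word. */
#define SR_CMP_BIT    (UINT32_C(1) << 14)
/* Hypothetical bit the init code actually intends to set. */
#define SR_WANTED_BIT (UINT32_C(1) << 9)

/* Naive read-modify-write: whatever was read is written back, plus one bit. */
uint32_t rmw_naive(uint32_t read_value)
{
    return read_value | SR_WANTED_BIT;
}

/*
 * Guarded variant (hypothetical): treat an all-1s read as bus junk and
 * refuse to produce a value to write. Returns false when nothing should
 * be written; *to_write is left untouched in that case.
 */
bool rmw_guarded(uint32_t read_value, uint32_t *to_write)
{
    if (read_value == UINT32_MAX)
        return false;
    *to_write = read_value | SR_WANTED_BIT;
    return true;
}
```

With a junk read of 0xFFFFFFFF, rmw_naive() returns 0xFFFFFFFF, so CMP and everything next to it would be written as 1; rmw_guarded() returns false instead. A real check would need to be stricter than "not all-1s".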
jcdutton	Is this code actually used for anything, apart from re-flashing ? In which case, it should not write anything until someone actually wishes to re-flash	14:29
jcdutton	sladen, that patch does fix a bug in the read statement, that is a good fix, but I don't think that would cause the problems we are seeing.	14:33
sladen	jcdutton: the whole problem is that is collateral from the _init()/probe code. ...which should not be doing anything invasive/risky	14:35
sladen	jcdutton: the code isn't even used	14:35
sladen	jcdutton: so in theory "nothing" is happening	14:36
jcdutton	Surely some of the writel in the init need write-posting?	14:39
jcdutton	although, better if it never did a write in that init code.	14:40
sladen	jcdutton: write-posting?	14:45
jcdutton	With PCI, in order to actually write something, you have to read it afterwards, to force a PCI transaction to actually pass the data across the bus	14:52
jcdutton	so, a writel without a following readl is unlikely to behave as expected	14:52
jcdutton	I was asking, in case this chatting with the SPI is not via a PCI bus, in which case it does not matter about write-posting	14:55
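
The write-posting point can be illustrated with a toy model in C. This is not real MMIO and does not use the kernel's writel()/readl(); the struct and the model_* functions are hypothetical names invented for the sketch. It only models the behaviour described above: a write may sit in a bridge buffer until a later read from the same device forces it through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a device register behind a bridge that may post writes. */
struct posted_reg {
    uint32_t device_value;  /* what the device has actually received */
    uint32_t pending_value; /* write still sitting in the bridge buffer */
    bool     pending;
};

/* Models a posted write: the CPU returns at once, the write is only queued. */
void model_writel(struct posted_reg *r, uint32_t v)
{
    r->pending_value = v;
    r->pending = true;
}

/* Models a read: earlier posted writes are flushed before the read completes. */
uint32_t model_readl(struct posted_reg *r)
{
    if (r->pending) {
        r->device_value = r->pending_value;
        r->pending = false;
    }
    return r->device_value;
}

/* Write followed by a read-back, so the caller knows the write has arrived. */
void model_write_flush(struct posted_reg *r, uint32_t v)
{
    model_writel(r, v);
    (void)model_readl(r);
}
```

In this model the device still holds the old value after model_writel() alone, and only sees the new one once a read has happened. On real hardware a posted write does normally arrive eventually; the read-back matters when later steps depend on it having arrived.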
=== rharper` is now known as rharper
sladen	jcdutton: no, but I do have a feeling that some of this code is failing to triple check whether the SPI is ready, and not in the middle of something else	15:09
jcdutton	sladen, where in the code is it writing the CMP bit?	20:53
sladen	jcdutton: the sr2 register in ...	21:03
jcdutton	I cannot see a write to sr2 in the init function	21:03
sladen	jcdutton: wait, trying to find it	21:04
jcdutton	I see, sr1-4 is all one 32 bit reg	21:05
sladen	jcdutton: it's in spi_nor_init() write_sr(nor, 0);	21:05
sladen	jcdutton: the intention is to _clear_ the protection bits	21:06
sladen	jcdutton: however	21:06
sladen	jcdutton: and the really worrying thing about this (as you've already noted) is that the same register has write-once fuses in it as well	21:06
sladen	as at the very least that routine should mask those out	21:06
sladen	so that it is impossible for an _init_ routine to write anything dangerous, even if the reading got screwed up	21:07
jcdutton	Agreed	21:07
sladen	this is one avenue of investigation	21:08
sladen	another is that the BIOS System Management Firmware normally clears t	21:08
sladen	clears it	21:08
jcdutton	A simple AND 0xffff00ff would do it	21:09
jcdutton	But that only works if the QUAD and CMP should be 0. In some cases that might not be true.	21:10
jcdutton	Maybe 0xffff83ff would be safer, in the sense that it would be reversible in software. Those other bits are not reversible.	21:11
sladen	preferably ~(DANGERBIT(X) | DANGEROUSBIT(Y) | DANGEROUSBIT(Z)) where the enums make it less easy to screw up	21:12
sladen	and more obvious what is happening	21:12
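
A minimal sketch of the named-mask idea, in C. The enum names are hypothetical and the bit positions are assumptions, chosen only so that the resulting mask equals the 0xffff83ff value suggested above; the real positions would have to be taken from the flash datasheet. The static_assert documents that link at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED positions within the 32-bit word holding sr1-4 (illustration only). */
enum {
    SR_RESERVED     = 1u << 10,
    SR_WRITE_ONCE_1 = 1u << 11, /* hypothetical name for a write-once bit */
    SR_WRITE_ONCE_2 = 1u << 12,
    SR_WRITE_ONCE_3 = 1u << 13,
    SR_CMP          = 1u << 14,
};

/* Bits an init routine must never write as 1, spelled out by name. */
#define SR_NEVER_SET ((uint32_t)(SR_RESERVED | SR_WRITE_ONCE_1 | \
                                 SR_WRITE_ONCE_2 | SR_WRITE_ONCE_3 | SR_CMP))

/* The named mask matches the literal proposed in the discussion. */
static_assert((uint32_t)~SR_NEVER_SET == UINT32_C(0xffff83ff),
              "named mask must equal 0xffff83ff");

/* Force every dangerous bit to 0 in a value about to be written back. */
uint32_t sr_sanitize(uint32_t value)
{
    return value & (uint32_t)~SR_NEVER_SET;
}
```

Even if the preceding read returned all-1s, sr_sanitize(0xFFFFFFFF) yields 0xFFFF83FF, so none of the named bits would be written as 1.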
sladen	think the code is trying to enable the Quad mode	21:12
jcdutton	I think the only real solution to this problem is an intel-spi driver with a mass of quirks in it, to undo the damage it has done.	21:14
sladen	jcdutton: something like that is in the process of being tested...	21:14
sladen	jcdutton: but needs to not make the situation any worse	21:14
jcdutton	it can get worse???	21:15
sladen	(if it is only CMP getting set it is reversible)	21:15
sladen	and in software	21:15
sladen	if any of the write-protect fuses get blown, then it is bad	21:15
jcdutton	Maybe we need to get a sample. write a version of the driver that is 100% safe, and have it dump the sr2 to the syslog. and gather results. I think netboot works for everyone, so one could create a netboot image that just prints the results, without causing more damage.	21:17
sladen	booting with 'debug' should be enough	21:18
jcdutton	Let the tool offer to fix the CMP bit, but explain that if the Write-once bits are set, it is a return to base fix.	21:19
jcdutton	No-one in their right mind is going to run that intel-spi driver in its current state.	21:19
jcdutton	Unless the ~dangerous code is added.	21:20
jcdutton	I also found out that this is a PCI device	21:20
jcdutton	so they also need to add the write-post bits.	21:21
sladen	knock up a proposed patch for both	21:23
sladen	along with citations explaining the why	21:23
sladen	given the number of impacted people (probably 1000x those who have reported it)	21:24
sladen	it needs to be done slowly and carefully with review	21:25
jcdutton	Agreed. I offer to review the proposed fix.	21:25
jcdutton	sladen, FYI, I used to be a kernel developer, but don't have a lot of time for it now.	21:27
jcdutton	sladen, have there been any reports of 3 power cycles fixing the problem?	21:29
sladen	jcdutton: no, that was my guess on the basis of (1) first time blacklisting the driver (2) letting the firmware come up and get upset and broken checksums etc + cleanup (3) hopefully come up in a better state	21:30
sladen	jcdutton: however, if it's not the SMI disabling, but is indeed CMP in SR2 on the flash getting set because of a corrupt FIFO read before, then that is probably not going to help	21:31
sladen	jcdutton: and the latest information points to the need to reset CMP in SR2 on the flash, in software	21:31
sladen	jcdutton: the difficulty for me and others is the lack of hardware to test	21:31
jcdutton	Agreed. particularly with the possibility of making it worse due to the write-once bits.	21:33
jcdutton	You really need someone with the laptop and a flash programmer to hand, who can test and fix if needed.	21:33
sladen	which we have	21:35
sladen	sort-of	21:35
sladen	and my approach to this would be working out how to reliably brick it, in order to unbrick it by not doing that	21:35
jcdutton	Where is the fifo you speak of?	21:37
jcdutton	A stepping stone towards that could be a bit of code that simply reports: "Software fixable. CMP flipped" or "Hardware replacement needed" if the write-once are set.	21:43
jcdutton	and put that in a bootable image	21:43
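
The read-only reporting step proposed above could look roughly like this in C. It is a sketch under assumptions: the bit positions (CMP at bit 14, write-once bits at 11 to 13), the enum, and the function names are all hypothetical, and the "no problem" message is an addition for completeness. The two other messages are the ones quoted in the discussion. The code only classifies a status value that has already been read; it writes nothing.

```c
#include <stdint.h>

/* ASSUMED positions, for illustration only; verify against the datasheet. */
#define SR_CMP        (UINT32_C(1) << 14)
#define SR_WRITE_ONCE (UINT32_C(7) << 11)

enum sr_verdict { SR_OK, SR_SOFT_FIXABLE, SR_HARDWARE };

/* Classify a status value without touching the hardware. */
enum sr_verdict sr_diagnose(uint32_t sr)
{
    if (sr & SR_WRITE_ONCE)
        return SR_HARDWARE;      /* write-once bits set: not reversible */
    if (sr & SR_CMP)
        return SR_SOFT_FIXABLE;  /* only CMP set: reversible in software */
    return SR_OK;
}

/* Human-readable text for each verdict. */
const char *sr_verdict_text(enum sr_verdict v)
{
    switch (v) {
    case SR_HARDWARE:     return "Hardware replacement needed";
    case SR_SOFT_FIXABLE: return "Software fixable. CMP flipped";
    default:              return "No problem detected";
    }
}
```

The write-once check comes first on purpose: if both CMP and a write-once bit are set, the machine is reported as needing hardware repair rather than as software fixable.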
=== ochosi_ is now known as ochosi
