/srv/irclogs.ubuntu.com/2016/03/02/#ubuntu-kernel.txt

rtgsbeattie, re bug #1551894 - I'm building a test kernel with 3dfb7d8cdbc7ea0c2970450e60818bb3eefbad69 applied. I'll post in the bug when it's done.01:22
ubot5bug 1551894 in linux (Ubuntu Xenial) "linux: 4.4.0-9.X fails yama ptrace restrictions tests" [Undecided,In progress] https://launchpad.net/bugs/155189401:22
=== leftyfb_ is now known as leftyfb
=== MaikZ_ is now known as MaikZ
=== willcooke_ is now known as willcooke
Odd_Blokeapw: Don't know if you're around (and have a minute), but if you could check my working in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1551419/comments/8 to confirm that my reading of things is correct, it'd be much appreciated.11:50
ubot5Launchpad bug 1551419 in linux (Ubuntu) "Fix UUID endianness patch breaks cloud-init on Azure" [Critical,Confirmed]11:50
Odd_Blokeapw: (This is super-hot, because it breaks all Azure trusty instances on post-kernel-upgrade reboot)11:51
apwOdd_Bloke, is that the one where cloud-init loses its mind and thinks scorched-earth on a clearly installed instance is a good idea 12:14
apwOdd_Bloke, you sound convincing .... as this is stables, i reckon we want bjf to chime in12:16
apwOdd_Bloke, i assume we are going to also fix cloud-init to not juju the machine on reboot anyhow12:16
Odd_Blokeapw: Yeah, I've asked bjf to chime in already; just wanted to make sure that I hadn't got the wrong end of the stick as soon as possible. :p12:20
Odd_Blokeapw: The problem with fixing it in cloud-init is that we support using a snapshot of an instance as an image.12:20
Odd_Blokeapw: Which means that distinguishing between 'first boot of a snapshotted image' and 'second boot of any image' is difficult.12:21
apwbut applying scorched earth should be something that snapshot has to tell you surely, else there is a high risk12:21
Odd_Blokeapw: So each data source provides a way of uniquely identifying instances.12:21
apwof exactly the kind of thing that occurred here, an unintended consequence of another bug is "world ending"12:21
apwwhich is _never_ acceptable, think of ps4.512:21
apwthat'd be like newfs'ing / because i think it is blank12:22
apwwithout being told it is really expected to be blank12:22
apwOdd_Bloke, what i am saying is those snapshots should be marked, and it should have said "this is not a snapshot, but appears to have a different machine ID, this is utterly wrong, refusing to boot" not "yeah let's eat your machine"12:23
Odd_BlokeYeah, I've never been wild about supporting the snapshot-as-image-without-modification workflow.12:25
Odd_BlokeSo we should probably re-visit this.12:26
apwi can see the convenience, i can also see the potential for disaster for all instances it creates12:26
apweven if the original bug is a kernel one12:26
Odd_BlokeYeah, agreed.12:27
Odd_BlokeWe'll need to think about it, though, because I'm pretty sure changing this behaviour in trusty will break a lot of people; I expect that 'launch base image; perform customisation; snapshot' is the most common way that people produce derivative images, and they'd have to add a step in there.12:29
apwOdd_Bloke, which series are affected by this, is it primarily trusty ?12:30
Odd_Blokeapw: wily is fine, I haven't checked precise.12:30
Odd_BlokeI will do so now.12:30
apwgreat.  it being only one release makes life a heck of a lot less upsetting12:31
apwOdd_Bloke, the problem with fixing this is anyone running a newly created image with -8 or indeed anyone affected and rebooted and zapped12:31
apwOdd_Bloke, will  suffer the same problem again on update to the next version without the bug, right?  how are you going to mitigate that12:32
Odd_BlokeWill my tears fix it?12:32
Odd_Bloke:p12:32
apws/-8/the broken version/12:32
Odd_BlokeSo I was thinking about this yesterday.12:33
apwi think we need a paired fix for cloud-init which knows how to reconstruct the machine id for the broken kernel version which is only applied on the broken kernel version12:33
Odd_BlokeYeah, that.12:33
Odd_BlokeI _think_ it could happen in cloud-init's packaging, rather than in cloud-init itself.12:33
rtgapw, is there a way to tell that the kernel is broken other than version number ? I was thinking about folks that try mainline crack, etc.12:34
apwas we will need that anyway, and such a fix would immediately mitigate the broken kernel version, and would let us confirm the fix and roll it out kernel side in a more leisurely manner12:34
apwrtg, we might be able to say its "upstream x.y.z-cktN" which is fail perhaps12:35
apwthough i doubt there are a lot of mainline kernel crack runners in Vms12:35
Odd_BlokeAnd we could make it work both ways: current_endian_version = $(cat /sys/...); reverse_endian_version = ...; if [ -d /var/lib/cloud/instances/$reverse_endian_version ]; then mv <that> <fixed path>; fi12:35
apwOdd_Bloke, as if we rush out the kernel change and its wrong again, we just have two wrogo's to mitigate.12:36
Odd_BlokeYeah.12:36
apwOdd_Bloke, i also think we should be getting that out "now" regardless12:36
apwif we are blowing people up12:36
apwas we have to get that out before the fixed kernel can go out safely too12:36
rtgapw, "that" being the cloud_init fix ?12:37
rtgand how do we fix the kernel besides reverting the endian patch ?12:37
apwrtg, yes if we don't mitigate the broken kernel, then updating the kernel to fixed will re-blow up the instances and zap them a second time12:38
apwrtg, so we have to mitigate that by fixing cloud-init, and that has to occur sooner than the kernel12:38
rtgagreed12:39
Odd_BlokeWe also have to handle this in case Azure decide to upgrade their reported SMBIOS version to >=2.6 (probably without changing the endian-ness of what they report).12:41
apwOdd_Bloke: so we should add a cloud-init task to the bug and coordinate that getting out before the kernel12:45
Odd_Blokeapw: Yep, task already added and I'm looking in to it.12:46
apware you on the hook for that?12:46
Odd_Blokeapw: I think so.12:46
apwok good, that makes it make sense to try and get this fix into the next upload but not expedite it out12:46
apwbjf12:46
apwbjf: fyi discussion above12:47
apwOdd_Bloke: did you say we might be able to detect the malformed version? rather than thinking in terms of kernel versions? in your code fragment above. that would be safer for sure12:49
apwahh yes you are saying that, good12:50
Odd_Blokeapw: Yeah, so my plan is to assume that if cloud-init currently has an instance id of "11223344-5566-7788-DEAD-BEEFDEADBEEF" and it now thinks it should have "44332211-6655-8877-DEAD-BEEFDEADBEEF" then we assume it's the same system.12:51
apwdoes that account for the shift as well?12:52
apwbut i think the plan is sound12:52
Odd_Blokeapw: "the shift"?12:53
Odd_Bloke(There is a remote possibility that this could produce false positives, but (I think) only on a snapshotted image which was then launched with a UUID that appeared exactly oppositely-endian in its first three fields)12:54
Odd_BlokeBut that's unlikely to happen before the heat death of the universe. :p12:55
ruchlosnever underestimate the mind of a computer that wants to punish humans to trigger such unlikely scenarios ;)12:57
Odd_BlokeAh, hmph, doing this in cloud-init packaging would mean it would only work once per cloud-init version.13:03
Odd_BlokeSo I do need to do it in cloud-init proper.13:04
Odd_BlokeOn the bright side, I won't have to work out how to reverse endian-ness of parts of a UUID in shell. :p13:04
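[Editor's sketch] The check Odd_Bloke describes at 12:51 (treat a stored instance id and a newly reported one as the same system when they differ only by the byte order of the first three UUID fields) could look roughly like this in Python. This is a minimal illustration, not cloud-init's actual code: the function names swap_uuid_endianness and is_same_instance are hypothetical, and it assumes the ids are plain 8-4-4-4-12 hex strings like the ones quoted in the channel.

```python
def swap_uuid_endianness(value: str) -> str:
    """Byte-swap the first three fields of a UUID string.

    The last two fields are left untouched, matching the example ids
    quoted in the discussion. Hypothetical helper, for illustration only.
    """
    fields = value.split("-")
    if [len(f) for f in fields] != [8, 4, 4, 4, 12]:
        raise ValueError("not an 8-4-4-4-12 UUID string: %r" % value)
    # bytes.fromhex raises ValueError on non-hex input; [::-1] reverses bytes.
    swapped = [bytes.fromhex(f)[::-1].hex() for f in fields[:3]]
    return "-".join(swapped + fields[3:])


def is_same_instance(stored_id: str, reported_id: str) -> bool:
    """Return True if the ids match directly or after the endian swap.

    The comparison is case-insensitive because hex digits may be
    reported in either case.
    """
    stored = stored_id.lower()
    reported = reported_id.lower()
    return stored == reported or swap_uuid_endianness(stored) == reported


if __name__ == "__main__":
    old = "11223344-5566-7788-DEAD-BEEFDEADBEEF"
    new = "44332211-6655-8877-DEAD-BEEFDEADBEEF"
    print(swap_uuid_endianness(old))
    print(is_same_instance(old, new))
```

Because the swap is its own inverse, the same check works in both directions, which is what the 12:35 "make it work both ways" fragment relies on. As noted at 12:54, a snapshot launched with an id that happens to be the exact byte-swap of the original would be a false positive.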
apwheh15:24
apwOdd_Bloke, you are able to test a kernel for us I assume to confirm a fix ...15:26
Odd_Blokeapw: Yep.15:27
=== psivaa_ is now known as psivaa
=== FourDollars_ is now known as FourDollars
xnoxhello =)16:27
xnoxapw, got a fresh bug for udebs16:27
xnoxhttps://bugs.launchpad.net/ubuntu/+source/linux/+bug/155231416:27
ubot5Launchpad bug 1552314 in linux (Ubuntu) "[s390x] zfcp.ko missing from scsi-modules udeb" [Medium,Confirmed]16:27
xnoxapw, just what you wanted to hear right? =))))) enough info to fix it?16:27
rtgxnox, it'll do16:28
xnoxrtg, tah.16:29
=== marga_ is now known as marga
=== utlemmin` is now known as utlemming
