[06:44] <alkisg> Hi, in ltsp, to netboot clients, we're passing 2 initrds. This works fine up to and including ubuntu 19.04.
[06:44] <alkisg> But I just tried the 19.10 lubuntu daily build, and the 5.2 kernel there doesn't see the additional initrd, it's as if we only passed one.
[06:44] <alkisg> Any recent commits that broke it? Any related changes?
[08:33] <apw> alkisg, as in you say initrd=x initrd=y ?
[08:33] <alkisg> apw: exactly
[08:33]  * apw is surprised that works at all, that you don't only get the last one
[08:34] <alkisg> apw: oh the kernel has supported it for ages, and in recent years all boot managers added support for that,
[08:34] <alkisg> grub, syslinux, ipxe, anyone that I tested has had patches in the recent years to support it
[08:34] <alkisg> *anything
[08:35] <apw> alkisg, the kernel supports concatenation of initramfs objects in a single passed block
[08:35] <alkisg> apw: although, I tried concatenating the two initrds, and that didn't work either, which probably means a change in the kernel itself
[08:36] <alkisg> So if someone is able to test that ^ (create an initrd with just a /test file, append it to initrd.img, boot, and see if /test is there),
[08:36] <apw> alkisg, the kernel only gets initrd=<start>,<len> of a single block
[08:36] <alkisg> and it doesn't work with 19.10 like it doesn't work for me, then I guess he reproduced the issue...
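The test alkisg describes (an extra initrd holding only a /test file, appended to the stock initrd.img) can be sketched without external tools. This is a minimal illustration that assumes the uncompressed cpio "newc" layout; the helper names `newc_archive` and `append_segment` are hypothetical and not part of ltsp or initramfs-tools.

```python
def _newc_entry(name, data, mode, ino):
    """Encode one entry in cpio 'newc' (magic 070701) format."""
    name_bytes = name.encode() + b"\0"
    # 13 header fields, each written as 8 uppercase hex digits:
    # ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [ino, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)   # pad header + name to a 4-byte boundary
    out += data
    out += b"\0" * (-len(out) % 4)   # pad file data to a 4-byte boundary
    return out


def newc_archive(files):
    """Build an uncompressed newc archive from a {name: bytes} mapping."""
    out = b""
    for ino, (name, data) in enumerate(files.items(), start=1):
        out += _newc_entry(name, data, 0o100644, ino)  # regular file, mode 0644
    out += _newc_entry("TRAILER!!!", b"", 0, 0)        # end-of-archive marker
    return out


def append_segment(base_initrd, extra):
    """Concatenate two initrd images, the same effect as `cat 1 2 > 3`."""
    return base_initrd + extra
```

Reading the stock initrd.img as bytes, passing it with `newc_archive({"test": b"hello\n"})` to `append_segment`, and writing the result out would give an image to boot with rdinit=/bin/sh and then check for /test.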
[08:37] <apw> alkisg, so all of the bootloaders must be concatenating; so it would be continuation which would have to be broken
[08:37] <alkisg> I'm testing in the exact same system with the same bootloaders etc
[08:37] <alkisg> (ipxe, in this case, I'll test kvm in a few minutes)
[08:37] <apw> alkisg, and would be equivalent for your test and the bootloader test
[08:38] <alkisg> Right, so it sounds to me like something in the kernel initrd decompression code has broken in 19.10
[08:39] <alkisg> Maybe my initrd.gz has different compression parameters and it can't follow it? Maybe the early microcode thing? I don't know...
[08:39] <apw> alkisg, a good point what are you using for that
[08:40] <apw> as in 19.10 the default compression changed to lz4 in initramfs-tools
[08:41] <apw> though we have not turned off any compression options
[08:42] <alkisg> apw: find . ! -name ltsp.img | cpio -oH newc | gzip > "$_DST_DIR/ltsp.img"
[08:52] <apw> alkisg, do you have a dmesg of the failed boot
[08:53] <alkisg> Moment...
[08:57] <apw> alkisg, i can see no significant changes in the code to decode the initramfs from 5.0 to unstable
[08:57] <apw> alkisg, would you attach all of this to an 'ubuntu-bug linux' please too and tell us the number
[08:58] <alkisg> apw: ok; then I'd best collect all these and do a proper report tomorrow morning. Thanks a lot :)
[08:58] <apw> alkisg, thanks ... do ping me with the bug number
[08:59] <alkisg> will do
[12:30] <ogra> alkisg, apw, the bootloader typically just loads the initrds back to back into ram and hands the kernel the addresses of that merged ram space ... (we use this feature in core too, grub is very easy there if you just hand it two initrds, u-boot is a bit more tricky since you need to do the math and loading yourself) ... but this only works if both initrds use the same compression ... i'd bet there is a difference between the two initrds alkisg loads
[12:43] <apw> ogra, i wonder why that restriction exists, as it's not like the decompression format individually would understand the reset in the middle
[12:43] <apw> i am sure it iterates at the top level there, strange
[12:44] <apw> i would think that is a bug, and likely the bug
[12:44] <ogra> well, the bootloader loads the compressed bits back to back ... the kernel tries to uncompress ... my guess would be it only looks at the header at the start of that ram area so won't know that the second half uses a different compression
[12:52] <ogra> (also, i think, the kernel typically doesn't know there are two parts, it only gets start address and size from the BL)
[13:03] <alkisg> ogra: thanks, I'll try with lz4 or whatever else it is, and with uncompressed ltsp.img, and report back... but a bit later, too sleepy now...
[13:04] <alkisg> ogra: I was guessing that the code would be like "while ... check initrd header; decompress; done" => i.e. it would check the header/compression type on each new initrd it sees while "walking the list", not just once at start...
[13:05] <alkisg> But OK if it's a restriction, we can match it from the ltsp code
[13:28] <apw> alkisg, let me know in the bug if it does work with the 'same compression'
[13:29] <apw> alkisg, as i am not sure we are not going to end up mismatched with cpu firmware too
[13:31] <ogra> well, this is stuff i found during testing with u-boot, ipxe or grub might work differently (i never tried different compressions etc on grub and never used ipxe), i'm just assuming that the process is similar on other bootloaders
[13:32] <apw> ogra, right, this would be a kernel side limitation, just very odd
[13:33] <ogra> well, for u-boot it is pretty clear that the kernel only sees a single initrd blob in ram because it only hands start address and size over ... might be different in the other cases
[13:33] <ogra> (size=complete size of both parts)
[13:46] <apw> ogra, right there is only 'here,this-big' option to the kernel
[13:46] <apw> ogra, and it decompresses till the compressor says 'done' then it tries again if we are not at the end
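The loop apw describes here, and that alkisg guessed at earlier ("check initrd header; decompress; done"), can be modelled roughly as follows. This is a simplified sketch and not the kernel's code: it only understands raw newc cpio and gzip segments, aligns cpio entries relative to the start of their segment, and the function name `walk_segments` is hypothetical.

```python
import zlib


def _align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def _cpio_end(buf, start):
    """Return the offset just past the TRAILER!!! entry of a raw newc archive."""
    pos = start
    while True:
        if buf[pos:pos + 6] != b"070701":
            raise ValueError(f"bad cpio header at offset {pos}")
        filesize = int(buf[pos + 54:pos + 62], 16)   # 7th header field
        namesize = int(buf[pos + 94:pos + 102], 16)  # 12th header field
        name = buf[pos + 110:pos + 110 + namesize - 1]
        pos = start + _align4(pos - start + 110 + namesize)  # skip header + name
        pos = start + _align4(pos - start + filesize)        # skip file data
        if name == b"TRAILER!!!":
            return pos


def walk_segments(buf):
    """List (kind, offset) for each segment, re-checking the magic every time."""
    segments = []
    off = 0
    while off < len(buf):
        if buf[off] == 0:                      # zero padding between segments
            off += 1
            continue
        if buf[off:off + 6] == b"070701":      # uncompressed cpio
            end = _cpio_end(buf, off)
            segments.append(("cpio", off))
        elif buf[off:off + 2] == b"\x1f\x8b":  # gzip member
            d = zlib.decompressobj(wbits=31)
            d.decompress(buf[off:])
            if not d.eof:
                raise ValueError(f"truncated gzip stream at offset {off}")
            # The decompressor reports how much input it did not consume,
            # which is where the next segment has to start.
            end = len(buf) - len(d.unused_data)
            segments.append(("gzip", off))
        else:
            raise ValueError(f"junk at offset {off}")
        off = end
    return segments
```

The point of the model is that the header check happens once per segment rather than once for the whole blob, so mixed compressions would be expected to work as long as each decompressor can tell where its own stream ends.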
[13:47]  * alkisg starts testing with kvm and a single concatenated initrd, which should be faster to test with...
[13:50] <ogra> aha ... well, if it tries again if not at the end that might indeed mean it should support multiple compressions
[13:50] <ogra> and then i agree there is a bug ... i didn't know the kernel does that
[13:51] <alkisg> Here are the first results, with concat:
[13:51] <alkisg> initrd.img: ASCII cpio archive (SVR4 with no CRC) => 19.10
[13:51] <alkisg> ltsp.img:   gzip compressed data, last modified: Wed Aug 21 11:46:58 2019, from Unix ==> fails with gzip
[13:51] <alkisg> cat ltsp.raw initrd.img.original > initrd.img (no gzip) ==> succeeds
[13:52] <alkisg> So it works fine with raw, but not with gz now; while it worked from debian jessie and ubuntu 16.04 that I'm testing with, until all the recent versions up to 19.10
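The format checks above were done with the file utility; the same identification can be sketched by comparing the first bytes of an image against well-known magic numbers. The function name `sniff` is hypothetical. Note that lz4 has two container formats with different magics (legacy and frame), which is worth keeping apart when comparing images; which of the two a given kernel accepts is an assumption to verify, not something this sketch settles.

```python
# Magic prefixes for formats commonly seen in initramfs images.
MAGICS = [
    (b"070701", "cpio (newc, uncompressed)"),
    (b"\x1f\x8b", "gzip"),
    (b"\x02\x21\x4c\x18", "lz4 (legacy format)"),
    (b"\x04\x22\x4d\x18", "lz4 (frame format)"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
]


def sniff(buf):
    """Return a label for the format whose magic matches the start of buf."""
    for magic, label in MAGICS:
        if buf.startswith(magic):
            return label
    return "unknown"
```

Calling `sniff` on the first few bytes of each image (and on the bytes where a second segment is expected to start) gives a quick view of which compressions are actually being concatenated.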
[13:52]  * alkisg tries to see how to make it lz4...
[13:53] <alkisg> unmkinitramfs ../initrd.img.original .
[13:53] <alkisg> cpio: premature end of archive
[13:54] <alkisg> apw: premature end of archive? maybe bad initrd in 19.10?
[13:54] <alkisg> (that's the stock one from the live cd)
[13:54] <alkisg> OK that would explain it then, why it can't follow up with the next one
[13:55] <apw> alkisg, odd
[13:55] <alkisg> So not a kernel thing, but a mkinitramfs thing
[13:55] <ogra> well, if it's invalid, how would the live CDs boot?
[13:55] <ogra> i'd assume we'd have heard if they do not ...
[13:55] <alkisg> ogra: e.g. if it's just some padding, it could do that
[13:56] <alkisg> I.e. decompress correctly, even with "oh premature end", but then it won't be able to follow up with the next initrd
[13:56]  * alkisg downloads the ubuntu.iso instead of the lubuntu one...
[13:56] <apw> right if it was a byte long it might parse and unpack the cpu firmware in section 1, the real initramfs in section 2, and think there is a section 3 of the padding and barf there
[13:59] <alkisg> apw: do you know if an installed systemd uses different code to update-initramfs vs the live cd? I'm only testing with live cds for 19.10...
[13:59] <alkisg> *system
[13:59] <alkisg> (live cds worked fine up to 19.04 too)
[14:00] <apw> alkisg, the initramfs is different in contend but made the same way
[14:00] <apw> content
[14:00] <alkisg> OK the content shouldn't matter with rdinit=/bin/sh, so...
[14:00]  * alkisg is downloading yesterday's ubuntu.iso, let's see...
[14:02] <apw> alkisg, i just mounted up the eoan ubuntu desktop image on my machine here, and the initrd in that is extractable without that error
[14:03] <apw> (and this machine is also running eoan)
[14:03] <ogra> apw, oh, since i have you here ... what's the reason for all the armhf metapackages (linux-raspi2, linux-snapdragon) to be in universe? (i have been trying to derive classic images from core based ones using u-image and noticed that ... the server rootfs build in u-image doesn't have universe in its sources.list so you can't install a kernel on these devices without hackery)
[14:03] <alkisg> Hrm. Let me see what happens with yesterday's...
[14:04] <apw> ogra, because things end up in main through need mostly, and things which only end up in core avoid seeding
[14:04] <apw> $ md5sum /mnt/casper/initrd
[14:04] <apw> f3a8e1484ad2ebcca4437a8bf949266b  /mnt/casper/initrd
[14:04] <apw> alkisg, ^ is the one i can dething
[14:05] <ogra> apw, well, we officially support server images for pi and dragonboard ... that is what made me curious
[14:05] <ogra> i'd have expected the metas to be seeded via -supported or some such
[14:05] <apw> ogra, and the "we" part is where care is needed; does "ubuntu" or "canonical" support those
[14:05] <alkisg> (different one here) 4b0a58102b099fa26d89aad3a7952e7a  initrd.img.original
[14:05] <ogra> canonical AFAIK
[14:06] <ogra> (which in turn includes ubuntu i think)
[14:06] <apw> anyhow that isn't quite what main means; and you will find people arguing both ways to put it in main and not
[14:06] <ogra> we're offering these builds to customers for commercial products ... 
[14:06] <ogra> heh, ok 
[14:06] <ogra> well, it just made me curious ... i didn't mean to start a discussion about it
[14:06] <apw> that is a separate commercial concern and not related to ubuntu support status
[14:06] <ogra> ok
[14:07] <apw> as we support each of those for customer specific times; and people tend to assume main == 5 years
[14:07] <apw> (in an lts)
[14:07] <ogra> right
[14:07] <apw> ogra, but we also move things to main to make isos, so ... 
[14:08] <apw> we can support longer than the ubuntu marked support not less
[14:08] <apw> sort of thing
[14:08] <ogra> yup
[14:09] <alkisg> apw: same problem with http://cdimage.ubuntu.com/daily-live/20190819/eoan-desktop-amd64.iso
[14:09] <alkisg> I'm running unmkinitramfs in 18.04, would it matter?
[14:09]  * alkisg tries booting the live cd to unmkinitramfs in 19.10...
[14:09] <apw> alkisg, maybe, does that old version support lz4
[14:09] <alkisg> It does show the contents
[14:09] <apw> oh, then not that
[14:10] <alkisg> Oh never mind. 56k
[14:10] <apw> alkisg, md5sum of your .iso please
[14:10] <apw> 56k ?
[14:10] <apw> 19 ?  i have an .iso from 21
[14:10] <apw> with a date of the 21st
[14:10] <alkisg> apw: no the iso was fine, but unmkinitramfs isn't,
[14:10] <alkisg> I saw early/main and thought it was ok, 
[14:10] <apw> oh
[14:11] <alkisg> but it only managed to uncompress the -rw-r--r-- 1 alkisg alkisg  30K Νοε  28  2018 AuthenticAMD.bin
[14:11] <alkisg> So no lz4
[14:11] <alkisg> OK that would explain the "premature" error, but not the "second initrd isn't shown when booted" error
[14:12]  * alkisg moves to test from 19.10 now...
[14:13] <alkisg> Heh nice desktop icons :D
[14:16] <alkisg> Verified that 18.04 can't decompress 19.10 initramfs (same md5sum, works in 19.10)
[14:26] <alkisg> cat /cdrom/casper/initrd ltsp.cpio.gz > test1; unmkinitramfs test1 .  ==> premature end of archive (all inside 19.10)
[14:26] <alkisg> cat /cdrom/casper/initrd ltsp.cpio > test2; unmkinitramfs test2 .  ==> works
[14:27] <alkisg> apw: would you consider it a bug when unmkinitramfs fails when lz4+gz initrds are joined?
[14:28]  * alkisg moves on to testing netbooting again, but with uncompressed ltsp.img...
[14:30] <apw> alkisg, i think the manual page has a bugs section saying it is a bit crap
[14:30] <alkisg> Ah
[14:31] <apw> BUGS
[14:31] <apw>        unmkinitramfs cannot deal with multiple-segmented initramfs images, except  where
[14:31] <apw>        an  early (uncompressed) initramfs with system firmware is prepended to the regu‐
[14:31] <apw>        lar compressed initramfs.
[14:32] <apw> actually so crap it is hard to believe the text is right !
[14:36] <alkisg> Nope, 19.10 can't load even lz4+raw; I don't know how to create lz4+lz4 to test that. Checking dmesg...
[14:38] <apw> raw comes first
[14:39] <alkisg> I mean my raw, not the microcode
[14:39] <alkisg> I didn't say that correctly, I mean this:
[14:39] <alkisg>         find . ! -name ltsp.img | cpio -oH newc > "$_DST_DIR/ltsp.img"
[14:39] <alkisg> vs this:         find . ! -name ltsp.img | cpio -oH newc  | gzip > "$_DST_DIR/ltsp.img"
[14:39] <apw> ahh
[14:39] <alkisg> unmkinitramfs in 19.10 works without gzip, fails with gzip there,
[14:39] <alkisg> the kernel fails in both cases
[14:40] <apw> and you are concatting them how
[14:40] <alkisg> cat 1 2 > 3
[14:40] <apw> makes sense
[14:41] <apw> alkisg, there seems to be an lz4 command
[14:42] <alkisg> I tried this silly test: 19.10 kernel, 18.04 initramfs, plus ltsp.img ==> works
[14:42] <alkisg> So it's not a kernel change, but something in the new initramfs that makes the kernel choke
[14:42] <apw> ok which is raw+gz+gz i assume
[14:42] <alkisg> Right
[14:42] <apw> so you need to make the raw+lz4+lz4
[14:42] <apw> and the command lz4 with no arguments seems to be a compressing pipeline
[14:43] <alkisg> OK let me install that and test, ty
[14:43] <apw> alkisg, it is also possible that there is padding at the end of an lz4 that the kernel does not grok, we will find out from this test
[14:44] <alkisg>         find . ! -name ltsp.img | cpio -oH newc | lz4 > "$_DST_DIR/ltsp.img"
[14:44] <alkisg> This has the same issue
[14:44] <alkisg> I.e. it only sees the first initramfs
[14:45] <alkisg> Since raw+lz4+raw didn't work... yes it sounds like a padding in lz4
[14:48] <apw> alkisg, can you give me the actual error the kernel emits please, it matters
[14:49] <alkisg> apw: I don't see anything in dmesg, where would I find that?
[14:50] <apw> alkisg, never mind, i'll see if i can repro
[14:51] <alkisg> Ty :)
[14:51]  * alkisg tries injecting a couple of zeros between, in case it's a matter of padding...
[14:53] <apw> alkisg, don't think that will work, it expects the first two bytes of the remaining space to be a file magic
[15:03] <alkisg> Yeah it didn't; although it does have a few nulls at the end, adding/removing some didn't help
[15:16] <apw> [    0.288625] Initramfs unpacking failed: junk within compressed archive
[15:20]  * alkisg didn't have that in dmesg
[15:24] <apw> alkisg, as i can reproduce it, i will poke for a bit
[15:25] <apw> alkisg, could you file that bug for me
[15:25] <apw> alkisg, a +filebug will be fine
[15:26] <apw> alkisg, which kernel did this work with previously?
[15:29] <alkisg> apw, 19.04; will check in a bit and file bug, ty
[15:38] <alkisg> apw: I think it's a bug in mkinitramfs though, not in the kernel, e.g. it might generate invalid length of lz4 or something...
[15:38] <apw> alkisg, also it does handle padding, as long as it is 0's
[15:38] <apw> alkisg, or the lz4 format is not length aware, or
[15:39] <alkisg> OK... but then this is the first kernel that we try lz4 booting, right?
[15:39] <alkisg> So "it worked in 19.04" doesn't make any sense
[15:39] <alkisg> Since it was gz, and gz still works
[15:40] <alkisg> So lz4 might have been broken since its initial implementation, and no one would have noticed, since concatenating something after lz4 isn't too common
[15:40] <alkisg> (or passing multiple initrds)
[15:43]  * alkisg wonders how to check if 19.04 used lz4 or gz
[15:45] <apw> alkisg, it only just changed in eoan
[15:46] <alkisg> OK then it sounds possible that it never worked correctly with lz4
[16:42] <alkisg> apw: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840945
[16:42] <ubot5`> Ubuntu bug 1840945 in linux (Ubuntu) "Concatenated lz4 initrds don't work" [Undecided,New]
[16:42] <alkisg> I didn't run ubuntu-bug, as it would report the host, not whatever's inside kvm
[16:42] <alkisg> And I only had kernel/initrd inside kvm, so I couldn't run ubuntu-bug from there either
[16:42] <alkisg> If needed, I can do an installation inside a VM... but I'm not sure if it'll help
[20:31] <apw> alkisg, no that is enough