[04:24] * enyc meows
[08:37] enyc, not heard anything against 5.4 this cycle so far
[09:02] apw: indeed may be complete misnomer, turned out that user 5.4.0-47 still
[09:02] apw: ooi any idea where there is a 'gap' in kernel cycle announced on kernel.ubuntu.com ?
[09:02] enyc, i think it is mentioned at least on the front page
[09:03] apw: sorry i mean WHY there is ... as announced on ...
[09:03] * enyc thinks... maybe working on ubuntu 20.10 release kernel
[09:04] enyc, the cycle would have been badly aligned against the 20.10 release which isn't a great plan, and there is some infrastructure work going on which makes it hard to realign that cycle; so it was decided to pause over that period
[09:04] apw: i see =)
[09:05] hrrm groovy 5.8 kernel
[09:05] enyc, nothing earth shattering or concerning; we will be right back at it the cycle after
[09:06] =)
[17:10] On 20.04 I have a kernel hanging crypto tasks during a ceph install using ceph-ansible, to set up encrypted volumes.
[17:10] Where's the best place to send info, and whatnot.
[17:11] I've also run the proposed kernel, shows the same issues. I have not tried mainline yet.
=== Eighth_Doctor is now known as Conan_Kudo
=== Conan_Kudo is now known as Eighth_Doctor
[19:35] ira: which tasks? does strace show them hung? if so, where does the /proc/pid/stack for the processes say they are hung?
[19:36] I'm getting kernel wait messages. I wiped the machines and I'm installing the OEM kernel + 20.04 to see if things change.
[19:38] The whole setup is MAAS + ansible, so reconfiguring it isn't the end of the world.
[19:40] And yeah they are hung, the install hangs.
[19:41] It is using ceph-volume to create dm-crypt encrypted disks.
[19:44] if it's moments after boot I wonder if you're hitting the /dev/random entropy stuff. can you shove in a random seed from the maas server to the cloud-init on those things?
[19:45] It's way after boot.
[19:46] days? or minutes?
[19:46] minutes.
[19:47] still plausibly randomness, a machine you never touch has limited ability to gather its own
[19:47] Except it installs fine on 18.04.
[19:47] oh interesting :)
[19:48] Same containers, scripting etc...
[19:48] Rebooting the boxes right now to pick up 5.6. I
[19:49] 'll do the install there, and see how it goes... if it flies through it takes about 20-30m.
[19:50] If it all fails, I'll reset it to stock 20.04 and be glad to do a debug session.
[19:50] (Even if not, I'd like to get this fixed, so we can use 20.04 and not 18.04 here.)
[20:34] @sarnold the 5.6 kernel does it.
[20:34] So I got machines hanging for interrogation :)
[20:35] sweet :) well, I mean, ugh :) but you know.. a reproducer is handy :)
[20:35] can you spot which processes look hung? what does strace say they are doing? /proc/pid/stack?
[20:36] What's the same thing as fpaste on fedora here?
[20:36] (old red hat engineer here... sorry man :) )
[20:36] (ex-red hat)
[20:37] paste.ubuntu.com
[20:37] pastebinit as a CLI
[20:38] https://paste.ubuntu.com/p/X5S2brG89F/
[20:38] or just echo foobar | nc termbin.com 9999
[20:39] And 5.6 locked faster than 5.4 it looks like.
[20:40] https://paste.ubuntu.com/p/zYVxKr8QPP/
[20:40] (the stack of that process.)
[20:41] Thank you for helping :).
[20:56] Anything else off the machines you'd like?
[21:12] Looks like it doesn't happen every time, but it does happen, I see it made a few OSDs on that host successfully.
[21:16] I've also tried the non-containerized built for ubuntu ceph octopus packages, and they don't work either, in the exact same way. In case there's a question there :)
[21:25] https://github.com/NixOS/nixpkgs/issues/40282
[21:49] Reinstalling the machines, with blacklisting intel_qat, and also blocking install.
[22:03] ira: oh wow looks like you've been very productive :D
[22:03] ira: I think you're right, blacklisting qat looks promising
[22:04] here's to hoping.
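
(Annotation, not part of the log.) The "which processes look hung? /proc/pid/stack?" question above can be answered by hand, but a rough sketch of automating it might look like the following. It assumes that a "hung" task shows up in uninterruptible sleep (state `D` in `/proc/<pid>/stat`), which is only a heuristic: a healthy process can sit in `D` briefly during I/O, so this is a point-in-time snapshot. The function names `parse_stat`, `hung_candidates`, and `kernel_stack` are made up for illustration, and only each process's main thread is checked, not every thread under `/proc/<pid>/task/`.

```python
#!/usr/bin/env python3
"""List tasks in uninterruptible sleep (state D) and print their kernel stacks."""
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    left, right = stat_line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def hung_candidates(proc="/proc"):
    """Yield (pid, comm) for every process whose main thread is in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the stat line was unparsable
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack of pid; reading it normally requires root."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unavailable: {exc}>\n"


if __name__ == "__main__":
    for pid, comm in hung_candidates():
        print(f"== {pid} ({comm}) ==")
        print(kernel_stack(pid), end="")
```

Run as root on the affected machine, the output could then be piped to `pastebinit` as suggested in the channel.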