[09:00] <nibbon_> o/
[09:01] <nibbon_> I'm having an annoying issue with qemu/libvirt while live migrating a workload from Focal to Jammy
[09:02] <cpaelzer> hi nibbon_, what is it that you are running into?
[09:03] <nibbon_> it's worth noting that I hit the issue only when I try to live migrate workloads that were started on Bionic, then live migrated to Focal
[09:04] <cpaelzer> I think I have seen that working still, maybe depending on subfeatures used - but what problem exactly are you facing?
[09:05] <nibbon_> 2022-11-08T18:48:49.861877Z qemu-system-x86_64: check_section_footer: Read section footer failed: -5
[09:06] <cpaelzer> wow, never seen that
[09:06] <nibbon_> besides the warning reported by /usr/bin/kvm-spice, that's the only error qemu returns when it tries to start the workload on the Jammy hypervisor
[09:06] <cpaelzer> is that reproducible whenever you try or sporadic?
[09:07] <nibbon_> whenever I try to migrate a workload coming from Bionic and still running on Focal hypervisors
[09:08] <cpaelzer> nibbon_: are you using some --copy-storage-* options?
[09:09] <nibbon_> yes, I think we use --copy-storage-inc
[09:10] <cpaelzer> might be an issue with libvirt - see https://bugzilla.redhat.com/show_bug.cgi?id=1889131
[09:10] -ubottu:#ubuntu-server- bugzilla.redhat.com bug 1889131 in Red Hat Enterprise Linux Advanced Virtualization "Do migration with copy storage->cancel migration at memory copy phase -> do migration with copy storage again, failed" [Medium, Closed: Errata]
[09:10] <cpaelzer> I'm not sure how much that is backportable directly
[09:10] <cpaelzer> hmm
[09:10] <cpaelzer> nibbon_: how much can you try/experiment?
[09:11] <cpaelzer> could you put https://launchpad.net/~canonical-server/+archive/ubuntu/server-backports on one of the source hosts (which would give it a libvirt with the fix)
[09:11] <cpaelzer> and then try to migrate from there to jammy
[09:11] <cpaelzer> as an alternative, to try out a few things, check whether other disk migration methods would avoid it
[09:12] <nibbon_> I can perform some experiments in staging
[09:13] <cpaelzer> all other hints I found so far are around I/O errors like https://git.mentality.rip/OpenE2K/qemu-e2k/commit/3e81f73c7a1286e251180c19f62829fe5c045e39 - but I assume that isn't your case as it is reproducible
[09:13] -ubottu:#ubuntu-server- Commit 3e81f73 in OpenE2K/qemu-e2k "tests: hide stderr for postcopy recovery test"
[09:16] <nibbon_> hmm, so your first suggestion is to install libvirt 8.0 on the Focal hv and then retry the migration, amirite?
[09:16] <nibbon_> otherwise trying, I guess, by using --copy-storage-all
[09:16] <cpaelzer> nibbon_: yes, if it really would be a libvirt issue fixed in 7.0 (as the bug suggested) then this might confirm that
[09:16] <cpaelzer> nibbon_: if it does confirm we can look deeper into what might be backportable to try from a fixed 6.0 in focal
[09:17] <nibbon_> alright, I'll try those two options and report here the outcome
[09:17] <nibbon_> cpaelzer: thanks for the hints
[09:17] <cpaelzer> nibbon_: I doubt that -inc / -all will differ much, but it is worth a try. In addition, if you have any one-off way to try other sync (NFS, shared storage server, ...) try that
[09:17] <cpaelzer> just to confirm that chasing down --copy-storage-* really is the right path
[09:19] <nibbon_> cpaelzer: ack
[10:05] <nibbon_> cpaelzer: first I tried changing --copy-storage-inc to --copy-storage-all, but it fails with the same error.
[10:05] <nibbon_> I'll try by installing libvirt 8.0 and see
[13:45] <nibbon_> cpaelzer: I managed to install libvirt 8.0 on the source hv and the live migration worked
[13:45] <cpaelzer> ok, so it might really be the upstream bug I linked earlier today
[13:46] <cpaelzer> would you mind filing a launchpad issue against focal, referring to how you run the migration and referring to that bug
[13:46] <cpaelzer> there must be something more to your setup though - which we need to find - as I can --copy-storage without hitting that
[13:46] <cpaelzer> that discussion can happen on the bug
[13:47] <cpaelzer> if we manage to reproduce it we can try to prep a PPA for your testing and a later SRU
[13:47] <nibbon_> okay. It's worth noting that - unfortunately - the migration didn't work going the other way around (Jammy → Focal)
[13:48] <nibbon_> hmm, so you can live migrate a workload started on Bionic to Focal and then to Jammy without hitting the issue I reported?
[13:53] <cpaelzer> nibbon_: yes I think I can, let me check if I have some test logs left ...
[13:54] <cpaelzer> nibbon_: no, lost in machine redeployment :-/
[13:54] <nibbon_> because all the workloads started in Focal live migrate without any issue
[13:55] <nibbon_> it's this specific corner case that's giving me some headaches
[13:55] <cpaelzer> I test some going through all releases, but not all - let me check
[13:56] <cpaelzer> nibbon_: no :-/ I test various options between releases; and a common form (which in my case does not have --copy-storage) across multiple releases in a row
[13:57] <cpaelzer> nibbon_: so it might really be start bionic -> migrate focal -> migrate jammy while using --copy-storage on all stages
[14:08] <nibbon_> it can be; however, I don't think I'm the only one with such an issue 🤔
[14:10] <nibbon_> cpaelzer: your plan would be to backport the patch for the bug you linked earlier today to libvirt 6.0?
[14:11] <cpaelzer> yep, at least that seems to be the best shot for now
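The two storage-copy variants tried above differ only in one flag, so a small helper can make the experiment repeatable. This is a minimal sketch, assuming the migration is driven through `virsh migrate --live`; the log does not show the actual command nibbon_ runs, and the domain name and destination URI below are hypothetical placeholders.

```python
import shlex


def build_migrate_command(domain, dest_uri, copy_storage="inc"):
    """Return the argv for a live migration with non-shared storage.

    copy_storage is "inc", "all", or None (shared storage, no disk copy).
    """
    if copy_storage not in ("inc", "all", None):
        raise ValueError("copy_storage must be 'inc', 'all' or None")
    argv = ["virsh", "migrate", "--live"]
    if copy_storage is not None:
        # --copy-storage-inc / --copy-storage-all, as discussed in the log
        argv.append(f"--copy-storage-{copy_storage}")
    argv += [domain, dest_uri]
    return argv


# Hypothetical guest and destination, for illustration only.
cmd = build_migrate_command("guest1", "qemu+ssh://jammy-hv.example/system", "all")
print(shlex.join(cmd))
```

Passing `copy_storage=None` gives the shared-storage form cpaelzer suggested as a cross-check on whether `--copy-storage-*` is the right path to chase.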
[15:53] <foo> teward: heh, that moment when I realize I already have /run
[15:53] <foo> ... and there is stuff in it. My ubuntu-fu (and linux-fu) is clearly dated, I somehow didn't remember this as a standard dir
[15:56] <foo> Looks like I need to have root user create .sock file 
[15:57] <sdeziel> foo: you shouldn't need as systemd supports creating a directory for your service (`RuntimeDirectory=`)
[15:58] <foo> sdeziel: ah, I have WorkingDirectory specified but not RuntimeDirectory. 
[15:58] <foo> sdeziel: so I think I need to set RuntimeDirectory=/run - thank you
[16:00] <sdeziel> foo: this directive takes a relative path so probably `RuntimeDirectory=myapp-api`. man systemd.exec will give you the full explanation and also mentions `RuntimeDirectoryMode=` which might be handy if the default perms don't allow www-data to access the socket you put in there
[16:04] <foo> sdeziel: surprised this didn't work: ExecStart=/home/dev/myapp/live/venv3.9/bin/gunicorn --timeout 1800 --workers 1 --log-level=debug --bind unix:/run/myapp-api.sock wsgi:web_app to create the socket in /run
[16:04] <foo> sdeziel: ... but probably a perms thing
[16:06] <sdeziel> foo: you use `User=dev` and `dev` cannot write to `/run`. That's why we are suggesting having a sub-directory (`/run/myapp-api`) that's created by systemd itself
[16:06] <foo> sdeziel: ohhh, I see, I see
[16:11] <foo> sdeziel: thanks, man systemd.exec spells out explicitly this dynamic. Testing
[16:32] <foo> sdeziel: thank you :) 
[16:32] <sdeziel> foo: have you got it working?
[16:33] <foo> sdeziel: yup :) 
[16:33] <sdeziel> foo: glad to hear that!
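For reference, the shape of the fix discussed above can be sketched as a `[Service]` fragment. This is an illustration under assumptions, not foo's actual unit: the socket file name inside the runtime directory and the explicit `RuntimeDirectoryMode=0755` are hypothetical choices, while `User=dev`, `RuntimeDirectory=myapp-api` and the gunicorn command line come from the log.

```python
def render_service_section(runtime_dir, user="dev", mode="0755"):
    """Render a [Service] fragment that lets systemd create /run/<runtime_dir>."""
    # RuntimeDirectory= expects a name relative to /run, not an absolute path.
    if not runtime_dir or runtime_dir.startswith("/") or ".." in runtime_dir.split("/"):
        raise ValueError("RuntimeDirectory= takes a path relative to /run")
    # Hypothetical socket name; the log only shows /run/myapp-api.sock.
    socket_path = f"/run/{runtime_dir}/{runtime_dir}.sock"
    exec_start = (
        "/home/dev/myapp/live/venv3.9/bin/gunicorn --timeout 1800 --workers 1 "
        f"--log-level=debug --bind unix:{socket_path} wsgi:web_app"
    )
    lines = [
        "[Service]",
        f"User={user}",
        f"RuntimeDirectory={runtime_dir}",
        f"RuntimeDirectoryMode={mode}",
        f"ExecStart={exec_start}",
    ]
    return "\n".join(lines) + "\n"


print(render_service_section("myapp-api"), end="")
```

Because systemd creates the directory for the service's user, `dev` can write the socket there without needing write access to `/run` itself. Whatever connects to the socket would need to point at the new path.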
[19:21] <teward> sdeziel: thanks for helping foo with the systemd / runtimedirectory part of it :)
[19:22] <teward> i've been busy with hair on fire situations
[19:22] <sdeziel> heh, we're all doing it cause we like it ;)
[19:23] <teward> sdeziel: true, but it's a good thing that we're all on the same page :)
[21:34] <Kehet> I submitted crash report to canonical earlier when installer crashed, can I somehow find this report again? I just wanted to add that this error disappeared when I created new partition table 
[21:34] <Kehet> well I believe those are anonymous anyway
[21:36] <sarnold> Kehet: yeah I think you can find your own reports
[21:38] <sarnold> sigh, the wiki page that I thought would have the directions only has instructions for the graphical interface
[21:40] <Kehet> yeah, I saw same page .. are those collected to launchpad?
[21:40] <sarnold> the clear majority go only to https://errors.ubuntu.com/
[21:40] <sarnold> I don't know the rules for when they'll go to launchpad vs when they go to the error reporter
[21:55] <sarnold> Kehet: aha! check /var/lib/whoopsie/whoopsie-id
[21:56] <sarnold> Kehet: if you've got one of those, https://errors.ubuntu.com/user/ ... (that big long blob)
[21:57] <Kehet> thanks 👍
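The lookup sarnold describes can be sketched as follows: read the identifier from `/var/lib/whoopsie/whoopsie-id` and append it to the `https://errors.ubuntu.com/user/` prefix. This is a minimal sketch, assuming the file holds a single identifier; reading it may require elevated privileges on some systems, which the code only reports rather than works around.

```python
from pathlib import Path

WHOOPSIE_ID_FILE = Path("/var/lib/whoopsie/whoopsie-id")


def errors_url(whoopsie_id):
    """Build the per-user errors.ubuntu.com URL from a whoopsie identifier."""
    ident = whoopsie_id.strip()
    if not ident:
        raise ValueError("empty whoopsie id")
    return f"https://errors.ubuntu.com/user/{ident}"


if __name__ == "__main__":
    try:
        print(errors_url(WHOOPSIE_ID_FILE.read_text()))
    except OSError as exc:
        # Missing file or insufficient permissions.
        print(f"could not read {WHOOPSIE_ID_FILE}: {exc}")
```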
[21:58] <sarnold> and *this* time I modified the wiki page so I can find it faster than twenty minutes next time I want it
[21:58] <sarnold> sheesh
[22:01] <Odd_Bloke> Bold to assume a wiki page will load in less than 20mins :p
[22:02] <sarnold> Odd_Bloke: they moved it to a larger vm! I think they added a second hamster and some fresh bedding
[22:03] <sarnold> Odd_Bloke: of course, every action has an opposite reaction; the wiki also spent six months or a year or something in read-only mode afterwards
[22:08] <Odd_Bloke> Oh, nice!
[22:09] <sarnold> Odd_Bloke: I'm afraid it might still be too little too late :( we've had pretty bad documentation fragmentation in the last few years as people tried to cope
[22:14] <Odd_Bloke> Yeah, I remember that happening.  (Does the Discourse-to-HTML doc generator support header anchors yet? :p)
[22:17] <sarnold> good question :)