[01:12] <Odd_Bloke> blackboxsw: I think we should hold off on changes like that until we've investigated whether this is a lxd bug.
[01:12] <Odd_Bloke> (Gone again now; accidentally switched to this channel and would have forgotten to respond tomorrow morning without the highlight. :p)
[14:23] <otubo> So I was writing a test for this PR: https://github.com/canonical/cloud-init/pull/70, which looks like this: https://pastebin.com/2z90v92F; and it started to fail on the most random place: https://pastebin.com/WSZcwMwB
[14:25] <Odd_Bloke> otubo: That looks to me like util.subp is being mocked without a return value.
[14:26] <Odd_Bloke> otubo: And, indeed, if you look at setUp in that class, that's what's happening.
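A minimal sketch of the failure mode Odd_Bloke describes, assuming the code under test unpacks a (stdout, stderr) pair from the mocked call. The `util` class, `swap_total`, and the command are hypothetical stand-ins and are not taken from cloud-init or the PR.

```python
from unittest import mock


class util:
    """Hypothetical stand-in for a module that exposes subp()."""

    @staticmethod
    def subp(cmd):
        raise RuntimeError("real commands must not run in unit tests")


def swap_total(path):
    # Hypothetical code under test: unpacks the (stdout, stderr) pair.
    out, _err = util.subp(["some-command", path])
    return int(out.split()[0])


def run_with_bare_mock():
    # Patched without a return value: the call returns a mock object,
    # which cannot be unpacked into two names.
    with mock.patch.object(util, "subp"):
        try:
            swap_total("/swap.img")
        except (TypeError, ValueError) as exc:
            return type(exc).__name__
    return None


def run_with_return_value():
    # Patched with an explicit (stdout, stderr) tuple: unpacking works.
    fake = ("2048\t/swap.img\n", "")
    with mock.patch.object(util, "subp", return_value=fake):
        return swap_total("/swap.img")
```

When a shared setUp installs the bare mock, any test that reaches the unpacking line fails far from the code it meant to exercise, which would explain a failure "on the most random place".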
[14:27] <otubo> Odd_Bloke: I knew I couldn't bypass all the functions that call setup_swap, sorry about that.
[14:28] <otubo> Odd_Bloke: Actually I was trying to keep it simple and compact, calling only the function that needs to be called, but I don't think that's what's gonna happen :)
[14:28] <otubo> Odd_Bloke: thanks for the quick help!
[14:29] <Odd_Bloke> :)
[14:29] <Odd_Bloke> Happy to help!
[14:54] <otubo> Odd_Bloke: Ok I pushed the fixes from your comments, but the test I'm gonna leave for tomorrow probably :)
[14:56] <meena> rharper: my current chicken egg problem re system_info vs vendor-data is that to collect all vendor data, i'd need to get it from Hetzner which isn't properly recognized as such right now… despite being configured so… it's weird — it works on "first run" from boot, but not when run later…??
[14:56] <meena> i feel there's lots of inconsistencies in the code paths that cloud-init takes… on not-Linux.
[14:58] <meena> so, for reasons incomprehensible to me, ds.meta_data is empty.
[14:59] <meena> anyway, i'll post my collect-logs, and vendor-data output…
[15:00] <Odd_Bloke> meena: What do you mean by "isn't properly recognized as such"?
[15:00] <meena> but perhaps this is just a transient problem until we solve https://github.com/canonical/cloud-init/pull/61
[15:01] <meena> Odd_Bloke: cloud-init query cloud_name → none
[15:02] <Odd_Bloke> Oh, right.
[15:03] <meena> hmmmm, yeah, def apply_network_config() is gone from Goneri's branch… i wonder if that's… a problem…
[15:03] <Odd_Bloke> meena: Is that on a subsequent boot, or first boot?  And do, e.g., Ubuntu images detect it correctly?
[15:04] <meena> Odd_Bloke: subsequent, yes…
[15:04] <meena> Odd_Bloke: i don't know, i've been working on this Fbsd thing… for… long.
[15:04] <meena> also, i'm working off of tip + mine + goneri's patches
[15:05] <meena> so it's hard to say what's broken where 😅
[15:05] <meena> also, i'm dangerously overcaffeinated.
[15:05] <smoser> blackboxsw: why / where are you seeing errors with pstart ?
[15:06] <smoser> pstart should work. it should be valid. if it's not, then we/I can/should fix it.
[15:08] <Odd_Bloke> smoser: You should be able to reproduce by running `lxd-proposed-snapshot disco -p disco-test-image` on a system with lxd 3.18.
[15:08] <Odd_Bloke> s/lxd/lxc/
[15:09] <Odd_Bloke> This will successfully create the image, but instances launched from that image will have the wrong UID owning all the files.
[15:10] <Odd_Bloke> This, of course, breaks everything. :p
[15:11] <smoser> it's an lxd regression, they need to stop doing that.
[15:13] <meena> the formatting on launchpad is so horrible… https://bugs.launchpad.net/cloud-init/+bug/1855170
[15:14] <rharper> smoser: good luck with that ;  recall the last time we saw that with piping data to stdin;  https://github.com/lxc/lxd/issues/6188
[15:14] <rharper> "sorry we broke your workload" p.s. actually not sorry
[15:15] <smoser> rharper is less restrained in his typing than I was.. ;)
[15:15] <Odd_Bloke> meena: https://imgflip.com/i/3j4d9s
[15:15] <rharper> smoser: lol
[15:16]  * rharper gets some coffee 
[15:16] <smoser> the failure is dns?
[15:16] <smoser> (i hadn't seen a report or anything... just tried to recreate here)
[15:17] <Odd_Bloke> smoser: We've seen the failure manifest in a couple of ways, but we think they all boil down to incorrect UIDs.
[15:18] <Odd_Bloke> root doesn't own the files it should, so a bunch of things are going to fail, it's just a case of which thing fails first.
[15:18] <meena> Odd_Bloke: 😒
[15:18] <Odd_Bloke> meena: ^_^
[15:23] <rharper> smoser: yeah the critical failure  (why we noticed) is that some files that systemd would work with aren't owned by root (they have uids/gids that aren't mapped correctly) so systemd refuses to use them; and networkd itself doesn't come up;  no networking means snapd fails and cloud-init can't ssh-import-id
[15:24] <Odd_Bloke> It also breaks `lxc shell foo`, which is how I first noticed.
[15:38] <Odd_Bloke> Filled up the disk on my VM and now I can't remove any of the containers: Error: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
[15:38] <Odd_Bloke> Banner day for lxd
[15:41] <smoser> ok....
[15:41] <smoser> i'm  not *sure* this is related
[15:54] <Odd_Bloke> Oh, my thing definitely isn't related.
[15:54] <Odd_Bloke> Just annoyed at lxd. ¬.¬
[16:09] <chillysurfer> hey, all! we're planning on a 19.4 release, is that correct? or did i dream that up? :)
[16:10] <blackboxsw> nice reproducer Odd_Bloke
[16:10] <blackboxsw> chillysurfer: correct. trying to get code landed for 19.4
[16:11] <blackboxsw> https://lists.launchpad.net/cloud-init/msg00236.html
[16:11] <blackboxsw> trying to wrap up the SRU testing for Ubuntu, but we've hit an lxc problem causing some issues with testing :/
[16:12] <blackboxsw> chillysurfer: any code that we want explicitly to get in for 19.4 try to get in shape and reviewed by end of week this week if we can
[16:13] <smoser> blackboxsw, Odd_Bloke . ok, this seems working for me.
[16:13] <smoser>  https://github.com/cloud-init/qa-scripts/pull/13/files
[16:14] <chillysurfer> blackboxsw: awesome thanks!!
[16:14] <Odd_Bloke> smoser: Nice!
[16:14] <Odd_Bloke> Will take a look post-standup.
[16:14] <chillysurfer> i was asking more for the recent pr that was opened up on our side
[16:15] <chillysurfer> we'll try to get that pushed through
[16:15] <smoser> and also https://github.com/cloud-init/qa-scripts/pull/12
[16:15] <smoser> someone (probably not me) put 'set -x' in
[16:15] <blackboxsw> smoser: nice, testing now
[16:26] <blackboxsw> hrm, still seeing bogus uid/gid despite having smoser's lxc(['file', 'push', '--mode=0755', '--uid=0', '--gid=0', '-',
[16:26] <blackboxsw> on xenial -> eoan lxc launched proposed images
[16:27] <blackboxsw> results in no network connectivity
[16:27] <blackboxsw> and inability to ssh-import-id etc.
[16:40] <rharper> smoser: thanks!
[16:41] <Odd_Bloke> Yep, I'm also not seeing that as a fix, unfortunately.
[16:43] <Odd_Bloke> I'm still seeing ~everything owned by 1000000.
[16:46] <rharper> there's some mention of using snap restart lxd after updating lxd  ...
[16:46] <blackboxsw> yeah something more critical to lxc/d behavior has changed. Trying to reproduce outside of our tooling
[16:46] <rharper> it also restarts all containers as well
[16:55]  * blackboxsw ran snap restart lxd and tried again with same invalid 1000000 uid/gid on most files in /
[17:03] <blackboxsw> so lxc-proposed-snapshot runs lxc-pstart, which gives me errors on a 'stopped' container: https://pastebin.canonical.com/p/m4jbw8jxf2/
[17:03] <blackboxsw> or for everyone to see :) https://pastebin.ubuntu.com/p/mYYqrMNxVK/
[17:03] <rharper> blackboxsw: you can't run it manually like that
[17:04] <blackboxsw> rharper: doesn't lxc-proposed-snapshot run it manually?
[17:04]  * blackboxsw re-reads
[17:04] <rharper> the snapshot tool creates a temporary profile for a container which makes some config changes to set the init binary before running
[17:04] <rharper> tl;dr;  we inject a custom init so we can run stuff inside the container before it is booted with the default init of the system
[17:05] <rharper> and then we clean it up
[17:05] <blackboxsw> rharper: I was trying to do that via https://github.com/cloud-init/qa-scripts/blob/master/scripts/lxc-proposed-snapshot#L124-L131
[17:05] <blackboxsw> lxc init ,, then use lxc-pstart (which is what $cmd is there)
[17:05] <rharper> sure, did you set up the profile and modify the container config ?
[17:06] <rharper> at this point, I don't think we're writing files with the incorrect uid/gid
[17:06] <rharper> I believe it's getting mapped wrong or possibly the publish is at fault
[17:06] <blackboxsw> +1 rharper we aren't writing them. I just wanted to get back to simple lxc cmds that reproduce it
[17:06] <blackboxsw> yeah agreed
[17:06] <rharper> that said, I'd like to postpone any poking at this until after SRU verification
[17:07] <rharper> it's a timesink for now; and we can debug after verification
[17:07] <blackboxsw> yeah I suppose we can just validate pre-upgrade & post-upgrade on a running container
[17:07] <blackboxsw> for most of the 1offs
[17:08] <rharper> y
[17:08] <Odd_Bloke> blackboxsw: We can also run containers in a bionic VM.
[17:14] <blackboxsw> good point, will do
[17:20] <chillysurfer> is there a way to prevent Platform.destroy from being called in cloud-tests? i specify --preserve-instance but that seems to not stop the platform from being destroyed
[17:25] <MrGeneral> Howdy guys. I've installed Debian 9, and then cloud-init manually (this is for proxmox), though it's not changing anything at all, it's like if cloud-init wasn't installed... Any idea?
[17:25] <smoser> rharper, blackboxsw or Odd_Bloke one more psuedo-init fix
[17:25] <smoser>  https://github.com/smoser/qa-scripts/pull/new/fix/clean-shutdown
[17:25] <powersj> chillysurfer, what platform?
[17:25] <MrGeneral> am I perhaps missing the cfg file or something I need to edit? Should resize, change the root pw, setup networking, etc.
[17:31] <powersj> MrGeneral, have you looked at the cloud-init logs in /var/log
[17:31] <MrGeneral> nope, as I create the template, does the VM have it? hum, I'll check
[17:31] <MrGeneral> though I mounted it in Proxmox, IDE
[17:31] <MrGeneral> isn't it needed to mount in proxmox?
[17:33] <chillysurfer> powersj: azure
[17:34] <chillysurfer> powersj: i see the reference to AzureCloudPlatform.destroy but it doesn't seem like anything in azurecloud is calling that particular method
[17:35] <blackboxsw> MrGeneral: whether cloud-init is run or not depends on the cloud-init generator adding systemd units/services to the boot goals. If those cloud-init* stages are not included in boot goals (or if there was a systemd dependency chain in conflict), cloud-init services may not have been started on that boot.
[17:35] <MrGeneral> hmhm got it, I'll give it a try
[17:35] <blackboxsw> could check for systemd "Breaking ordering cycle by deleting job" messages in /var/log/syslog
[17:35] <blackboxsw> or maybe journalctl
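The log check blackboxsw suggests can be sketched as a small filter. This assumes only the message text quoted above; the function names are hypothetical, and the log path is whatever the system actually uses.

```python
MARKER = "Breaking ordering cycle by deleting job"


def find_ordering_cycles(lines):
    """Return the log lines that report a broken systemd ordering cycle."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


def scan_log(path):
    """Scan a log file (for example /var/log/syslog) for the marker."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as stream:
        return find_ordering_cycles(stream)
```

An empty result from `scan_log("/var/log/syslog")` only rules out this one cause; it says nothing about whether the generator enabled the cloud-init units.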
[17:36] <MrGeneral> hmhm definitely will try
[17:37] <blackboxsw> probably could add that debug info to https://cloudinit.readthedocs.io/en/latest/topics/faq.html for common questions/debugging why cloud-init doesn't start
[17:38] <MrGeneral> I'll thank you :)
[17:38] <MrGeneral> *I'll check thank you
[17:56] <MrGeneral> Humm @blackboxsw, it looks like it's not reading the cloud-init image as per https://pve.proxmox.com/wiki/Cloud-Init_Support ( The next step is to configure a CDROM drive which will be used to pass the Cloud-Init data to the VM.
[17:56] <MrGeneral> qm set 9000 --ide2 local-lvm:cloudinit)
[17:56] <MrGeneral> it should though...
[17:56] <MrGeneral> Any hint?
[18:13] <blackboxsw> not sure MrGeneral, haven't used proxmox :/  but if you can access the running vm somehow, try running DEBUG_LEVEL=2 DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force   # to see what ds-identify detects as your datasource
[18:17] <powersj> blackboxsw, Odd_Bloke rharper where did you all get with lxc?
[18:30] <MrGeneral> Weird, @blackboxsw basically proxmox mounts cloud-init, and reads the meta from there, no idea what's going on :\
[18:31] <MrGeneral> Sec
[18:31] <MrGeneral> bash: /usr/lib/cloud-init/ds-identify: No such file or directory , nope :\
[18:32] <MrGeneral> That's the issue...
[18:33] <MrGeneral> e.g debian 10 works fine, as centos 7, these read from the mounted ide2
[18:33] <MrGeneral> So this HAS to be the issue...
[18:33] <MrGeneral> Any idea tho? :|
[18:33] <MrGeneral> https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html / You can provide meta-data and user-data to a local vm boot via files on a vfat or iso9660 filesystem. The filesystem volume label must be cidata or CIDATA.
[18:34] <MrGeneral> should be this, dunno
[18:34] <MrGeneral> cdrom contains that, on same iso encoding
[18:35] <MrGeneral> or OpenNebula..
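The NoCloud rule quoted from the docs above (vfat or iso9660 filesystem, volume label cidata or CIDATA) can be written down as a check. This is only a sketch of that one sentence; the function name is hypothetical and it is not how cloud-init or ds-identify implements detection.

```python
NOCLOUD_FSTYPES = {"vfat", "iso9660"}
NOCLOUD_LABELS = {"cidata", "CIDATA"}


def looks_like_nocloud_seed(fstype, label):
    """True if a filesystem matches the documented NoCloud seed rule."""
    return fstype in NOCLOUD_FSTYPES and label in NOCLOUD_LABELS
```

Comparing the filesystem type and label of the attached cdrom against this rule is one way to tell whether the seed itself is at fault or the installed cloud-init is simply not looking for it.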
[18:35] <chillysurfer> powersj: yeah looking at the logging, --preserve-instance keeps the vm running but then it turns around and destroys the resource group which then of course deletes the vm as well
[18:40] <MrGeneral> Basically https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html I need to enable this. Heh.
[18:47] <MrGeneral> Issue seems to be activating nocloud @blackboxsw and I need to have it load the meta-data from cdrom.
[18:47] <MrGeneral> That is the issue.
[18:48] <powersj> MrGeneral, not sure many of us are hugely familiar with proxmox, but have you tried going through https://pve.proxmox.com/wiki/Cloud-Init_Support
[18:49] <MrGeneral> yep definitely have.
[18:49] <MrGeneral> just cant get cloud-init to load the meta-data from the cdrom
[18:49] <MrGeneral> thanks btw powersj
[18:50] <MrGeneral> checking a working instance now, as centos7 works perfectly well
[18:51] <powersj> MrGeneral, ok and were you able to run cloud-id. I know the file it produces wasn't there, but did you try running it
[18:53] <MrGeneral> let me check sec.
[19:11] <rharper> powersj: we're avoiding the issue for now by using bionic VMs to create containers
[20:03]  * meena is gonna write some code…
[20:11] <meena> anyone wanna tell me why this test is failing: https://travis-ci.org/canonical/cloud-init/jobs/623839861#L279-L295
[20:17] <Odd_Bloke> MrGeneral: Debian 9 is oldstable, right?  The version of cloud-init there is very old, so you may want to try a more recent version.
[20:22] <Odd_Bloke> meena: So that failure is telling you that m_user.call_args_list is empty instead of containing the two calls you expect.
[20:22] <Odd_Bloke> If you're pretty sure the code under test is correct, the most likely issue is the mocking.
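A small sketch of what an empty call_args_list usually means: the mock being inspected is not the object the code under test actually called. `add_accounts` and `recorded_calls` are hypothetical names for illustration and are not taken from the PR.

```python
from unittest import mock


def add_accounts(distro, names):
    # Hypothetical code under test: one create_user call per name.
    for name in names:
        distro.create_user(name)


def recorded_calls(names, watch_right_object=True):
    used = mock.Mock()  # the object the code under test really uses
    # When the test watches a different mock, it records nothing.
    watched = used if watch_right_object else mock.Mock()
    add_accounts(used, names)
    return watched.create_user.call_args_list
```

With the right object watched, the list holds one `mock.call(...)` per invocation; with the wrong one it is empty even though the code under test behaved correctly.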
[20:23] <chillysurfer> anybody seen this error when trying to build deb with lxd for running cloud-tests? `FileNotFoundError: [Errno 2] No such file or directory: 'ud'`
[20:23] <chillysurfer> unit tests pass when i run locally but seems to be a problem building the deb in an lxd container
[20:24] <Odd_Bloke> chillysurfer: Could you pastebin a bit more context?
[20:24] <Odd_Bloke> meena: You could also check if _user or _group are getting called, which might suggest that you aren't going down the FreeBSD path properly.
[20:26] <chillysurfer> Odd_Bloke: https://gist.github.com/trstringer/9ebb45558934f007a2d8af4fde560350
[20:27] <meena> Odd_Bloke: oh. i didn't think of that.
[20:27] <chillysurfer> Odd_Bloke: specifically: https://gist.github.com/trstringer/9ebb45558934f007a2d8af4fde560350#file-output-log-L2630
[20:27] <chillysurfer> i think ud is the userdata file but not sure why this would fail with lxd but not locally
[20:29] <Odd_Bloke> chillysurfer: Is this a clean git tree?  (master?)
[20:30] <chillysurfer> Odd_Bloke: it's pretty much just a single unrelated commit on top of master: https://github.com/trstringer/cloud-init/tree/thstring/fix-azure-cloud-tests
[20:30] <chillysurfer> i needed to make that commit to even get this far. without passing userdata it was bombing out
[20:31] <chillysurfer> Odd_Bloke: i could try passing in some user data to the test, but that seems unnecessary. how do you pass userdata to citest?
[20:31] <Odd_Bloke> Yeah, that doesn't _look_ like it should be the issue.
[20:31] <chillysurfer> Odd_Bloke: agreed
[20:31] <chillysurfer> but i can't repro it locally
[20:31] <Odd_Bloke> chillysurfer: So the failure you're seeing is during the package build that the cloud_tests perform.
[20:31] <meena> Odd_Bloke: it is, indeed called.
[20:31] <chillysurfer> Odd_Bloke: correct
[20:31] <Odd_Bloke> So passing something into citest wouldn't get you anywhere, those are just the regular unit tests failing for some reason.
[20:32] <chillysurfer> Odd_Bloke: ah right. duh
[20:33] <chillysurfer> Odd_Bloke: for what it's worth, i initialized lxd with `lxd init --auto`
[20:33] <chillysurfer> maybe that's a bad lxd initialization for this build?
[20:33] <Odd_Bloke> That's also unlikely to be it.
[20:33] <chillysurfer> thought so
[20:34] <chillysurfer> but i'm surely not the first person/computer running a build in lxd from master..?
[20:34] <chillysurfer> maybe it's environmental
[20:34] <blackboxsw> rharper: and Odd_Bloke error hit on oracle SRU (not a regression) https://pastebin.ubuntu.com/p/MrC2Hd62gz/
[20:34] <rharper> blackboxsw: on xenial ?
[20:34] <blackboxsw> in the above, I think we might be able to handle/avoid this error in OpenstackLocal ds detection
[20:34] <Odd_Bloke> blackboxsw: Is that new?
[20:34] <blackboxsw> Odd_Bloke: not new at all
[20:34] <blackboxsw> it was in previous SRUs too
[20:35] <rharper> I see that on xenial if you attempt to re-run local with networking still up
[20:35] <blackboxsw> just something we could improve (even though oracle shouldn't be on openstack datasource_list in the future)
[20:35] <rharper> ie, if you reboot, it works as expected;
[20:35] <blackboxsw> rharper: that's due to ephemeraldhcp setup
[20:35] <Odd_Bloke> chillysurfer: What's the full command line you ran to kick the tests off?
[20:35] <rharper> I've only seen that if the ephemeral runs while networking is already up
[20:35] <blackboxsw> yeah
[20:35] <rharper> blackboxsw: well, you shouldn't normally see it
[20:35] <rharper> during boots
[20:35] <rharper> which I'm not
[20:35] <rharper> so how are you seeing it ?
[20:35] <blackboxsw> on oracle's case, I think normally we shouldn't see that
[20:36] <rharper> I don't think we should see that _anywhere* via normal boots
[20:36] <rharper> so interested in how you saw it
[20:36] <chillysurfer> Odd_Bloke: tox -e citest -- tree_run --verbose --data-dir results --preserve-data --platform azurecloud --os-name bionic --preserve-instance --test modules/write_files
[20:36] <blackboxsw> rharper: I actually see it on initial boot on oracle. I'll double check now
[20:36] <rharper> please do
[20:36] <blackboxsw> initial boot (not upgraded to proposed)
[20:36] <Odd_Bloke> chillysurfer: So a short-term workaround would be to build your own deb and pass that in.
[20:36] <blackboxsw> will ping with results
[20:36] <rharper> of which release ?
[20:36] <blackboxsw> xenial and bionic
[20:36] <chillysurfer> Odd_Bloke: yep good idea
[20:36] <rharper> very strange
[20:36] <rharper> oh
[20:36] <rharper> baremetal ?
[20:36] <chillysurfer> Odd_Bloke: that *should* work i would hope
[20:36] <blackboxsw> rharper: not bare metal
[20:36] <blackboxsw> paravirt I think
[20:36] <rharper> check
[20:37] <rharper> on iscsi we're up
[20:37] <rharper> on paravirt, this should not happen unless network config is somehow baked into the image
[20:37] <Odd_Bloke> rharper: blackboxsw: If this isn't new, let's file a bug and move on.
[20:37] <rharper> Odd_Bloke: +1 but let's confirm it's not new
[20:37] <blackboxsw> Remote Data Volume:
[20:37] <blackboxsw> PARAVIRTUALIZED
[20:37] <blackboxsw> Boot Volume Type:
[20:37] <blackboxsw> PARAVIRTUALIZED
[20:37] <rharper> k
[20:37] <blackboxsw> yeah will confirm it's not new. and will file a bug
[20:37] <rharper> smells wrong to me
[20:37] <Odd_Bloke> rharper: I thought blackboxsw did confirm it wasn't new.
[20:38] <rharper> well, he said he saw it before
[20:38] <Odd_Bloke> By all means if we aren't sure, let's make sure we are. :)
[20:38] <rharper> but then somehow we didn't file a bug
[20:38] <rharper> so lets do that
[20:38] <blackboxsw> I think it was on first boot, but will double check
[20:38] <rharper> k
[20:38] <powersj> Odd_Bloke, chillysurfer I can reproduce the test escapes in a fresh bionic lxc of master
[20:40] <blackboxsw> rharper: Odd_Bloke right, on sru 19.2.36 we had two tracebacks (not triaged in the logs) on oracle, but I recall we spot checked that and found the same latest/metadata 404 route traces (which is why oracle has their own datasource now anyway instead of using openstack)
[20:40] <powersj> https://paste.ubuntu.com/p/Dzrn2TjBqN/
[20:40] <blackboxsw> I'm awaiting a clean vm launch now to confirm
[20:41] <rharper> blackboxsw: ok, so lets file a bug with the details;  it seems fixable
[20:41] <Odd_Bloke> Right, yeah, that's it.
[20:41] <Odd_Bloke> This doesn't happen in the Oracle DS, and the expectation was that we would be moving Soon.
[20:41] <rharper> ah
[20:42] <blackboxsw> right it's a logical path that shouldn't be in use shortly anyway. As long as we also confirm that Openstack works without hitting this ephemeraldhcp trace (because the platform behaves differently), this is a no-fix-needed.
[20:42] <blackboxsw> which I think rharper already confirmed (no traces on openstack SRU testing)
[20:43] <blackboxsw> rharper: Odd_Bloke confirmed, existing oracle on 19.2.36 has this same issue
[20:43] <rharper> blackboxsw: ack, no traces on OpenStack
[20:43] <blackboxsw> on clean boot
[20:43] <blackboxsw> ultimately the proper OpenStack ds is detected (just not OpenstackLocal)
[20:44] <blackboxsw> so we just detect a little later in the process. And we're working on CPC images to not default datasource_list: [Openstack] anyway on xenial++ I thought.
[20:44]  * blackboxsw looks the other way. "Nothing to see here"
[20:47] <blackboxsw> https://github.com/cloud-init/ubuntu-sru/pull/71 softlayer SRU
[20:47] <blackboxsw> https://github.com/cloud-init/ubuntu-sru/pull/66 azure SRU
[20:48] <blackboxsw> https://github.com/cloud-init/ubuntu-sru/pull/74 oracle SRU
[20:48] <blackboxsw> https://github.com/cloud-init/ubuntu-sru/pull/74 ignore ssh pub keys
[20:52] <meena> oh hmmm…
[20:55]  * blackboxsw wraps up puppet SRU verification test now
[20:59] <Odd_Bloke> blackboxsw: Comment on #73.
[21:04] <blackboxsw> thanks Odd_Bloke responded and force-pushed
[21:16] <Odd_Bloke> blackboxsw: A suggestion added. :)
[21:16] <blackboxsw> and accepted
[21:16] <blackboxsw> :)
[21:18] <smoser> so did my pstart stuff fix the issue ?
[21:18] <smoser> (i didn't actually *see* an issue, i just responded to blackboxsw's totally rude suggestion that we should drop some code i offered)
[21:19] <smoser> s/offered/authored/
[21:19] <blackboxsw> haha!
[21:19] <blackboxsw> sorry smoser, I was wondering if the pstart approach was giving us flak as an unsupported lxc init use-case
[21:20] <blackboxsw> we were going to talk about this post-SRU. so probably tomorrow  to dig into the lxd issue being caused, but I'll run through your additional patches and see if we can't fix what lxc/lxd 3.18  broke
[21:21] <blackboxsw> smoser: at the moment, I'm running lxd related proposed tests on a bionic kvm and things are working as they used to
[21:21] <blackboxsw> something w/ focal + lxd snap
[21:21] <meena> how do i start a debugger on the failing tests?
[21:21] <blackboxsw> meena: unit tests add an import pdb; pdb.set_trace() and run .tox/py3/bin/nosetests tests/unittests/.....your file
[21:22] <Odd_Bloke> smoser: The first patch didn't fix it for me, unfortunately.
[21:22] <blackboxsw> meena: unit tests add an import pdb; pdb.set_trace() and run .tox/py3/bin/nosetests tests/unittests/.....your file   -s
[21:22] <Odd_Bloke> Haven't tried since.
[21:22] <blackboxsw> smoser: same with the first two patches you suggested (hadn't tried the third yet)
[21:23] <smoser> Odd_Bloke: is there a url that shows failure?
[21:23] <smoser> or can you paste it or something ?
[21:23] <smoser> at least 'lxc-proposed-snapshot disco test-d1' worked.
[21:25] <Odd_Bloke> smoser: Including launching a container from the produced image?
[21:26] <smoser> i'll give it a try.
[21:26] <blackboxsw> smoser: I can run this           lxc-proposed-snapshot -p disco test-d1;  lxc launch <fingerprint> test; lxc exec test -- ls -l  and I see
[21:26] <blackboxsw> drwxr-xr-x   1 1000000 1000000  102 Dec  4 03:04 usr
[21:27] <blackboxsw> didn't actually get any error logs on snapshot creation
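The ownership symptom above (entries owned by 1000000 where root, uid 0, is expected) can be checked across a tree with a short walker. A sketch only: `unexpected_owners` is a hypothetical helper, and it reports the numeric ids as seen from wherever it runs.

```python
import os


def unexpected_owners(root, expected_uid=0, expected_gid=0):
    """List (path, uid, gid) for entries under root with other owners."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so symlinks are reported by their own ownership.
            st = os.lstat(path)
            if (st.st_uid, st.st_gid) != (expected_uid, expected_gid):
                found.append((path, st.st_uid, st.st_gid))
    return found
```

Run inside an affected container against a directory such as /usr, a long list of 1000000-owned paths would match what `ls -l` shows in the log; it does not say which layer failed to shift the ids.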
[21:32]  * blackboxsw wonders if it has something to do with lxc now publishing cloudimages that are squashfs/metadata instead of parts and us using a unified tarball for image snapshot upload
[21:33] <blackboxsw> s/instead of parts/in separate file parts/
[21:33] <smoser> this is really messed up
[21:34] <meena> what do i have to do to get a newly pip installed package imported, or importable…
[21:34] <meena> https://documen.tician.de/pudb/starting.html ← installed this debugger, which sucks slightly less than pdb
[21:34] <meena> also, i enjoy the nostalgia of things looking like Turbo C
[21:35] <blackboxsw> meena: if you are talking about adding it to the current tox env, I think you can .tox/py3/bin/pip3 install X
[21:35] <blackboxsw> and test whether importable via .tox/py3/bin/python3   import X
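The importability test blackboxsw describes can also be done without actually importing the package, using the standard library. A sketch; `is_importable` is a hypothetical helper name, and the answer is only meaningful for the interpreter that runs it.

```python
import importlib.util


def is_importable(module_name):
    """True if this interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None
```

Running this with .tox/py3/bin/python3 rather than the system python3 is the point: a package installed with the system pip will not be visible to the interpreter inside the tox environment.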
[21:38] <Odd_Bloke> smoser: I take it that means you reproduced? :p
[21:39] <blackboxsw> smoser: merged your PR 14 on qa-scripts.  retested again and still not-WFM
[21:39] <Odd_Bloke> lxd has been using split images forever.
[21:39] <blackboxsw> Odd_Bloke: right, so this would have hit us last SRU
[21:39] <blackboxsw> or earlier
[21:39] <smoser> yeah. it does not work, blackboxsw
[21:39] <smoser> but it is really weird what is going on there.
[21:39] <smoser> so the easiest way to quickly find out information
[21:39] <smoser> is to edit psuedo-init
[21:40] <smoser>  then run 'lxc-pstart -v -v -v $NAME /bin/bash'
[21:40] <Odd_Bloke> I haven't used pudb, but am a big fan of ipdb.
[21:40] <smoser> then sudo cat /var/snap/lxd/common/lxd/logs/d1/console.log
[21:40] <blackboxsw> +1 smoser I'll play around with that -> relocating to coffee shop
[21:42] <Odd_Bloke> blackboxsw: What happened to looking at it post-SRU? :p
[21:43] <smoser> http://paste.ubuntu.com/p/rvTjbdJSwx/
[21:44] <rharper> https://github.com/kubernetes/kubernetes/issues/55151
[21:45] <rharper> closed no response. bummer
[21:45] <rharper> same error as your paste smoser
[21:47] <smoser> yeah
[21:47] <smoser> so... if you muck around with pstart, you can get in to that failed case
[21:47] <smoser> and look around
[21:47] <smoser> (just comment out the 'mount') that is what was failing for me
[21:47] <smoser> and then i can run  /bin/bash
[21:47] <smoser> and look... nothing obvious
[21:47] <smoser> oh.
[21:47] <smoser> yeah.
[21:47] <smoser> mount is foobarred
[21:48] <smoser> root@d1:~# ls -l `which mount`
[21:48] <smoser> -rwsr-xr-x 1 1000000 1000000 47184 Aug 22 23:40 /usr/bin/mount
[21:48] <smoser> so yeah, perms are not getting set back
[21:48] <smoser> shiftfs bug ?
[21:48] <smoser> or whatever is doing the filesystem uid shift is foobarred
[21:48] <smoser> and not changing stuff.
[21:49] <rharper> oh they hijack mount now
[21:49] <rharper> to help auto mount some things IIRC
[21:50] <Odd_Bloke> blackboxsw: Found a potential issue with https://github.com/cloud-init/ubuntu-sru/pull/71/files so I haven't reviewed further
[21:56] <blackboxsw> thanks Odd_Bloke I'll kick it again with the proper wait logic as we could have missed a trace message
[21:57] <Odd_Bloke> Thanks!
[22:03] <meena> yeah; so, i can enter the debugger, step thru the code, and still have no idea why the wrong method is called. maybe something is wrongly mocked
[22:05] <Odd_Bloke> Annoying!
[22:05] <Odd_Bloke> meena: So one thing I sometimes do is to get the debugger into the code under test and then start poking around to see what values things have.
[22:06] <Odd_Bloke> Sometimes you discover that the thing you want _is_ mocked somehow, but not in the way that the test code is written for.
[22:08] <meena> Odd_Bloke: it's all mocked, that's for sure, the question is just, why is the wrong (mock) function called?
[22:09] <Odd_Bloke> meena: Poking at the mocked function can help you understand what the incorrect mocking may be.
[22:10] <meena> we created a cloud with a distro freebsd, and a ds DataSourceNone, and then we run the handler function of the cc_users_groups module. aaand, i have no idea which function is called in the mock.
[22:11] <meena> and why it's the wrong one :O or if i even have my mocks set up correctly hmmmmm
[22:11] <meena> so, is this correct? https://github.com/canonical/cloud-init/pull/93/files#diff-f957401b91b90e8385085395fe1f3623R51 ← are the class mocks passed first, or the method mocks?
[22:12] <meena> how do i ask the mock object, which *fully qualified* function it refers to?
[22:14] <Odd_Bloke> meena: An easier way of checking would be to change the order of the two method decorators.
[22:15] <Odd_Bloke> But not the names.
[22:15] <Odd_Bloke> Then one of _user and m_user will point at .create_group, and that should answer your question.
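A sketch of the ordering rule behind Odd_Bloke's suggestion: stacked patch decorators are applied bottom-up, so the decorator closest to the function supplies the first mock argument. The `Distro` class and both test functions are hypothetical and are not the code from the PR.

```python
from unittest import mock


class Distro:
    """Hypothetical class with two methods to patch."""

    def create_user(self, name):
        return "real user"

    def create_group(self, name):
        return "real group"


@mock.patch.object(Distro, "create_group")  # outer: second argument
@mock.patch.object(Distro, "create_user")   # inner: first argument
def names_match(m_user, m_group):
    Distro().create_user("alice")
    return m_user.called, m_group.called


@mock.patch.object(Distro, "create_user")   # decorators swapped,
@mock.patch.object(Distro, "create_group")  # argument names kept
def names_swapped(m_user, m_group):
    Distro().create_user("alice")
    # m_user now holds the create_group mock, so it was never called.
    return m_user.called, m_group.called
```

Swapping the decorators while keeping the argument names, as suggested above, flips which mock records the call, which is a quick way to see which parameter is bound to which patch.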
[22:17] <Odd_Bloke> powersj: Your three man page PRs are Approved now, will leave it to you to squash merge. :)
[22:17] <powersj> Odd_Bloke, will do thanks!
[22:18] <chillysurfer> how do cloud-tests ssh into the cloud machines? is it straightforward what key is used and the user name? need to troubleshoot a platform machine that is being tested that apparently cloud-tests can't ssh into, but it seems up and running
[22:19] <meena> Odd_Bloke: where do you get these good ideas from?
[22:21] <meena> Odd_Bloke: still, i'd like to be able to tell from the mock object what the fully qualified name is… this is… painful.
[22:25] <meena> great, fixed test: https://github.com/canonical/cloud-init/pull/93 pls merge :P
[22:25] <rharper> chillysurfer: I believe in the results dir the artifacts include the ssh key pair
[22:26] <rharper> chillysurfer: in the topdir of results, there should be a cloud_init_rsa/cloud_init_rsa.pub set of files
[22:36] <chillysurfer> rharper: oh nice
[22:36] <chillysurfer> rharper: and it's user `ubuntu` is that right?
[22:39] <powersj> chillysurfer, for azure it should be ubuntu, I think for lxd we still use root
[22:42] <rharper> chillysurfer: for ubuntu images at least
[23:09] <powersj> meena, for FreeBSD can you review https://github.com/canonical/cloud-init/pull/93 just to give it a +1
[23:11] <meena> powersj: i just wrote that code
[23:11] <powersj> doh, so you are igalic on github?
[23:11] <meena> powersj: i am @igalic
[23:11] <powersj> sigh
[23:11] <blackboxsw> man softlayer really taking 3 mins to spin up an instance?
[23:11] <meena> so
[23:11] <powersj> lol
[23:12] <meena> github won't allow me to approve my code
[23:12] <powersj> yeah... ok I'll have someone else review
[23:13] <meena> powersj: did i not link the horrible man page?
[23:14] <powersj> hmm
[23:15] <meena> https://www.freebsd.org/cgi/man.cgi?pw#USER_LOCKING
[23:16] <powersj> ah ok I just need someone to review the tests and we should be g2g
[23:18] <meena> vs https://www.freebsd.org/cgi/man.cgi?pw#USER_OPTIONS see description of -h
[23:20] <meena> https://github.com/canonical/cloud-init/pull/53 has anyone seen rharper around?
[23:22] <meena> https://github.com/canonical/cloud-init/pull/42 cuz, there's some stuff i'd really like rharper to look at