[17:41] <blackboxsw> paride: Odd_Bloke rharper I can't remember if we no longer support CentOS/7 source rpm builds
[17:41] <blackboxsw> to upload to copr repo
[17:50] <blackboxsw> ./tools/run-container --source-package centos/8 works
[17:50] <blackboxsw> but I hit dependency issues trying centos/7. I think I recall paride mentioning a few weeks back that we can no longer support centos/7 builds there
[18:00] <blackboxsw> falcojr: for the moment, I think building centos/8 --source-package and uploading and requesting a build for el8 should suffice
[18:37] <rharper> blackboxsw: master cannot build centos7 since we dropped py2 support; I discussed with paride that we'd want the copr builds for centos7 to happen against 19.4-stable;
[18:48] <chillysurfer> are the cloudinit.util dict merging functions for general use? i'm assuming those are for merging cloud config. but is it normal to use them for typical dict merging in the rest of the code base?
[18:52] <Odd_Bloke> chillysurfer: If you grep for mergemanydict in the codebase, you can see it's used for a few different things; it might depend a little on what you consider "typical" dict merging as to whether or not it's appropriate in your case. :)
[18:57] <chillysurfer> Odd_Bloke: yep that's the one i'm specifically looking at. we even use it in our azure data source
[18:58] <chillysurfer> i might try to get around that though if possible. like most things, it doesn't seem free :)
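For the "typical" dict merging chillysurfer has in mind, a plain recursive merge may be enough. The sketch below is not cloud-init code and makes no claim about how mergemanydict resolves conflicts; the name merge_dicts and the "override wins" rule are assumptions chosen for illustration.

```python
def merge_dicts(base, override):
    """Return a new dict where values from `override` win.

    Nested dicts present on both sides are merged recursively.
    Hypothetical helper, not part of cloudinit.util.
    """
    merged = dict(base)  # shallow copy, so `base` itself is never mutated
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict under this key: merge one level deeper.
            merged[key] = merge_dicts(merged[key], value)
        else:
            # Scalars, lists, and mismatched types: the override replaces.
            merged[key] = value
    return merged
```

Values that are not merged are shared by reference with the inputs, so callers that mutate the result may want a deep copy first.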
[19:13] <blackboxsw> +1 rharper I had forgotten the context around that.
[19:13] <blackboxsw> thanks
[19:17] <meena> Odd_Bloke: so twitter actually worked!
[19:20] <Odd_Bloke> meena: Yep!
[19:20] <Odd_Bloke> That explains why the checkmarks on that PR are all blue now.
[21:44] <Ac-town> hmm not seeing anything on the docs for this, how can I return an error during a runcmd step? so that the error is passed down into data/status.json?
[21:46] <Ac-town> or even scripts-per-boot
[21:46] <Ac-town> ideally I want to collect that file and parse it to see if we failed any boot steps and if so what
[22:37] <blackboxsw> Ac-town: cloud-init status --long should report errors. https://paste.ubuntu.com/p/mZmwvqMpVW/  I get that error when I lxc launch ubuntu-daily:xenial  -c user.user-data="$(cat bogus_runcmd.yaml)"
[22:37] <blackboxsw> where bogus_runcmd.yaml contains "#cloud-config\nruncmd:\n  - 1/0"
[22:39] <blackboxsw> one thing to wonder/check is whether your runcmd content is actually valid cloud-config yaml on the system: https://cloudinit.readthedocs.io/en/latest/topics/faq.html#how-can-i-debug-my-user-data
[22:40] <Ac-town> do these errors only report when cloud-init fails to parse? or when those commands fail
[22:41] <Ac-town> for one example, we have our first puppet run in scripts-per-boot, and if that fails for some reason the host is going to fail, I'm looking for a sane way to report back what failed so I can watch the status across our whole deploy
[22:42] <Ac-town> but there are no errors in status.json under modules-final
[22:48] <blackboxsw> that doc url I linked only validates user-data cloud-config. (which actually produces some semblance of error through cloud-init status --long though those errors surfaced are widely different depending on what problems there are in the user-data YAML.)   I'll have to check scripts-per-(boot|instance|always) to see how cloud-init behaves there on failed "external" scripts
[22:49] <blackboxsw> I would want cloud-init to try processing and running all scripts in those directories and not stop after the first error, but I'd also want cloud-init to collect any and all errors and surface them from cloud-init status --long (which reads /run/cloud-init/status.json and result.json)
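Since cloud-init status --long reads /run/cloud-init/status.json, a collector could read the same file directly. This is a rough sketch that assumes the file has a top-level "v1" key whose per-stage entries (such as modules-final) each carry an "errors" list; the function names are hypothetical and the layout should be checked against the cloud-init version in use.

```python
import json


def load_status(path="/run/cloud-init/status.json"):
    """Load the status file; the default path is the one named in the chat."""
    with open(path) as fh:
        return json.load(fh)


def collect_stage_errors(status):
    """Map stage name -> list of error strings, skipping stages with none.

    Assumes a {"v1": {stage: {"errors": [...]}}} layout; entries under
    "v1" that are not dicts (for example a datasource string) are ignored.
    """
    errors = {}
    for stage, info in status.get("v1", {}).items():
        if isinstance(info, dict) and info.get("errors"):
            errors[stage] = list(info["errors"])
    return errors
```

On a host, collect_stage_errors(load_status()) would return an empty dict when no stage recorded an error.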
[22:51] <blackboxsw> Ac-town: I'm trying to think of how best to handle this, if cloud-init isn't surfacing an error in a scripts-per-boot maybe that's worth a bug and we can discuss on the bug whether that is out of scope for cloud-init.
[22:52] <Ac-town> I can easily see the errors section to be in the case that there was a cloud-init level error, but when I have puppet and commands in runcmd that are critical to boot it would be nice to avoid parsing logs
[22:52] <blackboxsw> I'm trying to launch a broken instance now with a bad script to see
[22:52] <blackboxsw> Ac-town: especially parsing cloud-init logs to be frank (as they are way too busy)
[22:53] <blackboxsw> Ac-town: your puppetmaster doesn't represent puppet apply's that fail on given nodes?
[22:53] <blackboxsw> or maybe you are running masterless
[22:53] <Ac-town> it's not an ideal puppet in any sense
[22:53] <Ac-town> no puppetdb
[22:54] <Ac-town> but there's still a puppetmaster
[22:54] <Ac-town> puppet is also managed by an external team, so getting those tools isn't trivial
[22:55] <blackboxsw> +1  right, understood. I'm trying to think of the best approach to take into account custom script failure (which doesn't really fail cloud-init proper)
[22:56] <blackboxsw> I was toying with minor abuse/use of the  existing call home functionality  https://cloudinit.readthedocs.io/en/latest/topics/examples.html#call-a-url-when-finished
[22:56] <blackboxsw> which could post your custom script success failure exit codes maybe?
[22:56] <blackboxsw> to some url of your choosing
[22:56] <Ac-town> that would be cool actually
[22:56] <blackboxsw> but again you want an easy way to "see" that failure from cloud-init I suppose
[22:57] <Ac-town> the idea of having it from the machine would be it's better for us to reach out to a host than a host to reach out
[22:57] <Ac-town> since I was going to write a small service to allow us to collect this data, or ansible
[22:58] <Ac-town> a vm workflow is this:
[22:58] <Ac-town> scale out, vm is created, puppet runs in scripts-per-boot and sets up all our basic host things and some packages used in runcmd, runcmd uses those packages to setup our data and start the service
[22:59] <Ac-town> my ideal would be to look across the infra to either follow the status or to see if there is a problem across the stack
[22:59] <blackboxsw> ok, validated: I dumped the same bogus script from runcmd into /var/lib/cloud/scripts/per-boot/1.sh
[22:59] <blackboxsw> and I see the error ('scripts-per-boot', RuntimeError('Runparts: 1 failures in 1 attempted commands',))
[22:59] <blackboxsw> across reboots
[23:00] <Ac-town> what about runcmd?
[23:00] <Ac-town> (it's possible we're not passing down puppets exit code)
[23:00] <blackboxsw> and same-ish from runcmd: https://paste.ubuntu.com/p/mZmwvqMpVW/
[23:00] <Ac-town> I see a rsync step that fails and runcmd says success
[23:01] <blackboxsw> note 'scripts-user' instead of 'scripts-per-boot'
[23:01] <blackboxsw> interesting, so I imagine you see that rsync failure in cloud-init-output.log instead of cloud-init.log?
[23:01] <Ac-town> I'm looking at the cloud-final journalctl
[23:02] <Ac-town> for some reason we're not writing out the logs in /var/log, why I'm not sure yet
[23:02] <Ac-town> but it's always been the case
[23:02] <blackboxsw> interesting. ok, might be custom rsyslog configuration on the system
[23:03] <Ac-town> could be, either way I see that scripts-per-boot return "SUCCESS" and same with scripts-user
[23:03] <blackboxsw> log configuration can/does get manipulated in  /etc/cloud/cloud.cfg.d/05_logging.cfg on a lot of systems
[23:05] <blackboxsw> recent versions of cloud-init can show you what cloud-init sees as its logging configuration via `cloud-init query merged_cfg._log`
[23:05] <Ac-town> I see a cloud-init.log file managed there and the file does get made but it has a size of zero
[23:05] <blackboxsw> as root user
[23:05] <blackboxsw> ahh interesting
[23:05] <Ac-town> what version should that query work? I got an error from that
[23:05] <blackboxsw> could it be puppet manifest clearing it out?
[23:05] <Ac-town> we're on 18.5
[23:05] <blackboxsw> v. 20ish
[23:05] <blackboxsw> what disto?
[23:06] <blackboxsw> distro?
[23:06] <Ac-town> centos7 atm
[23:06] <blackboxsw> ahh, best we can get you is 19.4 via our copr repos. cloud-init upstream dropped python2 support end of 2019
[23:08] <blackboxsw> which lives @ https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing/
[23:09] <blackboxsw> our daily upstream builds (py3 support only) live at https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/
[23:10] <blackboxsw> which will get you el8 rpm packages
[23:10] <Ac-town> swapped out info for generic names, but this is the runcmd section https://paste.ubuntu.com/p/HQ6WhxhCkt/
[23:16] <blackboxsw> interesting, here's my same section with a broken runcmd script https://paste.ubuntu.com/p/mDCwCXSyX5/
[23:16] <blackboxsw> I also see the failure noted in cloud-init-output.log for that runcmd script that was created
[23:17] <blackboxsw> so it may be easier to parse WARNING out of /var/log/cloud-init-output.log for that case if upgrading to 20.3 of cloud-init is not possible on those images
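Pulling WARNING lines out of /var/log/cloud-init-output.log could look roughly like this. It is a minimal sketch that assumes a plain substring match on "WARNING" is good enough; the exact wording of those lines varies by cloud-init version, and find_warnings is a made-up name.

```python
def find_warnings(lines, marker="WARNING"):
    """Return (line_number, text) pairs for lines containing `marker`.

    `lines` is any iterable of strings, for example an open file handle.
    Line numbers are 1-based; trailing newlines are stripped.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def find_warnings_in_file(path="/var/log/cloud-init-output.log"):
    """Scan the output log named in the chat; replaces undecodable bytes."""
    with open(path, errors="replace") as fh:
        return find_warnings(fh)
```

This only tells you that something warned, not which runcmd step it was, so it is a fallback for images that cannot be upgraded.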
[23:17] <Ac-town> what if multiple commands are listed in the runcmd? we have 10 steps in ours and at least one of those steps should always work
[23:18] <Ac-town> I can see if we can get it updated, but that's also managed by another team as they make the images :(
[23:20] <blackboxsw> right. so the runcmd section is written into a single consolidated script I think. that makes for a very long script that would have to take into account your separate steps/scripts running.
[23:21] <blackboxsw> makes me think maybe you'd want to emit separate per-boot scripts in cloud-config write_files section maybe so they can operate independently but I probably don't understand the problem you are trying to solve.
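One way to keep the steps independent, whether they live in one script or several, is a small runner that executes every step, records each exit code, and only then decides whether to fail. The sketch below is an illustration with hypothetical names (run_steps, failed_steps); it uses only the standard library and is not something cloud-init provides.

```python
import subprocess
import sys


def run_steps(steps):
    """Run every (name, argv) step in order and return {name: exit_code}.

    A failing step does not stop the remaining steps from running.
    """
    results = {}
    for name, argv in steps:
        completed = subprocess.run(argv, check=False)
        results[name] = completed.returncode
    return results


def failed_steps(results):
    """Return the names of steps whose exit code was non-zero."""
    return [name for name, code in results.items() if code != 0]


def exit_with_summary(results):
    """Print failed steps to stderr and exit non-zero if there were any."""
    failed = failed_steps(results)
    for name in failed:
        print(f"step {name!r} failed with exit code {results[name]}", file=sys.stderr)
    sys.exit(1 if failed else 0)
```

Exiting non-zero at the end matters because, per the test earlier in the chat, a failing script is what produces the Runparts error; a wrapper that swallows exit codes would still report success.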
[23:22] <Ac-town> that's a decent option
[23:22] <Ac-town> didn't think about that
[23:23] <blackboxsw> I did a sample write_files puppet setup here earlier for a feature test https://github.com/cloud-init/ubuntu-sru/blob/master/bugs/d3e71b5e.txt
[23:24] <blackboxsw> you could b64 your content to make it easier to provide  your big scripts
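Generating the write_files section with base64 content could be scripted along these lines. A hedged sketch: the helper names are invented, and the document is emitted as JSON after the #cloud-config header on the assumption that the YAML parser accepts JSON-style mappings; the path in the usage note is the one from the per-boot test above.

```python
import base64
import json


def write_files_entry(path, script_text, permissions="0755"):
    """Build one write_files item with base64-encoded content."""
    encoded = base64.b64encode(script_text.encode("utf-8")).decode("ascii")
    return {
        "path": path,
        "encoding": "b64",
        "content": encoded,
        "permissions": permissions,  # kept as a string to preserve the leading zero
    }


def build_cloud_config(entries):
    """Render a #cloud-config document containing only write_files."""
    body = json.dumps({"write_files": list(entries)}, indent=2)
    return "#cloud-config\n" + body + "\n"
```

For example, build_cloud_config([write_files_entry("/var/lib/cloud/scripts/per-boot/1.sh", script)]) gives text that can be passed as user-data, where script is the shell script as a string.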
[23:25] <Ac-town> so use write_files to write out scripts into scripts-user?
[23:25] <Ac-town> yep, we do that for some things already
[23:26] <Ac-town> I guess per-instance would make sense, it's after per-boot
[23:26] <blackboxsw> problem is if users try to provide user-data to their instance launch (if that's a thing you support) it would override that
[23:26] <blackboxsw> gotta run.
[23:26] <blackboxsw> see folks later
[23:26] <Ac-town> ty
[23:26] <blackboxsw> I'll check in later/tomorrow Ac-town good luck :)