[13:56] <cjp256> if anyone has the cycles, can I get a review on 4560? I have one more queued up after it I'd like to get in for 23.4 if possible
[14:08] <Guest99> Can I use cloud-init to completely auto install a server image? I have deployed a python http server on port 3003 and created the user-data and meta-data files and placed them in the www directory. I can see that when I boot up qemu it sees the http server and gets a 200 on the user-data and meta-data files, yet when I watch over a SPICE
[14:08] <Guest99> terminal it always gets stuck on the first step, which is select your language/locale/keyboard. I've tried so many different combinations of user-data, and I even did the install manually, grabbed the installer user-data file out of the /var/log/autoinstall directory and tried it with that, and still it gets stuck on that same bit. I'm kinda
[14:08] <Guest99> lost on where to go from here debugging-wise. Have you got any tips or help to point me in the right direction?
[14:44] <minimal> Guest99: from the reference to /var/log/autoinstall i'm guessing you're using ubuntu server, is that correct?
[15:47] <blackboxsw> cjp256: will grab it today.
[15:49] <dbungert> Guest99: one common problem that people run into is that if you're sending autoinstall via cloud-config, it has to be under an autoinstall top level keyword - double check that https://canonical-subiquity.readthedocs-hosted.com/en/latest/intro-to-autoinstall.html
[15:50] <blackboxsw> Guest99: how are you providing the URL to your system? I'm presuming kernel cmdline in GRUB with `ds=nocloud-net;s=http://your-service:3003/`?  I'd also expect you provide an "autoinstall" param on the kernel cmdline too to avoid getting prompted for input 
[15:50] <Guest99> minimal yeah the server version. Ultimately I'm trying to build a image that is like desktop ubuntu but with a lot of the stuff removed for our medical staff (Nurses) who are placed around the country.
[15:50] <Guest99> I will quickly add it to a pastebin
[15:53] <blackboxsw> Guest99: given you see the GET of user-data and meta-data. I'm just going to guess that the format of the autoinstall config in #cloud-config user-data is not valid and cloud-init doesn't process it in the ephemeral boot of the installer and so the installer doesn't see a processed autoinstall key. One can check by entering the shell via the help menu and `sudo cloud-init query userdata` or 
[15:53] <blackboxsw> `sudo cloud-init schema --system`
[15:54] <Guest99> https://pastebin.com/We1yn5nB
[15:58] <Guest99> blackboxsw I will try with your grub string as it's different to mine. I read I would need to escape the ';' so it would look like `ds=nocloud-net\;s=http://your-service:3003/`
[15:58] <Guest99> But I will try without first
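For reference, the two kernel command line variants being compared above can be sketched as follows. This is only an illustration built from the strings quoted in the chat: the function name and the `escape_semicolon` parameter are made up for the example, and whether the backslash is actually needed depends on how GRUB parses the line, which the sketch does not decide.

```python
def nocloud_cmdline(seed_url: str, escape_semicolon: bool = False) -> str:
    """Build the kernel arguments discussed above (hypothetical helper).

    seed_url is the base URL that serves user-data and meta-data,
    e.g. "http://your-service:3003/".
    """
    if not seed_url.endswith("/"):
        # Keep the trailing slash, as in the example URL quoted in the chat.
        seed_url += "/"
    # Guest99 read that ';' must be escaped in GRUB; blackboxsw's string has it bare.
    separator = "\\;" if escape_semicolon else ";"
    # "autoinstall" is the extra parameter blackboxsw suggests to avoid prompts.
    return f"autoinstall ds=nocloud-net{separator}s={seed_url}"


if __name__ == "__main__":
    print(nocloud_cmdline("http://your-service:3003/"))
    print(nocloud_cmdline("http://your-service:3003/", escape_semicolon=True))
```

Printing both variants makes it easy to paste one, test it, and fall back to the other.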
[15:59] <minimal> blackboxsw: does the Ubuntu installer support using cloud-init for BOTH doing an install via subiquity and THEN using cloud-init with a NoCloud-Net DS also?
[16:01] <blackboxsw> Guest99: yes that is invalid yaml user-data. try running `python3 -c 'import yaml; yaml.safe_load(open("your-userdata.yaml"))'` or `sudo cloud-init schema --system --annotate` and either will show you invalid yaml syntax for user-data
[16:01] <Guest99> Unfortunately it didn't work. Ahh I see thanks
[16:02] <Guest99> Where should I run the `cloud-init schema --system --annotate` command? In the www directory with the user-data?
[16:04] <blackboxsw> Guest99: `sudo cloud-init schema --system`  only works on the target machine that has cloud-init installed and active to report the cached user-data processed by cloud-init. Since you already have a host that has cloud-init installed (and failed) you could run `sudo cloud-init schema --config-file <your_yaml_file>` to start seeing errors in config and it'd allow you to quickly change <your_yaml_file> and reattempt validation
[16:05] <blackboxsw> Guest99: so something like `sudo cloud-init query userdata > my_userdata.yaml; sudo cloud-init schema --config-file my_userdata.yaml`
[16:05] <blackboxsw> .. on that qemu system you have
[16:06] <Guest99> Okay thanks! I will give it a go :D
[16:06] <dbungert> Guest99: you have several things that should be under autoinstall and aren't, like source, ssh, storage, updates
[16:07] <dbungert> also line 76 needs an indentation fix
[16:07] <dbungert> minimal: yes, you can use cloud-config to configure the install environment, then have autoinstall, then have more user-data for the target system actually being installed.
[16:08] <blackboxsw> thx dbungert and Guest99 line 8 needs a trailing ":" after primary 
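The nesting problem dbungert points out can be checked mechanically once the user-data parses as YAML. The sketch below is a rough illustration, not part of cloud-init or subiquity: it takes the already-parsed config as a dict, and the key list contains only the four keys named in the chat, so it is incomplete. Some of these names may also be meaningful to cloud-init at top level, so treat hits as hints rather than errors.

```python
# Keys dbungert listed as belonging under "autoinstall" (not a complete list).
AUTOINSTALL_ONLY_KEYS = {"source", "ssh", "storage", "updates"}


def autoinstall_problems(config: dict) -> list:
    """Return human-readable hints about autoinstall keys in the wrong place."""
    problems = []
    if "autoinstall" not in config:
        problems.append("no top-level 'autoinstall' key")
    # Flag installer keys that sit next to 'autoinstall' instead of under it.
    for key in sorted(k for k in config if k in AUTOINSTALL_ONLY_KEYS):
        problems.append(f"'{key}' is at top level; move it under 'autoinstall'")
    return problems


if __name__ == "__main__":
    parsed = {"autoinstall": {"version": 1}, "storage": {}, "ssh": {}}
    for hint in autoinstall_problems(parsed):
        print(hint)
```

The dict would typically come from `yaml.safe_load` on the user-data file, which also surfaces the indentation and missing-colon errors mentioned above before this check runs.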
[16:19] <blackboxsw> minimal: the Ubuntu installer has two boot stages, "ephemeral" and "first boot". The ephemeral stage uses cloud-init to detect any viable datasource and consume its user-data, which provides either config applied during the "ephemeral" stage or "autoinstall" config, which performs target system setup such as disk/partition setup etc.
[16:19] <blackboxsw>   The installer then writes the supplemental configuration required by "first boot" as DataSourceNone configuration in /etc/cloud/cloud.cfg.d, so cloud-init in that boot stage will only detect DataSourceNone config
[16:22] <blackboxsw> minimal: so ephemeral boot can accept any viable datasource (including nocloud content which could be provided by kernel cmdline or in image in /var/lib/cloud/seed etc or via a mounted device with CIDATA label etc). And that user-data doesn't have to be "autoinstall:" config, but the installer image itself and ephemeral boot stage is designed to take users through the installer prompts to ultimately configure a target vm.
[16:26] <blackboxsw> typically the easiest way to provide that nocloud data is manipulating grub on the cmdline appending cloud-init's ds=nocloud-net params like https://ubuntu.com/server/docs/install/autoinstall-quickstart and https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html#method-2-local-filesystem-kernel-commandline-or-smbios
[16:27] <blackboxsw> easy, yet manual. 
[16:56] <minimal> which reminds me I need to go back to collect up info/logs to write up an Issue for the NoCloud-Net network handling
[18:22] <meena> is there a way to override a runcmd instance script provided by hetzner? it's basically just: `udevadm trigger -c add -s block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume` and doesn't work on FreeBSD
[18:23] <rawtaz> i have a really weird problem. using terraform im provisioning a debian 12 image on an openstack (not mine). it uses cloud-init to among other things create one "admin" user (with users:) and put an ed25519 pubkey into that user's authorized_keys (with ssh_authorized_keys: for that user).
[18:23] <rawtaz> the weird thing is that while the instance is provisioned seemingly properly, it's up and has network, only like one third of the times i provision this sucker the authorized_keys are actually populated. which makes me unable to log in on the host in the other two out of three times.
[18:25] <rawtaz> what makes it weird is that this exact setup is not changing at all between when it works and doesnt. the only thing i do between reproductions of the problem is to `terraform destroy` and then clean all terraform files, and then i start over with a new `terraform init` followed by apply. so its literally the exact same thing running every time. yet, sometimes cloud-init does not populate the authorized_keys xD
[18:25] <rawtaz> if anyone recognizes any of these symptoms, feel free to let me know. im gonna try to change the image used to debian 11 instead, to see if i can isolate it to the openstack provider's debian 12 image.
[18:26] <rawtaz> its just so weird.
[18:27] <rawtaz> ive used this terraform set up with the exact same cloud-init configuration like a year ago, on the same openstack provider, and it worked fine every single time i provisioned an instance (which were a number of times). that suggests it might be debian 12 image used, but its also possible something else changed in their infra since then.
[18:34] <rawtaz> around a year ago or whenever i ran this and it worked fine 100% of the time, cloud-init v. 20.4.1 was used, now it's v. 22.4.2 in case that matters.
[18:35] <minimal> meena: I assume that's coming from their vendor stuff, what if you remove "scripts_vendor" from cloud.cfg's module list? ;-)
[18:36] <meena> I've killed the machine already, but that would make sense.
[18:36] <meena> it's just confusing that we log it as: 2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
[18:36] <meena> cc_scripts_user.py
[18:37] <minimal> rawtaz: have you checked the /var/log/cloud-init.log file when it doesn't work? or can you not SSH in to do that? ;-) Console access instead?
[18:39] <rawtaz> minimal: i have not. im still doing some big picture isolating of low hanging fruit, e.g. making sure i can reproduce it and then trying the old image i know worked before. after that, i might try to get console access (i'll have to remove `ssh_pwauth: false` and set a password for that, effectively changing the cloud-init config, which kind of sucks). i do have console access but cant log in on it due to no password for root
[18:40] <rawtaz> i do however have the logs for the provisioning of the instance, where i see most of the cloud-init and dmesg output. i guess there's a more specific cloud-init log though, i recall.
[18:41] <rawtaz> just to be clear; the symptoms that make me think it's cloud-init not adding pubkey to authorized_keys are: 1) the instance log does not show the "Authorized keys from /home/admin/.ssh/authorized_keys for user admin" output when it doesnt work; and 2) i cant ssh to it, the different methods are exhausted when i try.
[18:43] <rawtaz> question: im of course interested in /var/log/cloud-init.log and /var/log/cloud-init-output.log - is there a setting to enable more verbose debugging in them?
[18:46] <minimal> rawtaz: for cloud-init.log that's controlled by /etc/cloud/cloud.cfg.d/05_logging.cfg
[18:47] <rawtaz> and that is not something i can change with user-data, right?
[18:47] <minimal> basically in "[logger_cloudinit]" section you want level=DEBUG
[18:47] <rawtaz> (i presume it's set in the image that the openstack provider loads when provisioning the instance)
[18:48] <minimal> hmm, not sure. Once you get in check it anyway as it might have debugging on already
[18:49] <rawtaz> yeah. thanks :)
[18:51] <minimal> are you referring to "ssh_authorized_keys:" in top-level of user-data or "ssh_authorized_keys:" inside a "users:" section?
[18:52] <blackboxsw> rawtaz: by default cloud-init's log level should be set to DEBUG in most images. And the logs are very very noisy. For network config, openstack cloud-init will typically get network config from network_data.json, it'd be good to 'egrep "Applying network|network_data" /var/log/cloud-init.log' on a failed system as it's possible to see that network_data.json was provided by the openstack API and that cloud-init applied it.
[18:52] <rawtaz> minimal: the latter, under a user in users:
[18:53] <rawtaz> blackboxsw: that's good news :D
[18:56] <minimal> rawtaz: then you'd want to search the cloud-init.log file for "Running module users_groups" and then in that section, when it handles the relevant user(s), you should see "Writing to /home/XYZ/.ssh/authorized_keys - wb: [600] 99 bytes"
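The log search minimal describes can be sketched like this. It is an illustration only: the two message fragments are taken from the chat rather than from a verified cloud-init log format, and the function name is made up. The "section" is assumed to run from the "Running module users_groups" line until the next "Running module" line.

```python
import re

# Assumed message shape, as quoted in the chat:
#   "Writing to /home/XYZ/.ssh/authorized_keys - wb: [600] 99 bytes"
WRITE_RE = re.compile(r"Writing to (\S+/\.ssh/authorized_keys)\b")


def authorized_keys_writes(log_lines):
    """Return authorized_keys paths written while users_groups was running."""
    in_users_groups = False
    paths = []
    for line in log_lines:
        if "Running module " in line:
            # A new module section starts; track whether it is users_groups.
            in_users_groups = "Running module users_groups" in line
            continue
        if in_users_groups:
            match = WRITE_RE.search(line)
            if match:
                paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    with open("/var/log/cloud-init.log", encoding="utf-8", errors="replace") as fh:
        print(authorized_keys_writes(fh) or "no authorized_keys written by users_groups")
```

An empty result on a failed instance would support the theory that the users: section was never applied.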
[18:57] <rawtaz> awesome. i will look for that if i can get myself into the instance after it shows the problem. thank you
[19:04] <blackboxsw> rawtaz: if you are seeing highly repeatable failure on openstack that seems to alternate on a predictable frequency (as in every other launch or every third launch) it's possible that there's a misconfiguration issue in some of the openstack nodes participating in the nova service back plane resulting in invalid data from that service every time round-robin load balancing talks to the "broken" service node.
[19:08] <rawtaz> hmm, good point. but even if that is the case, wouldnt my provisionings compete with other users on the same openstack platform's provisionings, i mean in the round-robin? such that i can never know if i hit the first, second or third round, so to speak, as others interfere all the time.
[21:05] <blackboxsw> rawtaz: good pt. wasn't sure how public/private this openstack deployment is for you.
[21:06] <rawtaz> it's one that is used by many customers. i have no way of knowing when or whether other customers provision at the same time, but i would presume that over a couple of hours i can hardly be the only one.
[21:08] <rawtaz> ive been doing provision after provision now, every time the exact same way so the only difference is time (which varies by at most 30 seconds between each try).
[21:09] <rawtaz> i am unable to reproduce it over six attempts on the debian 11 image with cloud-init 20.4.1 but can reproduce it on the debian 12 image with cloud-init 22.4.2
[21:09] <rawtaz> the bad news is that i have not been able to reproduce it with a slightly altered user-data which sets a password for root so i could get to the logs :(
[21:19] <rawtaz> BEWM! i think i got a reproduction on the debian 12 and with ability to log in (assuming cloud-init correctly set that part of the user-data)
[21:20] <rawtaz> considering the fact that the exact same user-data can result in both success and failure, i'd say that suggests theres something in the openstack. but heck. very inconsistent
[21:21] <blackboxsw> yes agreed, if you are using the same base image, it's either a timing thing with network setup or a timing thing where openstack isn't providing the right network-config info to the VM launched
[21:30] <rawtaz> grr. i cant login with the user that's in my users: in the user-data, neither on the console of the instance nor using  ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password ..
[21:31] <rawtaz> i wonder if it's even created. i might have to see if i can put a root user password in the user-data at a higher level than users: (in case users: isnt processed at all)
[21:43] <rawtaz> i think i found something that might be very relevant. in the failed provisionings, i have "cloud-init[546]: 2023-11-02 16:09:00,377 - cc_final_message.py[WARNING]: Used fallback datasource" right after the "Cloud-init v. 22.4.2 finished" line.
[21:44] <minimal> yeah that doesn't seem good
[21:44] <rawtaz> this warning is not present when provisioning succeeds, and what is described about it at https://cloudinit.readthedocs.io/en/latest/reference/datasources/fallback.html is in line with the symptoms im seeing (parts or all of user-data not being used at all).
[21:44] <blackboxsw> ok yes definitely. `cloud-init status --long` or `cloud-id` should confirm that you are DatasourceNone. So something didn't detect openstack for some reason
[21:45] <blackboxsw> and `cat /run/cloud-init/ds-identify.log` should tell you whether OpenStack  datasource was detected as a maybe or match on the failed runs.
[21:45] <rawtaz> wow, spot on there blackboxsw. the failed ones have Datasource DataSourceNone and the successful ones have Datasource DataSourceOpenStack
[21:46] <blackboxsw> I expect there's a possibility of URL GET retry expiration  in /var/log/cloud-init.log resulting in some message like OpenStack not detected
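Since the failed instances cannot be logged into, the one signal available from outside is the console output. A small sketch of spotting the warning rawtaz found, assuming the console log has been saved to a local text file (the file name and function name here are hypothetical):

```python
# Warning text as quoted from the failed provisioning's console output.
FALLBACK_WARNING = "Used fallback datasource"


def used_fallback_datasource(console_lines) -> bool:
    """True if cc_final_message reported that the fallback datasource was used."""
    return any(
        "cc_final_message" in line and FALLBACK_WARNING in line
        for line in console_lines
    )


if __name__ == "__main__":
    # "console.log" is a placeholder for wherever the instance log was saved.
    with open("console.log", encoding="utf-8", errors="replace") as fh:
        if used_fallback_datasource(fh):
            print("fallback datasource used: user-data was likely not applied")
        else:
            print("no fallback warning found")
```

Running this over the console log of each provisioning attempt gives a quick success/failure tally to send to the hosting provider.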
[21:46] <rawtaz> the thing is i have no idea how i could possibly get into the system. im not aware of a default login/password in this debian image they provision :D
[21:47] <rawtaz> i think you may be right.. i think that with this information the next step would be to talk with the hosting provider about it. is there anything else you think i should do/gather before that?
[21:47] <rawtaz> (without being able to log in to the failed systems)
[21:55] <blackboxsw> rawtaz: you could override default root user with something like this in user-data https://dpaste.com/HLYJM7YZA
[21:56] <blackboxsw> hrm..... n/m yet the prob is that your openstack datasource isn't being detected so user-data isn't passed through nevermind .... thinking again
[21:56] <rawtaz> blackboxsw: the problem seems to be that user-data isnt processed at all :D
[21:57] <rawtaz> im writing them a mail and will talk with them tomorrow. i have a good feeling that theyll be able to resolve it now that we identified such a specific message :)
[21:57] <rawtaz> it could very well be a round-robin thing, that i cant see a pattern in due to other customers rotating it as well
[22:04] <blackboxsw> sounds good, right, I can't see how you'd otherwise be able to debug it if Openstack isn't detected on some launches unless you created your own snapshot image that defined DataSourceNone configuration in /etc/cloud/cloud.cfg.d/90-myds.cfg which provided the cloud-config I mentioned above. Then at least you'd have set the default password if DataSourceNone got detected. Example  https://pastebin.mozilla.org/cx8Xiem3
[22:05] <blackboxsw> #do_not_use_in_production  as it sets a default passwd for root :)
[22:07] <rawtaz> yeah that would have been good. but its too much work, i'd rather they fix the issue for real
[22:07] <rawtaz> haha nice :)
[22:07] <blackboxsw> yep, not your job to sort the openstack config issues
[22:09] <rawtaz> thank you. really, for all the input and help. it was very valuable!
[22:09] <rawtaz> both of you :)
[22:13] <rawtaz> bam, mail done
[22:13] <rawtaz> thats about nine hours of my life i will never get back :/
[22:13] <rawtaz> i had to do other work today, bah