[13:56] if anyone has the cycles, can I get a review on 4560? I have one more queued up after it I'd like to get in for 23.4 if possible
[14:08] Can I use cloud-init to completely auto-install a server image? I have deployed a Python HTTP server on port 3003, created the user-data and meta-data files, and placed them in the www directory. When I boot up QEMU I can see that it reaches the HTTP server and gets a 200 on the user-data and meta-data files, yet when I watch over a SPICE terminal it always gets stuck on the first step, which is select your language/locale/keyboard. I've tried so many different combinations of user-data. I even did the install manually, grabbed the installer user-data file out of the /var/log/autoinstall directory and tried it with that, and it still gets stuck on that same bit. I'm kinda lost on where to go from here debugging-wise. Have you got any tips or help to point me in the right direction?
[14:44] Guest99: from the reference to /var/log/autoinstall i'm guessing you're using ubuntu server, is that correct?
[15:47] cjp256: will grab it today.
[15:49] Guest99: one common problem that people run into is that if you're sending autoinstall via cloud-config, it has to be under an autoinstall top level keyword - double check that https://canonical-subiquity.readthedocs-hosted.com/en/latest/intro-to-autoinstall.html
[15:50] Guest99: how are you providing the URL to your system? I'm presuming kernel cmdline in GRUB with `ds=nocloud-net;s=http://your-service:3003/`? I'd also expect you to provide an "autoinstall" param on the kernel cmdline too, to avoid getting prompted for input
[15:50] minimal: yeah, the server version. Ultimately I'm trying to build an image that is like desktop Ubuntu but with a lot of the stuff removed, for our medical staff (nurses) who are placed around the country.
[15:50] I will quickly add it to a pastebin
[15:53] Guest99: given you see the GET of user-data and meta-data, I'm just going to guess that the format of the autoinstall config in #cloud-config user-data is not valid, so cloud-init doesn't process it in the ephemeral boot of the installer and the installer doesn't see a processed autoinstall key. One can check by entering the shell via the help menu and running `sudo cloud-init query userdata` or `sudo cloud-init schema --system`
[15:54] https://pastebin.com/We1yn5nB
[15:58] blackboxsw: I will try with your GRUB string as it's different to mine. I read I would need to escape the ';' so it would look like `ds=nocloud-net\;s=http://your-service:3003/`
[15:58] But I will try without first
[15:59] blackboxsw: does the Ubuntu installer support using cloud-init for BOTH doing an install via subiquity and THEN using cloud-init with a NoCloud-Net DS also?
[16:01] Guest99: yes that is invalid yaml user-data. try running `python3 -c 'import yaml; yaml.safe_load(open("your-userdata.yaml"))'` or `sudo cloud-init schema --system --annotate` and either will show you invalid yaml syntax for user-data
[16:01] Unfortunately it didn't work. Ahh I see, thanks
[16:02] Where should I run the cloud-init schema --system --annotate command? In the www directory with the user-data?
[16:04] Guest99: `sudo cloud-init schema --system` only works on the target machine that has cloud-init installed and active, to report the cached user-data processed by cloud-init. Since you already have a host that has cloud-init installed (and failed) you could run `sudo cloud-init schema --config-file ` to start seeing errors in config, and it'd allow you to quickly change and reattempt validation
[16:05] Guest99: so something like `sudo cloud-init query userdata > my_userdata.yaml; sudo cloud-init schema --config-file my_userdata.yaml`
[16:05] .. on that qemu system you have
[16:06] Okay thanks! I will give it a go :D
[16:06] Guest99: you have several things that should be under autoinstall and aren't, like source, ssh, storage, updates
[16:07] also line 76 needs an indentation fix
[16:07] minimal: yes, you can use cloud-config to configure the install environment, then have autoinstall, then have more user-data for the target system actually being installed.
[16:08] thx dbungert, and Guest99: line 8 needs a trailing ":" after primary
[16:19] minimal: the Ubuntu installer has two boot stages, "ephemeral" and "first boot". The ephemeral stage uses cloud-init to detect any viable discovered datasource and consume its user-data, which provides either config applied during the "ephemeral" stage or "autoinstall" config, which performs target system setup such as disk/partition setup etc.
[16:19] The installer then provides the supplemental configuration required by "first boot" as DataSourceNone configuration in /etc/cloud/cloud.cfg.d, so that cloud-init in that boot stage will only detect DataSourceNone config
[16:22] minimal: so the ephemeral boot can accept any viable datasource (including nocloud content, which could be provided on the kernel cmdline, in the image in /var/lib/cloud/seed, via a mounted device with a CIDATA label, etc). And that user-data doesn't have to be "autoinstall:" config, but the installer image itself and the ephemeral boot stage are designed to take users through the installer prompts to ultimately configure a target vm.
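The [15:49] and [16:06] advice (installer sections such as source, ssh, storage and updates must be nested under a top-level autoinstall key) can be checked before booting QEMU at all. Below is a minimal stdlib-only sketch; the function name and the key list are hypothetical, and it is a crude line-based heuristic, not a YAML parser, so the `yaml.safe_load` and `cloud-init schema` commands from the log remain the real validation.

```python
import re

# Installer sections the [16:06] message says must sit under "autoinstall:".
# Illustrative only (assumption): the real autoinstall schema has many more keys.
INSTALLER_KEYS = {"source", "ssh", "storage", "updates"}

# Heuristic: an unindented "key:" line is treated as a top-level YAML key.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def check_autoinstall_layout(text: str) -> list[str]:
    """Return problems found by a crude line-based scan of user-data.

    Not a YAML parser: it only distinguishes indented vs unindented keys
    and will not catch syntax errors such as a missing ":".
    """
    problems = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#cloud-config":
        problems.append("first line is not '#cloud-config'")
    top_keys = []
    for line in lines:
        match = TOP_LEVEL_KEY.match(line)
        if match:
            top_keys.append(match.group(1))
    if "autoinstall" not in top_keys:
        problems.append("no top-level 'autoinstall:' key")
    for key in top_keys:
        if key in INSTALLER_KEYS:
            problems.append(f"'{key}' is top-level; nest it under 'autoinstall:'")
    return problems


example = "#cloud-config\nautoinstall:\n  version: 1\nstorage:\n  layout:\n    name: lvm\n"
print(check_autoinstall_layout(example))
# ["'storage' is top-level; nest it under 'autoinstall:'"]
```

An empty list only means none of these specific mistakes were spotted; it does not mean the file is valid.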
[16:26] typically the easiest way to provide that nocloud data is manipulating GRUB on the cmdline, appending cloud-init's ds=nocloud-net params like https://ubuntu.com/server/docs/install/autoinstall-quickstart and https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html#method-2-local-filesystem-kernel-commandline-or-smbios
[16:27] easy, yet manual.
[16:56] which reminds me I need to go back and collect up info/logs to write up an Issue for the NoCloud-Net network handling
[18:22] is there a way to override a runcmd instance script provided by Hetzner? it's basically just: `udevadm trigger -c add -s block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume` and doesn't work on FreeBSD
[18:23] i have a really weird problem. using terraform im provisioning a debian 12 image on an openstack (not mine). it uses cloud-init to, among other things, create one "admin" user (with users:) and put an ed25519 pubkey into that user's authorized_keys (with ssh_authorized_keys: for that user).
[18:23] the weird thing is that while the instance is seemingly provisioned properly (it's up and has network), the authorized_keys are actually populated only about one third of the times i provision this sucker. which makes me unable to log in on the host the other two out of three times.
[18:25] what makes it weird is that this exact setup is not changing at all between when it works and when it doesnt. the only thing i do between reproductions of the problem is to `terraform destroy` and then clean all terraform files, and then i start over with a new `terraform init` followed by apply. so its literally the exact same thing running every time. yet, sometimes cloud-init does not populate the authorized_keys xD
[18:25] if anyone recognizes any of these symptoms, feel free to let me know. im gonna try to change the image used to debian 11 instead, to see if i can isolate it to the openstack provider's debian 12 image.
[18:26] its just so weird.
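The GRUB approach from [15:50], [15:58] and [16:26] boils down to appending `autoinstall` plus a `ds=nocloud-net;s=<seed URL>` argument, with the `;` escaped as `\;` when it is typed into GRUB (as Guest99 read). The helper below is a hypothetical sketch that builds that string; it assumes the seed URL serves `user-data` and `meta-data` directly (as the Python HTTP server in the log does) and always adds a trailing slash, since both URLs quoted in the log end with one.

```python
def nocloud_kernel_args(seed_url: str, for_grub: bool = True) -> str:
    """Build kernel cmdline args for an autoinstall with a NoCloud seed URL.

    Hypothetical helper. for_grub=True escapes ';', which GRUB's shell-like
    syntax would otherwise treat as a command separator; pass False where the
    string is quoted some other way.
    """
    if not seed_url.endswith("/"):
        # user-data and meta-data are fetched relative to the seed URL
        seed_url += "/"
    ds = f"ds=nocloud-net;s={seed_url}"
    if for_grub:
        ds = ds.replace(";", "\\;")
    # "autoinstall" avoids the interactive prompt mentioned at [15:50]
    return f"autoinstall {ds}"


print(nocloud_kernel_args("http://your-service:3003"))
# autoinstall ds=nocloud-net\;s=http://your-service:3003/
```

The result is appended to the existing `linux` line in the GRUB entry; it does not replace the kernel path or the other parameters already there.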
[18:27] ive used this terraform setup with the exact same cloud-init configuration like a year ago, on the same openstack provider, and it worked fine every single time i provisioned an instance (which was a number of times). that suggests it might be the debian 12 image used, but its also possible something else changed in their infra since then.
[18:34] around a year ago or whenever i ran this and it worked fine 100% of the time, cloud-init v. 20.4.1 was used, now it's v. 22.4.2 in case that matters.
[18:35] meena: I assume that's coming from their vendor stuff, what if you remove "scripts_vendor" from cloud.cfg's module list? ;-)
[18:36] I've killed the machine already, but that would make sense.
[18:36] it's just confusing that we log it as: 2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
[18:36] cc_scripts_user.py
[18:37] rawtaz: have you checked the /var/log/cloud-init.log file when it doesn't work? or can you not SSH in to do that? ;-) Console access instead?
[18:39] minimal: i have not. im still doing some big-picture isolation of low-hanging fruit, e.g. making sure i can reproduce it and then trying the old image i know worked before. after that, i might try to get console access (i'll have to remove `ssh_pwauth: false` and set a password for that, effectively changing the cloud-init config, which kind of sucks). i do have console access but cant log in on it due to no password for root
[18:40] i do however have the logs for the provisioning of the instance, where i see most of the cloud-init and dmesg output. i guess there's a more specific cloud-init log though, i recall.
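Answering the [18:37] question means wading through a noisy /var/log/cloud-init.log. The sketch below filters it down to WARNING-and-above entries, using the line layout visible in the [18:36] message (`timestamp - file[LEVEL]: message`). Names are hypothetical, and the regex is an assumption based on that one sample; `re.search` is used so that a console prefix such as `cloud-init[546]: ` is tolerated too.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Layout assumed from the [18:36] sample:
# "2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module ..."
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<source>[^\[\s]+)\[(?P<level>[A-Z]+)\]: (?P<message>.*)"
)
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


class LogEntry(NamedTuple):
    timestamp: str
    source: str
    level: str
    message: str


def iter_log_problems(lines: Iterable[str], min_level: str = "WARNING") -> Iterator[LogEntry]:
    """Yield parsed entries at or above min_level; unparseable lines are skipped."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match["level"] in LEVELS and LEVELS.index(match["level"]) >= threshold:
            yield LogEntry(match["ts"], match["source"], match["level"], match["message"])
```

On the instance, passing an open handle of /var/log/cloud-init.log (or the saved provisioning console log) to `iter_log_problems` would surface lines like the `scripts_user` failure above without the DEBUG noise.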
[18:41] just to be clear; the symptoms that make me think it's cloud-init not adding the pubkey to authorized_keys are: 1) the instance log does not show the "Authorized keys from /home/admin/.ssh/authorized_keys for user admin" output when it doesnt work; and 2) i cant ssh to it, the different methods are exhausted when i try.
[18:43] question: im of course interested in /var/log/cloud-init.log and /var/log/cloud-init-output.log - is there a setting to enable more verbose debugging in them?
[18:46] rawtaz: for cloud-init.log that's controlled by /etc/cloud/cloud.cfg.d/05_logging.cfg
[18:47] and that is not something i can change with user-data, right?
[18:47] basically in the "[logger_cloudinit]" section you want level=DEBUG
[18:47] (i presume it's set in the image that the openstack provider loads when provisioning the instance)
[18:48] hmm, not sure. Once you get in, check it anyway as it might have debugging on already
[18:49] yeah. thanks :)
[18:51] are you referring to "ssh_authorized_keys:" at the top level of user-data or "ssh_authorized_keys:" inside a "users:" section?
[18:52] rawtaz: by default cloud-init's log level should be set to DEBUG in most images. And the logs are very very noisy. For network config, on openstack cloud-init will typically get network config from network_data.json; it'd be good to `egrep "Applying network|network_data" /var/log/cloud-init.log` on a failed system, as it's possible to see that network_data.json was provided by the openstack API and that cloud-init applied it.
[18:52] minimal: the latter, under a user in users:
[18:53] blackboxsw: that's good news :D
[18:56] rawtaz: then you'd want to search the cloud-init.log file for "Running module users_groups" and then in that section, when it handles the relevant user(s), you should see "Writing to /home/XYZ/ssh/authorized_keys - wb: [600] 99 bytes"
[18:56] s/ssh/.ssh/
[18:57] awesome. i will look for that if i can get myself into the instance after it shows the problem. thank you
[19:04] rawtaz: if you are seeing a highly repeatable failure on openstack that seems to alternate at a predictable frequency (as in every other launch or every third launch), it's possible that there's a misconfiguration issue in some of the openstack nodes participating in the nova service back plane, resulting in invalid data from that service every time round-robin load balancing talks to the "broken" service node.
[19:08] hmm, good point. but even if that is the case, wouldnt my provisionings compete with other users' provisionings on the same openstack platform, i mean in the round-robin? such that i can never know if i hit the first, second or third round, so to speak, as others interfere all the time.
[21:05] rawtaz: good pt. wasn't sure how public/private this openstack deployment is for you.
[21:06] it's one that is used by many customers. i have no way of knowing when or whether other customers provision at the same time, but i would presume that over a couple of hours i can hardly be the only one.
[21:08] ive been doing provision after provision now, every time the exact same way, so the only difference is time (which varies by at most 30 seconds between each try).
[21:09] i am unable to reproduce it over six attempts on the debian 11 image with cloud-init 20.4.1 but can reproduce it on the debian 12 image with cloud-init 22.4.2
[21:09] the bad news is that i have not been able to reproduce it with a slightly altered user-data which sets a password for root so i could get to the logs :(
[21:19] BEWM! i think i got a reproduction on debian 12 and with the ability to log in (assuming cloud-init correctly set that part of the user-data)
[21:20] considering the fact that the exact same user-data can result in both success and failure, i'd say that suggests there's something in the openstack. but heck. very inconsistent
[21:21] yes agreed, if you are using the same base image, it's either a timing thing with network setup or a timing thing where openstack isn't providing the right network-config info to the VM launched
[21:30] grr. i cant log in with the user that's in my users: in the user-data, neither on the console of the instance nor using ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password ..
[21:31] i wonder if it's even created. i might have to see if i can put a root user password in the user-data at a higher level than users: (in case users: isnt processed at all)
[21:43] i think i found something that might be very relevant. in the failed provisionings, i have "cloud-init[546]: 2023-11-02 16:09:00,377 - cc_final_message.py[WARNING]: Used fallback datasource" right after the "Cloud-init v. 22.4.2 finished" line.
[21:44] yeah that doesn't seem good
[21:44] this warning is not present when provisioning succeeds, and what is described about it at https://cloudinit.readthedocs.io/en/latest/reference/datasources/fallback.html is in line with the symptoms im seeing (parts or all of user-data not being used at all).
[21:44] ok yes definitely. `cloud-init status --long` or `cloud-id` should confirm that you are on DataSourceNone. So something didn't detect openstack for some reason
[21:45] and `cat /run/cloud-init/ds-identify.log` should tell you whether the OpenStack datasource was detected as a maybe or a match on the failed runs.
[21:45] wow, spot on there blackboxsw. the failed ones have Datasource DataSourceNone and the successful ones have Datasource DataSourceOpenStack
[21:46] I expect there's a possibility of URL GET retry expiration in /var/log/cloud-init.log resulting in some message like OpenStack not detected
[21:46] the thing is i have no idea how i could possibly get into the system. im not aware of a default login/password in this debian image they provision :D
[21:47] i think you may be right.. i think that with this information the next step would be to talk with the hosting provider about it. is there anything else you think i should do/gather before that?
[21:47] (without being able to log in to the failed systems)
[21:55] rawtaz: you could override the default root user with something like this in user-data https://dpaste.com/HLYJM7YZA
[21:56] hrm..... n/m, the prob is that your openstack datasource isn't being detected, so user-data isn't passed through. nevermind .... thinking again
[21:56] blackboxsw: the problem seems to be that user-data isnt processed at all :D
[21:57] im writing them a mail and will talk with them tomorrow. i have a good feeling that theyll be able to resolve it now that we identified such a specific message :)
[21:57] it could very well be a round-robin thing that i cant see a pattern in, due to other customers rotating it as well
[22:04] sounds good, right, I can't see how you'd otherwise be able to debug it if Openstack isn't detected on some launches, unless you created your own snapshot image that defined DataSourceNone configuration in /etc/cloud/cloud.cfg.d/90-myds.cfg which provided the cloud-config I mentioned above. Then at least you'd have set the default password if DataSourceNone got detected. Example https://pastebin.mozilla.org/cx8Xiem3
[22:05] #do_not_use_in_production as it sets a default passwd for root :)
[22:07] yeah that would have been good. but its too much work, i'd rather they fix the issue for real
[22:07] haha nice :)
[22:07] yep, not your job to sort the openstack config issues
[22:09] thank you. really, for all the input and help. it was very valuable!
[22:09] both of you :)
[22:13] bam, mail done
[22:13] thats about nine hours of my life i will never get back :/
[22:13] i had to do other work today, bah
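The checks discussed above (the [18:56] users_groups/authorized_keys search, the [18:52] network grep and the [21:43] fallback warning) can be bundled into one pass over a saved cloud-init.log, e.g. to compare a DataSourceOpenStack run against a DataSourceNone one once a log from a failed instance is obtainable. A minimal sketch: the function and field names are hypothetical, and the marker strings are the ones quoted in the chat, not a documented log format, so they are matched loosely.

```python
import re
from dataclasses import dataclass, field

# Marker strings copied from the chat; exact wording may differ between
# cloud-init versions (assumption), hence the loose patterns.
FALLBACK = "Used fallback datasource"
USERS_GROUPS = re.compile(r"Running module users[-_]groups")
AUTHORIZED_KEYS_WRITE = re.compile(r"Writing to /home/[^/\s]+/\.ssh/authorized_keys")
NETWORK = re.compile(r"Applying network|network_data")  # same pattern as the [18:52] egrep


@dataclass
class RunSummary:
    used_fallback_datasource: bool = False
    ran_users_groups: bool = False
    wrote_authorized_keys: bool = False
    network_lines: list[str] = field(default_factory=list)


def summarize_cloud_init_log(log_text: str) -> RunSummary:
    """Scan one cloud-init.log for the signals discussed in the chat."""
    summary = RunSummary()
    for line in log_text.splitlines():
        if FALLBACK in line:
            summary.used_fallback_datasource = True
        if USERS_GROUPS.search(line):
            summary.ran_users_groups = True
        if AUTHORIZED_KEYS_WRITE.search(line):
            summary.wrote_authorized_keys = True
        if NETWORK.search(line):
            summary.network_lines.append(line)
    return summary
```

For a failed run like rawtaz's, one would expect `used_fallback_datasource=True` and `wrote_authorized_keys=False`. It says nothing about *why* OpenStack wasn't detected; /run/cloud-init/ds-identify.log, as suggested at [21:45], is the better place for that.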