/srv/irclogs.ubuntu.com/2023/11/02/#cloud-init.txt

cjp256if anyone has the cycles, can I get a review on 4560? I have one more queued up after it I'd like to get in for 23.4 if possible13:56
Guest99Can I use cloud-init to completely auto install a server image? I have deployed a python http server on port 3003 and created the user-data and meta-data files and placed them in the www directory. I can see that when I boot up qemu that it sees the http server and gives a 200 on the user-data and meta-data files yet when I watch over a SPICE14:08
Guest99terminal it always gets stuck on the first step which is select your language/locale/keyboard. I've tried so many different combinations of user-data and I even did the installer manually and then grabbed the installer user-data file out of the /var/log/autoinstall directory and tried it with that and still it gets stuck on that same bit. I'm kinda14:08
Guest99lost on where to go from here debugging-wise. Have you got any tips or help to aid me in the right direction?14:08
minimalGuest99: from the reference to /var/log/autoinstall i'm guessing you're using ubuntu server, is that correct?14:44
blackboxswcjp256: will grab it today.15:47
dbungertGuest99: one common problem that people run into is that if you're sending autoinstall via cloud-config, it has to be under an autoinstall top level keyword - double check that https://canonical-subiquity.readthedocs-hosted.com/en/latest/intro-to-autoinstall.html15:49
blackboxswGuest99: how are you providing the URL to your system? I'm presuming kernel cmdline in GRUB with `ds=nocloud-net;s=http://your-service:3003/`?  I'd also expect you provide an "autoinstall" param on the kernel cmdline too to avoid getting prompted for input 15:50
Guest99minimal yeah the server version. Ultimately I'm trying to build an image that is like desktop ubuntu but with a lot of the stuff removed for our medical staff (Nurses) who are placed around the country.15:50
Guest99I will quickly add it to a pastebin15:50
blackboxswGuest99: given you see the GET of user-data and meta-data. I'm just going to guess that the format of the autoinstall config in #cloud-config user-data is not valid and cloud-init doesn't process it in the ephemeral boot of the installer and so the installer doesn't see a processed autoinstall key. One can check by entering the shell via the help menu and `sudo cloud-init query userdata` or 15:53
blackboxsw`sudo cloud-init schema --system`15:53
Guest99https://pastebin.com/We1yn5nB15:54
Guest99blackboxsw I will try with your grub string as it's different to mine. I read I would need to escape the ';' so it would look like `ds=nocloud-net\;s=http://your-service:3003/`15:58
Guest99But I will try without first15:58
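The two spellings of the kernel argument discussed above can be compared side by side with a small sketch. The helper name `nocloud_kernel_args` and its `escape_semicolon` flag are hypothetical, made up for illustration; the `ds=nocloud-net;s=...` string, the `autoinstall` parameter and the `\;` variant are the ones quoted in the chat. Whether the escape is needed depends on where the string is typed (GRUB config vs. elsewhere), so treat this as a way to generate both candidates, not as a statement of which one is correct.

```python
def nocloud_kernel_args(seed_url: str, escape_semicolon: bool = False) -> str:
    """Build kernel command line arguments for a NoCloud network seed.

    Hypothetical helper: it only formats the strings quoted in the chat.
    """
    # The seed is a base URL under which user-data and meta-data are served,
    # so make sure it ends with a slash.
    if not seed_url.endswith("/"):
        seed_url += "/"
    # Optionally escape the ';' as Guest99 read might be needed in GRUB.
    separator = r"\;" if escape_semicolon else ";"
    return f"autoinstall ds=nocloud-net{separator}s={seed_url}"


print(nocloud_kernel_args("http://your-service:3003"))
print(nocloud_kernel_args("http://your-service:3003/", escape_semicolon=True))
```

Printing both forms makes it easy to paste one, boot, and then try the other if the installer still prompts.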
minimalblackboxsw: does the Ubuntu installer support using cloud-init for BOTH doing an install via subiquity and THEN using cloud-init with a NoCloud-Net DS also?15:59
blackboxswGuest99: yes that is invalid yaml user-data. try running `python3 -c 'import yaml; yaml.safe_load(open("your-userdata.yaml"))'` or `sudo cloud-init schema --system --annotate` and either will show you invalid yaml syntax for user-data16:01
Guest99Unfortunately it didn't work. Ahh I see thanks16:01
Guest99Where should i run the cloud-init schema --system --annotate command? in the www directory with the user-data?16:02
blackboxswGuest99: `sudo cloud-init schema --system`  only works on the target machine that has cloud-init installed and active to report the cached user-data processed by cloud-init. Since you already have a host that has cloud-init installed (and failed) you could run `sudo cloud-init schema --config-file <your_yaml_file>` to start seeing errors in config and it'd allow you to quickly change <your_yaml_file> and reattempt validation16:04
blackboxswGuest99: so something like `sudo cloud-init query userdata > my_userdata.yaml; sudo cloud-init schema --config-file my_userdata.yaml`16:05
blackboxsw.. on that qemu system you have16:05
Guest99Okay thanks! I will give it a go :D16:06
dbungertGuest99: you have several things that should be under autoinstall and aren't, like source, ssh, storage, updates16:06
dbungertalso line 76 needs an indentation fix16:07
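The misplaced-key problem dbungert describes can be spotted before booting with a rough check like the one below. This is a minimal sketch, not a YAML parser: it assumes top-level keys start in column 0, and the names `check_autoinstall_layout`, `top_level_keys` and the `SAMPLE` document are hypothetical. The set of keys is just the four named in the chat (source, ssh, storage, updates), not the full autoinstall schema.

```python
import re

# Keys named in the chat as belonging under "autoinstall" (not exhaustive).
AUTOINSTALL_ONLY_KEYS = {"source", "ssh", "storage", "updates"}

# A top-level key is assumed to start in column 0 and end with a colon.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def top_level_keys(user_data: str) -> list[str]:
    """Collect keys that start in column 0 (comments and nested keys are skipped)."""
    keys = []
    for line in user_data.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def check_autoinstall_layout(user_data: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was spotted."""
    keys = top_level_keys(user_data)
    problems = []
    if "autoinstall" not in keys:
        problems.append("no top-level 'autoinstall' key")
    for key in keys:
        if key in AUTOINSTALL_ONLY_KEYS:
            problems.append(f"'{key}' is at top level; move it under 'autoinstall'")
    return problems


# Hypothetical user-data with 'storage' accidentally left at the top level.
SAMPLE = """\
#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: demo
storage:
  layout:
    name: lvm
"""

print(check_autoinstall_layout(SAMPLE))
```

It does not replace `cloud-init schema --config-file`, which validates the real schema; it only catches the specific layout mistake discussed here.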
dbungertminimal: yes, you can use cloud-config to configure the install environment, then have autoinstall, then have more user-data for the target system actually being installed.16:07
blackboxswthx dbungert and Guest99 line 8 needs a trailing ":" after primary 16:08
blackboxswminimal: the Ubuntu installer has two boot stages, "ephemeral" and "first boot". The ephemeral stage uses cloud-init to detect any viable datasource and consume its user-data, which provides either config applied during the "ephemeral" stage or "autoinstall" config, which performs target system setup such as disk/partition setup etc.16:19
blackboxsw  The installer then writes the supplemental configuration required by "first boot" as DataSourceNone configuration in /etc/cloud/cloud.cfg.d, so cloud-init in that boot stage will only detect the DataSourceNone config16:19
blackboxswminimal: so ephemeral boot can accept any viable datasource (including nocloud content which could be provided by kernel cmdline or in image in /var/lib/cloud/seed etc or via a mounted device with CIDATA label etc). And that user-data doesn't have to be "autoinstall:" config, but the installer image itself and ephemeral boot stage is designed to take users through the installer prompts to ultimately configure a target vm.16:22
blackboxswtypically the easiest way to provide that nocloud data is manipulating grub on the cmdline appending cloud-init's ds=nocloud-net params like https://ubuntu.com/server/docs/install/autoinstall-quickstart and https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html#method-2-local-filesystem-kernel-commandline-or-smbios16:26
blackboxsweasy, yet manual. 16:27
minimalwhich reminds me I need to go back to collect up info/logs to write up an Issue for the NoCloud-Net network handling16:56
meenais there a way to override a runcmd instance script provided by hetzner? it's basically just: `udevadm trigger -c add -s block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume` and doesn't work on FreeBSD18:22
rawtazi have a really weird problem. using terraform im provisioning a debian 12 image on an openstack (not mine). it uses cloud-init to among other things create one "admin" user (with users:) and put an ed25519 pubkey into that user's authorized_keys (with ssh_authorized_keys: for that user).18:23
rawtazthe weird thing is that while the instance is provisioned seemingly properly, it's up and has network, only like one third of the times i provision this sucker the authorized_keys are actually populated. which makes me unable to log in on the host in the other two out of three times.18:23
rawtazwhat makes it weird is that this exact setup is not changing at all between when it works and doesnt. the only thing i do between reproductions of the problem is to `terraform destroy` and then clean all terraform files, and then i start over with a new `terraform init` followed by apply. so its literally the exact same thing running every time. yet, sometimes cloud-init does not populate the authorized_keys xD18:25
rawtazif anyone recognizes any of these symptoms, feel free to let me know. im gonna try to change the image used to debian 11 instead, to see if i can isolate it to the openstack provider's debian 12 image.18:25
rawtazits just so weird.18:26
rawtazive used this terraform set up with the exact same cloud-init configuration like a year ago, on the same openstack provider, and it worked fine every single time i provisioned an instance (which were a number of times). that suggests it might be debian 12 image used, but its also possible something else changed in their infra since then.18:27
rawtazaround a year ago or whenever i ran this and it worked fine 100% of the time, cloud-init v. 20.4.1 was used, now it's v. 22.4.2 in case that matters.18:34
minimalmeena: I assume that's coming from their vendor stuff, what if you remove "scripts_vendor" from cloud.cfg's module list? ;-)18:35
meenaI've killed the machine already, but that would make sense.18:36
meenait's just confusing that we log it as: 2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)18:36
meenacc_scripts_user.py18:36
minimalrawtaz: have you checked the /var/log/cloud-init.log file when it doesn't work? or can you not SSH in to do that? ;-) Console access instead?18:37
rawtazminimal: i have not. im still doing some big picture isolating of low hanging fruit, e.g. making sure i can reproduce it and then trying the old image i know worked before. after that, i might try to get console access (i'll have to remove `ssh_pwauth: false` and set a password for that, effectively changing the cloud-init config, which kind of sucks). i do have console access but cant log in on it due to no password for root18:39
rawtazi do however have the logs for the provisioning of the instance, where i see most of the cloud-init and dmesg output. i guess there's a more specific cloud-init log though, i recall.18:40
rawtazjust to be clear; the symptoms that make me think it's cloud-init not adding pubkey to authorized_keys are: 1) the instance log does not show the "Authorized keys from /home/admin/.ssh/authorized_keys for user admin" output when it doesnt work; and 2) i cant ssh to it, the different methods are exhausted when i try.18:41
rawtazquestion: im of course interested in /var/log/cloud-init.log and /var/log/cloud-init-output.log - is there a setting to enable more verbose debugging in them?18:43
minimalrawtaz: for cloud-init.log that's controlled by /etc/cloud/cloud.cfg.d/05_logging.cfg18:46
rawtazand that is not something i can change with user-data, right?18:47
minimalbasically in "[logger_cloudinit]" section you want level=DEBUG18:47
rawtaz(i presume it's set in the image that the openstack provider loads when provisioning the instance18:47
minimalhmm, not sure. Once you get in check it anyway as it might have debugging on already18:48
rawtazyeah. thanks :)18:49
minimalare you referring to "ssh_authorized_keys:" in top-level of user-data or "ssh_authorized_keys:" inside a "users:" section?18:51
blackboxswrawtaz: by default cloud-init's log level should be set to DEBUG in most images. And the logs are very very noisy. For network config, openstack cloud-init will typically get network config from network_data.json, it'd be good to 'egrep "Applying network|network_data" /var/log/cloud-init.log' on a failed system as it's possible to see that network_data.json was provided by the openstack API and that cloud-init applied it.18:52
rawtazminimal: the latter, under a user in users:18:52
rawtazblackboxsw: that's good news :D18:53
minimalrawtaz: then you'd want to search the cloud-init.log file for "Running module users_groups" and then in that section, when it handles the relevant user(s), you should see "Writing to /home/XYZ/ssh/authorizedkeys - wb: [600] 99 bytes"18:56
minimals/ssh/.ssh/18:56
rawtazawesome. i will look for that if i can get myself into the instance after it shows the problem. thank you18:57
blackboxswrawtaz: if you are seeing highly repeatable failure on openstack that seems to alternate on a predictable frequency (as in every other launch or every third launch) it's possible that there's a misconfiguration issue in some of the openstack nodes participating in the nova service back plane resulting in invalid data from that service every time round-robin load balancing talks to the "broken" service node.19:04
rawtazhmm, good point. but even if that is the case, wouldnt my provisionings compete with other users on the same openstack platform's provisionings, i mean in the round-robin? such that i can never know if i hit the first, second or third round, so to speak, as others interfere all the time.19:08
blackboxswrawtaz: good pt. wasn't sure how public/private this openstack deployment is for you.21:05
rawtazit's one that is used by many customers. i have no way of knowing when or whether other customers provision at the same time, but i would presume that over a couple of hours i can hardly be the only one.21:06
rawtazive been doing provision after provision now, every time the exact same way so the only difference is time (which varies by at most 30 seconds between each try).21:08
rawtazi am unable to reproduce it over six attempts on the debian 11 image with cloud-init 20.4.1 but can reproduce it on the debian 12 image with cloud-init 22.4.221:09
rawtazthe bad news is that i have not been able to reproduce it with a slightly altered user-data which sets a password for root so i could get to the logs :(21:09
rawtazBEWM! i think i got a reproduction on the debian 12 and with ability to log in (assuming cloud-init correctly set that part of the user-data)21:19
rawtazconsidering the fact that the exact same user-data can result in both success and failure, i'd say that suggests theres something in the openstack. but heck. very inconsistent21:20
blackboxswyes agreed, if you are using the same base image, it's either a timing thing with network setup or a timing thing where openstack isn't providing the right network-config info to the VM launched21:21
rawtazgrr. i cant login with the user that's in my users: in the user-data, neither on the console of the instance nor using  ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password ..21:30
rawtazi wonder if it's even created. i might have to see if i can put a root user password in the user-data at a higher level than users: (in case users: isnt processed at all)21:31
rawtazi think i found something that might be very relevant. in the failed provisionings, i have "cloud-init[546]: 2023-11-02 16:09:00,377 - cc_final_message.py[WARNING]: Used fallback datasource" right after the "Cloud-init v. 22.4.2 finished" line.21:43
minimalyeah that doesn't seem good21:44
rawtazthis warning is not present when provisioning succeeds, and what is described about it at https://cloudinit.readthedocs.io/en/latest/reference/datasources/fallback.html is in line with the symptoms im seeing (parts or all of user-data not being used at all).21:44
blackboxswok yes definitely. `cloud-init status --long` or `cloud-id` should confirm that you are DatasourceNone. So something didn't detect openstack for some reason21:44
blackboxswand `cat /run/cloud-init/ds-identify.log` should tell you whether OpenStack  datasource was detected as a maybe or match on the failed runs.21:45
rawtazwow, spot on there blackboxsw. the failed ones have Datasource DataSourceNone and the successful ones have Datasource DataSourceOpenStack21:45
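Since the failed instances cannot be logged into, the "Used fallback datasource" warning in the provisioning console output is the one signal available from outside. A minimal sketch of flagging it automatically after each provision is below; the function name `used_fallback_datasource` is hypothetical, how the console log is fetched from the provider is left out, and the first sample line is the one rawtaz quoted. The second sample line is made up for contrast.

```python
# Warning text quoted from the failed provisioning log in the chat.
FALLBACK_WARNING = "Used fallback datasource"


def used_fallback_datasource(console_log: str) -> bool:
    """True if any console log line carries the fallback datasource warning."""
    return any(FALLBACK_WARNING in line for line in console_log.splitlines())


# Line quoted in the chat from a failed provisioning.
FAILED = (
    "cloud-init[546]: 2023-11-02 16:09:00,377 - "
    "cc_final_message.py[WARNING]: Used fallback datasource"
)
# Hypothetical line standing in for a successful provisioning.
SUCCEEDED = "cloud-init[546]: Cloud-init v. 22.4.2 finished"

print(used_fallback_datasource(FAILED))
print(used_fallback_datasource(SUCCEEDED))
```

Counting how often this returns true across repeated provisions would give the hosting provider a concrete failure rate to work with.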
blackboxswI expect there's a possibility of URL GET retry expiration  in /var/log/cloud-init.log resulting in some message like OpenStack not detected21:46
rawtazthe thing is i have no idea how i could possibly get into the system. im not aware of a default login/password in this debian image they provision :D21:46
rawtazi think you may be right.. i think that with this information the next step would be to talk with the hosting provider about it. is there anything else you think i should do/gather before that?21:47
rawtaz(without being able to log in to the failed systems)21:47
blackboxswrawtaz: you could override default root user with something like this in user-data https://dpaste.com/HLYJM7YZA21:55
blackboxswhrm..... n/m yet the prob is that your openstack datasource isn't being detected so user-data isn't passed through nevermind .... thinking again21:56
rawtazblackboxsw: the problem seems to be that user-data isnt processed at all :D21:56
rawtazim writing them a mail and will talk with them tomorrow. i have a good feeling that theyll be able to resolve it now that we identified such a specific message :)21:57
rawtazit could very well be a round-robin thing, that i cant see a pattern in due to other customers rotating it as well21:57
blackboxswsounds good, right, I can't see how you'd otherwise be able to debug it if Openstack isn't detected on some launches unless you created your own snapshot image that defined DataSourceNone configuration in /etc/cloud/cloud.cfg.d/90-myds.cfg which provided the cloud-config I mentioned above. Then at least you'd have set the default password if DataSourceNone got detected. Example  https://pastebin.mozilla.org/cx8Xiem322:04
blackboxsw#do_not_use_in_production  as it sets a default passwd for root :)22:05
rawtazyeah that would have been good. but its too much work, i'd rather they fix the issue for real22:07
rawtazhaha nice :)22:07
blackboxswyep, not your job to sort the openstack config issues22:07
rawtazthank you. really, for all the input and help. it was very valuable!22:09
rawtazboth of you :)22:09
rawtazbam, mail done22:13
rawtazthats about nine hours of my life i will never get back :/22:13
rawtazi had to do other work today, bah22:13
