GreatSnoopyhello. Can anyone help me debug a cloud-init issue? I am trying to get cloud-init to work with CentOS on Azure. The official cloud-init in CentOS (0.7.9) hangs for a while and eventually does nothing. So I installed the latest cloud-init from source, but now it gives me "util.py[WARNING]: No instance datasource found! Likely bad things to come!"14:53
GreatSnoopyalthough i have a datasource_list: [Azure] in a config file in conf.d14:54
GreatSnoopywhat changed in the newer releases of cloudinit ?14:54
GreatSnoopyit seems I do have a /usr/lib/python2.7/site-packages/cloudinit/sources containing a DataSourceAzure.py14:55
morgana2313Hello. Is there an easy way to use macros/variables that contain the cloud instance's IP address and hostname in a write_files module?15:23
blackboxswmornin folks.  GreatSnoopy have you tried our daily copr builds? https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/15:30
GreatSnoopyactually yes15:31
GreatSnoopythat was before trying the source15:31
GreatSnoopythe same behavior applies with those builds15:32
GreatSnoopyI tried the source version as a last resort15:32
blackboxswthat is version 17.1.x off of our master.  The no datasource found is intriguing. I'll look over our Azure changes to see if there's something indicative of a prob there.  GreatSnoopy, so I'd be curious to see a paste of your /var/log/cloud-init.log. It should report that it attempts to run Azure datasource and complains if something is amiss15:32
GreatSnoopyjust a moment15:33
GreatSnoopyfor the record, the above is produced by running cloud-init --debug init15:36
blackboxswGreatSnoopy: +1 on that. ok so you upgraded from 0.7.5 -> 0.7.9 originally and saw this problem too?15:47
GreatSnoopyblackboxsw: so the succession of events is the following:15:48
GreatSnoopyfirst, I used 0.7.9, which is available in epel. That hangs for about 2 minutes and does nothing in the end when run manually. And if the machine is rebooted, it just hangs there indefinitely, and with ssh stopped it is basically a bricked VM15:49
GreatSnoopyso i went trying newer cloud-init15:49
GreatSnoopyfirst used that copr repo15:49
GreatSnoopywhich gave me the behavior of finishing fast but also doing nothing and complaining about unavailable data sources15:50
GreatSnoopyas in the log15:50
GreatSnoopythen i tried the actual sources15:50
GreatSnoopyinstalled dependencies and the cloud-init distribution with pip install .15:50
GreatSnoopy(and -r requirements.txt)15:51
GreatSnoopybut this solved nothing15:51
GreatSnoopyso basically, the copr build and the raw, unpackaged build installed by hand (mis)behave in the same way15:51
GreatSnoopybut differently than 0.7.9 :)15:51
GreatSnoopywhich is the current epel version available15:52
blackboxswyeah the copr repo cloud-init-dev is actually only 4 hours old. Our CI builds after every commit I think. Sorry for the thrashing you've experienced here. Would you be able to run "cloud-init collect-logs" on the command line and attach the result to a new cloud-init bug @ https://bugs.launchpad.net/cloud-init/+filebug15:52
blackboxswI find this peculiar in your logs https://bugs.launchpad.net/cloud-init/+filebug15:53
blackboxswoops I mean this: 2017-11-20 15:34:09,308 - __init__.py[DEBUG]: Searching for network data source in: []15:53
blackboxswI'd have expected to see Azure in that empty list15:53
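The check blackboxsw describes can be repeated on any instance by pulling the datasource search lines out of the log. This is a minimal sketch, assuming the log lives at /var/log/cloud-init.log and that the message format matches the line quoted above; `show_datasource_search` is a hypothetical helper name, not a cloud-init command.

```shell
# Print every "Searching for ... data source in: [...]" line from a cloud-init log.
# An empty list ("[]") means no datasource was considered for that stage.
show_datasource_search() {
    logfile="${1:-/var/log/cloud-init.log}"
    grep -E 'Searching for [a-z]+ data source in:' "$logfile"
}
```

Calling `show_datasource_search` with no argument reads the default log path; passing a path lets it run against a copy of the log.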
GreatSnoopythe config I added is the following, maybe I am using it wrong:15:54
blackboxswazure datasource though in cloud-init version 17.1 looks to be only run in init-local timeframe15:54
blackboxswand in your logs I only see the stage labelled 'init' which is actually cloud-init's 'init-network' stage15:55
blackboxswwhich means Azure (as init-local stage Datasource) doesn't match as a network data source.15:55
blackboxswI *think*... though haven't had my coffee yet to confirm15:56
GreatSnoopybecause i ran it manually15:56
GreatSnoopymaybe ?15:56
GreatSnoopyif i let the machine boot, it will hang, actually15:56
GreatSnoopycloud-init collect-logs seems to do nothing, /var/log/cloud-init* remain empty15:59
blackboxswGreatSnoopy,  there are semaphore files blocking cloud-init fresh re-runs. If you are willing to let cloud-init re-run on your system in entirety, you could 'sudo rm -rf /var/log/cloud-init* /var/lib/cloud; sudo reboot'16:00
blackboxswthat gives cloud-init the perception that it has never run before16:00
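The cleanup blackboxsw suggests can be wrapped in a small function so it can be tried against a scratch directory first. This is a sketch only: `reset_cloud_init_state` and its optional root-prefix argument are hypothetical, and the paths are the two named in the message above.

```shell
# Remove cloud-init's logs and state so the next boot looks like a first boot.
# The optional first argument is a root prefix (an assumption, added so the
# function can be exercised on a scratch tree); with no argument it acts on
# the live system and needs root.
reset_cloud_init_state() {
    root="${1:-}"
    rm -rf "${root}"/var/log/cloud-init* "${root}/var/lib/cloud"
}
```

Run as root with no argument and followed by a reboot, this matches the one-liner above. As the rest of the conversation shows, the reboot itself can leave the VM unreachable, so it is safer to try on a disposable instance.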
GreatSnoopyi can do that, but that would brick the machine16:01
GreatSnoopyvar/lib/cloud/instances/ is actually empty16:01
GreatSnoopyi can ofc delete every trace of them, but shouldn't it be able to run in the cli otherwise ?16:01
GreatSnoopyi mean so that i do not need to reboot16:01
GreatSnoopyand lose control of the machine ?16:01
GreatSnoopy[root@democentfihn ~]# rm -Rf /var/log/cloud-init* /var/lib/cloud [root@democentfihn ~]# reboot16:03
blackboxswGreatSnoopy: so Azure datasource is init-local only, so running on the commandline you'd need to try "cloud-init init --local" since Azure is only run during init-local.  I'm thinking though this might be that hang you were talking about though16:17
blackboxswok, I'll try firing up a Centos image on Azure today to see if I can reproduce the prob16:18
GreatSnoopywell, i rebooted the machine, and quite as expected ssh is now stopped so i cannot log in any more16:19
GreatSnoopydunno if this is due to cloudinit or waagent16:19
smoserfor s in cloud-init-local cloud-init cloud-config cloud-final; do echo == $s.service ==; systemctl restart $s.service || break; done16:19
smoserthat would run everything in order that they would run.16:19
GreatSnoopyi will pick the next VM and try again, but basically I am back to square one when using 0.7.9: basically i boot the machine but although in the boot diagnostic i can see the familiar cloudinit table with network configuration, the rest of the config does not run and ssh does not get to be started - basically i cannot log into the machine16:21
blackboxswGreatSnoopy: check this out. https://bugs.launchpad.net/cloud-init/+bug/1717611 this change did land in azure which might be affecting you16:22
ubot5Launchpad bug 1717611 in cloud-init "Azure: Azure datasource needs to wait longer for SSH pubkey to be dropped by waagent" [Medium,Fix released]16:22
blackboxswGreatSnoopy: do you have a pointer to a public centos image we can use on Azure?16:23
blackboxswor is this custom16:23
GreatSnoopyI am not sure I understand what you are asking from me :) It is supposed to be a custom image that we are building to have cloudinit pre-enabled so that we can then provision other machines16:29
GreatSnoopybut we are not even there yet16:29
GreatSnoopynow we have a vm created from a regular Azure CentOS base16:29
GreatSnoopywhich I think is provided by OpenLogic16:30
blackboxswgotcha, I was just wondering if you were using stock CentOS in azure or a custom image. I'll spin up an instance on azure to checkout16:54
GreatSnoopyfor now I'm just trying to get to nonstandard but i cannot pass the standard phase :))16:55
GreatSnoopyideally this should work with 0.7.9 from epel, but if needed I can make my images with a newer cloudinit as long as it works16:56
blackboxswI'm with you, will ping you when I have progress on this. if you could file a bug in launchpad that'd help us reference progress on this.16:57
blackboxswhttps://bugs.launchpad.net/cloud-init/+filebug   your simple paste of cloud.cfg.d & cloud-init.log with the steps to reproduce the hang would be sufficient16:58
GreatSnoopyJust to simplify things, can we start by investigating the 0.7.9 issue ? Point being made is that ideally i should get it working with what is provided in the more mainstream OS repos17:05
GreatSnoopyalso, can you validate the soundness of the config I gave to cloudinit ?17:05
GreatSnoopyi mean this http://pastebin.centos.org/435771/15111932/17:05
GreatSnoopyblackboxsw: or let's ask the other way around: what would be the preferred/recommended way to install cloud-init on centos in azure?17:13
blackboxswGreatSnoopy: I know cloud-init in certain clouds is already baked into centos images. Trying to confirm on azure now.17:22
GreatSnoopyunfortunately, not the case in azure - at least up to centos 7.317:23
GreatSnoopythey only have cloudinit for ubuntu and coreos17:24
blackboxswif a given cloud doesn't have an image that contains cloud-init, I'd install it, then shut the VM down and take a snapshot or make an image of it in that cloud so I could reference that in subsequent VM creations17:24
GreatSnoopythat is exactly what I am trying to do :)17:24
GreatSnoopyhence the question, which is the recommended way to get cloudinit on that machine : the package in epel, slightly older - 0.7.9 or should i go and install the latest from source ?17:25
blackboxswand I wouldn't want to run cloud-init on that image before I snapshotted it (or I'd remove /var/log/cloud-init* /var/lib/cloud before snapshotting)17:25
GreatSnoopythat's understood, i always delete those items17:26
blackboxswGreatSnoopy: probably easiest for you to try to use 0.7.9 as it's in epel. But if there are bugs with 0.7.9, the only fix we'd propose would land in upstream 17.X17:27
blackboxswwe don't backport fixes to centos epel (and it's up to centos when they want to pull in latest cloud-init)17:28
blackboxswand it sounds like 0.7.9 and 17.1 are both causing probs for you. let's see what's up with that. I do think you might be hitting that infinite wait on ssh keys though  on 0.7.917:29
blackboxswon systems like that, I'd expect you'd see "waiting for SSH public key files" in the logs, if you ever got there.17:30
blackboxswGreatSnoopy: looks like that ssh key times out at 900 seconds17:30
blackboxswso that's a 15 minute wait17:30
* blackboxsw had to hit the calculator17:31
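The conversion can be done with shell integer arithmetic instead of a calculator. A minimal sketch; `seconds_to_minutes` is a hypothetical helper name.

```shell
# Convert a timeout in seconds to whole minutes using shell integer arithmetic.
seconds_to_minutes() {
    echo $(( $1 / 60 ))
}

seconds_to_minutes 900   # prints 15
```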
GreatSnoopywhat ssh keys does it expect and who is supposed to create those ? because although the source is a standard source image, i spin the instances via terraform17:35
GreatSnoopyand the only thing I pass to the instance is custom data with the cloudinit yaml17:35
blackboxswGreatSnoopy: it looks like DatasourceAzure.py is waiting for ssh key from azure fabric to configure the instance (as the UI/api provides an ssh key that is used to contact the instance)17:41
GreatSnoopythat should not be normal behavior :17:47
GreatSnoopybecause even in the GUI i can create a vm that has only password - no key17:47
GreatSnoopyor is it a different one ? system only ?17:48
GreatSnoopythat is provided no matter what the user actually provides ?17:48
smoserGreatSnoopy: it only waits for files to appear which are listed on the cdrom in the metadata there.18:03
smoserso... password only wont have any ssh keys listed in the metadata so it wont wait for anything18:04
GreatSnoopycan i manually retrieve the file so that i can check the data received before i reboot the machine ?18:05
GreatSnoopywhere does that file "land" initially ?18:05
smoserwalinux-agent would put it into /var/lib/waagent18:08
smoserfor *.crt files in that directory18:08
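To see whether any key files have arrived before rebooting, the directory smoser mentions can be listed directly. A minimal sketch, assuming the /var/lib/waagent path and the *.crt pattern from the messages above; `list_waagent_certs` is a hypothetical helper name.

```shell
# List certificate files in the agent's data directory, one per line, sorted.
# The directory defaults to /var/lib/waagent; pass another path to override it.
list_waagent_certs() {
    dir="${1:-/var/lib/waagent}"
    find "$dir" -maxdepth 1 -type f -name '*.crt' | sort
}
```

No output means no .crt files are present in that directory yet.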
GreatSnoopyone more question: should waagent be disabled so that cloud-init is the one that starts it, or should it be left enabled?18:10
GreatSnoopycloudinit's relationship with waagent seems a little bit of a chicken-and-egg dilemma18:10
smoserGreatSnoopy: cloud-init no longer needs walinux-agent.18:14
smoserand so its default behavior is suggested.18:14
smoserwhich is 'agent_command' of '__builtin__'18:14
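A drop-in config using the default smoser describes might look like the following sketch. The file name 91-azure.cfg and the helper `write_azure_datasource_cfg` are hypothetical, and the nesting of `agent_command` under `datasource: Azure:` is an assumption based on the Azure datasource documentation; check it against the docs for the installed cloud-init version.

```shell
# Write a cloud-init drop-in that selects the Azure datasource and the
# built-in agent handling. The target path is an assumed example name.
write_azure_datasource_cfg() {
    target="${1:-/etc/cloud/cloud.cfg.d/91-azure.cfg}"
    cat > "$target" <<'EOF'
datasource_list: [ Azure ]
datasource:
  Azure:
    agent_command: __builtin__
EOF
}
```

Passing a path as the first argument writes the file somewhere else, which is useful for reviewing the content before placing it under /etc/cloud/cloud.cfg.d.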
GreatSnoopyinteresting, but won't azure "see" the instance as failed if the cloud fabric cannot communicate with the agent ? or does cloudinit also create a replacement for that ?18:15
GreatSnoopyin my previous experience, not having waagent running results in the instance being marked as failed after reboot18:16
GreatSnoopybecause it cannot communicate with the agent18:16
smoserGreatSnoopy: I'm not sure when it went in, but yeah, you don't need walinux-agent anymore.18:23
smoseryeah. and newer ubuntu instances do not use it... let me check for sure18:24
GreatSnoopyfiled this https://bugs.launchpad.net/cloud-init/+bug/173340319:09
ubot5Launchpad bug 1733403 in cloud-init "cloud-init does not work reliably in Azure with Centos" [Undecided,New]19:09
blackboxswthanks for this bug GreatSnoopy and the good context.19:11
GreatSnoopyI will come back tomorrow...for now I am out of VM's to brick :D19:19
GreatSnoopyyou know the most stupid part.... we managed to get this step BEFORE for both centos7 and debian19:20
GreatSnoopywhy this is not working any more I don't know19:20
blackboxswGreatSnoopy: thx again, one thing I wonder is your datasource config represents agent_command :['systemctl', 'start', 'waagent' ]   ...   I wonder if it'd work with   ['service', 'walinuxagent', 'start'] instead19:36
blackboxswthe datasource itself checks to see if agent_command  == ['service', 'walinuxagent', 'start'] and grabs content from metadata in that case.19:36
GreatSnoopylets see, although that would be, well... ugly :)19:37
blackboxswyeah, think I misread the code. I think it checks to see if agent_command == '__builtin__' and then tries to get to metadata to pull in any ssh keys etc.19:39
blackboxswI'm referencing docs at https://cloudinit.readthedocs.io/en/latest/topics/datasources/azure.html as well19:39
blackboxsw... as I don't use azure too often :/19:40
GreatSnoopyfor s in cloud-init-local cloud-init cloud-config cloud-final; do echo == $s.service ==; systemctl restart $s.service || break; done == cloud-init-local.service == == cloud-init.service == Job for cloud-init.service failed because the control process exited with error code. See "systemctl status cloud-init.service" and "journalctl -xe" for details.19:41
GreatSnoopy2017-11-20 19:40:48,245 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=udf', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True) 2017-11-20 19:40:48,378 - handlers.py[DEBUG]: finish: init-network/search-AzureNet: SUCCESS: no network data found from DataSourceAzureNet 2017-11-20 19:40:48,378 - util.py[WARNING]: No instance datasource found! Likely bad things to come! 2017-11-20 19:40:48,378 - util.p19:41
GreatSnoopyi mean http://pastebin.centos.org/435866/19:42
GreatSnoopyin any case, i will be back tomorrow, and this time I will also rerun the whole process again (including the vm creation)19:43
blackboxswgood deal thx GreatSnoopy19:44
GreatSnoopycurrently that was not made by me, and i will have to check if something got left out, just to be sure19:44
GreatSnoopythanks a bunch, guys. See you tomorrow, have a nice day !19:44
blackboxswyou too19:45
smoserblackboxsw: i'm grabbing merge of fix-ec2-fallback-nic now20:57
blackboxswsweet, I'm on the #jinja2 stuff. no other changes needed20:57
blackboxswsmoser: with that fix-ec2-fallback-nic branch landed, shall we do a minor SRU?21:00
smoserwe could.21:00
blackboxswwe have 2 fixes for ec2 that'd be helpful.21:00
blackboxswand it'd make the SRU simple21:00
smoserare you thinking cherry-pick ?21:03
smoserblackboxsw: ?21:04
smoserthat is trunk -> bionic right now21:04
smoserwhich we absolutely should do21:05
blackboxswsmoser: I was thinking master !cherry-pick21:11
blackboxswforgot about the others21:11
blackboxswbut the thing about doing something painful, is to repeat the process often :)21:11
blackboxswit can only get better with practice.21:11
smoserblackboxsw: i'm fine with SRU21:42
smoserbut we need to do bionic first21:42
smoserand that can happen "right now" if you want to propose, i'll upload21:42
blackboxswok will do smoser21:51
smoserblackboxsw: i'm pushing the integration test one21:54
smoserso wait on that ?21:54
smoseri'm tox && git push on it21:54
blackboxswfire away21:54
smoserthats in now21:55
blackboxswok grabbing21:56
blackboxswsmoser: this is what I see http://pastebin.ubuntu.com/22:01
blackboxswshould the AliYun have been "bionic"22:01
blackboxswinstead of UNRELEASED?22:01
smoserblackboxsw: ah.22:03
smoserjust squash it into your new commit22:03
smoserit was committed as UNRELEASED as it was not released.22:03
smoserand then the next release would just pick it up22:03
smoserkind of queueing things22:03
smoserso you just drop that old changelog entry and pull the AliYun comment up22:03
smosermake sense ?22:03
blackboxswyeah squash, gotcha22:04
smoserput the debian/ ones at the top.22:05
smoserno real reason22:05
smoserjust how i've done it before22:05
smoserthose will be your top two entries22:05
smoserwith your name instead of mine22:05
smoseri'll get that in and uploaded later tonight if you MP it22:05
smoserbut have to run for now.22:06
blackboxswsmoser: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/33399822:08
blackboxswmoving on to artful,zesty,xenial22:08
blackboxswsmoser: forgot with bionic(devel) should I remove bug #'s which don't affect ubuntu? or just on SRU series (artful, zesty, xenial)22:16
blackboxswrepushed with my name removed from changelog22:19
smoserblackboxsw: ah. i just leave them in for ubuntu-devel22:21
blackboxswok thx. seeing merge conflict on artful for some reason22:21
smoserso re-push with the bug numbers on devel22:22
blackboxswsmoser: re-push contains all bug #'s only removed my name in brackets22:22
smoserblackboxsw: when you do artful, zesty, xenial22:25
smosercherry-pick the templates fix22:25
smoserand I'm going to move your debian/cloud-init.templates to before the upstream snapshot comment22:25
blackboxswI'm in hangout for a quick resolution22:28
blackboxswcherry pick is good. just not sure about why I'm seeing merge conflict22:28
smoseri'll fix that22:30
smoserhe has username in changelog22:30
smoseror, just leave it22:30
smoserlet's just leave it22:31
smoserbut we should lint those sorts of things on merge proposal22:31
smoserblackboxsw: i'm not in a hurry to do the others tonight.22:33
smoserjust uploaded bionic22:33
blackboxswkthx. sounds good22:33
blackboxswhave a good one22:33