/srv/irclogs.ubuntu.com/2020/06/29/#cloud-init.txt

=== hjensas__ is now known as hjensas|afk
=== cpaelzer_ is now known as cpaelzer
[15:31] <bilsch> hey, looking for some help with cloud-init and growfs / resizefs.
[15:31] <bilsch> ```  growpart:    ignore_growroot_disabled: false    mode: auto    devices:      - "/"      - "/home"      - "/var"      - "/var/log"      - "/var/log/audit"      - "/var/tmp"    resizefs: true    resize_rootfs: true```
[15:31] <bilsch> err, no raw markdown, or I suck at webchat ;)
[15:33] <bilsch> so, maybe another way to ask: does resizefs need anything special to grow filesystems other than /? I can see growpart has resized the partitions, but the filesystems were not grown. Not sure if this is OS / cloud-init version specific, but this is on RHEL 7 with cloud-init version 18.5.6 (yeah, RHEL 7 :shame: but ...)
[16:08] <Odd_Bloke> bilsch: I'm not 100% sure I understand the question. Could you perhaps use https://paste.ubuntu.com/ to paste your configuration, and describe what you want/expect it to do?
[16:21] <blackboxsw> https://github.com/canonical/cloud-init/pull/375 landed. thanks lucasmoura
[16:21] * blackboxsw is reviewing the vmware PR now
[16:22] <blackboxsw> Odd_Bloke: sorry, looks like you just rebased https://github.com/canonical/cloud-init/pull/464
[16:22] <blackboxsw> I think it's stale again because of my merge
[16:52] <bilsch> Odd_Bloke https://paste.ubuntu.com/p/rhhrBnVKY7/
[17:27] <blackboxsw> hi smoser/rharper: I added the following response to the vmware review https://github.com/canonical/cloud-init/pull/441/files#r447132910 suggesting maybe a datasource config option to override vmware's default image customization behavior. If either of you disagrees with the approach, just let me or the PR know.
[17:27] <blackboxsw> smoser had reviewed the earlier iteration of that PR, but I think very little has changed from the first attempt
[17:28] <blackboxsw> at least as far as overrides from the cloud-init user-data image customization side
[17:42] <Odd_Bloke> bilsch: Thanks! Can you paste the output of `mount`?
[17:45] <rharper> blackboxsw: ok
[18:25] <powersj> blackboxsw, ETA on the SRU?
[18:25] <bilsch> Odd_Bloke https://paste.ubuntu.com/p/VJRfjt7sJV/
[18:29] <bilsch> looking at the properties for cc_resizefs.py, I don't think it even takes an array / list or looks up devices / mounts. I also don't see where growpart calls resizefs etc. I kinda wonder if this works at all or is just cryptic. I'm hoping it's as easy as "yeah dummy, set this config / yaml key" ;)
[18:52] <Odd_Bloke> bilsch: This line suggests to me that your configuration isn't being consumed by cloud-init: cc_growpart.py[DEBUG]: No 'growpart' entry in cfg.  Using default: {'ignore_growroot_disabled': False, 'mode': 'auto', 'devices': ['/']}
[18:53] <Odd_Bloke> So that would probably be the next thing to debug. :)
[19:23] <bilsch> it's set as user data for the VM in EC2 ...
[19:24] <blackboxsw> powersj: only waiting on the cdoqa test run. It's queued, but I don't think it's run yet
[19:25] <blackboxsw> powersj, /me needs to attach all our verification logs. The cloud-init side of testing is complete
[19:25] <powersj> blackboxsw, awesome, thanks!
[19:26] <Odd_Bloke> bilsch: What does `sudo cloud-init query userdata` give you?
[19:31] <blackboxsw> good one, Odd_Bloke. bilsch: if the following was your full user-data, it's missing the leading header line containing "#cloud-config"
[19:31] <blackboxsw> https://paste.ubuntu.com/p/rhhrBnVKY7/
[19:33] <blackboxsw> but the query command Odd_Bloke mentioned would tell you for certain
[19:43] <bilsch> ah, I did not paste the full file. That "#cloud-config" header is there
[19:43] <bilsch> `sudo cloud-init query userdata  #cloud-config  runcmd:    - yum -y remove ansible  growpart:    ignore_growroot_disabled: false    mode: auto    devices:      - "/"      - "/home"      - "/var"      - "/var/log"      - "/var/log/audit"      - "/var/tmp"    resizefs: true    resize_rootfs: true`
[19:44] <bilsch> bah, newlines and such
[19:44] <bilsch> https://paste.ubuntu.com/p/SZwVc9mMRj/
[19:45] <bilsch> would the spacing in there cause issues? It's technically valid YAML, but I'm not sure how strict the parser is
[19:48] <blackboxsw> bilsch: maybe the newlines/whitespace are what is breaking cloud-init's interpretation of the growpart key? Try: grep "Failed at merging" /var/log/cloud-init.log. I presume that if it was invalid cloud-config or YAML you'd get that message.
[19:49] <bilsch> the grep returned nothing
[19:49] <bilsch> though, I think I just re-created it without the leading spaces
[19:49] <blackboxsw> also something I sometimes do: sudo cloud-init query userdata > my.yaml; cloud-init devel schema --config-file my.yaml
[19:49] <bilsch> oh, that's handy, thanks
[19:50] <bilsch> yeah, those leading spaces are a problem
[19:50] <blackboxsw> yeah, we are still building up that schema validation command-line utility, so it's still considered a 'devel' subcommand. It'll at least validate proper YAML (and all keys and config values for about 10 of the cloud-config modules)
[19:50] <bilsch> Cloud config schema errors: format-l1.c1: File my.yaml needs to begin with "#cloud-config"
[19:51] <blackboxsw> ok, there it is. Silly whitespace
[19:51] <blackboxsw> and it's about 90% of the problems people have with cloud-init deploys.... darn you YAML
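The schema error above comes down to the very first characters of the user-data. The sketch below is a simplified stand-in for that header check, not cloud-init's actual implementation. It shows why user-data pasted into an indented heredoc fails: every line, including `#cloud-config`, carries leading spaces. It also shows how stripping the common indentation with Python's `textwrap.dedent` restores the header. The two-space indentation is an assumption modeled on bilsch's description.

```python
import textwrap

CLOUD_CONFIG_HEADER = "#cloud-config"


def has_cloud_config_header(user_data: str) -> bool:
    """Simplified check: the header must be the very first characters."""
    return user_data.startswith(CLOUD_CONFIG_HEADER)


# Hypothetical user-data as it might arrive from an indented heredoc:
# every line, including the header, has two leading spaces.
indented = """\
  #cloud-config
  runcmd:
    - yum -y remove ansible
"""

# dedent removes only the whitespace common to all lines, so the
# relative indentation of the list item is preserved.
fixed = textwrap.dedent(indented)

print(has_cloud_config_header(indented))  # False
print(has_cloud_config_header(fixed))     # True
```

bilsch fixed this by re-spacing the heredoc in the Terraform file. Terraform's indented heredoc form (`<<-EOT`), which strips common leading whitespace in the same spirit as `dedent`, should be an alternative.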
[19:52] <bilsch> yeah, we need another file markup language. Someone should fix that!
[19:52] <blackboxsw> and strict YAML processing in cloud-init :)
[19:52] <bilsch> yeah, I assume you guys know of yamllint? Saves so much time (and yeah, I know... I should have done that)
[19:52] <blackboxsw> bilsch: I am still curious about your deploy (while we have something 'broken'). I'm wondering why cloud-init didn't log the failure message you grepped for
[19:53] <blackboxsw> do you get a match from grep Trace /var/log/cloud-init.log ?
[19:53] <bilsch> yeah sure, what's up
[19:53] <bilsch> yeah, 2 tracebacks
[19:53] <blackboxsw> ok, those should point to the specific error hit while trying to process the invalid user-data
[19:54] <blackboxsw> unfortunately, cloud-init tries hard to succeed, even if vendor-data or user-data is "broken", so the VMs still come up. We are generally trying to move it from a "bring the system up as best you can" approach to "complain loudly, because nobody looks in logs for warnings :)"
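Because cloud-init keeps booting past bad user-data, the evidence ends up in `/var/log/cloud-init.log`. The small Python helper below is a hypothetical convenience that runs the two greps suggested above in one pass; it is not part of cloud-init. The search strings "Failed at merging" and "Trace" come from this discussion. `WARNING` is an added assumption, based on the `cc_growpart.py[DEBUG]` style of level tag seen in the log line Odd_Bloke quoted.

```python
from pathlib import Path

# "Failed at merging" and "Trace" come from the discussion;
# "WARNING" is an assumption about how warning-level lines are tagged.
PATTERNS = ("Failed at merging", "Trace", "WARNING")


def scan_log_text(text, patterns=PATTERNS):
    """Map each pattern to the log lines containing it (like grep)."""
    hits = {pattern: [] for pattern in patterns}
    for line in text.splitlines():
        for pattern in patterns:
            if pattern in line:
                hits[pattern].append(line)
    return hits


def scan_cloud_init_log(path="/var/log/cloud-init.log"):
    """Read the log file (usually requires root) and scan it."""
    text = Path(path).read_text(errors="replace")
    return scan_log_text(text)
```

On an affected instance, `scan_cloud_init_log()` returns the matching lines grouped by pattern. In bilsch's case, the "Trace" bucket would hold the two permission-denied tracebacks.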
[19:54] <bilsch> so, both look like permission denied. These boxes have SELinux on them
[19:55] <blackboxsw> ok. I'll get a test system with leading whitespace and reproduce locally then. Thanks for peeking
[19:55] <bilsch> yeah, I second that motion: make it fail so I learn and fix my broken crap
[19:55] <bilsch> yeah, for sure, thanks for the help!
[19:55] <blackboxsw> yeah, same for me too. I don't want to dig into a log to find out why I'm not 100% in line with my config
[19:55] <blackboxsw> surely, thx Odd_Bloke
[19:56] <bilsch> yeah, I prefer determinism: if something is not configured right, say so, break, segfault the kernel, whatever
[19:56] <blackboxsw> agreed
[19:56] <bilsch> oh ha, terraform was happily waiting for me to say yes to test the fixed YAML (a heredoc with proper spacing in the tf file)
[20:00] <bilsch> are resizefs and resize_rootfs top-level keys, or nested within growpart?
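This question goes unanswered in the log. As far as I know, the cloud-init module docs list `mode`, `devices` and `ignore_growroot_disabled` as the `growpart` keys, while `resize_rootfs` is a top-level key read by the resizefs module. Treat that key list as an assumption to verify against the docs for your cloud-init version. A minimal sketch that flags unexpected keys nested under `growpart` might look like this. It operates on already-parsed user-data (a plain dict), and `unexpected_growpart_keys` is a hypothetical helper.

```python
# Assumption: these are the documented growpart keys; verify for your version.
GROWPART_KEYS = {"mode", "devices", "ignore_growroot_disabled"}


def unexpected_growpart_keys(cloud_config):
    """Return keys under 'growpart' that growpart is not known to read."""
    growpart = cloud_config.get("growpart") or {}
    return sorted(set(growpart) - GROWPART_KEYS)


# One reading of the pasted user-data (device list shortened), with
# resizefs and resize_rootfs nested under growpart.
nested = {
    "growpart": {
        "mode": "auto",
        "devices": ["/", "/home", "/var"],
        "ignore_growroot_disabled": False,
        "resizefs": True,
        "resize_rootfs": True,
    }
}

print(unexpected_growpart_keys(nested))  # ['resize_rootfs', 'resizefs']
```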
[20:04] <bilsch> also, rather than constantly re-creating a VM, is there a way to just apply the modified YAML locally?
[20:04] <bilsch> e.g., save the YAML like before, tweak it / whatever, and use cloud-init to do all the things?
[20:06] <blackboxsw> bilsch: can you run cloud-id? (I think terraform == NoCloud datasource type, right?)
[20:06] <blackboxsw> or "cloud-init status --long"
[20:06] <bilsch> cloud-init status --long: status: done; time: Mon, 29 Jun 2020 20:03:25 +0000; detail: DataSourceEc2
[20:06] <bilsch> a bunch of newlines in there, want it in a pastebin?
[20:06] <blackboxsw> that 2nd command will tell you (if on NoCloud) where your seed directory is coming from
[20:07] <blackboxsw> nah, it's good
[20:07] <bilsch> so is growpart intended to also automagically expand the filesystems?
[20:08] <bilsch> I see the devices expanded but not the filesystems
[20:08] <bilsch> post-init of a fresh VM
[20:08] <bilsch> data blocks changed from 523776 to 5242619
[20:08] <bilsch> that is only after I ran xfs_growfs. cloud-init had expanded the partition just fine
[20:13] <blackboxsw> bilsch: the hammer that lets you re-run everything in cloud-init is `cloud-init clean --logs --reboot`; that'll wipe cloud-init's state and logs, reboot, and re-run. The problem with the EC2 datasource is that you have already set the user-data on the EC2 metadata service, I think. So even though you cleaned cloud-init, it'll still grab the original user-data.
[20:13] <bilsch> ahh
[20:13] <bilsch> ok
[20:13] <blackboxsw> the NoCloud datasource is different in that it has a seed directory that you can rewrite after the fact, then re-run with the new user-data content
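For the NoCloud case blackboxsw describes, re-seeding amounts to rewriting two files and then running `cloud-init clean --logs --reboot`. The sketch below assumes the local seed directory `/var/lib/cloud/seed/nocloud/`, holding `user-data` and `meta-data` files. The `instance-id` value and the helper `write_nocloud_seed` are hypothetical. None of this helps on EC2, where the datasource keeps re-reading user-data from the metadata service.

```python
from pathlib import Path

# Assumed local NoCloud seed location; check the datasource docs for your version.
DEFAULT_SEED_DIR = Path("/var/lib/cloud/seed/nocloud")


def write_nocloud_seed(user_data, instance_id, seed_dir=DEFAULT_SEED_DIR):
    """Write user-data and a minimal meta-data file for NoCloud."""
    if not user_data.startswith("#cloud-config"):
        # Catch the leading-whitespace problem before it reaches a VM.
        raise ValueError("user-data must begin with '#cloud-config'")
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    (seed_dir / "user-data").write_text(user_data)
    # meta-data is YAML; only instance-id is set in this sketch.
    (seed_dir / "meta-data").write_text(f"instance-id: {instance_id}\n")
    return seed_dir
```

As root, something like `write_nocloud_seed(Path("my.yaml").read_text(), "iid-local01")` followed by `sudo cloud-init clean --logs --reboot` would re-run cloud-init against the edited file.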
[20:13] <bilsch> no biggie, just looking to debug in place / reduce cycle time to get it right is all
[20:14] <bilsch> it takes almost as much time to go mess with the EC2 console as to just let tf rebuild it, so
[20:14] <blackboxsw> yeah, we reduce cycles by using 'lxc launch ubuntu-daily:focal mylocalvm', which also has images:centos/7, sles/*, etc.
[20:14] <bilsch> ah ok, that makes sense
[20:19] <bilsch> https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_resizefs.py#L242-L247 so I'm back to questioning whether resizefs works for anything other than the root filesystem....
[20:23] <blackboxsw> https://cloudinit.readthedocs.io/en/latest/topics/modules.html#growpart so what you want is https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_growpart.py#L276 I think?
[20:35] <bilsch> well, growpart and resizefs appear to be separate modules / tools
[20:36] <bilsch> growpart is just the partition table change
[20:36] <bilsch> resizefs looks for / runs the proper command to grow the filesystem on an already-expanded partition / volume, etc.
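Given that split, one possible workaround (an assumption, not something proposed in this conversation) is to emit explicit grow commands for the non-root mount points, for example as `runcmd` entries. `runcmd` runs late in boot, after growpart has enlarged the partitions. bilsch's manual `xfs_growfs` run above is the XFS case. The helper name and the device paths below are hypothetical; real values would come from `mount`.

```python
import shlex


def grow_command(mountpoint, fstype, device):
    """Return an argv that grows a filesystem to fill its grown partition."""
    if fstype == "xfs":
        # xfs_growfs works on the mounted filesystem; -d grows the data
        # section to the largest size possible.
        return ["xfs_growfs", "-d", mountpoint]
    if fstype in ("ext2", "ext3", "ext4"):
        # resize2fs takes the block device; with no size it fills the device.
        return ["resize2fs", device]
    raise ValueError(f"no grow command known for fstype {fstype!r}")


# Hypothetical (mountpoint, fstype, device) triples.
mounts = [
    ("/home", "xfs", "/dev/xvda3"),
    ("/var", "xfs", "/dev/xvda4"),
    ("/data", "ext4", "/dev/xvdb1"),
]

runcmd = [shlex.join(grow_command(*m)) for m in mounts]
for cmd in runcmd:
    print(cmd)
```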
[20:41] <Odd_Bloke> blackboxsw: bilsch: I'm catching up, but the complication with completely aborting a boot is that it makes it harder for people to get log files off the instance to diagnose the issue; obviously kernel panics have a similar effect, but you generally can't cause a kernel panic by passing misformatted YAML to your cloud provider. ;)
[20:42] <bilsch> heh, yeah, even kill -9 1 does not work anymore ;(
[20:43] <bilsch> and yeah, it does make sense that you wouldn't want to make the pain too bad; gotta give people a chance to find the information
[20:44] <bilsch> I've tried a few incantations of the growpart devices: I get the partitions expanded via growpart, but only / is grown via resizefs
[20:44] <AnhVoMSFT> @blackboxsw @rharper at which point during cloud-init-local.service will systemd trigger other units that are marked "After" cloud-init-local.service?
[20:47] <rharper> AnhVoMSFT: since it's in oneshot mode, after the first Exec= line is complete,
[20:47] <rharper> AnhVoMSFT: so cloud-init init --mode=local must exit before units depending on it can start
[20:48] <AnhVoMSFT> I see, thanks @rharper. We're seeing a strange issue on RHEL with cloud-init (18.5) where, if an NFS mount exists in /etc/fstab, cloud-init hangs in "mount -a" during deallocate/restart of a VM
[20:48] <rharper> https://www.freedesktop.org/software/systemd/man/systemd.service.html ; "Behavior of oneshot is similar to simple; however, the service manager will consider the unit up after the main process exits. It will then start follow-up units. RemainAfterExit= is particularly useful for this type of service. Type=oneshot is the implied default if neither Type= nor ExecStart= are specified. Note that if this option is used without
[20:48] <rharper> RemainAfterExit= the service will never enter "active" unit state, but directly transition from "activating" to "deactivating" or "dead" since no process is configured that shall run continously. In particular this means that after a service of this type ran (and which has RemainAfterExit= not set) it will not show up as started afterwards, but as dead.
[20:49] <rharper> AnhVoMSFT: is the mount entry marked with _net ... what's the bit
[20:49] <rharper> _netdev
[20:50] <AnhVoMSFT> the mount was added manually by the customer to /etc/fstab, not through the cloud-init mounts config
[20:50] <rharper> cloud-init local does not call mount -a; that happens in cloud-init init (network mode)
[20:50] <rharper> ok, it must have _netdev in the options field if it depends on networking
[20:51] <rharper> this informs systemd-fstab-generator, which creates .mount units set to run After=network-online.target
[20:52] <AnhVoMSFT> it does not have _netdev in the options field
[20:52] <rharper> that said, mount -a only runs in cloud-init init (stage 2), and networking should be up by then
[20:52] <rharper> so I'm not sure why it would hang; I suspect that maybe networking isn't coming all the way up (or there's no route to the mount)
[20:53] <AnhVoMSFT> the mount unit indicates Type=nfs, which will have After=network-online.target added automatically by systemd, I believe
[20:53] <rharper> yep
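To audit entries like the customer's NFS line, a short parser over `/etc/fstab` text can flag network filesystems whose options lack `_netdev`. This is a hypothetical helper, and its set of network filesystem types is an assumption that is deliberately incomplete. As AnhVoMSFT and rharper agree just above, systemd already treats `nfs` mounts as network mounts. A flagged entry is therefore a hint to check ordering, not proof of what causes the hang.

```python
# Assumed, incomplete set of network filesystem types.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smb3", "glusterfs", "ceph"}


def entries_missing_netdev(fstab_text):
    """Return mount points of network filesystems lacking _netdev."""
    flagged = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no options field; skipped in this sketch
        _spec, mountpoint, fstype, options = fields[:4]
        if fstype in NETWORK_FSTYPES and "_netdev" not in options.split(","):
            flagged.append(mountpoint)
    return flagged


# Made-up fstab content for illustration.
sample_fstab = """\
UUID=1111-2222      /         xfs  defaults          0 0
nfs-server:/export  /mnt/nfs  nfs  defaults          0 0
nfs-server:/other   /mnt/ok   nfs  defaults,_netdev  0 0
"""

print(entries_missing_netdev(sample_fstab))  # ['/mnt/nfs']
```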
[20:54] <AnhVoMSFT> yeah, but you're right, it's in cloud-init's init phase, not init-local
[20:54] <rharper> but mount -a will force mounting of all entries when it's run; it's meant to bring up any new entries added since fstab-generator ran
[20:54] <rharper> the fstab generator runs before cloud-init local does, so if we add a new mount (the ephemeral disk, in Azure's case), we trigger a mount -a
[20:55] <rharper> but ... it should come up; so that means networking issues (or possibly a missing NFS client)
[20:55] <AnhVoMSFT> if I move "mounts" to the config phase it works fine
[20:55] <rharper> I don't think the Ubuntu image includes the nfs-common package by default
[20:55] <rharper> sounds like networking isn't fully up
[20:55] <AnhVoMSFT> this issue does not happen on Ubuntu, only on RHEL, which is strange
[20:55] <rharper> I suspect it's NetworkManager related
[20:55] <AnhVoMSFT> it does look like some sort of issue with networking
[20:55] <rharper> I know otubo was chasing NM "being all the way up" issues
[20:56] <rharper> this was maybe 6 months ago, but I thought the workaround there was to ensure NetworkManager-wait-online.service was also waited upon by cloud-init.service
[20:58] <AnhVoMSFT> does the init phase also run before network-online?
[20:59] <AnhVoMSFT> https://paste.ubuntu.com/p/h4yftHdYbC/
[20:59] <AnhVoMSFT> that's what cloud-init.service looks like on RHEL
[21:00] <rharper> that doesn't look right to me
[21:01] <rharper> upstream we run After=networking.service NetworkManager.service; and I thought otubo added a drop-in to include NetworkManager-wait-online.service
[21:02] <rharper> basically cloud-init.service runs after OS networking is up, but before network-online.target; which means that cloud-init knows that networking is up, and can fetch network-based #include cloud-configs, which need to be present before we run cloud-config.service
[21:05] <AnhVoMSFT> let us try that quickly, then we can open a support ticket with Red Hat and get that fixed
[21:09] <rharper> https://bugs.launchpad.net/cloud-init/+bug/1869181/comments/12 ; I poked around with getting NM to fully come up on Ubuntu and it needed more work, especially tricky w.r.t. ordering: NM needs dbus, and strange things happen (boot dependency cycle)
[21:09] <ubot5> Ubuntu bug 1869181 in cloud-init "[Focal] cloud-init service never get nework actived during MaaS deploy." [Undecided,Incomplete]
[21:10] <rharper> AnhVoMSFT: adding DefaultDependencies=no to both NM and NM-wait-online.service, I think, and then adding After=NetworkManager-wait-online.service helped; that may be enough on CentOS/RHEL. On Ubuntu, the netplan bits converting to NM config weren't quite there; CentOS/RHEL use the sysconfig rh plugin, so I don't think you'll see the rest of the issues I saw in that bug
[21:11] <AnhVoMSFT> we tried adding After=NetworkManager-wait-online.service to cloud-init.service, but that did not help
[21:11] <AnhVoMSFT> I did not add DefaultDependencies=no
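For reference, the ordering rharper describes could be expressed as systemd drop-in files under `/etc/systemd/system/<unit>.d/`. The sketch below only renders the file paths and text. The drop-in names are hypothetical, and the directives are the ones named above. rharper offered `DefaultDependencies=no` on the NetworkManager units tentatively ("I think"), and AnhVoMSFT reports that the `After=` line alone did not fix the hang on RHEL. This is not a verified fix.

```python
from pathlib import PurePosixPath

DROPIN_ROOT = PurePosixPath("/etc/systemd/system")


def render_dropin(section, directives):
    """Render one INI-style systemd drop-in section from (key, value) pairs."""
    lines = [f"[{section}]"]
    lines += [f"{key}={value}" for key, value in directives]
    return "\n".join(lines) + "\n"


def dropin_path(unit, name):
    """Drop-ins live in /etc/systemd/system/<unit>.d/<name>.conf."""
    return DROPIN_ROOT / f"{unit}.d" / f"{name}.conf"


# Hypothetical drop-in names; directives as discussed in the channel.
dropins = {
    dropin_path("cloud-init.service", "10-nm-wait-online"): render_dropin(
        "Unit", [("After", "NetworkManager-wait-online.service")]
    ),
    dropin_path("NetworkManager.service", "10-no-default-deps"): render_dropin(
        "Unit", [("DefaultDependencies", "no")]
    ),
    dropin_path("NetworkManager-wait-online.service", "10-no-default-deps"): render_dropin(
        "Unit", [("DefaultDependencies", "no")]
    ),
}

for path, text in dropins.items():
    print(f"# {path}\n{text}")
```

After writing such files as root, `systemctl daemon-reload` is needed before systemd picks them up.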
[21:14] <rharper> AnhVoMSFT: journalctl -b -o short-monotonic -u NetworkManager.service -u NetworkManager-wait-online.service -u cloud-init.service -u network-online.target
[21:14] <rharper> that should print in timestamp order... if you see cloud-init.service dumping the netinfo table and not everything is there, something isn't ordered correctly (or NM is failing to bring everything online)
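The journalctl invocation above can be wrapped so that the unit list is easy to adjust between debugging runs. This is a minimal sketch: `build_journalctl_argv` and `show_boot_ordering` are hypothetical helpers, and the flags are exactly those in rharper's command. Multiple `-u` options are combined, so entries from all four units come out interleaved in time order, which is what makes the ordering visible.

```python
import subprocess

UNITS = [
    "NetworkManager.service",
    "NetworkManager-wait-online.service",
    "cloud-init.service",
    "network-online.target",
]


def build_journalctl_argv(units, boot=True):
    """Current boot (-b), monotonic timestamps, one -u per unit."""
    argv = ["journalctl"]
    if boot:
        argv.append("-b")
    argv += ["-o", "short-monotonic"]
    for unit in units:
        argv += ["-u", unit]
    return argv


def show_boot_ordering(units=UNITS):
    """Run journalctl and return its output (needs journal read access)."""
    result = subprocess.run(
        build_journalctl_argv(units), capture_output=True, text=True, check=True
    )
    return result.stdout
```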
[21:21] <AnhVoMSFT> adding DefaultDependencies=no also did not help
[21:21] <AnhVoMSFT> let me check the journalctl output to see what is missing
