[00:37] thank smoser
[00:37] blackboxsw, tox && git pushing now
[00:38] then i'll upload again
[00:46] blackboxsw, ok. that is uploaded. i go away for the night now.
[00:49] blackboxsw, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/330954
[00:49] and now i really *am* gone.
[00:49] later.
[00:50] uploaded
[16:13] smoser: do we know of any public cloud where we can test DataSourceOVF?
[16:14] well, you can test it locally
[16:14] by providing an OVF iso. the doc shows how to create one
[16:15] ok
[16:15] sankaradita has one
[16:16] ok
[16:17] https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/330995 this is what I'm going to test; I think it's a good cleanup w.r.t. reducing the number of devices we actually probe (and re-using the blkid cache)
[16:18] rharper: as a workaround for the case? or a medium-term fix?
[16:18] long term
[16:19] k
[16:19] it only looks for devices with an iso9660 filesystem; blkid can do that quite quickly (and cloud-init respects the blkid cache); ds-identify will probe this early
[16:19] so this should speed up OVF probing dramatically for systems with more than just one block device
[16:27] rharper, i'm pretty sure the mounting we're doing in the default case is doing a mount -t iso9660
[16:27] which i'd think would fail unless you had iso9660
[16:27] so i'm not sure how this would cause races with other mounts.
[16:34] not sure why we're failing https://code.launchpad.net/~cloud-init-dev/+recipe/cloud-init-daily-xenial
[16:36] test_openstack_on_non_intel_is_maybe hmm
[16:36] checking, maybe it's a leaked mock thing?
[16:36] or an unmocked leak, I mean
[16:40] yeah. that's all i can think of.
[16:41] i can't reproduce it on the branch (ubuntu/xenial) though
[16:41] even with patches applied
[16:41] so trying in a container
[18:32] blackboxsw, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/331012
[18:50] hi
[18:50] how can i run something after cloud-init is done?
[18:50] basically i want my custom systemd unit file "openvpn2.service" to run after cloud-init sets up the hostname
[18:51] smoser: approved
[18:54] what i tried and didn't work: (1) appending After=cloud-init.target to openvpn2.service, (2) supplying user-data via the droplet's user_data kwargs: "systemctl daemon-reload; systemctl restart openvpn2"
[18:55] seven-eleven: :n
[18:56] seven-eleven: sorry, I'm looking now. I thought systemd chaining would do that for you, but I'm checking which service you need to be after
[18:56] maybe i can tell cloud-init in /etc/cloud/cloud.cfg @ cloud_final_modules: to restart my openvpn2, so it grabs the new hostname :]
[18:56] blackboxsw, let me paste my service
[18:57] blackboxsw, my customized openvpn2.service unit http://dpaste.com/1YF2AV0 which uses %H for hostname
[19:15] it's actually starting my service fine, but when i want to restart the service i have to run `systemctl daemon-reload`, else it fails. i can leave it like that, but it's not perfect :-)
[19:16] it fails with the error "Failed to start OpenVPN connection to bus1." and bus1 is the hostname of the vm i created the snapshot from
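
The pasted unit itself is not reproduced in this log, so as a rough, hypothetical sketch of the shape being described (only the unit name and the %H usage come from the discussion; the ExecStart path and ordering directives are assumptions):

    [Unit]
    Description=OpenVPN connection to %H
    # cloud-final.service is the last cloud-init stage; ordering after it
    # means cloud-init has already set the hostname by the time we start
    After=cloud-final.service

    [Service]
    # %H is expanded when the unit file is loaded, not when the service is
    # started, which is why a hostname change made after load needs a
    # `systemctl daemon-reload` before the new name is picked up
    ExecStart=/usr/sbin/openvpn --config /etc/openvpn/%H.conf

    [Install]
    WantedBy=multi-user.target
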
[19:18] smoser: re mount -t iso9660, we passed mtype to util.mount_cb; all this change does is switch the list of candidate devices from os.listdir(dev) to blkid -odevice -tTYPE=iso9660; we still pass mtype into mount_cb
[19:19] seven-eleven: not sure if this helps, but what about After=cloud-final.service
[19:19] that's the last stage to run
[19:19] http://cloudinit.readthedocs.io/en/latest/topics/boot.html#final
[19:20] seven-eleven: you could also add a runcmd: ['systemctl daemon-restart '] and that should be run in the final stage
[19:20] rharper, right.
[19:20] but two things i was suggesting:
[19:21] blackboxsw, let me try. i was also thinking of maybe adding Before= to cloud-final
[19:21] a.) you can/should further limit the results through the regex that was already being done. otherwise you risk extending the breadth inadvertently.
[19:21] b.) the fact that we were passing '-t iso9660' makes me think that this was not actually the bug
[19:22] because there is basically no way that 'mount -t iso9660' was going to work on any of the entries in the fstab provided.
[19:22] so i don't know how it would have caused issues.
[19:23] a) it already limits to block devices with iso9660; the regex is likely too narrow, but practically I see your point; if there were a block device outside the regex, OVF claims to not want to look at it even if it has an iso9660; that's debatable, but it certainly increases the "scope", however unlikely
[19:24] b) is more interesting; I wonder if mount's -t opens and peeks at the filesystem type
[19:25] if that's done via exclusive-open, it could race with a .mount unit
[19:25] which expects an exclusive-open
[19:25] right. so we should just avoid inadvertently extending the scope.
[19:25] (and by doing so actually further *limit* this questionable search, which is good)
[19:25] smoser: yes; I can add the regex check back into the search;
[19:25] practically it won't matter but it's certainly possible
[19:25] so yeah, it could be doing an exclusive open and a read-check for the fs magic
[19:26] but that'd seem *so* fast
[19:26] that i can't imagine we'd actually hit the issue
[19:26] open; read(4096); check; close()
[19:26] that is really really really fast
[19:26] races are races, however small
[19:26] the block device can be slow to respond
[19:26] but you wouldn't have seen someone report this
[19:26] or if you did, they couldn't reproduce
[19:27] that's my feeling.
[19:27] right, and
[19:27] i don't deny that it *could* happen
[19:27] it's mostly only reproducible with large sets of mounts
[19:27] agreed
[19:27] just very unlikely that it is the source of the bug if i understand it right.
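
To make the filtering being discussed concrete, here is a rough, standalone Python sketch of the idea (an illustration only, not the code in the merge proposal; the regex and helper names are assumptions, and cloud-init itself wraps this kind of query in cloudinit.util.find_devs_with):

    import re
    import subprocess

    # Illustrative device-name pattern along the lines of what DataSourceOVF
    # already matched (not the exact expression from the source tree).
    CDROM_RE = re.compile(r"^(sr[0-9]+|hd[a-z]|xvd[a-z])$")

    def iso9660_candidates():
        """List block devices that blkid reports as carrying iso9660.

        blkid consults its cache unless told to ignore it (-c /dev/null),
        so after ds-identify has already run blkid once this is cheap.
        """
        res = subprocess.run(
            ["blkid", "-odevice", "-tTYPE=iso9660"],
            capture_output=True, text=True)
        # blkid exits non-zero when nothing matches; treat that as no devices.
        return [line.strip() for line in res.stdout.splitlines() if line.strip()]

    def ovf_candidates():
        # Keep the old regex as a second filter so the blkid-based search
        # only ever narrows the set of devices probed, never widens it.
        return [dev for dev in iso9660_candidates()
                if CDROM_RE.match(dev.rsplit("/", 1)[-1])]
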
[19:28] the data from wolsen was that after disabling the OVF datasource, some multi-hundred reboots never reproduced it, whereas with OVF enabled it reproduces every 3 or so
[19:28] this is on a contrived instance with something like 26 EBS volumes
[19:30] that sounds plausible
[19:33] rharper, also limiting the mount-callback-umount
[19:33] we are also passing -o ro
[19:35] it was also doing util.peek
[19:35] which does an open and a read
[19:35] well, filtering the set of devices we poke down to zero (unless they have an iso9660) will surely work
[19:36] I generally expect OVF to run much faster now that we're not checking all those regex-matched devices, opening/reading and also mount_cb'ing each one
[19:38] rharper, well, blkid is going to also do an open
[19:38] fwiw
[19:38] no
[19:38] it's cached
[19:38] ds-identify runs blkid first
[19:38] then we just read the cache
[19:38] i'm not sure that is the case
[19:39] why do you think not?
[19:39] we do call find_devs_with with no_cache=False. but i have never actually been able to determine what blkid does w.r.t. its cache.
[19:39] when it determines something is valid and when not
[19:40] run:
[19:40] sudo strace -o /tmp/out blkid
[19:40] and then run it again
[19:40] with -c /dev/null
[19:41] we only pass -c /dev/null if no_cache == true
[19:41] i.e., that's supposed to be the "re-read everything"
[19:41] it should re-use its cache if *not* given that
[19:41] right, I just meant to compare the syscalls
[19:41] so you can see what the cache buys you
[19:43] w.r.t. the cost: for the instance with 26 volumes or so, we have a total of 1.5 seconds spent doing 1) read /proc/mounts 2) regex match 3) peek file 4) mount_cb
[19:43] that's about 60 ms per device; the filtering helps eliminate all of those; that's pretty nice
[19:47] so i just never really understood what -c /dev/null does
[19:47] it seems to still do stuff
[19:49] -c means to ignore the cache; I suspect it depends on the query to some degree
[21:00] smoser: around? have time for a hangout on the mount race? I've got some questions I wanted to bounce off you if you're available
[21:03] quick. yeah. i got 10 minutes
[21:05] k
[21:05] smoser: https://hangouts.google.com/hangouts/_/canonical.com/cloud-init?authuser=1
[21:06] in
[21:24] blackboxsw, sorry i was afk and could only test now. After=cloud-final.service didn't help, but using run_cmd it worked. I used digitalocean's API and provided run_cmd through the user_data attribute: http://dpaste.com/29JDA2Z
[21:39] firewall-cmd --reload on centos 7 eats the DOCKER-USER iptables chain. is that a problem?
[21:39] sorry, wrong channel
[21:51] :)
=== blackboxsw changed the topic of #cloud-init to: Reviews: http://bit.ly/ci-reviews | Next status meeting: Monday 10/2 14:00 UTC
=== blackboxsw changed the topic of #cloud-init to: Reviews: http://bit.ly/ci-reviews | Meeting minutes: https://goo.gl/Ts2k8t | Next status meeting: Monday 10/2 14:00 UTC
=== blackboxsw changed the topic of #cloud-init to: Reviews: http://bit.ly/ci-reviews | Meeting minutes: https://goo.gl/mrHdaj | Next status meeting: Monday 10/2 14:00 UTC
[22:20] seven-eleven: good work!
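
The dpaste with the working user-data is not reproduced in this log; a hypothetical cloud-config along the lines described above (the standard key is runcmd, and the commands mirror the daemon-reload/restart pair quoted earlier; the unit name is the one from the discussion) would look roughly like:

    #cloud-config
    runcmd:
      # runcmd executes in cloud-init's final stage, after the hostname is set;
      # the daemon-reload re-expands %H in the unit file before the restart.
      - systemctl daemon-reload
      - systemctl restart openvpn2.service
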