[09:01] smoser: do you think my updates are okay?
[10:30] Hi, is there any way to rename a network interface in ubuntu 16.04 on AWS back to eth0?
[13:04] Wulf: cloud-init 0.7.9?
[13:04] Wulf: PM me plz. I don't think it's a cloud-init issue
[15:11] niluje, i'll take a look now. i'm sorry.
[15:12] sorry to be oppressive :x
[15:14] niluje, no worries.
[15:14] blackboxsw, can you confirm that
[15:14] 4d9f24f5c385cb7fa21d87a097ccd9a297613a75
[15:14] is broken in the same way as
[15:14] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/327325
[15:14] was ? that path is just plain wrong?
[15:15] Wulf, https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
[15:15] see
[15:15] I don't like this, how do I disable this?
[15:18] niluje, you kind of have 3 unit tests in one
[15:18] smoser: checking
[15:19] smoser: I can split them, but usually I prefer to test a feature in a single test rather than split it into smaller tests
[15:21] well, but you're really testing 3 different things.
[15:21] because splitting means having more tests, which take more time to run, and 99.9% of the time it doesn't make sense to test only that a function works (for instance) and not that it fails
[15:21] a.) valid API / expected working case.
[15:21] b.) no user-data and vendor-data
[15:21] c.) local port retry
[15:22] blackboxsw, i'd also like your opinion on the use of HTTPConnection (via requests.packages.urllib3.connection) in niluje's MP
[15:22] https://code.launchpad.net/%7Ejcastets/cloud-init/+git/cloud-init/+merge/325740/+index
[15:22] smoser: so you want me to split the unittests then?
[15:22] (won't do before monday)
[15:23] blackboxsw, your thoughts on ^ too? i think generally we do want them to be more granular
[15:24] smoser: possible 0.7.6 to 0.7.9 regression when using a config drive with a static network config; curious about your opinion: if after initial boot a config drive goes away, it looks like 0.7.6 will retain the network config while 0.7.9 will panic and replace the static network config with a dhcp config from the fallback data source.
[15:27] larsks, the config drive metadata service's check_instance_id() should end up saying that the cached object in /var/lib/cloud/instance/obj.pkl is still valid.
[15:27] │ logan-
[15:27] oops
[15:28] smoser: Looks like not, but I will check. It looks like with 0.7.6, cloud-init.service simply fails when there is no data source, so -config and -final never run.
[15:28] Let me take a closer look at my 0.7.9 setup.
[15:28] larsks, is this possibly related to ovirt ?
[15:28] or is it really openstack
[15:29] Not openstack. Right now, I'm testing locally with libvirt + config drive.
[15:29] (ovirt uses config drive, and i think doesn't set the dmi data to the instance id, which would make that fail)
[15:29] But I think the original problem is in RHEV, which I think means ovirt.
[15:29] i think generally we do want them to be more granular -> again I'll follow your recommendations, but for my personal projects I really feel being granular is cleaner, but it's often a pain to do (you need to do the test setup more than once) and doesn't actually help to test the code better
[15:29] but let me know :)
[15:29] niluje: smoser yes please on the separation/simplification of unit tests to simply assert 1 or 2 things instead of a compound set of assertions. We want each unit test to be representative of a single "thing" that it is testing, which makes for easier error tracking/resolution when the test fails in the future
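
(For illustration, a minimal self-contained sketch of the kind of split being asked for above: one behavior per test instead of one compound test covering the three cases a/b/c. The fetch_metadata() helper and the mocked get_url callable are stand-ins invented for this sketch, not code from the branch under review.)

    import unittest
    from unittest import mock


    def fetch_metadata(get_url):
        """Toy stand-in for datasource code: fetch a dict, retrying once on error."""
        try:
            data = get_url()
        except OSError:
            data = get_url()  # placeholder for the real local-port retry logic
        return {'user-data': data.get('user-data'),
                'vendor-data': data.get('vendor-data')}


    class TestFetchMetadata(unittest.TestCase):

        def test_valid_api_returns_user_and_vendor_data(self):
            # a.) valid API / expected working case
            get_url = mock.Mock(
                return_value={'user-data': 'ud', 'vendor-data': 'vd'})
            self.assertEqual({'user-data': 'ud', 'vendor-data': 'vd'},
                             fetch_metadata(get_url))

        def test_missing_user_and_vendor_data(self):
            # b.) no user-data and no vendor-data
            get_url = mock.Mock(return_value={})
            self.assertEqual({'user-data': None, 'vendor-data': None},
                             fetch_metadata(get_url))

        def test_retry_on_transient_error(self):
            # c.) local port retry (modelled here as one retry on OSError)
            get_url = mock.Mock(side_effect=[
                OSError('connection refused'),
                {'user-data': 'ud', 'vendor-data': None}])
            self.assertEqual('ud', fetch_metadata(get_url)['user-data'])
            self.assertEqual(2, get_url.call_count)


    if __name__ == '__main__':
        unittest.main()

Each of the three tests can only fail for one reason, which is the easier error tracking/resolution that blackboxsw describes above.
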
[15:30] right. i think that is quite likely broken, but i'm not really sure how it would work previously.
[15:30] okay
[15:30] larsks, and i'm not sure how to make it work.
[15:30] blackboxsw: will do, thanks :)
[15:30] I think we (cloud-init) should pull together a doc on unit test writing just to capture general intent/approach/style etc.
[15:30] hm..
[15:30] thanks niluje, sorry if it feels a bit pedantic
[15:31] smoser: the behavior in 0.7.6 seems fine (fail if no data source). Why does 0.7.9 try to continue in that case? Is there a way to disable the fallback datasource?
[15:31] probably need a little addendum to the HACKING doc on cloudinit.readthedocs.io
[15:31] If not, should there be an option to do that?
[15:31] this path where it correctly identifies it's the same instance id will only work on intel (as it uses dmi data, and although arm64 could in theory do dmi, in practice kvm is different)
[15:31] if there's something else I'll fix it on monday morning, hoping we can merge the MR soon :)
[15:32] +1 niluje I'll put some eyes on the branch too today to see if I can make more concise comments
[15:32] larsks, it re-writes network data on 0.7.9 as it doesn't know that it is not a new instance, so it goes the path of "do a dhcp on eth0 so the network datasources can find network metadata"
[15:33] blackboxsw: if you are interested in having a node to do some testing, I can give you access to one
[15:33] smoser: same
[15:33] Right, I understand that. But is there any way to disable that behavior if so desired?
[15:33] The problem is not just that it does a dhcp, but that it is actually writing a new network configuration to disk.
[15:38] larsks, right. but at the point where there is no config drive it can only really think that this is a new instance, or it has been snapshotted and moved to another cloud....
[15:38] it's looking for a metadata source.
[15:39] No, I get that. I am just arguing that it shouldn't be writing configs if ultimately it doesn't find a valid datasource.
[15:40] But I understand what you're saying. I am trying to figure out if this whole "config drive goes away" thing is standard RHEV behavior...
[15:40] ...because if it's not, "don't do that" seems like the quickest fix.
[15:40] you can configure "manual_cache_clean" to true
[15:40] yeah..
[15:41] larsks, so right now blackboxsw is working on moving the ec2 datasource to run at "local" time.
[15:42] if we replaced all network-time datasources to run at local, then we could essentially decide failure at the local time frame.
[15:42] and leave the networking in place
[15:42] the path would still result in cloud-init considering that a failure though
[15:42] as it would not have found a datasource (as there is none, and the previous one is not found to be valid)
[15:45] smoser: I wish that there was a shorter term fix when no network datasources are enabled (e.g., when datasource_list is [ConfigDrive, NoCloud]).
[15:52] larsks, well, 2 fixes to the platform
[15:52] a.) match the dmi data to the instance id
[15:52] b.) do not detach the drive ever
[15:53] both of these things qualify as "act more like the platform that you're imitating"
[15:54] larsks, i do agree it sucks. several months ago someone had pointed this out to me, and that was when i asked you what you knew of ovirt
[15:54] i considered raising an issue there.
[15:54] just kind of ran out of time/motivation
[15:54] :-(
[15:56] larsks, if there is another unique id in the dmi information on that platform, we could adjust check_instance_id() to have stored that bit too and compare that it is not new
[15:57] smoser: which dmi field are we checking?
[15:58] and another option that might make sense would be to allow vendor_data to declare manual_cache_clean
[15:58] if the cached obj.pkl had manual_cache_clean=True, then we could trust it
[15:58] (rather than requiring that to come from system config)
[15:58] the field looked at is system-uuid
[15:59] i do recall that they had a unique id somewhere in their dmi data
[16:00] Thanks. Let me look into that a bit.
[16:02] smoser: that 4d9f24f5c385cb7fa21d87a097ccd9a297613a75 is a completely different failure on my end than what was fixed in my gce mock branch. I'm seeing a magic number traceback in 4d9f24f5c385cb7fa21d87a097ccd9a297613a75. Will peek at that (as well as getting my gce-mock-fix branch pushed)
[16:03] hmm PEBKAC. issue was on my side with a stale pyc file. checking now.
[16:09] :)
[16:09] smoser: same failure mode which the branch I have fixes.
[16:10] you see failure ?
[16:10] because i do not
[16:10] (and neither does c-i)
[16:10] smoser: not a failure, I added a pdb here
[16:10] https://www.irccloud.com/pastebin/pP1aE6ZT/
[16:10] .tox/py3/bin/python3 -m nose --tests tests/unittests/test_datasource/test_gce.py:TestDataSourceGCE.test_get_data_returns_false_if_not_on_gce -x -s
[16:10] and this tox line gets to that pdb, which it shouldn't
[16:11] because platform_reports_gce should be mocked to return False in that case
[16:11] (fwiw, i don't think it's valid to call python3 like that... you won't get the virtualenv installed things)
[16:12] (that is what ./tools/tox-venv does. ./tools/tox-venv py3 python3 -m nose --tests ...)
[16:13] smoser: here too :) tox -e py3 -- --tests tests/unittests/test_datasource/test_gce.py:TestDataSourceGCE.test_get_data_returns_false_if_not_on_gce -s
[16:14] (and you can run that way by just tox -e py3 tests/unittests/test_datasource/test_gce.py:TestDataSourceGCE.test_get_data_returns_false_if_not_on_gce)
[16:14] but ok. let me look.
[16:14] ahhh good good, I was getting tired of all the extra typing on those --
[16:15] but tox-venv is faster
[16:15] as it doesn't do the setup.py
[16:23] smoser: today I'm testing dhclient in init-local for aws with centos
[16:24] then freebsd (so I'm adapting that WIP branch and starting to make it actually work properly)
[16:51] niluje: forgot to respond earlier about your offer to setup access to a test system for us. I think that is a good offer, we are trying to increase our test matrix coverage and this may assist in upcoming SRUs. While I don't think we have the bandwidth to integrate testing w/ your system, if it doesn't cost anything to allow us to access it. It would certainly assist us as we get a chance to login and validate
[16:51] upcoming cloud-init changes.
[16:53] let me try typing that with proper grammar and punctuation. niluje: If you can set up system access for us and it doesn't cost anything, it might help us in the future when we look at expanding our test matrix.
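
(For context on the 15:52-15:58 exchange above, a rough sketch of the idea: record the DMI system-uuid when the datasource is first cached and have check_instance_id() compare it on later boots, so a detached config drive no longer looks like a brand-new instance. The kernel exposes system-uuid at /sys/class/dmi/id/product_uuid; the class and helper names below are invented for this sketch and are not cloud-init's actual implementation.)

    def read_system_uuid(path='/sys/class/dmi/id/product_uuid'):
        """Return the DMI system-uuid (what `dmidecode -s system-uuid` reports)."""
        try:
            with open(path) as fh:
                return fh.read().strip().lower()
        except OSError:
            return None  # no DMI available (e.g. many non-x86 guests)


    class CachedSource:
        """Illustrative stand-in for a cached datasource object (obj.pkl)."""

        def __init__(self, instance_id):
            self.instance_id = instance_id
            # remembered at first boot, while the config drive is still attached
            self.dmi_uuid = read_system_uuid()

        def check_instance_id(self):
            """True if this cached object still describes the running instance."""
            current = read_system_uuid()
            return current is not None and current == self.dmi_uuid

On a platform that populates some other unique DMI field, the same pattern would work with that field substituted for system-uuid.
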
[17:01] niluje, i do have a scaleway account
[17:02] registered under smoser@brickies.net
[17:03] blackboxsw,
[17:03] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/fix-gce-test
[17:03] that fixes the openstack metadata service to patch / use-mock correctly
[17:09] smoser: correct and it's merged
[17:09] oops
[17:09] jussec
[17:10] hrm smoser I landed that branch you approved in master a few mins ago
[17:10] and your change is a rewrite. ok, checking it out now
[17:14] smoser: ahh I see what you did to fix it. right, the start method returns the actual mocked object so subsequent changes to return_value get honored
[17:14] smoser: +1 on that change to avoid the additional decorators
[17:15] you'll have a minor conflict w/ master I presume
[17:15] as I landed that other fix
[19:01] blackboxsw, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/327388
[19:01] you want to review that quick ?
[19:02] and ack and then i'll pull it
[19:03] reading it smoser
[19:10] smoser: approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/327388
[19:17] blackboxsw, thanks
[19:48] blackboxsw,
[19:48] =False)
[19:48] smoser@milhouse:/home/smoser-public/src/cloud-init/cloud-init$ tox-venv py3 python3 -m nose tests/unittests/test_runs/test_simple_run.py tests/unittests/test_distros/test_create_users.py
[19:48] that works.
[19:49] but dropping the test_simple_run before test_create_users will fail
[19:49] ie, it's leaving some mocks or something in place making test_create_users not fail
[19:54] powersj, that is the only other non-jsonpatch failure in py36
[19:54] ie, it should be a failure in 3.5 but it was just hidden
[19:56] yeah
[19:56] test_create_users
[20:00] smoser: working on a kvm backend for testing. I am trying to generate SSH keys on the image, which works, but then run into ssh_deletekeys, which deletes those keys. Is there a best practice for doing this?
[20:00] should I generate and inject the key we want to use by hijacking the user_data/cloud-config and putting the key in that way?
[20:04] so we can ssh into the system and know the keys, right?
[20:09] it's kind of invasive and system dependent
[20:09] but maybe
[20:10] https://gist.github.com/smoser/b32bb1c33564d1d46971cd9ded2e8477
[20:10] we run our own ssh on port 9999 that reads its own keys and such
[20:17] smoser: hmm I was hoping to stick to port 22, so that when we extend this to cloud providers I don't have to deal with firewall related issues
[20:50] powersj, well, that means that you can't really test any of the system ssh
[20:52] :\
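
(A small self-contained sketch of the two mock details that came up above: patch(...).start() at 17:14 returns the mock object itself, so each test can adjust return_value without stacking decorators, and explicit cleanup avoids the 19:49 problem of mocks being left in place that change the outcome of tests run later. Detector and platform_reports_gce below are placeholder names, not the real gce test code.)

    import unittest
    from unittest import mock


    class Detector:
        """Toy stand-in for the module under test."""

        @staticmethod
        def platform_reports_gce():
            return True  # pretend this reads DMI data


    class TestPlatformDetection(unittest.TestCase):

        def setUp(self):
            super().setUp()
            # start() returns the MagicMock, so each test may tweak return_value.
            self.m_platform = mock.patch.object(
                Detector, 'platform_reports_gce', return_value=True).start()
            # Always undo the patch; a forgotten stop() is exactly how mocks get
            # "left in place" and hide failures in tests that run afterwards.
            self.addCleanup(mock.patch.stopall)

        def test_platform_not_detected(self):
            self.m_platform.return_value = False
            self.assertFalse(Detector.platform_reports_gce())

        def test_platform_detected(self):
            self.assertTrue(Detector.platform_reports_gce())


    if __name__ == '__main__':
        unittest.main()
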