=== shardy is now known as shardy_lunch [13:22] Hello everyone. I'm trying (continue to try) to make ConfigDrive working on Debian. It just can't use ConfigDrive, as far as I can see. There is ConfigDrive module, but it is ignored (?). Can someone help me? https://snag.gy/ZcfIvd.jpg [13:22] cloud-init 0.7.7 === shardy_lunch is now known as shardy === rangerpbzzzz is now known as rangerpb [13:46] Оh, I found what's wrong. 0.7.7 works really badly with ConfigDrive. After I installed 0.7.9 (from Sid, it start to work). Still, rather annoying. [14:10] amarao, i'm not sure what would be broken, not enough info there to know [14:10] you could potentially file a bug against debian asking them to patch their 0.7.7 version with a fix for you [14:11] i'mn not sure what is wrong, we've definitely used config drive successfully for quite some time [14:11] but if 0.7.9 works, that seems simplist [14:11] rharper, so... bonds are probably never going to work in a container [14:11] at least not unprivilidged [14:11] smoser: we should ask stgaber [14:11] well, they do kernel stuff [14:11] in /sys/ [14:12] oh... [14:12] but... namespaces? [14:12] wait we talked about this once. he thought they *should* work, if they were implemented differently [14:12] huh [14:12] that ifupdown path uses /sys/ which is not namespaced [14:12] hrm, I can look at the networkd code [14:12] not that I Want to [14:12] [14:12] http://paste.ubuntu.com/24215499/ [14:13] thats what happens when you try (after you've modprobed 'bonding' on the host) [14:13] but in any case, the rename should avoid the mac duplication [14:13] rharper, i'm not arguing otherwise :) [14:13] ok [14:13] i just wanted to be able to recreate in a container [14:13] i looked at examples/tests/bonding_network.yaml though (curtin) for an example [14:13] yeah; that looks broken; [14:14] shouldnt the bond have address of one of the nics ? [14:14] shame they can't namespace mount /sys [14:14] it will [14:14] when you enslave a device to a bond [14:14] it will pick up one of the slaves' mac address as assign to to bondX [14:14] http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/head:/examples/tests/bonding_network.yaml [14:14] but why do we explicitly declare its mac then [14:14] the bond itself does not require an ip or anything [14:15] that's not required but it's OK [14:15] hm.. ok. [14:15] but that's not the issue [14:15] the mac is going to get duplicated when you list nics by mac [14:15] ls /sys/class/net/*/mac_adress [14:15] i know. i understand the issue, i'm not arguing that it is not a bug. [14:16] ok; I'm somewhat confused with the tangents [14:16] they're not tangents. :). i was trying to recreate it. [14:18] in a container yes; it recreates just fine in a VM; I certainly support fixing/filing bugs against the container [14:18] i was hoping to be able to look around and find a different way to ignore 'bond' devices than "they are virtual" [14:19] as i have other virtuals that id o not wish to ignore. [14:19] do you have an example of how to recreate? [14:25] smoser: lemme pastebin, it's basically launch an image with the attached network-config from the bug and create a VM with the right macaddress; [14:26] I think bond is a special w.r.t mac cloning; I don't think other netdevs assume the mac of an underlying secondary device [14:28] (the other virtuals are 'tap', lxc) [14:28] macvtap, tun, dummy; ppp, etc... [14:28] there are a lot of them [14:28] https://www.freedesktop.org/software/systemd/man/systemd.netdev.html [14:48] rharper, https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/320291 [14:50] rharper, sure there are lots of virtual types. but when renaming, i'd rather just specifically ignore bonds than ignore all virtual types (which breaks renaming in lxc as it's nics are virtual). [14:51] anyway, i suggested some changes at that MP, if you want to chat about them we can do that. they're small, and then i think we should get this in [14:55] smoser: replied for the v2; most are fine; I think I do want to make sure the get_cfg_by_path is sane; [14:56] as I said, without that change, the unittest doesn't pass; I've not used pdb to walk through the get_cfg_by_path to see why it didn't fetch value from 'system_info' but that's what I needed for it to pass unittest [14:58] sure. you can and should. but either yoru change or the '_get_arch_package_mirror_info' is wrong. [14:58] then the unit test is being passed the wrong config is suspect [15:11] smoser: so if we provide a network-config in a localds seed it's on the iso but we don't copy it off AFAICT; [15:11] we should do that [15:13] "the iso" [15:13] meaning NoCloud ? [15:13] yeah [15:13] cloud-localds [15:13] its probably just stored in the object [15:13] I can put in a network-config file in there and pass via iso seed [15:13] yeah [15:13] that gets pickled [15:14] but we write out user-data and other .cfg files [15:14] just something helpful to see [15:14] well, we write out user-data. [15:14] so, I'm looking at the util.get_cfg_by_path and I don't see where it should know to pre-pend 'system_info' in the distro object ... what am I missing? [15:15] the Distro() object is passed cfg that is just the system_info() portion [15:15] system_info: portion [15:15] so your unit test is probably just wrong in that it passes it the whole thing. [15:15] i suspect [15:16] well, heh not my unittest; I just added some on top of how the others worked [15:16] but I understand what you're saying now [15:17] the unittest it supplying the distro object the whole cfg, not just the system_info portion [15:17] if that's how it works, I can fix up which cfg the unittest passes into the Distro class [15:19] but do you agree that *one* of the two things is wrong. [15:22] rharper, cloudinit/stages.py in Init.distro is where the distro object is created. [15:22] and there it clearly gets the system_config [15:22] well, maybe not clearly :) [15:30] smoser: it works fine if I pass the write system_info key in; [16:15] powersj, around ? [16:15] i have some good news and some baad news [16:15] yes [16:15] :\ [16:15] which first ? [16:15] (has enough bad news with the test failures this weekend) [16:15] "baad" news, as you say, first please [16:16] bad news is that something needs improvements in order to avoid failures in cloud-init-integration-[xyz] [16:16] good news is that those are valid failures. [16:16] similar to what we saw with curtin [16:17] you added a test that tested a new feature that was not goign to work with old cloud-init [16:17] and then we ran tests with old cloud-init expecting that new feature to work [16:17] ahhh! ok [16:17] and \o/ surprise! [16:17] it didnt work :) [16:17] ok so I need to update the proposed tests to run from source of proposed [16:18] yeah, i guess we need to do somethinng like that. [16:18] and similar the daily to test the daily [16:18] ok that I can fix :) [16:18] currently looking at curtin CI and why jenkins launchpad plugin is passing the wrong arguments [16:19] what sucks is that twice (curtin and cloud-init) now i've spent a couple hours thinking and debugging [16:19] powersj: the new version of tests can do '--ppa=cloud-init-dev/daily', which may be useful for the daily tests [16:19] why is this breaking! ... when the answer is that fixing it in trunk does not fix all versions everywhere :) [16:19] smoser: right [16:20] magicalChicken: that will work, I just need to make sure the unit tests that are ran are from that ppa as well. Otherwise we get this test mismatch between what code is ran versus what tests are ran. [16:20] magicalChicken, thats cool. but what we (re)found out is that we need to test the integration tests that were in the source of the package. [16:20] or we have to deal with the mix-matched expectations [16:21] and somehow version every test [16:21] oh, right i missed the unit tests bit [16:21] this is the integration test path [16:21] powersj, added a unit test for a new feature that went in [16:21] and then we tested that new integration test with old cloud-init [16:21] and it failed [16:21] :) [16:21] which is expected. [16:21] ok, that makes sense :) [16:21] at least the test works [16:21] ie, we ran trunk integration test and old code [16:21] right [16:22] we found this out with curtin a while ago too [16:22] DUH! [16:23] the solution we came up with for curtin was that we installed the package and then downloaded the source with apt [16:23] so that they were guaranteed to match expectations [16:23] simple but inefficient solution here is to clone the branch we want to test and use tree_run inside that branch, so both cloud-init and integration tests are the version in that branch [16:24] has the disadvantage that a deb is built each time [16:26] magicalChicken, well, we're interested in testing *exactly* what is in -proposed or in the archive provided. [16:26] not something we build, which *might* work out to be the same. [16:27] smoser: fair enough, pulling the deb-src like with curtin should work then [16:27] in curtin it was easier, because we actually install curtin into a container and use it there. [16:32] smoser: when you get a chance, I wanted to discuss how we handle v2 as input; this goes to why I had a .config property on the NetworkState class (it enabled passthrough of v2 in the netplan renderer); I updated a comment in the MR [16:49] ok. http://c.brickies.net/hangout [16:51] k [17:56] smoser, paulmey Odd_Bloke larsks Hey would you guys mind peeking at https://code.launchpad.net/~bbaude/cloud-init/+git/cloud-init/+merge/320411 ? [18:24] smoser: I pushed a fix for the proposed tests to avoid mismatches. It is the same as what we do for curtin. As far as the other failures for integration-[x,y,z], those were introduced last Friday see LP: #1674317. I pushed a MP with fix and ran one test locally. [18:24] https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/320414 [18:30] can someone take pity on me and remind how to run the cloud-init test suite locally? [18:31] rangerpb: if you are trying to run what the CI tests run, then you need the 'tox' command [18:31] just execute tox ? [18:31] yep it will then go and build the python envs needed to run the checks we have seutp [18:31] thanks man [18:32] e.g. python2 and python3 unit tests + flake8 + xenial based tests [18:32] np [18:35] powersj, can the tests be run outside the context of tox or is that hopeless? [18:36] i have quite a few things to fix apparently [18:37] rangerpb: you can run outside of tox. Tox is nice because all you need is tox and it keeps you from needing to install other things. However, seeing your failures, I would start with just running nose. Familiar with that? [18:37] nope [18:38] smoser: I've pushed an update to the v2 net branch; I believe I've fixed up all of the issues you raised [18:38] im familiar with running unittests straight up but i think these tests go beyond just that [18:38] rangerpb: essentially nose runs the unit tests and the command would be something like 'python -m nose tests/unittests assuming you have all the requirements [18:39] ok [18:39] there are quite a few things to install, so best to do in a VM, LXD, or virtualenv [18:43] rangerpb: also looking at your failures, TestAzureBounce is trying to read from /sys/firmware/acpi/tables/OEM0, but that is not always going to be on a system. So you would want to mock that or some how deal with it. [18:44] yeah, i did some refactoring there that is biting me in the ass [18:50] powersj, is there a way with tox to be more specific about what to test ? [18:53] rangerpb: yes you can specify an env. by running 'tox -e ' like 'tox -e py27' to run only the python2 tests. Then if you want to go further and only specify a certain test, you can do that by passing more arguments like 'tox -e py27 -- tests/unittests/test_sshutil.py' to run only the tests in test_sshutil.py [18:53] think i got it [18:54] and done to class and method it looks like [18:54] yes! [19:09] rharper, [19:09] 2017-03-20 19:08:56,496 - network_state.py[WARNING]: NetworkState Version2: missing macaddress [19:10] that is expected path in lxc [19:10] right, if you don't have a macaddr and no other match criteria it leaves a warning [19:10] so i dont want the warning. [19:10] as its expected path. [19:10] well [19:10] no [19:10] but I understand; LXD knows the mac [19:10] no reason to put it in the config it generates [19:11] I find with dropping the warning; but oracles should emit everything they know [19:11] well, it certainly *could*, but it does not. so this would make a WARN statement in /vr/log/cloud-init.log in all containers [19:12] rangerpb: That doesn't set the hostname back, right? [19:12] Odd_Bloke, that == ? [19:12] rangerpb: Your MP. [19:13] i understand your position, but since its expected case... we can't generate warnings when its operating as expected. [19:13] Odd_Bloke, no it doesn't... are you talking about the context method I removed? [19:14] smoser: ack [19:14] rangerpb: Yeah; I believe we set it back previously because cloud-init's hostname handling gets confused later on if we don't. [19:14] Odd_Bloke, in talking with paulmey, he said it isn't needed. [19:14] i suppose there is some weird use case? [19:14] rangerpb: If it isn't needed, why are we setting the hostname at all? [19:15] azure needs it [19:15] That should be left for cc_set_hostname to do later in the cloud-init run. [19:15] azure needs it when the nic is bounced [19:15] rangerpb: OK, so I'm confused about what paulmey said wasn't needed. [19:15] :) [19:15] azure needs it when the nic is bounced so the dns can get set up correctly [19:15] Odd_Bloke, that make sense? [19:17] rangerpb: Right, so paulmey said that we didn't need to set the hostname _back_? [19:17] rharper, testing http://paste.ubuntu.com/24216898/ [19:17] In Azure terms, he's (presumably ;) correct. In cloud-init terms, however, ISTR that cc_set_hostname has some behaviour that is affected. [19:17] Give me a minute to see if I can remind myself of what it was. [19:18] smoser: ok [19:18] 2017-03-20 19:17:57,467 - network_state.py[DEBUG]: NetworkState Version2: missing "macaddress" info in config entry: eth0: {'dhcp4': True, 'dhcp6': True} [19:18] smoser: I think it's fine to pull it out entirely; one could submit the same v2 config to a VM [19:18] hm.. [19:18] smoser: paste me your netconfig and I'll fix [19:18] ok then. [19:19] lets just drop it to a debug [19:19] rharper, http://paste.ubuntu.com/24216919/ [19:20] Odd_Bloke, He didn't ... but ill follow up with him for sure [19:20] created with http://paste.ubuntu.com/24216920/ [19:20] actually, created with http://paste.ubuntu.com/24216923/ [19:20] I think we can relax the macaddr in the parse [19:20] rangerpb: OK, so when you said "in talking with paulmey, he said it isn't needed", what specifically were you referring to? [19:20] I did on the rendering side [19:21] Odd_Bloke, that context method [19:22] OK, I see. [19:22] rharper, and those errors on your mp by the c-ii are valid [19:22] i didnt' seem them , but no 'indent' on py27 [19:22] I don't know what the c-ii is [19:22] rangerpb: So the problem with setting the hostname and not setting it back is that cc_update_hostname (which is distinct from cc_set_hostname) has some logic that compares the previous hostname cloud-init detected to the desired one and behaves accordingly. [19:24] and how is the desired one determined ? [19:24] So in the case that "update_hostname: B" is given, but Azure wants "A", I think the following can happen: (1) Azure DS will set hostname to "A", (2) cc_update_hostname will set hostname to "B", (3) user reboots, (4) Azure DS sets hostname to "A", (5) cc_update_hostname compares previous-hostname and cloud-config, and determines no hostname change is needed. [19:24] Leaving the instance with hostname "A" instead of "B". [19:25] I _think_ that's the failure case. [19:25] Odd_Bloke, so ... you would prefer the use of the context method yes? [19:26] i need to work thsi out because its right where the test cases are failing and I dont want to do this twice [19:26] rangerpb: Well, I'm pretty sure we need to set the hostname back somehow. I'd done that with the context manager, but I'm not fussy. [19:27] yep ok [19:27] Odd_Bloke, ok, let me see if I can work that back around [19:28] though I do feel a little bit like I am stuck between Mom and Dad here [19:28] would nice to nuke the agent path in DataSourceAzure to simplify things [19:29] rharper, c-i bot [19:29] https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/319259 [19:29] err.. https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/320291 [19:29] smoser: looks like I need a py2 or 3 wrapper for using indent from textwrap [19:29] rangerpb: :( Sorry. [19:29] I ran into it a while ago, need to look up the wrapper [19:30] rharper, http://paste.ubuntu.com/24216970/ [19:31] smoser: k [19:31] do you want me to pull and push? [19:32] Odd_Bloke, what is the purpose of this conditional? -> https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceAzure.py#n123 [19:34] Odd_Bloke, and also why is line 124 in that important? shouldnt this always be run ? [19:35] smoser: ok, dropped the warning, and picked up your diff; commited and pushed back up [19:35] bbiab (picking up kiddos) [19:36] rharper, i can do it here. [19:36] ok [19:39] rangerpb: I believe that allows people to modify whether or not they want this behaviour. [19:40] I don't really recall why, I'm afraid. [19:42] i think Odd_Bloke is probably right above. [19:42] wrt the case that the temporary hostname is protectign against. [19:43] https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/319259 rharper, wrt your comment there, you could have used the same merge proposal. just push over the top of the one (--force) and launchpad will figure it out. [19:55] smoser: it pushed fine (without force) I did push to the same MP [19:55] at least I think I did [20:00] you pointed to another one [20:00] "I lacked the git-fu to force a proper rebase against itself, so I've pushed a new branch for review:" [20:09] oh, sorry [20:10] I don't think I repointed to the old one; but hopefully you'll find the new commits in my rebase-netconfig-v2-passthrough branch [20:18] smoser: I think I understand; you're saying I could have forced pushed my rebase branch onto the previous MR; and yes --force would have done that; I sort of have an irrational fear of --force and so I worked around it by starting afresh and then abandon the other; I think it does mean split discussions (which may be one of your concerns) [20:19] i dont really are... i was just letting you know. as even a rebase would have required --force [20:20] rharper, anyway. for right now i merged your branch. [20:20] i'm running an integration test on a few modules and then going to upload to ubuntu. [20:20] ok [20:21] so on the rebase; I've found that if I pull from remote (and basically re do the rebase a second time) and then push; it goes in, but I find that I need to do the rebase effectively twice annoying; I suppose I should just use --force there [20:22] or figure out why I can't fast-forward the remote (merge trunk in) === StoneTable is now known as aisrael === lunarlamp is now known as mariusv [20:30] powersj, https://jenkins.ubuntu.com/server/job/cloud-init-integration-x/123/display/redirect [20:31] that is curious. [20:31] Odd_Bloke, rangerpb, sorry I wasn't here earlier... [20:31] what is cloud-init-integration-x ? [20:31] is that daily ppa installed into x ? [20:32] I didn't know about the cc modules later on needing the original 'localhost' hostname [20:34] I was told that the context method was necessary for waagent, which didn't seem to be true [20:35] but I guess it's necessary to do the network bounce at that time in the cloud-init workflow [20:36] paulmey, i think i have another patch set up ... just booting the image on azure as we speak [20:36] if it works, ill update you and highlight you [20:36] good, thanks [20:38] smoser: that is run by checking out master, building, and running the integration tests. Those errors are fixed from my merge you pulled shortly ago. [20:39] long story short, master was broken and our integration tests found it. Ideally our tox tests would have caught it, but flake8 is too lax. pylint would have caught it though. [20:42] the tests found a bug \o/ [20:42] i think that's the first one [20:42] we got one last week too I believe :) [20:42] nice [20:46] powersj, yeah... iwas confused for a bit. i was thinking it was just testing xenial or xenial-proposed [20:46] which should not have shown the error. [20:46] ok, we do have a xenial-proposed test cloud-init-integration-proposed-x [20:53] rharper, you there ? [20:53] here [20:53] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1669860 [20:53] the bond rename thing... you have the same nic there miultiple times in the config [20:54] oversite [20:54] http://paste.ubuntu.com/24217410/ [20:54] ok. [20:54] wait [20:55] how do I get the ci tests to run again ? [20:55] tox -e citest -- run -v -n zesty --deb=cloud-init_all.deb [20:55] thats the easiest thing. [20:55] smoser: sorry, I'll upload a fix, the upper values should have been gone [20:55] rangerpb: are you trying to run integration tests or just the simple unit tests? [20:56] right ok. [20:56] integration, the unit tests finish now and I updated my PR [20:56] rangerpb: awesome! then trying the command smoser printed to run the integration tests might be good too [20:56] thats just local though right ? [20:56] the cloud-init_all.deb is produced if you build your working tree [20:58] smoser: it appears that you could read the list of devices out if /sys/class/net/bonding_masters [20:58] will need to create a second or third bond to see what that looks like with multiple, but that could provide a filter [20:58] rharper, i was justlooking and thinking maybe just ignore it if /sys/class/net//bonding [20:59] if that existed [20:59] well, the bond0.108 vlan also has the mac of the bond, [20:59] so that's a vlan, but it's mac is duplicate because of bond [20:59] it get's a bit stacked [20:59] and we can have a bridge over a bonded vlan [20:59] or vlan'ed bond [20:59] which was sort of why I was saying no-virt [21:00] but I know the lxc case exists [21:02] maybe its simpler. [21:02] apply_network_config_names() already filters on 'physical' [21:02] so we do have a bit of that information present. [21:02] but i guess it doesnt have the macs. hm. [21:04] nah. i dont think that helps [21:05] that's on *config* [21:05] input [21:07] my point was that when we "get_current_state"; the non-virt filter really does the right thing (except for LXC) which certainly could handle renames outside of cloud-init; frustrating ... [21:15] i got to run, i think the right hting to do is to just skip the bonds when collecting info for bymac in this case. [21:15] i wonder if it will even work though [21:15] i wonder if you can rename a part of a bond [21:15] seems unlikely [21:17] yeah, you cant. [21:17] but the goal i guess is nto that, but rather to not attempt the rename. [21:25] yeah [21:26] I'll be running trunk cloud-init through curtin vmtests setup to do networkd/netplan testing [21:26] will update here if I find anything [21:26] thanks for helping push this through === daniel1 is now known as Odd_Bloke === rangerpb is now known as rangerpbzzzz