=== spandhe_ is now known as spandhe
[06:48] Hi all. I recently made a new AMI on AWS after running through Linux system updates. I noticed the new one has some severe lag after startup.
[06:48] Per the logs, it seems the 'config_apt-pipelining' module is now taking 4-5 minutes to execute. Anyone run into similar issues before?
[07:20] Does anyone know what the apt-pipelining config module does? Is it purely disabling apt pipelining, or does it do anything else?
[09:28] Checked the code: it just writes a config file to the apt config dir to disable/enable apt pipelining, nothing else.
[09:28] so it's likely that something else is causing the issue and just happens to block that config module from finishing
[09:53] rharper: smoser: What clouds do you know of that use network_config?
[13:23] Odd_Bloke, it currently works on OpenStack with config drive and on NoCloud.
[13:23] and the 'fallback' will be in place on others.
[13:24] Odd_Bloke, currently it's in place only for local data sources (not requiring network)
[13:24] the next step is to add it for data sources that require a network, for example EC2 or OpenStack metadata.
[13:24] smoser: Sorry, I should have explained what I was looking for more fully: I'm trying to track down a problem with precise's ConfigDrive handling of network_config, and I'm looking for a place where the OpenStack configuration is known-good.
[13:25] smoser: Because I don't want to consider fixing it in precise if it turns out I'm just looking at a funky OpenStack configuration. :p
[13:26] oh. that.
[13:27] Odd_Bloke, what is it that you're looking at? Is it the OpenStack metadata service or config drive?
[13:28] smoser: Config drive.
[13:28] ok, so that has a shot at working, but even then i think that on precise the cloud-init local job probably doesn't fully block networking from coming up.
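For reference, the 09:28 finding (the module only writes an apt config file) can be illustrated with a minimal Python sketch. It assumes the drop-in contains a single `Acquire::http::Pipeline-Depth` line, which is the real apt option controlling HTTP pipelining (a depth of 0 disables it); the file name `90cloud-init-pipelining` and the helper names are assumptions for illustration, not copied from cloud-init's source. Writing one short file like this should take milliseconds, which supports the conclusion that the 4-5 minute delay comes from something else blocking the module.

```python
import os

# Assumed drop-in name, for illustration only.
PIPELINE_FILE = "90cloud-init-pipelining"


def pipelining_config(depth):
    """Return the apt.conf line that sets the HTTP pipeline depth.

    depth=0 disables pipelining; a positive integer enables it with that depth.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or depth < 0:
        raise ValueError("depth must be a non-negative integer")
    return 'Acquire::http::Pipeline-Depth "%d";\n' % depth


def write_pipelining_config(apt_conf_dir, depth):
    """Write the drop-in into apt_conf_dir and return the file's path."""
    path = os.path.join(apt_conf_dir, PIPELINE_FILE)
    with open(path, "w") as f:
        f.write(pipelining_config(depth))
    return path
```

Under these assumptions, `write_pipelining_config("/etc/apt/apt.conf.d", 0)` (run as root) would disable pipelining for all apt operations.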
[13:28] smoser: (I'm hoping that we'll be able to convince the partner not to offer precise in the region where they're seeing this issue, but I want to be sure of all the facts before pushing for that :)
[13:28] Because it EOLs in a year anyway, and this is a new region, etc.
[13:30] and so if it doesn't block networking from coming up, then best case we ifdown something and then ifup it again.
[13:30] the new stuff is better, in that we block networking from coming up.
[13:38] smoser: Well, looking at it, I'm not sure it _does_ work at all. It looks like the configuration is put into keys that are never read later; but I want to confirm that I'm not just dealing with a weird OpenStack configuration that cloud-init mishandles.
[13:38] smoser: (This was totally refactored by trusty, and that works fine)
[13:38] oh.
[13:41] So I want a cloud which does network_config "properly" so I can just validate my finding of brokenness. :)
[13:42] Odd_Bloke, and you want that to work with precise
[13:42] smoser: Well, once we know where we're at, we can go and talk to the partner about whether it's worth making it work.
[13:47] Odd_Bloke, ok. quickly reading that..
[13:47] i think that what is there is support for config drive v1
[13:47] which is probably not alive in any OpenStack cloud
[13:47] config drive v2 is what you'd probably see everywhere.
[13:47] v2 came out at least 3 years ago, probably
[14:54] smoser: did you see my ping re: the xenial cloud image not getting the ubuntu user created, which breaks when we add keys?
[14:54] yikes.
[14:54] no. i didn't.
[14:56] smoser: http://paste.ubuntu.com/15621102/
[14:56] ubuntu user not in /etc/passwd, so the ssh key add failed (xenial cloud image from 2016-04-03)
[14:56] running a synced curtin vmtest should trip it
[14:56] hm..
[14:56] (i.e., a new enough xenial cloud image)
[14:56] cat /etc/cloud/build.info
[14:56] build_name: server
[14:56] serial: 20160403-141429
[14:57] that works in lxc at least.
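To check which config drive format an instance actually sees (relevant to smoser's 13:47 point that precise only handles v1, while clouds ship v2), one could inspect the mounted drive. This sketch assumes v2 drives expose `openstack/latest/meta_data.json` and treats a top-level `meta.js` as the v1 marker; the v1 marker in particular is an assumption, and `config_drive_version` is a hypothetical helper, not part of cloud-init.

```python
import os


def config_drive_version(mount_point):
    """Guess the config drive format from the files on a mounted drive.

    Assumptions: v2 drives carry openstack/latest/meta_data.json, and the
    older v1 layout is recognised by a top-level meta.js file.
    Returns 2, 1, or None if neither layout is found.
    """
    v2_marker = os.path.join(mount_point, "openstack", "latest", "meta_data.json")
    if os.path.isfile(v2_marker):
        return 2
    if os.path.isfile(os.path.join(mount_point, "meta.js")):
        return 1
    return None
```

A v2 result on the partner's cloud would be consistent with the observation that precise's v1-oriented code never reads the network configuration.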
[14:57] lxd
[14:57] you're not useradding ubuntu?
[14:57] (i.e., i just launched an instance here and there is an ubuntu user)
[14:57] where is that normally added? (default users/groups?)
[14:57] it's part of the config (/etc/cloud/cloud.cfg)
[14:57] right. cloud-init does it.
[14:59] also, there is another one related to booting an image a second time: http://paste.ubuntu.com/15630937/
[15:08] rharper, so what happened is you failed to get the datasource
[15:08] and you used the fallback datasource
[15:09] which just generates ssh keys
[15:09] heh, *I didn't* fail
[15:09] and there is apparently a bug in that where it creates a directory rather than a symlink
[15:09] :)
[15:09] well, maybe I'm speaking too soon
[15:09] always a chance of a PEBCAC
[15:10] we're providing the normal seed via ISO
[15:10] well, it is probably failing to find a source.
[15:10] if you can get a console log it will probably mention that
[15:11] fallback datasource BAD THINGS TO COME or something like that
[15:11] cloud-init.log or?
[15:11] cloud-init.log should have WARN in it
[15:11] yeah
[15:11] hard to know what to look for
[15:11] why would it fail?
[15:11] to find the ISO?
[15:12] that's after it actually loads and reads the seed from /dev/vdc
[15:17] smoser: which datasource should the seed.img in our curtin tests show up under? NoCloud, right?
[15:17] yeah.
[15:18] http://paste.ubuntu.com/15631292/
[15:20] rharper, ok. i'll take a look in 10 minutes; trying to finish something up for matsubara
[15:20] wouldn't the fallback seed still read and use /etc/cloud/cloud.cfg (which is where the default users come from?)
[15:20] smoser: thanks
[15:21] yeah.. i'm not sure why the warning about the user.
=== jgrimm is now known as jgrimm-afk
[15:38] smoser: so, another reason curtin needs to disable cloud-init networking: NIC name races. cloud-init emits the systemd .link stuff, it clashed with the udev rules we wrote, and now I've got a renam5-eth2 in there
[15:39] hm..
[15:39] it shouldn't clash though.
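Following the 15:11 suggestion to look for WARN in cloud-init.log, a small filter can pull out the warning lines and flag whether the fallback `DataSourceNone` was used. This is a sketch: it assumes warnings carry a `WARN` or `WARNING` marker and that the fallback shows up in the log under the name `DataSourceNone`; `scan_cloud_init_log` is a hypothetical helper.

```python
import re

# Assumption: warning lines contain a standalone WARN or WARNING marker.
WARN_RE = re.compile(r"\bWARN(ING)?\b")


def scan_cloud_init_log(lines):
    """Return (warning_lines, fell_back) for an iterable of log lines.

    fell_back is True if any line mentions DataSourceNone, the fallback
    datasource used when no real datasource was found.
    """
    warnings = []
    fell_back = False
    for line in lines:  # single pass, so an open file object works too
        if WARN_RE.search(line):
            warnings.append(line.rstrip("\n"))
        if "DataSourceNone" in line:
            fell_back = True
    return warnings, fell_back
```

For example, `with open("/var/log/cloud-init.log") as f: warnings, fell_back = scan_cloud_init_log(f)`; a `True` for `fell_back` alongside a NoCloud seed would match the situation discussed at 15:08.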
[15:39] as 70-persistent should always be favored
[15:39] well
[15:39] it's not =/
[15:39] the fallback code decided that my eth2 would be a nice eth0 link
[15:40] then I suppose it got a rename event, and raced (and 70 won)
[15:40] but not before ifup had run and set up some link information in the kernel
[15:41] ok. i'll start poking
[15:41] * rharper is trying again with networking disabled
[15:42] that fixed my test run
[15:43] it appears that the disturbance of networking takes cloud-init down some other path that fails to use the local datasource
[15:43] oh. yeah.
[15:43] it does.
[15:43] that's why you're seeing the DataSourceNone
[15:43] because networking never comes up.
[15:44] that seems odd, especially for a NoCloud ds
[15:44] but I'm sure I'm missing something
[16:44] rharper, what curtin tests were failing for you?
[17:15] rharper, ^ i just successfully ran all tests on diglett except for one trusty one (which i think failed due to IO load with --processes=-1)
[17:16] i.e.: http://paste.ubuntu.com/15634350/
[17:40] smoser: it's a new one I'm adding for vlan stuff
[17:40] well, pfft.
[17:40] in particular, it's due to the RTNETLINK stuff
[17:41] smoser: I think the more general concern is that when networking fails, cloud-init fails even when the NoCloud datasource is present
[17:41] and networking is failing due to cloud-init networking (in my case, emitting the systemd link file is the direct cause)
[17:41] so before I add the vlan case, which triggers this 100% of the time,
[17:42] I'm adding the code to emit network: config: disabled in the curtin target
[17:42] well, boot failed because cloud-init wanted networking (as do other things in boot). they wait until networking is available (network.target)
[17:42] and network.target was never reached, so it failed.
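The 17:42 fix (emitting `network: config: disabled` into the curtin target) amounts to dropping a cloud-init config fragment into the installed system. Here is a minimal sketch. It assumes the target root is mounted at `target` and that cloud-init on first boot picks up `*.cfg` files under `etc/cloud/cloud.cfg.d/`. The file name and function are illustrative only; this is not curtin's actual code in `curtin/net/__init__.py`.

```python
import os

# Illustrative drop-in name; cloud-init merges *.cfg files from cloud.cfg.d.
DISABLE_CFG_NAME = "99-disable-network-config.cfg"
# YAML flow syntax for: network: {config: disabled}
DISABLE_CFG_BODY = "network: {config: disabled}\n"


def disable_cloud_init_networking(target):
    """Write the disable-networking fragment under target and return its path."""
    cfg_dir = os.path.join(target, "etc", "cloud", "cloud.cfg.d")
    os.makedirs(cfg_dir, exist_ok=True)  # safe to call on every install
    path = os.path.join(cfg_dir, DISABLE_CFG_NAME)
    with open(path, "w") as f:
        f.write(DISABLE_CFG_BODY)
    return path
```

Writing the fragment as a plain string avoids a YAML library dependency in the installer. The trade-off is that the content must stay a simple, fixed document.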
[17:43] i don't think it's related to the systemd link file
[17:43] see /lib/udev/rules.d/80-net-setup-link.rules
[17:43] .link files are only paid attention to if NAME==""
[17:44] and your 70-persistent... would have set NAME=
[17:45] yes
[17:45] it's related
[17:45] the link file forced an iface rename of eth2 to eth0
[17:45] and it's racing with existing data in the routing table
[17:45] how?
[17:45] I don't fully understand the sequence
[17:45] but when the vlan config goes to bring the link up (ip link set <iface> up) on the interface for the vlan
[17:46] it finds an existing route
[17:46] and fails
[17:46] in the routing table, I see things like eth2-rename
[17:46] and if I disable cloud-init networking, the eni is perfectly fine and solid
[17:46] I'll post the branch in a minute
[17:46] if you'd like to debug it more
[17:54] smoser: https://code.launchpad.net/~raharper/curtin/trunk.test-vlan/+merge/291023; if you remove the bit in curtin/net/__init__.py where I now write a config to disable cloud-init networking, the XenialTestNetworkVlan testcase will fail and you can boot the install disk to see what's going on
=== jgrimm-afk is now known as jgrimm
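smoser's argument at 17:43 is about rule ordering. udev processes rule files in lexical order, so `70-persistent-net.rules` runs before `80-net-setup-link.rules`, and the latter only applies a `.link`-derived name when `NAME` is still empty. The toy model below captures just that precedence, keyed by MAC address. It is a simplification with hypothetical names, not udev's actual logic.

```python
def resolve_name(mac, kernel_name, persistent_rules, link_names):
    """Toy model of which name an interface ends up with.

    persistent_rules: MAC -> NAME assigned by a 70-persistent-net rule.
    link_names: MAC -> name a systemd .link file would assign.
    Returns (final_name, source) where source is "persistent", "link",
    or "kernel".
    """
    name = ""  # udev's NAME starts out empty
    # 70-persistent-net rules are processed first and may set NAME.
    if mac in persistent_rules:
        name = persistent_rules[mac]
    # 80-net-setup-link.rules: the .link name is used only if NAME is empty.
    if name == "" and mac in link_names:
        return link_names[mac], "link"
    if name:
        return name, "persistent"
    return kernel_name, "kernel"
```

The model only describes the intended precedence, under which the persistent `eth2` always wins. rharper's observation (a rename that raced with state ifup had already put in the kernel) is a runtime timing issue that a static model like this cannot show.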