[03:08] blackboxsw: ping
[03:08] hey
[03:08] pong
[03:08] i was just about to accept your merge
[03:08] i was going to just put a comment in
[03:08] except UrlError as urle:
[03:08] smoser: sorry was tweaking the description to be more appropriate
[03:08] message = str(urle)
[03:08] # older versions of requests may not get the url
[03:08] # into the message.
[03:08] and fix up the commit message.
[03:08] yeah.
[03:08] and that'd be it.
[03:08] ahh sounds good. want me to push
[03:08] then i'm good.
[03:08] i'll comment and you can take it.
[03:08] +1 will do
[03:09] just approved with changes on the mp
[03:09] smoser: I was gonna land https://code.launchpad.net/~rjschwei/cloud-init/+git/cloud-init/+merge/333575
[03:09] please grab
[03:09] thanks
[03:09] i'm fine with that too
[03:09] I can wait til morning on that 2nd branch though
[03:09] ok will do tonight
[03:09] fix c-i first though
[03:09] agreed
[03:09] ie, the urlerror message first
[03:09] just to have less broken tips
[03:09] thanks
[03:10] and i'm out.
[03:10] later
[03:10] thanks have a good one
=== shardy is now known as shardy_lunch
=== shardy_lunch is now known as shardy
[15:37] oh for pete's sake jenkins
[15:37] blackboxsw fixed cloud-init but then jenkins cries
[15:37] https://jenkins.ubuntu.com/server/job/cloud-init-ci-nightly/161/console
[16:50] smoser: can we rerun nightly?
[16:50] to get a good value
[16:51] powersj: smoser: I'll fix up qa-scripts/scripts/launch-ec2 for bionic while smoser is working a unit test for fallback_nic on upgrade
[16:51] launch-ec2 on my end was working from xenial :/ I'll fix it up on bionic now
[16:53] $ echo raw support for rharper | haste -r
[16:53] https://hastebin.com/raw/zurabikoko
[16:53] \o/
[16:53] can you hastebin your haste tool ?
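The `UrlError` fragments quoted at the top of the log can be assembled into a small self-contained sketch. `UrlError` below is a minimal stand-in for cloud-init's real exception class (the real one lives in `cloudinit.url_helper`), and the append-the-url fallback is an assumption about the fix being reviewed, not the merged code:

```python
# Minimal stand-in for cloud-init's UrlError; the real class is in
# cloudinit.url_helper. The url-appending fallback below is an assumed
# sketch of the fix under discussion, not the actual merged change.
class UrlError(IOError):
    def __init__(self, cause, url=None):
        super().__init__(str(cause))
        self.url = url


def describe_url_error(urle):
    message = str(urle)
    # older versions of requests may not get the url
    # into the message, so append it when it is missing.
    if urle.url and urle.url not in message:
        message += " (url=%s)" % urle.url
    return message


if __name__ == "__main__":
    err = UrlError(Exception("connection timed out"),
                   url="http://169.254.169.254/latest/meta-data/")
    print(describe_url_error(err))
```

The point of the comment in the log is that the error text alone may omit the failing URL on older requests versions, so the caller normalizes it.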
[16:54] blackboxsw: https://hastebin.com/akazezaroh
[16:54] that's what i have so far
[16:54] thx smoser will pull that in
[16:54] and handle other issues
[16:59] blackboxsw: it seems weird that boto doesn't expose 'InvalidGroup.NotFound'
[16:59] or any of those.
[17:00] or even 'code' on the Error or something
[17:00] Yeah that seems broken. string parsing in the error message is not appropriate
[17:01] not an appropriate design decision
[17:01] I wonder if there's a structure I can import.
[17:01] I'll look at boto3 modules
[17:06] ahh recent python3-boto3 in bionic has some exception goodness
[17:44] smoser: powersj just pushed qa-scripts/scripts/launch-ec2 for bionic
[17:45] de823f2..58c9a97
[17:45] will test the failed upgrade
[17:47] blackboxsw: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/1732917-fix-fallback-interface
[17:47] see what you think about that
[17:47] i don't think we actually *need* the setter
[17:47] (though i could add one)
[17:49] smoser: spelling inteface
[17:52] other than the typo on the class variable '_fallback_inteface' it should work.
[17:53] blackboxsw: https://hastebin.com/kivunetopa
[17:53] (yeah, and i fixed those and pushed --force)
[17:54] tox now passes
[17:54] blackboxsw launch_ec2 is really nice.
[17:55] per your latest 'haste' I think boto3 on bionic may have an official exception for that. I'll try (changing my keypair)
[17:56] bummer, still botocore.exceptions.ClientError: An error occurred (InvalidKeyPair.NotFound) when calling the DescribeKeyPairs operation: The key pair 'cloud-init-integration-chad' does not exist
[17:56] ok taking your try/except changes
[17:56] and thanks
[17:56] I want to add the ipv6 setup support and hit the blog with it in hand.
[17:57] let's get past this upgrade SRU bump
[17:57] blackboxsw: i launched an instance, ssh'd to it
[17:58] and then it disappeared
[17:58] oh. you terminated it for me ?
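The "exception goodness" being discussed: botocore's `ClientError` carries a parsed error structure on `exc.response["Error"]["Code"]`, which avoids string-parsing the human-readable message for codes like `InvalidGroup.NotFound` or `InvalidKeyPair.NotFound`. To stay self-contained (no AWS credentials or boto3 install required), the sketch below works on the response dict shape; `error_code` and the fake response are illustrative, not boto3 API:

```python
# botocore's ClientError exposes a parsed structure on
# exc.response["Error"]["Code"]; matching on that code avoids the
# string-parsing of error messages complained about above. The helper
# name and fake response here are illustrative stand-ins.
def error_code(response):
    """Return the service error code from a botocore-style response dict."""
    return response.get("Error", {}).get("Code", "")


# shape of the response attached to the InvalidKeyPair failure seen above
fake_response = {
    "Error": {
        "Code": "InvalidKeyPair.NotFound",
        "Message": ("The key pair 'cloud-init-integration-chad' "
                    "does not exist"),
    }
}

if __name__ == "__main__":
    if error_code(fake_response) == "InvalidKeyPair.NotFound":
        print("key pair missing; a launch script could create it here")
```

With real boto3 this would sit in `except botocore.exceptions.ClientError as e:` and inspect `error_code(e.response)` rather than `str(e)`.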
[17:59] seems like keep_alive should be default :)
[18:01] smoser: yeah --keep-alive
[18:01] sorry
[18:01] could surface kill-it :)
[18:01] I'll change that param
[18:02] --destroy :)
[18:08] ugh
[18:08] launched instance
[18:08] typed 'apt-get update'
[18:08] 0% [Connecting to security.ubuntu.com (2001:67c:1560:8001::14)]
[18:08] hung
[18:10] same
[18:10] other apt repos worked
[18:11] all local to amazon though
[18:13] works on 0.7.7
[18:14] so that's bad news
[18:14] the others resolve to ipv4 though maybe ?
[18:15] yet why would 0.7.7 work
[18:17] well, we get an ipv6 address
[18:17] so something notices that and returns the ipv6 address for security.ubuntu.com
[18:18] and then we do not have outbound connectivity i guess
[18:19] upgrading from 0.7.7 xenial (with working apt connectivity) -> 17.1
[18:19] that should be fine, no ?
[18:19] checking to be sure
[18:20] blackboxsw: did you recreate this ?
[18:20] the failure in that bug
[18:21] not yet smoser
[18:21] trying to though
[18:21] i launched instance
[18:21] upgraded
[18:21] rebooted
[18:21] no WARN
[18:22] ugh
[18:23] no error on my side. on upgrade path.
[18:23] trying specifically from 0.7.9~233
[18:23] for my next pass
[18:31] hm.
[18:33] hmm is right, from 0.7.9 -> 17.1 (upgrade without clean) I reboot without error
[18:33] as cloud-init doesn't re-run
[18:34] hrm. checking that bug traceback again
[18:39] blackboxsw: perhaps this is not on ec2
[18:40] he never says he is.
[18:40] definitely datasource got used. but not sure.
[18:41] smoser: weird comment from him
[18:41] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1732917/comments/1 :( hmm
[18:41] Launchpad bug 1732917 in cloud-init (Ubuntu) "17.1 update breaks EC2 nodes" [Undecided,New]
[18:41] he says the failure happens when restarting the cloud-init? but goes away when restarting the node?
[18:41] I'm misreading that
[18:42] I'm just not really sure what that's saying
[18:43] yeah.
[18:43] ohh maybe running cloud-init init or something?
[18:43] we could see what happens on openstack if we set it to use the Ec2 datasource
[19:00] blackboxsw: good news is that this isn't as serious as it seemed at first
[19:03] yeah, I'm just trying to see if maybe a complex networking setup would cause this?
[19:03] not sure
[19:03] we need to identify the issue with ipv6 too
[19:03] the hang on security.ubuntu
[19:03] stepping away for 20
[19:03] gotta help w/ lunch
[19:27] blackboxsw: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/333905
[19:28] ugh
[19:29] it's not too big a deal i don't think
[19:31] smoser: blackboxsw: do you have logs from the upgrade cloud-init reboot ?
[19:32] http://paste.ubuntu.com/25982990/
[19:32] thx
[19:32] there is a /var/log/cloud-init
[19:33] http://paste.ubuntu.com/25982994/
[19:33] 2late
[19:33] line 109 is interesting there.
[19:33] oh. that's tmp file deletion
[19:33] lucky it didn't fail
[19:34] smoser: this is both original and upgraded in the same file ?
[19:36] smoser: your paste looks like launch, reboot, upgrade, reboot ? is that right ?
[19:37] 2017-11-17 18:32:43,838 - stages.py[DEBUG]: cache invalid in datasource: DataSourceEc2
[19:37] first time i've seen this
[19:37] smoser: actually, I'm really confused;
[19:37] 2017-11-17 18:17:36,206 - handlers.py[DEBUG]: finish: modules-final: SUCCESS: running modules for final
[19:37] 2017-11-17 18:19:06,783 - util.py[DEBUG]: Cloud-init v. 17.1 running 'single' at Fri, 17 Nov 2017 18:19:06 +0000. Up 102.29 seconds.
[19:38] rharper: that was launch with old, upgrade, reboot, possibly more reboots.
[19:38] 0.7.9 finished at 18:17
[19:38] then there's a single mode ?
[19:38] after upgrade ?
[19:38] I would have expected the reboot
[19:38] i don't think i ran that.
[19:38] well
[19:38] your log shows you did
[19:38] * rharper looks at blackboxsw
[19:38] oh
[19:39] chad might have run that for me.
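The log-reading the three of them are doing by hand ("launch, reboot, upgrade, reboot?") can be sketched as a small parser over the `Cloud-init v. X running 'MODE' ... Up N seconds.` lines quoted above. The regex and helper name are our own, matching the pasted log format rather than any cloud-init API:

```python
import re

# The "Cloud-init v. X running 'MODE' at ... Up N seconds." lines quoted
# from /var/log/cloud-init.log let you reconstruct the sequence of boots
# and modes. This regex is a sketch keyed to the pasted format.
RUNNING_RE = re.compile(
    r"Cloud-init v\. (?P<version>\S+) running '(?P<mode>[^']+)' "
    r"at (?P<when>[^.]+)\. Up (?P<up>[\d.]+) seconds")


def boot_sequence(log_text):
    """Return (version, mode, uptime_seconds) tuples in log order."""
    return [(m.group("version"), m.group("mode"), float(m.group("up")))
            for m in RUNNING_RE.finditer(log_text)]
```

On the paste above, an unexpected `('17.1', 'single', ...)` entry between the normal boot stages is exactly the postinst-driven `single` run they go on to identify.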
[19:39] --proposed reboots
[19:39] same in changes
[19:39] 2017-11-17 18:31:21,520 - util.py[DEBUG]: Cloud-init v. 17.1 running 'single' at Fri, 17 Nov 2017 18:31:21 +0000. Up 277.24 seconds.
[19:39] 2017-11-17 18:31:21,521 - stages.py[DEBUG]: Using distro class
[19:39] what's 17.1 single mode doing ?
[19:40] hm.
[19:40] yeah not sure there
[19:40] oh
[19:40] don't we just boot; sudo apt update && sudo apt install cloud-init && reboot ?
[19:40] it's the upgrade
[19:40] let me see.
[19:40] what is upgrade doing ?
[19:40] 2017-11-17 18:31:21,523 - cc_apt_pipelining.py[DEBUG]: Wrote /etc/apt/apt.conf.d/90cloud-init-pipelining with apt pipeline depth setting 0
[19:40] 2017-11-17 18:31:21,523 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
[19:40] 2017-11-17 18:31:21,523 - util.py[DEBUG]: Read 14 bytes from /proc/uptime
[19:40] 2017-11-17 18:31:21,524 - util.py[DEBUG]: cloud-init mode 'single' took 0.063 seconds (0.06)
[19:40] I guess it's fixing up the apt conf ?
[19:41] but, that reloads the on-disk object prior to reboot
[19:41] debian/cloud-init.postinst
[19:41] which might be what breaks apt to security.ubuntu?
[19:41] it only changes pipelining
[19:43] i'm guessing that code should be version-fixed in some way
[19:43] no
[19:45] but that is just noise
[19:45] ok
[19:45] just walking through the log
[19:45] I didn't expect that
[19:45] yeah, we should probably fix that
[19:46] ah
[19:46] on ec2, instance is always invalid
[19:46] I have a branch, but didn't finish it, to read instance_id from sys/dmi
[19:47] so we never read the cache at local time
[19:47] ahh ok
[19:47] ?
[19:48] we need to capture the system_uuid to compare
[19:48] right. we always re-discover.
because there is no check
[19:48] yeah
[19:48] we don't do that
[19:48] so, the local cache check says, it's invalid
[19:48] that's expected at this point (we always do this on ec2)
[19:50] hmmm is it possible that get_fallback_nic returns None on some platforms
[19:52] but it would have blown up in local mode, no ?
[19:52] if I'm reading the bug log right, it was stage init (versus init-local)
[19:55] yeah it should have fallen apart in init-local
[19:56] right if we were Ec2 proper, we wouldn't actually get to the DatasourceEc2
[19:56] well, Local exits on non-ec2
[19:56] we would've already detected DatasourceEc2Local and not run init-network
[19:56] yeah
[19:56] that's why i asked if he was on Amazon
[19:56] i don't think it is
[19:57] and that we can try on serverstack
[19:57] you can force it to run ec2 even on Openstack ? instead of the OpenstackDS ?
[19:57] if we run dpkg-reconfigure cloud-init we can force it right
[19:57] just uncheck OpenStack
[19:58] I *think*
[19:58] * blackboxsw fires up my vpn
[20:01] ok creating a xenial instance and will attempt the upgrade
[20:22] ok clean reboot on 0.7.9 openstack instance w/ OpenstackDatasource gets me a warning banner
[20:30] and upgrading/rebooting doesn't hit that traceback about fallback_nic on the obj.pkl because Ec2 claimed invalid obj.pkl and recreated it.
[20:31] so Openstack images limited to Ec2Datasource can't reproduce this on upgrade path
[20:32] Openstack-ec2datasource: ✔
[20:33] here are the logs as that's a bit complex
[20:33] here are the logs http://paste.ubuntu.com/25983308/
[20:35] and for the record dpkg-reconfigure cloud-init did allow me to unset OpenstackDatasource on an openstack instance
[20:36] sure. and it should.
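The failure mode under discussion — `_fallback_interface` defaulting to `None` on an old pickled datasource (`obj.pkl`), with `get_fallback_nic` possibly returning `None` — suggests the lazily-cached property shape in smoser's 1732917-fix-fallback-interface branch. This is a sketch of that pattern, not the branch itself; `_discover_fallback_nic` stands in for cloud-init's real NIC discovery:

```python
# Sketch of the cached fallback-interface property discussed above; the
# real change is in the 1732917-fix-fallback-interface branch.
# _discover_fallback_nic is a stand-in for cloud-init's actual
# fallback-NIC discovery code.
class DataSource:
    _fallback_interface = None  # cached; None means not yet discovered

    @property
    def fallback_interface(self):
        # Discover lazily on first access, so an old obj.pkl unpickled
        # after upgrade (which predates this attribute or cached None)
        # still yields a usable value instead of crashing.
        if self._fallback_interface is None:
            self._fallback_interface = self._discover_fallback_nic()
        return self._fallback_interface

    def _discover_fallback_nic(self):
        return "eth0"  # placeholder for real discovery logic
```

Because the getter re-discovers whenever the cached value is `None`, no setter is strictly needed, which matches "i don't think we actually *need* the setter" earlier in the log.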
[20:36] just felt I needed to affirm my "I *think*" comment
[20:37] smoser: I'm testing your sandbox dhcpclient branch
[20:37] will approve shortly
[20:41] blackboxsw: hrm; so we don't yet have a plausible path where we reload an EC2 datasource
[20:41] yeah not that I can figure currently
[20:42] what about AliYun ?
[20:43] it can run at local and net (DEP_FILESYSTEM, DEP_NETWORK)
[20:44] and it will get the .fallback and the network config properties, but EC2Local won't run
[20:44] which I think gets us the path we're on; that the variable defaults to None, and no path to set it to a fallback value that's not None
[20:45] smoser: do you have an aliyun account ?
[20:45] rharper: no. i don't think so.
[21:00] maybe we need to spitball, but smoser your patch seems like it would fix this path, however we got there
[21:01] blackboxsw: yeah. i think so too. :)
[21:01] and we need the better save too
=== Hazelesque_ is now known as Hazelesque
[21:05] so smoser yeah with public ipv6 configuration, I can't get to security.ubuntu
[21:05] as in, if I dhcp6, apt times out
[21:07] blackboxsw: is it possible that our security group is just set up incorrectly ?
[21:07] not allowing outbound ipv6
[21:08] ahh very
[21:09] blackboxsw: that does somewhat still identify a regression
[21:09] blackboxsw: I'm happy with the smoser patch; and I suppose that we can't yet find a path to the failure should mean that the impact is narrow; but it's rather frustrating that it's not obvious how we hit that path
[21:09] but it's not really one we could do something about
[21:09] we can't easily enable ipv6 when it was enabled in the metadata and then not have the system use it.
[21:15] blackboxsw: rharper chat ?
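The "security group not allowing outbound ipv6" suspicion can be checked against the data EC2 returns for a security group. The dict shape below mirrors what boto3's `describe_security_groups` returns (`IpPermissionsEgress` / `Ipv6Ranges` / `CidrIpv6`); `allows_ipv6_egress` is our own helper name, and a real check would also consider narrower IPv6 CIDRs:

```python
# Check whether a security group allows any outbound IPv6 traffic,
# the misconfiguration suspected above for the security.ubuntu.com
# hang. The dict shape mirrors boto3 describe_security_groups output
# (IpPermissionsEgress / Ipv6Ranges); the helper name is our own, and
# only the wide-open ::/0 case is checked in this sketch.
def allows_ipv6_egress(security_group):
    for perm in security_group.get("IpPermissionsEgress", []):
        for v6 in perm.get("Ipv6Ranges", []):
            if v6.get("CidrIpv6") == "::/0":
                return True
    return False
```

With real boto3 this would run over `ec2.describe_security_groups()["SecurityGroups"]`; a group that only carries `IpRanges` with `0.0.0.0/0` would explain IPv4 working while IPv6 apt traffic hangs.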
[21:20] y
[21:20] https://hangouts.google.com/hangouts/_/canonical.com/cloud-init?authuser=0
[21:25] Yeah lost network there again
[21:34] approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/333905
[22:07] blackboxsw: old deb
[22:07] wget https://launchpad.net/ubuntu/+archive/primary/+files/cloud-init_0.7.9-233-ge586fe35-0ubuntu1~16.04.2_all.deb
[22:07] thx smoser
[22:09] dpkg install that deb
[22:09] rm -Rf /var/lib/cloud /var/log/cloud-init
[22:09] reboot
[22:09] apt-get install cloud-init
[22:10] cloud-init init
[22:10] then i tried to fix with my deb (dpkg -i)
[22:10] and run cloud-init init
[22:10] again
[22:10] 2017-11-17 22:05:41,781 - DataSourceEc2.py[WARNING]: unexpected metadata 'network' key not valid: None
[23:27] ok success
[23:27] functional branch is at
[23:27] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/fix-ec2-fallback-nic
[23:28] needs tests