[01:34] <smoser> blackboxsw: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-a/25/console
[01:34] <smoser> thoughts ?
[01:36] <smoser> powersj: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-x/26/console ? do you have any idea on that one?
[01:37] <smoser> Waiter InstanceRunning failed: Waiter encountered a terminal failure state
[01:38] <smoser> 17 seconds after it launched the instance it encountered a terminal failure state.
[01:39] <powersj> so it was doing the deb install when it encountered the error
[01:39] <powersj>     self.instance.wait_until_running()
[01:39] <smoser> well, probably not
[01:40] <smoser> not 17 seconds after launch of the instance
[01:40] <powersj> so the instance was booting and we were waiting
[01:40] <smoser> yeah
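For context, the failing call is boto3's instance_running waiter; a waiter reports a "terminal failure state" when the polled instance lands in a failure acceptor state (such as shutting-down or terminated) rather than simply timing out. A minimal sketch of the equivalent call, with a placeholder region and instance id:

    import boto3
    from botocore.exceptions import WaiterError

    # Placeholder region and instance id, not values from the failed run.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    waiter = ec2.get_waiter("instance_running")
    try:
        # Polls DescribeInstances (15s delay, 40 attempts by default); it
        # fails immediately if the instance reaches 'shutting-down' or
        # 'terminated' -- the "terminal failure state" in the log above.
        waiter.wait(InstanceIds=["i-0123456789abcdef0"])
    except WaiterError as err:
        print("Waiter InstanceRunning failed:", err)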
[02:36] <smoser> ok. playing a bit more, http://paste.ubuntu.com/p/sFJbTP5Brn/
[02:36] <smoser> that shows the "gain" of easily decorating the class.
[02:37] <smoser> and explains skip_by_date decorator some.
[02:37] <smoser> hope to have an MP for that tomorrow.
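The paste has since expired; a hypothetical sketch of what a skip_by_date test decorator could look like (names and behavior assumed, not necessarily cloud-init's actual implementation):

    import datetime
    import functools
    import unittest

    def skip_by_date(date_str, bug):
        """Hypothetical sketch: skip a known-failing test until date_str
        (YYYY-MM-DD), after which the test runs again, so the skip cannot
        quietly live forever."""
        deadline = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()

        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if datetime.date.today() <= deadline:
                    raise unittest.SkipTest(
                        "skipped until %s for bug %s" % (deadline, bug))
                return func(*args, **kwargs)
            return wrapper
        return decorator

    # usage (hypothetical): @skip_by_date("2018-08-01", bug="lp:1234567")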
[02:37] <blackboxsw> nice, didn't realize you were still working.
[02:37] <smoser> that was just still in my head
[02:37] <smoser> integration test failures suck
[02:38] <smoser> with that... i do need to go afk.
[02:38] <smoser> have a nice night all.
[02:39] <blackboxsw> I've started adding descriptions to run/failed jenkins jobs that have hit the keyserver errors.
[02:39] <smoser> hm.. the name won't be right...
[02:39] <smoser> :-(
[02:39] <smoser> ok. later.
[02:39] <blackboxsw> later. I'm kicking another integration run and will see what gives on that ec2 wait traceback
[02:41] <blackboxsw> powersj: could that terminal failure state be a result of maybe kicking off two ec2 integration tests simultaneously? like a cleanup job on one wiped all live instances etc?
[02:41] <blackboxsw> part of the teardown based on keys or something
[02:43] <powersj> that's what I thought
[02:43] <powersj> I didn't see that instance listed though
[02:44] <powersj> is it still occurring?
[04:04] <blackboxsw> all green now powersj
[04:04] <blackboxsw> just finished. some intermittent error.
[04:04] <blackboxsw> smoser: for tomorrow, looks like we hit all green on existing integration patchsets.
[04:04] <blackboxsw> I'm outta here
[16:13] <blackboxsw> rharper: on this azure instance with the renamed cirename0, it looks like netplan wasn't waiting on cirename0 per journal
[16:13] <blackboxsw> systemd-networkd-wait-online[743]: ignoring: cirename0
[16:20] <blackboxsw> hrm
[16:20] <blackboxsw> systemd-networkd[722]: eth0: Interface name change detected, eth0 has been renamed to rename3.
[16:20] <blackboxsw> systemd-networkd[722]: rename3: Interface name change detected, rename3 has been renamed to eth0.
[16:20] <blackboxsw> seems like a bit of thrashing in kernel renames of eth0
[16:35] <smoser> the rename3 hops are udev
[16:35] <smoser> persistent-net rules, i think
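For reference, the renameN names typically appear while udev is renaming an interface whose target name is momentarily taken; the classic persistent-net rule format pins a name by MAC. An illustrative rule (placeholder MAC, not from this instance):

    # /etc/udev/rules.d/70-persistent-net.rules (illustrative)
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0d:3a:xx:xx:xx", NAME="eth0"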
[17:02] <cyphermox> blackboxsw: /etc/cloud/cloud.cfg.d
[17:03] <cyphermox> or whatever the name of the directory is -- if you deployed the system with maas, cloud-init renames things at boot too.
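For reference, the documented drop-in in that directory that stops cloud-init from rendering (and thus renaming) network config entirely is a one-line yaml file:

    # /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
    network: {config: disabled}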
[17:04] <blackboxsw> cyphermox: right, there was a race w/ cloud-init trying a rename on this one azure instance but failing because a 2nd nic came up as eth0 in the meantime. I was trying to peek at the other players renaming interfaces at the same time. I just turned on systemd-networkd debug to check out what happens on next boot
[17:05] <blackboxsw> end result was that the azure instance was left with a nic named 'cirename0' (which was cloud-init's doing)
[17:06] <blackboxsw> yeah even after reboot, systemd-networkd-wait-online.service is camping out for 2 minutes
[17:06] <blackboxsw> checking the debug journal now
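One common way to turn on that networkd debug logging, assuming a systemd drop-in (blackboxsw may have used a different mechanism):

    # Raise systemd-networkd's log level via a drop-in, then restart and
    # read the debug output from the journal.
    mkdir -p /etc/systemd/system/systemd-networkd.service.d
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' \
        > /etc/systemd/system/systemd-networkd.service.d/10-debug.conf
    systemctl daemon-reload
    systemctl restart systemd-networkd
    journalctl -b -u systemd-networkd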
[17:07] <blackboxsw> just looking through this now https://github.com/systemd/systemd/issues/7143
[17:07] <blackboxsw> and this journal http://paste.ubuntu.com/p/W4CT8g4Tyq/
[17:09] <blackboxsw> and looking specifically at wait-online-service http://paste.ubuntu.com/p/RMcBf6YYjN/
[17:13] <blackboxsw> so it certainly isn't waiting on cirename0, just on eth0, which seems to be out to lunch for some reason. I'll poke to see if I can determine what networkd thinks eth0 actually is (like mac address etc).
[17:13] <blackboxsw> cat /run/systemd/network/10-netplan-eth0.*
[17:13] <blackboxsw> [Match]
[17:13] <blackboxsw> MACAddress=00:0d:3a:91:bc:49
[17:13] <blackboxsw> [Link]
[17:13] <blackboxsw> Name=eth0
[17:13] <blackboxsw> WakeOnLan=off
[17:13] <blackboxsw> [Match]
[17:13] <blackboxsw> MACAddress=00:0d:3a:91:bc:49
[17:13] <blackboxsw> Name=eth0
[17:13] <blackboxsw> [Network]
[17:13] <blackboxsw> DHCP=ipv4
[17:13] <blackboxsw> [DHCP]
[17:13] <blackboxsw> UseMTU=true
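A quick way to ask networkd directly what it currently thinks eth0 is (MAC, driver, operational state), rather than inferring it from the rendered files:

    networkctl list
    networkctl status eth0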
[17:16] <cyphermox> blackboxsw: sorry, I can't really keep track here; but two things jumped out in the netplan yaml you pointed out earlier
[17:17] <cyphermox> I guess just one thing
[17:17] <cyphermox> https://launchpad.net/ubuntu/+source/netplan.io/0.38
[17:17] <blackboxsw> no worries, I'm spamming the channel anyway (not really blocked yet, but curious why networkd-wait-online would actually be blocking for so long in this situation, as cirename0 matches our optional: true netplan yaml case)
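An illustrative netplan stanza for that "optional" case (device details assumed); optional: true tells systemd-networkd-wait-online not to block the boot on that interface:

    network:
      version: 2
      ethernets:
        cirename0:
          dhcp4: true
          optional: true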
[17:17] <cyphermox> that 0.38 upload is a definite SRU candidate, but it's currently blocked in cosmic due to haskell.
[17:17] <blackboxsw> checking now
[17:18] <cyphermox> it should fix renames in general
[17:18] <cyphermox> or at least greatly improve the behavior.
[17:19] <blackboxsw> good to know about general netplan ip leases... though it looks like it currently tracebacks on all interfaces (even the ones that should be managed)
[17:20] <blackboxsw> cyphermox: but maybe I'm getting that "netplan ip leases" traceback because wait-online-service timed out and networkd never persisted the lease information for the managed interface eth0
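The command under discussion; it prints the DHCP lease networkd persisted for an interface, so it could plausibly traceback when no lease was ever written:

    sudo netplan ip leases eth0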
[17:21] <blackboxsw> I'll watch that 0.38 release eagerly, thanks
[17:21] <blackboxsw> might try it out on my broken bionic instance now to see what gives
[17:22] <blackboxsw> cyphermox: is  ppa:cyphermox/netplan.io a good ppa to try 0.38?
[17:24] <cyphermox> no, it's not up to date
[17:33]  * blackboxsw can just play w/ cosmic to see behavior differences there
[17:46] <blackboxsw> hah think I got it rharper
[17:46] <blackboxsw> it's only this corner case on azure:
[17:47] <blackboxsw> mac1 of original nic1 is rendered to /etc/netplan/50-cloud-init.yaml by us.
[17:47] <blackboxsw> nic1 detached from instance and nic2 (new-mac) attached to instance
[17:48] <blackboxsw> 90-azure-hotplug.yaml matches with our hotpluggedeth0 rule per https://paste.ubuntu.com/p/gZBWH5GKmg/
[17:49] <blackboxsw> no problem on that boot either, but then orig nic1 gets re-attached (the kernel labels it eth1 now on this system because nic2 is already eth0).
[17:57] <blackboxsw> that nic1 mac1 matches the original 50-cloud-init.yaml stanza, which also performs a set-name to eth0, and I think that collides with the existing nic2 (eth0) name on the instance as it boots
[17:58] <blackboxsw> which results in our leaving one nic renamed as cirename0
[17:58] <blackboxsw> anyway, will try to reproduce the issue on my own instance now
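An illustrative sketch of the colliding 50-cloud-init.yaml stanza (placeholder MAC): when nic1 re-attaches and the kernel brings it up as eth1, this match fires and set-name tries to claim eth0, which nic2 already holds:

    # /etc/netplan/50-cloud-init.yaml (illustrative, placeholder MAC)
    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "00:0d:3a:xx:xx:xx"
          set-name: eth0
          dhcp4: true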
[18:02] <rharper> blackboxsw: hrm
[18:03] <rharper> blackboxsw: that's all fine but it's not clear to me yet why we block the boot;  unless you're suggesting that networkd thinks it has two different nics (eth0 and eth1) both with the same mac value so it matches ?
[18:03] <rharper> that sounds like a networkd bug w.r.t what it can "manage"
[18:37] <blackboxsw> rharper: from debug logs, it looks like it's IPv6 router solicitation, I think
[18:37] <blackboxsw> Jun 22 18:01:18 bionic-hotplug-test systemd-networkd[744]: NDISC: Sent Router Solicitation, next solicitation in 1min 12s
[18:38] <blackboxsw> keeps retrying over and over
[18:38] <blackboxsw> checking your bug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173 to see if related
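While debugging, one way to keep the solicitation retries from holding the boot hostage is to cap wait-online with a drop-in override; a mitigation sketch, not a fix for the underlying RA behavior:

    mkdir -p /etc/systemd/system/systemd-networkd-wait-online.service.d
    printf '[Service]\nExecStart=\nExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=30\n' \
        > /etc/systemd/system/systemd-networkd-wait-online.service.d/10-timeout.conf
    systemctl daemon-reload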
[18:52] <rharper> blackboxsw: whoa; that's not right
[18:52] <blackboxsw> ok so we are on the proper systemd which should proceed without blocking on RA, systemd 237-3ubuntu10
[18:53] <rharper> oh, is the image down level ?
[18:53] <rharper> this was released before 18.04 GAed
[18:53] <rharper> blackboxsw: what level is systemd in your image ?
[18:54]  * rharper tests bionic daily to see if it's regressed
[18:54] <blackboxsw> systemd	237-3ubuntu10 which matches my recent bionic lxc
[18:55] <rharper>  3.036s systemd-networkd-wait-online.service
[18:55] <rharper> that looks like a regression
[18:55] <rharper> mother
[18:55] <rharper> wow, it's the same systemd =(
[18:55]  * rharper adds debugging 
[18:56] <blackboxsw> yeah it's as if the mentioned fix didn't work or get in.
[18:56] <rharper> it went in
[18:56] <rharper> I verified
[18:56] <rharper> so somethings different
[18:56] <rharper> but I'm not sure what at this point
[18:57] <blackboxsw> I'm referring to this comment https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173/comments/10
[18:58] <blackboxsw> I'll need another pair of eyes/brain on this to correct some incorrect assumptions I have (and get a better education on this) rharper
[18:58] <rharper> sure
[18:58] <rharper> lemme poke at my lxd container then we can look at your instance
[18:58] <blackboxsw> I'm in hangout/meet for my education by fire
[18:59] <rharper> [2112600.640416] b2 systemd-networkd[149]: eth0: Gained IPv6LL
[18:59] <rharper> [2112603.521339] b2 systemd-networkd[149]: eth0: DHCPv4 address 10.8.107.145/24 via 10.8.107.1
[18:59] <rharper> there's my 3 seconds
[18:59] <rharper> my dnsmasq is just slow
[19:01] <rharper> [9271481.644265] rharper-b2 systemd[1]: Starting Wait for Network to be Configured...
[19:01] <rharper> [9271484.163932] rharper-b2 systemd-networkd[151]: eth0: DHCPv4 address 10.109.225.14/24 via 10.109.225.1
[19:01] <rharper> 2.5 on diglett
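Both timing figures in this exchange (3.036s above, 2.5s on diglett) read like systemd-analyze blame output; a quick way to pull just that line on any machine:

    systemd-analyze blame | grep networkd-wait-online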