/srv/irclogs.ubuntu.com/2018/06/22/#cloud-init.txt

[01:34] <smoser> blackboxsw: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-a/25/console
[01:34] <smoser> thoughts?
[01:36] <smoser> powersj: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-x/26/console ? do you have any idea on that one?
[01:37] <smoser> Waiter InstanceRunning failed: Waiter encountered a terminal failure state
[01:38] <smoser> 17 seconds after it launched the instance, it encountered a terminal failure state.
[01:39] <powersj> so it was doing the deb install when it encountered the error
[01:39] <powersj>     self.instance.wait_until_running()
[01:39] <smoser> well, probably not
[01:40] <smoser> not 17 seconds after launch of the instance
[01:40] <powersj> so the instance was booting and we were waiting
[01:40] <smoser> yeah
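The "terminal failure state" message comes from boto3's InstanceRunning waiter: it succeeds when the instance reaches 'running', but treats states such as 'terminated' or 'shutting-down' as terminal failures, so hitting it 17 seconds after launch suggests the instance was killed rather than merely slow. A minimal sketch of that waiter pattern, assuming boto3's EC2 resource API; the AMI id and region are placeholders, not the integration suite's actual values:

    import boto3
    from botocore.exceptions import WaiterError

    ec2 = boto3.resource("ec2", region_name="us-east-1")
    instance = ec2.create_instances(
        ImageId="ami-00000000",  # placeholder AMI
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )[0]
    try:
        # Polls DescribeInstances until the state is 'running'. If the
        # instance instead reaches 'terminated' or 'shutting-down', the
        # waiter raises "Waiter encountered a terminal failure state".
        instance.wait_until_running()
    except WaiterError as e:
        print("instance never reached running:", e)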
[02:36] <smoser> ok. playing a bit more: http://paste.ubuntu.com/p/sFJbTP5Brn/
[02:36] <smoser> that shows the "gain" of easily decorating the class.
[02:37] <smoser> and explains the skip_by_date decorator some.
[02:37] <smoser> hope to have an MP for that tomorrow.
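For a rough idea of the shape such a thing takes: a skip-by-date decorator typically skips a known-failing test until a chosen deadline and then lets it fail again so it cannot be forgotten. The sketch below is hypothetical; the name skip_by_date comes from the discussion, but the signature and behavior here are assumptions, not cloud-init's actual implementation:

    import datetime
    import functools
    import unittest

    def skip_by_date(bug_url, fix_by):
        """Skip a known-failing test until fix_by (YYYY-MM-DD), then run it again."""
        deadline = datetime.date.fromisoformat(fix_by)

        def decorator(test_func):
            @functools.wraps(test_func)
            def wrapper(*args, **kwargs):
                if datetime.date.today() <= deadline:
                    raise unittest.SkipTest(
                        "known issue %s; skipped until %s" % (bug_url, fix_by))
                return test_func(*args, **kwargs)
            return wrapper
        return decorator

    class TestExample(unittest.TestCase):
        # hypothetical usage: bug link and date are illustrative only
        @skip_by_date("https://bugs.launchpad.net/cloud-init", fix_by="2018-07-15")
        def test_known_flaky(self):
            self.assertTrue(False)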
[02:37] <blackboxsw> nice, didn't realize you were still working.
[02:37] <smoser> that was just still in my head
[02:37] <smoser> integration test failures suck
[02:38] <smoser> with that... I do need to go afk.
[02:38] <smoser> have a nice night all.
[02:39] <blackboxsw> I've started adding descriptions to run/failed jenkins jobs that have hit the keyserver errors.
[02:39] <smoser> hm.. the name won't be right...
[02:39] <smoser> :-(
[02:39] <smoser> ok. later.
[02:39] <blackboxsw> later. I'm kicking off another integration run and will see what gives on that ec2 wait traceback
[02:41] <blackboxsw> powersj: could that terminal run state be a result of kicking off two ec2 integration tests simultaneously? like a cleanup job on one wiped all live instances etc?
[02:41] <blackboxsw> part of the teardown based on keys or something
[02:43] <powersj> that's what I thought
[02:43] <powersj> I didn't see that instance listed though
[02:44] <powersj> is it still occurring?
[04:04] <blackboxsw> all green now powersj
[04:04] <blackboxsw> just finished. some intermittent error.
[04:04] <blackboxsw> smoser: for tomorrow, looks like we hit all green on existing integration patchsets.
[04:04] <blackboxsw> I'm outta here
=== otubo1 is now known as otubo
[16:13] <blackboxsw> rharper: on this azure instance with the renamed cirename0, it looks like netplan wasn't waiting on cirename0, per the journal:
[16:13] <blackboxsw> systemd-networkd-wait-online[743]: ignoring: cirename0
[16:20] <blackboxsw> hrm
[16:20] <blackboxsw> systemd-networkd[722]: eth0: Interface name change detected, eth0 has been renamed to rename3.
[16:20] <blackboxsw> systemd-networkd[722]: rename3: Interface name change detected, rename3 has been renamed to eth0.
[16:20] <blackboxsw> seems like a bit of thrashing in kernel renames of eth9
[16:20] <blackboxsw> seems like a bit of thrashing in kernel renames of eth0
[16:35] <smoser> the rename3 ones are udev
[16:35] <smoser> persistent_rules I think
[17:02] <cyphermox> blackboxsw: /etc/cloud.cfg.d
[17:03] <cyphermox> or whatever the name of the directory is -- if you deployed the system with maas, cloud-init renames things at boot too.
[17:04] <blackboxsw> cyphermox: right, there was a race w/ cloud-init trying a rename on this one azure instance but failing because a 2nd nic came up as eth0 in the meantime. I was trying to peek at the other players renaming interfaces at the same time. I just turned on systemd-networkd debug to check out what happens on next boot
[17:05] <blackboxsw> end result was that the azure instance was left with a nic named 'cirename0' (which was cloud-init's doing)
[17:06] <blackboxsw> yeah even after reboot, systemd-networkd-wait-online.service is camping out for 2 minutes
[17:06] <blackboxsw> checking the debug journal now
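For reference, turning on that networkd debug output is usually done with a service drop-in that raises the daemon's log level; a minimal sketch, assuming the standard SYSTEMD_LOG_LEVEL environment knob (the drop-in filename is arbitrary):

    # /etc/systemd/system/systemd-networkd.service.d/10-debug.conf
    [Service]
    Environment=SYSTEMD_LOG_LEVEL=debug
    # then: systemctl daemon-reload && systemctl restart systemd-networkd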
[17:07] <blackboxsw> just looking through this now https://github.com/systemd/systemd/issues/7143
[17:07] <blackboxsw> I mean this http://paste.ubuntu.com/p/W4CT8g4Tyq/
[17:09] <blackboxsw> and looking specifically at wait-online-service https://github.com/systemd/systemd/issues/7143
[17:09] <blackboxsw> and looking specifically at wait-online-service http://paste.ubuntu.com/p/RMcBf6YYjN/
[17:09] <blackboxsw> copy buffer fail
[17:13] <blackboxsw> so it certainly isn't waiting on cirename0, just on eth0, which seems to be out to lunch for some reason. I'll poke to see if I can determine what networkd thinks eth0 actually is (like mac address etc).
[17:13] <blackboxsw> cat /run/systemd/network/10-netplan-eth0.*
[17:13] <blackboxsw> [Match]
[17:13] <blackboxsw> MACAddress=00:0d:3a:91:bc:49
[17:13] <blackboxsw> [Link]
[17:13] <blackboxsw> Name=eth0
[17:13] <blackboxsw> WakeOnLan=off
[17:13] <blackboxsw> [Match]
[17:13] <blackboxsw> MACAddress=00:0d:3a:91:bc:49
[17:13] <blackboxsw> Name=eth0
[17:13] <blackboxsw> [Network]
[17:13] <blackboxsw> [17:13] <blackboxsw> DHCP=ipv4
[17:13] <blackboxsw> [DHCP]
[17:13] <blackboxsw> UseMTU=true
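(The first [Match]/[Link] chunk in that paste is the rendered 10-netplan-eth0.link, the second chunk the 10-netplan-eth0.network.) Netplan's networkd backend produces output like that from a stanza roughly along these lines; this is a reconstruction for illustration, not the instance's actual /etc/netplan/50-cloud-init.yaml:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "00:0d:3a:91:bc:49"
          set-name: eth0        # rendered into the .link file's Name=
          dhcp4: true           # rendered as DHCP=ipv4 plus [DHCP] UseMTU=true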
[17:16] <cyphermox> blackboxsw: sorry, I can't really keep track here; but two things jumped out in the netplan yaml you pointed out earlier
[17:17] <cyphermox> I guess just one thing
[17:17] <cyphermox> https://launchpad.net/ubuntu/+source/netplan.io/0.38
[17:17] <blackboxsw> no worries, I'm spamming the channel anyway (not really blocked yet, but curious why networkd-wait-online would actually be blocking for so long in this situation, as cirename0 matches our optional: True netplan yaml case)
[17:17] <cyphermox> ^ this is a definite SRU candidate, but it's currently blocked in cosmic due to haskell.
[17:17] <blackboxsw> checking now
[17:18] <cyphermox> it should fix renames in general
[17:18] <cyphermox> or at least greatly improve the behavior.
[17:19] <blackboxsw> good to know about general netplan ip leases... though it looks like it currently tracebacks on all interfaces (even the ones that should be managed)
[17:20] <blackboxsw> cyphermox: but maybe I'm getting that "netplan ip leases" traceback because the wait-online service timed out and the lease information for the managed interface eth0 never got persisted
[17:21] <blackboxsw> I'll watch that 0.38 release eagerly, thanks
[17:21] <blackboxsw> might try it out on my broken bionic instance now to see what gives
[17:22] <blackboxsw> cyphermox: is ppa:cyphermox/netplan.io a good ppa to try 0.38?
[17:24] <cyphermox> no, it's not up to date
[17:33] * blackboxsw can just play w/ cosmic to see behavior differences there
[17:46] <blackboxsw> hah, think I got it rharper
[17:46] <blackboxsw> it's only this corner case on azure:
[17:47] <blackboxsw> mac1 of the original nic1 is rendered to /etc/netplan/50-cloud-init.yaml by us.
[17:47] <blackboxsw> nic1 detached from instance and nic2 (new mac) attached to instance
[17:48] <blackboxsw> 90-azure-hotplug.yaml matches with our hotpluggedeth0 rule per https://paste.ubuntu.com/p/gZBWH5GKmg/
[17:49] <blackboxsw> no problem on that boot either, but then you re-attach the orig nic1 (it is labeled by the kernel as eth1 now on this system because nic2 is now eth0).
[17:57] <blackboxsw> but that nic1 mac1 matches the original 50-cloud-init.yaml stanza, which also performs a set-name eth0, and I think that collides with the existing nic2 (eth0) name on the instance as it is booted
[17:58] <blackboxsw> which results in our leaving one instance renamed as cirename0
[17:58] <blackboxsw> leaving one *nic* renamed ...
[17:58] <blackboxsw> anyway, will try to reproduce the issue on my own instance now
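To make that collision concrete, the two netplan files described above might look roughly like the sketch below (macaddresses, filenames, and the hotplug stanza are illustrative guesses, not the instance's real files): when nic1 is re-attached as eth1, its mac still matches the first stanza, whose set-name: eth0 now collides with nic2 already holding the eth0 name, so the rename gets stuck at the transient cirename0.

    # /etc/netplan/50-cloud-init.yaml -- rendered for the original nic1 (mac1)
    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "00:0d:3a:aa:aa:01"   # mac1, made up for illustration
          set-name: eth0
          dhcp4: true

    # /etc/netplan/90-azure-hotplug.yaml -- catch-all for later-attached nics
    network:
      version: 2
      ethernets:
        hotpluggedeth0:
          match:
            name: "eth*"
          dhcp4: true
          optional: true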
[18:02] <rharper> blackboxsw: hrm
[18:03] <rharper> blackboxsw: that's all fine, but it's not clear to me yet why we block the boot; unless you're suggesting that networkd thinks it has two different nics (eth0 and eth1) both with the same mac value, so it matches?
[18:03] <rharper> that sounds like a networkd bug w.r.t. what it can "manage"
[18:37] <blackboxsw> rharper: from the debug logs, it looks like it's IPv6 router solicitation, I think
[18:37] <blackboxsw> Jun 22 18:01:18 bionic-hotplug-test systemd-networkd[744]: NDISC: Sent Router Solicitation, next solicitation in 1min 12s
[18:38] <blackboxsw> keeps retrying over and over
[18:38] <blackboxsw> checking your bug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173 to see if it's related
[18:38] <ubot5> Ubuntu bug 1765173 in systemd (Ubuntu) "networkd waits 10 seconds for ipv6 network discovery by default" [Undecided,Fix released]
[18:52] <rharper> blackboxsw: whoa; that's not right
[18:52] <blackboxsw> ok so we are on the proper systemd which should proceed without blocking on RA: systemd 237-3ubuntu10
[18:53] <rharper> oh, is the image down-level?
[18:53] <rharper> this was released before 18.04 GAed
[18:53] <rharper> blackboxsw: what level is systemd in your image?
[18:54] * rharper tests bionic daily to see if it's regressed
[18:54] <blackboxsw> systemd 237-3ubuntu10, which matches my recent bionic lxc
[18:55] <rharper> 3.036s systemd-networkd-wait-online.service
[18:55] <rharper> that looks like a regression
[18:55] <rharper> mother
[18:55] <rharper> wow, it's the same systemd =(
[18:55] * rharper adds debugging
[18:56] <blackboxsw> yeah, it's as if the mentioned fix didn't work or get in.
[18:56] <rharper> it went in
[18:56] <rharper> I verified
[18:56] <rharper> so something's different
[18:56] <rharper> but I'm not sure what at this point
[18:57] <blackboxsw> I'm referring to this comment: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173/comments/10
[18:57] <ubot5> Ubuntu bug 1765173 in systemd (Ubuntu) "networkd waits 10 seconds for ipv6 network discovery by default" [Undecided,Fix released]
[18:58] <blackboxsw> I'll need another pair of eyes/brain on this to correct some incorrect assumptions I have (and get a better education on this) rharper
[18:58] <rharper> sure
[18:58] <rharper> lemme poke at my lxd container then we can look at your instance
[18:58] <blackboxsw> I'm in hangout/meet for my education by fire
[18:59] <rharper> [2112600.640416] b2 systemd-networkd[149]: eth0: Gained IPv6LL
[18:59] <rharper> [2112603.521339] b2 systemd-networkd[149]: eth0: DHCPv4 address 10.8.107.145/24 via 10.8.107.1
[18:59] <rharper> there's my 3 seconds
[18:59] <rharper> my dnsmasq is just slow
[19:01] <rharper> [9271481.644265] rharper-b2 systemd[1]: Starting Wait for Network to be Configured...
[19:01] <rharper> [9271484.163932] rharper-b2 systemd-networkd[151]: eth0: DHCPv4 address 10.109.225.14/24 via 10.109.225.1
[19:01] <rharper> 2.5 on diglett
