smoser | blackboxsw: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-a/25/console | 01:34 |
---|---|---|
smoser | thoughts ? | 01:34 |
smoser | powersj: https://jenkins.ubuntu.com/server/view/cloud-init,%20curtin,%20streams/job/cloud-init-integration-ec2-x/26/console ? you have any idea on that one ? | 01:36 |
smoser | Waiter InstanceRunning failed: Waiter encountered a terminal failure state | 01:37 |
smoser | 17 seconds after it launched the instance it encountered a terminal failure state. | 01:38 |
powersj | so it was doing install deb encountered error | 01:39 |
powersj | self.instance.wait_until_running() | 01:39 |
smoser | well, probably not | 01:39 |
smoser | not 17 seconds after launch of the instance | 01:40 |
powersj | so the instance was booting and we were waiting | 01:40 |
smoser | yeah | 01:40 |
smoser | ok. playing a bit more, http://paste.ubuntu.com/p/sFJbTP5Brn/ | 02:36 |
smoser | that shows the "gain" of easily decorating the class. | 02:36 |
smoser | and explains skip_by_date decorator some. | 02:37 |
smoser | hope to have a mp for that tomorrow. | 02:37 |
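
smoser's paste is not reproduced here, so the following is only a hypothetical sketch of what a `skip_by_date` decorator could look like. It assumes the intent is "skip this test until a given date, then let it run again so a forgotten workaround resurfaces." It builds on `unittest.skipIf`, which already works on both test methods and whole `TestCase` classes, which is where the "easily decorating the class" gain comes from. The `today` parameter and the test class name are illustrative assumptions, added only so the behavior can be tested.

```python
import datetime
import unittest


def skip_by_date(reason, until, today=None):
    """Hypothetical decorator: skip a test (method or class) before `until`.

    On or after `until` the decorated test runs normally again, so a
    temporary skip cannot silently outlive its expiry date.
    `today` is only injectable so the behavior can be tested.
    """
    current = today if today is not None else datetime.date.today()
    message = "%s (skipped until %s)" % (reason, until.isoformat())
    # unittest.skipIf returns a decorator usable on functions and classes.
    return unittest.skipIf(current < until, message)


# Example: skip a whole (hypothetical) test class until July 1st, 2018.
@skip_by_date("keyserver errors in CI", until=datetime.date(2018, 7, 1))
class TestAptSourcesKeyserver(unittest.TestCase):
    def test_key_imported(self):
        self.assertTrue(True)
```

In this sketch the date check happens when the module is imported, not when each test runs; for a CI job that imports and runs the tests in one go, that difference rarely matters.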
blackboxsw | nice, didn't realize you were still working. | 02:37 |
smoser | that was just still in my head | 02:37 |
smoser | integration test failures suck | 02:37 |
smoser | with that... i do need to go afk. | 02:38 |
smoser | have a nice night all. | 02:38 |
blackboxsw | I've started adding descriptions to run/failed jenkins jobs that have hit the keyserver errors. | 02:39 |
smoser | hm.. the name won't be right... | 02:39 |
smoser | :-( | 02:39 |
smoser | ok. later. | 02:39 |
blackboxsw | later. I'm kicking another integration run and will see what gives on that ec2 wait traceback | 02:39 |
blackboxsw | powersj: could that terminal run state be a result of maybe kicking off two ec2 integration tests simultaneously? like a cleanup job on one wiped all live instances etc? | 02:41 |
blackboxsw | part of the teardown based on keys or something | 02:41 |
powersj | that's what i thought | 02:43 |
powersj | i didn't see that instance listed though | 02:43 |
powersj | is it still occurring? | 02:44 |
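
If two concurrent runs do share a cleanup step, one way to make teardown safe is for each run to tag the instances it launches and terminate only those. The sketch below is hypothetical: the `integration-run-id` tag key, the `Instance` record and the run ids (borrowed from the Jenkins job names above) are made up for illustration and are not the harness's actual data model. It shows only the filtering logic, with no EC2 API calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# EC2 states in which an instance still exists and could be torn down.
LIVE_STATES = {"pending", "running", "stopping", "stopped"}


@dataclass
class Instance:
    """Minimal stand-in for an EC2 instance record (hypothetical)."""
    instance_id: str
    state: str
    tags: Dict[str, str] = field(default_factory=dict)


def instances_to_terminate(instances: List[Instance], run_id: str,
                           tag_key: str = "integration-run-id") -> List[str]:
    """Return ids of live instances launched by *this* run only.

    Instances from other concurrent runs (different tag value) and
    untagged instances are left alone.
    """
    return [inst.instance_id for inst in instances
            if inst.state in LIVE_STATES and inst.tags.get(tag_key) == run_id]


fleet = [
    Instance("i-aaa", "running", {"integration-run-id": "ec2-a-25"}),
    Instance("i-bbb", "pending", {"integration-run-id": "ec2-x-26"}),
    Instance("i-ccc", "terminated", {"integration-run-id": "ec2-a-25"}),
    Instance("i-ddd", "running"),
]
print(instances_to_terminate(fleet, "ec2-a-25"))  # ['i-aaa']
```

With per-run tags, the `ec2-a` teardown can no longer terminate the instance that the `ec2-x` run is still waiting on.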
blackboxsw | all green now powersj | 04:04 |
blackboxsw | just finished. some intermittent error. | 04:04 |
blackboxsw | smoser: for tomorrow, looks like we hit all green on existing integration patchsets. | 04:04 |
blackboxsw | I'm outta here | 04:04 |
=== otubo1 is now known as otubo | ||
blackboxsw | rharper: on this azure instance with the renamed cirename0, it looks like netplan wasn't waiting on cirename0 per journal | 16:13 |
blackboxsw | systemd-networkd-wait-online[743]: ignoring: cirename0 | 16:13 |
blackboxsw | hrm | 16:20 |
blackboxsw | systemd-networkd[722]: eth0: Interface name change detected, eth0 has been renamed to rename3. | 16:20 |
blackboxsw | systemd-networkd[722]: rename3: Interface name change detected, rename3 has been renamed to eth0. | 16:20 |
blackboxsw | seems like a bit of thrashing in kernel renames of eth0 | 16:20 |
smoser | the rename3 are udev | 16:35 |
smoser | persistent_rules i think | 16:35 |
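
The two networkd lines above show `eth0` going to `rename3` and straight back. To spot that kind of round trip in a longer debug journal, a small log-scraping sketch can help. It relies only on the networkd message text quoted above, so renames logged with different wording (for example by udev itself) would be missed.

```python
import re

# Matches systemd-networkd's rename message, e.g.
# "eth0: Interface name change detected, eth0 has been renamed to rename3."
_RENAME = re.compile(
    r"Interface name change detected, (\S+) has been renamed to (\S+?)\.?$")


def rename_events(lines):
    """Return (old, new) name pairs in the order they appear in the journal."""
    events = []
    for line in lines:
        match = _RENAME.search(line.rstrip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def round_trips(events):
    """Return renames that were later undone (A -> B, then B -> A)."""
    trips = []
    for i, (old, new) in enumerate(events):
        if (new, old) in events[i + 1:]:
            trips.append((old, new))
    return trips


journal = [
    "systemd-networkd[722]: eth0: Interface name change detected, "
    "eth0 has been renamed to rename3.",
    "systemd-networkd[722]: rename3: Interface name change detected, "
    "rename3 has been renamed to eth0.",
]
print(round_trips(rename_events(journal)))  # [('eth0', 'rename3')]
```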
cyphermox | blackboxsw: /etc/cloud/cloud.cfg.d | 17:02 |
cyphermox | or whatever the name of the directory is -- if you deployed the system with maas, cloud-init renames things at boot too. | 17:03 |
blackboxsw | cyphermox: right, there was a race w/ cloud-init trying a rename on this one azure instance but failing because a 2nd nic came up as eth0 in the meantime. I was trying to peek at the other players renaming interfaces at the same time. I just turned on systemd-networkd debug to check out what happens on next boot | 17:04 |
blackboxsw | end result was that the azure instance was left with a nic named 'cirename0' (which was cloud-init's doing) | 17:05 |
blackboxsw | yeah even after reboot, systemd-networkd-wait-online.service is camping out for 2 minutes | 17:06 |
blackboxsw | checking the debug journal now | 17:06 |
blackboxsw | just looking through this now https://github.com/systemd/systemd/issues/7143 | 17:07 |
blackboxsw | I mean this http://paste.ubuntu.com/p/W4CT8g4Tyq/ | 17:07 |
blackboxsw | and looking specifically at wait-online-service https://github.com/systemd/systemd/issues/7143 | 17:09 |
blackboxsw | and looking specifically at wait-online-service http://paste.ubuntu.com/p/RMcBf6YYjN/ | 17:09 |
blackboxsw | copy buffer fail | 17:09 |
blackboxsw | so it certainly isn't waiting on cirename0, just on eth0, which seems to be out to lunch for some reason. I'll poke to see if I can determine what networkd thinks eth0 actually is (like mac address etc). | 17:13 |
blackboxsw | cat /run/systemd/network/10-netplan-eth0.* | 17:13 |
blackboxsw | [Match] | 17:13 |
blackboxsw | MACAddress=00:0d:3a:91:bc:49 | 17:13 |
blackboxsw | [Link] | 17:13 |
blackboxsw | Name=eth0 | 17:13 |
blackboxsw | WakeOnLan=off | 17:13 |
blackboxsw | [Match] | 17:13 |
blackboxsw | MACAddress=00:0d:3a:91:bc:49 | 17:13 |
blackboxsw | Name=eth0 | 17:13 |
blackboxsw | [Network] | 17:13 |
blackboxsw | DHCP=ipv4 | 17:13 |
blackboxsw | [DHCP] | 17:13 |
blackboxsw | UseMTU=true | 17:13 |
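
To check what networkd thinks `eth0` actually is, one can compare the `MACAddress=` under `[Match]` in the netplan-generated units above with the kernel's view in `/sys/class/net/<iface>/address`. A minimal sketch, assuming the unit files are simple enough for Python's `configparser`: systemd's format is INI-like but not identical (repeated keys are legal, for instance), hence `strict=False`.

```python
import configparser
import os


def match_mac(unit_text):
    """Return the lower-cased [Match] MACAddress= of a systemd .link/.network file."""
    # systemd unit files are INI-like but allow repeated keys, so relax strictness.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # preserve CamelCase keys such as MACAddress
    parser.read_string(unit_text)
    if parser.has_option("Match", "MACAddress"):
        return parser.get("Match", "MACAddress").strip().lower()
    return None


def sysfs_mac(iface, root="/sys/class/net"):
    """Return the kernel's current MAC address for `iface`, lower-cased."""
    with open(os.path.join(root, iface, "address")) as handle:
        return handle.read().strip().lower()


def unit_matches_iface(unit_text, iface, root="/sys/class/net"):
    """True if the unit's [Match] MAC is the MAC currently on `iface`."""
    return match_mac(unit_text) == sysfs_mac(iface, root)
```

On the instance, feeding the contents of `/run/systemd/network/10-netplan-eth0.network` and `"eth0"` to `unit_matches_iface` would show whether the NIC networkd is waiting for is really the one currently named `eth0`.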
cyphermox | blackboxsw: sorry, I can't really keep track here; but two things jumped out in the netplan yaml you pointed out earlier | 17:16 |
cyphermox | I guess just one thing | 17:17 |
cyphermox | https://launchpad.net/ubuntu/+source/netplan.io/0.38 | 17:17 |
blackboxsw | no worries, I'm spamming the channel anyway (not really blocked yet, but curious why networkd-online would actually be blocking for so long in this situation as cirename0 matches our optional: True netplan yaml case) | 17:17 |
cyphermox | ^ this is a definite SRU candidate, but it's currently blocked in cosmic due to haskell. | 17:17 |
blackboxsw | checking now | 17:17 |
cyphermox | it should fix renames in general | 17:18 |
cyphermox | or at least greatly improve the behavior. | 17:18 |
blackboxsw | good to know about general netplan ip leases... though it looks like it currently Tracebacks on all interfaces (even the ones that should be managed) | 17:19 |
blackboxsw | cyphermox: but maybe I'm getting that "netplan ip leases" traceback because the wait-online service timed out in general and didn't persist the information for the managed interface eth0 | 17:20 |
blackboxsw | I'll watch that 0.38 release eagerly thanks | 17:21 |
blackboxsw | might try it out on my broken bionic instance now to see what gives | 17:21 |
blackboxsw | cyphermox: is ppa:cyphermox/netplan.io a good ppa to try 0.38? | 17:22 |
cyphermox | no, it's not up to date | 17:24 |
* blackboxsw can just play w/ cosmic to see behavior differences there | 17:33 | |
blackboxsw | hah think I got it rharper | 17:46 |
blackboxsw | it's only this corner case on azure: | 17:46 |
blackboxsw | mac1 of original nic1 is rendered to /etc/netplan/50-cloud-init.yaml by us. | 17:47 |
blackboxsw | nic1 detached from instance and nic2 (new-mac) attached to instance | 17:47 |
blackboxsw | 90-azure-hotplug.yaml matches with our hotpluggedeth0 rule per https://paste.ubuntu.com/p/gZBWH5GKmg/ | 17:48 |
blackboxsw | no problem on that boot either, but then you re-attach the original nic1 (it is labeled by the kernel as eth1 now on this system because nic2 is now eth0). | 17:49 |
blackboxsw | but that nic1 mac1 matches the original 50-cloud-init.yaml hotplug which also performs a set-name eth0, and I think that collides with the existing nic2 (eth0) name on the instance as it is booted | 17:57 |
blackboxsw | which results in our leaving one instance renamed as cirename0 | 17:58 |
blackboxsw | leaving one *nic* renamed ... | 17:58 |
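
The hypothesis above can be illustrated with a deliberately simplified model of MAC-based renaming. This is not cloud-init's actual rename code; it only assumes, as the `cirename0` name in the logs suggests, that when a target name is already held by another NIC, that NIC is first moved aside to a temporary `cirenameN` name. The `mac1`/`mac2` values are placeholders for the two Azure NICs.

```python
def plan_renames(current, desired, temp_prefix="cirename"):
    """Simplified, hypothetical model of renaming NICs by MAC address.

    current: {interface name: mac} as the kernel named them at boot.
    desired: {mac: target name}, e.g. from `set-name` in netplan YAML.
    Returns (ops, final): the (old, new) renames performed, and the
    resulting {interface name: mac} mapping.
    """
    state = dict(current)
    ops = []
    temp_count = 0
    for mac, target in desired.items():
        source = next((name for name, m in state.items() if m == mac), None)
        if source is None or source == target:
            continue  # NIC not attached, or already has the right name
        if target in state:
            # Target name is taken: park the current holder on a temp name.
            temp = "%s%d" % (temp_prefix, temp_count)
            temp_count += 1
            state[temp] = state.pop(target)
            ops.append((target, temp))
        state[target] = state.pop(source)
        ops.append((source, target))
    return ops, state


# nic2 (mac2) booted as eth0; re-attached nic1 (mac1) came up as eth1,
# but the old 50-cloud-init.yaml still says mac1 -> set-name eth0.
ops, final = plan_renames({"eth0": "mac2", "eth1": "mac1"}, {"mac1": "eth0"})
print(ops)    # [('eth0', 'cirename0'), ('eth1', 'eth0')]
print(final)  # {'cirename0': 'mac2', 'eth0': 'mac1'}
```

In this model nic2 stays on `cirename0` because no rule gives `mac2` a name of its own to move to.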
blackboxsw | anyway, will try to reproduce the issue on my own instance now | 17:58 |
rharper | blackboxsw: hrm | 18:02 |
rharper | blackboxsw: that's all fine but it's not clear to me yet why we block the boot; unless you're suggesting that networkd thinks it has two different nics (eth0 and eth1) both with the same mac value so it matches ? | 18:03 |
rharper | that sounds like a networkd bug w.r.t what it can "manage" | 18:03 |
blackboxsw | rharper: from debug logs, it looks like it's IPv6 router solicitation I think | 18:37 |
blackboxsw | Jun 22 18:01:18 bionic-hotplug-test systemd-networkd[744]: NDISC: Sent Router Solicitation, next solicitation in 1min 12s | 18:37 |
blackboxsw | keeps retrying over and over | 18:38 |
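
systemd prints durations in its own time-span syntax: `1min 12s` in the NDISC line above, `3.036s` in `systemd-analyze blame` output. Converting them to seconds makes the comparison concrete: a 72 s gap between router solicitations is a large fraction of `systemd-networkd-wait-online`'s default 2-minute timeout. This sketch handles only the common units (`us`, `ms`, `s`, `min`, `h`, `d`), not systemd's full time-span grammar.

```python
import re

# Seconds per unit for the subset of systemd time-span units handled here.
_UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0, "d": 86400.0}
# One token is a number immediately followed by a unit, e.g. "1min" or "3.036s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|min|s|h|d)")
_NDISC = re.compile(r"next solicitation in (.+?)\s*$")


def parse_timespan(text):
    """Convert a systemd-style span such as '1min 12s' or '3.036s' to seconds."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty time span")
    total = 0.0
    for token in tokens:
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError("unsupported time span token: %r" % token)
        total += float(match.group(1)) * _UNITS[match.group(2)]
    return total


def next_solicitation_delay(journal_line):
    """Extract the retry delay (seconds) from a networkd NDISC log line, or None."""
    match = _NDISC.search(journal_line)
    return parse_timespan(match.group(1)) if match else None


line = ("Jun 22 18:01:18 bionic-hotplug-test systemd-networkd[744]: "
        "NDISC: Sent Router Solicitation, next solicitation in 1min 12s")
print(next_solicitation_delay(line))  # 72.0
```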
blackboxsw | checking your bug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173 to see if related | 18:38 |
ubot5 | Ubuntu bug 1765173 in systemd (Ubuntu) "networkd waits 10 seconds for ipv6 network discovery by default" [Undecided,Fix released] | 18:38 |
rharper | blackboxsw: whoa; that's not right | 18:52 |
blackboxsw | ok so we are on the proper systemd which should proceed without blocking on RA, systemd 237-3ubuntu10 | 18:52 |
rharper | oh, is the image down level ? | 18:53 |
rharper | this was released before 18.04 GAed | 18:53 |
rharper | blackboxsw: what level is systemd in your image ? | 18:53 |
* rharper tests bionic daily to see if it's regressed | 18:54 | |
blackboxsw | systemd 237-3ubuntu10, which matches my recent bionic lxc | 18:54 |
rharper | 3.036s systemd-networkd-wait-online.service | 18:55 |
rharper | that looks like a regression | 18:55 |
rharper | mother | 18:55 |
rharper | wow, it's the same systemd =( | 18:55 |
* rharper adds debugging | 18:55 | |
blackboxsw | yeah it's as if the mentioned fix didn't work or get in. | 18:56 |
rharper | it went it | 18:56 |
rharper | in | 18:56 |
rharper | I verified | 18:56 |
rharper | so something's different | 18:56 |
rharper | but I'm not sure what at this point | 18:56 |
blackboxsw | I'm referring to this comment https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1765173/comments/10 | 18:57 |
ubot5 | Ubuntu bug 1765173 in systemd (Ubuntu) "networkd waits 10 seconds for ipv6 network discovery by default" [Undecided,Fix released] | 18:57 |
blackboxsw | I'll need another pair of eyes/brain on this to correct some incorrect assumptions I have (and get a better education on this) rharper | 18:58 |
rharper | sure | 18:58 |
rharper | lemme poke at my lxd container then we can look at your instance | 18:58 |
blackboxsw | I'm in hangout/meet for my education by fire | 18:58 |
rharper | [2112600.640416] b2 systemd-networkd[149]: eth0: Gained IPv6LL | 18:59 |
rharper | [2112603.521339] b2 systemd-networkd[149]: eth0: DHCPv4 address 10.8.107.145/24 via 10.8.107.1 | 18:59 |
rharper | there's my 3 seconds | 18:59 |
rharper | my dnsmasq is just slow | 18:59 |
rharper | [9271481.644265] rharper-b2 systemd[1]: Starting Wait for Network to be Configured... | 19:01 |
rharper | [9271484.163932] rharper-b2 systemd-networkd[151]: eth0: DHCPv4 address 10.109.225.14/24 via 10.109.225.1 | 19:01 |
rharper | 2.5 on diglett | 19:01 |
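
The bracketed values appear to be monotonic seconds-since-boot stamps (as in `journalctl -o short-monotonic` output), so "my 3 seconds" is just the difference between two of them: 2112603.521339 - 2112600.640416 ≈ 2.88 s from gaining the IPv6 link-local address to the DHCPv4 lease, and 9271484.163932 - 9271481.644265 ≈ 2.52 s on diglett from wait-online starting to the lease. A tiny helper for doing that subtraction on pasted lines:

```python
import re

# Leading "[seconds.micro]" stamp, e.g. "[2112600.640416] b2 systemd-networkd[149]: ..."
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def stamp(line):
    """Return the bracketed monotonic timestamp of a log line, in seconds."""
    match = _STAMP.match(line)
    if match is None:
        raise ValueError("no [seconds] timestamp in: %r" % line)
    return float(match.group(1))


def elapsed(start_line, end_line):
    """Seconds between two timestamped log lines."""
    return stamp(end_line) - stamp(start_line)


gained_ll = "[2112600.640416] b2 systemd-networkd[149]: eth0: Gained IPv6LL"
dhcp_done = ("[2112603.521339] b2 systemd-networkd[149]: eth0: "
             "DHCPv4 address 10.8.107.145/24 via 10.8.107.1")
print(round(elapsed(gained_ll, dhcp_done), 2))  # 2.88
```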