blackboxsw | back | 00:35 |
---|---|---|
rharper | blackboxsw: ok; I've got 55 consecutive reboots with no issues | 00:40 |
blackboxsw | ok rharper I misread your comment, I thought you did see the hang in europe | 00:41 |
rharper | oh, I do | 00:41 |
rharper | this is with my fix applied | 00:41 |
rharper | I wanted to make sure it wasn't a fluke; | 00:41 |
blackboxsw | ahh good deal. so we may not need to write network names | 00:41 |
* blackboxsw is trying again in us-central1 | 00:42 | |
rharper | smoser and I were chatting, and that is still probably the right thing to do anyhow | 00:42 |
rharper | but we can decide when to land that; | 00:42 |
rharper | this also needs maas qa before we push it too | 00:42 |
rharper | I'm worried about all of the other places where we don't see this now; and we're not using the systemd-udev-settle.service; it's just not running anywhere except if you've got zfs installed or lvm enabled | 00:43 |
* rharper probes some vmtest runs on bionic with zfs and lvm to see what their journal says | 00:43 | |
blackboxsw | again rharper you saw this just w/ cloud-init clean --logs and reboots right? | 00:45 |
rharper | oh yeah | 00:45 |
blackboxsw | man, us-central1 still not reproducing it for me | 00:45 |
rharper | first boot in europe-west1 with the 420 bionic image works fine | 00:45 |
rharper | then I rebooted | 00:45 |
rharper | saw it | 00:46 |
rharper | rebooted, it came up | 00:46 |
blackboxsw | will try switching to europewest again | 00:46 |
rharper | so not always, since it's racy | 00:46 |
blackboxsw | s/again/for once/ | 00:46 |
blackboxsw | of course this could just as well be because my recent attempts had debug messages checking and printing name_assign_type | 00:47 |
blackboxsw | heh, didn't realize our account instance view was shared | 00:52 |
blackboxsw | I see rharper-b1 now | 00:52 |
blackboxsw | sure enough first boot in europe-west1-b bricked | 00:53 |
blackboxsw | geez man region-related for sure | 00:53 |
rharper | yeah | 00:54 |
rharper | blackboxsw: that's super interesting | 00:55 |
rharper | w.r.t the region, I suspect it's load | 00:55 |
blackboxsw | could be. trying to see if I can get a successful boot so I can add my debug deb on followup reboots | 00:56 |
rharper | ah, yeah, you have to get one good boot to set the root password | 00:57 |
blackboxsw | yeah or boot from my previous image snapshot. I'll try that | 00:58 |
smoser | so .. we can reproduce fairly easily ? | 00:59 |
blackboxsw | looks like europe-west1-b or europe-west1-d zones | 01:00 |
rharper | smoser: yeah | 01:00 |
smoser | rharper: if nothing was running trigger.... | 01:02 |
smoser | er.. if nothing was running the trigger service, then what was doing the cold plug? | 01:02 |
rharper | not the trigger service | 01:02 |
rharper | that's always running | 01:02 |
smoser | *something* has to be doing it. or we wouldn't have .link files respected at all | 01:02 |
rharper | it's the settle service | 01:02 |
smoser | so any possible 'udevadm settle' from anywhere would make it work | 01:03 |
rharper | yeah | 01:03 |
rharper | in xenial, the networking.service unit runs a pre command with udevadm settle in it | 01:03 |
rharper | artful could show it, if things were fast enough; and even in bionic, it has to be in this one region where things run slightly odd | 01:04 |
rharper | ok, I need to step away | 01:05 |
smoser | blackboxsw: logs of launch-softlayer needs combining with launch-ec2 too. | 01:08 |
smoser | blackboxsw: this got missed. | 01:15 |
smoser | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/342334 | 01:15 |
smoser | not terribly important | 01:15 |
smoser | and then this one needs landing too | 01:16 |
smoser | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344189 | 01:16 |
blackboxsw | smoser: landed https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344189 | 01:32 |
smoser | blackboxsw: thanks | 01:41 |
=== mgerdts_ is now known as mgerdts | ||
rharper | blackboxsw: smoser: if we wanted to be more targeted with the settle, we could, for example, trigger it within cloud-init-local if we detect non-renamed interfaces with knames (and ifnames=0 not in cmdline); that would then only impact systems which happen to have that early race between cloud-init-local and udev-trigger | 14:02 |
smoser | right. and that would be in some ways safer | 14:06 |
smoser | from the perspective of not changing boot | 14:06 |
smoser | i'd like to have slangasek or xnox thoughts | 14:06 |
rharper | yes | 14:06 |
smoser | as i am apt to agree with you, that not having the settle service active in boot is ... well just wrong. | 14:06 |
rharper | there is a swarm of "why is my boot slow/ systemd-analyze blame shows udev-settle.service" | 14:07 |
smoser | oh? | 14:07 |
rharper | yes | 14:07 |
smoser | so we did it as an optimization :) | 14:07 |
rharper | but it's because they have things like usb nics or other storage devices that take *time* to come up | 14:07 |
rharper | no | 14:07 |
rharper | I don't think so | 14:07 |
smoser | (it was a joke) | 14:07 |
rharper | it's not clear to me why it's not enabled by default | 14:07 |
rharper | yet | 14:07 |
smoser | i can make a system boot REALLY REALLY FAST | 14:07 |
smoser | and sometimes even do what you want! | 14:07 |
rharper | but, lvm2 has a generator which forces it on, if lvm2 is needed in some situations | 14:07 |
rharper | and zfs of course, Requires it | 14:07 |
rharper | since they need all of their devices up before they can mount or build a raid, etc | 14:08 |
rharper | so, it *really* seems like it should just always be on | 14:08 |
rharper | one ends up "Waiting" for rootfs anyhow | 14:08 |
smoser | i agree. we should request slangasek and xnox review of your MP ? | 14:08 |
rharper | we've seen those "waiting for device ... foo to appear" | 14:08 |
rharper | smoser: or possibly add a systemd task and ask in the GCE bug | 14:08 |
smoser | rharper: well, in my fast boot, sometimes / isn't there, but it boots really fast. | 14:08 |
rharper | but I would like foundation review | 14:08 |
rharper | smoser: lol! | 14:09 |
rharper | I get (initramfs) prompt *so* fast those times | 14:09 |
smoser | exactly. | 14:09 |
smoser | and systemd-analyze does not blame udev! | 14:09 |
rharper | I usually take the extra savings and then compile my own kernel, kexec into it to find my root | 14:09 |
rharper | blackboxsw: interesting observation w.r.t zone and image; | 14:11 |
rharper | I wonder if we can further dissect what's special about the 420 image in europe-west1 vs. current stuff | 14:11 |
rharper | nonetheless; it does make sense to do something to detect if we've raced and try to fix that in the case we do | 14:12 |
rharper | I'm going to see if I can target the settle within cloud-init-local on the reproducer | 14:12 |
smoser | i compile my kernels with -O4 and -funroll-loops. it's the best. | 14:12 |
smoser | "it does make sense to do something to detect" | 14:13 |
smoser | maybe | 14:13 |
smoser | it only makes so much sense to determine when a system is broken... why didn't we just fix the system ? | 14:14 |
rharper | that's fair; for now I'm mostly interested in if we can detect it; | 14:14 |
rharper | whether we target a more narrow fix so as to not "udevadm settle" the world ; aka smoser's favorite alias to 'sleep 1' | 14:15 |
rharper | needs more discussion | 14:15 |
smoser | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344189 | 14:19 |
smoser | bah. bad link | 14:19 |
smoser | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344255 | 14:20 |
smoser | that one. | 14:20 |
smoser | that is softlayer doc improvement. | 14:20 |
smoser | rharper, blackboxsw, dpb1 ^ | 14:20 |
rharper | yeah, saw that | 14:20 |
rharper | 6 ways | 14:20 |
smoser | from sunday | 14:21 |
rharper | hehe | 14:23 |
rharper | 2018-04-25 15:44:49,632 - __init__.py[DEBUG]: WARK: found unstable device names: ['eth0']; calling udevadm settle | 15:45 |
rharper | 2018-04-25 15:44:49,968 - util.py[DEBUG]: WARK: Waiting for udev events to settle took 0.336 seconds | 15:45 |
rharper | smoser: we can detect, and "resolve" it more narrowly | 15:46 |
rharper | if we want | 15:46 |
rharper | I'll put up an alternative patch with this change | 15:46 |
smoser | link ? | 15:50 |
smoser | philroche: https://launchpad.net/~smoser/+archive/ubuntu/ibmcloud-test | 15:50 |
smoser | should be populated shortly with a test. | 15:50 |
blackboxsw | smoser: reviewed https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344255 | 16:25 |
=== r-daneel_ is now known as r-daneel | ||
rharper | smoser: blackboxsw: this is an alternative, more targeted settle, https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/344339 | 17:16 |
jocha | blackboxsw: I think I pinged you about my updated merge request, but I don't think I received a response, I might've restarted the chat, anyways here it is again : https://code.launchpad.net/~jocha/cloud-init/+git/cloud-init/+merge/344192 :) | 18:19 |
blackboxsw | ahh thanks jocha I'll give it a looksie today | 18:19 |
jocha | awesome thanks! | 18:22 |
smoser | blackboxsw: i responded to your https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/344255 . | 19:55 |
smoser | really just wanting to know if you think i cleared things up | 19:56 |
blackboxsw | smoser: yes cleared. land at will, or I can | 19:57 |
smoser | ok. ill land | 19:57 |
blackboxsw | I'm camping in cloud-init hangout trying to get my IBMcloud setup up | 19:57 |
blackboxsw | now that I'm approved | 19:57 |
blackboxsw | but can't seem to find/create my API creds | 19:57 |
blackboxsw | got it. and updating launch-softlayer script | 22:30 |
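
The "targeted settle" idea discussed above (only call `udevadm settle` when an interface still carries a kernel-enumerated name and `net.ifnames=0` is not on the kernel command line) could be sketched roughly as follows. This is an illustrative sketch only, not the code from the linked merge proposal: the function names `find_unstable_names` and `settle_if_unstable` and the `settle` parameter are hypothetical. It assumes the kernel's sysfs attribute `name_assign_type`, where `1` means the name was enumerated by the kernel (for example `eth0`) and `3`/`4` mean it was set or renamed by userspace.

```python
import os
import subprocess


def read_name_assign_type(devname, sys_net="/sys/class/net"):
    """Return the device's name_assign_type as a string, or None if unreadable."""
    path = os.path.join(sys_net, devname, "name_assign_type")
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        # The attribute is missing or unreadable for some devices; treat as unknown.
        return None


def find_unstable_names(sys_net="/sys/class/net", cmdline=""):
    """List interfaces that still have a kernel-enumerated name (hypothetical helper)."""
    if "net.ifnames=0" in cmdline.split():
        # Kernel names were explicitly requested, so nothing is expected to be renamed.
        return []
    unstable = []
    for dev in sorted(os.listdir(sys_net)):
        if dev == "lo":
            continue
        if read_name_assign_type(dev, sys_net) == "1":
            unstable.append(dev)
    return unstable


def _udevadm_settle():
    # Blocks until the udev event queue is empty (or udevadm's own timeout expires).
    subprocess.run(["udevadm", "settle"], check=False)


def settle_if_unstable(sys_net="/sys/class/net", cmdline="", settle=_udevadm_settle):
    """Call settle() only when a possibly not-yet-renamed interface is found."""
    names = find_unstable_names(sys_net, cmdline)
    if names:
        settle()
    return names
```

On a live system one might call `settle_if_unstable(cmdline=open("/proc/cmdline").read())` early in boot. Passing `sys_net` and `settle` as parameters is a design choice for this sketch so the detection logic can be exercised against a fake sysfs tree without running `udevadm`.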