/srv/irclogs.ubuntu.com/2017/01/25/#cloud-init.txt

=== shardy is now known as shardy_lunch
=== shardy_lunch is now known as shardy
larskssmoser: if you're around, what is the thought behind having DefaultDependencies=no in cloud-init.service?  It turns out that this is causing cloud-init to start before dbus on fedora, so things like setting the hostname fall over.  The fix may be an explicit dependency on dbus.(service|socket)...16:26
smoserlarsks, 1 minute.16:37
larsks(the root cause is that DefaultDependencies=no means there is no implicit dependency on basic.target)16:39
smosermore than one minute.16:52
smoserok16:52
smoserlarsks, yeah, so we had issues with that too, starting before dbus.16:53
larsksIf we removed DefaultDependencies=no it would in theory Just Work.16:53
larsks...unless there was a specific problem that was meant to solve.16:54
smoserwell, no. then cloud-init is not running early enough in order to block networking.16:54
larsksAh, I see.16:54
smoserthere's a series of bugs, probably referenced in commits.16:54
smoserlet me look16:54
larsksSo you think an explicit dependency on dbus.service is probably the way to go.16:54
smoserand this is, agreed, really a pita16:54
smoseri dont think that will actually work. let me see.16:54
smoserlarsks, there is a lot of information in git log systemd/16:58
smoserhttps://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1629797 is what had us adding the dbus.socket16:58
larsksIndeed, I've had to look at several of those.16:58
smoserit was resolved.16:58
larsksLooking...16:59
smoserwe could ask pitti for advice...16:59
smoserhe's the one who holds most knowledge in his head about systemd and aided in this slew of bugs.17:00
smoserpitti works for rh now17:00
larsksSo https://git.launchpad.net/cloud-init/commit/?id=6e45ffb21e9622780585b4fe15890f009ca8fa71 added Before=dbus.socket, but that appears to have been removed by your commit e568aec31051674901047ee577f6e229785cbfc317:01
larsksI will bug pitti for thoughts.17:02
smoserask him to join here, nice to have conversation logged if we do have irc conversation.17:03
smoserlarsks, so you're suggesting17:05
smoser http://paste.ubuntu.com/23864232/17:05
smoserright?17:05
larsksOr After=dbus.socket, maybe, but yeah.  I'm going to test that later today and see if it corrects out issue.17:06
larskss/out/our/17:06
larsksWell, that doesn't work:17:08
larsksBreaking ordering cycle by deleting job cloud-init.service/start17:08
smoseryeah, i saw that quickly in a lxd container (well, for some reason  journalctl doesnt show the error... but cloud-init.service did not run)17:15
rharperlarsks: we switched dbus for sysinit.target17:15
larsksrharper: yes, I spotted that.17:15
rharpertypically, you need to run after dbus.socket (which is required for networkd/timesyncd) but before sysinit.target17:17
rharperlarsks: which fedora release?17:18
smoserlarsks, what is the ordering cycle that you see ?17:18
smosercan you paste journalctl ?17:18
larsksThe problem we have is that we have Before=sysinit.target, which means we run before dbus.service.  But adding After=dbus.service results in an ordering loop: http://chunk.io/f/b9c11198c8c14c6b84dccec25cd1f3cc17:19
larsksThis is F25 right now, using cloud-init 0.7.9.17:20
larsksNote that those log entries are what we get after adding After=dbus.socket to cloud-init.service.17:21
larsksI can produce logs for other scenarios if you would like.17:21
rharperlarsks: yes, that sounds familiar, it should be ok to run before dbus.service; just not before dbus.socket;   we had to split the  gap so-to-speak;  we also have a resolv service that does libnss before dbus.service is up (which then resolv takes over)17:26
larsksrharper: the point is that with After=dbus.socket results in the ordering loop.17:26
smoserright, and larsks needs to run after dbus.socket so that he can update the hostname on the system17:27
rharperyeah, we'll need to detangle that17:27
smoserbecause obviously calling a kernel api should go through a user space daemon with a dbus socket17:27
rharperand something else may need a 'DefaultDependencies=no'17:27
smoserwell i think its just that you cant be before sysinit.target and after dbus.socket17:27
=== rangerpbzzzz is now known as rangerpb
larsksRight.17:27
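The ordering loop discussed above can be sketched as a small graph check. This is only an illustration, not how systemd implements its job ordering: each edge means "starts before", the first edge comes from the Before=sysinit.target setting mentioned in the log, the third from the proposed After=dbus.socket, and the middle edge is an assumption that dbus.socket keeps its default dependencies and is therefore ordered after sysinit.target. The function names are hypothetical.

```python
from collections import defaultdict


def build_order_graph(before_edges):
    """Map each unit to the set of units that must start after it."""
    graph = defaultdict(set)
    for earlier, later in before_edges:
        graph[earlier].add(later)
        graph[later]  # touch the key so every unit appears as a node
    return graph


def find_cycle(graph):
    """Depth-first search; return one ordering cycle as a list, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {unit: WHITE for unit in graph}
    stack = []

    def visit(unit):
        color[unit] = GREY
        stack.append(unit)
        for nxt in sorted(graph[unit]):
            if color[nxt] == GREY:
                # nxt is already on the current path: that path is a cycle
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[unit] = BLACK
        return None

    for unit in sorted(graph):
        if color[unit] == WHITE:
            found = visit(unit)
            if found:
                return found
    return None


edges = [
    ("cloud-init.service", "sysinit.target"),  # Before=sysinit.target
    ("sysinit.target", "dbus.socket"),         # assumed: dbus.socket has default deps
    ("dbus.socket", "cloud-init.service"),     # proposed After=dbus.socket
]
cycle = find_cycle(build_order_graph(edges))
print(" -> ".join(cycle))
```

Dropping the third edge (no After=dbus.socket) makes find_cycle return None, which matches the point that a unit cannot be both before sysinit.target and after dbus.socket.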
larsksBut I'm still not clear on why we need defaultdependencies=no.  We have explicit ordering on most of the network stuff now, I think.17:28
smoserif you drop defaultdependencies=no, then you get defaultdependencies17:28
smoserwhich include sysinit.target17:28
smoseri think17:28
larsksYeeeeeees.17:28
rharperit's because default adds more deps that do not allow the correct ordering for placing before networking17:28
larsksrharper: do you know exactly what the problem is?  Because there are now explicit dependencies on several network units.17:29
larsksAre those insufficient?17:29
rharperit's the additional deps that push things with default deps further up in the cycle17:29
larsksWhat do we hit if we come after sysinit.target?  Is there a test case that demonstrates an actual problem?17:29
rharperwe cannot block networking17:29
smoserwe so need better integration tests. :-(17:30
smoserits coming.17:30
larsksI understand the problem is "we cannot block networking", but why, and why are the explicit Before= deps not sufficient to permit us to block networking?17:30
rharpercloud-init local needs to be able to write out a network config17:30
rharperthey are17:30
larsksrharper: sure, but this isn't about cloud-init-local17:30
rharperbut defaultdeps bring in *MORE* deps17:30
larsksThis is cloud-init.service.17:30
larsksThere is no problem with cloud-init-local having defaultdeps=no17:30
smoseri think that the issue is probably that After sysinit.target (which is added unless you are Defaultdependencies=no) will mean that we run After dbus.socket17:30
rharperwhich runs right before networking is considered online17:30
smoserwhich, on ubuntu, with resolved , means dns queries block until timeout17:31
smoserbecause resolved wasnt up but the socket was17:31
larsksSince it doesn't sound like there is any evidence that defaultdeps=no is necessary, I'd like to produce some so that we can actually test things. smoser, what is a scenario that requires that we "block networking"...something passing in a static network config?17:31
larsksI would like to find something that will fail somewhere (fedora/ubuntu/whatever) if I remove the defaultdependencies line.17:32
smoserit is necessary17:32
smoserother wise you run After=sysinit.target17:32
smoserwhich is After=dbus.socket17:32
larsksThis discussion has developed an ordering problem :).  I am just asking for some sort of scenario that would demonstrate the problem you and rharper have described.17:33
rharperopenstack boot17:33
larsksBut a normal openstack boot doesn't require cloud-init to do *anything* w/r/t networking.17:34
rharperwhen cloud-init.service runs it will attempt to poke at network based metadata services17:34
smoserright17:34
rharperit can and usually does17:34
larsksSo it needs to run *after* networking in that case.17:34
smoserand it will attempt dns lookup on the gce .internal  name17:34
rharperbut before it's online17:34
smoserand that blocks17:34
rharpera cloud may provide network configuration via metadata services17:34
larsksrharper: I think I just missed a distinction there.17:34
rharperso all other units that need network, run after 'network-online.target' is reached17:35
larsksAh, I see.  So in that case, you expect cloud-init to...bring up interfaces manually first, in order to contact the metadata service?17:35
larsksI mean, how do the interfaces come up in that situation to permit access to the network metadata service, if we're running before the system brings up networking?17:35
smoserhey, i'm really sorry, but i have got to work on some other things .17:35
rharpersmoser: np17:35
larsksYeah sure.  I just want to understand the problem we're trying to solve with these dependency settings.17:36
larsksI also have other things I need to work on :)17:36
rharperlarsks: we use the host's network service (so ifupdown or networkd); a fallback network config (typically dhcp on first nic) is done17:36
larsksBut aren't those going to depend on, e.g., networkmanager already running?17:36
rharperright17:36
rharperbut not reaching the network-online.target17:36
rharperit's really threading a needle17:36
rharpercloud-init is expected to do things with the network which could affect network-based services (like sshd host key gen)17:37
larsksHmmmm.  But we already have Before=network-online.target, right?17:37
rharperwe don't want sshd to be running (it runs after network-online.target is reached)17:37
larsksSo even if we exclude default dependencies, we're still okay.17:37
larsksWe also have Before=sshd.service17:37
rharperwhat we really need is a list of default deps that get added unless you add DD=no17:38
larsksI am pretty sure that means sysinit.target and basic.target.  But I suppose you mean you'd like the transitive deps in that case?17:38
rharperthen we can walk each of those to see if they order themselves after network-online.target or something else that forces cloud-init.service to run later than we need17:38
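The walk described here can be sketched as a transitive search over After= relationships. The after_map below is made-up illustrative data, not output captured from any real system; on a live host the per-unit lists could be gathered with something like `systemctl show -p After <unit>`. The helper name ordered_after is hypothetical.

```python
def ordered_after(unit, target, after_map, _seen=None):
    """Return True if `unit` is transitively ordered After= `target`."""
    seen = set() if _seen is None else _seen
    if unit in seen:
        return False  # already explored, avoid looping forever
    seen.add(unit)
    for dep in after_map.get(unit, ()):
        if dep == target or ordered_after(dep, target, after_map, seen):
            return True
    return False


# Illustrative only: a unit with default dependencies, plus a guess at
# how the targets might chain together on some distribution.
after_map = {
    "cloud-init.service": ["sysinit.target", "basic.target"],
    "basic.target": ["sysinit.target", "sockets.target"],
    "sockets.target": ["dbus.socket"],
    "dbus.socket": ["sysinit.target"],
}

print(ordered_after("cloud-init.service", "dbus.socket", after_map))
print(ordered_after("cloud-init.service", "network-online.target", after_map))
```

With real data for each distribution, the same check could be run against network-online.target or any other unit suspected of pushing cloud-init.service later than intended.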
rharperright, DefaultDeps is larger than just those two right ?17:39
larsksNo.  From systemd.service: Unless DefaultDependencies= in the "[Unit]" is set to false, service units will implicitly have dependencies of type Requires= and After= on sysinit.target, a dependency of type After= on basic.target as well as dependencies of type Conflicts= and Before= on shutdown.target.17:39
larsksWhat exactly is implied by those depends a lot on how other services are ordered w/r/t basic.target and sysinit.target17:39
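As a rough sketch of the systemd.service text quoted above, the implicit dependencies can be written out as data and compared with a unit's explicit Before= list. The explicit list used here (sysinit.target, network-online.target, sshd.service) is taken from what is mentioned in this conversation and may not match the shipped cloud-init.service exactly; the function names are hypothetical.

```python
def implicit_service_deps(default_dependencies=True):
    """Implicit dependencies of a service unit, per the quoted man page text."""
    if not default_dependencies:
        return {"Requires": [], "After": [], "Conflicts": [], "Before": []}
    return {
        "Requires": ["sysinit.target"],
        "After": ["sysinit.target", "basic.target"],
        "Conflicts": ["shutdown.target"],
        "Before": ["shutdown.target"],
    }


def ordering_conflicts(explicit_before, default_dependencies=True):
    """Units the service would have to be both Before= and After=."""
    implicit_after = implicit_service_deps(default_dependencies)["After"]
    return sorted(set(explicit_before) & set(implicit_after))


# Explicit Before= entries mentioned in the discussion (assumed, not verified).
explicit_before = ["sysinit.target", "network-online.target", "sshd.service"]

print(ordering_conflicts(explicit_before, default_dependencies=True))
print(ordering_conflicts(explicit_before, default_dependencies=False))
```

Under these assumptions, removing DefaultDependencies=no while keeping Before=sysinit.target gives a direct contradiction on sysinit.target, so the two settings would have to change together.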
rharperright, so the default deps of sysinit.target and basic.target make cloud-init.service run too late17:40
larsksMaaaybe.  I've noticed that between ubuntu and rhel/fedora there are substantial differences in service ordering.  And even between fedora and rhel, I think.17:40
rharpervery likely17:40
rharperthe list of units and ordering is massively fragile17:40
larsksSo we still don't have a clear test case that demonstrates an actual problem.  It sounds like you are suggesting that an openstack config that passes in an explicit network configuration should help demonstrate one?17:41
larsksI can try putting that together later this week.17:41
rharperyou don't even need a network config17:41
rharperjust use the default metadata service (ie, not a configdrive)17:42
rharperin our case, if you run systemctl list-dependencies sysinit.target17:42
rharperhonestly; we can try moving it back in as well and see what breaks17:43
rharperin general, it's such a mess that it gets paged out of my memory once things work as expected17:43
larsksYeah, the thing is, running cloud-init *after* network-online.target will work just fine in that case (since it doesn't need to touch the network config).17:44
larsksSince networking is up, it will have no problem contacting the metadata service.17:44
rharperright17:44
larsksI am trying to produce a failure :)17:44
larsksSpeaking of paging things out, I should get lunch before my next meeting and the meeting after that.  I've also pointed pitti at the problem, although he's got devconf going on right now and may not be able to look at things until next week or so.17:49
rharpercool17:49
=== shardy is now known as shardy_afk
=== shardy_afk is now known as shardy
rharpersmoser:  https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/31563321:49
=== rangerpb is now known as rangerpbzzzz
