[02:11] <aloini> I am seeing some problems with an Ubuntu Server 18.04.3 instance where, on bootup, it cannot start the network interface, which keeps the entire machine from starting. If I check /etc/network/interfaces, I just see a blank file, and I am not sure where it is failing.
[02:34] <aloini> Actually, I take that back, it's Ubuntu 18.04.4 (I didn't know there was a more recent upgrade that occurred)
[02:49] <aloini> This is what I eventually see if I wait long enough: https://imgur.com/HW4ozvM
[03:23] <tomreyn> aloini: ubuntu server 18.04 uses netplan with the systemd-networkd renderer for network configuration by default, /etc/network/interfaces would be legacy.
[03:23] <tomreyn> do you read release notes?
[03:23] <tomreyn> !releasenotes
[03:24] <aloini> I do yes, my issue is that this occurred after a reboot of an already functioning network configuration and the server was working. The server is hosted in esxi, and the other servers I have have no problems on the host.
[03:25] <tomreyn> the screenshot you posted does not explain what failed about bringing up systemd-networkd, you'll need to refer to the log files as indicated.
[03:25] <aloini> Which log files? I attempted to look at /var/log/syslog, lastlog, kernel, and others but couldn't find any relevant info in any log file.
[03:26] <tomreyn> quoting your screen shot: "See systemctl status systemd-networkd.service for details."
[03:26] <aloini> I can't do that if the system does not boot or drop to a shell though.
[03:26] <aloini> If I reboot into recovery mode, there is no relevant information there.
[03:27] <tomreyn> so it does not continue to boot after 1min30s are reached?
[03:27] <aloini> No, it just continually cycles through this process of trying to start the network interface for an unlimited amount of time.
[03:27] <tomreyn> i see. in this case you may want to boot to recovery
[03:28] <aloini> I rebooted the server this morning around noon, and then came back to it at 5 and it was still cycling through.
[03:28] <tomreyn> !recovery
[03:28] <tomreyn> other than syslog there's also journalctl for accessing log files.
[03:29] <tomreyn> well, not log files, but logs
[03:30] <tomreyn> to me, cloud-init is the culprit there
[03:31] <aloini> Where would you start from here then? If I boot into recovery and tell it to drop to a root shell, it does that successfully. But I am honestly not familiar with cloud-init, so I am not sure what I need to do to fix cloud-init or another service and get it working again.
[03:32] <aloini> But one second, let me see if there is a way to get something from journalctl.
[03:33] <tomreyn> journalctl -b -1 -e     would let you inspect the end (-e) of what was logged during the previous (-b -1 ) boot
[03:38] <aloini> I see a call trace in the log right after starting, but nothing more than that. I am unable to copy and paste things, so photos will be the only way to achieve this... one second.
[03:38] <aloini> https://imgur.com/lzIrhHo
[03:39] <tomreyn> a (virtual) serial console would enable you to copy and paste
[03:40] <aloini> For what it is also worth, in recovery, I can ping out to IP addresses, but am unable to use DNS (IE: can't ping google.com but can ping google's dns servers, 8.8.8.8)
[03:41] <tomreyn> since i know practically nothing about this system, guessing on the lower end of a kernel call trace is not going to get us very far. this trace refers to "fuse", which may suggest your system makes use of a fuse file system, where the driver fails somehow.
[03:43] <tomreyn> since you have networking, you could post the full log to termbin, if that's acceptable in terms of company policies / regulations, to share those
[03:43] <aloini> Yeah it would be, this is a personal system, not a company system.
[03:43] <tomreyn> journalctl -b -1 | nc 5.39.93.71 9999
[03:44] <tomreyn> post the url it returns
[03:45] <aloini> https://termbin.com/f6ad
[03:47] <tomreyn> "pci 0000:00:15.3: BAR 13: no space for [io  size 0x1000]" and "pci 0000:00:15.3: BAR 13: failed to assign [io  size 0x1000]" is the first problem, try a web search for this
[03:51] <tomreyn> so this does not seem to really hint at why the ens160 network interface fails to get configured. maybe you can share the network configuration?
[03:53] <tomreyn> maybe using:  cat /etc/netplan/* | nc 5.39.93.71 9999
[03:54] <tomreyn> the systemd-timesyncd task gets hung somehow. this could be due to problems with the hwclock provided by the (vmware) virtualization
[03:54] <aloini> https://termbin.com/fk0l
[03:54] <tomreyn> you were not running the latest kernel package at the time, though
[03:54] <tomreyn> does the same still happen on the latest kernel?
[03:55] <aloini> It's mainly a DHCP configuration, and the DHCP server is up and running as far as I can see (other clients are receiving addresses without issue)
[03:56] <tomreyn> dhcp would happen after the network interface is brought up, so it's indeed not a dhcp issue
[03:56] <aloini> Not sure tomreyn, I can't run apt update due to the lack of DNS. It seems that the file /etc/resolv.conf symlinks to (../run/systemd/resolve/stub-resolv.conf) is missing in recovery
[03:59] <tomreyn> you can either mount a tmpfs at /run and create the expected directories and the file there, with some public resolvers or your preferred ones, or you can delete the symlink at /etc/resolv.conf and place the file there, then delete it later on.
[04:00] <tomreyn> (or just move it aside)
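The second workaround tomreyn describes (moving the symlink aside and placing a temporary file) could look roughly like the sketch below. The function name `fix_resolv`, the `.orig` suffix, and the choice of 8.8.8.8 as resolver are assumptions made for illustration; the conversation only establishes that 8.8.8.8 was reachable from recovery.

```shell
# Sketch only: replace a dangling resolv.conf symlink with a temporary static file.
# fix_resolv is a hypothetical helper; the optional argument defaults to /etc/resolv.conf.
fix_resolv() {
    resolv="${1:-/etc/resolv.conf}"
    # Act only if the path is a symlink whose target does not exist.
    if [ -L "$resolv" ] && [ ! -e "$resolv" ]; then
        # Keep the original symlink so it can be restored later.
        mv "$resolv" "$resolv.orig"
        # Assumed resolver; substitute your preferred one.
        printf 'nameserver 8.8.8.8\n' > "$resolv"
    fi
}
```

Run as root in the recovery shell, this would be called as `fix_resolv`; once the system boots normally again, `mv /etc/resolv.conf.orig /etc/resolv.conf` puts the symlink back.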
[04:04] <aloini> So, I don't see any potential upgrades for the kernel, https://termbin.com/b3g7
[04:08] <tomreyn> good. all i know is that when it was creating the logs you posted at https://termbin.com/f6ad it was running 5.3.0-26-generic #28
[04:08] <tomreyn> but 5.3.0-28-generic #30 is available now
[04:10] <tomreyn> i had you post the log from the last-but-one boot there, though
[04:10] <tomreyn> i suggest you start by looking for a vmware upgrade first of all, since this can be a virtualization issue
[04:10] <aloini> Ah, yeah, I booted into .28 to verify.
[04:10] <aloini> Ah, yeah, I booted into .26 to verify if a previous kernel would fix it. *
[04:11] <tomreyn> the log we were looking at was produced between Wed 2019-11-06 03:27:04 UTC (when it booted) and Mon 2020-02-17 03:34:53 UTC (when the log ends, due to reboot or shutdown), though.
[04:12] <tomreyn> the log is probably also not posted completely, but cut off towards the end (or the system froze / power cycled there)
[04:13] <tomreyn> it may be useful to review a log of a current kernel boot after you've worked out the vmware side of things
[04:13] <aloini> So if I boot to recovery, remove the resolv.conf file, run init 5, I can then boot the system perfectly fine.
[04:14] <aloini> I am sure there are things that are not necessarily working correctly however.
[04:14] <tomreyn> so no more pci errors?
[04:14] <aloini> Does seem like fuse might be causing it.
[04:14] <tomreyn> and does systemd-timesyncd work then?
[04:14] <aloini> What is the latest linux 4 kernel?
[04:15] <tomreyn> upstream? kernel.org would tell.
[04:16] <aloini> user@plex:~$ which systemd-timesyncd
[04:16] <aloini> user@plex:~$ command -v systemd-timesyncd
[04:16] <aloini> There is no output of that command
[04:16] <tomreyn> it's a systemd service
[04:17] <tomreyn> timedatectl can query it
[04:18] <aloini> https://termbin.com/3gto
[04:18] <aloini> If I run timedatectl nothing happens however
[04:19] <tomreyn> i'll be happy to continue looking into this once you have convincingly stated that you've reviewed available vmware updates
[04:21] <tomreyn> also discuss how you use fuse file systems
[04:22] <tomreyn> and show a     journalctl -b     for a current kernel boot
[04:22] <tomreyn> in this order
[04:24] <aloini> Updating to https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-201912001.html right now, but, using fuse to mount a Google Drive File System mount via rclone and cache. Once the esxi upgrade is complete, I will get back to you on the other stuff.
[04:40] <aloini> So this is the current boot log if I do the following: recovery, init 5: https://termbin.com/486f
[04:40] <aloini> If I just have the system boot up, it still goes through the continuous loop of starting networking services
[04:51] <aloini> I also do have the latest version of vmware tools installed in the guest OS as well
[04:51] <aloini> ii  open-vm-tools                        2:11.0.1-2ubuntu0.18.04.2                       amd64        Open VMware Tools for virtual machines hosted on VMware (CLI)
[04:53] <tomreyn> unfortunately the previously problematic PCI 15ad:07a0 vmware device triggering the "no space for [io  size 0x1000]" messages is still problematic. maybe a newer version of vmware's guest additions (provided by them/the virtualization host) may help.
[04:54] <tomreyn> how do you mount the fuse file system in fstab?
[04:59] <aloini> Ah, thanks, you made me remember a change I made several weeks ago to a systemd file.
[04:59] <aloini> Fixing that actually caused the system to boot again properly.
[05:01] <aloini> I am mounting fuse with a systemd script that waits on the network to mount due to Google Drive requiring a valid network connection.
[05:02] <tomreyn> aloini: don't keep me dumb - which change did you make and revert now?
[05:03] <aloini> I basically left off a \ for the script.
[05:03] <aloini> One second.
[05:04] <aloini> https://paste.ubuntu.com/p/CFnzvfGQry/
[05:04] <aloini> Line 19 was missing the \
[05:04] <aloini> Adding that resolved the problem
[05:06] <tomreyn> i see, so just a syntax error in a systemd service file, i'd hoped this to be reported by systemd when you enabled the service.
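The pasted unit file is not reproduced in this log, so the following is only a hypothetical sketch of the kind of service aloini describes: an rclone mount that waits for the network, with `ExecStart` continued across lines by trailing backslashes. The remote name `gdrive:`, the mount point, the config path, and the chosen flags are all assumptions.

```ini
# Hypothetical unit; not aloini's actual file.
[Unit]
Description=rclone mount of a Google Drive remote (example)
Wants=network-online.target
After=network-online.target

[Service]
# Every continued line must end in a backslash. If one is missing,
# the lines after it are no longer part of ExecStart.
ExecStart=/usr/bin/rclone mount gdrive: /mnt/gdrive \
    --config /home/user/.config/rclone/rclone.conf \
    --allow-other \
    --vfs-cache-mode writes
ExecStop=/bin/fusermount -u /mnt/gdrive
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Running `systemd-analyze verify` against the unit file before enabling it may surface this kind of mistake, although what it reports depends on how the broken lines happen to parse.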
[05:07] <tomreyn> you can and should use the _netdev mount option in /etc/fstab for network devices
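As a generic illustration of the `_netdev` option tomreyn mentions, an /etc/fstab entry for a network file system might look like the line below. The NFS server name, export path, and mount point are hypothetical; this is not a recipe for the rclone mount discussed above.

```
# <file system>               <mount point>  <type>  <options>         <dump>  <pass>
server.example.com:/export    /mnt/data      nfs     defaults,_netdev  0       0
```

The option marks the file system as residing on a device that needs network access, so mounting is ordered after the network is up.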
[07:05] <lordievader> Good morning
[16:19] <charolastra> in the process of an LTS -> LTS upgrade it stopped at the question about a modified file and the options to keep it, view the difference, etc. but then didn't take any input anymore. the dpkg process is still running and i see a process called 'xenial'. how to best debug the current situation? just kill dpkg?
[16:19] <blenderartist18> I'm trying to do a headless install of Ubuntu 19.10 through serial console using these instructions: https://askubuntu.com/questions/250869/how-can-i-install-ubuntu-on-a-device-without-a-screen-nor-a-keyboard/260469#260469
[16:20] <blenderartist18> But these files don't exist: syslinx.cfg or text.cfg
[16:20] <blenderartist18> Any ideas how to get this to work for Ubuntu 19.10?