[02:11] I am seeing some problems with an Ubuntu Server 18.04.3 instance where, on bootup, it cannot start the network interface, causing the entire machine to not start. If I check /etc/network/interfaces, I just see a blank file and am not sure where it is failing.
[02:34] Actually, I take that back, it's Ubuntu 18.04.4 (I didn't know a more recent upgrade had occurred)
[02:49] This is what I eventually see if I wait long enough: https://imgur.com/HW4ozvM
[03:23] aloini: ubuntu server 18.04 uses netplan with the systemd-networkd renderer for network configuration by default, /etc/network/interfaces would be legacy.
[03:23] do you read release notes?
[03:23] !releasenotes
[03:23] For release notes of a given Ubuntu release, please refer to the 'Docs' column on the 'List of releases' table at https://wiki.ubuntu.com/Releases
[03:24] I do, yes. My issue is that this occurred after a reboot, with an already functioning network configuration, and the server was working before. The server is hosted in ESXi, and the other servers I have on the host have no problems.
[03:25] the screenshot you posted does not explain what failed about bringing up systemd-networkd, you'll need to refer to the log files as indicated.
[03:25] Which log files? I attempted to look at /var/log/syslog, lastlog, kernel, and others but couldn't find any relevant info in any log file.
[03:26] quoting your screen shot: "See systemctl status systemd-networkd.service for details."
[03:26] I can't do that if the system does not boot or drop to a shell though.
[03:26] If I reboot into recovery mode, there is no relevant information there.
[03:27] so it does not continue to boot after 1min30s are reached?
[03:27] No, it just continually cycles through this process of trying to start the network interface for an unlimited amount of time.
[03:27] i see. in this case you may want to boot to recovery
[03:28] I rebooted the server this morning around noon, and then came back to it at 5 and it was still cycling through.
[03:28] !recovery
[03:28] If your system fails to boot normally, it may be useful to boot it into recovery mode. For instructions, see https://wiki.ubuntu.com/RecoveryMode
[03:28] other than syslog there's also journalctl for accessing log files.
[03:29] well, not log files, but logs
[03:30] to me, cloud-init is the culprit there
[03:31] Where would you start from here then? If I boot into recovery and tell it to drop to root shell it does that successfully. But I am honestly not familiar with cloud-init, so I am not sure what I need to do here to resolve cloud-init or another service to have it start working again.
[03:32] But one second, let me see if there is a way to get something from journalctl.
[03:33] journalctl -b -1 -e would let you inspect the end (-e) of what was logged during the previous (-b -1) boot
[03:38] I see a call trace in the log right after starting, but nothing more than that. I am unable to copy and paste things, so photos will be the only way to achieve this... one second.
[03:38] https://imgur.com/lzIrhHo
[03:39] a (virtual) serial console would enable you to copy and paste
[03:40] For what it is also worth, in recovery, I can ping out to IP addresses, but am unable to use DNS (i.e., can't ping google.com but can ping Google's DNS server, 8.8.8.8)
[03:41] since i know practically nothing about this system, guessing on the lower end of a kernel call trace is not going to get us very far. this trace refers to "fuse", which may suggest your system makes use of a fuse file system, where the driver fails somehow.
[03:43] since you have networking, you could post the full log to termbin, if that's acceptable in terms of company policies / regulations, to share those
[03:43] Yeah it would be, this is a personal system, not a company system.
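A minimal shell sketch of the journalctl approach suggested above, meant for the recovery root shell. It assumes the journal is persistent, since otherwise there is no previous boot to read. The function name prev_boot_unit_log and the JOURNALCTL override variable are hypothetical helpers introduced here for illustration; they are not part of systemd.

```shell
#!/bin/sh
# Sketch only: show the last N journal lines that one unit logged
# during the previous boot.
#   -b -1       select the previous boot
#   -u UNIT     restrict output to a single unit
#   -n N        show only the last N lines
#   --no-pager  print directly instead of opening a pager
# JOURNALCTL is a hypothetical override so a stub can be substituted.
prev_boot_unit_log() {
    unit="$1"
    lines="${2:-100}"
    "${JOURNALCTL:-journalctl}" -b -1 -u "$unit" -n "$lines" --no-pager
}
```

For example, prev_boot_unit_log systemd-networkd.service 50 would print the last 50 lines that unit logged before the reboot, which is roughly what the "See systemctl status" hint points to when the system itself cannot finish booting.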
[03:43] journalctl -b -1 | nc 5.39.93.71 9999
[03:44] post the url it returns
[03:45] https://termbin.com/f6ad
[03:47] "pci 0000:00:15.3: BAR 13: no space for [io size 0x1000]" and "pci 0000:00:15.3: BAR 13: failed to assign [io size 0x1000]" is the first problem, try a web search for this
[03:51] so this does not seem to really hint at why the ens160 network interface fails to get configured. maybe you can share the network configuration?
[03:53] maybe using: cat /etc/netplan/* | nc 5.39.93.71 9999
[03:54] the systemd-timesyncd task gets hung somehow. this could be due to problems with the hwclock provided by the (vmware) virtualization
[03:54] https://termbin.com/fk0l
[03:54] you were not running the latest kernel package at the time, though
[03:54] does the same still happen on the latest kernel?
[03:55] It's mainly a DHCP configuration, and the DHCP server is up and running as far as I can see (other clients are receiving addresses without issue)
[03:56] dhcp would happen after the network interface is brought up, so it's indeed not a dhcp issue
[03:56] Not sure tomreyn, I can't run apt update due to the lack of dns, it seems that the file that /etc/resolv.conf symlinks to (../run/systemd/resolve/stub-resolv.conf) is missing in recovery
[03:59] you can either mount a tmpfs at /run and create the expected directories and the file there, with some public resolvers or your preferred ones, or you can delete the symlink at /etc/resolv.conf and place the file there, then delete it later on.
[04:00] (or just move it aside)
[04:04] So, I don't see any potential upgrades for the kernel, https://termbin.com/b3g7
[04:08] good. all i know is that when it was creating the logs you posted at https://termbin.com/f6ad it was running 5.3.0-26-generic #28
[04:08] but 5.3.0-28-generic #30 is available now
[04:10] i had you post the log from the last but one boot there, though
[04:10] i suggest you start by looking for a vmware upgrade first of all, since this can be a virtualization issue
[04:10] Ah, yeah, I booted into .28 to verify.
[04:10] Ah, yeah, I booted into .26 to verify if a previous kernel would fix it. *
[04:11] the log we were looking at was produced between Wed 2019-11-06 03:27:04 UTC (when it booted) and Mon 2020-02-17 03:34:53 UTC (when the log ends, due to reboot or shutdown), though.
[04:12] the log is probably also not posted completely, but cut off towards the end (or the system froze / power cycled there)
[04:13] it may be useful to review a log of a current kernel boot after you've worked out the vmware side of things
[04:13] So if I boot to recovery, remove the resolv.conf file, run init 5, I can then boot the system perfectly fine.
[04:14] I am sure there are things that are not necessarily working correctly however.
[04:14] so no more pci errors?
[04:14] Does seem like fuse might be causing it.
[04:14] and does systemd-timesyncd work then?
[04:14] What is the latest linux 4 kernel?
[04:15] upstream? kernel.org would tell.
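The resolv.conf workaround discussed above (move the dangling symlink aside, place a temporary file with some resolvers, undo it later) can be sketched roughly as follows. The function name write_temp_resolv_conf and the ".orig" backup suffix are hypothetical choices for illustration. 8.8.8.8 is the resolver mentioned in the log; any other address passed in is the caller's own choice.

```shell
#!/bin/sh
# Sketch only: replace a (possibly dangling) resolv.conf symlink with a
# temporary static file so name resolution works in the recovery shell.
# Usage: write_temp_resolv_conf PATH NAMESERVER...
write_temp_resolv_conf() {
    target="$1"
    shift
    # -e is false for a dangling symlink, so also test -L.
    if [ -e "$target" ] || [ -L "$target" ]; then
        # Renames the symlink itself, not the file it points to.
        mv "$target" "$target.orig"
    fi
    # Write one "nameserver" line per address given.
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns"
    done > "$target"
}
```

On the affected system this would be called as write_temp_resolv_conf /etc/resolv.conf 8.8.8.8 from the root shell. Moving /etc/resolv.conf.orig back over the temporary file afterwards restores the original symlink for the normal boot.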
[04:16] user@plex:~$ which systemd-timesyncd
[04:16] user@plex:~$ command -v systemd-timesyncd
[04:16] There is no output from that command
[04:16] it's a systemd service
[04:17] timedatectl can query it
[04:18] https://termbin.com/3gto
[04:18] If I run timedatectl nothing happens however
[04:19] i'll be happy to continue looking into this once you have convincingly stated that you've reviewed available vmware updates
[04:21] also discuss how you use fuse file systems
[04:22] and show a journalctl -b for a current kernel boot
[04:22] in this order
[04:24] Updating to https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-201912001.html right now. I am using fuse to mount a Google Drive file system via rclone and cache. Once the esxi upgrade is complete, I will get back to you on the other stuff.
[04:40] So this is the current boot log if I do the following: recovery, init 5: https://termbin.com/486f
[04:40] If I just have the system boot up, it still goes through the continuous loop of starting networking services
[04:51] I also do have the latest version of vmware tools installed in the guest OS as well
[04:51] ii open-vm-tools 2:11.0.1-2ubuntu0.18.04.2 amd64 Open VMware Tools for virtual machines hosted on VMware (CLI)
[04:53] unfortunately the previously problematic PCI 15ad:07a0 vmware device triggering the "no space for [io size 0x1000]" messages is still problematic. maybe a newer version of vmware's guest additions (provided by them/the virtualization host) may help.
[04:54] how do you mount the fuse file system in fstab?
[04:59] Ah, thanks, you made me remember a change I made several weeks ago to a systemd file.
[04:59] Fixing that actually caused the system to boot again properly.
[05:01] I am mounting fuse with a systemd script that waits on the network to mount due to Google Drive requiring a valid network connection.
[05:02] aloini: don't keep me dumb - which change did you make and revert now?
[05:03] I basically left off a \ for the script.
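A missing line-continuation backslash like the one just mentioned can be hard to spot by eye. The following is a rough heuristic sketch, not a real unit-file parser: it assumes continued arguments are indented and ordinary directives are not, and flags any indented line whose previous line did not end in a backslash. The function name find_broken_continuations is hypothetical.

```shell
#!/bin/sh
# Sketch only: print "LINENO: text" for each indented line in a unit file
# that is not preceded by a line ending in a backslash.
# Heuristic: assumes only continuation lines are indented.
find_broken_continuations() {
    awk '
        # Indented, non-empty, non-comment line, and the previous line
        # did not end with a backslash: probably a lost continuation.
        /^[ \t]+[^ \t#;]/ && !cont { printf "%d: %s\n", NR, $0 }
        # Remember whether the current line ends with a backslash.
        { cont = ($0 ~ /\\$/) }
    ' "$1"
}
```

Running it against the service file would list suspect line numbers, and an empty result means the heuristic found nothing. systemd also ships systemd-analyze verify, which may report such lines as well, though the exact messages are not shown in this log.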
[05:03] One second.
[05:04] https://paste.ubuntu.com/p/CFnzvfGQry/
[05:04] Line 19 was missing the \
[05:04] Adding that resolved the problem
[05:06] i see, so just a syntax error in a systemd service file, i'd hoped this would be reported by systemd when you enabled the service.
[05:07] you can and should use the _netdev mount option in /etc/fstab for network devices
[07:05] Good morning
=== Wryhder is now known as Lucas_Gray
[16:19] in the process of an LTS -> LTS upgrade it stopped at the question about a modified file and the options to keep it, view the difference, etc. but then didn't take any input anymore. the dpkg process is still running and i see a process called 'xenial'. how to best debug the current situation? just kill dpkg?
[16:19] I'm trying to do a headless install of Ubuntu 19.10 through serial console using these instructions: https://askubuntu.com/questions/250869/how-can-i-install-ubuntu-on-a-device-without-a-screen-nor-a-keyboard/260469#260469
[16:20] But these files don't exist: syslinx.cfg or text.cfg
[16:20] Any ideas how to get this to work for Ubuntu 19.10?
=== teward_ is now known as teward