aloini | I am seeing some problems with an ubuntu server 18.04.3 instance, where on bootup, it cannot start the network interface, causing the entire machine to not start. If I check /etc/network/interfaces, I just see a blank file and am not sure where it is failing. | 02:11 |
---|---|---|
aloini | Actually, I take that back, it's ubuntu 18.04.4 (I didn't know there was a more recent upgrade that occurred) | 02:34 |
aloini | This is what I eventually see if I wait long enough: https://imgur.com/HW4ozvM | 02:49 |
tomreyn | aloini: ubuntu server 18.04 uses netplan with the systemd-networkd renderer for network configuration by default, /etc/network/interfaces would be legacy. | 03:23 |
tomreyn | do you read release notes? | 03:23 |
tomreyn | !releasenotes | 03:23 |
ubottu | For release notes of a given Ubuntu release, please refer to the 'Docs' column on the 'List of releases' table at https://wiki.ubuntu.com/Releases | 03:23 |
aloini | I do yes, my issue is that this occurred after a reboot of an already functioning network configuration and the server was working. The server is hosted in esxi, and the other servers I have have no problems on the host. | 03:24 |
tomreyn | the screenshot you posted does not explain what failed about bringing up systemd-networkd, you'll need to refer to the log files as indicated. | 03:25 |
aloini | Which log files? I attempted to look at /var/log/syslog, lastlog, kernel, and others but couldn't find any relevant info in any log file. | 03:25 |
tomreyn | quoting your screen shot: "See systemctl status systemd-networkd.service for details." | 03:26 |
aloini | I can't do that if the system does not boot or drop to a shell though. | 03:26 |
aloini | If I reboot into recovery mode, there is no relevant information there. | 03:26 |
tomreyn | so it does not continue to boot after 1min30s are reached? | 03:27 |
aloini | No, it just continually cycles through this process of trying to start the network interface for an unlimited amount of time. | 03:27 |
tomreyn | i see. in this case you may want to boot to recovery | 03:27 |
aloini | I rebooted the server this morning around noon, and then came back to it at 5 and it was still cycling through. | 03:28 |
tomreyn | !recovery | 03:28 |
ubottu | If your system fails to boot normally, it may be useful to boot it into recovery mode. For instructions, see https://wiki.ubuntu.com/RecoveryMode | 03:28 |
tomreyn | other than syslog there's also journalctl for accessing log files. | 03:28 |
tomreyn | well, not log files, but logs | 03:29 |
tomreyn | to me, cloud-init is the culprit there | 03:30 |
aloini | Where would you start from here then? If I boot into recovery and tell it to drop to root shell it does that successfully. But, I am honestly, not familiar with cloud-init so I am not sure what I need to do here to resolve cloud-init or another service to have it start working again. | 03:31 |
aloini | But one second, let me see if there is a way to get something from journalctl. | 03:32 |
tomreyn | journalctl -b -1 -e would let you inspect the end (-e) of what was logged during the previous (-b -1 ) boot | 03:33 |
aloini | I see a call trace in the log right after starting, but nothing more than that. I am unable to copy and paste things, so photos will be the only way to achieve this... one second. | 03:38 |
aloini | https://imgur.com/lzIrhHo | 03:38 |
tomreyn | a (virtual) serial console would enable you to copy and paste | 03:39 |
aloini | For what it is also worth, in recovery, I can ping out to IP addresses, but am unable to use DNS (IE: can't ping google.com but can ping google's dns servers, 8.8.8.8) | 03:40 |
tomreyn | since i know practically nothing about this system, guessing on the lower end of a kernel call trace is not going to get us very far. this trace refers to "fuse", which may suggest your system makes use of a fuse file system, where the driver fails somehow. | 03:41 |
tomreyn | since you have networking, you could post the full log to termbin, if that's acceptable in terms of company policies / regulations, to share those | 03:43 |
aloini | Yeah it would be, this is a personal system, not a company system. | 03:43 |
tomreyn | journalctl -b -1 | nc 5.39.93.71 9999 | 03:43 |
tomreyn | post the url it returns | 03:44 |
aloini | https://termbin.com/f6ad | 03:45 |
tomreyn | "pci 0000:00:15.3: BAR 13: no space for [io size 0x1000]" and "pci 0000:00:15.3: BAR 13: failed to assign [io size 0x1000]" is the first problem, try a web search for this | 03:47 |
tomreyn | so this does not seem to really hint on why the ens160 network interface fails to get configured. maybe you can share the network configuration? | 03:51 |
tomreyn | maybe using: cat /etc/netplan/* | nc 5.39.93.71 9999 | 03:53 |
tomreyn | the systemd-timesyncd task gets hung somehow. this could be due to problems with the hwclock provided by the (vmware) virtualization | 03:54 |
aloini | https://termbin.com/fk0l | 03:54 |
tomreyn | you were not running the latest kernel package at the time, though | 03:54 |
tomreyn | does the same still happen on the latest kernel? | 03:54 |
aloini | It's mainly a DHCP configuration, and the DHCP server is up and running as far as I can see (other clients are receiving addresses without issue) | 03:55 |
tomreyn | dhcp would happen after the network interface is brought up, so it's indeed not a dhcp issue | 03:56 |
aloini | Not sure tomreyn, I can't run apt update due to the lack of dns, it seems that the file that /etc/resolv.conf symlinks to (../run/systemd/resolve/stub-resolv.conf) is missing in recovery | 03:56 |
tomreyn | you can either mount a tmpfs at /run and create the expected directories and the file there, with some public resolvers or your preferred ones, or you can delete the symlink at /etc/resolv.conf and place the file there, then delete it later on. | 03:59 |
tomreyn | (or just move it aside) | 04:00 |
aloini | So, I don't see any potential upgrades for the kernel, https://termbin.com/b3g7 | 04:04 |
tomreyn | good. all i know is that when it was creating the logs you posted at https://termbin.com/f6ad it was running 5.3.0-26-generic #28 | 04:08 |
tomreyn | but 5.3.0-28-generic #30 is available now | 04:08 |
tomreyn | i had you post the log from the last-but-one boot there, though | 04:10 |
tomreyn | i suggest you start by looking for a vmware upgrade first of all, since this can be a virtualization issue | 04:10 |
aloini | Ah, yeah, I booted into .28 to verify. | 04:10 |
aloini | Ah, yeah, I booted into .26 to verify if a previous kernel would fix it. * | 04:10 |
tomreyn | the log we were looking at was produced between Wed 2019-11-06 03:27:04 UTC (when it booted) and Mon 2020-02-17 03:34:53 UTC (when the log ends, due to reboot or shutdown), though. | 04:11 |
tomreyn | the log is probably also not posted completely, but cut off towards the end (or the system froze / power cycled there) | 04:12 |
tomreyn | it may be useful to review a log of a current kernel boot after you've worked out the vmware side of things | 04:13 |
aloini | So if I boot to recovery, remove the resolv.conf file, run init 5, I can then boot the system perfectly fine. | 04:13 |
aloini | I am sure there are things that are not necessarily working correctly however. | 04:14 |
tomreyn | so no more pci errors? | 04:14 |
aloini | Does seem like fuse might be causing it. | 04:14 |
tomreyn | and does systemd-timesyncd work then? | 04:14 |
aloini | What is the latest linux 4 kernel? | 04:14 |
tomreyn | upstream? kernel.org would tell. | 04:15 |
aloini | user@plex:~$ which systemd-timesyncd | 04:16 |
aloini | user@plex:~$ command -v systemd-timesyncd | 04:16 |
aloini | There is no output of that command | 04:16 |
tomreyn | it's a systemd service | 04:16 |
tomreyn | timedatectl can query it | 04:17 |
aloini | https://termbin.com/3gto | 04:18 |
aloini | If I run timedatectl nothing happens however | 04:18 |
tomreyn | i'll be happy to continue looking into this once you have convincingly stated that you've reviewed available vmware updates | 04:19 |
tomreyn | also discuss how you use fuse file systems | 04:21 |
tomreyn | and show a journalctl -b for a current kernel boot | 04:22 |
tomreyn | in this order | 04:22 |
aloini | Updating to https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-201912001.html right now, but, using fuse to mount a Google Drive File System mount via rclone and cache. Once the esxi upgrade is complete, I will get back to you on the other stuff. | 04:24 |
aloini | So this is the current boot log if I do the following: recovery, init 5: https://termbin.com/486f | 04:40 |
aloini | If I just have the system boot up, it still goes through the continuous loop of starting networking services | 04:40 |
aloini | I also do have the latest version of vmware tools installed into the guest OS as well | 04:51 |
aloini | ii open-vm-tools 2:11.0.1-2ubuntu0.18.04.2 amd64 Open VMware Tools for virtual machines hosted on VMware (CLI) | 04:51 |
tomreyn | unfortunately the previously problematic PCI 15ad:07a0 vmware device triggering the "no space for [io size 0x1000]" messages is still problematic. maybe a newer version of vmware's guest additions (provided by them/the virtualization host) may help. | 04:53 |
tomreyn | how do you mount the fuse file system in fstab? | 04:54 |
aloini | Ah, thanks, you made me remember a change I made several weeks ago to a systemd file. | 04:59 |
aloini | Fixing that actually caused the system to boot again properly. | 04:59 |
aloini | I am mounting fuse with a systemd script that waits on the network to mount due to Google Drive requiring a valid network connection. | 05:01 |
tomreyn | aloini: don't keep me dumb - which change did you make and revert now? | 05:02 |
aloini | I basically left off a \ for the script. | 05:03 |
aloini | One second. | 05:03 |
aloini | https://paste.ubuntu.com/p/CFnzvfGQry/ | 05:04 |
aloini | Line 19 was missing the \ | 05:04 |
aloini | Adding that resolved the problem | 05:04 |
tomreyn | i see, so just a syntax error in a systemd service file, i'd hoped this to be reported by systemd when you enabled the service. | 05:06 |
tomreyn | you can and should use the _netdev mount option in /etc/fstab for network devices | 05:07 |
lordievader | Good morning | 07:05 |
=== Wryhder is now known as Lucas_Gray | ||
charolastra | in the process of an LTS -> LTS upgrade it stopped at the question of a modified file and the options to keep it, view difference, etc. but then didn't take any input anymore. dpkg process is still running and i see a process called 'xenial'. how to best debug the current situation? just kill dpkg? | 16:19 |
blenderartist18 | I'm trying to do a headless install of Ubuntu 19.10 through serial console using these instructions: https://askubuntu.com/questions/250869/how-can-i-install-ubuntu-on-a-device-without-a-screen-nor-a-keyboard/260469#260469 | 16:19 |
blenderartist18 | But these files don't exist: syslinux.cfg or text.cfg | 16:20 |
blenderartist18 | Any ideas how to get this to work for Ubuntu 19.10? | 16:20 |
=== teward_ is now known as teward |
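
For reference, the fix aloini describes at 05:03 to 05:04 (a missing trailing backslash in a multi-line command inside a systemd service file) is the kind of mistake that can be checked before rebooting. The following is only a sketch under stated assumptions: the unit name `gdrive-demo.service`, the remote name `gdrive:`, the mount point and the rclone options are all hypothetical stand-ins, since the actual paste is not reproduced in this log. It writes the unit to a scratch directory and, if available, runs `systemd-analyze verify` on it.

```shell
#!/bin/sh
# Hypothetical unit written to a scratch directory; every name and path
# below is illustrative and not taken from the original paste.
unit_dir=$(mktemp -d)
cat > "$unit_dir/gdrive-demo.service" <<'EOF'
[Unit]
Description=Demo rclone mount (hypothetical)
Wants=network-online.target
After=network-online.target

[Service]
# Every line of a multi-line ExecStart except the last must end in a backslash.
ExecStart=/usr/bin/rclone mount gdrive: /mnt/gdrive \
    --allow-other \
    --vfs-cache-mode writes

[Install]
WantedBy=multi-user.target
EOF

# systemd-analyze verify parses the unit and prints any problems it finds.
# It may also complain if /usr/bin/rclone is not installed on this machine.
if command -v systemd-analyze >/dev/null 2>&1; then
    systemd-analyze verify "$unit_dir/gdrive-demo.service" \
        || echo "verify reported problems (see messages above)"
fi
```

Dropping one of the backslashes and re-running the check is a quick way to see what systemd reports for a broken continuation, without touching the units under /etc/systemd/system.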
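
The DNS workaround tomreyn suggests at 03:59 to 04:00 (move the dangling /etc/resolv.conf symlink aside, drop in a temporary static file, and restore it later) could look roughly like this. It is a minimal sketch, not a tested recovery procedure: the function names and the `root` prefix argument are hypothetical, introduced so the steps can be tried on a scratch directory first, and 8.8.8.8 is simply the resolver mentioned earlier in the conversation.

```shell
#!/bin/sh
# Replace a (possibly dangling) resolv.conf symlink with a temporary static file.
# $1 is a hypothetical root prefix ("" would mean the real /), $2 a nameserver IP.
use_temp_resolver() {
    root=$1
    nameserver=$2
    conf="$root/etc/resolv.conf"
    # Move the original aside (renames the symlink itself) so it can be restored.
    if [ -e "$conf" ] || [ -L "$conf" ]; then
        mv "$conf" "$conf.aside"
    fi
    printf 'nameserver %s\n' "$nameserver" > "$conf"
}

# Put the original symlink back once normal boot works again.
restore_resolver() {
    root=$1
    conf="$root/etc/resolv.conf"
    if [ -e "$conf.aside" ] || [ -L "$conf.aside" ]; then
        rm -f "$conf"
        mv "$conf.aside" "$conf"
    fi
}

# Demo on a scratch tree with a dangling symlink, as seen in recovery mode.
demo=$(mktemp -d)
mkdir -p "$demo/etc"
ln -s ../run/systemd/resolve/stub-resolv.conf "$demo/etc/resolv.conf"
use_temp_resolver "$demo" 8.8.8.8
cat "$demo/etc/resolv.conf"
```

On the real system the same steps would be run as root against /etc/resolv.conf, and `restore_resolver` (or a manual `mv`) puts the symlink back so systemd-resolved takes over again after a normal boot.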