[11:41] morning guys, my NIC's stopped working over the weekend, and I managed to verify that it's not a HW issue. now since the box was up and running just fine for the last couple of years (ubuntu 16.04) and I didn't update anything manually, the only idea I have is that a kernel patch broke it
[11:42] I am using Canonical's livepatch, so was there a new patch over the weekend, or is there any known issue with a recent patch that could explain this?
=== himcesjf_ is now known as him-cesjf
[13:04] ben_r, ^
[13:04] steven, there should be logging somewhere to say when, if any, patches were applied
[13:05] apw, I didn't see the original message?
[13:05] "somewhere"?
[13:05] ben_r, steven has a machine which was up for a very long time, he believes his nic stopped working and that this is not h/w
[13:05] somewhere in a web UI or on the server?
[13:06] ben_r, and was asking if a livepatch was released over the weekend
[13:07] i assume we log application locally somewhere ...
[13:07] as i am running cosmic i am not using it
[13:07] steven: if you're checking whether a livepatch was applied, there would be logs in /var/log/syslog, or dmesg if this was recent enough, and if the system is accessible, you can look in /sys/kernel/livepatch for entries there
[13:08] ahhhhh, ok
[13:08] I booted into live systems and those saw the NICs, so it's definitely not a HW failure
[13:08] I have a (very shitty) serial console
[13:08] over the weekend no, there was one on monday, the bionic patch did include a wifi fix
[13:09] sounds about right
[13:09] well depending on the TZ that is, but I was using the server on sunday and monday it stopped working
[13:09] so somewhere between sun/mon something must have happened
[13:10] ben_r: I don't have /var/kernel/syslog
[13:10] /var/log/syslog
[13:10] it's /var/log/syslog
[13:11] ah ok
[13:11] so lemme check
[13:13] anything I can grep for?
[13:13] steven: grep -i livepatch will turn up what time a patch was loaded
[13:16] steven: for the record the wifi fix was in mac80211_hwsim, to plug a memory leak...
[13:18] ben_r, that doesn't sound like something one would be using on a server nic
[13:18] ok so livepatch did something on may 21st
[13:18] steven, does the nic claim to be up in system config, if you ask it to cycle for instance
[13:20] ok so if I lspci it seems to be listed, broadcom corporation NetXtreme BCM 5720 gigabit ethernet PCIe
[13:20] apw: not quite sure what you mean
[13:21] steven, was asking if the nic still claimed to have an address etc
[13:21] claimed it where?
[13:22] the system doesn't see the NIC :D
[13:23] from ubuntu/network configuration perspective it doesn't exist apw ben_r, I can't configure it, ip link show/ifconfig don't list it. it's gone.
[13:23] lspci seems to list the hardware at least
[13:23] steven, has this system been rebooted since or did it just disappear with the system up
[13:24] i guess we ought to see it going away in syslog if it has not rebooted
[13:24] yes ofc, had to verify the HW was still available by booting a live system
[13:30] so can you see anything in syslog regarding the livepatch being applied, and between that and the reboot regarding the nic ?
[13:31] when you rebooted, did you get the kernel base or are you upgraded to the main kernel with the fixes applied
[13:32] a sec, going thru the log right now looking for the moment it threw errors
[13:38] apw: how can I check that?
[13:38] I only have one entry
[13:38] in grub that is
[13:40] so there is something I guess could be of interest, on may 21st, some module loads are logged, "inserted module iscsi_tcp" and ib_iser
[13:40] and then I get hundreds of lines of log for that very same second and the next thing u know it cannot bring up the eno1
[13:40] apw: ben_r ^
[13:41] steven: can you pastebin the messages?
[13:42] I actually can't, will a screenshot do?
(it's a real shit serial console)
[13:42] steven: sure, just so we can read it, preferably the beginning where the errors start
[13:43] gimme a couple minutes, I host images on the server. need to upload stuff on dropbox instead :D
[13:48] https://www.dropbox.com/s/zlaqmcnffj5gg30/Screenshot%20at%2015-44-27.png?dl=0
[13:48] https://www.dropbox.com/s/6e3z2m2wt6a85ej/Screenshot%20at%2015-44-43.png?dl=0
[13:49] second 47, now hundreds of systemd and kernel lines
[13:49] and eventually
[13:49] https://www.dropbox.com/s/ion6uoxgvqbma9e/Screenshot%20at%2015-47-05.png?dl=0
[13:49] and that's it ben_r
[13:49] now it's just a bunch of network error logs cos it cannot find the eno1
[14:00] steven: I don't see anything odd here kernel-wise. You said it could see the device in lspci right? any chance this server has a BMC in it with a shared network port? eno1 would be the builtin port, so the BMC might have stolen it
[14:01] I don't know what a BMC is
[14:02] and lspci lists it, yes.
[14:02] steven: it's the controller for the server, lets you remote power on/off, probably manages that serial console too
[14:03] oh it's an HP server, it has this iLO which is a thing on its own
[14:03] unrelated to the actual NICs
[14:03] hmm
[14:04] it's really dumb, I simply don't know where to look or what to present to figure out whether this indeed is a kernel issue or not.
[14:04] it would make sense tho since it stopped working right around the time the live patch was released
[14:05] and every other os has no issue with it
[14:08] steven: the next thing I'd do is grep on eno1, make sure that the driver you're using grabbed the device and nothing else renamed it
[14:09] if you can reboot it again, right after you boot a 'dmesg | grep eno1' might turn something else up too
[14:09] ah right, bout that
[14:09] the serial console for some reason won't accept |
[14:09] so I can't even pipe.
[14:10] oh wow
[14:10] I know, ok so Imma have to leave the house right now
[14:10] I appreciate your help, do you mind if I ping u later again?
[14:10] sure, feel free
[14:11] thank you! ttyl.
[16:37] ben_r: I am back for a bit, you happen to be around?
[16:41] steven: yep
[16:42] nice! so like. I am not a kernel guy and don't really know much about driver stuff in general, but if I know lspci lists the HW is there a way to see what happens in the kernel
[16:42] like whether it loads a driver or errors out?
[16:42] and that without using a pipe :D
[16:44] steven: are you using the driver provided by ubuntu or broadcom's driver?
[16:44] all stock ubuntu
[16:44] I *think* that card uses tg3
[16:44] I didn't install/compile anything nor used a custom ppa
[16:45] so lspci -v should show a "kernel driver in use" line for each device
[16:46] give that a try first...
[16:47] https://paste.ubuntu.com/p/gZT4s23Kmy/
[16:48] it even says NIC 2
[16:48] try it again with sudo, but I'm pretty sure you can see drive in use without that
[16:48] *driver
[16:52] dammit
[16:52] console is not big enough
[16:53] try 'sudo lspci -v -s 03:00.1', that should do just your card
[16:54] https://paste.ubuntu.com/p/4MtzgYTfqY/ ben_r
[16:55] steven: yeah, I don't see a driver...
[16:55] hmm
[16:55] https://paste.ubuntu.com/p/YCkBgbFdRP/ ip link show, just to show that it really doesn't show up
[16:57] yeah, no surprise there, no driver == no device :) you could try to load the driver with 'sudo modprobe -v tg3'
[16:58] ah look at this
[16:58] modprobe -v tg3
[16:58] modprobe: FATAL: Module tg3 not found in directory /lib/modules/4.4.0-81-generic
[16:58] steven, do you have -extra installed ?
[17:00] what's the pkg name?
[17:00] linux-image-extra-4.4.0-81-generic i assume
[17:01] not installed
[17:02] but I didn't install nor remove anything
[17:04] ok so I do have the tg3.ko file in a folder for 4.4.0-66
[17:04] try the modprobe again
[17:04] I mean I could just cp it into the current kernel's folder and hope for the best?
[17:05] still, fatal. module not found
[17:05] after you install extras it should appear in the right place
[17:05] I can't install anything ;D
[17:05] well, the 4.4.0-66 one won't work
[17:06] presumably you have 4.4.0-66 kernel installed?
[17:07] if you can reboot into that kernel, the driver should load and you'll get the network back.
[17:07] ok so.. can I instruct grub to load a specific kernel?
[17:08] I never actually told a system what kernel to boot
[17:08] but I do have the 66 installed
[17:08] still
[17:09] I think the iLO gives you a boot console right? if you hold shift during boot GRUB should pop a menu up and let you pick the older kernel
[17:11] sure. lemme reboot it
[17:12] you want the second entry, "Advanced options for Ubuntu" then pick the 66 kernel from there
[17:12] press or hold?
[17:12] hold it while it boots
[17:12] it takes a couple minutes cos it does some hp init stuff before it actually loads grub
[17:12] * ben_r nods
[17:16] so what happened? the driver got removed from the kernel and was moved into extra and the live patch updated the kernel?
[17:18] steven, livepatch doesn't change the installed kernel on disk, if the kernel didn't have extras there it was not installed
[17:19] booted to the previous kernel, now it works!
[17:19] gosh thank you ben_r and apw, probably not the right channel for this but I highly appreciate it!
[17:19] steven: one more thing
[17:19] sure?
[17:20] please make sure to update your kernel to something recent, those kernels are old and there have been mandatory reboots, so they are not receiving livepatches
[17:20] make sure you get -extras along with it
[17:20] :)
[17:20] I wanted to update to 18.04 but iirc it's not safe to update yet
[17:21] that's an even bigger jump, yes. the safest thing to do is move up to the current 16.04 kernel
[17:22] at what point is the dist upgrade considered safe? wasn't it the first .1 release?
[17:27] steven: as a dev I'm not really the right person to ask, all of my systems are 18.04 with latest of everything :) you'd have to consider your own needs there.
[17:28] ok, did you just run dist upgrade?
[17:28] I am not a total linux noob, I know my way around it (except internals/kernel stuff)
[17:29] but last time I upgraded from 14.04 to 16.04 something went real bad and some person told me I did it too early (after the release ofc)
[17:31] steven: well, again I am not telling you what you should do, this is my experience with it - I went from 16.04 to 18.04 beta using do-release-upgrade, and had no issues on 3 systems
[17:31] it's not supposed to break ;)
[17:31] but like always, make sure you have a good backup before whatever you choose to do
[17:32] sure, data is stored some place else anyway
[17:32] :)
[17:32] ok did an update, upgrade, installed extra, now rebooting.
[17:35] awesome, up and running! Again, thank you. appreciate the help
[17:35] :)
[17:36] my pleasure :) glad to help!
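Editor's note: the two checks that settled this thread (does a tg3 module exist for the running kernel, and what does the kernel log say about eno1) can both be done without a pipe, which matters on a console that drops `|`. The following is a minimal sketch, assuming a POSIX shell. The function names `module_available` and `grep_saved` and the /tmp path are made up for illustration and are not part of any Ubuntu tool.

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical.

# Succeeds if a module file such as tg3.ko exists for the given kernel release.
# $1 = module name, $2 = kernel release (e.g. from `uname -r`),
# $3 = modules root (defaults to /lib/modules).
module_available() {
    found=$(find "${3:-/lib/modules}/$2" -name "$1.ko*" 2>/dev/null)
    [ -n "$found" ]
}

# Pipe-free replacement for `command | grep pattern`:
# save the command's output to a file, then grep the file.
# $1 = pattern, $2 = output file, remaining arguments = command to run.
grep_saved() {
    pattern="$1"; out="$2"; shift 2
    "$@" > "$out" 2>&1
    grep -i -- "$pattern" "$out"
}

# Example usage on the affected box (not run here):
#   grep_saved eno1 /tmp/dmesg.txt dmesg
#   module_available tg3 "$(uname -r)" || echo "tg3 missing, check linux-image-extra-$(uname -r)"
```

Command substitution and redirection stand in for the pipe, so nothing here needs the `|` character at the prompt.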
=== steven is now known as sstutz
=== maxb is now known as Guest52644
=== rcj is now known as Guest73979
=== samkottler is now known as Guest69295
=== yofel is now known as Guest51009
=== teward is now known as Guest49319
=== lag is now known as Guest99431
=== Tahvok is now known as Guest19737
=== Guest49319 is now known as 17WABB7M7
=== cyberzeus is now known as Guest65044
=== icey is now known as Guest76648
=== 17WABB7M7 is now known as teward
=== Guest73979 is now known as rcj
[21:25] jsalisbury, can you please tell how to install -proposed manually ? i am not able to get the update through update manager
[21:25] s10gopal, The steps are all on this wiki:
[21:25] https://wiki.ubuntu.com/Testing/EnableProposed
[21:26] jsalisbury, i have enabled them , and tried some command told on #ubuntu , but not able to get it
[21:27] told by hggdh,
[21:28] s10gopal, did you look at that wiki? I would just copy and paste what's on there to here. Are you seeing an error when following the wiki?
[21:29] jsalisbury, in software & update -> developer option -> i have ticked pre-release updates and running software update says system is up to date
[21:30] https://paste.ubuntu.com/p/M5c6z2XNpc/
[21:30] and the log by manually running command
[21:30] s10gopal, did you reboot after that? If so, what does uname -a return?
[21:30] yes
[21:31] 4.15.0-22-generic
[21:31] I can see, in the proposed pocket, a linux-generic version 4.15.0.23.24
[21:32] s10gopal, What does your /etc/apt/sources.list look like? Does it have '-proposed' instead of '-updates'?
[21:32] s10gopal, so bionic-proposed instead of bionic-updates?
[21:33] s10gopal: what command, exactly, did you run to update the kernel?
[21:33] http://paste.ubuntu.com/p/Jh9pk8srZF/
[21:33] hggdh, sudo apt update && sudo apt full-upgrade
[21:34] also created the file /etc/apt/preferences.d/proposed-updates with this content:
[21:34] and changed xenial to bionic too
[21:35] s10gopal: this is quite strange.
I just enabled proposed, and ran update && full-upgrade, and I DO SEE kernel 4.15.0-23 being noted as going to be installed
[21:36] hggdh, how can i install it manually ?
[21:36] s10gopal, you could download the kernel as a .deb directly from: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/14927311
[21:36] even this is not working sudo aptitude -t bionic-proposed sudo: aptitude: command not found
[21:37] ...
[21:37] why are you trying to use aptitude?
[21:37] it is on the link
[21:38] sudo apt-get upgrade -s , says everything is up to date
[21:39] s10gopal: if you are using apt-get you NEED apt-get dist-upgrade, *not* just an apt-get upgrade
[21:39] if using apt (NOT apt-get) you need apt full-upgrade
[21:40] I have no idea on aptitude
[21:41] sudo apt update && sudo apt full-upgrade says everything is up to date
[21:41] s10gopal: dpkg -l linux\* | pastebinit, and give us the link
[21:42] http://paste.ubuntu.com/p/dG2bJ3X7WH/
[21:43] OK, so you do not have the proposed kernel installed. Now, please run again sudo apt update && sudo apt full-upgrade, and give us ALL the output on a pastebin
[21:44] hggdh, http://paste.ubuntu.com/p/cqPN53S2Hg/
[21:44] s10gopal: this is not what I asked for
[21:45] hggdh, https://paste.ubuntu.com/p/kjVmVbCjDh/
[21:47] s10gopal: OK, so what is going on is your repository mirror (in.archive.ubuntu.com) is not yet up-to-date. Nothing we can do.
[21:47] s10gopal: you may try another mirror, or wait.
[21:47] hggdh, how ?
[21:47] how what?
[21:48] hggdh, change download from india to main server ?
[21:48] run software update, and select a different mirror
[21:49] then you MUST apt update && apt full-upgrade again
[21:50] hggdh, even http://archive.ubuntu.com/ubuntu says it is up to date
[21:51] hggdh, which server are you using ?
[21:51] s10gopal: see https://paste.ubuntu.com/p/j7ZRcpKBxS/
[21:52] s10gopal: works for me on the main repository.
You are certainly doing something wrong
[21:52] hggdh, i am typing the command you gave me
[21:52] s10gopal: then please type it as you issued it here
[21:56] hggdh, https://paste.ubuntu.com/p/qnndDcSjT5/
[21:57] jsalisbury, linux-image-unsigned-4.15.0-23-generic_4.15.0-23.25_amd64.deb (7.5 MiB) , only need to install it ?
[21:58] s10gopal, If you want to try that route, you should install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
[21:59] s10gopal, you should try to get your system back to a point where you can use the GUIs at some point.
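Editor's note: one way to tell whether apt actually sees the newer kernel from -proposed is to look at the candidate version that `apt-cache policy` reports for linux-generic. The following is a minimal sketch, assuming a POSIX shell with awk. The helper name `candidate_version` and the /tmp path are hypothetical; the version 4.15.0.23.24 is the one quoted in the chat above.

```shell
#!/bin/sh
# Illustrative helper; the name candidate_version is hypothetical.
# Prints the "Candidate:" version from saved `apt-cache policy <package>` output.
# $1 = file holding that output.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }' "$1"
}

# Example usage (not run here):
#   apt-cache policy linux-generic > /tmp/policy.txt
#   candidate_version /tmp/policy.txt
# If this does not print the version expected from -proposed (4.15.0.23.24 in
# the chat above), likely causes are a mirror that does not carry it yet or an
# apt pin (for example a file under /etc/apt/preferences.d) lowering its priority.
```

This only reads what apt already knows, so it is safe to run before deciding whether to switch mirrors or install the .deb packages by hand.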