[11:41] <steven> morning guys, my NIC stopped working over the weekend and I managed to verify that it's not a HW issue. now since the box was up and running just fine for the last couple of years (ubuntu 16.04) and I didn't update anything manually, the only idea I have is that a kernel patch broke it
[11:42] <steven> I am using Canonical's livepatch, so was there a new patch over the weekend, or is there any known issue with a recent patch that could explain this?
[13:04] <apw> ben_r, ^
[13:04] <apw> steven, there should be logging somewhere to say when patches, if any, were applied
[13:05] <ben_r> apw, I didn't see the original message?
[13:05] <steven> "somewhere"?
[13:05] <apw> ben_r, steven has a machine which was up for a very long time, he believes his nic stopped working and that this is not h/w
[13:05] <steven> somewhere in a web UI or on the server?
[13:06] <apw> ben_r, and was asking if a livepatch was released over the weekend
[13:07] <apw> i assume we log application locally somewhere ...
[13:07] <apw> as i am running cosmic i am not using it
[13:07] <ben_r> steven: if you're looking for if a livepatch is applied, there would be logs in /var/log/syslog, or dmesg if this was recent enough, and if the system is accessible, you can look in /sys/kernel/livepatch for entries there
[13:08] <ben_r> ahhhhh, ok
[13:08] <steven> I booted into live systems and those saw the NICs, so it's definitely not a HW failure
[13:08] <steven> I have a (very shitty) serial console 
[13:08] <ben_r> over the weekend no, there was one on monday, the bionic patch did include a wifi fix
[13:09] <steven> sounds about right
[13:09] <steven> well depending on the TZ that is, but I was using the server on sunday and monday it stopped working
[13:09] <steven> so somewhere between sun/mon something must have happened
[13:10] <steven> ben_r: I don't have /var/kernel/syslog
[13:10] <apw> /var/log/syslog
[13:10] <ben_r> it's /var/log/syslog
[13:11] <steven> ah ok
[13:11] <steven> so lemme check
[13:13] <steven> anything I can grep for?
[13:13] <ben_r> steven: grep -i livepatch will turn up what time a patch was loaded
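Editor's note: a minimal sketch of the search ben_r describes, written so that it needs no pipe. The function name `livepatch_events` is hypothetical; the log path defaults to /var/log/syslog as mentioned in the conversation.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention livepatch (case-insensitive).
# The first argument is the log file; it defaults to /var/log/syslog.
livepatch_events() {
    log_file="${1:-/var/log/syslog}"
    # grep reads the file directly, so no pipe is required.
    grep -i livepatch "$log_file"
}

# Usage sketch:
#   livepatch_events                 # search /var/log/syslog
#   livepatch_events /var/log/syslog.1
```

The timestamps at the start of the matching lines indicate when a patch was loaded.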
[13:16] <ben_r> steven: for the record the wifi fix was in mac80211_hwsim, to plug a memory leak...
[13:18] <apw> ben_r, that doesn't sound like something one would be using on a server nic
[13:18] <steven> ok so livepatch did something on may 21st
[13:18] <apw> steven, does the nic claim to be up in system config, if you ask it to cycle for instance
[13:20] <steven> ok so if I lspci it seems to be listed, Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
[13:20] <steven> apw: not quite sure what you mean
[13:21] <apw> steven, was asking if the nic still claimed to have an address etc
[13:21] <steven> claimed it where?
[13:22] <steven> the system doesnt see the NIC :D
[13:23] <steven> from ubuntu/network configuration perspective it doesn't exist apw ben_r, I can't configure it, ip link show/ifconfig dont list it. its gone.
[13:23] <steven> lspci seems to list the hardware at least
[13:23] <apw> steven, has this system been rebooted since or did it just disappear with the system up
[13:24] <apw> i guess we ought to see it going away in syslog if it has not rebooted
[13:24] <steven> yes ofc, had to verify the HW was still available. by booting a live system
[13:30] <apw> so can you see anything in syslog regarding the livepatch being applied and between that and the reboot regarding the nic ?
[13:31] <apw> when you rebooted, did you get the base kernel or are you upgraded to the main kernel with the fixes applied
[13:32] <steven> a sec, going thru the log right now looking for the  moment it threw errors
[13:38] <steven> apw: how can I check that?
[13:38] <steven> I only have one entry
[13:38] <steven> in grub that is
[13:40] <steven> so there is something I guess could be of interest: on may 21st, some modules load, the log says "inserted module iscsi_tcp" and ib_iser
[13:40] <steven> and then I get hundreds of lines of log for that very same second and the next thing you know it cannot bring up eno1
[13:40] <steven> apw: ben_r ^
[13:41] <ben_r> steven: can you pastebin the messages?
[13:42] <steven> I actually can't, will a screenshot do? (it's a real shit serial console)
[13:42] <ben_r> steven: sure, just so we can read it, preferably the beginning where the errors start
[13:43] <steven> gimme a couple minutes, I host images on the server. need to upload stuff on dropbox instead :D
[13:48] <steven> https://www.dropbox.com/s/zlaqmcnffj5gg30/Screenshot%20at%2015-44-27.png?dl=0
[13:48] <steven> https://www.dropbox.com/s/6e3z2m2wt6a85ej/Screenshot%20at%2015-44-43.png?dl=0
[13:49] <steven> second 47, now hundreds of systemd and kernel lines
[13:49] <steven> and eventually
[13:49] <steven> https://www.dropbox.com/s/ion6uoxgvqbma9e/Screenshot%20at%2015-47-05.png?dl=0
[13:49] <steven> and thats it ben_r 
[13:49] <steven> now its just a bunch of network error logs cos it cannot find the eno1 
[14:00] <ben_r> steven: I don't see anything odd here kernel-wise. You said it could see the device in lspci right? any chance this server has a BMC in it with a shared network port? eno1 would be the builtin port, so the BMC might have stolen it
[14:01] <steven> I don't know what a BMC is
[14:02] <steven> and lspci lists it, yes. 
[14:02] <ben_r> steven: it's the controller for the server, lets you remote power on/off, probably manages that serial console too
[14:03] <steven> oh its an HP server, it has this iLo which is a thing on its own
[14:03] <steven> unrelated to the actual nic's
[14:03] <ben_r> hmm
[14:04] <steven> its really dumb, I simply dont know where to look or what to present to figure out whether this indeed is a kernel issue or not.
[14:04] <steven> it would make sense tho since it stopped working right around the time the live patch was released
[14:05] <steven> and every other os has no issue with it
[14:08] <ben_r> steven: the next thing I'd do is grep on eno1, make sure that the driver you're using grabbed the device and nothing else renamed it
[14:09] <ben_r> if you can reboot it again, right after you boot a 'dmesg | grep eno1' might turn something else up too
[14:09] <steven> ah right, bout that
[14:09] <steven> the serial console for some reason won't accept |
[14:09] <steven> so I can't even pipe.
[14:10] <ben_r> oh wow
[14:10] <steven> I know, ok so Imma have to leave the house right now
[14:10] <steven> I appreciate your help, do you mind if I ping u later again?
[14:10] <ben_r> sure, feel free
[14:11] <steven> thank you! ttyl.
[16:37] <steven> ben_r: I am back for a bit, you happen to be around?
[16:41] <ben_r> steven: yep
[16:42] <steven> nice! so like, I am not a kernel guy and don't really know much about driver stuff in general, but if I know lspci lists the HW, is there a way to see what happens in the kernel
[16:42] <steven> like whether it loads a driver or errors out?
[16:42] <steven> and that without using a pipe :D
[16:44] <ben_r> steven: are you using the driver provided by ubuntu or broadcom's driver?
[16:44] <steven> all stock ubuntu
[16:44] <ben_r> I *think* that card uses tg3
[16:44] <steven> I didn't install/compile anything nor used a custom ppa
[16:45] <ben_r> so lspci -v should show a "kernel driver in use" line for each device
[16:46] <ben_r> give that a try first...
[16:47] <steven> https://paste.ubuntu.com/p/gZT4s23Kmy/
[16:48] <steven> it even says NIC 2
[16:48] <ben_r> try it again with sudo, but I'm pretty sure you can see drive in use without that
[16:48] <ben_r> *driver
[16:52] <steven> dammit
[16:52] <steven> console is not big enough
[16:53] <ben_r> try 'sudo lspci -v -s 03:00.1', that should do just your card
[16:54] <steven> https://paste.ubuntu.com/p/4MtzgYTfqY/ ben_r 
[16:55] <ben_r> steven: yeah, I don't see a driver...
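Editor's note: a minimal sketch of the check ben_r is doing by eye, assuming the console accepts output redirection (`>`) even though it rejects `|`. The helper name `driver_in_use` and the file /tmp/nic.txt are hypothetical; the "Kernel driver in use" line is the one ben_r refers to in `lspci -v` output.

```shell
#!/bin/sh
# Hypothetical helper: given a file holding saved `lspci -v` output, print the
# name of the bound driver. Returns non-zero when no
# "Kernel driver in use" line is present.
driver_in_use() {
    # sed -n prints only the lines where the substitution succeeded.
    driver=$(sed -n 's/^[[:space:]]*Kernel driver in use: //p' "$1")
    [ -n "$driver" ] && printf '%s\n' "$driver"
}

# Usage sketch (redirection only, no pipe):
#   sudo lspci -v -s 03:00.1 > /tmp/nic.txt
#   driver_in_use /tmp/nic.txt || echo "no driver bound"
```

An empty result matches what the paste shows: the device is enumerated on the PCI bus but no kernel driver has claimed it.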
[16:55] <ben_r> hmm
[16:55] <steven> https://paste.ubuntu.com/p/YCkBgbFdRP/ ip link show, just to show that it really doesnt show up
[16:57] <ben_r> yeah, no surprise there, no driver == no device :) you could try to load the driver with 'sudo modprobe -v tg3'
[16:58] <steven> ah look at this
[16:58] <steven> modprobe -v tg3
[16:58] <steven> modprobe: FATAL: Module tg3 not found in directory /lib/modules/4.4.0-81-generic
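Editor's note: a minimal sketch of how one might confirm that a module file is missing for a given kernel before reaching for modprobe. The helper name `module_available` is hypothetical; the directory defaults to the running kernel's tree under /lib/modules, the same location the error message names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module file (e.g. tg3.ko) exists somewhere
# under the given modules directory. The directory defaults to the tree of the
# currently running kernel.
module_available() {
    module="$1"
    modules_dir="${2:-/lib/modules/$(uname -r)}"
    # Module files may be compressed, hence the trailing wildcard.
    found=$(find "$modules_dir" -name "${module}.ko*" -print 2>/dev/null)
    [ -n "$found" ]
}

# Usage sketch:
#   module_available tg3 || echo "tg3 missing for $(uname -r)"
#   module_available tg3 /lib/modules/4.4.0-66-generic && echo "present for -66"
```

Comparing the two kernel trees this way shows whether the driver is absent only for the running kernel, which is the situation that comes up next in the log.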
[16:58] <apw> steven, do you have -extra installed ?
[17:00] <steven> whats the pkg name?
[17:00] <apw> linux-image-extra-4.4.0-81-generic i assume
[17:01] <steven> not installed
[17:02] <steven> but I didnt install nor remove anything 
[17:04] <steven> ok so I do have the tg3.ko file in a folder for 4.4.0-66
[17:04] <ben_r> try the modprobe again
[17:04] <steven> I mean I could just cp it into the currents kernel folder and hope for the best?
[17:05] <steven> still fatal, module not found
[17:05] <ben_r> after you install extras it should appear in the right place
[17:05] <steven> I can't install anything ;D
[17:05] <ben_r> well, the 4.4.0-66 one won't work
[17:06] <ben_r> presumably you have 4.4.0-66 kernel installed?
[17:07] <ben_r> if you can reboot into that kernel, the driver should load and you'll get the network back. 
[17:07] <steven> ok so.. can I instruct grub to load a specific kernel?
[17:08] <steven> I never actually told a system what kernel to boot
[17:08] <steven> but I do have the 66 installed
[17:08] <steven> still
[17:09] <ben_r> I think the iLO gives you a boot console right? if you hold shift during boot GRUB should pop a menu up and let you pick the older kernel
[17:11] <steven> sure. lemme reboot it
[17:12] <ben_r> you want the second entry, "Advanced options for Ubuntu" then pick the 66 kernel from there
[17:12] <steven> press or hold?
[17:12] <ben_r> hold it while it boots
[17:12] <steven> it takes a couple minutes cos it does some hp init stuff before it actually loads grub
[17:12]  * ben_r nods
[17:16] <steven> so what happened? the driver got removed from the kernel and was moved into extra and the live patch updated the kernel?
[17:18] <apw> steven, livepatch doesn't change the installed kernel on disk; if the kernel didn't have extras there, it was not installed
[17:19] <steven> booted to the previous kernel, now it works!
[17:19] <steven> gosh thank you ben_r and apw, probably not the right channel for this but I highly appreciate it!
[17:19] <ben_r> steven: one more thing
[17:19] <steven> sure?
[17:20] <ben_r> please make sure to update your kernel to something recent, those kernels are old and there have been mandatory reboots since, so they are not receiving livepatches
[17:20] <ben_r> make sure you get -extras along with it
[17:20] <ben_r> :)
[17:20] <steven> I wanted to update to 18.04 but iirc it's not safe to update yet
[17:21] <ben_r> that's an even bigger jump, yes. the safest thing to do is move up to the current 16.04 kernel
[17:22] <steven> at what point is the dist upgrade considered safe? wasn't it the first .1 release?
[17:27] <ben_r> steven: as a dev I'm not really the right person to ask, all of my systems are 18.04 with latest of everything :) you'd have to consider your own needs there. 
[17:28] <steven> ok, did you just run dist upgrade?
[17:28] <steven> I am not a total linux noob, I know my way around it (except internals/kernel stuff)
[17:29] <steven> but last time I upgraded from 14.04 to 16.04 something went real bad and some person told me I did it too early (after the release ofc)
[17:31] <ben_r> steven: well, again I am not telling you what you should do, this is my experience with it - I went from 16.04 to 18.04 beta using do-release-upgrade, and had no issues on 3 systems
[17:31] <ben_r> it's not supposed to break ;)
[17:31] <ben_r> but like always, make sure you have a good backup before whatever you choose to do
[17:32] <steven> sure, data is stored some place else anyway
[17:32] <ben_r> :)
[17:32] <steven> ok did an update, upgrade, installed extra, now rebooting.
[17:35] <steven> awesome, up and running! Again, thank you. appreciate the help
[17:35] <ben_r> :)
[17:36] <ben_r> my pleasure :) glad to help!
[21:25] <s10gopal> jsalisbury, can you please tell how to install -proposed manually ? i am not able to get the update through update manager
[21:25] <jsalisbury> s10gopal, The steps are all on this wiki:
[21:25] <jsalisbury>  https://wiki.ubuntu.com/Testing/EnableProposed
[21:26] <s10gopal> jsalisbury, i have enabled them , and tried some command told on #ubuntu , but not able to get it
[21:27] <s10gopal> told by hggdh, 
[21:28] <jsalisbury> s10gopal, did you look at that wiki?  I would just copy and paste what on there to here.  Are you seeing an error when following the wiki?
[21:29] <s10gopal> jsalisbury, in Software & Updates -> Developer Options I have ticked pre-release updates, and running the software updater says the system is up to date
[21:30] <s10gopal>  https://paste.ubuntu.com/p/M5c6z2XNpc/
[21:30] <s10gopal> and the log by manually running command 
[21:30] <jsalisbury> s10gopal, did you reboot after that?  If so, what does uname -a return?
[21:30] <s10gopal> yes
[21:31] <s10gopal> 4.15.0-22-generic
[21:31] <hggdh> I can see, in the proposed pocket, a linux-generic version 4.15.0.23.24
[21:32] <jsalisbury> s10gopal, What does your /etc/apt/sources.list look like?  Does it have '-proposed' instead of '-updates'?
[21:32] <jsalisbury> s10gopal, so bionic-proposed instead of bionic-updates?
[21:33] <hggdh> s10gopal: what command, exactly, did you run to update the kernel?
[21:33] <s10gopal> http://paste.ubuntu.com/p/Jh9pk8srZF/
[21:33] <s10gopal> hggdh, sudo apt update && sudo apt full-upgrade
[21:34] <s10gopal> also created the file /etc/apt/preferences.d/proposed-updates with this content:
[21:34] <s10gopal> and changed xenial to bionic too
[21:35] <hggdh> s10gopal: this is quite strange. I just enabled proposed, and ran update && full-upgrade, and I DO SEE kernel 4.15.0-23 being noted as going to be installed
[21:36] <s10gopal> hggdh, how can i install it manually ?
[21:36] <jsalisbury> s10gopal, you could download the kernel as a .deb directly from: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/14927311
[21:36] <s10gopal> even this is not working: 'sudo aptitude -t bionic-proposed' gives 'sudo: aptitude: command not found'
[21:37] <hggdh> ...
[21:37] <hggdh> why are you trying to use aptitude?
[21:37] <s10gopal> it is on the link
[21:38] <s10gopal> sudo apt-get upgrade -s says everything is up to date
[21:39] <hggdh> s10gopal: if you are using apt-get you NEED apt-get dist-upgrade, *not* just a apt-get upgrade
[21:39] <hggdh> if using apt (NOT apt-get) you need apt full-upgrade
[21:40] <hggdh> I have no idea on aptitude
[21:41] <s10gopal> sudo apt update && sudo apt full-upgrade says everything is up to date
[21:41] <hggdh> s10gopal: dpkg -l linux\* | pastebinit, and give us the link
[21:42] <s10gopal> http://paste.ubuntu.com/p/dG2bJ3X7WH/
[21:43] <hggdh> OK, so you do not have the proposed kernel installed. Now, please run again sudo apt update && sudo apt full-upgrade, and give us ALL the output on a pastebin
[21:44] <s10gopal> hggdh, http://paste.ubuntu.com/p/cqPN53S2Hg/
[21:44] <hggdh> s10gopal: this is not what I asked for
[21:45] <s10gopal> hggdh, https://paste.ubuntu.com/p/kjVmVbCjDh/
[21:47] <hggdh> s10gopal: OK, so what is going on is your repository mirror (in.archive.ubuntu.com) is not yet up-to-date. Nothing we can do.
[21:47] <hggdh> s10gopal: you may try another mirror, or wait.
[21:47] <s10gopal> hggdh, how ?
[21:47] <hggdh> how what?
[21:48] <s10gopal> hggdh, change download from india to main server ?
[21:48] <hggdh> run software update, and select a different mirror
[21:49] <hggdh> then you MUST apt update && apt full-upgrade again
[21:50] <s10gopal> hggdh, even http://archive.ubuntu.com/ubuntu says it is up to date
[21:51] <s10gopal> hggdh, which server you are using ?
[21:51] <hggdh> s10gopal: see https://paste.ubuntu.com/p/j7ZRcpKBxS/
[21:52] <hggdh> s10gopal: works for me on the main repository. You are certainly doing something wrong
[21:52] <s10gopal> hggdh, i am typing the command you gave me
[21:52] <hggdh> s10gopal: then please type it as you issued it here
[21:56] <s10gopal> hggdh, https://paste.ubuntu.com/p/qnndDcSjT5/
[21:57] <s10gopal> jsalisbury, linux-image-unsigned-4.15.0-23-generic_4.15.0-23.25_amd64.deb (7.5 MiB) , only need to install it ?
[21:58] <jsalisbury> s10gopal, If you want to try that route, you should install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
[21:59] <jsalisbury> s10gopal, you should try to get your system back to a point where you can use the GUIs at some point.