[15:40] <dijuremo> Tried to use 4.4.0-116 kernel on the DELL T3610 with A16 BIOS and latest microcode installed and it also hangs on first attempt to log in to GUI
[15:40] <dijuremo> uname -r 
[15:40] <dijuremo> 4.4.0-116-generic
[15:40] <dijuremo> dpkg -l | grep microcode 
[15:40] <dijuremo> ii  intel-microcode                             3.20180312.0~ubuntu16.04.1 
[15:58] <TJ-> dijuremo: do these various PCs share a common GPU maker?
[15:59] <TJ-> dijuremo: or, put another way, is this freeze GUI specific or is there a way to reproduce the freeze without the GUI starting, via console or SSH for example?
[16:07] <dijuremo> TJ: With the laptops, the GPU was the Intel one that is part of the CPU. The freeze is triggered on some occasions after X starts, on others after logging in (or during the login itself) or after logging out.
[16:07] <dijuremo> TJ: Most of the workstation machines have either a Geforce or Quadro card.
[16:08] <dijuremo> TJ: Most of our Geforce series are eVGA branded, no idea who makes the Quadro cards for DELL.
[16:13] <TJ-> dijuremo: if we could reduce the proof to something simple without GUI it'd be much easier to track down the cause, over SSH would even discount it being console/video related
[16:14] <dijuremo> TJ: So I do not know how to trigger the freeze from a console. Any suggestions on what to run? We started noticing it when running X-related things, i.e. lightdm; sometimes we can even survive a login, but the machine freezes after logging out, when lightdm restarts.
[16:15] <dijuremo> TJ: We first noticed the issues when we applied updates; the machines would boot up and freeze at the lightdm login screen
[16:16] <dijuremo> # uptime 
[16:16] <dijuremo>  12:15:41 up 54 min,  1 user,  load average: 0.00, 0.00, 0.00
[16:16] <dijuremo> So the machine has been up for that long at the login screen without freezing.
[16:16] <dijuremo> I am logged in to it via SSH
[16:17] <TJ-> dijuremo: maybe get it to do a lot of I/O? "sudo find / >/dev/null"
[16:20] <dijuremo> It completed without freezing
[16:22] <TJ-> dijuremo: trouble is we don't know if that is a reasonable way to provoke the issue or not :s
[16:22] <dijuremo> So I am running prime95, started 12:22
[16:23] <dijuremo> top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[16:23] <dijuremo> That should use lots of CPU and memory
[16:24] <TJ-> if those are OK, it's worth trying the same commands from a direct console. If there's an issue with the GPU side, that might trigger it
[16:24] <dijuremo> I gotta go to a meeting so will let prime95 torture the hardware for 1.5 hours... if it has not crashed, then we can confirm it is related to GPU/GUI
[16:25] <TJ-> dijuremo: good plan
[16:27] <dijuremo> For the record, we are currently on 4.4.0-116 kernel
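Note: when a machine may hard-freeze during a long unattended run like the one planned above, a heartbeat file synced to disk on every write shows roughly when the freeze happened, even if the SSH sessions die with it. The following is a minimal sketch, not something used in this conversation; the function name `heartbeat`, its parameters, and the log path in the usage note are all hypothetical.

```python
import os
import time


def heartbeat(path, interval=1.0, count=None):
    """Append 'date time load1 load5 load15' lines to path.

    Each line is flushed and fsync'ed, so the last line on disk after a
    hard freeze is close to the moment the machine stopped responding.
    count=None runs until interrupted; an integer stops after that many lines.
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            load1, load5, load15 = os.getloadavg()  # same figures top/uptime show
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {load1:.2f} {load5:.2f} {load15:.2f}\n")
            log.flush()             # push Python's buffer to the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to disk
            written += 1
            time.sleep(interval)
    return written
```

For example, running `heartbeat("/var/tmp/heartbeat.log")` in a third SSH session alongside prime95 and top would leave a timestamp trail to read after the reboot. The fsync on every line trades a little extra I/O for a more trustworthy last entry.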
[17:57] <dijuremo> Still running...
[17:57] <dijuremo> top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[17:59] <dijuremo> I guess I should try to get lightdm out of the equation, perhaps log in on one of the virtual terminals and run startx and see if that freezes the machine...
[18:03] <dijuremo> Oh well, machine froze. I had two ssh sessions going into it, one running prime95, the other top. I quit top, exited the ssh session and then the machine froze
[18:08] <TJ-> dijuremo: with no console activity?
[18:13] <dijuremo> prime95 was running as I disconnected my ssh session and then it froze
[18:14] <dijuremo> I had two separate ssh sessions
[18:14] <dijuremo> I did not get to try logging in at the console and running startx yet.
[18:15] <dijuremo> This was as I was going to do that: I disconnected one ssh session, was going to stop prime95 and then go directly to the machine to try logging in and running startx.
[18:17] <dijuremo> Now I rebooted into 4.4.13, ssh'd in on shell 1, started prime95, then on shell 2 I tried to ssh, saw the motd and now it is frozen.
[18:17] <dijuremo> So it is more prone to freezing with 4.4.13 than it is with 4.4.0
[18:18] <TJ-> 4.4.13 or 4.13 ?
[18:38] <dijuremo> Sorry, typo: more prone to crash with 4.13.0- than 4.4.0-x
[18:40] <TJ-> dijuremo: can you monitor temperatures? I'm wondering if this freeze is actually an overheat
[18:41] <dijuremo> I usually do that with gkrellm... not so sure how to do it from the command line... will have to look
[18:44] <dijuremo> Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> And it just hung again....
[18:45] <dijuremo> Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 0:        +64.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 1:        +67.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 2:        +65.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 3:        +67.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 4:        +69.0°C  (high = +85.0°C, crit = +95.0°C) 
[18:45] <dijuremo> Core 5:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> I do not think it is overheating... well. This is it for today, gotta run to a meeting... :)
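Note: the "not overheating" reading of the temperatures pasted above can be checked mechanically. The sketch below parses lines in that format and reports the sensor closest to its `high` threshold. It assumes the pasted output came from the lm-sensors `sensors` command (the log does not say which tool produced it), and the names `parse_sensors` and `headroom` are hypothetical.

```python
import re

# Matches lines such as:
#   Core 0:        +64.0°C  (high = +85.0°C, crit = +95.0°C)
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r"\s+\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C,"
    r"\s+crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C\)"
)

# Readings copied from the 18:45 paste in the log.
SAMPLE = """\
Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
Core 0:        +64.0°C  (high = +85.0°C, crit = +95.0°C)
Core 1:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
Core 2:        +65.0°C  (high = +85.0°C, crit = +95.0°C)
Core 3:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
Core 4:        +69.0°C  (high = +85.0°C, crit = +95.0°C)
Core 5:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
"""


def parse_sensors(text):
    """Return a list of (label, temp, high, crit) tuples; other lines are skipped."""
    readings = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            readings.append(
                (m["label"].strip(), float(m["temp"]),
                 float(m["high"]), float(m["crit"]))
            )
    return readings


def headroom(readings):
    """Return (label, temp, margin_to_high) for the sensor nearest its 'high' limit."""
    if not readings:
        return None
    label, temp, high, _crit = min(readings, key=lambda r: r[2] - r[1])
    return label, temp, high - temp


print(headroom(parse_sensors(SAMPLE)))  # ('Package id 0', 69.0, 16.0)
```

On the values from the log, the hottest sensor is still 16 °C below its `high` threshold under prime95 load, which is consistent with the conclusion that heat is an unlikely cause. To use live data instead of `SAMPLE`, the text could be taken from `subprocess.run(["sensors"], capture_output=True, text=True).stdout`, assuming lm-sensors is installed.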