/srv/irclogs.ubuntu.com/2018/03/29/#ubuntu-kernel.txt

=== himcesjf_ is now known as him-cesjf
[15:40] <dijuremo> Tried to use 4.4.0-116 kernel on the DELL T3610 with A16 BIOS and latest microcode installed and it also hangs on first attempt to log in to GUI
[15:40] <dijuremo> uname -r
[15:40] <dijuremo> 4.4.0-116-generic
[15:40] <dijuremo> dpkg -l | grep microcode
[15:40] <dijuremo> ii  intel-microcode                             3.20180312.0~ubuntu16.04.1
[15:58] <TJ-> dijuremo: do these various PCs share a common GPU maker?
[15:59] <TJ-> dijuremo: or, put another way, is this freeze GUI specific or is there a way to reproduce the freeze without the GUI starting, via console or SSH for example?
[16:07] <dijuremo> TJ: With the laptops, the GPU was the Intel one that is part of the CPU. On some occasions the freeze is triggered after X starts; on others, after logging in (during the login itself) or after logging out.
[16:07] <dijuremo> TJ: Most of the workstation machines have either a Geforce or Quadro card.
[16:08] <dijuremo> TJ: Most of our Geforce series are eVGA branded, no idea who makes the Quadro cards for DELL.
[16:13] <TJ-> dijuremo: if we could reduce the reproduction to something simple without the GUI, it'd be much easier to track down the cause; doing it over SSH would even rule out it being console/video related
[16:14] <dijuremo> TJ: So I do not know how to trigger the freeze from a console. Any suggestions on what to run? We started noticing it when running X-related things, i.e. lightdm; sometimes we even survive a login, but the machine freezes after logging out when lightdm restarts.
[16:15] <dijuremo> TJ: We first noticed the issues when we applied updates; the machines would boot up and freeze at the lightdm login screen
[16:16] <dijuremo> # uptime
[16:16] <dijuremo> 12:15:41 up 54 min,  1 user,  load average: 0.00, 0.00, 0.00
[16:16] <dijuremo> So the machine has been up for that long at the login screen without freezing.
[16:16] <dijuremo> I am logged in to it via SSH
[16:17] <TJ-> dijuremo: maybe get it to do a lot of I/O? "sudo find / >/dev/null"
[16:20] <dijuremo> It completed without freezing
[16:22] <TJ-> dijuremo: trouble is we don't know if that is a reasonable way to provoke the issue or not :s
[16:22] <dijuremo> So I am running prime95, started 12:22
[16:23] <dijuremo> top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[16:23] <dijuremo> That should use lots of CPU and memory
[16:24] <TJ-> if those are OK, it's worth trying the same commands from a direct console. If there's an issue with the GPU side, that might trigger it
[16:24] <dijuremo> I gotta go to a meeting so will let prime95 torture the hardware for 1.5 hours... if it has not crashed, then we can confirm it is related to GPU/GUI
[16:25] <TJ-> dijuremo: good plan
[16:27] <dijuremo> For the record, we are currently on 4.4.0-116 kernel
[17:57] <dijuremo> Still running...
[17:57] <dijuremo> top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[17:59] <dijuremo> I guess I should try to get lightdm out of the equation, perhaps log in on one of the virtual terminals, run startx and see if that freezes the machine...
[18:03] <dijuremo> Oh well, machine froze. I had two ssh sessions going into it, one running prime95, the other top. I quit top, exited the ssh session and then the machine froze
[18:08] <TJ-> dijuremo: with no console activity?
[18:13] <dijuremo> prime95 was running as I disconnected my ssh session and then it froze
[18:14] <dijuremo> I had two separate ssh sessions
[18:14] <dijuremo> I did not get to try logging in at the console and running startx yet.
[18:15] <dijuremo> This was as I was going to do that: I disconnected one ssh session, was going to stop prime95 and then go directly to the machine to try logging in and running startx.
[18:17] <dijuremo> Now I rebooted into 4.4.13, ssh'd in on shell 1, started prime95, then on shell 2 I tried to ssh, saw the motd and now it is frozen.
[18:17] <dijuremo> So it is more prone to freezing with 4.4.13 than it is with 4.4.0
[18:18] <TJ-> 4.4.13 or 4.13 ?
[18:38] <dijuremo> Sorry, typo: more prone to crash with 4.13.0- than 4.4.0-x
[18:40] <TJ-> dijuremo: can you monitor temperatures? I'm wondering if this freeze is actually an overheat
[18:41] <dijuremo> I usually do that with gkrellm... not so sure how to do it from the command line... will have to look
[18:44] <dijuremo> Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> And it just hung again....
[18:45] <dijuremo> Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 0:        +64.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 1:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 2:        +65.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 3:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 4:        +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> Core 5:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] <dijuremo> I do not think it is overheating... well. This is it for today, gotta run to a meeting... :)
