[00:42] TimStarling, I added a comment to the bug as well. If you can confirm 4.13 final is bad, I'll start a bisect between 4.13 and 4.14-rc1.
[00:52] jsalisbury: I was worried you'd be off for the night, I see you are US east
[00:53] I'm UTC+11, 11:53 am, so I can keep doing this for a while yet
[00:53] TimStarling, I check in here and there after the usual hours :-)
[00:54] I replied on the bug about 4.13 final
[00:54] TimStarling, cool, I see it's broken, so I'll start the bisect and post the first kernel when it's built.
[01:03] would you be interested in having ssh root access to it overnight my time? I'm pondering whether I can set something up for that
[01:05] maybe a proper (debootstrap) install to a USB stick, then it could boot up without intervention
[01:13] TimStarling, not sure I like the liability associated with having remote root access. But at any rate, I built the first test kernel and posted it.
[01:18] it's slow to download, ~40KB/s, seems slower than the mainline ones, will take at least 10 minutes
[01:19] supposedly I have a 28 Mbps wired connection
[01:29] stupid internet
[01:31] it was saying it would take another 40 minutes, so I downloaded it to my linode instance and then from there to home
[01:31] which was 100x faster
[01:38] tested it, reported "fixed"
[02:01] next time I'm downloading it via ssh -D
[10:42] * thresh dances
[10:43] looks like launchpad #1742721 is fixed for me \o/
[10:43] Launchpad bug 1742721 in linux (Ubuntu Artful) "linux-image-4.13.0-26-generic / linux-image-extra-4.13.0-26-generic fail to boot" [Critical,In progress] https://launchpad.net/bugs/1742721
[11:24] thresh: I think my schools have the same issue, did you use jsalisbury's kernel from comment #18 there?
[11:24] yep, as I've said in #19 :)
[11:25] the issue was the framebuffer driver being broken
[11:26] thresh: which graphics card is that? I see this in i5's with internal graphics...
[11:26] 26:00.0 VGA compatible controller: Silicon Motion, Inc. SM750 (rev a1)
[11:26] and the kernel module is sm750fb
[11:27] Thank you, so it must be unrelated to my case :)
[11:27] too bad!
[11:27] Yeah, I'll have to do a kernel bisect too when I get to one of those schools again
[11:27] Probably tomorrow
[11:27] that's never fun
[11:27] but at least you've got physical access :-)
[11:27] * alkisg wonders if we can program grub to boot to a specific test kernel only once...
[11:28] we can, it's called grub-reboot
[11:28] https://wiki.debian.org/GrubReboot
[11:28] Ah cool then, I'll try that
[11:28] but you have to be careful with all the submenu subtleties
[11:29] OK, I think that will be the same as GRUB_DEFAULT there, so I know the syntax
[11:29] yep
[11:29] 0>2 for submenus etc
[13:11] Hi
[13:11] I have a problem with a USB printer. When I unplug it, it is not correctly unregistered from systemd.
[13:12] "systemctl list-units" still shows
[13:12] sys-devices-pci0000:00-0000:00:14.0-usb2-2\x2d1.device loaded active plugged Deskjet_2540_series
[13:12] after unplugging.
[13:12] xnox told me that it is a kernel problem.
[13:13] Do we have any tools for intelligently comparing dmesg logs ignoring timestamps and slight numeric-value differences? I'm /trying/ to debug a loss of all USB devices when kexec-ing (4.13.0-25-lowlatency - but affects all versions tested so far)
[13:14] tkamppeter: would/should that be handled by a udev 'remove' rule?
[13:15] xnox, around?
[15:12] Hi there, guys. I'm looking for some advice on a bug I've found (and reported) in the linux-firmware package, regarding a Qualcomm wi-fi controller firmware. I feel it's a rather important bug and it has a rather trivial fix (which has been in upstream for about 3 months). It's been introduced in Xenial when the 4.13 HWE kernel landed (it's also present in Artful as of this moment). I've been to #ubuntu-bugs and they recommended I should come here to ask for advice on this. The bug in question is reported in LP1743279.
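On the [13:13] question about comparing dmesg logs while ignoring timestamps and small numeric differences: no ready-made tool is named in the channel, so the following is only a minimal Python sketch of one possible approach. The function names `normalize` and `diff_dmesg` and the placeholder tokens `<hex>` and `<n>` are hypothetical, and the regular expressions assume the default `[  seconds.micros]` timestamp prefix.

```python
import difflib
import re

# Assumed dmesg prefix format: "[    1.234567] message"
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
NUMBER = re.compile(r"\b\d+\b")


def normalize(line):
    """Strip the timestamp and mask numeric values so they compare equal."""
    line = TIMESTAMP.sub("", line.rstrip("\n"))
    line = HEX.sub("<hex>", line)      # mask hex values first
    return NUMBER.sub("<n>", line)     # then standalone decimal numbers


def diff_dmesg(old_lines, new_lines):
    """Return a unified diff of two dmesg captures after normalization."""
    old = [normalize(l) for l in old_lines]
    new = [normalize(l) for l in new_lines]
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))
```

Masking every standalone number is deliberately aggressive (it also hides bus and port numbers such as `9-1`), so the patterns would probably need tuning for a real kexec comparison.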
[15:14] bug #1743279
[15:14] bug 1743279 in linux-firmware (Ubuntu) "QCA6174 stops working on newer kernels after second group rekeying" [Undecided,Confirmed] https://launchpad.net/bugs/1743279
[15:14] sforshee, ^
[15:15] andrebrait: thanks, I saw that bug and it's on my todo list
[15:17] sforshee Oh, ok. I got worried because it's kind of a big deal and I didn't know if someone had seen it or not. Let me know if I can be of any further assistance.
[15:20] andrebrait: I'll try to get a test package done "soon" (hopefully today) and post it to the bug so you can confirm that it works
[15:21] sforshee If that helps, can I create the package or follow some instructions as per http://packaging.ubuntu.com/html/fixing-a-bug.html
[15:22] andrebrait: thanks, but that's not necessary
[15:24] sforshee Ok then. Thanks a lot. :)
[16:24] TJ-, xnox, USB unplug problem solved, was not kernel, as I upgraded to Bionic, not even rebooted into the new kernel, and problem went away, so was probably UDEV or systemd.
[16:25] tkamppeter: ahhh, corner-case then :) I wish my issue was so easily solved :D
[16:34] Yesterday I encountered an odd issue when I tried to plug my Pixel XL into a USB3 ExpressCard in a laptop. It initially seemed to work but as soon as I tried to put it in file transfer mode, file transfer didn't work and I got "usb 9-1: can't set config #1, error -28" and a bunch of "usb 9-1: Not enough bandwidth for new device state." in dmesg.
[16:34] I'm not sure if this is a bug or just something with my setup, which is why I decided to ask here first.
[16:38] mamarley: which controller took the port? xhci or ehci ?
[17:06] TJ-: It is an add-in USB3 card that only has an XHCI controller.
[17:09] A "Renesas Technology Corp. uPD720202 USB 3.0 Host Controller" to be specific.
[17:17] mamarley: but the controller supports high/full/low-speed devices too, and that should mean the kernel's xhci module *may* hand over to another module if the port doesn't show as SS - does dmesg indicate if the newly discovered device is superspeed or high speed?
[17:52] tkamppeter, or probably dbus
[17:52] as one cannot restart that one on upgrade
[17:53] tkamppeter, so.... all of your udev rules and jobs work fine now, and you have no more bugs open against systemd?
[17:56] xnox, my current udev rules and jobs work, as s-c-p upstream has worked around the quote/unquote problem. The quote/unquote problem, bug 1721839, still persists, only it is not release-critical any more.
[17:56] bug 1721839 in systemd (Ubuntu) "[REGRESSION] Services asked for by UDEV do not get triggered" [Critical,Confirmed] https://launchpad.net/bugs/1721839
[17:57] xnox, unplugging problem, which affected both ippusbxd and system-config-printer services is fixed, ippusbxd was never affected by bug 1721839, only system-config-printer, but newest s-c-p upstream works around the problem.
[17:57] bug 1721839 in systemd (Ubuntu) "[REGRESSION] Services asked for by UDEV do not get triggered" [Critical,Confirmed] https://launchpad.net/bugs/1721839
[17:58] xnox, why could the unplugging problem be caused by dbus?
[18:10] TJ-: It shows up as a SuperSpeed device.
[18:24] mamarley: does "lsusb -v -d VVVV:PPPP" show anything strange in the EndPoint Descriptor's wMaxPacketSize ?
[18:25] TJ-: That command outputs nothing.
[18:26] mamarley: VVVV:PPPP is the vendor:product IDs :D
[18:26] d'oh
[18:27] mamarley: the xhci.c source indicates the 2 bandwidth errors generated when trying to configure endpoints, so there may be some clue in that info, although I'm not sure what would count as 'bad' :)
[18:30] TJ-: It says "wMaxPacketSize 0x0400 1x 1024 bytes" 5 times for different interface descriptors.
[18:31] mamarley: I don't have a similar device here to compare against
[18:32] I'm pretty sure it has something to do with that one controller because if I plug the same phone into my desktop or my other laptop (both of which have Intel XHCI controllers) it works properly.
[18:37] mamarley: it would make sense. I've just been comparing outputs here. Using "sudo lsusb -v -d VVVV:PPPP" with the 2 root hubs exposed by the controller ("Linux Foundation X.0 root hub") - maybe that will provide some clue
[18:41] TJ-: https://paste.ubuntu.com/26405967/
[18:46] mamarley: so "0000:0d:00.0" is the PCI device ?
[18:47] TJ-: Wait, I somehow screwed that up and posted the output of one of the EHCI USB2 root hubs. The USB2 root hub from the XHCI controller is actually https://paste.ubuntu.com/26405994/.
[18:48] Not that it matters since the device is on the USB3 root hub anyway.
[18:48] And yes, that is the PCI device.
[18:49] mamarley: when the Pixel XL is connected is it listed by lsusb ? if so can you pastebin "sudo lsusb -v -d ..." for it?
[18:51] mamarley: also, does this happen after a suspend/resume or (also) on a cold/fresh boot ?
[18:52] TJ-: It is indeed: https://paste.ubuntu.com/26406010/ (I don't know why it says it is a Nexus 4; it is most definitely a Pixel XL.)
[18:52] It happens starting the first time I plug in the phone after booting the PC.
[18:53] mamarley: is that "lsusb" when the Pixel is in file-transfer mode? if not, can you put it in that mode and redo the lsusb ?
[18:53] mamarley: OK, so not an ACPI/device-reset issue
[18:53] TJ-: Yes, I put it in file transfer mode before capturing the data.
[18:54] I also tried turning off ADB to see if that had any effect, but it did not.
[18:55] mamarley: you notice the device shows 3 endpoints for the MTP interface? I wonder if there should be an "EP 2 OUT" to match the "EP 2 IN" - any chance you can check that on another PC (and capture the lsusb output for that too so we can compare) ?
[18:57] mamarley: I also notice EP 2 IN has bMaxBurst == 0 which looks strange
[18:57] TJ-: OK, here it is plugged directly into the USB-C port on my desktop: https://paste.ubuntu.com/26406045/
[18:58] mamarley: OK, so the value is the same but ...
[19:03] ... the laptop is getting the bandwidth error, so something different is causing the driver to miscalculate the total
[19:05] I have found a few similar reports on various bug trackers, but those all seem to happen when multiple devices are plugged in and can be worked around by unplugging other devices. In my case, the phone is the only device plugged in to that controller.
[19:06] mamarley: yes, I saw the RH report with no conclusion has been ongoing for a while
[19:13] mamarley: which kernel version is it?
[19:14] TJ-: 4.15-rc8 from the mainline builds.
[19:14] mamarley: I notice your error -28 and in drivers/usb/host/xhci.h I see "#define COMP_STOPPED_SHORT_PACKET 28"
[19:15] mamarley: does the same error occur for all (Ubuntu) kernel versions ?
[19:18] I haven't tried it on any other kernels yet. It looks like that define wasn't introduced until 4.11, so I will try 4.10.x and see what it does.
[19:23] mamarley: No, it just changed from COMP_SHORT_TX like many others
[19:23] Ah, OK.
[19:29] It looks like it would be printing a different warning if it hit that condition anyway.
[20:11] TJ-: After painstakingly tracing through lots of kernel code, it looks like my error is probably coming from drivers/usb/host/xhci.c:1787 (that is the only place in the XHCI driver that returns -ENOSPC, which is -28).
[20:17] But, besides the definition and a user-friendly string converter, that is literally the only usage of COMP_BANDWIDTH_ERROR and COMP_SECONDARY_BANDWIDTH_ERROR in the entire kernel.
[20:21] It looks like that value is coming back from the controller itself, so the controller must be bugged.
[20:47] mamarley: I wonder if commit da997066894817 is responsible? Did you test on the early kernels?
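The mapping from "error -28" to -ENOSPC mentioned at [20:11] can be checked from userspace without reading kernel headers. This is a small Python sketch using the standard `errno` module; the helper name `describe_errno` is hypothetical, and it assumes the numeric values match the Linux ones (kernel return codes are negated errno values).

```python
import errno
import os


def describe_errno(ret):
    """Map a kernel-style return value (e.g. -28) to its errno name and text."""
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, text = describe_errno(-28)
print(name, "-", text)
```

Note this only identifies the errno; the xHCI completion code COMP_STOPPED_SHORT_PACKET quoted at [19:14] also happens to be 28 but lives in a different number space.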
[20:51] TJ-: That's not even my controller (mine is 0x0015).
[20:58] mamarley: your device /is/ already there using the same quirk. Can you pastebin the dmesg? Looks like it should also contain "QUIRK: Resetting on resume"
[20:59] TJ-: https://paste.ubuntu.com/26406653/
[21:03] And I just tried kernel 4.4.0, it suffers from the same problem.
[21:09] mamarley: do you have another ExpressCard PC to test the controller in? (wondering if the PCI/e config is responsible)
[21:11] TJ-: I was wondering that myself, but sadly this is an ExpressCard/54 card and my other laptop only has an ExpressCard/34 slot.
[21:12] mamarley: looking at dmesg indicates xhci is applying XHCI_SPURIOUS_SUCCESS and XHCI_RESET_ON_RESUME quirks: "hcc params 0x014051cf hci version 0x100 quirks 0x00000090"
[21:12] mamarley: shame. I've got 6 here!
[21:13] mamarley: any other USB 3.x devices you can test besides the Pixel ?
[21:14] No, sadly. Only a bunch of USB2 devices.
[21:14] mamarley: any USB3 hub you could interpose between ExpressCard and Pixel?
[21:14] Nope, I don't even have any USB2 hubs.
[21:15] mamarley: ooo, that might be a useful test, a USB2 hub to force the port to use USB2
[21:15] I could replicate that by only partially plugging in the USB cord, not allowing the extra USB3 contacts to touch.
[21:16] mamarley: :D now there's the sign of a proper hacker :p
[21:16] mamarley: not the chipset, but what is the make/model of the ExpressCard - I might be able to find one
[21:18] TJ-: In USB2 mode, it works fine. The card is a StarTech ECUSB3S254F.
[21:19] * mamarley wonders if this might be because this old laptop only has 2.5GT/s PCIe and not 5.0GT/s.
[21:20] mamarley: that's a very good point
[21:23] mamarley: does "sudo lspci -vvvnnk -d xxxx:yyyy" confirm that via LnkSta: Speed and LnkCtl2: Target Link Speed?
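The hunch at [21:19] can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below assumes a single PCIe lane (ExpressCard is x1), 8b/10b line encoding for PCIe 1.x/2.0 and for USB 3.0 SuperSpeed, and ignores all packet and protocol overhead; the function name `payload_gbps` is hypothetical.

```python
def payload_gbps(raw_gtps, encoding=(8, 10), lanes=1):
    """Usable bit rate in Gbit/s after line encoding, ignoring protocol overhead."""
    data_bits, line_bits = encoding
    return raw_gtps * lanes * data_bits / line_bits


pcie_gen1_x1 = payload_gbps(2.5)     # 2.5 GT/s link
pcie_gen2_x1 = payload_gbps(5.0)     # 5.0 GT/s link
usb3_superspeed = payload_gbps(5.0)  # USB 3.0 signals at 5 Gbit/s

print(f"PCIe 2.5 GT/s x1: {pcie_gen1_x1} Gbit/s")
print(f"PCIe 5.0 GT/s x1: {pcie_gen2_x1} Gbit/s")
print(f"USB 3.0 SuperSpeed: {usb3_superspeed} Gbit/s")
```

Under these assumptions a 2.5 GT/s upstream link carries about half of what a single SuperSpeed port can signal, which is at least consistent with the bandwidth theory, though it does not by itself show that the controller rejects the configuration for that reason.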
[21:28] TJ-: "LnkSta: Speed 2.5GT/s", "LnkCtl2: Target Link Speed: 5GT/s"
[21:33] mamarley: Yes, that's from the COMP_BANDWIDTH_ERROR I was looking at originally
[21:33] mamarley: Ignore ^^^^ I was scrolled back up in the history :D
[21:34] mamarley: right, so it could be the xhci_reserve_bandwidth() ought to be considering the upstream PCI port for determining the maximum, not just what the USB3 controller reports
[21:35] It looks to me like the error is coming straight from the controller itself, not from the driver.
[21:37] The only way it can ever return -28 is because of the COMP_BANDWIDTH_ERROR or COMP_SECONDARY_BANDWIDTH_ERROR directly from the controller.
[21:41] mamarley: I wonder if the XHCI_SW_BW_CHECKING quirk might help
[21:41] * mamarley busts out the compiler.
[21:52] Oh wait, apparently the module has a "quirks" command line option.
[21:59] mamarley: yes... I think it needs quirks=0x20 ?
[21:59] Yeah, I was trying to find documentation on how to use that argument.
[21:59] presumably command-line since module is built-in, xhci_pci.quirks-0x20 ??
[22:00] mamarley: presumably command-line since module is built-in, xhci_pci.quirks=0x20 (since the flags are only additive - it won't remove default quirks)
[22:05] TJ-: How are you getting 0x20?
[22:07] jsalisbury: I can test another bisection kernel in about an hour from now if you compile one before then
[22:08] mamarley: by reading the wrong option! I was remembering the << 4 option but it's "#define XHCI_SW_BW_CHECKING (1 << 8)" here so 0x0100 :)
[22:08] Yeah, I was going to say my C is a bit rusty, but it seems like 1<<8 ought to be 0b100000000 or 0x100.
[22:08] TimStarling, ack. There is one posted in comment #17 now.
[22:09] mamarley: hehehe testing you :p
[22:09] thanks
[22:11] mamarley: it'd be good to open a bug for this bandwidth issue if only to track all the diagnosis and testing we're doing
[22:12] Hmm, that didn't take, the quirks are still 0x00000090.
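The quirk bitmask arithmetic from this exchange can be sketched in Python. Only `XHCI_SW_BW_CHECKING (1 << 8)` is quoted in the log; the bit positions used here for XHCI_SPURIOUS_SUCCESS (1 << 4) and XHCI_RESET_ON_RESUME (1 << 7) are an assumption, chosen because they are the two quirks named at [21:12] and together give the reported 0x90. The helper `decode_quirks` is hypothetical.

```python
# Bit values: SW_BW_CHECKING is quoted in the log; the other two are assumed
# (they are the two quirks named in the log and they sum to 0x90).
XHCI_SPURIOUS_SUCCESS = 1 << 4
XHCI_RESET_ON_RESUME = 1 << 7
XHCI_SW_BW_CHECKING = 1 << 8

QUIRK_NAMES = {
    XHCI_SPURIOUS_SUCCESS: "XHCI_SPURIOUS_SUCCESS",
    XHCI_RESET_ON_RESUME: "XHCI_RESET_ON_RESUME",
    XHCI_SW_BW_CHECKING: "XHCI_SW_BW_CHECKING",
}


def decode_quirks(mask):
    """List the known quirk names whose bits are set in mask, lowest bit first."""
    return [name for bit, name in sorted(QUIRK_NAMES.items()) if mask & bit]


print(decode_quirks(0x00000090))
# Quirks are additive, so requesting 0x100 on top of the defaults would give:
print(hex(0x90 | XHCI_SW_BW_CHECKING))
```

If the extra quirk had been applied, the "quirks" value in dmesg would be expected to show bit 8 set in addition to the default bits, which is a quick way to tell whether a command-line setting took effect.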
[22:12] mamarley: boot-time cmdline entry ?
[22:12] It is.
[22:13] Ah, the quirks need to be applied to xhci_hcd, not xhci_pci.
[22:14] But, I still get exactly the same error. :(
[22:15] mamarley: ahhh, of course
[22:16] mamarley: right, because - if our hypothesis is correct - the driver currently doesn't consider bandwidth of upstream devices. The thing is, if that is the cause, and the current error is from the controller, does it 'know' it's on a PCIe x1 2.5GT/s link and that's why?
[22:19] Yes, looking at the code it seems it would have bombed with -ENOMEM if the driver didn't think there was enough bandwidth before it even got to the part where it tries to send the command to the controller.
[22:30] mamarley: presumably then the Vostro only has a PCIe v1.1 hub
[22:30] I already knew that.
[22:32] mamarley: I've not seen the lspci report for it
[22:36] Apparently these chips have updatable firmware, but (of course) one needs Microsoft® Windows® in order to update the firmware or even check the current version…
[22:45] mamarley: the Renesas or the Intel PCI bridge? :)
[22:47] The USB controller.
[22:51] mamarley: Renesas aren't very helpful in documenting errata, and looks like all their docs are behind a registration wall
[22:58] mamarley: looks like the Intel site has the most useful info and updater https://downloadcenter.intel.com/download/22775/Renesas-Electronics-USB-3-0-Firmware-Updates
[23:01] TJ-: I figured out how to execute the FW updater, using a sketchy WinPE disk I downloaded from somewhere, a sketchy DLL file I downloaded, and typing "ls" instead of "dir" entirely too many times. Now I just need to find which sketchy website has the latest FW available.
[23:02] mamarley: the link I gave you has 2, one for 4 rear ports and 1 for front 2 ports - I'd suspect the 2-port version might be what you need if the fwupdater is generic enough (from Renesas rather than specific to Intel mobos)
[23:02] Yeah, but that is an old FW version. I can see at least 2.0.2.6 elsewhere.
[23:02] TJ-: Regardless of how this turns out, I definitely owe you a drink of your choosing, if I ever meet you in real life. Thanks!
[23:03] mamarley: can you trust the version number? I'd trust Intel over some no-name at least
[23:06] jjohansen: any update on the artful/bionic stacking apparmor bug?
[23:13] hallyn: yeah, I have a patch for it, but I haven't tested it yet
[23:14] mamarley: you might find this very useful if you haven't already seen it. http://pete.akeo.ie/2011/10/flashing-necrenesas-usb-30.html
[23:14] I got it flashed, but it didn't do any good. :(
[23:14] hallyn: are you specifically looking for a xenial kernel?
[23:15] mamarley: the FW fixes I've seen seem to be solving a suspend/resume lock-up/fall-off-the-bus issue
[23:15] That's what I saw too, but it was worth a shot.
[23:16] err no not xenial, bionic
[23:16] jjohansen: no, actually an artful kernel :)
[23:16] sorry xenial was where it was working
[23:16] * hallyn tries to remember which file he had to edit as a workaround
[23:17] hallyn: ack, I'll kick a build off,
[23:17] /lib/apparmor/profile-load
[23:23] jjohansen: yeah, found it in my logs too :) thanks - will kernels actually build right now? i thought intelpocolypse prevented builds?
[23:23] hallyn: it has
[23:23] I am giving it a go
[23:24] I think most of that has been cleaned up now
[23:27] hallyn: I'll ping you in 30 min or so when I know if the build succeeded or failed
[23:35] thanks :) *hopefully* i'll be out for dinner :)
[23:38] mamarley: not sure if this will help but it sure is interesting! https://lkml.org/lkml/2016/6/23/632
[23:40] mamarley: and the author's related firmware repo: https://github.com/chunkeey/renesas-fw
[23:46] mamarley: fw-uploader is on patchwork: https://patchwork.kernel.org/patch/9195761/
[23:50] hallyn: I haven't tested it yet, but you can grab it from http://people.canonical.com/~jj/nsname_fix/