[00:58] <fallacy> Question.  The nouveau driver in Ubuntu 14.04 kernel (3.11.0-20) doesn't work on my system but works fine in the upstream kernel from a later version (3.14.1-031401).  Should I report that as a bug for Ubuntu 14.04?
[00:59] <fallacy> oops, wrong info.  It's broken in 3.13.0-24 (which is Ubuntu 14.04).  Works in 3.11.0-20 (from Ubuntu 13.10) and in upstream Ubuntu build 3.14.1-031401
[01:03] <fallacy> Having those upstream packages is really useful for debugging
[07:15] <niluje> I run nbd-client from initramfs (ubuntu 14.04) to create the device /dev/nbd0, on which the rootfs is mounted. When I shut down or reboot, I get a kernel panic because some I/O is still being done and the device is no longer accessible (http://pastebin.com/P4qWXUFC). I guess nbd-client is killed too early by init. Using systemd, the solution is described here
[07:15] <niluje> http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/, and renaming nbd-client to @nbd-client (or specifying the option -m in nbd-client 3.8 (https://github.com/yoe/nbd/blob/master/nbd-client.c#L532)) would probably solve the problem. Any idea how I should solve the issue?
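[Editor's note] A minimal sketch of the RootStorageDaemons convention niluje links to: systemd spares processes whose argv[0] starts with '@' from the final kill at shutdown. bash's `exec -a NAME` sets argv[0]; here `sleep` stands in for nbd-client (the name and timings are illustrative only):

```shell
#!/bin/bash
# Demonstrate setting argv[0] to "@nbd-client", the marker systemd's
# RootStorageDaemons convention uses to spare a storage daemon at
# shutdown. `sleep` is a stand-in for the real nbd-client.
bash -c 'exec -a @nbd-client sleep 60' &
pid=$!
sleep 0.2   # give the child a moment to exec
# argv[0] is the first NUL-terminated field of /proc/PID/cmdline
argv0=$(tr '\0' '\n' < "/proc/$pid/cmdline" | head -n1)
echo "$argv0"   # @nbd-client
kill "$pid" 2>/dev/null
```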
[07:18] <infinity> niluje: See /run/sendsigs.omit.d
[07:18] <infinity> niluje: Any PIDs in there will be bypassed by the "kill the world" bit on shutdown.
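[Editor's note] A sketch of infinity's suggestion, assuming Ubuntu's sysvinit layout: /etc/init.d/sendsigs reads every file under /run/sendsigs.omit.d/ and spares those PIDs when it kills remaining processes at shutdown. The helper name and the initramfs hook shown here are illustrative, not part of any shipped script:

```shell
#!/bin/sh
# Spare a daemon from the shutdown "kill the world" sweep by listing
# its PID in /run/sendsigs.omit.d/ (path overridable for testing).
OMIT_DIR=${OMIT_DIR:-/run/sendsigs.omit.d}

write_omit_file() {
    # $1 = name for the omit file, $2 = PID to spare
    mkdir -p "$OMIT_DIR"
    echo "$2" > "$OMIT_DIR/$1"
}

# e.g. from an initramfs script, right after starting nbd-client:
pid=$(pidof nbd-client 2>/dev/null || true)
if [ -n "$pid" ]; then
    write_omit_file nbd-client "$pid"
fi
```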
[07:19] <niluje> infinity: I'm taking a look
[07:26] <niluje> infinity: thanks a *LOT*
[07:26] <niluje> it works fine, no more kernel panic
[07:26] <niluje> however, when is the process killed?
[07:27] <niluje> the connection doesn't seem to be cleanly closed, I guess the nbd-client didn't send a shutdown(2) to the nbd-server
[07:35] <infinity> niluje: So, I'm not sure how best you'd mangle that.  S90{halt,reboot} in rc[06].d will fail if the root disappears.
[07:36] <infinity> niluje: S60umountroot should at least be making sure the disappearing client doesn't cause data loss.  Assuming nbd has a concept of read-only mounts.
[07:42] <niluje> infinity: I'm looking into /etc/rc6.d/S60umountroot and I don't really get what I could do here. Could you tell me a bit more?
[07:43] <infinity> niluje: Well, nothing there, really.  Just pointing out that that should be remounting your root readonly (if nbd supports such a thing).
[07:43] <infinity> niluje: There's a fundamental chicken/egg issue where we can *never* actually umount root completely, or the reboot/halt would fail, cause you'd no longer have binaries to run.
[07:44] <niluje> hm right
[07:45] <niluje> infinity: I tried a mount -t ext4 -o remount,ro /dev/nbd0 / yesterday and it works well, the file system is readonly, can't write anything on it
[07:46] <infinity> niluje: Right, and shutdown should be doing that for you, it's just never going to exit the nbd client.
[07:48] <infinity> niluje: There are certainly ways we could do this more cleverly, given a bunch of re-engineering (like copying everything we need to run shutdown(8) into a tmpfs, umounting /, and then rebooting), but that's a fair bit of fiddling to get right and has never really been an issue worth caring about, since settling a read-only FS should guarantee that an unclean shutdown doesn't really matter.
[07:50] <niluje> infinity: there's something I don't understand
[07:50] <niluje> You made me understand why the rootfs can't be unmounted
[07:51] <niluje> When I put the nbd-client PID in /run/sendsigs.omit.d, the nbd-client process will not be killed upon reboot
[07:51] <niluje> in /etc/rc6.d/S60umountroot, I see a [...] mount    $MOUNT_FORCE_OPT -n -o remount,ro -t dummytype dummydev [...]
[07:52] <niluje> which, if I understand correctly, will flush what needs to be flushed and assure me the filesystem will be "clean" once the reboot occurs (and the fs will be remounted rw)
[07:52] <niluje> am I wrong somewhere?
[07:54] <infinity> Remounted ro, you mean.
[07:55] <niluje> infinity: remounted rw
[07:55] <infinity> niluje: The whole point of that line is to remount ro.
[07:55] <niluje> I mean, the fs is first remounted ro, then when the system restarts, it is remounted rw
[07:55] <infinity> niluje: Oh, sure.  Yes.
[07:55] <infinity> niluje: rw when it comes back, obviously.
[07:55] <niluje> ok
[07:56] <niluje> and the "ro" part is to flush pending writes, right?
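[Editor's note] Yes: remounting read-only forces dirty pages and journal data to disk, so the fs stays consistent even if the machine then goes away without a final umount. A root-only fragment (not runnable as-is; the device name is taken from the discussion above):

```shell
# Requires root. Remount the root fs read-only, flushing pending writes.
mount -o remount,ro /dev/nbd0 /
# Verify: 'ro' should now appear in the mount options for /
awk '$2 == "/" {print $4}' /proc/mounts
```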
[08:01] <niluje> infinity: what I don't understand is, if the process isn't killed, why can't I reconnect on nbd-server upon reboot?
[08:01] <infinity> That, I don't know.  I've never used nbd.
[08:02] <niluje> thinking about it
[08:02] <niluje> it seems pretty normal, I guess the network is shut down too
[08:07] <niluje> I just added some logging in the initramfs to run ps just before launching nbd-client. What I do is: start the machine, put nbd-client PID in /run/sendsigs.omit.d/, reboot. Then, if the process hasn't been killed, shouldn't I see it listed? (it is not)
[08:09] <infinity> niluje: How would the process be listed after a reboot?
[08:15] <niluje> infinity: I'm lost :-(
[08:16] <niluje> I don't understand when the process is killed, when it's listed in /run/sendsigs.omit.d
[08:25] <infinity> niluje: It's not.  The machine reboots.
[08:25] <infinity> niluje: Which, of course, kills everything.  But not with a signal.  It just dies.  Cause the machine is dead. :P
[08:28] <niluje> infinity: ok
[08:28] <niluje> :p
[08:28] <niluje> so I don't understand why you can't unmount the rootfs
[08:30] <jk-> heya ikepanhc, do you know if Alex is around?
[08:30] <ikepanhc> jk-: hi Jeremy :D
[08:30] <ikepanhc> jk-: give me a sec
[08:30] <jk-> ikepanhc: thanks! :)
[08:31] <alexhung> jk-, ping
[08:32] <jk-> hi alexhung, just had a few fwts questions; is there a better channel to use?
[08:32] <infinity> niluje: Because if you unmount it, how can you call "halt" or "reboot"?
[08:32] <infinity> niluje: Or the shell script that is S90reboot...
[08:33] <niluje> hm right :p
[08:35] <niluje> infinity: is there anywhere a resource explaining step by step what happens during a reboot?
[08:35] <infinity> niluje: I doubt it.  It's mostly documented in code.
[08:35] <niluje> ok
[08:36] <niluje> wouldn't it be possible to mount a tmpfs in which I'd get something executed that'd kill the process at the very last moment?
[08:37] <infinity> niluje: I hinted above at that being the very complicated solution to trying to not need root to reboot.
[08:37] <infinity> niluje: But it's overkill for literally every use case I've ever run into.
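[Editor's note] A rough, root-only sketch of the "very complicated solution" infinity describes: stage a minimal userland in a tmpfs, pivot into it, release the real root, then reboot. Every path here is illustrative (a static busybox at /bin/busybox is assumed); real implementations are considerably more careful:

```shell
#!/bin/sh
# Illustrative only -- never run on a live system.
mkdir -p /run/newroot
mount -t tmpfs tmpfs /run/newroot
cp -a /bin/busybox /run/newroot/     # one static binary is enough
mkdir /run/newroot/oldroot
cd /run/newroot
# Swap the root mounts, then finish up from inside the tmpfs so the
# real root can be released before rebooting.
pivot_root . oldroot
exec chroot . /busybox sh -c '/busybox umount /oldroot; /busybox reboot -f'
```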
[08:37] <infinity> niluje: If nbd-server is exploding when a client disappears, that's a bug, IMO, and should be fixed, not worked around.
[08:40] <infinity> niluje: The maintainer of nbd (wouter) happens to be a very nice fellow.  You might ask him about your issues.
[08:40] <infinity> niluje: I will continue to maintain the opinion that if the server can't handle a client disappearing on reboot and trying to reconnect later with a new instance, it's fundamentally broken. :P
[08:41] <apw> yeah what happens when you lose power or the network for reasons outside your control, you end up in the same mess
[08:41] <niluje> yup
[08:42] <niluje> infinity: I'm going to dig a bit more before contacting the guys behind nbd.
[08:42] <niluje> thanks a lot for your time
[08:42] <infinity> niluje: Sure.  He happens to be a decent dude with access to lovely chocolate.
[08:42] <infinity> He tried to kill me with said chocolate last time I saw him.
[08:42] <infinity> Evil Belgians.
[08:43] <niluje> :D
[08:43] <smb> That's killing someone softly... :)
[08:43] <niluje> just wondering, are you an Ubuntu committer, a kernel committer, or something?
[08:43] <infinity> smb: That's wouter's style.
[08:44] <infinity> niluje: Ubuntu and Debian developer, and random hacker on several upstream projects.  Also, just a guy who can't sleep (but I'm going to go try).
[08:44] <niluje> aren't you in london? at 9am?!
[08:44] <infinity> niluje: I'm in Calgary.  It's 2:44am.
[08:44] <apw> heh ... very few people are in london
[08:45] <smb> apw, Beside all the inhabitants
[08:45] <infinity> apw: Hey, to be fair, I'm in London more than some people who claim to live there.
[08:45] <apw> true, but as you aren't, the irc contingent is severely reduced
[08:45] <niluje> ok :-) thanks a lot again
[08:47] <apw> fallacy (not), yes file a bug against it indeed
[08:50] <smb> apw, Bored of repeating that answer you are?
[08:50] <smb> :)
[08:51] <apw> heh
[08:52] <mlankhorst> ;D
[08:57] <smb> mlankhorst, So about my whining part #1. bug 1298517 has screenshots about it. ;)
[08:57] <ubot2> Launchpad bug 1298517 in mesa (Ubuntu) "Rendering issues in unity with xserver-modeset" [High,Triaged] https://launchpad.net/bugs/1298517
[08:58] <mlankhorst> modesetting hates you :P
[09:00] <smb> Well, it's very nearly working. It's just that unity developers obviously don't test in VMs and if they do, they don't use terminals. ;-P
[09:01] <mlankhorst> llvmpipe bugs have a low chance of getting fixed, though..
[09:02] <smb> Unless "he" stumbles over them trying to present an Ubuntu desktop running in THE CLOUD™
[09:03] <amitk> :)
[09:04] <smb> amitk, Oh look who is there. Hello stranger. ;)
[09:05] <amitk> smb: that should go on the quotes page, too bad I don't have access anymore ;)
[09:05] <smb> amitk, Not again. :-P
[09:08] <amitk> smb: for the record, I'm always lurking around this channel..
[09:09] <amitk> still use ubuntu core, none of the desktop stuff though
[09:09] <smb> amitk, Yeah, and I could tell if I cared to look at the list of people. But I consider you under the towel unless you say something. :)
[10:43] <apw> smb, i sometimes wonder why i even have that thing enabled (the list)
[10:45] <smb> apw, You got a point there
[10:46]  * apw idly wonders if it can be turned off in his interface .... hmm
[14:09] <apw> oh i hate that, spent 3 hours debugging a test suite failure "your kernel is broken" to ... of course ... find the test is broken
[14:16] <mlankhorst> solution: spend less time on testing. ;-)
[14:21] <sconklin> arges, fwiw that network namespace cleanup bug has been repro'd by someone else on 3.13.0-24 (upstream bug). I think we have an easy way to reproduce it, working on setting that up now.
[14:21] <arges> cool
[14:26] <sconklin> also interesting is that it's also been seen on bare metal
[14:32] <diwic> apw, that's what we call 'quality time'
[14:35] <apw> mlankhorst, you are right .... rm -rf tests ... better
[14:36] <stgraber> sconklin: is that the bug where the refcount appears to fail and the kernel thinks there's still something holding eth0 in an otherwise unused ns?
[14:36] <sconklin> stgraber: https://bugzilla.kernel.org/show_bug.cgi?id=65191
[14:36] <ubot2> bugzilla.kernel.org bug 65191 in Netfilter/Iptables "BUG in nf_nat_cleanup_conntrack" [Normal,New]
[14:37] <stgraber> ah, I haven't seen that one in a while :)
[14:38] <sconklin> we see it when we scale up and create a bunch of containers, then scale down and remove all of them
[14:50] <jsalisbury> **
[14:50] <jsalisbury> ** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[14:50] <jsalisbury> **
[14:55] <sconklin> stgraber: are there any multiple-container scaling tests in the LXC tests?
[15:01] <stgraber> sconklin: we have lxc-test-concurrent that tests multi-threaded LXC but the LXC tests are typically simple busybox so they don't do much
[15:03] <sconklin> ok. Thanks. I'll see whether we have anything that can run outside of our environment to stress this
[15:07] <arges> sconklin: in your testcase you'll also need to do something that is generating conntrack entries (and allowing them to timeout or get cleaned up)
[15:08] <sconklin> yeah, it's not trivial, and I'm not terribly optimistic
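[Editor's note] A hypothetical stress loop along the lines arges suggests (requires root; names and counts are made up, not the reporters' actual test): each namespace gets nat conntrack state, then everything is torn down to exercise the cleanup path from the bug.

```shell
#!/bin/sh
# Scale up: namespaces with nat/conntrack state.
N=100
i=1
while [ "$i" -le "$N" ]; do
    ip netns add "stress$i"
    ip netns exec "stress$i" ip link set lo up
    # loading a nat rule registers conntrack hooks in the namespace
    ip netns exec "stress$i" iptables -t nat -A POSTROUTING -j MASQUERADE
    # a little traffic so conntrack entries actually get created
    ip netns exec "stress$i" ping -c1 -W1 127.0.0.1 >/dev/null 2>&1
    i=$((i + 1))
done
# Scale back down; this is where the cleanup BUG was reported to fire.
i=1
while [ "$i" -le "$N" ]; do
    ip netns del "stress$i"
    i=$((i + 1))
done
```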
[15:22] <BenC> apw: I’m working on fixing powerpc64-emb right now. What’s my timeframe on getting it into the next upload?
[15:23] <bjf> BenC, you now have 3 weeks :-)
[15:24] <BenC> bjf: Not sure if that’s good or bad :)
[15:25] <bjf> BenC, you missed the friday cutoff. i'll be uploading new trusty bits today ... if you are talking utopic, that's a different story
[15:25] <BenC> bjf: utopic
[15:25] <bjf> BenC, ok, then apw and/or rtg can tell you that
[15:25] <BenC> I still keep reading that as micro-topic
[15:39] <apw> BenC, well we want to get an upload out soon, but if you have bits to fix power so we don't have to rip it out, that would be welcome
[15:39] <apw> BenC, how long do you think you'll be
[15:39] <BenC> apw: End of today
[15:42] <apw> as tim is away we should be ok :)
[15:59] <stgraber> sconklin: hey, so I just reproduced the conntrack panic here on metal with current trusty
[15:59] <stgraber> (accidentally, I was doing stress testing for another unrelated bug ;))
[15:59] <arges> stgraber: whoa. what test case did you use?
[16:00] <stgraber> arges: running a -j9 kernel build + one unprivileged container running chrome with a youtube video + another container with 500 sub-containers and around 30 with sub-sub-containers
[16:00] <sconklin> stgraber, arges - I'm looking at commit 0eba801b64cc8284d9024c7ece30415a2b981a72 which is in upstream but not stable, could be related - I have not tested upstream tip yet
[16:01] <stgraber> so I must have been having around 600 netns on that box, each with around 150 ipv6 routing table entries and 20-30 of those with iptables and ip6tables nat entries
[16:03] <arges> I could build a test kernel for stgraber if he can easily reproduce. sconklin is there a launchpad bug for this?
[16:03] <sconklin> arges: no
[16:04] <stgraber> arges: I'm not sure about "easily" yet but I can try again once that kernel is actually done building and I don't mind my work machine panicking on me :)
[16:07] <arges> sconklin: ok file a bug please : ). stgraber when you're ready probably be good to first try http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ and then we can try patch guessing
[16:13] <sconklin> arges: fwiw the mainline build v3.15-rc2-trusty should have this patch in it
[16:13] <arges> sconklin: yup.
[16:14] <BenC> apw: Successfully linked. Still need to boot test it
[16:17] <apw> BenC, sounds hopeful
[16:36] <sconklin> arges: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1314274
[16:36] <ubot2> Launchpad bug 1314274 in linux (Ubuntu) "BUG in nf_nat_cleanup_conntrack" [Undecided,New]
[16:48] <xnox> excellent usage of buildds =) i like it
[16:54] <jsalisbury> ##
[16:54] <jsalisbury> ## Kernel team meeting in 5 minutes
[16:54] <jsalisbury> ##
[17:00] <jsalisbury> ##
[17:00] <jsalisbury> ## Meeting starting now - #ubuntu-meeting
[17:00] <jsalisbury> ##      agenda: https://wiki.ubuntu.com/KernelTeam/Meeting
[17:00] <jsalisbury> ##
[20:48] <sconklin> arges: that bug also happens with the mainline v3.15-rc2-trusty build, I put that in the bug.
[20:48] <sconklin> that means that the patch we talked about doesn't fix it, although that one may need to be in -stable also
[20:48] <sconklin> kamal: ^^