hallyn | apw: thanks - that worked! | 00:04 |
---|---|---|
manjo | anyone know if ports.ubuntu is having issues | 00:08 |
manjo | looks like some of the ports mirrors might be down... oh well | 00:11 |
hallyn | apw: the patch is not quite right as it should grab another reference to the namespace, but it doesn't seem to blow up and does fix the bug, so i'll send a proper patch tonight - thanks again | 00:17 |
hallyn | or does it not matter? | 00:24 |
hallyn | are we guaranteed that the cgns will stick around for the duration of the mount due to task->mntns ? | 00:25 |
hallyn | that actually may be the case but i need to think it through | 00:25 |
manjo | apw, shit! | 00:44 |
manjo | (initramfs) echo "81780AD9-068C-4A80-A795-56856973B8F9" \| tr '[:upper:]' '[:lower:]' | 00:44 |
manjo | 81780AD9-068C-4A80-A795-56856973B8F9 | 00:44 |
manjo | apw, so the tr trick won't work in the initramfs environment | 00:45 |
manjo | apw, (initramfs) echo "81780AD9-068C-4A80-A795-56856973B8F9" \| awk '{print tolower($0)}' | 00:52 |
manjo | 81780ad9-068c-4a80-a795-56856973b8f9 | 00:52 |
manjo | that works | 00:52 |
manjo | apw, revised patch attached to the bug with real system boot test results | 01:17 |
jsalisbury | lamont, I posted the last bunch of test kernels in a directory tree. Location and how they are organized posted to the bug. | 02:41 |
lamont | jsalisbury: awesome | 02:45 |
lamont | I'll smash through the bisect tomorrow | 02:45 |
tjaalton | do you prefer an oldskool pull request or a merge request via lp? (for xenial) | 07:27 |
=== shuduo is now known as shuduo-afk | ||
apw | tjaalton, old skool pull request is prolly easier for people as that is what they are used to | 09:12 |
tjaalton | ok, I'll do that then | 09:14 |
tjaalton | apw: enjoy! :) | 11:19 |
apw | manjo, i've uploaded an alternative fix (to avoid adding the first awk dependency) to ppa:apw/ubuntu/initramfs-tools-test, if you could test that for me then i can get it uploaded for real later today | 11:22 |
apw | tjaalton, that is highly doubtful :/ | 11:22 |
tjaalton | hehe | 11:23 |
apw | tjaalton, pththththth | 11:28 |
apw | hallyn, yo ... did you bottom out that cgroups fix? i don't see an email | 12:18 |
HerrAmeise | anyone install dist-upgrade 4.2.0-30 this morning and have trouble booting? | 14:27 |
HerrAmeise | i had to revert back to 4.2.0-27 in order to get it working | 14:27 |
apw | bjf, kamal ^ | 14:28 |
apw | HerrAmeise, what were your symptoms? | 14:28 |
apw | you are certainly the first report I have seen of issues with that update | 14:28 |
HerrAmeise | Unity would not start up | 14:29 |
HerrAmeise | so it went through the normal boot sequence | 14:29 |
HerrAmeise | and then just hung forever | 14:29 |
HerrAmeise | I couldn't even open a terminal | 14:30 |
apw | HerrAmeise, ugg, sounds bad. could you file a bug against the kernel with "ubuntu-bug linux" from your working kernel | 14:30 |
HerrAmeise | so I had to go to grub and boot with a different kernel version | 14:30 |
apw | and put whatever details you have in it, and tell us the bug# here | 14:30 |
HerrAmeise | yup no problem | 14:30 |
apw | it is also worth looking in syslog to see if there was anything reported when it broke | 14:30 |
HerrAmeise | yep i can do that one sec | 14:30 |
apw | i am assuming you received the kernel via a simple update (update manager or apt-get dist-upgrade sort of thing) | 14:31 |
HerrAmeise | yea i did apt-get dist-upgrade | 14:32 |
HerrAmeise | didn't build it myself or anything crazy | 14:32 |
apw | and it's hard not to get all the pieces you need as a complete set when it's done that way | 14:32 |
apw | (so that eliminates one possibility) | 14:33 |
apw | once you have a bug, i'll paste some initial kernels to test to try and figure out which of the three updates is at fault | 14:33 |
bjf | apw, news to me. but -30 has only been in -updates for less than 24 hrs. | 14:33 |
apw | yep, that indeed | 14:34 |
HerrAmeise | apw: ok, I am also running this on a VM at work if that is relevant | 14:34 |
HerrAmeise | VMware Workstation 10 | 14:34 |
apw | HerrAmeise, VMs would be a common test subject but mostly in KVM | 14:35 |
apw | HerrAmeise, got a bug # yet ? | 14:42 |
stgraber | apw: did our uploads fix your kernel adt problem? | 15:06 |
apw | stgraber, yes ... final one just went green in the last 15m | 15:07 |
apw | (for xenial) | 15:07 |
stgraber | good! | 15:07 |
apw | stgraber, and we're seeing ppc64el lxc ADT failures on trusty regardless of kernel by the looks of it, is this expected ? | 15:19 |
apw | stgraber, and if not i'll file you another bug :) | 15:19 |
HerrAmeise | apw: is there any other way to report bugs other than through Apport? | 15:20 |
HerrAmeise | it's really a PITA | 15:21 |
apw | in theory you can ask apport to save the info to a blob you can move to another machine and submit from there | 15:21 |
HerrAmeise | and its not just an application crash | 15:22 |
HerrAmeise | btw i upgraded the kernel to 4.2.0-30 on my 32-bit Ubuntu VM and the same thing happened | 15:25 |
HerrAmeise | so definitely able to replicate the error | 15:25 |
HerrAmeise | first one was 64-bit | 15:25 |
apw | it is as likely a vmware related issue as anything | 15:25 |
HerrAmeise | true | 15:27 |
HerrAmeise | i'll try natively when i get home tonight | 15:28 |
apw | HerrAmeise, the first debugging steps are to try -28 and -29 to see which of 28, 29, 30 is the first broken one | 15:29 |
apw | https://launchpad.net/ubuntu/+source/linux/4.2.0-29.34 | 15:29 |
apw | https://launchpad.net/ubuntu/+source/linux/4.2.0-28.33 | 15:30 |
apw | binary packages for those are in the librarian ^ | 15:30 |
HerrAmeise | 33674 | 15:37 |
HerrAmeise | oops sorry wrong window | 15:38 |
stgraber | apw: the latest ppc64el failure on trusty indicates a DC network failure | 15:41 |
stgraber | unable to reach the gpg network and cloud-images.ubuntu.com | 15:41 |
apw | stgraber, i'll ask for them again and see if it goes away then | 15:42 |
stgraber | the rest of your results look good so I'd say your kernel is fine, it's just the test runner having some network difficulties | 15:42 |
stgraber | I know that IS changed the squid proxy IP recently, could be that the ppc64el VMs don't have the right firewall rule or something | 15:43 |
stgraber | if it fails again, we'll involve pitti | 15:43 |
apw | stgraber, ack, will let you know | 15:43 |
hallyn | apw: sorry, no, had some technical difficulties. will try to send it out this morning | 15:55 |
apw | hallyn, the pending fixes which break you are time sensitive, so i would like to get to a place where i have a plan to add something or rip something and upload tomorrow at the latest | 15:56 |
hallyn | apw: should i send a patch first upstream or first to ubuntu-kernel@? | 15:57 |
apw | hallyn, either is fine, if you are confident in the fix i can apply it while upstream grinds on it | 15:58 |
apw | we're still in a reasonably flexible period so we can rip it and replace it if upstream has a better idea | 15:58 |
apw | as i assume my other option is to rip the other thing you applied which causes the issue | 15:59 |
apw | the cgn i think it was | 15:59 |
hallyn | apw: http://paste.ubuntu.com/15181046/ <- does that mean anything to you? | 15:59 |
hallyn | (it kinda blocks me when the server hosting all my work keeps hanging with crap like that - from 3.13 to 3.16 and now on 4.2) | 16:00 |
hallyn | if it might be hw then i'll just $%)(*$%)($ switch | 16:00 |
apw | i wonder if that could be the fuse cve i was just reviewing | 16:01 |
hallyn | oh | 16:01 |
sforshee | hallyn: do you hit the WARN_ON at the end of the cgroup_mount after your change? | 16:01 |
hallyn | which warn_on? | 16:02 |
sforshee | hallyn: was also thinking it might make sense to make kernfs use sget_userns with init_user_ns always, haven't had time to really think it through yet though | 16:02 |
sforshee | the one in the if (pinned_sb) block | 16:02 |
hallyn | sforshee: i don't think so | 16:03 |
apw | sforshee, you know fuse well, is fuse_fill_write_pages() used on the write or read path ? | 16:03 |
apw | it ought to be write, but hey, nothing is clear in this world | 16:04 |
hallyn | sforshee: we cannot have cgroup mount use init_user_ns always, if you end up doing the comparison for the sb. | 16:04 |
sforshee | apw: write it would seem, invoked by fuse's write_iter callback | 16:04 |
apw | sforshee, thanks, not that cve then hallyn | 16:05 |
apw | sforshee, that stack trace might be something you grok better than i: http://paste.ubuntu.com/15181046/ | 16:05 |
hallyn | apw: trying to decide whether to halt my world to have the people hosting the server check the hw for 10 hours | 16:05 |
sforshee | apw: looks to me like the kernel is hung waiting for userspace to respond | 16:06 |
apw | hallyn, if it is always that, it looks to my shallow knowledge of fuse like it is waiting on a userspace provider, and isn't interruptible | 16:06 |
apw | sforshee, ok you see the same as me | 16:06 |
hallyn | ok, thx :) | 16:06 |
apw | hallyn, so i'd not be keen to blame h/w but whatever crap is mounted on fuse | 16:06 |
apw | is that lxcfs :) | 16:06 |
hallyn | hm, could be. | 16:07 |
apw | sforshee, do we really hand things to uspace and wait in an uninterruptible way for it to respond, that sounds mad to me | 16:07 |
hallyn | in that case the only thing i can think is that the kernel builds made oom happen, killing it (bc nothing else was exercising fuse), | 16:07 |
hallyn | except i've got 42g ram | 16:07 |
apw | hallyn, then you'd have an oom in there | 16:08 |
apw | hallyn, and it looks to start right there, with nothing before it | 16:09 |
sforshee | hallyn: going back to cgfs ... prior to my changes it was going to reuse an existing superblock. Now it still wants to but sget refuses because it's a different userns. Is that right? | 16:10 |
sforshee | apw: I think fuse does wait uninterruptibly for userspace to respond. But there is some kind of abort connection sysfs node to break those waits. | 16:15 |
sforshee | hallyn: it does seem to me that it will be possible to hit that WARN_ON(new_sb) if using kernfs_mount_ns. Does that represent some real problem? | 16:32 |
sforshee | hallyn: also I'm not sure what you were getting at wrt using init_user_ns always | 16:33 |
sforshee | in effect that's what's happening before we have s_user_ns. But like I said I need to think it through some more. | 16:33 |
hallyn | sforshee: just to make sure, you see why i need something like it right? | 16:38 |
hallyn | looking at fs/sysfs/mount.c, i think i just need to grab/release the ns (i suppose sb release could be done lazily) | 16:39 |
sforshee | hallyn: I think so. Previously you ended up reusing a superblock, but now sget returns EBUSY because you're in a different userns. | 16:39 |
hallyn | right | 16:39 |
sforshee | and by passing a namespace you force the kernfs test function to not match the old superblock | 16:40 |
sforshee | but there seems to be something inherent to this code that expects to reuse the superblock in some cases | 16:40 |
hallyn | is there? i was wondering that but didn't see it, | 16:40 |
sforshee | just look at pinned_sb | 16:40 |
hallyn | is there a simple way we could re-use it? | 16:41 |
hallyn | without hardcoding cgroupfs in sget_userns | 16:41 |
sforshee | well, that's where the though of doing sget_userns(..., &init_user_ns) in kernfs_mount_ns came from | 16:42 |
sforshee | *thought | 16:42 |
sforshee | or I was also considering whether maybe that check only makes sense for device-backed mounts | 16:42 |
hallyn | yeah it doesn't make sense for e.g. sysfs or proc, right? you can't get another userns's mount there | 16:44 |
hallyn | devpts? | 16:44 |
hallyn | i wonder whether this will quietly break containers doing 'mount -t devpts' without -o newinstance | 16:45 |
hallyn | well we pass file_system_type in, so i guess we *could* filter on cgroupfs; how would we tell whether it's blockdev-backed in sget? | 16:46 |
sforshee | you can't mount devpts from !init_user_ns without newinstance it appears | 16:47 |
hallyn | oh, good | 16:47 |
sforshee | there's a flag in the fs type that says whether or not it's device backed | 16:48 |
sforshee | FS_REQUIRES_DEV | 16:48 |
cristian_c | jsalisbury: hi | 16:49 |
hallyn | so if (type->flags & FS_REQUIRES_DEV && user_ns != old->s_user_ns) ? | 16:49 |
lamont | jsalisbury: finally read what you set up for me. you have outdone expectations, thanks. | 16:49 |
sforshee | hallyn: yeah, something like that. Still thinking though. | 16:50 |
lamont | I'll smash through those sometime before the end of lunch today, expect an update in something like 2-4 hours. I'm assuming that our final kernel is top-of-branch, with the identified commit reverted? | 16:50 |
sforshee | hallyn: so essentially s_user_ns is used for 2 things. First is translating ids for the backing store, which doesn't apply to pseudo filesystems. | 16:53 |
sforshee | the second is for privileges towards the superblock. For cgroups/sysfs do we really want root in the userns to have privileges towards the superblock? | 16:54 |
sforshee | though if they aren't they can't remount | 16:55 |
hallyn | sforshee, well in the case of cgfs cgroup_mount() guards remount, but i'm not sure about proc and sysfs | 16:58 |
hallyn | i would assume so | 16:58 |
hallyn | else that would have been an issue already since we allow mount on those | 17:00 |
hallyn | sforshee: it seems we can assume it's safe if it's not dev-backed and it already has FS_USERNS_MOUNT | 17:01 |
sforshee | hallyn: assume what's safe? Skipping the check in sget_userns? | 17:02 |
hallyn | sforshee: yes | 17:03 |
sforshee | hallyn: I think at minimum it's probably okay as an interim fix while we decide what the best fix is | 17:05 |
xnox | bjf, what's the minimal amount of maas functionality required to run kernel-maas testing for s390x? | 17:10 |
xnox | as far as i can tell that enablement is currently stagnating, and i'm wondering if that can be expedited somehow. | 17:11 |
bjf | xnox, right now i only deal with bare-metal via maas | 17:13 |
xnox | bjf, and it needs to boot to ssh right? | 17:14 |
bjf | xnox, yes | 17:14 |
xnox | bjf, and you do use the powercycle functions in maas right? e.g. poweroff/on/reboot? | 17:15 |
bjf | xnox, yup | 17:15 |
xnox | bjf, ok, cool. | 17:15 |
manjo | apw, I had trouble installing from the PPA due to dependency issues .. posted it to the bug | 17:17 |
manjo | apw, https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1548120 | 17:17 |
ubot5 | Launchpad bug 1548120 in initramfs-tools (Ubuntu) "[xenial][initramfs-tools] support uppercase and lowercase uuids" [High,In progress] | 17:17 |
apw | manjo, those deps are right | 17:19 |
apw | manjo, they are internal deps making sure the bits are at the same version from the same source package | 17:19 |
manjo | apw, hmm I have xenial-proposed enabled .. | 17:21 |
manjo | apw, should I disable that 1st ? | 17:22 |
apw | manjo, nope shouldn't matter | 17:22 |
apw | manjo, did you add the ppa or download the .debs ? | 17:22 |
manjo | added your repo | 17:22 |
manjo | etc/apt/sources.list.d/apw-ubuntu-initramfs-tools-test-xenial.list | 17:23 |
manjo | using apt-add-repo | 17:23 |
apw | initramfs-tools-bin_0.122ubuntu5~rc1_amd64.deb (81.7 KiB) | 17:23 |
apw | initramfs-tools-core_0.122ubuntu5~rc1_all.deb (116.8 KiB) | 17:23 |
apw | initramfs-tools_0.122ubuntu5~rc1_all.deb (84.0 KiB) | 17:24 |
apw | well that PPA has all three of those in there | 17:24 |
apw | so your deps should be found from the PPA | 17:24 |
manjo | initramfs-tools: | 17:24 |
manjo | Installed: (none) | 17:24 |
manjo | Candidate: 0.122ubuntu5~rc1 | 17:24 |
manjo | Version table: | 17:24 |
manjo | 0.122ubuntu5~rc1 500 | 17:24 |
manjo | 500 http://ppa.launchpad.net/apw/initramfs-tools-test/ubuntu xenial/main arm64 Packages | 17:24 |
manjo | Candidate: 0.122ubuntu5~rc1 | 17:24 |
manjo | Version table: | 17:24 |
manjo | 0.122ubuntu5~rc1 500 | 17:24 |
manjo | 500 http://ppa.launchpad.net/apw/initramfs-tools-test/ubuntu xenial/main arm64 Packages | 17:24 |
manjo | apw, ah -bin comes from ports | 17:25 |
apw | OH its bloody arm64 | 17:25 |
manjo | Candidate: 0.122ubuntu4 | 17:25 |
manjo | Version table: | 17:25 |
manjo | *** 0.122ubuntu4 500 | 17:25 |
manjo | 500 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main arm64 Packages | 17:25 |
apw | hang on | 17:25 |
manjo | ok ☺ | 17:25 |
manjo | apw, welcome to my world | 17:25 |
apw | manjo, ok in about 10m there will be a ~rc2 in there built for arm64 as well | 17:29 |
manjo | cool | 17:30 |
manjo | will post results to the bug | 17:30 |
=== zequence_ is now known as zequence | ||
sforshee | hallyn: are you already preparing a patch for the cgfs issue or should I go ahead and make one? | 17:47 |
hallyn | sforshee: I'm looking at the code, but i'm not sure i'm doing it right | 17:48 |
hallyn | (adding an extra helper fn which tries to get the logic right of whether we need to check the user_ns) | 17:48 |
sforshee | sounds more complicated than what I was thinking ... | 17:49 |
hallyn | lemme finish this up and pastebin it and you can tell me i'm wrong :) | 17:49 |
sforshee | sounds good | 17:49 |
hallyn | sforshee: http://paste.ubuntu.com/15181923/ ? | 17:51 |
hallyn | seems to be building anyway | 17:54 |
sforshee | hallyn: I think that should work, but the FS_USERNS_MOUNT seems unnecessary | 17:55 |
hallyn | sforshee: so the idea of that is that any virtual fs which we cannot currently mount in a userns will be restricted to your own userns... | 17:56 |
hallyn | agreed not sure if it's needed, and it may be reckless... might break things... | 17:56 |
hallyn | but i wasn't certain that it would be safe otherwise | 17:56 |
hallyn | no, shouldn't break things - it just allows what you had designed to work the way you meant it to for those filesystems :) | 17:56 |
sforshee | but if you can't mount it in a user_ns then both namespaces are always init_user_ns anyway | 17:56 |
hallyn | sforshee: oh, you don't relax the need for FS_USERNS_MOUNT, right | 17:57 |
hallyn | so you're right, shouldn't be necessary so we don't need the helper really | 17:57 |
hallyn | sforshee: so i'll just build and test http://paste.ubuntu.com/15181976/ | 17:58 |
hallyn | if you can do a cleaner version of that then even better | 17:58 |
sforshee | hallyn: or even http://paste.ubuntu.com/15181975/ | 17:58 |
hallyn | right | 18:00 |
hallyn | do you mind sending that in then? :) | 18:00 |
sforshee | you want to test it first? | 18:00 |
hallyn | is building my version enough or do you want me to do your verbatim version? | 18:01 |
sforshee | hallyn: I'm probably going to build my version either way before I send it, so I can just give you that version to test | 18:03 |
sforshee | I hate to send patches without build testing, typos can be easy to overlook | 18:03 |
hallyn | builds taking forever | 18:10 |
kamal | HerrAmeise, apw, bjf: So far, I'm unable to reproduce Herr's issue. In a 15.10 amd64 qemu vm, I can successfully boot linux-image-4.2.0-{27,28,29,30}-generic with no apparent trouble. | 18:22 |
bjf | kamal, thanks for looking at that | 18:24 |
hallyn | sforshee... and still building. if you get a build, so long as you can 'lxc launch cxenial x1' and x1 boots (gets >3 processes and gets an ip address) then it's good | 18:35 |
sforshee | hallyn: mine's still building too | 18:37 |
lamont | jsalisbury: back to you. | 18:45 |
lamont | and thanks muchly for making it less painful | 18:45 |
jsalisbury | lamont, thanks, I'll take a look | 18:45 |
lamont | it's also good to have my gut confirmed. :D | 18:46 |
jsalisbury | lamont, I'll build a test kernel with that commit reverted, if it fixes the bug, I'll get in touch with the patch author | 18:47 |
lamont | ack. scream when | 18:49 |
sforshee | hallyn: http://people.canonical.com/~sforshee/cgns/, updating my vm to test it now | 18:52 |
lamont | jsalisbury: fwiw, http://global.shuttle.com/products/productsDetail?productId=1480 | 18:53 |
hallyn | sforshee: success! | 18:56 |
hallyn | i think | 18:57 |
hallyn | yeah. dunno why there is a 'lxc' cgroup there, but ... | 18:57 |
sforshee | hallyn: seems to work for me too | 19:00 |
sforshee | I'll send it | 19:00 |
sforshee | apw, hallyn: patch sent | 19:06 |
tjaalton | umm yeah disregard the pull request, I'll send a new one | 19:08 |
hallyn | sforshee: awesome, thx | 19:11 |
cristian_c | jsalisbury: hi, again | 19:30 |
=== philipballew is now known as Guest44851 | ||
=== RAOF_ is now known as RAOF |
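
At 00:44-00:52 manjo found that `tr '[:upper:]' '[:lower:]'` left the UUID unchanged in the initramfs shell while `awk` lowercased it, and at 11:22 apw uploaded an alternative fix to avoid adding an awk dependency; that fix is not shown in the log. Below is a minimal sketch of one awk-free approach, assuming (not verified here) that the initramfs `tr` accepts literal character lists even though it ignored the `[:upper:]` class. Since a UUID contains only hex digits and dashes, mapping the six letters `A-F` is enough. The function name `uuid_tolower` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not the fix apw uploaded.
# Assumption: the initramfs tr handles literal sets even if it lacks
# character classes such as [:upper:]. UUIDs are hex digits plus dashes,
# so translating the uppercase hex letters is sufficient.
uuid_tolower() {
    printf '%s\n' "$1" | tr 'ABCDEF' 'abcdef'
}

uuid_tolower "81780AD9-068C-4A80-A795-56856973B8F9"
# prints 81780ad9-068c-4a80-a795-56856973b8f9
```

Already-lowercase UUIDs pass through unchanged, so the helper can be applied to both spellings before comparison.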
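
apw's first debugging step for HerrAmeise (15:29) is a linear search over the ABI numbers between the last known-good kernel (-27) and the failing one (-30): boot each intermediate kernel and note the first that fails. The hypothetical helper below only encodes that reasoning. It assumes a regression, once introduced, persists in every later ABI, and the results in the usage line are invented for illustration, not taken from the bug.

```shell
#!/bin/sh
# first_bad: print the first ABI whose boot test was marked "bad".
# Arguments are ABI:result pairs in ascending ABI order (result: good|bad).
# Returns 1 if every kernel tested good.
first_bad() {
    for pair in "$@"; do
        abi=${pair%%:*}
        result=${pair#*:}
        if [ "$result" = bad ]; then
            echo "$abi"
            return 0
        fi
    done
    return 1
}

# Hypothetical results, not from the bug report:
first_bad 27:good 28:good 29:bad 30:bad   # prints 29
```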
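
sforshee's remark at 16:15 refers to the FUSE control filesystem (fusectl), conventionally mounted at `/sys/fs/fuse/connections`. Per the kernel's FUSE documentation, each connection gets a numbered directory whose `waiting` file counts requests queued for, or being processed by, the userspace daemon, and writing anything to its `abort` file aborts the connection so that waiting and new requests return an error. The documentation also notes that a non-zero `waiting` with no filesystem activity indicates a hung filesystem, which matches the stack trace hallyn pasted at 15:59. A minimal sketch, assuming fusectl is mounted and you are root; the function names and the `FUSECTL` override are hypothetical.

```shell
#!/bin/sh
# Root of the fusectl mount; overridable for testing (hypothetical variable).
FUSECTL=${FUSECTL:-/sys/fs/fuse/connections}

# Print "<connection> <waiting>" for every connection with pending requests.
list_waiting_fuse() {
    for conn in "$FUSECTL"/*; do
        [ -r "$conn/waiting" ] || continue
        n=$(cat "$conn/waiting")
        if [ "$n" -gt 0 ]; then
            echo "${conn##*/} $n"
        fi
    done
}

# Abort one connection by number (requires root on a real system).
# Waiting and future requests on that connection then fail with an error.
abort_fuse_conn() {
    echo 1 > "$FUSECTL/$1/abort"
}
```

Aborting tears down that filesystem's connection entirely (for example whatever lxcfs instance is behind it), so it is a last resort for unsticking blocked tasks rather than a fix.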
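
The condition hallyn proposes at 16:49, `if (type->flags & FS_REQUIRES_DEV && user_ns != old->s_user_ns)`, keys the superblock user-namespace check off `FS_REQUIRES_DEV`. That flag is visible from userspace: `/proc/filesystems` prints `nodev` in the first column for every registered type that lacks it. The sketch below, assuming that standard tab-separated two-column format, splits the registered types into device-backed ones (which would keep the check under that condition) and pseudo filesystems such as cgroup, sysfs, proc and devpts (which would skip it). It only inspects the flag; it does not model `sget_userns`, and the function names are hypothetical.

```shell
#!/bin/sh
# /proc/filesystems lines look like "nodev<TAB>sysfs" or "<TAB>ext4";
# an empty first field means FS_REQUIRES_DEV is set for that type.
dev_backed_fs() {
    awk -F'\t' '$1 != "nodev" { print $2 }' "${1:-/proc/filesystems}"
}

pseudo_fs() {
    awk -F'\t' '$1 == "nodev" { print $2 }' "${1:-/proc/filesystems}"
}
```

Both functions read `/proc/filesystems` by default and accept an alternative file path as their only argument.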
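
The 17:23-17:25 exchange shows why manjo's install failed. The PPA listing contains `initramfs-tools_0.122ubuntu5~rc1_all.deb` and `initramfs-tools-core_0.122ubuntu5~rc1_all.deb`, which install on any architecture, but `initramfs-tools-bin` was only built for amd64. On arm64 its candidate therefore fell back to `0.122ubuntu4` from ports, which breaks the same-version internal dependencies apw described at 17:19. A minimal sketch for spotting such a mismatch: it extracts the `Candidate:` version from `apt-cache policy` output on stdin. The helper name is hypothetical.

```shell
#!/bin/sh
# Read `apt-cache policy <pkg>` output on stdin, print the Candidate version.
policy_candidate() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Hypothetical usage: all three packages should report the same version.
# for p in initramfs-tools initramfs-tools-core initramfs-tools-bin; do
#     printf '%s %s\n' "$p" "$(apt-cache policy "$p" | policy_candidate)"
# done
```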