=== DarkPlayer_ is now known as DarkPlayer
[05:54] moin
=== smb` is now known as smb
[07:37] morning
=== fmasi_afk is now known as fmasi
[08:18] * apw yawns
[08:43] * ppisati wonders about all those abandoned pastebin entries...
[09:11] ppisati, i think they just go poof after a while
[09:12] apw: i was wondering if one could retrieve all the pastebins that he created, i just had to recreate one because i forgot the #id
[09:12] ppisati, heh interesting question
=== ghostcube_ is now known as ghostcube
[11:45] apw: my question is about NUMA memory policies
[11:45] I'm trying to understand how the task.mempolicy structure gets defined & used.
[11:46] I'm on a kernel panic where task.mempolicy.mode = 0x0, which is the default value but which should never happen
[11:46] since the case statement for MPOL_DEFAULT is BUG()
[11:47] apw: fyi it's in mm/mempolicy.c in slab_node()
[11:53] caribou, so yes i read it the same, that if policy pointer is set then policy mode would not be 0, just checking now if they are cheating anywhere
[11:54] if (!mpol || mpol->mode == MPOL_DEFAULT)
[11:54] return; /* show nothing */
[11:54] apw: in the crash I'm looking at, I got > 6000 task structs with mempolicy.mode = 0x3 & only one mongodb task that has it at 0x0
[11:54] i note that some of the shmem code implies that it might be possible regardless
[11:54] and where is it crashing, is it in that bug in slab_node() ?
[11:55] apw: yep. I found one other report in some ML about a similar panic in 2.6.3x kernels
[11:55] apw: I'm trying to figure out if this mempolicy.mode can be set by some userland syscall
=== fmasi is now known as fmasi_lunch
[11:57] it is not clear that just returning numa_node_id() would be a heck of a lot safer than BUG()'ing
=== fmasi_lunch is now known as fmasi_afk
[11:57] :q
[11:58] apw: ok, so I'm not looking in the wrong direction; that's what I wanted to confirm
[11:58] apw: thanks for looking it up
[12:00] apw: hmm, this could be some kind of use-after-free as the refcnt is 0x0
[12:03] caribou, that is believable indeed, this is changed from userspace, mbind() seems to make it happen
[12:03] caribou, but that calls mpol_new to make the policy, and there the policy == NULL if you say MPOL_DEFAULT
[12:04] caribou, so i would concur you are most likely seeing a race on free or similar
[12:04] apw, hi, is there a config-3.11.*-generic somewhere on gomeisa?
[12:05] apw, I've just realized how stupid it is to build a kernel on your laptop with less than 10 GB free.
[12:05] diwic, hmmm dunno, you could just make one with fdr genconfigs in the tree
[12:05] apw: ok, I'll investigate this
[12:06] diwic, are you building something which isn't our tree and already has the config in it ?
[12:06] apw, I was thinking of bisecting v3.10 .. v3.11-rc1
[12:07] apw, so then I would start with an upstream tree which might not have any debian/rules ?
[12:07] apw: and if MPOL_DEFAULT makes policy == NULL, then slab_node will return numa_node_id()
[12:07] diwic, then just fdr genconfigs (locally in your tree) and look in CONFIGS/* for the one you need to copy over
[12:07] caribou, yeah if you have a reference count of 0 it is all bad
[12:08] caribou, you should check the policy pointer is in the policy slab while you are at it
[12:08] though i suspect it will be
[12:09] apw: ok, will do. Thanks
[12:21] apw, hmm, or I can just scp one over from my saucy laptop
[12:22] diwic, yep they are in /boot obviously
[12:22] yeah
=== fmasi_afk is now known as fmasi
=== ghostcube_ is now known as ghostcube
[14:53] apw: do you have a couple of minutes to talk a bit more about my mempolicy issue ?
[14:54] or anyone else who has followed the previous discussion
[14:54] caribou, sure
[14:55] apw: here is the full backtrace : http://paste.ubuntu.com/6154796/
[14:55] apw: looking at the bottom of the backtrace shows that the process is exiting & has already started to destroy the mempolicy slab (in __mpol_put)
[14:56] apw: then it is interrupted by an IRQ coming from the network (net_rx_action)
[14:56] apw: I just want to make sure I get the context correctly :
[14:58] caribou, yes it seems that is correct, though somehow that interrupt context
[14:58] apw: it is while handling the IRQ that it does the __slab_alloc which uses a kmem from the numa_policy slab, the same one that he had started to hand over
[14:58] has a reference to the mpolicy, but the reference count should have been higher if it does
[14:59] apw: __mpol_put had already decreased the refcnt a few cycles prior to getting the IRQ apparently
[15:00] __mpol_put does it just before calling kmem_cache_free
[15:01] apw: I'm trying to get more recent kernels tested (3.8ish) this is on 3.2.0-38
[15:02] that implies the reference count is lower than it should be, that someone freed it and didn't clear a pointer or similar though
[15:02] apw: I'm just surprised that the slab allocation is handing over the same numa_policy structure so fast
[15:02] it will hand out the last one deallocated as it is cache hot
[15:03] apw: well, it's still being freed at the bottom of the backtrace, prior to the IRQ
[15:03] apw: thought so
[15:04] caribou, not necessarily we are still in the free routine, but if we are giving up the CPU we are most likely past freeing it
[15:04] at the point where we would return or similar
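The remark at [15:02] that the allocator "will hand out the last one deallocated as it is cache hot" can be pictured with a toy LIFO free list. This is only a userspace sketch, not the kernel's slab code; `toy_cache`, `toy_obj`, `toy_alloc`, `toy_free` and `toy_last_freed_is_reused` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object cache: NOT the kernel slab allocator.
 * Free objects are kept on a singly linked LIFO list. */
struct toy_obj {
    struct toy_obj *next_free;  /* link used only while the object is free */
    int payload;
};

struct toy_cache {
    struct toy_obj *free_head;  /* most recently freed object */
};

/* Push an object onto the head of the free list. */
static void toy_free(struct toy_cache *c, struct toy_obj *o)
{
    o->next_free = c->free_head;
    c->free_head = o;
}

/* Pop the head of the free list, or return NULL when it is empty. */
static struct toy_obj *toy_alloc(struct toy_cache *c)
{
    struct toy_obj *o = c->free_head;

    if (o)
        c->free_head = o->next_free;
    return o;
}

/* Returns 1 if the object freed last is the one handed out next. */
int toy_last_freed_is_reused(void)
{
    struct toy_obj a = { NULL, 0 }, b = { NULL, 0 };
    struct toy_cache c = { NULL };

    toy_free(&c, &a);
    toy_free(&c, &b);               /* b is freed last */
    return toy_alloc(&c) == &b;     /* so b comes back first */
}
```

Calling `toy_last_freed_is_reused()` from a small `main()` returns 1 in this model, which matches the intuition that a just-freed object is the likeliest one to be handed out again.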
[15:05] apw: so if I understand it correctly, the task is ramping down, releasing its numa_policy slab, then gets hit by the IRQ that goes on allocating a slab that turns out to be the same one
[15:05] ?!?
[15:05] so this is allocating out of a different slab as well
[15:05] so does this not mean that this process is clearing its numa policy
[15:05] we take an interrupt, a
[15:05] and allocate something from the slab, but using the 'current' process's numa policy
[15:06] so that might mean we have not done the clean up in a good order
[15:06] as if we are freeing it we should be no longer using it
[15:06] apw: from what I see, the cleanup had not been completed when the IRQ got it
[15:06] mpol_put(tsk->mempolicy);
[15:06] s/it/in
[15:06] tsk->mempolicy = NULL;
[15:07] brb
[15:07] right but .. we are exiting, so we throw away our policy to the allocator, _then_ remove it
[15:07] that seems backwards
[15:07] i would expect to see thing
[15:07] tmp = tsk->mempolicy
[15:08] tsk->mempolicy = NULL
[15:08] mpol_put(tmp)
[15:08] now ... i think that the writer of the code believes we can not do that because we have 'task_lock(tsk)' but this
[15:08] code path seems to imply not
[15:08] if this is something you could reproduce then i would recommend trying that tmp thing
[15:09] apw: ok. I'm trying to see if we can find a way to reproduce
[15:11] caribou, shall i spin you a patch to try ? or you got it
[15:12] apw: let me try to reproduce it first
[15:12] caribou, i'll put a quick patch together, it's pretty simple
[15:20] apw, at least the 3.12 keyboard problem isn't systemic. it appears to work on an AMD gizmo
[15:22] rtg, good to know
[15:22] i'll try it out here in a bit
[15:22] apw, gonna fire it up on a gigabyte MB soon
[15:23] as soon as my USB stick flashes....
[15:24] rtg heh
=== psivaa is now known as psivaa-afk-bbl
=== psivaa-afk-bbl is now known as psivaa
[17:57] apw, well, so far everything seems to work on 3.12-rc2. at least all of the mainline bits. overlay and aufs are still disabled.
[17:58] hmm, should check audio
[18:01] rtg, ok will have a poke at the overlay etc tomorrow
[18:01] when i am awake
[18:02] apw, oh yeah, what are you doing around ? get lost. go have a beer.
[19:31] * rtg -> lunch
=== fmasi is now known as fmasi_afk
[20:55] * rtg -> EOD
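
The ordering question raised between [15:06] and [15:08] (put the policy and then clear the pointer, versus clear the pointer and then put) can be modelled in a few lines of userspace C. This is a single-threaded sketch under loud assumptions: `toy_policy`, `toy_task`, `toy_put` and `toy_irq` are hypothetical stand-ins, the "interrupt" is just a function call placed right after the free, and none of this is the kernel code or the patch mentioned at [15:12].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mempolicy and task_struct. */
struct toy_policy {
    int refcnt;
    int freed;      /* set instead of really freeing, so it can be inspected */
};

struct toy_task {
    struct toy_policy *policy;
};

/* Records whether the simulated interrupt saw a freed policy. */
static int irq_saw_freed_policy;

/* Simulated interrupt handler that consults the current task's policy. */
static void toy_irq(struct toy_task *t)
{
    if (t->policy && t->policy->freed)
        irq_saw_freed_policy = 1;
}

/* Drop one reference; the "interrupt" fires right after the free. */
static void toy_put(struct toy_task *t, struct toy_policy *p)
{
    if (--p->refcnt == 0)
        p->freed = 1;
    toy_irq(t);
}

/* Order quoted at [15:06]: put first, clear the pointer afterwards. */
int toy_exit_put_then_clear(void)
{
    struct toy_policy p = { 1, 0 };
    struct toy_task t = { &p };

    irq_saw_freed_policy = 0;
    toy_put(&t, t.policy);
    t.policy = NULL;
    return irq_saw_freed_policy;
}

/* Order suggested at [15:07]-[15:08]: clear the pointer, then put. */
int toy_exit_clear_then_put(void)
{
    struct toy_policy p = { 1, 0 };
    struct toy_task t = { &p };
    struct toy_policy *tmp = t.policy;

    irq_saw_freed_policy = 0;
    t.policy = NULL;
    toy_put(&t, tmp);
    return irq_saw_freed_policy;
}
```

In this model `toy_exit_put_then_clear()` returns 1 (the simulated interrupt reaches a freed policy through the task pointer) and `toy_exit_clear_then_put()` returns 0. A real fix would also have to consider locking and memory ordering, which the sketch ignores.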