=== DarkPlayer_ is now known as DarkPlayer | ||
ppisati | moin | 05:54 |
---|---|---|
=== smb` is now known as smb | ||
smb | morning | 07:37 |
=== fmasi_afk is now known as fmasi | ||
* apw yawns | 08:18 | |
* ppisati wonders about all those abandoned pastebin entries... | 08:43 | |
apw | ppisati, i think they just go poof after a while | 09:11 |
ppisati | apw: i was wondering if one could retrieve all the pastebins that he created, i just had to recreate one because i forgot the #id | 09:12 |
apw | ppisati, heh interesting question | 09:12 |
=== ghostcube_ is now known as ghostcube | ||
caribou | apw: my question is about NUMA memory policies | 11:45 |
caribou | I'm trying to understand how the task.mempolicy structure gets defined & used. | 11:45 |
caribou | I'm on a kernel panic where task.mempolicy.mode = 0x0, which is the default value but which should never happen | 11:46 |
caribou | since the case statement for MPOL_DEFAULT is BUG() | 11:46 |
caribou | apw: fyi it's in mm/mempolicy.c, in slab_node() | 11:47 |
apw | caribou, so yes i read it the same, that if policy pointer is set then policy mode would not be 0, just checking now if they are cheating anywhere | 11:53 |
apw | if (!mpol || mpol->mode == MPOL_DEFAULT) | 11:54 |
apw | return; /* show nothing */ | 11:54 |
caribou | apw: in the crash I'm looking at, I got > 6000 task struct with mempolicy.mode = 0x3 & only one mongodb task that has it at 0x0 | 11:54 |
apw | i note that some of the shmem code implies that it might be possible regardless | 11:54 |
apw | and where is it crashing, it is in that bug in slab_node() ? | 11:54 |
caribou | apw: yep. I found one other report in some ML about a similar panic in 2.6.3x kernels | 11:55 |
caribou | apw: I'm trying to figure out if this mempolicy.mode can be set by some userland syscall | 11:55 |
=== fmasi is now known as fmasi_lunch | ||
apw | it is not clear that just returning numa_node_id() would be a heck of a lot safer than BUG()'ing | 11:57 |
=== fmasi_lunch is now known as fmasi_afk | ||
apw | :q | 11:57 |
caribou | apw: ok, so I'm not looking in the wrong direction; that's what I wanted to confirm | 11:58 |
caribou | apw: thanks for looking it up | 11:58 |
caribou | apw: hmm, this could be some kind of use-after-free as the refcnt is 0x0 | 12:00 |
apw | caribou, that is believable indeed, this is changed from userspace, mbind() seems to make it happen | 12:03 |
apw | caribou, but that calls mpol_new to make the policy, and there the policy == NULL if you say MPOL_DEFAULT | 12:03 |
apw | caribou, so i would concur you are most likely seeing a race on free or similar | 12:04 |
diwic | apw, hi, is there a config-3.11.*-generic somewhere on gomeisa? | 12:04 |
diwic | apw, I've just realized how stupid it is to build a kernel on your laptop with less than 10 GB free. | 12:05 |
apw | diwic, hmmm dunno, you could just make one with fdr genconfigs in the tree | 12:05 |
caribou | apw: ok, I'll investigate this | 12:05 |
apw | diwic, are you building something which isn't our tree and already has the config in it ? | 12:06 |
diwic | apw, I was thinking of bisecting v3.10 .. v3.11-rc1 | 12:06 |
diwic | apw, so then I would start with an upstream tree which might not have any debian/rules ? | 12:07 |
caribou | apw: and if MPOL_DEFAULT makes policy == NULL, then slab_node will return numa_node_id() | 12:07 |
apw | diwic, then just fdr genconfigs (locally in your tree) and look in CONFIGS/* for the one you need to copy over | 12:07 |
apw | caribou, yeah if you have a reference count of 0 it is all bad | 12:07 |
apw | caribou, you should check the policy pointer is in the policy slab while you are at it | 12:08 |
apw | though i suspect it will be | 12:08 |
caribou | apw: ok, will do. Thanks | 12:09 |
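
The reasoning above about slab_node() can be modelled in a few lines of userspace C. This is only a sketch of the logic as described in the chat, not the kernel source: the names model_slab_node, model_policy, MODEL_MPOL_OTHER and MODEL_BUG are made up for illustration, the value 3 is simply the mode caribou reports for the healthy tasks, and the kernel's BUG() is replaced by a sentinel return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the policy modes mentioned in the chat:
 * 0x0 is the default mode, 0x3 is what the healthy tasks carried. */
enum model_mode { MODEL_MPOL_DEFAULT = 0, MODEL_MPOL_OTHER = 3 };

/* Hypothetical, heavily reduced policy object. */
struct model_policy {
    int mode;
    int node;   /* node this policy would pick (illustrative only) */
};

/* Sentinel standing in for the kernel hitting BUG(). */
#define MODEL_BUG (-1)

/* Model of the decision discussed above: no policy means "use the local
 * node"; a policy object whose mode is the default is treated as a bug,
 * because a live policy should never be in that state. */
static int model_slab_node(const struct model_policy *pol, int local_node)
{
    if (pol == NULL)
        return local_node;          /* the numa_node_id() fallback */

    switch (pol->mode) {
    case MODEL_MPOL_OTHER:
        return pol->node;
    case MODEL_MPOL_DEFAULT:
    default:
        return MODEL_BUG;           /* where the panic is reported */
    }
}
```

In this model the only way to reach the MODEL_BUG branch is a non-NULL pointer to an object whose mode reads as 0, which matches the suspicion that the pointer refers to a policy that has already been released.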
diwic | apw, hmm, or I can just scp one over from my saucy laptop | 12:21 |
apw | diwic, yep they are in /boot obviously | 12:22 |
diwic | yeah | 12:22 |
=== fmasi_afk is now known as fmasi | ||
=== ghostcube_ is now known as ghostcube | ||
caribou | apw: do you have a couple of minutes to talk a bit more about my mempolicy issue ? | 14:53 |
caribou | or anyone else who has followed the previous discussion | 14:54 |
apw | caribou, sure | 14:54 |
caribou | apw: here is the full backtrace : http://paste.ubuntu.com/6154796/ | 14:55 |
caribou | apw: looking at the bottom of the backtrace shows that the process is exiting & has already started to destroy the mempolicy slab (in __mpol_put) | 14:55 |
caribou | apw: then it is interrupted by an IRQ coming from the network (net_rx_action) | 14:56 |
caribou | apw: I just want to make sure I get the context correctly : | 14:56 |
apw | caribou, yes it seems that is correct, though somehow that interrupt context | 14:58 |
caribou | apw: it is while handling the IRQ that it does the __slab_alloc which uses a kmem from the numa_policy slab, the same one that he had started to hand over | 14:58 |
apw | has a reference to the mpolicy, but the reference count should have been higher if it does | 14:58 |
caribou | apw: __mpol_put had already decreased the refcnt a few cycles prior to getting the IRQ apparently | 14:59 |
caribou | __mpol_put does it just before calling kmem_cache_free | 15:00 |
caribou | apw: I'm trying to get more recent kernels tested (3.8ish) this is on 3.2.0-38 | 15:01 |
apw | that implies the reference count is lower than it should be, that someone freed it and didn't clear a pointer or similar though | 15:02 |
caribou | apw: I'm just surprised that the slab allocation is handing over the same numa_policy structure so fast | 15:02 |
apw | it will hand out the last one deallocated as it is cache hot | 15:02 |
caribou | apw: well, it's still being freed at the bottom of the backtrace, prior to the IRQ | 15:03 |
caribou | apw: thought so | 15:03 |
apw | caribou, not necessarily we are still in the free routine, but if we are giving up the CPU we are most likely past freeing it | 15:04 |
apw | at the point where we would return or similar | 15:04 |
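
apw's remark that the allocator hands out the last object freed can be illustrated with a tiny last-in, first-out free list. This is a toy model under stated assumptions, not the kernel's slab implementation: model_cache, model_cache_free and model_cache_alloc are hypothetical names, and the fixed-size stack is chosen only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 4

/* Hypothetical toy cache: freed objects are pushed on a small stack. */
struct model_cache {
    void *free_stack[MODEL_SLOTS];
    int top;                        /* number of objects on the stack */
};

/* Return an object to the cache; it becomes the next one handed out. */
static void model_cache_free(struct model_cache *c, void *obj)
{
    if (c->top < MODEL_SLOTS)
        c->free_stack[c->top++] = obj;
}

/* Hand out the most recently freed object, or NULL if none is cached. */
static void *model_cache_alloc(struct model_cache *c)
{
    return c->top > 0 ? c->free_stack[--c->top] : NULL;
}
```

With this ordering, an allocation that follows a free immediately gets the same memory back, which is why a stale pointer to a just-freed object can see its contents change almost at once.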
caribou | apw: so if I understand it correctly, the task is ramping down, releasing its numa_policy slab, then gets hit by the IRQ that goes on allocating a slab that turns out to be the same one | 15:05 |
caribou | ?!? | 15:05 |
apw | so this is allocating out of a different slab as well | 15:05 |
apw | so does this not mean that this process is clearing its numa policy | 15:05 |
apw | we take an interrupt, | 15:05 |
apw | and allocate something from the slab, but using the 'current' processes numa policy | 15:05 |
apw | so that might mean we have not done the clean up in a good order | 15:06 |
apw | as if we are freeing it we should be no longer using it | 15:06 |
caribou | apw: from what I see, the cleanup had not been completed when the IRQ got it | 15:06 |
apw | mpol_put(tsk->mempolicy); | 15:06 |
caribou | s/it/in | 15:06 |
apw | tsk->mempolicy = NULL; | 15:06 |
ppisati | brb | 15:07 |
apw | right but .. we are exiting, so we throw away our policy to the allocator, _then_ remove it | 15:07 |
apw | that seems backwards | 15:07 |
apw | i would expect to see something like | 15:07 |
apw | tmp = tsk->mempolicy | 15:07 |
apw | tsk->mempolicy = NULL | 15:08 |
apw | mpol_put(tmp) | 15:08 |
apw | now ... i think that the writer of the code believes we can not do that because we have 'task_lock(tsk)' but this | 15:08 |
apw | code path seems to imply not | 15:08 |
apw | if this is something you could reproduce then i would recommend trying that tmp thing | 15:08 |
caribou | apw: ok. I'm trying to see if we can find a way to reproduce | 15:09 |
apw | caribou, shall i spin you a patch to try ? or you got it | 15:11 |
caribou | apw: let me try to reproduce it first | 15:12 |
apw | caribou, i'll put a quick patch together, its pretty simple | 15:12 |
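
The two orderings discussed above can be compared in a small userspace model. This is a sketch under stated assumptions, not the patch apw offers to write: all model_* and exit_* names are hypothetical, the "interrupt" is simulated by an explicit check at the point where one could land, and locking and memory-ordering concerns that a real kernel change would have to address are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced policy object: a refcount and a mode. */
struct model_policy {
    int refcnt;
    int mode;       /* 0 stands for the default mode, as in the chat */
};

/* Hypothetical, reduced task: only the policy pointer matters here. */
struct model_task {
    struct model_policy *mempolicy;
};

/* Drop one reference; on the last put, scrub the object to mimic the
 * memory being handed back to the allocator. */
static void model_mpol_put(struct model_policy *pol)
{
    if (pol && --pol->refcnt == 0)
        pol->mode = 0;
}

/* Simulated interrupt: would an allocation made on behalf of this task
 * pick up a policy that has already been released? */
static int model_irq_sees_dead_policy(const struct model_task *tsk)
{
    const struct model_policy *pol = tsk->mempolicy;
    return pol != NULL && pol->refcnt == 0;
}

/* Ordering quoted in the chat: put first, clear the pointer afterwards. */
static int exit_put_then_clear(struct model_task *tsk)
{
    int bad;

    model_mpol_put(tsk->mempolicy);
    bad = model_irq_sees_dead_policy(tsk);  /* window: pointer still set */
    tsk->mempolicy = NULL;
    return bad;
}

/* Ordering apw suggests: detach the pointer, then put the saved copy. */
static int exit_clear_then_put(struct model_task *tsk)
{
    struct model_policy *tmp = tsk->mempolicy;
    int bad;

    tsk->mempolicy = NULL;
    model_mpol_put(tmp);
    bad = model_irq_sees_dead_policy(tsk);  /* pointer already NULL */
    return bad;
}
```

In the model, exit_put_then_clear() leaves a window in which the task still points at a released policy, while exit_clear_then_put() does not; whether the real crash follows exactly this path is what reproducing it would have to confirm.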
rtg | apw, at least the 3.12 keyboard problem isn't systemic. it appears to work on an AMD gizmo | 15:20 |
apw | rtg, good to know | 15:22 |
apw | i'll try it out here in a bit | 15:22 |
rtg | apw, gonna fire it up on a gigabyte MB soon | 15:22 |
rtg | as soon as my USB stick flashes.... | 15:23 |
apw | rtg heh | 15:24 |
=== psivaa is now known as psivaa-afk-bbl | ||
=== psivaa-afk-bbl is now known as psivaa | ||
rtg | apw, well, so far everything seems to work on 3.12-rc2. at least all of the mainline bits. overlay and aufs are still disabled. | 17:57 |
rtg | hmm, should check audio | 17:58 |
apw | rtg, ok will have a poke at the overlay etc tomorrow | 18:01 |
apw | when i am awake | 18:01 |
rtg | apw, oh yeah, what are you doing around ? get lost. go have a beer. | 18:02 |
* rtg -> lunch | 19:31 | |
=== fmasi is now known as fmasi_afk | ||
* rtg -> EOD | 20:55 |