[05:54] <ppisati> moin
[07:37] <smb> morning
[08:18]  * apw yawns
[08:43]  * ppisati wonders about all those abandoned pastebin entries...
[09:11] <apw> ppisati, i think they just go poof after a while
[09:12] <ppisati> apw: i was wondering if one could retrieve all the pastebins that he created, i just had to recreate one because i forgot the #id
[09:12] <apw> ppisati, heh interesting question
[11:45] <caribou> apw: my question is about NUMA memory policies
[11:45] <caribou> I'm trying to understand how the task.mempolicy structure gets defined & used.
[11:46] <caribou> I'm looking at a kernel panic where task.mempolicy.mode = 0x0, which is the default value but which should never happen
[11:46] <caribou> since the case statement for MPOL_DEFAULT is BUG()
[11:47] <caribou> apw: fyi it's in mm/mempolicy.c, in slab_node()
[11:53] <apw> caribou, so yes i read it the same, that if policy pointer is set then policy mode would not be 0, just checking now if they are cheating anywhere
[11:54] <apw>         if (!mpol || mpol->mode == MPOL_DEFAULT)
[11:54] <apw>                 return;         /* show nothing */
[11:54] <caribou> apw: in the crash I'm looking at, I have > 6000 task structs with mempolicy.mode = 0x3 & only one mongodb task that has it at 0x0
[11:54] <apw> i note that some of the shmem code implies that it might be possible regardless
[11:54] <apw> and where is it crashing, it is in that bug in slab_node() ?
[11:55] <caribou> apw: yep. I found one other report in some ML about a similar panic in 2.6.3x kernels
[11:55] <caribou> apw: I'm trying to figure out if this mempolicy.mode can be set by some userland syscall
[11:57] <apw> it is not clear that just returning numa_node_id() would be a heck of a lot safer than BUG()'ing
[11:58] <caribou> apw: ok, so I'm not looking in the wrong direction; that's what I wanted to confirm
[11:58] <caribou> apw: thanks for looking it up
[12:00] <caribou> apw: hmm, this could be some kind of use-after-free as the refcnt is 0x0
[12:03] <apw> caribou, that is believable indeed, this is changed from userspace, mbind() seems to make it happen
[12:03] <apw> caribou, but that calls mpol_new to make the policy, and there the policy == NULL if you say MPOL_DEFAULT
[12:04] <apw> caribou, so i would concur you are most likely seeing a race on free or similar
[12:04] <diwic> apw, hi, is there a config-3.11.*-generic somewhere on gomeisa?
[12:05] <diwic> apw, I've just realized how stupid it is to build a kernel on your laptop with less than 10 GB free.
[12:05] <apw> diwic, hmmm dunno, you could just make one with fdr genconfigs in the tree
[12:05] <caribou> apw: ok, I'll investigate this
[12:06] <apw> diwic, are you building something which isn't our tree and already has the config in it ?
[12:06] <diwic> apw, I was thinking of bisecting v3.10 .. v3.11-rc1
[12:07] <diwic> apw, so then I would start with an upstream tree which might not have any debian/rules ?
[12:07] <caribou> apw: and if MPOL_DEFAULT makes policy == NULL, then slab_node will return numa_node_id()
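The node-selection logic caribou and apw are describing can be modelled in userland C. This is only a sketch under stated assumptions, not the kernel's code: every `model_` name is hypothetical, `MODEL_BUG` stands in for `BUG()`, and node selection for the valid modes is collapsed to a single stored node. It shows why a NULL policy is harmless while a non-NULL policy whose mode reads 0x0 falls through to the `BUG()` path.

```c
#include <assert.h>
#include <stddef.h>

/* Mode values as quoted in the discussion: 0x0 is the default, 0x3 the
 * mode seen on the healthy tasks. All names here are hypothetical. */
enum model_mode {
    MODEL_DEFAULT = 0,
    MODEL_PREFERRED = 1,
    MODEL_BIND = 2,
    MODEL_INTERLEAVE = 3
};

struct model_policy {
    int refcnt;
    enum model_mode mode;
    int node;               /* simplification: one node for every valid mode */
};

#define MODEL_BUG (-1)      /* stand-in for BUG(); the kernel would panic */

/* Hypothetical stand-in for slab_node(): choose a node for an allocation. */
static int model_slab_node(const struct model_policy *pol, int local_node)
{
    assert(local_node >= 0);

    if (pol == NULL)
        return local_node;  /* no policy at all: use the local node */

    switch (pol->mode) {
    case MODEL_PREFERRED:
    case MODEL_BIND:
    case MODEL_INTERLEAVE:
        return pol->node;   /* real selection logic omitted */
    default:
        return MODEL_BUG;   /* a live policy should never carry the default mode */
    }
}
```

In this model a freed policy whose fields now read zero is indistinguishable from one created with the default mode, which matches the use-after-free theory.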
[12:07] <apw> diwic, then just fdr genconfigs (locally in your tree) and look in CONFIGS/* for the one you need to copy over
[12:07] <apw> caribou, yeah if you have a reference count of 0 it is all bad
[12:08] <apw> caribou, you should check the policy pointer is in the policy slab while you are at it
[12:08] <apw> though i suspect it will be
[12:09] <caribou> apw: ok, will do. Thanks
[12:21] <diwic> apw, hmm, or I can just scp one over from my saucy laptop
[12:22] <apw> diwic, yep they are in /boot obviously
[12:22] <diwic> yeah
[14:53] <caribou> apw: do you have a couple of minutes to talk a bit more about my mempolicy issue ?
[14:54] <caribou> or anyone else who has followed the previous discussion
[14:54] <apw> caribou, sure
[14:55] <caribou> apw: here is the full backtrace : http://paste.ubuntu.com/6154796/
[14:55] <caribou> apw: looking at the bottom of the backtrace shows that the process is exiting & has already started to destroy the mempolicy slab (in __mpol_put)
[14:56] <caribou> apw: then it is interrupted by an IRQ coming from the network (net_rx_action)
[14:56] <caribou> apw: I just want to make sure I get the context correctly :
[14:58] <apw> caribou, yes it seems that is correct, though somehow that interrupt context
[14:58] <caribou> apw: it is while handling the IRQ that it does the __slab_alloc which uses a kmem from the numa_policy slab, the same one that it had started to hand back
[14:58] <apw> has a reference to the mpolicy, but the reference count should have been higher if it does
[14:59] <caribou> apw: __mpol_put had already decreased the refcnt a few cycles prior to getting the IRQ apparently
[15:00] <caribou> __mpol_put does it just before calling kmem_cache_free
[15:01] <caribou> apw: I'm trying to get more recent kernels tested (3.8ish) this is on 3.2.0-38
[15:02] <apw> that implies the reference count is lower than it should be, that someone freed it and didn't clear a pointer or similar though
[15:02] <caribou> apw: I'm just surprised that the slab allocation is handing over the same numa_policy structure so fast
[15:02] <apw> it will hand out the last one deallocated as it is cache hot
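apw's point about cache-hot reuse can be illustrated with a tiny last-in, first-out free list. This is a minimal sketch, not the kernel's slab allocator: the `lifo_` names are hypothetical and per-CPU handling is left out entirely. It only shows that the object freed most recently is the first one handed back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object and cache types for illustration only. */
struct lifo_obj {
    struct lifo_obj *next;
    int payload;
};

struct lifo_cache {
    struct lifo_obj *freelist;  /* head is the most recently freed object */
};

/* Push the freed object on the head of the list. */
static void lifo_free(struct lifo_cache *c, struct lifo_obj *o)
{
    assert(o != NULL);
    o->next = c->freelist;
    c->freelist = o;
}

/* Pop the head: the last object freed is the first one reused. */
static struct lifo_obj *lifo_alloc(struct lifo_cache *c)
{
    struct lifo_obj *o = c->freelist;

    if (o != NULL)
        c->freelist = o->next;
    return o;
}
```

With this ordering, an allocation that arrives right after a free gets the same memory back, so a stale pointer to the freed object would immediately alias a live one.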
[15:03] <caribou> apw: well, it's still being freed at the bottom of the backtrace, prior to the IRQ 
[15:03] <caribou> apw: thought so
[15:04] <apw> caribou, not necessarily, we are still in the free routine, but if we are giving up the CPU we are most likely past freeing it
[15:04] <apw> at the point where we would return or similar
[15:05] <caribou> apw: so if I understand it correctly, the task is ramping down, releasing its numa_policy slab, then gets hit by the IRQ that goes on allocating a slab that turns out to be the same one
[15:05] <caribou> ?!?
[15:05] <apw> so this is allocating out of a different slab as well
[15:05] <apw> so does this not mean that this process is clearing its numa policy
[15:05] <apw> we take an interrupt,
[15:05] <apw> and allocate something from the slab, but using the 'current' processes numa policy
[15:06] <apw> so that might mean we have not done the clean up in a good order
[15:06] <apw> as if we are freeing it we should be no longer using it
[15:06] <caribou> apw: from what I see, the cleanup had not been completed when the IRQ got it
[15:06] <apw>         mpol_put(tsk->mempolicy);
[15:06] <caribou> s/it/in
[15:06] <apw>         tsk->mempolicy = NULL;
[15:07] <ppisati> brb
[15:07] <apw> right but .. we are exiting, so we throw away our policy to the allocator, _then_ remove it
[15:07] <apw> that seems backwards
[15:07] <apw> i would expect to see something like
[15:07] <apw> tmp = tsk->mempolicy
[15:08] <apw> tsk->mempolicy = NULL
[15:08] <apw> mpol_put(tmp)
[15:08] <apw> now ... i think that the writer of the code believes we can not do that because we have 'task_lock(tsk)' but this
[15:08] <apw> code path seems to imply not
[15:08] <apw> if this is something you could reproduce then i would recommend trying that tmp thing
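The two orderings apw contrasts can be compared in a small userland model. This is a sketch under stated assumptions, not the patch apw went on to write: all `exit_` names are hypothetical, and a callback invoked in the middle of the release stands in for the interrupt that allocates using the current task's policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the policy and the task that points to it. */
struct exit_policy { int refcnt; int mode; };
struct exit_task   { struct exit_policy *mempolicy; };

/* Hook playing the role of an interrupt that looks at the task's policy. */
typedef void (*exit_irq_fn)(struct exit_task *tsk, void *seen);

/* Drop one reference, then let the "interrupt" fire mid-release. */
static void exit_put(struct exit_policy *pol, struct exit_task *tsk,
                     exit_irq_fn irq, void *seen)
{
    if (pol == NULL)
        return;
    assert(pol->refcnt > 0);
    pol->refcnt--;
    if (irq != NULL)
        irq(tsk, seen);
}

/* Ordering quoted from the exit path: put first, clear afterwards. */
static void exit_original(struct exit_task *tsk, exit_irq_fn irq, void *seen)
{
    exit_put(tsk->mempolicy, tsk, irq, seen);
    tsk->mempolicy = NULL;
}

/* Ordering apw proposes: detach the pointer, then put the saved copy. */
static void exit_reordered(struct exit_task *tsk, exit_irq_fn irq, void *seen)
{
    struct exit_policy *tmp = tsk->mempolicy;

    tsk->mempolicy = NULL;
    exit_put(tmp, tsk, irq, seen);
}

/* Example hook: record which policy pointer the "interrupt" observes. */
static void exit_record(struct exit_task *tsk, void *seen)
{
    *(struct exit_policy **)seen = tsk->mempolicy;
}
```

In the original ordering the hook still sees the policy whose reference count has already dropped to zero; in the reordered version it sees NULL, which an allocator would treat as "no policy".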
[15:09] <caribou> apw: ok. I'm trying to see if we can find a way to reproduce
[15:11] <apw> caribou, shall i spin you a patch to try ?  or you got it
[15:12] <caribou> apw: let me try to reproduce it first
[15:12] <apw> caribou, i'll put a quick patch together, it's pretty simple
[15:20] <rtg> apw, at least the 3.12 keyboard problem isn't systemic. it appears to work on an AMD gizmo
[15:22] <apw> rtg, good to know
[15:22] <apw> i'll try it out here in a bit
[15:22] <rtg> apw, gonna fire it up on a gigabyte MB soon
[15:23] <rtg> as soon as my USB stick flashes....
[15:24] <apw> rtg heh
[17:57] <rtg> apw, well, so far everything seems to work on 3.12-rc2. at least all of the mainline bits. overlay and aufs are still disabled.
[17:58] <rtg> hmm, should check audio
[18:01] <apw> rtg, ok will have a poke at the overlay etc tomorrow
[18:01] <apw> when i am awake
[18:02] <rtg> apw, oh yeah, what are you doing around ? get lost. go have a beer.
[19:31]  * rtg -> lunch
[20:55]  * rtg -> EOD