/srv/irclogs.ubuntu.com/2013/09/25/#ubuntu-kernel.txt

=== DarkPlayer_ is now known as DarkPlayer
ppisatimoin05:54
=== smb` is now known as smb
smbmorning07:37
=== fmasi_afk is now known as fmasi
* apw yawns08:18
* ppisati wonders about all those abandoned pastebin entries...08:43
apwppisati, i think they just go poof after a while09:11
ppisatiapw: i was wondering if one could retrieve all the pastebins that he created, i just had to recreate one because i forgot the #id09:12
apwppisati, heh interesting question09:12
=== ghostcube_ is now known as ghostcube
caribouapw: my question is about NUMA memory policies11:45
caribouI'm trying to understand how the task.mempolicy structure gets defined & used.11:45
caribouI'm on a kernel panic where task.mempolicy.mode = 0x0, which is the default value but which should never happen11:46
caribousince the case statement for MPOL_DEFAULT is BUG()11:46
caribouapw: fyi it's in mm/mempolicy.c into slab_node()11:47
apwcaribou, so yes i read it the same, that if policy pointer is set then policy mode would not be 0, just checking now if they are cheating anywhere11:53
apw        if (!mpol || mpol->mode == MPOL_DEFAULT)11:54
apw                return;         /* show nothing */11:54
caribouapw: in the crash I'm looking at, I got > 6000 task struct with mempolicy.mode = 0x3 & only one mongodb task that has it at 0x011:54
apwi note that some of the shmem code implies that it might be possible regardless11:54
apwand where is it crashing, it is in that bug in slab_node() ?11:54
caribouapw: yep. I found one other report in some ML about a similar panic in 2.6.3x kernels11:55
caribouapw: I'm trying to figure out if this mempolicy.mode can be set by some userland syscall11:55
=== fmasi is now known as fmasi_lunch
apwit is not clear that just returning numa_node_id() would be a heck of a lot safer than BUG()'ing11:57
=== fmasi_lunch is now known as fmasi_afk
apw:q11:57
caribouapw: ok, so I'm not looking in the wrong direction; that's what I wanted to confirm11:58
caribouapw: thanks for looking it up11:58
caribouapw: hmm, this could be some kind of use-after-free as the refcnt is 0x012:00
apwcaribou, that is believable indeed, this is changed from userspace, mbind() seems to make it happen12:03
apwcaribou, but that calls mpol_new to make the policy, and there the policy == NULL if you say MPOL_DEFAULT12:03
apwcaribou, so i would concur you are most likely seeing a race on free or similar12:04
diwicapw, hi, is there a config-3.11.*-generic somewhere on gomeisa?12:04
diwicapw, I've just realized how stupid it is to build a kernel on your laptop with less than 10 GB free.12:05
apwdiwic, hmmm dunno, you could just make one with fdr genconfigs in the tree12:05
caribouapw: ok, I'll investigate this12:05
apwdiwic, are you building something which isn't our tree and already has the config in it ?12:06
diwicapw, I was thinking of bisecting v3.10 .. v3.11-rc112:06
diwicapw, so then I would start with an upstream tree which might not have any debian/rules ?12:07
caribouapw: and if MPOL_DEFAULT makes policy == NULL, then slab_node will return numa_node_id()12:07
apwdiwic, then just fdr genconfigs (locally in your tree) and look in CONFIGS/* for the one you need to copy over12:07
apwcaribou, yeah if you have a reference count of 0 it is all bad12:07
apwcaribou, you should check the policy pointer is in the policy slab while you are at it12:08
apwthough i suspect it will be12:08
caribouapw: ok, will do. Thanks12:09
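The reasoning in this thread (MPOL_DEFAULT produces no policy object, a NULL policy falls back to the local node, and a policy object whose mode is MPOL_DEFAULT hits BUG()) can be sketched as a small user-space model. This is only an illustration under stated assumptions: `model_mpol_new`, `model_slab_node`, `MODEL_BUG` and the simplified `struct mempolicy` are hypothetical stand-ins, not the kernel's code in mm/mempolicy.c, and the node selection for the non-default modes is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mode numbers as quoted in the discussion: 0x0 is the default mode. */
enum mpol_mode {
    MPOL_DEFAULT = 0,
    MPOL_PREFERRED = 1,
    MPOL_BIND = 2,
    MPOL_INTERLEAVE = 3
};

/* Hypothetical, heavily simplified policy object. */
struct mempolicy {
    int refcnt;
    enum mpol_mode mode;
    int node; /* placeholder for the real nodemask handling */
};

/* Returned by the model where the discussion says the kernel calls BUG(). */
#define MODEL_BUG (-1)

/* Model of the allocation step: MPOL_DEFAULT yields no object at all. */
static struct mempolicy *model_mpol_new(enum mpol_mode mode, int node)
{
    struct mempolicy *pol;

    if (mode == MPOL_DEFAULT)
        return NULL;
    pol = malloc(sizeof(*pol));
    if (!pol)
        return NULL;
    pol->refcnt = 1;
    pol->mode = mode;
    pol->node = node;
    return pol;
}

/* Model of the node lookup: NULL policy means "use the local node". */
static int model_slab_node(const struct mempolicy *policy, int local_node)
{
    assert(local_node >= 0);
    if (!policy)
        return local_node;

    switch (policy->mode) {
    case MPOL_PREFERRED:
    case MPOL_BIND:
    case MPOL_INTERLEAVE:
        return policy->node; /* placeholder selection */
    default:
        /* A live policy object should never carry MPOL_DEFAULT. */
        return MODEL_BUG;
    }
}
```

In this model the only way to reach `MODEL_BUG` is to hand in an object whose contents no longer match what `model_mpol_new` produced, which is consistent with the use-after-free suspicion raised above.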
diwicapw, hmm, or I can just scp one over from my saucy laptop12:21
apwdiwic, yep they are in /boot obviously12:22
diwicyeah12:22
=== fmasi_afk is now known as fmasi
=== ghostcube_ is now known as ghostcube
caribouapw: do you have a couple of minutes to talk a bit more about my mempolicy issue ?14:53
caribouor anyone else who has followed the previous discussion14:54
apwcaribou, sure14:54
caribouapw: here is the full backtrace : http://paste.ubuntu.com/6154796/14:55
caribouapw: looking at the bottom of the backtrace shows that the process is exiting & has already started to destroy the mempolicy slab (in __mpol_put)14:55
caribouapw: then it is interrupted by an IRQ coming from the network (net_rx_action)14:56
caribouapw: I just want to make sure I get the context correctly :14:56
apwcaribou, yes it seems that is correct, though somehow that interrupt context 14:58
caribouapw: it is while handling the IRQ that it does the __slab_alloc which uses a kmem from the numa_policy slab, the same one that he had started to hand over14:58
apwhas a reference to the mpolicy, but the reference count should have been higher if it does14:58
caribouapw: __mpol_put had already decreased the refcnt a few cycles prior to getting the IRQ apparently14:59
caribou__mpol_put does it just before calling kmem_cache_free15:00
caribouapw: I'm trying to get more recent kernels tested (3.8ish) this is on 3.2.0-3815:01
apwthat implies the reference count is lower than it should be, that someone freed it and didn't clear a pointer or similar though15:02
caribouapw: I'm just surprised that the slab allocation is handing over the same numa_policy structure so fast15:02
apwit will hand out the last one deallocated as it is cache hot15:02
caribouapw: well, it's still being freed at the bottom of the backtrace, prior to the IRQ 15:03
caribouapw: thought so15:03
apwcaribou, not necessarily we are still in the free routine, but if we are giving up the CPU we are most likely past freeing it15:04
apwat the point where we would return or similar15:04
caribouapw: so if I understand it correctly, the task is ramping down, releasing its numa_policy slab, then gets hit by the IRQ that goes on allocating a slab that turns out to be the same one15:05
caribou?!?15:05
apwso this is allocating out of a different slab as well15:05
apwso does this not mean that this process is clearing its numa policy15:05
apwwe take an interrupt, a15:05
apwand allocate something from the slab, but using the 'current' processes numa policy15:05
apwso that might mean we have not done the clean up in a good order15:06
apwas if we are freeing it we should be no longer using it15:06
caribouapw: from what I see, the cleanup had not been completed when the IRQ got it15:06
apw        mpol_put(tsk->mempolicy);15:06
caribous/it/in15:06
apw        tsk->mempolicy = NULL;15:06
ppisatibrb15:07
apwright but .. we are exiting, so we throw away our policy to the allocator, _then_ remove it15:07
apwthat seems backwards15:07
apwi would expect to see thing15:07
apwtmp = tsk->mempolicy15:07
apwtsk->mempolicy = NULL15:08
apwmpol_put(tmp)15:08
apwnow ... i think that the writer of the code believes we can not do that because we have 'task_lock(tsk)' but this15:08
apwcode path seems to imply not15:08
apwif this is something you could reproduce then i would recommend trying that tmp thing15:08
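The ordering difference apw describes can be illustrated with a user-space model. This is a sketch under stated assumptions, not the kernel code or the eventual patch: `struct task`, `model_mpol_put`, `exit_put_then_clear`, `exit_clear_then_put` and `observed_during_put` are hypothetical names, and the "interrupt" is simulated by recording what the task's pointer shows at the moment the reference has been dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mempolicy {
    int refcnt;
    int mode;
};

struct task {
    struct mempolicy *mempolicy;
};

/* What a simulated interrupt would find in tsk->mempolicy during the put. */
static struct mempolicy *observed_during_put;

/* Model of dropping a reference; the task is passed only so the model can
 * record what an interrupt arriving right after the decrement would see. */
static void model_mpol_put(struct task *tsk, struct mempolicy *pol)
{
    assert(tsk != NULL);
    if (!pol)
        return;
    pol->refcnt--;
    observed_during_put = tsk->mempolicy;
}

/* Ordering quoted in the log: put first, clear the pointer afterwards. */
static void exit_put_then_clear(struct task *tsk)
{
    model_mpol_put(tsk, tsk->mempolicy);
    tsk->mempolicy = NULL;
}

/* Ordering apw suggests: detach the pointer, then drop the reference. */
static void exit_clear_then_put(struct task *tsk)
{
    struct mempolicy *tmp = tsk->mempolicy;

    tsk->mempolicy = NULL;
    model_mpol_put(tsk, tmp);
}
```

With the first ordering the simulated interrupt still sees a pointer to a policy whose reference count has already dropped; with the second it sees NULL and would fall back to the no-policy path. Whether task_lock(tsk) is meant to close that window in the real code is the open question raised in the discussion.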
caribouapw: ok. I'm trying to see if we can find a way to reproduce15:09
apwcaribou, shall i spin you a patch to try ?  or you got it15:11
caribouapw: let me try to reproduce it first15:12
apwcaribou, i'll put a quick patch together, its pretty simple15:12
rtgapw, at least the 3.12 keyboard problem isn't systemic. it appears to work on an AMD gizmo15:20
apwrtg, good to know15:22
apwi'll try it out here in a bit15:22
rtgapw, gonna fire it up on a gigabyte MB soon15:22
rtgas soon as my USB stick flashes....15:23
apwrtg heh15:24
=== psivaa is now known as psivaa-afk-bbl
=== psivaa-afk-bbl is now known as psivaa
rtgapw, well, so far everything seems to work on 3.12-rc2. at least all of the mainline bits. overlay and aufs are still disabled.17:57
rtghmm, should check audio17:58
apwrtg, ok will have a poke at the overlay etc tomorrow18:01
apwwhen i am awake18:01
rtgapw, oh yeah, what are you doing around ? get lost. go have a beer.18:02
* rtg -> lunch19:31
=== fmasi is now known as fmasi_afk
* rtg -> EOD20:55
