[05:27] <achiang> hughhalf: hi, i heard you had an arduino serial device?
[06:11] <hughhalf> achiang, umm, not sure what you mean sorry, have done some arduino stuff, but arduino serial ?
[06:12] <achiang> hughhalf: hm, JFo or apw claimed that you had arduino hardware that connected via USB serial to a pc?
[06:13] <hughhalf> this sort of thing ? http://www.freetronics.com/products/twentyten
[06:18] <achiang> hughhalf: maybe i should ask my real question. :)
[06:19] <achiang> hughhalf: found a lucid regression, i filed a bug a few days ago, but it doesn't seem to be getting any attention -- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/690798
[06:19] <ubot2> Launchpad bug 690798 in linux (Ubuntu) "arduino USB serial device breaks on lucid kernel upgrade (affects: 2) (heat: 12)" [Undecided,New]
[06:19] <achiang> hughhalf: the reason i poked you is because apw hinted you might have the hardware
[06:20] <hughhalf> oh, ok, I've a few arduino's floating around here, yes
[06:21] <achiang> hughhalf: can you find someone to take a look at it? it seems to be an easily reproducible kernel regression in lucid, which obviously isn't good
[06:24] <hughhalf> achiang, will see what we can work out, sure. 
[06:25] <achiang> hughhalf: great, thanks!
[06:27] <hughhalf> quick read of that bug suggests that it may actually be that the kernel is now doing The Right Thing (TM) but that the application is now not working as it's relying on (arguably) "broken" behaviour
[06:31] <achiang> hughhalf: really?
[06:32] <hughhalf> well, like I said, an initial take on that patch is that it's correcting incorrect behaviour
[06:34] <achiang> i see, interesting
[06:35] <hughhalf> achiang, I may have misread it though, but given Greg has signed off on it, it suggests to me that he'd regard the behaviour as now being correct.
[06:35] <hughhalf> Lemme look in the LKML archive and see if there is any discussion on it
[06:35] <achiang> ok, thanks hughhalf 
[06:39] <hughhalf> achiang, I take it that you know the person who wrote the code ?
[06:39] <achiang> hughhalf: yes, i do know him
[06:39] <achiang> hughhalf: i'd be happy to explain to him that his code is broken, if you can tell me why. :)
[06:41] <hughhalf> well, broken may be too strong, but lemme dig a bit more
[06:45] <achiang> hughhalf: stepping out to buy an emergency USB key, back in about 20
[06:45] <hughhalf> no worries
[07:02] <achiang> back
[07:02] <hughhalf> :)
[07:04] <hughhalf> achiang, so my read at this stage is that the kernel patch is in all likelihood correct and that the app may need to be tweaked to explicitly set or clear the DTR/RTS lines the way the arduino requires them.  This is based on a couple of things:
[07:05] <kees> apw: oh! were you able to reproduce 686705 ?
[07:05] <hughhalf> The patch has been in for a couple of weeks now and was itself a revert back to the way the FTDI driver had been for some time
[07:05] <hughhalf> and it's a patch that's been scrutinised by gkh and others who have deep familiarity with serial code
[07:05] <hughhalf> secondly, the FTDI devices are used in a huge range of hardware, not just arduino, so I'd venture we'd be seeing more problems if it were in fact broken behaviour
[07:06] <hughhalf> but, achiang, this is all based on, what is it, about 40 minutes of reading on my part
[07:06] <hughhalf> achiang, I'm happy to update the bug to that effect though if you like, call it an educated guess :)
[07:06] <hughhalf> well, somewhat educated :)
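The suggestion above, that the application should drive DTR/RTS itself rather than rely on whatever the driver does at open time, could be sketched roughly as follows. This is only an illustration, assuming a Linux host and an already-open serial file descriptor; the helper names `with_dtr_rts` and `set_dtr_rts` are made up here, and the log does not say which line states the arduino actually needs.

```c
#include <sys/ioctl.h>

/* Hypothetical pure helper: return `bits` with DTR and RTS forced
   on (non-zero argument) or off (zero argument); other lines untouched. */
static int with_dtr_rts(int bits, int dtr, int rts)
{
    bits = dtr ? (bits | TIOCM_DTR) : (bits & ~TIOCM_DTR);
    bits = rts ? (bits | TIOCM_RTS) : (bits & ~TIOCM_RTS);
    return bits;
}

/* Hypothetical wrapper: read-modify-write the modem control lines of an
   open serial fd. Returns 0 on success, -1 on error (errno set by ioctl). */
static int set_dtr_rts(int fd, int dtr, int rts)
{
    int bits;

    if (ioctl(fd, TIOCMGET, &bits) < 0)
        return -1;
    bits = with_dtr_rts(bits, dtr, rts);
    return ioctl(fd, TIOCMSET, &bits) < 0 ? -1 : 0;
}
```

An application would call something like `set_dtr_rts(fd, 1, 0)` right after opening the port, with the values chosen to suit the board; the point is only that the line state is requested explicitly instead of inherited.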
[07:07] <achiang> hughhalf: what you outlined sounds reasonable to me. my concern was, "did ubuntu take a patch from upstream that was later reverted (in upstream)" 
[07:07] <achiang> hughhalf: i heard a rumor about that, but admit, i did not do due diligence in chasing it down
[07:07] <hughhalf> achiang, heh, no probs
[07:08] <hughhalf> I mucked about a fair amount with USB serial code back in 2.4 days and the code, as reverted, rings true to me
[07:08] <hughhalf> but the real clincher is the pervasiveness of FTDI devices
[07:08] <hughhalf> achiang, happy to summarise in the bug if you like
[07:09] <achiang> hughhalf: i'd say go ahead and update the bug and push back. sounds reasonable, and i know the person well enough to know that his feelings won't get hurt
[07:09] <achiang> hughhalf: much appreciate the research, ta!
[07:12] <hughhalf> achiang, my pleasure, a pleasant trip down memory lane :)
[07:12] <hughhalf> achiang, I'll leave it to you to change the actual disposition of the bug once you chat with your friend
[07:12] <achiang> hughhalf: wow, i've never heard anyone talk about usb, serial, and "pleasant" in the same sentence
[07:13] <achiang> hughhalf: ok, i'll make sure to update the bug status as appropriate. thanks again. :)
[07:13] <hughhalf> np
[07:26]  * hughhalf steps away for a moment
[08:17] <apw> kees, yep, seems one of my machines is showing the same symptoms ... going to rip out NX to confirm it's the cause this am
[09:54] <apw> cjwatson, i have a machine which seems to reproduce the warm boot hang ... i note it hangs on the warm boot before the grub menu, before purple is asserted; any way i can ask grub what's up?
[09:56] <apw> cjwatson, just confirmed that it is NX emulation on i386 which is triggering the failure mode; backing that out clears things up
[10:15] <cjwatson> apw: does 'grub-install --debug-image=all <device>' show any output?
[10:15] <cjwatson> apw: wait, a *kernel* change fixes something that happens *before the grub menu*?
[10:15] <cjwatson> apw: that's impossible
[10:15] <cjwatson> oh, warm boot
[10:15] <apw> cjwatson, heh nothing which occurs is impossible.  yeah warm boot
[10:15] <cjwatson> good grief
[10:16] <apw> i know ... mad isn't it
[10:16] <apw> it has to be an assumption from grub about the initial environment
[10:16] <apw> i suspect that we'd like to fix grub as well as undoing whatever we are not undoing before reboot in the kernel
[10:20] <cjwatson> like I say, 'grub-install --debug-image=all <device>' should produce boot-time output that at least narrows down where it falls over
[10:21] <apw> cjwatson, will get the kernel downgraded and see what i can find
[10:28] <apw> cjwatson, woh that boots _slow_
[10:29] <apw> yay purple ... this could take some time
[10:31] <apw> cjwatson, how many lines of output would i expect from a successful boot
[10:32] <apw> given they are coming out about 3-4 per second
[10:34] <cjwatson> loads
[10:34] <cjwatson> zillions
[10:34] <apw> damn, i guess this test will take longer than i had hoped
[10:34] <cjwatson> hopefully the last screenful will be useful since there's no shift-pgup
[10:34] <cjwatson> what's the exact diff you backed out?
[10:35] <apw> cjwatson, sadly this is the good boot, before the bad boot
[10:35] <cjwatson> oh.  DDTT :-)
[10:35] <apw> cjwatson, http://pastebin.com/MWcTAaZg
[10:36] <apw> cjwatson, not sure if its quicker to let it finish, or boot a USB image and fix it
[10:38] <apw> cjwatson, that diff is all about using user code segments to protect userspace from executing data
[10:38] <cjwatson> I'd probably just let it run at this point
[10:38] <apw> it is not at all clear how that could affect grub
[10:38] <cjwatson> different CS on entry?
[10:39] <apw> cjwatson, that diff btw is stupidly backwards, it's the diff of the revert
[10:39] <cjwatson> yeah
[10:39] <cjwatson> mind you if CS were wrong *nothing* would work
[10:40] <cjwatson> don't suppose it's reproducible in kvm?
[10:40] <apw> cjwatson, if i hold shift i get GRUB loading, but no colour change and no menu
[10:40] <apw> cjwatson, can't say i've tried it no, i was surprised when it appeared on a previously working machine
[10:40] <cjwatson> right, that's where I need the debug-image stuff, I need to know how far it gets
[10:40] <apw> though that occurred because i moved it to 32 bit for performance
[10:41] <cjwatson> have roughly no hope of narrowing it down otherwise
[10:41] <apw> yeah ... well i have the debug on :/  and once i get booted i will reboot and let you know
[10:41] <apw> still reading the menu on this boot sadly
[10:41] <apw> near the bottom at least
[10:41] <cjwatson> so should this happen with the generic kernel on any i386 machine?
[10:42] <apw> cjwatson, i cannot say i am 100% sure of that, i suspect it cannot be all i386s else we'd be inundated with whining and we are not
[10:43] <apw> certainly it happens on a couple of atom systems, Sarvatt_ has an N270 and I have an N455 (64 bit capable) showing it
[10:43] <cjwatson> do we know if the version of grub matters or if it's just the version of the kernel?
[10:43] <apw> i suspect the bios could easily fix things
[10:43] <apw> cjwatson, no i do not know if grub version helps, i hear but have yet to confirm that =text does not make a difference
[10:44] <cjwatson> it would make even less sense for gfxpayload=text to be relevant
[10:44] <cjwatson> that only does anything at all after the menu is displayed
[10:44] <apw> yep indeed
[10:44] <apw> though i was more thinking of the graphics=auto bit, might matter, the fact we used the bios to go graphical
[10:44] <cjwatson> of course grub maverick<->natty is over 4000 lines of upstream changelog so exactly how much that would help for bisecting is unclear
[10:45] <cjwatson> sure, but gfxpayload=text doesn't influence that
[10:45] <apw> ahh yes so we might need a different test there to confirm if it's that ... but as i have to wait on debug :/ i'll not be able to do that for a bit
[10:47] <apw> cjwatson, man this thing does a lot of small mallocs
[10:47] <cjwatson> yep
[10:47] <apw> that cannot be cheap :)
[10:47] <cjwatson> *shrug*
[10:53] <matti> :>
[10:54] <apw> cjwatson, ok rebooting with debug
[10:55] <apw> cjwatson, no output what so ever
[10:55] <cjwatson> blink
[10:55] <cjwatson> joy, so it's really early
[10:55] <apw> and a hard-reset gives me a debug boot
[10:56] <apw> yeah really really early it seems
[10:56] <cjwatson> so you get the full string "GRUB loading"?
[10:56] <cjwatson> actually, what *exact* text
[10:56] <apw> cjwatson, i did not hold shift, let me re-do the test
[10:57]  * apw hates shift
[10:57] <cjwatson> it might have punctuation after
[10:57] <apw> will confirm
[10:57] <cjwatson> grub-install *without* debug-image before you reboot!
[10:57] <apw> doing that now
[10:57] <cjwatson> then --debug-image=all before the warm boot
[10:57] <cjwatson> I'd like to know where the cursor is too
[10:58] <cjwatson> is it on the same line as "GRUB loading" (with possible punctuation), or on the next line?
[10:59] <apw> i have to boot via a usb stick to clean up for the reboots, so it takes a bit
[10:59] <cjwatson> diskboot.S prints a dot on each read from disk, and a newline when it's finished reading
[10:59] <cjwatson> after the newline, it jumps to the GRUB kernel
[11:00] <cjwatson> so we should be able to tell from the *exact* message there how far it got through diskboot.S
[11:00] <apw> cjwatson, got u
[11:02] <cjwatson> then there's a small pile of bootstrap assembly (grub-core/kern/i386/pc/startup.S) and then it jumps to grub_main (grub-core/kern/main.c)
[11:02] <apw> GRUB loading.<newline>
[11:02] <apw> cursor is on second line left edge
[11:03] <apw> cjwatson, when it works it prints quite some debugging next before switching to graphics mode and going much slower
[11:03] <cjwatson> OK, so it got into the kernel
[11:03] <cjwatson> (ours, not yours)
[11:03] <apw> and none of that pre-graphics mode debug is here either
[11:04] <apw> cjwatson, cursor is flashing, no idea if that's h/w or s/w driven
[11:04] <cjwatson> I doubt graphics is involved
[11:04] <cjwatson> hardware
[11:04] <cjwatson> if graphics init were relevant, there'd still be some debug output before that
[11:04] <apw> anything else to get from here, or shall i start recovering
[11:05] <cjwatson> it would be useful to try a grub2 package with debian/patches/ubuntu_really_quiet.patch reverted
[11:05] <apw> ok
[11:06] <cjwatson> can you assemble that or do you need me to?
[11:06] <cjwatson> would be lovely to have a diff of register states
[11:07] <cjwatson> I wonder if it's one of the GDTs
[11:07] <apw> cjwatson, certainly we are changing something descriptor table like, though i thought it was the LDT from the descriptions in the patch
[11:08] <apw> cjwatson, i think i can do the grubby thing
[11:09] <cjwatson> seems to be GDT entries from the code, although I admit to not being very familiar with this stuff
[11:09] <apw> cjwatson, then ... i suspect we need to put them back before reboot
[11:10] <apw> though why this matters all of a sudden, it never seemed to on older releases ...
[11:20] <cjwatson> there've been a few major changes to GRUB's startup code since maverick, so I suppose that could be related
[11:20] <cjwatson> mainly the introduction of Reed-Solomon redundancy
[11:20] <cjwatson> hesitant to finger that for sure though
[11:34] <apw> cjwatson, well at least  it is different so it is possible this is new
[11:34] <cjwatson> GRUB loads its own GDT on entry to protected mode
[11:35] <cjwatson> I don't suppose no-exec ranges might be preserved across warm boot?
[11:35] <cjwatson> so some bit of memory that GRUB's code is loaded into might be still marked no-exec?
[11:35] <cjwatson> I have no idea how warm booting works really
[11:35] <apw> no exec ranges in this context are simply segment size offsets
[11:36] <apw> we are likely rebooting with the segment sizes limited
[11:37] <apw> but those are in the GDT as far as i know, so it seems it would have to be bust before protected mode somehow
[11:43] <cjwatson> but isn't the GDT only used in protected mode?
[11:45] <apw> cjwatson, yeah indeed, it makes no sense whatsoever
[11:48] <apw> cjwatson, these extra prints, i wonder if they could be turned on by shift as well
[11:49] <cjwatson> yeah, I was just thinking that earlier
[11:50] <apw> ok i see the inverted hello message on a normal boot and _not_ on the failed warm boot
[11:50] <apw> can't quite read it to tell you what it says, but i deffo get a new message on the normal boot now
[11:51] <apw> cjwatson, ^^
[11:51] <cjwatson> ok, so it's between the end of diskboot.S and the end of grub_machine_init
[11:51] <cjwatson> still a hell of a lot of hairy code :(
[11:52] <apw> cjwatson, can we print in that region ?
[11:53] <cjwatson> it's tricky, a lot of that is in protected mode
[11:53] <cjwatson> and printing is int 10h
[11:53] <cjwatson> you would have to very very very very very carefully jump in and out of real mode
[11:53]  * apw whimpers
[11:54]  * diwic can't do that, his carefullness is limited to three very's maximum.
[11:55] <apw> diwic, :)
[11:56] <diwic> apw, nice to see someone working
[11:56] <apw> cjwatson, i note that we move to real mode then setup the segment registers, is that the right way round?
[11:57] <apw> cjwatson, ahh ignore that, it has to load CS to jump into real mode
[11:59] <cjwatson> something like http://paste.ubuntu.com/545921/ might be worth tryinig
[11:59] <cjwatson> *trying
[11:59] <cjwatson> see if it's a bug in the RS code
[11:59] <cjwatson> (untested!)
[12:09] <apw> cjwatson, yep, doesn't even apply to the version in the archive :)
[12:10] <apw> cmpiling nw
[12:10]  * apw suspects this keyboard may have had it day
[12:12] <apw> BAH avahi doesn't work on natty ... does anything work?
[12:15] <cjwatson> no.  HTH
[12:16] <apw> happy or hope
[12:19] <apw> cjwatson, it seems to only see itself
[12:20] <cjwatson> I meant hope
[12:26] <apw> cjwatson, heh thought you probably did
[12:26] <apw> i would have said "good luck with that" myself
[12:52] <apw> cjwatson, ok i think turning off the RS code there has also sorted it
[12:59] <apw> Sarvatt_, about ?
[13:12] <apw> cjwatson, yeah as far as i can tell turning off just that one line of code there is enough to sort it out
[13:20] <cjwatson> ok, I'll have a poke at the RS implementation and try to find likely causes
[13:21] <apw> cjwatson, no idea what it could be doing which triggers issues
[13:28] <diwic> apw, any prognosis on when 2.6.38 merge window will open/close?
[13:29] <apw> depends if he releases before xmas, which he has tended to do in the past
[13:29] <apw> if so, i'd expect the window to open in the beginning of january
[13:31] <diwic> apw, and it's open for a week or so?
[13:32] <apw> diwic, normally a week, though if it opens when he releases i expect it to be a little longer, so probably a full week into the new year
[13:33] <diwic> apw, and since we're likely going with 2.6.38, getting patches in there is quite soon
[13:34] <apw> diwic, yes, now is a good time to be getting stuff ready and in maintainer trees
[13:34] <diwic> apw, the alternative is merging into Ubuntu, but the administration exercise is heavier :-/ 
[13:35] <apw> diwic, indeed, we carry a lot of patches before they get to mainline if they are justified
[13:50] <apw> cjwatson, is this using RS to encode the grub payload?
[13:58] <cjwatson> apw: yeah, it's because some things widdle over the boot track
[14:00] <apw> cjwatson, what does STANDALONE mean in grub context ?
[14:03] <cjwatson> apw: it's specific to grub-core/lib/reed_solomon.c
[14:03] <cjwatson> it means it's being built for use at boot time rather than for use in the utility code (grub-setup)
[14:04] <cjwatson> I wonder if it's something to do with trying to use memory at 0x100000 / 0x100100
[14:05] <cjwatson> maybe that memory isn't available?
[14:06] <apw> cjwatson, i am struggling to know what might be in there the second time that is not the first
[14:07] <cjwatson> sort of sounds like we need a diff of e820 maps
[14:07] <apw> cjwatson, i would be surprised if they differ
[14:10] <cjwatson> you could try http://paste.ubuntu.com/545963/ or something just to see if it makes a difference
[14:10] <cjwatson> picking a low memory region at random
[14:11] <cjwatson> that doesn't really make sense though - grub decompresses to 0x100000 later anyway
[14:12] <apw> cjwatson, i'll give it a try
[14:19] <apw> cjwatson, this init function for the inverts does not seem to set the first element ... which i suspect means it would default 0 the first time
[14:20] <apw> cjwatson, of course i cannot tell if it ever uses the [0] in the rest of the algorithm
[15:35] <apw> cjwatson, i moved that buffer from 0x10... to 0x09... and it seems to work
[16:04] <apw> cjwatson, some of this code handling the scratch buffer is a little suspect
[16:04] <apw> #ifndef STANDALONE
[16:04] <apw>   chosen = xmalloc (n * sizeof (int));
[16:04] <apw>   grub_memset (chosen, -1, n * sizeof (int));
[16:04] <apw> #else
[16:04] <apw>   chosen = (void *) scratch;
[16:04] <apw>   scratch += n;
[16:04] <apw> #endif
[16:04] <apw> cjwatson, that bit for instance neither allows enough space (scratch is a char * not an int *) nor inits it to -1
[16:15] <apw> not that the init actually looks right either
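The sizing complaint above comes down to pointer arithmetic on a byte pointer: `scratch += n` skips n bytes, while an array of n ints occupies `n * sizeof (int)` bytes, so the next carve-out from the scratch area would land inside the `chosen` array. A small sketch of the difference, with made-up helper names and assuming (as apw states) that `scratch` is a byte-sized pointer:

```c
#include <stddef.h>

/* Advance a byte-addressed scratch pointer past an array of n ints.
   This is what the quoted branch presumably intends. */
static unsigned char *skip_ints(unsigned char *scratch, size_t n)
{
    return scratch + n * sizeof (int);
}

/* The arithmetic as quoted in the log: only n bytes are skipped. */
static unsigned char *skip_ints_as_quoted(unsigned char *scratch, size_t n)
{
    return scratch + n;
}

/* Number of bytes of the n-int array that the next allocation
   would overlap when the pointer is advanced by n instead. */
static size_t overlap_bytes(size_t n)
{
    return n * sizeof (int) - n;
}
```

With a 4-byte int, three quarters of the array would be shared with whatever is carved out next, which fits the later remark that the allocation is smaller than it should be.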
[16:55] <smagoun> Hi, the lenovo-sl-laptop driver was included in l-b-m for Karmic (bug 351586). I can't find this driver in 10.10 though. Anyone know what happened to it?
[16:55] <ubot2> Launchpad bug 351586 in linux-backports-modules-2.6.28 (Ubuntu) (and 2 other projects) "please add lenovo-sl-laptop to ubuntu sauce (affects: 11) (dups: 4) (heat: 8)" [Medium,Fix released] https://launchpad.net/bugs/351586
[16:56] <czr_> achiang, maybe the arduino problem is related to this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/655868
[16:56] <ubot2> Launchpad bug 655868 in linux (Ubuntu) "[lucid regression] FTDI based USB to serial adapter no longer works (affects: 5) (heat: 38)" [Undecided,New]
[16:57] <achiang> czr_: good catch, thanks!
[16:59] <czr_> I'm being hit with the issue with lucid. using the newest backport kernel fixes it (without software modifications), but I lose bcm, and it's not a proper solution anyway
[17:06] <xclaesse> That bug is still reproducible on latest natty: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/662946
[17:06] <ubot2> Launchpad bug 662946 in linux (Ubuntu Maverick) (and 2 other projects) "linux kernel 2.6.35 slows down the whole system because of kslowdxxx processes (affects: 38) (dups: 2) (heat: 208)" [Medium,Incomplete]
[17:06] <xclaesse> it is making ubuntu unusable since maverick here
[17:06] <smagoun> To answer my own question, it looks like functionality in lenovo-sl-laptop was at least partially folded into the asus-laptop driver
[17:30] <kees> apw: but... the nx code is unchanged from maverick :(
[17:30] <apw> kees, indeed so, there is an interaction between that and some new code in grub2
[17:33] <kees> apw: yeah, very strange. is there a common "kernel is shutting down now" routine in the kernel? maybe it could reset the CS limit? that's the only thing I can think of that might survive a warmboot.
[17:34] <apw> kees, i have actually tried just commenting out the CS limit checks and not had any success, i may have done it wrong but i don't think so
[17:35] <kees> well, the checks aren't running at grub time, it would just be the CPU state left after boot, right?
[17:46] <apw> kees, yep indeed, though grub loads the registers in use for this feature in theory
[17:47] <apw> kees, simply put having poked it for nearly a whole day I am none the wiser as to why it occurs
[17:47] <apw> i know of two ways to mitigate the issue, but no idea what the issue really is
[17:49] <kees> apw: what are the mitigations?
[17:49] <apw> revert the nx patch (disabling it does not seem to work)
[17:50] <apw> or turn off the error correction core in grub
[17:50] <kees> error correction core?
[17:50] <apw> or actually, 3) move the EC core scratch buffer
[17:50] <kees> _move_ it?!
[17:51] <apw> it does reed-solomon encoding on the stuff in track 0 to cope with mangling of its stage2 or something
[17:51] <kees> that makes even less sense. is a memory map surviving boot or something?
[17:52] <apw> kees, i know ... it makes no sense on the face of it
[17:53] <apw> i suspect it is a bug in the reed solomon decoder, but why the presence of the NX code triggers it i have no idea
[17:53] <apw> the presence of it before the last full processor reset of course
[18:03] <kees> apw: maybe the grub code isn't clearing some area of memory that just happens to have the NX code in it or something, and it's comparing the wrong areas? ram contents would survive the warm boot.
[18:03] <apw> kees, yeah i am working on the assumption its a memory layout issue, that the NX stuff is moving things
[18:04] <apw> but this is mad code written by propeller headed maths people
[18:04] <apw> "this is simpler thought of in the frequency domain" .... EEEK
[18:06] <kees> haha
[18:07] <apw> i know ... and it's all magic maths, special code doing powers and vile ick all with 1 letter variables
[18:10] <cjwatson> an uninitialised memory bug seems likely
[18:11] <apw> yeah, and a pig to find that is going to be
[18:11] <cjwatson> I've mentioned it to upstream
[18:12] <apw> cjwatson, thanks
[18:12] <apw> i have found one apparent bug, but i cannot really see how it would trigger this behaviour
[18:13] <apw> as in it looks like it would be always broken or we are always  lucky
[18:15] <cjwatson> the missing memset looks like a good candidate
[18:16] <apw> cjwatson, doesn't seem to be a memset in STANDALONE either
[18:16] <apw> though the allocation for that one is 1/2 it should be
[18:16] <cjwatson> that's what I meant
[18:19] <xclaesse> apw, about 662946 you asked me to test natty kernel... I'm running up to date natty here, and I can reproduce the bug
[18:19] <apw> bug #662946
[18:19] <ubot2> Launchpad bug 662946 in linux (Ubuntu Maverick) (and 2 other projects) "linux kernel 2.6.35 slows down the whole system because of kslowdxxx processes (affects: 39) (dups: 2) (heat: 212)" [Medium,Incomplete] https://launchpad.net/bugs/662946
[18:25] <apw> xclaesse, odd, no one else who had that original issue is still experiencing it (/me for instance) so i guess we have some other bug/trigger for kslowd usage ... hmmm
[18:28] <xclaesse> apw, it is not kslowd anymore
[18:28] <xclaesse> it is kworker
[18:28] <xclaesse> but result is the same
[18:30] <apw> indeed
[18:30]  * apw wonders if kworker has any debug support
[18:32] <apw> cjwatson, ok changing that size alone does not work
[18:35] <cjwatson> apw: I'm attempting to valgrind it
[18:36] <apw> cjwatson, woh ... now that is brave :) ...
[18:36] <apw> cjwatson, am now building with a 'memset' over the array on that routine
[18:36]  * apw thinks this one is going to be a tiny little error, and this is going to take some time to find
[18:37] <apw> kees, do you have a machine which reproduces this issue ?  it seems any atom running i386 should be susceptible
[18:39] <cjwatson> apw: http://paste.ubuntu.com/546052/ seems to be enough to make valgrind happy
[18:39] <cjwatson> can you try that?
[18:39] <apw> cjwatson, there is no memset in STANDALONE
[18:39] <apw> i am trying this which is equivalent
[18:40] <cjwatson> grub_memset
[18:40] <cjwatson> and yes there is
[18:40] <cjwatson> just have to spell it right :)
[18:40] <apw> cjwatson, really there doesn't seem to be, i got a compile failure from moving that line down
[18:40] <kees> apw: I don't, no. other atom systems I've tried don't show it.
[18:41] <cjwatson> apw: hang on a moment then
[18:41] <apw>   chosen = (void *) scratch;
[18:41] <apw>   scratch += n * sizeof (int);
[18:41] <apw>   for (i = 0; i < n; i++)
[18:41] <apw>     chosen[i] = -1;
[18:41] <apw> i just am using that
[18:41] <apw> as i _think_ that is what they really meant, i don't think they want the bytes to be -1, but each choice
[18:43] <cjwatson> apw: try http://paste.ubuntu.com/546054/ then
[18:43] <cjwatson> apw: it's equivalent surely
[18:43] <cjwatson> -1 is all-bits-set
[18:43] <apw> i guess it is, yeah, but ... naughty
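The equivalence being argued here can be checked in isolation. The sketch below uses the standard `memset` in place of `grub_memset` and made-up function names; it relies on two's complement integers, where an int with every bit set is -1. The trick only works for values whose bytes are all identical (0 and -1 being the usual cases), which is presumably why the explicit loop reads as the less "naughty" spelling.

```c
#include <stddef.h>
#include <string.h>

/* Element-wise initialisation, as in the loop pasted above. */
static void fill_minus_one_loop(int *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        a[i] = -1;
}

/* Byte-wise initialisation: memset converts -1 to the byte 0xff,
   so every int ends up with all bits set, i.e. -1. */
static void fill_minus_one_memset(int *a, size_t n)
{
    memset(a, -1, n * sizeof (int));
}
```

Both routines leave the array in the same state, so the choice between them is one of style and of what is available in the STANDALONE build.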
[18:44] <cjwatson> still, would prefer you to test minimal-change from upstream if you could
[18:44] <apw> cjwatson, yep, am testing with your patch now ...
[18:45] <ohsix> is the rtc hack all you have when you don't have a serial port when you're debugging suspend/resume problems?
[18:45] <apw> ohsix, pretty much yes
[18:45] <apw> they should never have allowed them to take the serial ports off these machines
[18:46] <mjg59> apw: USB debugging's not that hard to support
[18:46] <apw> ohsix, some people have a pcix card with lights on which they use, but that involves taking your machine to bits
[18:46] <apw> mjg59, yeah it is if you want to test suspend/resume though
[18:46] <apw> as either it's suspended and you can't use it, or it's not and the behaviour of half your devices changes (in my experience)
[18:47] <apw> someone had a memory buffer for debug stuff somewhere, but i forget who
[18:47] <ohsix> mine stopped waking up a while ago and too many things updated in the window for me to know which bit it is (i had been using the xorg-edgers ppa & kernel)
[18:47] <apw> mjg59, we do build in some usb stuff to make it easier to debug, but not to much gain 
[18:47] <mjg59> apw: Sorry, may not have been clear. The USB debug port spec.
[18:47] <cjwatson> apw: thanks
[18:48] <apw> mjg59, ahh yes, not that i've managed to find a device implementing the other half
[18:48] <mjg59> It basically gives you a bit-banging interface that can function as a console even if you don't have the full USB stack up
[18:49] <apw> cjwatson, i wish this compile was faster ... its a slow iteration what with the two reboots too
[18:49] <ohsix> it should dtmf the pc speaker so you can record and decode it :D
[18:49] <apw> cjwatson, hrm adding that produced _these_
[18:49] <apw> reed_solomon.c: Assembler messages:
[18:49] <apw> reed_solomon.c:699: Warning: ignoring changed section attributes for .text
[18:49] <apw> ../../../grub-core/kern/i386/pc/startup.S:163: Error: attempt to move .org backwards
[18:49] <cjwatson> oh 'eck
[18:49] <apw> am i going MAD ?
[18:50] <cjwatson> no, there'll be a constant size somewhere to adjust
[18:50] <apw> oh one of those
[18:50] <cjwatson> include/grub/offsets.h, crank GRUB_KERNEL_I386_PC_NO_REED_SOLOMON_PART up until it works
[18:51] <cjwatson> should only be a tiny bit to account for the code size increase
[18:52] <apw> cjwatson, yeah slammed in 24 whole bytes of space
[18:53] <ohsix> wasn't there something you can do to keep the display/console alive to spam you during wakeup? does that not work with kms/the non-vesa drivers?
[18:53] <cjwatson> oddly, it compiled here without that change
[18:53] <apw> cjwatson, wibble
[18:54] <apw> +  //gf_invert[0] = 0;
[18:54] <cjwatson> I'll clean my build tree and try again
[18:54] <apw> thats the only other change i think i am carrying
[18:54] <cjwatson> is that needed?  valgrind didn't pick that up
[18:55] <apw> cjwatson, its commented out
[18:55] <cjwatson> ok
[18:55] <apw> i noticed it wasn't initialised in my testing, but i suspect its never used
[18:55] <cjwatson> probably not.  I agree this is mathmo code
[18:56] <apw> you can just tell it's an implementation of some equation ... it feels like RSA key generator code
[19:05] <matahari> hi all
[19:08] <matahari> After an apt-get upgrade, update-initramfs -k all -u -v is hanging. Last line of output is: Adding module /lib/modules/2.6.35-24-generic/kernel/fs/udf/udf.ko ; apt-get hung on upgrade, and i had to run dpkg --configure -a, but it hung at generating the initramfs... Do you have any hints what i could try or even a fix? Thanks!
[19:10] <apw> matahari, not heard of that before no
[19:10] <apw> you could try stracing it to see what it is doing
[19:13] <matahari> how can i do that?
[19:17] <ohsix> apw: stock ubuntu kernels can do pm_trace right? i just tried it and the rtc was still set to wall time
[19:23] <kees> \o/ resize2fs corrupts filesystems in natty!
[19:24] <matahari> apw: stacktrace is hanging as well; this is the output: http://pastebin.com/qJpwaPWc
[19:24] <apw> matahari, you need -f on strace i suspect
[19:27] <apw> cjwatson, that turned my grub into an instant reboot
[19:29] <cjwatson> hmph.  well, EOD here ...
[19:32] <matahari> apw: okay, now i see much more :-) actually it is hanging with repeating the following output all the time: http://pastebin.com/yzcAdBW4
[19:32] <apw> cjwatson, yeah same here, same place same channel tomorrow for the next thrilling installment
[19:32] <apw> cjwatson, of course it just exploding may be indicative that the memory scratch is pointing to is not a good place
[19:34] <ohsix> yea nothing from the rtc thing, can i assume waking up didn't fail? something after waking up is locking it up?
[19:37] <apw> ohsix, very very hard to tell in all honesty
[19:37] <ohsix> hrmph well i need it fixed, haven't got any work done most of the month :|
[19:38] <apw> have you tried logging into the machine remotely to see if it is alive?
[19:39] <ohsix> if the keyboard lights and stuff don't work when i hit caps lock and the drive doesn't make any noise, it's already dead, no? i can try that though
[19:56] <ohsix> not on the network after unsuspend
[19:56] <matahari> apw: now - after 15minutes strace is still hanging at that line.... :-( Do you have any ideas what i can do further?
[20:10] <ohsix> [    0.612174] acpi device:02: hash matches
[20:22] <ohsix> i'll have to dig some more later; i dunno why that was in dmesg actually, bbl
[20:36] <matahari> apw: well, i'll try to reboot now - let's see what happens...
[20:45] <kees> is upstream bugzilla down?
[20:45] <kees> ooh, back now
[21:28] <lamont> oh mighty kernel denizens: someone got a minute to 'splain md*/stripe_cache_size to me and save me digging in the source later?  (If you're gonna dig, don't bother - I'm just trying to save myself the reading effort)