#ubuntu-ports 2006-04-15
<braddr_> g'evening.
<fabbione> hey brad
<fabbione> that error message is really weird
* braddr_ nods.
<fabbione> do you have a stock T2000 or did you add pci cards?
<braddr_> stock
<fabbione> same here
<braddr_> and solaris boots fine
<fabbione> ok.. just one sec..
<braddr_> it's been over 10 years since I played with solaris though.. so I'm so lost in it. :)
<fabbione> don't worry :)
<fabbione> can you boot in solaris in the meanwhile?
<braddr_> if you have something I should look at, sure.
<fabbione> we need to figure out what is different on your box from the one david and I have
<braddr_> righto.. booting it up
<braddr_> well, once the extremely long POST finishes.
<fabbione> yeah eheh
<fabbione> at least i hope you are not sitting 3 feet from it
<fabbione> because it's so damn noisy it's killing me
<braddr_> oh, I am.
<fabbione> you have ALL my understanding
<braddr_> I have the rails in the rack, but I gotta move another box before I can actually rack this beast.
<fabbione> once you are in solaris, can you please slam somewhere the output of prtconf -v -p
<braddr_> roger
<fabbione> i have it rack mounted..
<fabbione> but there aren't that many people with a rack at home
<fabbione> really..
<braddr_> I'm rather tempted to yank 2 of the 3 front fans.
<braddr_> count me among the unusual.
<fabbione> lol
<braddr_> but this box is easily the loudest of the machines I have.
<fabbione> even when i boot up the entire SAN is less noisy!
<braddr_> ok.. booted.  lemee grab that info
<braddr_> I can give you an account if you'd like.. it's on the public net.
<fabbione> let's see if we really need it first
<braddr_> the sc-net part isn't though
* braddr_ twiddles his thumbs.. shouldn't have done this over the 9600 serial connection
<fabbione> meh
<fabbione> it's easier to pipe it to a file
<fabbione> :)
<braddr_> oh sure, point out me being stupid.. duh.
<braddr_> ok.. it's at the same url, prtconf.txt
<fabbione> thanks
<braddr_> hrm.. only 16 cpu nodes?  This was supposed to be an 8 core model.
<fabbione> mine was supposed to be 8 cores too and i got 6
<fabbione> with 16 cpus you have 4 cores
* braddr_ grumbles
<braddr_> I'll send a nastygram to my sun contact tomorrow.
<braddr_> another one.  They also didn't send the serial management cable.. had to... improvise.
<fabbione> ah
<fabbione> i had 2
<braddr_> last sun box I had didn't _have_ the nifty management stuff.
<fabbione> eheh
<fabbione> i am still checking the output
<fabbione> i am not as fast as david here :)
<braddr_> s'ok.. I'm not in any hurry.
<braddr_> I wish I was able to figure this stuff out on my own.. not used to being this handicapped.
<fabbione> don't worry
<fabbione> the only real difference i can see is that on your machine there is nvramrc in use
<fabbione> +        use-nvramrc?:  'true'
<fabbione> could you go back to the OBP and do:
<fabbione> setenv use-nvramrc? false
<fabbione> or
<fabbione> setenv use-nvramrc?=false
<fabbione> i can never remember
<braddr_> not sure why that'd be. I didn't do any openboot config changes.  Sure.
<braddr_> the former
<fabbione> it was turned off on mine
<fabbione> but it's the only immediate diff i can see
<braddr_> hrm.. resume isn't getting me back into the os.
<fabbione> go
<braddr_> oh, right.
<fabbione> ok> go
<braddr_> ok.. it shows as false now.
<braddr_> ... in prtconf
<fabbione> perfect
<fabbione> now you could try to netboot?
<braddr_> reboot and.. roger.
<fabbione> thanks
<fabbione> i am trying to reproduce it the other way around here in the meantime
* braddr_ nods.
<braddr_> btw, if you have a set, my range earmuffs do a _great_ job of cancelling the noise of those fans.
<fabbione> eheh
<fabbione> i use headphone set + metallica at 200% of the volume
<braddr_> damaging your hearing further.
<fabbione> with this treatment i can barely hear my own brain
<fabbione> ;)
<braddr_> ok.. it's into the boot sequence now.. at 9600
<fabbione> i feel your pain
<braddr_> no difference.
<fabbione> ok we will need to wait for david
<fabbione> he doesn't irc much..
<fabbione> but i already sent him the info
<braddr_> yeah.. kinda a popular guy
<braddr_> he'd get swarmed.
<fabbione> yeah i know
<braddr_> based on that stack trace, that's probably the first disk access of some sort, no?
<fabbione> there is no disk access at that time
<fabbione> it's loading from the initramfs
<fabbione> or ramdisk
<fabbione> but it loads fine here
<braddr_> [   17.001666]  checking if image is initramfs... it is
<braddr_> [    3.293044]  Freeing initrd memory: 3832k freed
<fabbione> yeps
<fabbione> at that point where it fails it is starting up the installer
<fabbione> it seems we found one relevant difference
<fabbione> you have 8 cpus
<fabbione> sorry 16
<fabbione> i have 24
<fabbione> david 32
<braddr_> though this seems to be a non-smp kernel, since it only inits one
<fabbione> the sparc kernel is still able to probe the amount of CPU's installed on the system
<fabbione> the diff is that UP inits only 1
* braddr_ nods.
<fabbione> SMP enables the ones you ask for :)
<braddr_> right.. just pointing out another potential difference.. david's not likely booting a non-smp kernel.
<fabbione> he did :)
<braddr_> I'm sure he _has_, but not nearly as much as smp.
<braddr_> is there an older image that'd be worth trying?
<braddr_> david's email hints that there was/is
<fabbione> yes just one second that i am collecting info for david
<braddr_> okey
<fabbione> http://ports.ubuntu.com/ubuntu-ports/dists/dapper/main/installer-sparc/20051026ubuntu26/images/sparc64/netboot/2.6/
<fabbione> you can try this one
<fabbione> but i can't guarantee it's "old" enough or that it will work
<braddr_> oh, no worries, just looking to broaden the facts on hand
<braddr_> no change
<braddr_> ... other than the version number of the build
<fabbione> ok
<fabbione> we are looking at it
<braddr_> can I bring you a coke.. maybe a pizza? :)
<fabbione> i just woke up.. coffee would do :P
<braddr_> hrm.. gonna guess the round trip might be kinda longish.
<fabbione> ehhe
<braddr_> for the sake of completeness, the hex number at the end is slightly different: Error at TPC[4ca2bc]  with -20, and 4c9f9c with -19
<braddr_> probably meaningless
<fabbione> ok thanks
<fabbione> i need a few minutes to upgrade my box and do some debugging...
* braddr_ nods.
<fabbione> are you using tftp booting i assume?
<braddr_> you and dave talking via some im network?  I'd love to observe just to soak up a bit of background.
<fabbione> braddr_: IM
<fabbione> braddr_: can you boot with:
<fabbione> boot net max_cpus=1
<fabbione> ?
<fabbione> if it errors the same way we have some clues :)
<braddr_> on it.
<fabbione> thanks
<braddr_> I can easily move the sc-net port to a public ip address, too, if that'd help.
<fabbione> nah
<braddr_> this boot is with -19 still, whoops.
<fabbione> it's ok
<fabbione> eheh
<fabbione> -20- is better
<braddr_> close enough?
<braddr_> ok.. I'll redo
<braddr_> much different this time
<fabbione> is it?
<braddr_> I'm at the choose a language part of the installer
<fabbione> AH
<braddr_> with -19
<fabbione> can you try with -20- please?
<braddr_> it finished while I was moving the images around
<fabbione> ok
<braddr_> hrm.. having trouble getting back to openboot
<fabbione> telnet to the alom
<fabbione> and do a break
<fabbione> you can login multiple times to alom
<braddr_> thanks.
<braddr_> does the linux kernel disable sending break to get to openboot?
<fabbione> no afaik
<fabbione> at sc> reset 
<braddr_> odd.. with -20 it looks like the same failure.. lemee scroll up to make sure I booted it correctly
<braddr_> Kernel command line: max_cpus=1
<braddr_> so.. -19 works, but -20 doesn't.
<braddr_> ... with 1 cpu.  neither does with all of 'em.
<fabbione> ok we are checking some stuff..
* braddr_ looks for his jeopardy theme music
<fabbione> i suggest to try to reproduce that it works on -19- and fails on -20-.. but can you please poweroff in between?
<fabbione> with max_cpus=1
<braddr_> let me go back to -19 w/o a power off.  Doing that introduced a several minute wait cycle.
<fabbione> yes waiting is not an issue
<fabbione> i want to know if the boot with -19- is reproducible
<braddr_> ok, then a powercycle then boot net w/ -19
<fabbione> exactly
* braddr_ starts a timer.. I'm curious just how long it really takes.
<braddr_> 12:49:10am start
<fabbione> ehhe
<braddr_> ok.. I just checked, I've got an 11am meeting at work tomorrow, so I'm gonna put a cap of 2 hours on this debugging before I'll have to bug out and get some sleep.
<fabbione> ok 
<braddr_> on the other hand, that could change depending on how close we are. :)
<fabbione> even if we get to narrow down the problem, it will take at least a few hours to get the fix in the kernel (assuming it is a kernel issue) and propagate it to an installer
<fabbione> tho i could build a custom one for you
* braddr_ nods.
<braddr_> assuming -19's successful boot is reproducible, that ought to give a fairly large hint I assume.  and I also assume we won't know for sure that the problem is fixed w/o a test kernel/image.
<fabbione> yes to both
<fabbione> we know what changed between 19 and 20
<fabbione> but there is still the rootcause of the max_cpus=1
<fabbione> especially because what you are booting is a UP kernel
<fabbione> and theoretically max_cpus=1 has no meaning
<braddr_> ok.. booting now
<braddr_> Requesting Internet Address for 0:14:4f:f:10:60
<braddr_> ERROR: /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2: Last Trap: Fast Instruction Access MMU Miss
<braddr_> [Exception handlers interrupted, please file a bug] 
<braddr_> [type 'resume' to attempt a normal recovery] 
<braddr_> {0} ok resume
<braddr_> Error -256 
<braddr_> odd.
* braddr_ tries the net boot again.
<fabbione> did you break while it was already booting?
<braddr_> sorta.. I hit break after the no keyboard part of the early boot
<fabbione> ok
<fabbione> for your own sanity set:
<braddr_> so that's sorta 'normal'?
<fabbione> setenv auto-boot? false :)
<fabbione> yes
<braddr_> it's into the boot sequence now.. just a second more
<fabbione> sure
<braddr_> damn.. same __lookup_hash stacktrace
<fabbione> ok thanks
<braddr_> I might be tired, but I know I saw it boot once. :)
<fabbione> want to recheck the boot parameters?
<fabbione> yes i know.. it might be "lucky"
<braddr_> already did.. lemee paste
<braddr_> Kernel command line: max_cpus=1
<fabbione> ok
<fabbione> stay tuned.. :)
<braddr_> and just checked, it's definitely the -19 image
<braddr_> gonna set the auto-boot part for next time.
<braddr_> ok.. and it's a 10min boot cycle. :)
<fabbione> braddr_: i have some good news and some bad news...
<fabbione> which one do you want first?
<braddr_> how about in that order
<fabbione> the good news is that David has a pretty good idea of what is going on.
<braddr_> and the bad news is that he wants sleep
<fabbione> the bad news is that we will have to wait until mid-week because he is busy pushing some .17 stuff to Linus
<fabbione> he doesn't sleep.. we don't usually allow him...
<braddr_> likely story. :P
<braddr_> ok.. well, I'll tinker with solaris for a while then.
<fabbione> anyway it looks like that there is some heavy memory corruption problem
<braddr_> and see if I can convince the sun weenies to ship me the box I asked for.
<fabbione> that happens early in the boot
<fabbione> way before where we see the OOPS
* braddr_ nods.
<fabbione> the strange things are:
<fabbione> - we don't see it with 24/32 cpu's
<fabbione>   so it might be specific to the 16 cpu's one
<fabbione> perhaps a bit that overflows or something like that... go figure
<fabbione> so that machine is gold now
<braddr_> well, it's not going anywhere.
<fabbione> perfect
<braddr_> I'll see if I can find a reliable pattern to get it past the bootup problem, since it did work once.
<fabbione> i am afraid that it was more a lucky boot than something you can actually reproduce
<braddr_> just got 'lucky' again.
<fabbione> with -20- ?
<braddr_> no, -19
<braddr_> and with max_cpus=1
<fabbione> and the weird thing is that max_cpus is useless.... go figure...
* braddr_ tries it again.. I wanna see if that first boot is bad but the rest aren't
<fabbione> can you try -19- without max_cpus?
<braddr_> after this round, sure.
<fabbione> ok thanks
<braddr_> failed this cycle (19 w/ max_cpus=1)
<braddr_> trying w/o max_cpus
<fabbione> ok
<braddr_> failed
<fabbione> i think there is no real pattern..
* braddr_ nods.
<fabbione> it's just memory corruption happening for some odd reasons
<fabbione> there might be 2 reasons:
<fabbione> - hw is buggy
<braddr_> is there any reason to believe that on the boots that it gets into user space that it'll be stable?
<fabbione> - the specs we got are not complete and don't cover the 16 CPU's case properly
<braddr_> assuming non-buggy hw
<fabbione> it fails opening a file in /proc
<fabbione> so it either contains garbage
<fabbione> or /proc is busted
<fabbione> perhaps it works once it boots, but well that won't help you much
<fabbione> because you might install and not be able to boot after
* braddr_ nods.
<braddr_> but it'd progress.
<fabbione> not really..
<braddr_> I've only got 60 days, if the hype is to be believed.
<fabbione> yes we will fix it WAY before that
<braddr_> unless it's hardware. :P
<fabbione> because by that time Dapper must be stable and released :)
<fabbione> well clearly..
<braddr_> is niagara hardware a showstopper for dapper?
<fabbione> nobody at SUN has been reporting such case... that's why HW failure is somehow floating in my mind
<fabbione> no
<braddr_> didn't think so.
<fabbione> sparc is not supported by Ubuntu
<fabbione> but if we can get to release at the same time, the better :)
<braddr_> right.. so why the confidence that it'll be fixed before then?
<fabbione> because i am a nasty bitch
<braddr_> well, right.
<fabbione> i want it fixed before that
<braddr_> determination and pigheadedness can go a long way.
<fabbione> trust me.. i can bitch and nag people enough to make them cry like little babies :P
<braddr_> I know the type.. I have to do a lot of that sorta project management at work
<fabbione> eheh
<braddr_> though rarely does anyone have to actually cry
<fabbione> :)
<braddr_> anything else I or we can do until later in the week when david frees up?
<fabbione> once he frees up, we will just have to be ready to test what he asks
<fabbione> see another major issue is that we have "only" 3 machines in 3 different setups
<fabbione> that doesn't make it easier to isolate
<braddr_> agreed
<braddr_> and unlike x86 hardware, can't just yank a cpu or 4 to normalize the configs. :)
<fabbione> eheh
<braddr_> I'm really looking forward to putting this hardware through its paces.  I'll be using it to do highly parallel compiles of gcc (with gdc, the d language frontend) and its test suite of 10's of thousands of nice independent tests.
<fabbione> eheheh
<braddr_> pretty much the perfect hardware for this sort of thing.
<fabbione> if it doesn't use fpu yes
<braddr_> well, some tests do, but most don't and the compiler certainly doesn't need it
<fabbione> yeps...
<fabbione> we did push all optimizations in for gcc already
<fabbione> and glibc
<fabbione> so it should be quite fast
<fabbione> assuming we can get it to boot :P
<braddr_> I assume only in gcc trunk though, right?
<braddr_> I haven't tried getting the d frontend moved anywhere past 4.0.x
<fabbione> we did backport all the optimization into ubuntu gcc as well
<fabbione> so if you use the gcc4 shipped by default it will work
* braddr_ nods.
<braddr_> I'm less concerned about the actual generated code speed, though faster is better.
<braddr_> anything will be faster than the old athlon box I've been using.
<fabbione> ehhe
<fabbione> i can cut the kernel in 1 minute and 20 secs here
<fabbione> with 24 CPU's
<fabbione> and minimal config of course (but still bootable)
<braddr_> the test suite for D takes about 8-10 hours on the athlon
<fabbione> not too bad
<braddr_> kinda high for iterative development, but a decent overnight check it tomorrow run.
<braddr_> 15-20 minutes would be better.
<fabbione> well..
<fabbione> like we say in Italy: "You can't have a drunk wife and the jar full of wine"
<braddr_> just need 2 jars.
<fabbione> ahah
<braddr_> a little creativity will solve most problems.
<fabbione> sometimes it does
<braddr_> hey, is there a way to disable cpu's in openboot or sc?  Maybe dropping it down to 8 or 4 would prove interesting?
<fabbione> nope
<fabbione> it's hardcoded in the CPU
<braddr_> oh well, worth a shot.
<fabbione> yeah
<fabbione> time for breakfast and more coffee :)
<fabbione> brb
<braddr_> I've yet to get -20 to boot ever, but -19 boots periodically.
<braddr_> no pattern, but I tried a lot with -20 and never.
<fabbione> probably the fixes in -20- can trigger the error constantly
<braddr_> hey.. I caught an oops on this 'successful' boot
<braddr_> http://www.puremagic.com/~braddr/t2000/oops-1.txt
<fabbione> checking..
<fabbione> was that with -19- ?
<braddr_> yes.. but double checking to be sure.
<fabbione> ok
<braddr_> yup
<braddr_> and looks like the install has stalled after the language screen
<braddr_> not super surprising.
<fabbione> yes most likely
<fabbione> udev is generating a crash
<fabbione> probably parsing /sysfs
<braddr_> right, well, with the installer stalling, I'm gonna give up on the box for tonight
<fabbione> yes.. get some good rest and get ready for some heavy debugging soon :)
<braddr_> it seems silly to have a 2 middleman process between david and the hardware
<fabbione> nah it's ok.. because i will need to push the changes in the Ubuntu kernel as well
<fabbione> and trigger all the other bits to have the installer etc. etc.
<braddr_> sure, but not until after it's fixed.
<fabbione> it might be an Ubuntu specific bug..
<braddr_> could be
<braddr_> well, you've got my email address, and I'll hang out here during the usa/pacific evening/night hours.
<braddr_> during work hours my availability is less predictable, but usually I can make myself available
<fabbione> i will try to ping you with enough notice
#ubuntu-ports 2006-04-16
* braddr_ drops a pin.  It's way too quiet here.
* fabbione yawns
* braddr_ waves.
<fabbione> hey braddr_ 
#ubuntu-ports 2007-04-14
<joejaxx> :)
<joejaxx> hello daq4th 
#ubuntu-ports 2008-04-12
<jbailey> lamont: Around?
<lamont> am now
<lamont> wow.  and only 3 minutes of separation
<jbailey> Well, I try. =)
<jbailey> The ia64 box is finally back up and running - Is there any use in having Ubuntu on it?
<jbailey> It's got Debian sid from about a year ago on it now.
<lamont> well, if you wanted to burn an alternate dvd and test the install, that'd be _wonderful_...
* lamont has been busy
<jbailey> It's a remote box, so I can't do install testing.
<lamont> ah, well then.
* lamont hasn't tested hardy at all, other than far enough to know that do-release-upgrade -d makes seemingly sane decisions about what to do going gutsy->hardy
<jbailey> do-release-upgrade?
<jbailey> Wassat?
<lamont> iz ubuntu-speak for "do what it takes" 
<lamont> it does the magic of dapper->hardy, or gutsy->hardy, or $N->$N+1
<jbailey> Ah interesting.
<jbailey> So apt-get dist-upgrade is officially the Wrong Thing now?
<lamont> jbailey: since dapper->edgy, iirc
<lamont> or was it breezy->dapper... 
<lamont> can't remember
<lamont> update-manager is the gui version
<lamont> do-release-upgrade is server-ish
<jbailey> Weird.
<jbailey> I also just use apt-get for everything.
<jbailey> And I tend to assume that as long as I have the main meta packages installed that everything will be alright.
<lamont> mostly
<lamont> edgy or feisty is when evms went away, and if you didn't know to nuke it, (and had raid???) then there was no love.
<jbailey> Right.
<jbailey> Why is volume management on Linux so doomed?
<lamont> and then you booted from a livecd and fix0red it and rebuilt the initramfs and then there was love again
<jbailey> I'm still pining after NetWare.
<lamont> for more fun, gutsy or maybe feisty dropped raid support from the livecd -> more pain if you did that
* lamont did that too much
<jbailey> So far, that's the only OS that has gotten it right.
<andrewks> alternate dvd for Hardy?
* andrewks is the hoster of said remote ia64 box
#ubuntu-ports 2008-04-13
<hklv> I'd like to create a port of Ubuntu for the XO (of the OLPC project), with specific geode-lx optimizations. Where should I start?
#ubuntu-ports 2009-04-06
<NCommander> well
<NCommander> That's new
<NCommander> I got a +5 Informative mod on Slashdot by saying Ubuntu has an HPPA port
#ubuntu-ports 2009-04-08
<pschulz01> Hello all.. I am having an interesting issue with my powerpc (MacMini) and jaunty.
<pschulz01> I'm trying to upgrade everything.. I'm using it as a workstation.. 
<pschulz01> When I remove gdm, the font on the virtual terminals goes screwy.. looks like an endian issue as the characters are almost readable.
<pschulz01> I put gdm back in.. and it comes good again.
<NCommander> pschulz01, what are you trying to do specifically?
<pschulz01> NCommander: Looks like it might have 'fixed' itself.
<pschulz01> NC
<pschulz01> NCommander: I'm upgrading to jaunty.. but the box has seen a lot of customisations..
<pschulz01> NCommander: Really just want to get the desktop up so I can use it as a terminal.
<pschulz01> NCommander: I just uninstalled (purge) x11-common.. and a whole lot of packages suddenly wanted to be installed.
<pschulz01> NCommander: It's getting better :-)
<pschulz02> ,gk
