/srv/irclogs.ubuntu.com/2006/04/15/#ubuntu-ports.txt

=== fabbione [i=fabbione@gordian.fabbione.net] has joined #ubuntu-ports
=== braddr_ [n=braddr@bellevue.puremagic.com] has joined #ubuntu-ports
braddr_g'evening.08:46
fabbionehey brad08:46
fabbionethat error message is really weird08:46
=== braddr_ nods.
fabbionedo you have a stock T2000 or did you add pci cards?08:47
braddr_stock08:47
fabbionesame here08:47
braddr_and solaris boots fine08:47
fabbioneok.. just one sec..08:48
braddr_it's been over 10 years since I played with solaris though.. so I'm so lost in it. :)08:48
fabbionedon't worry :)08:48
fabbionecan you boot in solaris in the meanwhile?08:49
braddr_if you have something I should look at, sure.08:49
fabbionewe need to figure out what is different on your box from the one david and I have08:49
braddr_righto.. booting it up08:49
braddr_well, once the extremely long POST finishes.08:49
fabbioneyeah eheh08:50
fabbioneat least i hope you are not sitting 3 feet from it08:50
fabbionebecause it's so damn noisy it's killing me08:50
braddr_oh, I am.08:50
fabbioneyou have ALL my understanding08:50
braddr_I have the rails in the rack, but I gotta do move another box before I can actually rack this beast.08:51
fabbioneonce you are in solaris, can you please slam somewhere the output of prtconf -v -p08:51
braddr_roger08:51
fabbionei have it rack mounted..08:51
fabbionebut there aren't that many people with a rack at home08:51
fabbionereally..08:51
braddr_I'm rather tempted to yank 2 of the 3 front fans.08:51
braddr_count me among the unusual.08:52
fabbionelol08:52
braddr_but this box is easily the loudest of the machines I have.08:52
fabbioneeven when i boot up the entire SAN is less noisy!08:52
braddr_ok.. booted.  lemee grab that info08:52
braddr_I can give you an account if you'd like.. it's on the public net.08:53
fabbionelet see if we really need it first08:53
braddr_the sc-net part isn't though08:53
=== braddr_ twiddles his thumbs.. shouldn't have done this over the 9600 serial connection
fabbionemeh08:55
fabbioneit's easier to pipe it to a file08:55
fabbione:)08:55
braddr_oh sure, point out me being stupid.. duh.08:56
braddr_ok.. it's at the same url, prtconf.txt08:56
fabbionethanks08:57
braddr_hrm.. only 16 cpu nodes?  This was supposed to be an 8 core model.08:57
fabbionemine was supposed to be 8 cores too and i got 608:58
fabbione16 cpus you have a  4 cores08:58
=== braddr_ grumbles
braddr_I'll send a nastygram to my sun contact tomorrow.08:59
braddr_another one.  They also didn't send the serial management cable.. had to... improvise.09:00
fabbioneah09:01
fabbionei had 209:01
braddr_last sun box I had didn't _have_ the nifty management stuff.09:01
fabbioneeheh09:01
fabbionei am still checking the output09:02
fabbionei am not as fast as david here :)09:02
braddr_s'ok.. I'm not in any hurry.09:02
braddr_I wish I was able to figure this stuff out on my own.. not used to being this handicapped.09:02
fabbionedon't worry09:03
fabbionethe only real difference i can see is that on your machine there is nvramrc in use09:03
fabbione+        use-nvramrc?:  'true'09:03
fabbionecould you go back to the OBP and do:09:03
fabbionesetenv use-nvramrc? false09:04
fabbioneor09:04
fabbionesetenv use-nvramrc?=false09:04
fabbionei can never remember09:04
braddr_not sure why that'd be. I didn't do any openboot config changes.  Sure.09:04
braddr_the former09:04
fabbioneit was turned off on mine09:04
fabbionebut it's the only immediate diff i can see09:05
braddr_hrm.. resume isn't getting me back into the os.09:05
fabbionego09:05
braddr_oh, right.09:05
fabbioneok> go09:06
braddr_ok.. it shows as false now.09:06
braddr_... in prtconf09:06
fabbioneperfect09:06
fabbionenow you could try to netboot?09:06
braddr_reboot and.. roger.09:06
fabbionethanks09:06
fabbionei am trying to reproduce it the other way around here in the meantime09:07
=== braddr_ nods.
braddr_btw, if you have a set, my range earmuffs do a _great_ job of cancelling the noise of those fans.09:07
fabbioneeheh09:07
fabbionei use headphone set + metallica at 200% of the volume09:08
braddr_damaging your hearing further.09:08
fabbionewith this treatment i can barely hear my own brain09:08
fabbione;)09:08
braddr_ok.. it's into the boot sequence now.. at 960009:08
fabbionei feel your pain09:09
braddr_no difference.09:09
fabbioneok we will need to wait for david09:09
fabbionehe doesn't irc much..09:09
fabbionebut i already sent him the info09:09
braddr_yeah.. kinda a popular guy09:09
braddr_he'd get swarmed.09:09
fabbioneyeah i know09:10
braddr_based on that stack trace, that's probably the first disk access of some sort, no?09:12
fabbionethere is no disk access at that time09:14
fabbioneit's loading from the initramfs09:14
fabbioneor ramdisk09:14
fabbionebut it loads fine here09:14
braddr_[   17.001666]  checking if image is initramfs... it is                               09:14
braddr_[    3.293044]  Freeing initrd memory: 3832k freed                                    09:14
fabbioneyeps09:15
fabbioneat that point where it fails it is starting up the installer09:15
fabbioneit seems we found one relevant difference09:16
fabbioneyou have 8 cpus09:16
fabbionesorry 1609:16
fabbionei have 2409:16
fabbionedavid 3209:16
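The CPU counts above (16, 24, 32) are hardware threads, not cores, which is why a 16-CPU box is "only" a 4-core machine. A minimal sketch of that arithmetic, assuming the UltraSPARC T1 in the T2000 exposes 4 hardware threads per core and that each thread appears as one cpu node in prtconf and in Linux; the helper name cores_from_cpu_nodes is hypothetical.

```python
# Assumption: UltraSPARC T1 ("Niagara") runs 4 hardware threads per core,
# and each thread is reported by the OS as a separate cpu node.
THREADS_PER_CORE = 4


def cores_from_cpu_nodes(cpu_nodes: int) -> int:
    """Convert a reported cpu-node count into physical cores (hypothetical helper)."""
    if cpu_nodes % THREADS_PER_CORE != 0:
        raise ValueError(f"{cpu_nodes} is not a multiple of {THREADS_PER_CORE}")
    return cpu_nodes // THREADS_PER_CORE


# cpu-node counts mentioned in this session
machines = {"braddr_": 16, "fabbione": 24, "david": 32}
for owner, nodes in machines.items():
    print(f"{owner}: {nodes} cpu nodes -> {cores_from_cpu_nodes(nodes)} cores")
```

This gives 4, 6 and 8 cores, matching the "4 cores", "i got 6" and "8 core" remarks earlier in the log.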
braddr_though this seems to be a non-smp kernel, since it only inits one09:16
fabbionethe sparc kernel is still able to probe the amount of CPU's installed on the system09:16
fabbionethe diff is that UP init only 109:17
=== braddr_ nods.
fabbioneSMP enables the ones you ask for :)09:17
braddr_right.. just pointing out another potential difference.. david's not likely booting a non-smp kernel.09:17
fabbionehe did :)09:18
braddr_I'm sure he _has_, but not nearly as much as smp.09:18
braddr_is there an older image that'd be worth trying?09:20
braddr_david's email hints that there was/is09:20
fabbioneyes just one second that i am collecting info for david09:20
braddr_okey09:21
fabbionehttp://ports.ubuntu.com/ubuntu-ports/dists/dapper/main/installer-sparc/20051026ubuntu26/images/sparc64/netboot/2.6/09:22
fabbioneyou can try this one09:22
fabbionebut i don't ensure it's "old" enough or that will work09:22
braddr_oh, no worries, just looking to broaden the facts on hand09:23
braddr_no change09:28
braddr_... other than the version number of the build09:28
fabbioneok09:28
fabbionewe are looking at it09:28
braddr_can I bring you a coke.. maybe a pizza? :)09:29
fabbionei just woke up.. coffee would do :P09:29
braddr_hrm.. gonna guess the round trip might be kinda longish.09:29
fabbioneehhe09:29
braddr_for the sake of completeness, the hex number at the end is slightly different: Error at TPC[4ca2bc]  with -20, and 4c9f9c with -1909:31
braddr_probably meaningless09:31
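For scale, the two trap PCs differ by only a few hundred bytes. A minimal sketch of the arithmetic, assuming both values are kernel text addresses of the same faulting code and that SPARC instructions are a fixed 4 bytes wide; reading the gap as a code-layout shift between the -19 and -20 builds is an assumption, not something established in the log.

```python
# Trap PCs reported for the same failure on the two installer builds.
TPC_BUILD_20 = 0x4CA2BC
TPC_BUILD_19 = 0x4C9F9C

delta = TPC_BUILD_20 - TPC_BUILD_19
SPARC_INSN_BYTES = 4  # SPARC uses fixed-width 32-bit instructions

print(f"delta = {delta:#x} ({delta} bytes, {delta // SPARC_INSN_BYTES} instructions)")
```

The gap is 0x320 bytes, roughly 200 instructions, which fits a small layout change between builds rather than a crash in unrelated code.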
fabbioneok thanks09:32
fabbionei need a few minutes to upgrade my box and do some debugging...09:33
=== braddr_ nods.
fabbioneare you using tftp booting i assume?09:34
braddr_you and dave talking via some im network?  I'd love to observe just to soak up a bit of background.09:35
fabbionebraddr_: IM09:35
fabbionebraddr_: can you boot with:09:35
fabbioneboot net max_cpus=109:36
fabbione?09:36
fabbioneif it errors the same way we have some clues :)09:36
braddr_on it.09:36
fabbionethanks09:36
braddr_I can easily move the sc-net port to a public ip address, too, if that'd help.09:37
fabbionenah09:37
braddr_this boot is with -19 still, whoops.09:37
fabbioneit's ok09:37
fabbioneeheh09:37
fabbione-20- is better09:37
braddr_close enough?09:37
braddr_ok.. I'll redo09:37
braddr_much different this time09:38
fabbioneis it?09:38
braddr_I'm at the choose a language part of the installer09:38
fabbioneAH09:39
braddr_with -1909:39
fabbionecan you try with -20- please?09:39
braddr_it finished while I was moving the images around09:39
fabbioneok09:39
braddr_hrm.. having trouble getting back to openboot09:40
fabbionetelnet to the alom09:40
fabbioneand do a break09:40
fabbioneyou can login multiple times to alom09:41
braddr_thanks.09:41
braddr_does the linux kernel disable sending break to get to openboot?09:42
fabbioneno afaik09:42
fabbioneat sc> reset 09:43
braddr_odd.. with -20 it looks like the same failure.. lemee scroll up to make sure I booted it correctly09:43
braddr_Kernel command line: max_cpus=109:44
braddr_so.. -19 works, but -20 doesn't.09:44
braddr_... with 1 cpu.  neither does with all of 'em.09:45
fabbioneok we are checking some stuff..09:45
=== braddr_ looks for his jeopardy theme music
fabbionei suggest to try to reproduce that it works on -19- and fails on -20-.. but can you please poweroff in between?09:47
fabbionewith max_cpus=109:47
braddr_let me go back to -19 w/o a power off.  Doing that introduced a several minute wait cycle.09:47
fabbioneyes waiting is not an issue09:48
fabbionei want to know if the boot with -19- is reproducible09:48
braddr_ok, then a powercycle then boot net w/ -1909:48
fabbioneexactly09:49
=== braddr_ starts a timer.. I'm curious just how long it really takes.
braddr_12:49:10am start09:49
fabbioneehhe09:49
braddr_ok.. I just checked, I've got an 11am meeting at work tomorrow, so I'm gonna put a cap of 2 hours on this debugging before I'll have to bug out and get some sleep.09:51
fabbioneok 09:51
braddr_on the other hand, that could change depending on how close we are. :)09:52
fabbioneeven if we get to narrow down the problem. it will take at least a few hours to get the fix in the kernel (assuming it is a kernel issue) and propagate it to an installer09:53
fabbionetho i could build a custom one for you09:53
=== braddr_ nods.
braddr_assuming -19's successful boot is reproducible, that ought to give a fairly large hint I assume.  and I also assume we won't know for sure that the problem is fixed w/o a test kernel/image.09:54
fabbioneyes to both09:55
fabbionewe know what changed between 19 and 2009:55
fabbionebut there is still the rootcause of the max_cpus=109:56
fabbionespecially because what you are booting is a UP kernel09:56
fabbioneand theoretically max_cpus=1 has no meaning09:56
braddr_ok.. booting now09:56
braddr_Requesting Internet Address for 0:14:4f:f:10:6009:57
braddr_ERROR: /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2: Last Trap: Fast Instruction Access MMU Miss09:57
braddr_[Exception handlers interrupted, please file a bug] 09:57
braddr_[type 'resume' to attempt a normal recovery] 09:57
braddr_{0} ok resume09:57
braddr_Error -256 09:57
braddr_odd.09:57
=== braddr_ tries the net boot again.
fabbionedid you break while it was already booting?09:57
braddr_sorta.. I hit break after the no keyboard part of the early boot09:57
fabbioneok09:57
fabbionefor your own sanity set:09:58
braddr_so that's sorta 'normal'?09:58
fabbionesetenv auto-boot? false :)09:58
fabbioneyes09:58
braddr_it's into the boot sequence now.. just a second more09:58
fabbionesure09:58
braddr_damn.. same __lookup_hash stacktrace09:59
fabbioneok thanks09:59
braddr_I might be tired, but I know I saw it boot once. :)09:59
fabbionewant to recheck the boot parameters?09:59
fabbioneyes i know.. it might be "lucky"09:59
braddr_already did.. lemee paste09:59
braddr_Kernel command line: max_cpus=109:59
fabbioneok09:59
fabbionestay tuned.. :)10:00
braddr_and just checked, it's definitely the -19 image10:00
braddr_gonna set the auto-boot part for next time.10:00
braddr_ok.. and it's a 10min boot cycle. :)10:00
fabbionebraddr_: i have some good news and some bad news...10:14
fabbionewhich one do you want first?10:14
braddr_how about in that order10:14
fabbionethe good news is that David has a pretty good idea of what is going on.10:14
braddr_and the bad news is that he wants sleep10:15
fabbionethe bad news is that we will have to wait mid-week because he is busy to push some .17 stuff to Linus10:15
fabbionehe doesn't sleep.. we don't usually allow him...10:15
braddr_likely story. :P10:15
braddr_ok.. well, I'll tinker with solaris for a while then.10:16
fabbioneanyway it looks like that there is some heavy memory corruption problem10:16
braddr_and see if I can convince the sun weenies to ship me the box I asked for.10:16
fabbionethat happens early in the boot10:16
fabbioneway before where we see the OOPS10:16
=== braddr_ nods.
fabbionethe strange things are:10:16
fabbione- we don't see it with 24/32 cpu's10:16
fabbione  so it might be specific to the 16 cpu's one10:17
fabbioneperhaps a bit that overflows or something like that... go figure10:17
fabbioneso that machine is gold now10:17
braddr_well, it's not going anywhere.10:17
fabbioneperfect10:17
braddr_I'll see if I can find a reliable pattern to get it past the bootup problem, since it did work once.10:18
fabbionei am afraid that it was more a lucky boot than you can actually reproduce10:19
braddr_just got 'lucky' again.10:19
fabbionewith -20- ?10:19
braddr_no, -1910:19
braddr_and with max_cpus=110:20
fabbioneand the weird thing is that max_cpus is useless.... go figure...10:21
=== braddr_ tries it again.. I wanna see if that first boot is bad but the rest aren't
fabbionecan you try -19- without max_cpus?10:21
braddr_after this round, sure.10:21
fabbioneok thanks10:21
braddr_failed this cycle (19 w/ max_cpus=1)10:23
braddr_trying w/o max_cpus10:24
fabbioneok10:25
braddr_failed10:26
fabbionei think there is no real pattern..10:26
=== braddr_ nods.
fabbioneit's just memory corruption happening for some odd reasons10:26
fabbionethere might be 2 reasons:10:26
fabbione- hw is buggy10:26
braddr_is there any reason to believe that on the boots that it gets into user space that it'll be stable?10:27
fabbione- the specs we got are not complete and don't cover the 16 CPU's case properly10:27
braddr_assuming non-buggy hw10:27
fabbioneit fails opening a file in /proc10:27
fabbioneso it either contains garbage10:28
fabbioneor /proc is busted10:28
fabbioneperhaps it works once it boots, but well that won't help you much10:28
fabbionebecause you might install and not being able to boot after10:29
=== braddr_ nods.
braddr_but it'd progress.10:29
fabbionenot really..10:29
braddr_I've only got 60 days, if the hype is to be believed.10:29
fabbioneyes we will fix it WAY before that10:29
braddr_unless it's hardware. :P10:30
fabbionebecause by that time Dapper must be stable and released :)10:30
fabbionewell clearly..10:30
braddr_is niagara hardware a showstopper for dapper?10:30
fabbionenobody at SUN has been reporting such case... that's why HW failure is somehow floating in my mind10:30
fabbioneno10:30
braddr_didn't think so.10:30
fabbionesparc is not supported by Ubuntu10:30
fabbionebut if we can get to release at the same time, the better :)10:31
braddr_right.. so why the confidence that it'll be fixed before then?10:31
fabbionebecause i am a nasty bitch10:31
braddr_well, right.10:31
fabbionei want it fixed before that10:31
braddr_determination and pigheadedness can go a long way.10:31
fabbionetrust me.. i can bitch and nag people enough to make them cry like little babies :P10:32
braddr_I know the type.. I have to do a lot of that sorta project management at work10:32
fabbioneeheh10:32
braddr_though rarely does anyone have to actually cry10:32
fabbione:)10:32
braddr_anything else I or we can do until later in the week when david frees up?10:34
fabbioneonce he frees up, we will just have to be ready to test what he asks10:34
fabbionesee another major issue is that we have "only" 3 machines in 3 different setups10:35
fabbionethat doesn't make it easier to isolate10:35
braddr_agreed10:36
braddr_and unlike x86 hardware, can't just yank a cpu or 4 to normalize the configs. :)10:36
fabbioneeheh10:36
braddr_I'm really looking forward to putting this hardware through its paces.  I'll be using it to do high parallel compiles of gcc (with the gdc, the d language frontend) and its test suite of 10's of thousands of nice independent tests.10:38
fabbioneeheheh10:38
braddr_pretty much the perfect hardware for this sort of thing.10:38
fabbioneif it doesn't use fpu yes10:39
braddr_well, some tests do, but most don't and the compiler certainly doesn't need it10:39
fabbioneyeps...10:39
fabbionewe did push all optimizations in for gcc already10:39
fabbioneand glibc10:39
fabbioneso it should be quite fast10:39
fabbioneassuming we can get it to boot :P10:40
braddr_I assume only in gcc trunk though, right?10:40
braddr_I haven't tried getting the d frontend moved anywhere past 4.0.x10:40
fabbionewe did backport all the optimization into ubuntu gcc as well10:40
fabbioneso if you use the gcc4 shipped by default it will work10:40
=== braddr_ nods.
braddr_I'm less concerned about the actual generated code speed, though faster is better.10:41
braddr_anything will be faster than the old athlon box I've been using.10:41
fabbioneehhe10:41
fabbionei can cut the kernel in 1 minute and 20 secs here10:41
fabbionewith 24 CPU's10:42
fabbioneand minimal config of course (but still bootable)10:42
braddr_the test suite for D takes about 8-10 hours on the athlon10:42
fabbionenot too bad10:43
braddr_kinda high for iterative development, but a decent overnight check it tomorrow run.10:43
braddr_15-20 minutes would be better.10:43
fabbionewell..10:44
fabbionelike we say in Italy: "You can't have a drunk wife and the jar full of wine"10:44
braddr_just need 2 jars.10:44
fabbioneahah10:44
braddr_a little creativity will solve most problems.10:45
fabbionesometimes it does10:45
braddr_hey, is there a way to disable cpu's in openboot or sc?  Maybe dropping it down to 8 or 4 would prove interesting?10:53
fabbionenope10:55
fabbioneit's hardcoded in the CPU10:55
braddr_oh well, worth a shot.10:55
fabbioneyeah10:55
fabbionetime for breakfast and more coffee :)10:56
fabbionebrb10:56
braddr_I've yet to get -20 to boot ever, but -19 boots periodically.11:09
braddr_no pattern, but I tried a lot with -20 and never.11:09
fabbioneprobably the fixes in -20- can trigger the error constantly11:13
braddr_hey.. I caught an oops on this 'successful' boot11:14
braddr_http://www.puremagic.com/~braddr/t2000/oops-1.txt11:15
fabbionechecking..11:18
fabbionewas that with -19- ?11:18
braddr_yes.. but double checking to be sure.11:19
fabbioneok11:19
braddr_yup11:19
braddr_and looks like the install has stalled after the language screen11:20
braddr_not super surprising.11:20
fabbioneyes most likely11:20
fabbioneudev is generating a crash11:20
fabbioneprobably parsing /sysfs11:20
braddr_right, well, with the installer stalling, I'm gonna give up on the box for tonight11:23
fabbioneyes.. get some good rest and get ready for some heavy debugging soon :)11:24
braddr_it seems silly to have a 2 middleman process between david and the hardware11:24
fabbionenah it's ok.. because i will need to push the changes in the Ubuntu kernel as well11:24
fabbioneand trigger all the other bits to have the installer etc. etc.11:25
braddr_sure, but not until after it's fixed.11:25
fabbioneit might be an Ubuntu specific bug..11:25
braddr_could be11:25
braddr_well, you've got my email address, and I'll hang out here during the usa/pacific evening/night hours.11:26
braddr_during work hours my availability is less predictable, but usually I can make myself available11:27
fabbionei will try to ping you with enough notice11:29
=== shinmen [n=shinmen@nat1.inalambrica.net] has joined #ubuntu-ports
=== jb-home [n=jbailey@modemcable139.249-203-24.mc.videotron.ca] has joined #ubuntu-ports
=== shinmen [n=shinmen@nat1.inalambrica.net] has joined #ubuntu-ports
