/srv/irclogs.ubuntu.com/2006/06/11/#ubuntu-ports.txt

braddrsounds like a potential pain, testing wise.12:03
fabbioneyes12:04
fabbioneindeed12:04
fabbionebut the first thing i will do is to test your combination12:04
fabbioneif i can reproduce it, i know what i need to tell people to do :)12:04
fabbionebecause the point is that the UP kernel is well.. UP12:05
fabbioneit doesn't use anything of the SMP code if not to detect how many CPU's are available on the system12:05
fabbioneand that's done parsing the OBP devicetree12:05
fabbionethe test i did on your machine was to disable thread 0 of core 112:06
fabbione(your booting CPU basically)12:06
fabbionebecause your CPU has no core+12:07
fabbionecore0 even12:07
fabbionebut the ALOM and OBP do some magic cpu virtual remapping12:07
fabbioneso even if in reality the code is running on core1 thread0, the kernel sees it as cpu012:07
fabbionethat might be buggy12:07
=== braddr wonders if he has a 6 core model where cpu 0 was flawed somehow so it was disabled and shipped as a 4 core.
fabbionebraddr: all the cpu's are born as 8 cores12:09
braddrah12:09
fabbionethe 6/4 core models are 8 core where 8 - $x core did fail12:09
fabbioneso they get disabled in hw12:09
fabbioneif you do a showcomponent in the ALOM you can see it12:10
fabbionethe first block of MB/CMP0/.. are the core/thread set12:10
fabbionethe second block (8 entries) are the 4 dual bus for memory access12:10
fabbioneMB/CPU0/R0/D1 iirc12:10
=== braddr nods.
fabbione    MB/CMP0/P0 12:11
fabbione <- core0 thread012:11
braddrso, barring remapping, cpu p0-3 and 20-31 were disabled12:11
fabbioneyou don't have that12:11
fabbione    MB/CMP0/CH0/R0/D0 <- memory 12:12
fabbionebasically yes. the 4 cores that did not pass the test have been disabled12:12
fabbionethat's also why (as you can see) there are "holes"12:12
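A minimal sketch of the numbering braddr describes above, assuming four hardware threads per core (so eight cores give virtual CPUs 0-31) and no ALOM/OBP remapping. The function names are hypothetical and only illustrate the arithmetic.

```python
THREADS_PER_CORE = 4  # assumption: 8 cores x 4 threads = 32 virtual CPUs


def core_of(cpu_id: int) -> int:
    """Return the core a virtual CPU id belongs to, barring remapping."""
    return cpu_id // THREADS_PER_CORE


def working_cores(disabled_cpus, total_cores=8):
    """Return the cores that still have at least one enabled thread."""
    disabled = set(disabled_cpus)
    return [
        core
        for core in range(total_cores)
        if any(
            core * THREADS_PER_CORE + thread not in disabled
            for thread in range(THREADS_PER_CORE)
        )
    ]


# braddr's reading: cpus 0-3 and 20-31 are disabled
disabled = list(range(0, 4)) + list(range(20, 32))
print(working_cores(disabled))
```

With that input the surviving cores are 1 through 4, which matches a part sold as a 4 core model with "holes" in the numbering.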
fabbioneone thing that did puzzle me also was the ALOM POST process12:13
braddrso, on your and david's systems, at least one of them has 0-3 disabled?12:14
fabbionein both your machine and my machine i did disable the booting CPU12:14
fabbioneand the POST was still running on it..12:14
fabbioneso that's what triggered a bell12:14
fabbionenope.. on all our machines there is a core0/thread0 working12:14
fabbionedavid has 1x8 and 1x612:15
fabbionei have 1x6 and 2x812:15
fabbionebut all with core012:15
fabbioneyours is "special"12:15
=== braddr waits for the short bus to stop by
fabbioneeheh12:15
fabbionebut we will figure it12:15
fabbioneit might as well be a case we don't handle properly OR a combination of bugs12:16
fabbionewe will see soon hopefully12:16
braddrit'll get sorted out, I have no doubts.12:16
fabbioneyepos12:16
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports
braddrhark.. is that the sound of a jet passing overhead or did someone just turn on my t2000? :P07:59
fabbionemorning braddr :)07:59
braddrg'morning08:01
braddrbrb.. my windows desktop's networking is acting all screwy.. gonna reboot.08:02
=== braddr [n=braddr@209.189.198.126] has left #ubuntu-ports []
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports
braddrwhat's on the testing block for today/tonight?08:08
fabbionedavid has a possible workaround for your problem, but i think he is actually eating to power up brain and brainstorm on a proper solution08:09
fabbionei just woke up and i am going to test another fix on your box08:10
fabbione"another possible fix"08:10
fabbioneboot.img-debian-sparc <- tsk ;)08:16
fabbionedebian doesn't support Niagara :D08:16
braddrhey, fire me. :)08:17
braddrnotice the date on that file?08:18
fabbionenah.. i was just teasing :P08:19
braddrI know08:19
fabbionethese are the moments in which i wish gzip could fork on N cpu's08:30
braddrheh08:30
braddryup.. lots of cpu's just don't help single threaded apps.08:30
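One way to get the effect fabbione wishes for is to compress independent chunks concurrently and concatenate the resulting gzip members, since a multi-member gzip stream is still valid. This is only a sketch, not how gzip itself works: `parallel_gzip`, the chunk size and the worker count are made up for illustration, and the compression ratio is slightly worse than a single stream. Threads are used on the assumption that CPython's zlib releases the GIL while compressing.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and join the gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) member
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # order is preserved
    return b"".join(members)


payload = b"boot.img " * 500_000
packed = parallel_gzip(payload)
print(len(payload), len(packed))
```

The output decompresses with the ordinary `gzip.decompress`, which reads all concatenated members in sequence.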
fabbioneabout to boot the new image.. 08:39
fabbioneyou can go in console read only mode if you like08:40
fabbioneboot.img-fabbione-9 <-08:40
=== braddr crosses his fingers.
fabbioneoh well08:42
braddrkablooie08:42
fabbionecrap08:43
braddrshouldn't need to be that verbose.. just boot net mem=1024k should be enough08:45
fabbioneDOUBLE TYPO08:45
fabbionego fabion!08:45
braddrer.. 1024m08:45
fabbioneit depends on the tftpd you are using08:45
fabbionetftp-hpa doesn't support broadcast08:46
fabbioneso you need to specify the tftp server address08:46
braddrIt's been working fine for me forever. :)08:46
fabbioneit doesn't here and never did08:46
fabbionemight be my anal firewall08:46
fabbioneTHERE WE GO08:48
braddrI wasn't watching, but just saw the screen reset.08:48
fabbionei am at the language selector08:48
braddryup.. I see. :)08:49
fabbioneis there an extra disk i can use to test an install?08:49
fabbionei would like to make sure that it doesn't crash later08:49
=== braddr tries to remember if he did anything interesting with disk1 beyond just do the install back when we had it working.
braddrdrop to a shell and mount the large partition?08:50
braddrI definitely don't have a spare disk, but I'm thinking that it'd be fine to reinstall on disk108:50
fabbioneok.. give me a few minutes to get to the partitioner08:50
fabbioneotherwise we can't mount anything08:50
braddrmight be quicker to just boot that disk. :)08:51
braddron the other hand.. gotta get that far to do an install for real anyway08:51
fabbioneyeah i think i am going to check a couple of more things before we do the install08:52
fabbioneyeps.. ok it's reproducible08:55
fabbionelet's see what's the watermark for the issue to appear08:55
braddrwant me to take control for a sec and check that disk for wipe-ability?08:56
fabbionein a few minutes08:56
fabbioneif that's ok with you08:56
braddrtake your time08:56
fabbionethanks08:56
fabbionethis mem thing opens a completely new frontier :)08:57
braddrthe amount of memory seems to be a factor?  Bizarre.08:57
fabbionei have 8GB too08:57
fabbioneand it works08:57
fabbioneit might be a very complex combination at this point08:58
fabbione2g boots08:59
=== fabbione attempts 4
=== fabbione attempts 8
braddrkablooie09:06
fabbioneyeps09:07
fabbionethat's good..09:07
fabbioneit was expected09:07
braddrif this goes well.. try 8G-1M or -1k even09:10
fabbioneyeah09:10
braddrinteresting09:13
fabbioneyes09:13
braddroh, sure, return to bisecting.09:15
fabbioneyeps09:15
fabbioneit might even be 7G+1m09:16
fabbionehem09:16
=== fabbione hides in a corner
braddrstick with bisecting.. it won't take that many steps to figure out.09:18
fabbioneyeah i know09:18
fabbionei forgot the m at the end09:18
braddrand thus the upper bound has been changed.09:19
braddrer, lower bound09:20
fabbioneyeps09:20
fabbioneexcept i keep forgetting the m :)09:20
braddrwhat's it default to, bytes?09:20
fabbionekb iirc09:21
braddr7680-8191..09:21
fabbioneno worries.. that one is normal because i did interrupt the boot09:22
braddrpick a number, any number.. where will the tip over point land.09:22
fabbione7936m <- 7.75G09:22
braddr7936-8192.. I'll put my money on 7999/8000 for no good reason. :)09:25
fabbione8064 now :)09:26
fabbioneit might even be bad ram09:26
fabbionein UP used for something and it crashes09:26
fabbionein SMP used by some init tables that are not used once init is done09:27
fabbionethat would be ironic09:27
braddrthat would be hilarious and certainly ironic.09:28
braddrI certainly haven't done much that stresses memory to hell and back.09:28
fabbione8064 is good09:29
braddr8064-8191.. so much for my first pointless guess09:29
braddrtime for easier numbers.. 810009:29
fabbionethat would be.... 8064+6409:30
fabbione8128 ?09:30
fabbioneyeah09:30
fabbionei am pretty sure this will work too09:30
fabbionelast 32M :)09:34
braddr8128-8191, running out of midpoints.09:34
fabbionefor your blog -9 is just the dapper kernel from git with security fixes and stuff09:35
fabbione-10 that i am about to boot after the bisect had DEBUG_BOOTMEM on09:35
braddron bellevue: ~braddr/.www/t2000/boots.txt -- I just made it world writable, so you can edit09:36
fabbionenah i am not going to edit www please :)09:38
fabbioneso this one boots too09:38
braddrwell, you've been testing stuff when I'm not around, so it might be worth making notes.  I just made that one file writable09:38
braddrI'll add this block though09:39
fabbioneoh i didn't do much before..09:39
fabbionethis is realtime debugging ;)09:39
fabbionebut i need a short break09:39
braddr-9 git as of what date or id?09:40
braddr-10 is -9 + debug_bootmem?09:41
braddranything to note about -8 and boot.img-dapper-orig?09:41
braddrI think -8 was just current as of that day, but I'm not sure about that09:42
fabbione-8 is old09:43
fabbione-dapper-orig is just the actual dapper kernel.09:43
fabbione-9 is just git as of yesterday09:43
fabbioneno special id09:43
fabbione-10 is -9+DEBUG_BOOTMEM09:43
fabbionei didn't boot -10 yet09:44
=== braddr nods.
fabbionewe are still booting -909:44
braddrI saw.  Where are we with the memory boot ranges?09:44
fabbione817609:45
fabbionebooting now09:45
braddrhighest good ?09:45
fabbionedavem suspects that we are sharing a page with some firmware stuff and everything goes bad09:45
fabbionethat's the bisect09:45
fabbioneand it works09:46
fabbioneso it's lower bound now09:46
braddrokey09:46
fabbioneso next is....09:46
fabbione818209:46
fabbioneno09:46
fabbione818409:46
fabbionethere09:49
braddrsorry.. not paying attention to the boot attempts, updating the blog of what I've been doing with the box and the testing you've been doing.09:50
fabbionebah09:50
fabbione8184 does error09:51
fabbioneand i forgot m again09:51
fabbioneso lower mark 8176 and higher mark is 818409:51
braddra hah.. new top end point!09:51
fabbionenew top 818009:55
fabbione8176 - 818009:55
braddrdown to a 4m range09:55
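The narrowing done by hand above is an ordinary bisection between a known-good and a known-bad mem= value. A minimal sketch follows; `bisect_boundary` and `fake_boot` are hypothetical names, and the oracle is a placeholder that simply pretends anything above 8179m fails. On the real machine each oracle call is a manual netboot.

```python
def bisect_boundary(boots_ok, good, bad, granularity=1):
    """Narrow [good, bad] until the gap is at most `granularity`.

    boots_ok(value) must return True when the machine boots with mem=value.
    `good` must be a value known to boot and `bad` a value known to fail.
    """
    tried = []
    while bad - good > granularity:
        mid = (good + bad) // 2
        tried.append(mid)
        if boots_ok(mid):
            good = mid  # midpoint boots: raise the lower bound
        else:
            bad = mid   # midpoint fails: lower the upper bound
    return good, bad, tried


def fake_boot(mem_mb: int) -> bool:
    """Placeholder oracle (assumption): pretend values above 8179m crash."""
    return mem_mb <= 8179


good, bad, tried = bisect_boundary(fake_boot, good=7680, bad=8192)
print(good, bad, tried)
```

Each step halves the remaining range, so a 512m window needs nine boots to reach a 1m window.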
=== fabbione tests 8178
fabbioneCheck cable and try again09:57
fabbioneLink Down09:57
fabbioneTimed out waiting for Autonegotation to complete09:57
fabbionewhat have you done?09:57
fabbioneit went up again.. brrr09:57
fabbionescary09:57
braddrodd.. nothing, but it retried and made it09:57
=== braddr checks bellevue's dmesg log
braddrnothing interesting on that side09:58
fabbioneok 8178 is good09:59
fabbione8178 - 818009:59
braddrlast boot!09:59
fabbionenope...09:59
fabbionewe need to get down to kb09:59
braddrwell, unless you wanna dip into the k's.09:59
braddrheh09:59
fabbione817909:59
fabbionewell ideally we need to find the page that's making this issue10:00
=== braddr nods.
fabbionea boundary of 4k would do10:00
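For a rough idea of how many more boots a 4k boundary costs, the worst-case step count of a bisection can be computed directly. This is a sketch with a hypothetical helper name; the 1024k and 2048k inputs are simply a 1m and a 2m window expressed in k.

```python
def bisect_steps(range_k: int, granularity_k: int) -> int:
    """Worst-case number of boots needed to shrink a window to the granularity."""
    steps = 0
    while range_k > granularity_k:
        range_k = (range_k + 1) // 2  # worst case: the larger half survives
        steps += 1
    return steps


print(bisect_steps(1024, 4))  # a 1m window down to 4k
print(bisect_steps(2048, 4))  # a 2m window down to 4k
```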
braddrget the page and have the kernel dump the entire thing before the crash and if possible as part of the oops or whatever that die point is10:01
fabbionethe problem is that we can't dump the page10:01
fabbioneas soon as we access that page the hypervisor blocks the system10:02
fabbionethat's the error we are getting10:02
fabbionewe need to understand what is there and why10:02
braddroh10:02
braddrslip the guy a bit of valium, he's obviously a little too high strung.10:02
fabbione8179 is good10:02
fabbioneso now.. some maths :)10:03
fabbione8375296 - 837632010:04
fabbionerange in kb10:04
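The "maths" here is just the m to k conversion of the two bounds, assuming 1m = 1024k. A small sketch (the helper name is hypothetical) that also gives the first midpoint to try:

```python
KIB_PER_MIB = 1024  # assumption: binary units, 1m = 1024k


def mib_to_kib(mib: int) -> int:
    """Convert a mem= value in m to the same size in k."""
    return mib * KIB_PER_MIB


low_k = mib_to_kib(8179)        # last value known to boot
high_k = mib_to_kib(8180)       # first value known to fail
mid_k = (low_k + high_k) // 2   # 8179m + 512k
print(low_k, high_k, mid_k)
```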
braddr2 boots to establish an acceptable good and bad value?10:04
fabbionewe will bisect that now :)10:04
fabbioneit's just one mega10:04
fabbioneso now it is 8179+512k10:05
fabbionei think we found it :)10:07
fabbionewow.. it doesn't even break!10:08
braddrfor a definition of we that has you doing all the work.10:08
=== fabbione powersoff
fabbioneso low mark is still 837529610:09
fabbionehighmark is 837580810:09
fabbionelet's try in the middle10:09
braddrworth repeating the last try?10:09
fabbioneyes, but not now10:09
fabbionei want to test lowmark+256k10:09
fabbioneif it still hangs, i want to test passing a known to work mem in kb10:10
=== braddr nods
fabbionewe need to make 100% sure we are not hitting other bugs, like MB = 1000kb instead of 102410:10
=== braddr recalls suggesting that. :)
fabbionei am pretty sure it's done in 2^ values10:11
fabbionebut you may never know ;)10:11
braddrall we need is a good boot to see it in the boot output10:11
=== braddr eyes it warily.
fabbioneit's hanging again10:15
fabbionewaits a few secs..10:15
braddrwait.. is this the -10 kernel?10:15
fabbione-910:15
fabbionewe didn't boot 10 at all10:15
braddrok.. just checking10:15
braddrthis is the symptom we used to see with bootmem I think.10:15
fabbioneyeps it hangs10:15
=== braddr checks his notes
fabbioneprobably because we are scraping the same page10:16
fabbionebootmem does basically clean all mem allocation10:16
fabbioneoh neat10:16
fabbionesc> reset doesn't do the trick without poweroff/poweron10:17
fabbionenow i am booting with 8179 in kbytes10:18
fabbionejust to make sure we are converting it properly10:18
braddrhrm.. my notes are insufficient10:18
fabbioneand guess what.. we are not10:18
=== braddr kicks himself.
braddrtry it in m again10:18
fabbioneyeps10:18
fabbioneactually10:19
braddrI was seeing this sort of boot hang with my hand built kernels, and I don't have enough notes on the differences.  I did get one of them to boot and I want to say that bootmem was one of the differences, but I'm not confident enough in that memory.10:19
fabbionei think mem is in bytes..10:19
fabbionelet me check10:19
braddrit was definitely hanging to the point of needing to reset it10:20
braddrjust append a k10:20
braddrremove the doubt10:20
fabbionemem=nn[KMG] 10:21
fabbioneit doesn't say the default10:21
fabbioneso i guess it's bytes10:21
fabbionethis is 8179m in k10:21
fabbionescore10:21
fabbioneso the last boots were just wrong10:23
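A sketch of why the suffix matters, assuming mem=nn[KMG] follows the usual convention of a bare number meaning bytes and K/M/G meaning binary multiples. `parse_mem` is a hypothetical stand-in written for this note, not the kernel's own parser.

```python
import re

# assumption: suffixes are binary multiples, no suffix means bytes
_SHIFT = {"": 0, "k": 10, "m": 20, "g": 30}


def parse_mem(value: str) -> int:
    """Return the size in bytes for a mem= argument such as '8179m'."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised mem= value: {value!r}")
    number, suffix = match.groups()
    return int(number) << _SHIFT[suffix.lower()]


for arg in ("8179", "8179m", "8375296k"):
    print(arg, parse_mem(arg))
```

Under that assumption a bare 8375808 asks for roughly 8 MB rather than roughly 8 GB, which would explain the odd hangs on the boots where the suffix was forgotten.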
=== fabbione resets the counters
braddrok.. so there's insufficient checking for minimum memory somewhere in early bootup. :)10:23
fabbione8375296 - 837632010:23
fabbione8375296 (8179) = good10:23
fabbione8376320 (8180) = booting now....10:24
=== braddr hopes, for our sanity's sake, it fails
fabbionemath is not an opinion in these cases ;)10:26
braddrsure sure.. but as I get older, I'm starting to believe more and more that hardware can be spiteful.10:26
fabbionehehe10:26
fabbionesee.. boom10:26
braddryup10:27
fabbione8375808k (8179+512k) = booting now..10:27
=== braddr wanders to the kitchen to see what he can find to eat and drink. I kinda forgot to do something about dinner.
fabbioneehhe10:28
fabbioneyeah i will spend MAX another hour debugging10:28
braddrI was gonna grab a subway sandwich or something before they closed, but that was 90 minutes ago.. oops.10:28
fabbioneit's sunday, i am tired and i want to get ready for the F1 race :P10:28
fabbionesubway rocks!10:28
braddrthey're not bad.. good for the sandwich category.10:28
fabbioneyeps10:29
braddrback shortly10:29
fabbionesure10:29
fabbionebang10:30
fabbionenew highmark :)10:30
fabbionelowmark= 8375296 (8179)10:30
fabbionehighmark= 8375808k (8179+512k)10:30
=== fabbione bisects
fabbione8375552 (8179+256k) = good10:34
fabbione8375680 (8179+384k) = ahhhh new abort10:38
fabbionei think we got it10:38
fabbionethat one just stops booting10:39
fabbioneBooting Linux...10:39
fabbioneProgram terminated10:39
fabbioneso it's hitting something10:39
fabbionethe limit seems to be 8375552 (8179+256k) = good10:42
fabbionethat + 128 = stops immediately10:43
fabbionethat + 64 = init the console and stops10:43
fabbioneclearly we don't catch the hypervisor error that early in the boot10:43
fabbionebang bang10:44
fabbionewe are there10:44
fabbionesummary:10:50
fabbionefrom mem=1024m to mem=8375552k (8179m+256k) is all good.10:50
fabbione8375552k + 32k  = hangs hard at Booting Linux...10:50
fabbione8375552k + 64k  = init the console and abort "Program terminated"10:50
fabbione8375552k + 128k = "Booting Linux...\nProgram terminated"10:50
fabbione8375552k + 256k = as above10:50
fabbione8375552k + 512k = boots but stops at the hypervisor error we knew about.10:50
fabbionei am switching to -1010:50
fabbionehttp://people.ubuntu.com/~fabbione/t2k-memboot-results.txt11:00
=== braddr captures that in his log as well.
fabbioneok i am done for today :)11:11
braddrgood progress11:12
fabbionei guess we are pretty close now 11:12
braddrI look forward to hearing what davidm has to say about it11:12
fabbioneyeah he went to sleep not too long ago11:13
fabbioneso it will be not before tomorrow11:13
=== braddr nods, "I should have, but luckily it's the weekend."
fabbionei think i will try booting the bootmem kernel on my machine to see the diff11:13
fabbionebut later :)11:13
braddrhave a good rest of the weekend11:14
fabbionethanks11:14
fabbionegood night11:14
=== ChanServ [ChanServ@services.] has joined #ubuntu-ports
=== ajmitch [n=ajmitch@203.89.166.123] has joined #ubuntu-ports
=== jb-home [n=jbailey@modemcable139.249-203-24.mc.videotron.ca] has joined #ubuntu-ports
=== ajmitch__ [n=ajmitch@203.89.166.123] has joined #ubuntu-ports
=== ajmitch [n=ajmitch@203.89.166.123] has joined #ubuntu-ports
