/srv/irclogs.ubuntu.com/2015/02/19/#maas.txt

catbus1travnewmatic: so it didn't get an IP address from the MAAS?00:33
travnewmaticcatbus1, it did get an ip address from maas02:55
catbus1but it didn't get pxe files from MAAS?02:56
travnewmaticits like the maas server gave up on it after a certain period of time02:57
travnewmatictimed out02:57
travnewmaticERROR, Wed, 18 Feb. 2015 16:55:46, Failed to power on node - Timeout after 7 tries02:58
catbus1it failed to power on the node02:58
catbus1back to power control IPMI again02:59
catbus1does the node respond to the manual ipmipower command?02:59
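
A manual check of the kind catbus1 asks about could be scripted roughly as follows. This is only a sketch: it assumes FreeIPMI's `ipmipower` is installed on the MAAS host, and the BMC address, username and password in the usage note are placeholders, not values from this conversation.

```python
import subprocess

# Map a friendly action name to the corresponding ipmipower option.
ACTIONS = {"stat": "--stat", "on": "--on", "off": "--off"}


def ipmipower_cmd(host, user, password, action="stat"):
    """Build the argv list for one ipmipower call against a single BMC."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["ipmipower", "-h", host, "-u", user, "-p", password, ACTIONS[action]]


def run_ipmipower(host, user, password, action="stat", timeout=30):
    """Run ipmipower and return (exit code, stdout); raises if the tool is missing."""
    result = subprocess.run(
        ipmipower_cmd(host, user, password, action),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip()
```

Calling `run_ipmipower("10.0.0.50", "maas", "secret")` (hypothetical values) from the MAAS host queries the power state directly. If that call hangs or fails, the problem is between the host and the BMC rather than inside MAAS.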
travnewmaticumm that i have not tried03:02
travnewmaticthe node powers back on03:02
travnewmaticbut like i said it does this bootloop thing03:02
travnewmaticit never makes it out of the bios03:02
travnewmaticand then after a while it does03:03
catbus1that sounds like a firmware issue03:04
travnewmaticyeah03:04
travnewmaticwe dont make much use of the dracs on those models03:04
travnewmaticso keeping them up to date isnt a priority03:05
travnewmaticis it an outdated thing you suspect or a misconfiguration, or both03:05
catbus1outdated I think, I would update the firmware. Most of the time it fixes all the odd issues. Do you have any other server than 1950?03:06
travnewmaticmany many other types03:08
travnewmaticbut it'd require some rearranging03:08
travnewmaticall dell03:08
travnewmatici was thinking to try another model03:08
travnewmaticbut we have a ton of these older 1950's just sitting around03:09
travnewmaticso it'd be pretty cool if we could get it to work on those03:09
travnewmaticnot to mention we have ram out the ass for those03:09
catbus1In the first phase, the discovery/enlistment phase, MAAS would set a new IPMI username/password so it can power control the node from that point on. All you need to do is commission it and it will be ready to be deployed. You don't have to manually power on/off or change any BIOS settings.03:09
travnewmaticright03:09
travnewmaticthen that will be my task for tomorrow03:10
travnewmaticfiguring out how to update the firmware in the 1950's03:10
catbus1sounds fun03:10
catbus1don't brick any03:10
travnewmatic:D03:11
travnewmaticis there a way to check the version?03:11
travnewmatici think i found it03:12
travnewmaticactually nope i didnt03:14
travnewmaticso i walked out there and checked03:21
travnewmatic"Remote Access Configuration Utility Copyright 2006 Dell Inc. All Rights Reserved 1.21"03:21
travnewmaticBaseboard management controller revision 1.7703:22
travnewmaticPrimary backplane firmware revision 1.0503:22
catbus1I am not familiar with Dell servers.03:27
travnewmaticpoop03:29
travnewmatictheres also this http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=CH9TW03:30
travnewmaticactually this seems to be the most relevant http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=D8GP903:43
travnewmaticso i'm either going to need to install windows on it03:43
travnewmaticor hope that red-hat version also works with centos and install that03:44
travnewmaticagain thanks for the help, i'll see what i can figure out tomorrow03:50
travnewmaticyall have a good rest of the day03:50
=== frankban__ is now known as frankban
=== racedo_ is now known as racedo
nashvillecan anyone give me a hand i am implementing a maas server and it all works it is just SUPER slow on loading boot-kernel any ideas anyone?15:09
kikonashville, yes, sure15:09
nashvillekiko hello old friend15:10
kikonashville, very slow tftp suggests a networking problem15:10
nashvillehmmm15:10
kikothings to look for15:10
kikoifconfig drops and errors15:10
kikoswitch configuration throttling udp15:10
kikothe latter is VERY common if you are using a "good switch"15:11
kikoeverything that could mess up IP traffic could apply here15:11
kikoMTU issues15:11
kikoetc15:11
nashvillei have actually gotten it to register a node but cant get it to commission and it took forever to register15:11
kikotftp is particularly sensitive15:11
kikobecause it's a udp-only protocol15:12
kikotcpdump on the server side can give you some hints15:12
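
To make kiko's point about TFTP concrete, here is a minimal sketch of the packet layout from RFC 1350 plus a rough lock-step timing estimate. The file name in the usage note and the numbers fed into `transfer_time` are illustrative assumptions, not measurements from nashville's setup.

```python
import struct

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4
BLOCK_SIZE = 512  # default TFTP data block size in bytes


def build_rrq(filename, mode="octet"):
    """Read request: 2-byte opcode, file name, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_ack(block):
    """Acknowledge one data block; the server waits for this before sending more."""
    return struct.pack("!HH", OP_ACK, block)


def parse_data(packet):
    """Split a DATA packet into (block number, payload)."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError(f"not a DATA packet (opcode {opcode})")
    return block, packet[4:]


def transfer_time(file_size, rtt, lost_packets, retransmit_timeout):
    """Rough estimate: one round trip per block, plus one timeout per lost packet."""
    blocks = file_size // BLOCK_SIZE + 1  # the last block is always shorter than 512
    return blocks * rtt + lost_packets * retransmit_timeout
```

Because every block waits for its ACK, each dropped UDP packet costs a full retransmit timeout. With assumed values, `transfer_time(5_000_000, 0.0005, 0, 1.0)` comes to about five seconds, while the same transfer with 200 lost packets takes more than three minutes.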
nashvilleok thats what i have been thinking it is too15:12
nashvilletime to look at the switches15:12
kikoif you can replace the switches and cables one by one15:12
nashvillemtu should be like 1500 im guessing15:12
kikoand you'll find it15:12
kikonormally on a lan it's 150015:12
kikobut some people have weirder setups15:12
nashvilleits running on a hyper-v virtual machine15:12
kikooh!15:13
nashvillethats where my server is15:13
nashvillei was kinda wondering if i have something about the vm wrong15:13
kikohttp://www.altaro.com/hyper-v/troubleshooting-the-hyper-v-virtual-switch-part-1/15:13
kikohttps://mssecbyben.wordpress.com/2014/11/02/losing-udp-traffic-in-a-hyper-v-environment-with-nvgre-and-cant-explain-why/15:13
kikonashville, j^2 and I spent a day looking at related problems and ended up replacing the NIC on the maas host which was dropping packets like crazy15:14
nashvillelike inside the server you replaced the actual hardware?15:15
kikoyes15:15
kikoifconfig showed the way15:15
j^2nashville: yep, we went through the whole 7 layers and it turns out it was a layer 1 problem15:15
nashvilleugh thats nasty i really hope thats not it15:16
nashvilleok for some reason when i do ifconfig as the maas user it gives me problems15:17
nashvillethat doesnt matter it happens on my working server15:18
kikonashville, sudo ifconfig -a15:22
kikonashville, as your user may lack the right caps15:22
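
The same drop and error counters that ifconfig prints can also be read from `/proc/net/dev` on Linux. The sketch below parses that file's text; the helper names are made up for illustration.

```python
def parse_proc_net_dev(text):
    """Return {iface: {rx_errs, rx_drop, tx_errs, tx_drop}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Receive columns occupy fields 0-7, transmit columns fields 8-15.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def suspicious(stats):
    """Names of interfaces with any non-zero error or drop counter."""
    return sorted(name for name, c in stats.items() if any(c.values()))
```

On the MAAS host this could be used as `suspicious(parse_proc_net_dev(open("/proc/net/dev").read()))`. Counters that keep climbing during a slow PXE boot point at the NIC, cable, or virtual switch.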
nashvilleafter some reading thanks kiko i think my problem is vmq gonna disable that and retest and ill let you guys know thanks kiko15:44
=== roadmr is now known as roadmr_afk
kikonashville, let me know if it works and I'll release note16:10
=== med_` is now known as medberry
=== medberry is now known as medberry2
nashville@kiko well that was it buddy disabled vmq in hyper-v (2012r2) and its BLAZING now16:25
nashvilleyou are the man/woman kiko thanks a ton16:25
kikonashville, it's those pesky windows people16:25
kikoheh16:25
kikoenjoy16:25
kikonashville, how many nodes?16:25
kikoand why did you choose to run maas inside hyper-v?16:25
kikoare you a windows shop?16:25
nashvilleright now just ten but it is easily going to climb to the thousands16:25
nashvilleno actually i dont know why they did it that way16:26
kikohmm16:26
kikoit does make testing easier16:26
nashvillebasically i think they bought the server and didnt know what to do with it so they put 2012 on there and hyper-v16:26
kikobut not running on the bare metal can make the networking a bit more complex16:26
kikoI see16:26
kikowho is they, procurement?16:26
nashvilleim just a contractor here so i dont know the whole story16:26
nashvillemore like "engineering"16:26
nashvilleheh16:27
kikoheh16:27
nashvillethis makes twice you have saved me during this project kiko let me know where you would like your beer sent16:28
kikonashville, I want a blog post actually :)16:29
nashvillei could do that... but kiko, honestly i basically followed exactly what you sent me16:29
nashvillespecifically:16:29
kikoI know! but a blog post that says "maas works great" and "here were the gotchas" is great publicity16:30
nashvillehttp://www.altaro.com/hyper-v/troubleshooting-the-hyper-v-virtual-switch-part-1/16:30
nashvillepart two16:30
nashvilleoh16:30
nashvilleyeah i can def do that!!!16:30
kikoi.e. something like I did for dhcp/dns here: http://kiko.ghost.io/things-i-wish-id-known-about-nsupdate-and-dynamic-dns-updates/16:30
nashvillewhen i get off tonight ill write one up and put it in the channel so you can see it16:30
kikocool thanks!16:32
jamespageroaksoax, https://bugs.launchpad.net/juju-core/+bug/1423626 fyi17:08
ubot5Launchpad bug 1423626 in maas (Ubuntu) "Inconsistent device naming depending on install method" [Undecided,New]17:08
jamespageroaksoax, I spent most of today helping a guy in #juju who could not get his LXC containers to get network addresses :(17:08
kikojamespage, hmm!17:10
jamespagekiko, my exact comment was "!!!eeek"17:10
kikojamespage, I bet roaksoax would say "d-i is going away"17:11
kikojamespage, but the actual bug if I understand your comment is with d-i, which by using biosdevname to install causes the interface to be named "incorrectly" based on the commissioning data17:12
kikoI guess the easiest solution would be for d-i not to install biosdevname17:12
kikowell, but is that easy?17:12
jamespageincorrect/inconsistent17:12
jamespagekiko, I'm not convinced that MAAS is doing anything wrong - maybe Juju should be using MAC address and not name to identify the primary network interface ?17:13
kikojamespage, perhaps, though I'm not sure maas encodes the interface name in its own logic in places17:14
kikoor not sure maas doesn't encode17:14
kikoat any rate, perhaps asking if juju /could/ do that to begin with might be a good starting point17:14
kikoas it's the place where it could be made more robust17:15
kikoi.e. the theme of accepting inputs liberally17:15
dimiternthere's no way to get the NIC name by MAC address at the time juju is generating cloud-init userdata for the node AFAIK17:17
jamespagedimitern, kiko: in the new world:17:18
jamespagehttp://fedoraproject.org/wiki/Features/SystemdPredictableNetworkInterfaceNames17:18
kikodimitern, isn't the agent running and able to issue an ifconfig -a?17:18
jamespagenot sure whether we'll have that on by default for vivid - I'll check17:19
dimiternkiko, the agent is not yet running as is the machine - this happens at allocate/deploy time17:20
dimiternkiko, the agent however might do that once the machine boots17:20
dimiternkiko, but that's a bit nasty - there should be a maas api for it :)17:20
kikodimitern, the problem is that the interface name might have changed17:23
kikodimitern, so while we could return to you what maas thinks the interface name on the node is17:23
kikothat might have changed from commissioning to install17:23
kikoas in jamespage's case17:23
dimiternkiko, that's a fair point17:23
jamespagekiko, dimitern: no plans to enable that specific feature for vivid/systemd17:24
kikodimitern, well, whatever code /is/ running on the machine could figure out what the interface name is17:26
kikodimitern, what is running on the machine? an ssh session?17:26
dimiternkiko, juju polls the machine addresses as they become known via the cloud api and then tries to connect via ssh to all of them17:29
dimiternkiko, in order to do the initial bootstrap (e.g.install mongod etc.) and then starts jujud17:30
jamespagekiko, dimitern: ouch - I think I just re-opened old wounds in #ubuntu-devel17:31
jamespageapparently this was enabled by default for server d-i installs under some protest from the foundations team17:31
jamespagehttps://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/134785917:32
ubot5Launchpad bug 1347859 in ubuntu-meta (Ubuntu) "Introduction of Predictable Network Interface Names (aka biosdevname) breaks working systems" [Undecided,New]17:32
dimitern"predictable" seems like the wrong term here :)17:34
roaksoaxjamespage: yes, it might indeed be a juju issue17:36
jamespagedimitern, 'accurate'17:36
jamespagethe card in slot 1 port 1 will always be addressed the same17:36
jamespageirrespective of what's actually plugged into it17:36
kikojamespage, except most motherboards have on-board nics that are routed who knows how ;)17:38
dimiternjamespage, btw - I've just deployed an amd64 trusty node using d-i in my maas, and unlike before I can see from "apt-cache policy biosdevname" that it is installed, but the NICs are still called ethX17:38
kikodimitern, right, so in that ssh session could we not infer the interface name based on the MAC?17:38
dimiternkiko, it will be rather late then - we could do it when bootcmds are run to dump some data somewhere (maybe even the correct /etc/network/interfaces using the correct names)17:40
dimiternkiko, I've commented about this suggestion on the bug, but we need to investigate17:40
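
The idea of looking up the interface name from the MAC on the running node can be sketched as below. It is not what Juju or MAAS actually does; it only assumes the standard Linux sysfs layout, where each interface exposes its MAC in `/sys/class/net/<name>/address`.

```python
import os


def iface_by_mac(mac, sys_net="/sys/class/net"):
    """Return the name of the first interface whose MAC matches, or None."""
    wanted = mac.strip().lower()
    for name in sorted(os.listdir(sys_net)):
        addr_file = os.path.join(sys_net, name, "address")
        try:
            with open(addr_file) as fh:
                if fh.read().strip().lower() == wanted:
                    return name
        except OSError:
            # Entries without a readable address file are skipped.
            continue
    return None
```

Bonds, bridges and VLAN interfaces can share a MAC with their parent device, so a real implementation would need to filter those out before trusting the first match.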
kikodimitern, uhh.. right. I am now beyond my understanding of the process :)17:40
kikothanks :)17:40
dimiternkiko, np :) thanks for the discussion - it was very useful17:42
kikothanks to you17:42
travnewmaticmorning maas crew17:55
kikomorning travnewmatic17:58
travnewmaticgot a bit of actual work today18:00
travnewmaticthen will attempt to update the ipmi firmware stuff18:00
travnewmatici'll have some help, my coworker is onboard the maas bandwagon, so i'm optimistic18:01
kikotravnewmatic, at least somebody's trying to help :) where did you leave off?18:08
kikoto isolate the issue, are you already able to control a machine via IPMI manually?18:09
travnewmaticwell maas does have some control over the hardware through ipmi18:09
travnewmaticit can turn the machine on18:09
travnewmaticand can turn the machine off18:09
travnewmaticbut theres some miscommunication going on18:09
travnewmatici mentioned the bootloops18:09
kikowhat is the latest symptom?18:09
travnewmaticthe machine will be off after having checked in with maas18:10
kikotravnewmatic, you're on 1.7.1, right?18:10
travnewmaticuuuuuh18:10
travnewmatichow to check?18:10
travnewmatici know i'm at least 1.718:10
kikodpkg -l | grep maas18:10
kikobut you should be if you used the PPA18:10
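
The `dpkg -l | grep maas` check can also be done programmatically. The sketch below parses `dpkg -l` output and keeps installed packages whose name starts with a given prefix; the version string in the usage note is invented for illustration.

```python
def installed_versions(dpkg_l_output, prefix="maas"):
    """Map package name -> version for installed ('ii') packages matching prefix."""
    versions = {}
    for line in dpkg_l_output.splitlines():
        fields = line.split()
        # Package rows look like: ii  <name>  <version>  <arch>  <description...>
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith(prefix):
            versions[fields[1]] = fields[2]
    return versions
```

Fed a row such as `ii  maas  1.7.1-example  all  Ubuntu MAAS Server`, it returns `{"maas": "1.7.1-example"}`. On a real host the input would come from something like `subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout`.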
travnewmatic1.7.118:11
kikogreat!18:11
travnewmaticwoohoo! :D18:11
kikotell me about these fruit loops18:11
travnewmaticlol18:11
travnewmaticso the machine will be off after having initially checked in with the maas server18:11
travnewmaticin the maas interface i'll hit commission18:11
kikoby checked in do you mean what we call enlist?18:11
kikoand you've accepted it into your pool?18:11
travnewmaticyes, the stage before commission18:11
travnewmaticso i'll click commission18:12
travnewmaticthe machine will turn back on18:12
travnewmaticgo halfway through the bios, then turn off, and then turn back on again18:12
kikoso far so good18:12
travnewmaticand it'll do that a few times18:12
kikothrough the bios? i.e. not even attempt to pxe boot?18:12
travnewmaticdoesnt make it that far18:12
kikookay18:12
kikothis could be an issue with the fact we're telling it to network boot18:13
kikodo the NICs have netboot functionality available/enabled18:13
travnewmaticmmmm i would assume so, as it has been able to pxe boot before?18:13
travnewmaticare those different things?18:13
kikoit has?18:14
kikoI see, enlist was able to pxe boot of course18:15
kikookay18:15
kikoso if you power the machine on manually it does pxe boot18:15
kikobut if maas tells it to power on it gets stuck mid-boot18:15
kikotravnewmatic, would it be hard to video and dropbox the symptom?18:15
kikoor illegal18:15
travnewmaticlol i dont think it'd be illegal18:16
travnewmatici'd be happy to do that18:16
kikothat makes it less fun but more effective I guess18:16
travnewmaticthough i wonder if it would be better for us to invest more time into updating the firmware to be current18:16
kikoright, let's do that first18:17
travnewmaticmhm18:17
kikoand then if you are still stuck we should get a video and a model number so I can check with the certification guys18:17
travnewmaticfor sure18:28
travnewmaticthis is actually kind of cool18:30
travnewmaticto think that i could contribute to the development of a piece of software used by people all over the planet18:30
kikothat's rather the essence of free software18:37
travnewmaticyeah!18:37
kikoit's why practically all of us got into it, really18:37
travnewmaticmhm18:37
travnewmaticmy buddy here at work contributes as well18:37
travnewmaticas we work at a data center18:37
travnewmaticwe've got hardware and bandwidth out the wazoo18:38
travnewmaticso he set up a mirror18:38
travnewmaticcentos in this case18:38
kikodid you know maas can now provision centos?18:39
kikoj^2 is working on a knife cookbook that gets it all up and running18:39
kikoit's slightly unfortunate that maas is the gutter where all your hardware and networking problems run to18:39
kikoso while maas itself is probably working, your hardware seems to have a mind of its own :)18:39
travnewmaticyeah, i'm not holding maas accountable for our old janky out of date servers18:40
kikoYET18:40
travnewmatichahah18:40
kikobut soon :)18:40
travnewmaticright :D18:40
travnewmaticbut the centos thing is kind of a requirement for us18:40
travnewmaticmost of the servers we provision are either centos or windows18:40
travnewmaticwell i guess we're lucky in that ubuntu seems to have a good partnership with dell18:42
kikowith all the major vendors in reality18:45
kikobut indeed, we should be able to get any issues resolved if the hardware is relatively well supported18:45
travnewmaticso the adventure begins20:19
travnewmatickiko, do you have any experience with what we're about to attempt :D20:19
travnewmatickiko, http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XCVN0&fileId=3078071536&osCode=LNUX&productCode=poweredge-1950&languageCode=EN&categoryId=ES20:28
travnewmaticits doing its thing!20:28
travnewmatickeep your fingers crossed20:29
travnewmatichttp://imgur.com/5wgYMVK20:34
travnewmaticits doing the boot looping before it gets out of the bios20:36
travnewmaticbut the firmware is upgraded20:37
travnewmaticstill loopin20:39
kikohmm20:39
travnewmaticbut i'm watching the log and no maas error20:39
travnewmatictheres the error20:40
travnewmaticFeb 19 14:35:31 tnewman3 maas.power: [INFO] plain-desire.local: Power state has changed from unknown to on.20:40
travnewmaticFeb 19 14:39:55 tnewman3 maas.power: [ERROR] Error changing power state (on) of node: plain-desire.local (node-a20f1e7a-b876-11e4-8a8d-0015c5ef85ed)20:40
travnewmaticFeb 19 14:39:55 tnewman3 maas.node: [INFO] plain-desire.local: Stopping monitor: node-a20f1e7a-b876-11e4-8a8d-0015c5ef85ed20:40
travnewmaticFeb 19 14:39:55 tnewman3 maas.node: [ERROR] plain-desire.local: Marking node failed: Timeout after 7 tries20:40
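
Log entries like the ones pasted above can be filtered mechanically once each entry sits on a single line. The sketch below assumes the syslog layout seen in the paste (timestamp, host, `maas.<logger>`, bracketed level, message). The regular expression is a guess at that layout, not a format documented by MAAS.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<logger>maas\.\w+): \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_maas_log(lines):
    """Return one dict per line that matches the assumed layout."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def errors(entries):
    """Keep only ERROR-level entries."""
    return [e for e in entries if e["level"] == "ERROR"]


def seconds_between(first, second):
    """Elapsed seconds between two entries logged in the same year."""
    fmt = "%b %d %H:%M:%S"
    delta = datetime.strptime(second["stamp"], fmt) - datetime.strptime(first["stamp"], fmt)
    return delta.total_seconds()
```

Applied to this paste, the gap between the "Power state has changed" entry (14:35:31) and the first error (14:39:55) is 264 seconds, which is how long MAAS kept retrying before marking the node failed.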
travnewmaticgonna try an R21020:40
kikotravnewmatic, could you get an actual video of a boot-up (i.e. from the time maas triggers the commission power-on) and /msg that to me?20:41
kikoI need to split tonight but I'll look later20:41
kikothanks20:41
travnewmaticshore20:42
travnewmaticyou have a good night!20:42
travnewmaticwill post results!20:42
travnewmatici think my troubles may have been related to spanning tree portfast not being enabled20:55
travnewmaticeverything worked good on the R21021:10
travnewmaticnow trying it again on the 1950 with the updated firmware and spanning tree portfast enabled on the port21:10
=== roadmr_afk is now known as roadmr
travnewmatickiko, we're golden21:28
travnewmaticit appears spanning-tree portfast was the culprit21:28
travnewmaticand it also works on servers that do not have upgraded firmware21:50
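
Why portfast matters here can be sketched with a little arithmetic. With classic spanning tree, a port that comes up spends one forward-delay interval listening and another learning before it forwards traffic (15 seconds each by default), and portfast skips that wait. The attempt times used below are made-up illustrative values, not the retry schedule of any particular PXE ROM.

```python
def seconds_until_forwarding(portfast, forward_delay=15):
    """Time after link-up before the switch port forwards frames."""
    # Listening + learning each last one forward-delay interval.
    return 0 if portfast else 2 * forward_delay


def attempts_lost(attempt_times, portfast, forward_delay=15):
    """Attempts (seconds after link-up) sent while the port is still blocked."""
    blocked_for = seconds_until_forwarding(portfast, forward_delay)
    return [t for t in attempt_times if t < blocked_for]
```

With hypothetical DHCP attempts at 0, 4, 12 and 28 seconds after link-up, `attempts_lost([0, 4, 12, 28], portfast=False)` returns all four, while `portfast=True` returns an empty list. Every power cycle resets the link and restarts the wait, which would be consistent with a node that only sometimes gets through.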
catbus1good news21:51
catbus1travnewmatic: so you have 1950 commissioned successfully now?21:52
travnewmaticindeed!21:52
travnewmaticso, to do, centos, static ip's, subnets21:53
=== medberry2 is now known as med_
