/srv/irclogs.ubuntu.com/2013/02/21/#maas.txt

=== matsubara-afk is now known as matsubara
=== matsubara is now known as matsubara-lunch
roaksoax	rvba: howdy!! thanks for the work on the GenericIPAddressField	15:06
roaksoax	rvba: One thing though: I don't oppose merging this upstream. However, if we had shipped it as a patch, we would have ensured that it only affected precise. Now that it is being merged, it will also affect quantal	15:09
roaksoax	and in quantal we will not use GenericIPAddressField from django	15:09
roaksoax	rvba: so just wanted to make sure you guys are aware of that	15:09
rvba	roaksoax: Hi. In my code, I detect which version of Django we are using and only do the monkey patch if django.VERSION is 1.3	15:11
roaksoax	rvba: ok cool then :)	15:11
rvba	So the quantal version will use the field from Django itself.	15:11
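The version check rvba describes could look roughly like the sketch below. It is not the code that landed in maas/1.2: the function names, the dictionary standing in for the models namespace, and the backported field object are all hypothetical. The only facts relied on are that django.VERSION is a tuple such as (1, 3, 1, 'final', 0) and that GenericIPAddressField ships with Django from 1.4 onwards.

```python
# Hypothetical sketch: gate a backport on the Django version tuple.
# Nothing here imports Django; the version tuple is passed in so the
# logic can be exercised on its own.


def needs_backport(version):
    """Return True when the major/minor version is exactly 1.3."""
    return tuple(version[:2]) == (1, 3)


def apply_backport(version, models_namespace, backported_field):
    """Attach the backported field only on Django 1.3.

    models_namespace is a plain dict standing in for the module that
    would receive the field (an assumption made for illustration).
    """
    if needs_backport(version) and "GenericIPAddressField" not in models_namespace:
        models_namespace["GenericIPAddressField"] = backported_field
        return True
    return False
```

On a newer Django the function does nothing, which matches the intent that quantal uses the field from Django itself.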
rvba	I think it's better to put this upstream because it mean all that code gets tested every time we make a change to the code (tested by the unit test suite)	15:12
roaksoax	rvba: awesome then!!	15:12
rvba	means*	15:12
roaksoax	rvba: for sure	15:12
roaksoax	rvba: alright, so once it lands in maas/1.2 i'll prepare a new package, upload it to stable	15:30
roaksoax	rvba: upload a new django to PPA as well	15:30
roaksoax	and cleanup to start testing	15:31
rvba	roaksoax: ok, I'll land it now and then we can start testing it.	15:32
=== matsubara-lunch is now known as matsubara
roaksoax	rvba: ok it is all in ppa:maas-maintainers/experimental for testing. It is building now though	16:06
rvba	roaksoax: it's building in the daily ppa too :)	16:06
roaksoax	ok cool :)	16:06
roaksoax	rvba: once tested I'll upload to raring	16:07
roaksoax	and that should be almost all we need to SRU	16:07
rvba	roaksoax: the recent change only impacted the 1.2 branch. The raring package uses trunk.	16:08
roaksoax	rvba: right, but not in Ubuntu archives	16:08
roaksoax	rvba: ubuntu archives will have 1.2 until the SRU is done	16:08
rvba	Ah ok	16:08
roaksoax	once that happens, we can upload trunk to ubuntu archives	16:08
rvba	roaksoax: btw, we need to fix bug 1123986 before we SRU MAAS. we're in the process of fixing it.	16:09
ubot5	bug 1123986 in MAAS 1.2 "multiple juju environments in maas" [High,Triaged] https://launchpad.net/bugs/1123986	16:09
roaksoax	rvba: ok! Note that having different juju environments won't really help much though :)	16:10
roaksoax	rvba: cause the network is the same so things can still collide	16:10
roaksoax	rvba: won't help much in certain scenarios	16:10
rvba	roaksoax: yeah, but we definitely need to fix the file storage stuff in MAAS.	16:10
roaksoax	rvba: but this is definitely cool! we were just talking about it yesterday :)	16:10
roaksoax	rvba: alright. Is there an ETA on when we can start seeing these changes landing?	16:11
rvba	roaksoax: I'd say sometime next week (most of the code is already done). Because we need to do some serious testing.	16:12
roaksoax	rvba: ok cool	16:13
roaksoax	rvba: btw.. urgency=high is a debian thing, not ubuntu :) (lp:~rvb/maas/packaging.precise.sru-high)	16:18
rvba	roaksoax: haha :)	16:18
Jon___	Howdy, anyone home?	16:37
rvba	roaksoax: btw, please have a look at bug 1131296	17:05
ubot5	bug 1131296 in maas-enlist (Ubuntu) "maas-enlist uses a wrong url when enlisting nodes (/MAAS/api/1.0/nodes//MAAS/api/1.0/nodes/)" [Undecided,New] https://launchpad.net/bugs/1131296	17:05
roaksoax	rvba: will do	17:08
roaksoax	rvba: could you provide a bit more background though?	17:12
rvba	roaksoax: I think the way maas-enlist builds the url it uses to register nodes has a bug.	17:14
rvba	roaksoax: but because of a bug in MAAS itself, it works ok :).	17:14
rvba	The thing is: the bug in MAAS must be resolved.	17:14
roaksoax	rvba: for sure... I wonder what is being given to maas-enlist	17:53
roaksoax	rvba: as in maas-enlist -s X.y.z.a/	17:53
roaksoax	or what	17:53
=== kentb is now known as kentb-afk
racedo	hey roaksoax we have seen a pattern that when we "release" an allocated node in maas to be ready and then redeploy it after rebooting it goes to grup rescue	18:42
racedo	if we instead delete it, reenlist it and deploy something it works	18:42
racedo	s/grup/grub/	18:42
roaksoax	racedo: how do you release it and how do you reboot it?	19:05
roaksoax	racedo: it seems that on the reboot it is not being told to PXE boot! which is what maas tells the node to do when it tells it to start	19:07
roaksoax	racedo: so if you are manually rebooting the node, it won't pxe boot unless you tell it so via IPMI	19:07
racedo	roaksoax: we reboot it via ipmi	19:07
roaksoax	racedo: manually?	19:08
racedo	yes	19:08
racedo	we dont have ilo or access now	19:08
roaksoax	racedo: that's the problem then	19:08
roaksoax	racedo: if you reboot manually you need to tell ipmi to PXE boot	19:08
racedo	so the sequence is: enlist->commission->deploy then juju-destroy then deploy with constraints maas-name to a node then after reboot it goes to grub rescue	19:10
racedo	roaksoax: ack	19:10
roaksoax	racedo: so	19:10
roaksoax	ipmi-chassis-config ${driver_option} -h ${power_address} -u ${power_user} -p ${power_pass} --commit --filename ${config}	19:10
roaksoax	ipmipower ${driver_option} -h ${power_address} -u ${power_user} -p ${power_pass} --cycle --on-if-off	19:10
roaksoax	where config is: http://paste.ubuntu.com/1700983/	19:10
racedo	yep Boot_Device PXE	19:11
racedo	got that	19:11
racedo	thanks roaksoax that could be it	19:11
racedo	we are confirming it right now	19:11
=== kentb-afk is now known as kentb
negronjl	roaksoax, I have a question re: preseed when you get a chance	19:58
roaksoax	negronjl: shoot :)	19:58
negronjl	roaksoax, I see this line in /usr/share/maas/preseeds/preseed_master: partman/early_command string debconf-set partman-auto/disk `list-devices disk | head -n1`	19:58
negronjl	roaksoax, however, when I have seen that line in the past, I have seen it as: partman/early_command string debconf-set partman-auto/disk "$(list-devices disk | head -n1)"	19:59
negronjl	roaksoax, will the above affect anything ?	19:59
roaksoax	negronjl: uhmmm I wouldn't know really	20:00
roaksoax	i don't think it should	20:01
roaksoax	negronjl: depends on the busybox shell I guess	20:01
roaksoax	with i believe is postfix	20:01
negronjl	roaksoax, If I change the preseed file, do I need to restart any particular service ?	20:02
negronjl	roaksoax, I mean if i change /usr/share/maas/preseeds/preseed_master BTW	20:03
roaksoax	negronjl: no, the preseeds are rendered at exec time	20:03
negronjl	roaksoax, ack ... thanks	20:03
=== matsubara is now known as matsubara-afk
racedo	we are getting a "No authorization header received." at the last stage of cloud init during commissioning when the nodes are accessing the maas server	20:11
racedo	that prevents them from reporting back to the maas server and go to ready state and they are stuck at commissioning	20:11
racedo	any clue?	20:11
racedo	the address the nodes are trying to contact is http://maas/MAAS/metadata/2012-03-01/	20:12
roaksoax	racedo: the DEFAULT_MAAS_URL is incorrect	20:13
racedo	roaksoax: ok, where is that?	20:13
* racedo checking		20:13
roaksoax	racedo: sudo dpkg-reconfigure maas-region-controller and enter either a hostname or ip address that is addressable from the nodes that are commissioning	20:14
roaksoax	racedo: /etc/maas/maas_local_settings.py	20:14
racedo	it's the right one	20:14
roaksoax	racedo: so 'maas' is not resolvable	20:14
roaksoax	http://maas/MAAS/metadata/2012-03-01/ --> 'maas' resolves?	20:14
racedo	ok	20:14
racedo	no	20:14
racedo	it's the ip	20:14
racedo	sorry	20:15
racedo	i pasted it for privacy reasons :)	20:15
roaksoax	racedo: ah lol :), so is the address reachable from the commissioning server?	20:15
roaksoax	racedo: as in the same network?	20:15
roaksoax	of the nodes being deployed?	20:15
racedo	if I access from my browser it says "No authorization header received."	20:15
racedo	yes, they enlist, then reboot then we accept and commission	20:15
racedo	then after cloud init we see that they want to report back to maas using that URL	20:16
racedo	and then they get the auth error 401	20:16
racedo	and get stuck in commissioning	20:16
racedo	instead of going to ready and shut down	20:16
racedo	they just shut down	20:17
roaksoax	racedo: if you access through the browser you won't see anything because the commissioning step does authentication	20:23
racedo	ok	20:24
roaksoax	racedo: maybe ntp issue?	20:25
roaksoax	the clocks in the maas server and the nodes are not the same?	20:25
racedo	i ssh to it during commissioning and check the date and it was fine, we went to the bios to set the right time and date too	20:25
racedo	we just rebooted maas and are trying again	20:26
roaksoax	ack	20:33
racedo	roaksoax: i'm going to share a screenshot in a sec if that's ok	20:33
roaksoax	racedo: sure	20:34
racedo	roaksoax: https://docs.google.com/a/canonical.com/file/d/0BzitEgbYskgzN0Y3X21td0RKMU0/edit?usp=sharing	20:35
racedo	you sho	20:35
racedo	should have access :)	20:35
roaksoax	racedo: yeah that's an issue with oauth, clocks not being synced	20:35
roaksoax	smoser: ^^	20:35
racedo	LP 978127 ?	20:36
ubot5	Launchpad bug 978127 in MAAS "incorrect time on node causes failed oauth" [Critical,Fix released] https://launchpad.net/bugs/978127	20:36
roaksoax	racedo: that seems to be the one	20:36
roaksoax	racedo: you guys are using stable ppa right?	20:36
roaksoax	smoser: was the fix for this backported to maas/1.2?	20:36
smoser	roaksoax, that bug (and its fix) are displayed there correctly.	20:37
racedo	during commissioning i'm sshing the node and the time it's right, it was 5 hours ahead this morning but not now	20:37
racedo	roaksoax: yeah we use /stable	20:37
smoser	racedo, and that system is 5 hours off the clock on the maas server	20:37
racedo	smoser: it was	20:37
racedo	not any more	20:37
smoser	it was in that screenshot	20:38
smoser	thats what its telling you.	20:38
smoser	unless you're telling me you fixed it since that screen shot.	20:38
smoser	but the INTERNAL SERVER ERROR is different.	20:38
racedo	smoser: no, i ssh during commissioning and the time is right, i did it right at the time we took the screenshot	20:38
roaksoax	racedo: also please pastebin apache2's error log	20:38
smoser	i suspect you have something in your maas logs	20:38
racedo	ok	20:39
roaksoax	racedo: is the MAAS server with the same time too?	20:39
racedo	yeah	20:39
smoser	racedo, that system and the maas server disagree on the time. by 18000 seconds.	20:39
smoser	theres little doubt in my mind on that.	20:39
smoser	'date --utc'	20:40
smoser	on both	20:40
racedo	ok	20:40
smoser	i think you have differing clocks, but i dont think thats the whole issue. the fact that the client is re-setting its clock indicates that its working around the issue.	20:43
negronjl	roaksoax, smoser: apache log with errors here: https://pastebin.canonical.com/85324/	20:43
smoser	negronjl, /var/log/apache2/errors.log	20:44
smoser	or something to that effect.	20:44
smoser	you're showing me access log (i think)	20:44
racedo	smoser: roaksoax https://pastebin.canonical.com/85326/	20:45
racedo	check lines 4 and 50	20:45
smoser	right. its 5 hours off.	20:46
roaksoax	racedo: Thu Feb 21 20:43:44 UTC 2013 commissioning node: Thu Feb 21 15:43:42 UTC 2013	20:46
negronjl	smoser: https://pastebin.canonical.com/85327/	20:46
roaksoax	times are different	20:46
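The two timestamps roaksoax quotes can be compared directly to get the skew in seconds. The sketch below parses them in the `date --utc` layout shown above and subtracts them. The tolerance value is an assumption made for illustration only; the log does not say what window MAAS's oauth check actually allows.

```python
from datetime import datetime

# Layout of the `date --utc` output quoted in the conversation.
FMT = "%a %b %d %H:%M:%S UTC %Y"

# Hypothetical tolerance, in seconds; the real oauth window is not
# stated anywhere in this log.
ASSUMED_TOLERANCE = 300


def skew_seconds(server_stamp, node_stamp):
    """Return how many seconds the server clock is ahead of the node clock."""
    server = datetime.strptime(server_stamp, FMT)
    node = datetime.strptime(node_stamp, FMT)
    return int((server - node).total_seconds())


def within_tolerance(skew, tolerance=ASSUMED_TOLERANCE):
    """True when the absolute skew fits inside the assumed window."""
    return abs(skew) <= tolerance


skew = skew_seconds("Thu Feb 21 20:43:44 UTC 2013",
                    "Thu Feb 21 15:43:42 UTC 2013")
print(skew, within_tolerance(skew))
```

The result is 18002 seconds, five hours plus the two seconds between the two `date` runs, which lines up with the 18000 seconds smoser mentions.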
smoser	(isnt that what i said?)	20:46
racedo	oh!	20:47
roaksoax	racedo: that's your issue	20:47
smoser	its not the issue.	20:47
racedo	i was being too slow with date then :)	20:47
smoser	unless its causing fallout from maas/longpoll	20:47
roaksoax	smoser: i think txlongpoll is just for UI related stuff isn't it?	20:48
smoser	i dont know. but the error in the screenshot says INTERNAL ERROR	20:49
smoser	and the log says INTERNAL ERROR	20:49
roaksoax	negronjl: restart maas-txlongpoll	20:50
negronjl	roaksoax, done	20:51
roaksoax	smoser: maybe it is related... though I think we saw that too last time on the drill	20:51
roaksoax	negronjl: so there's 2 things it seems. 1. the clock skew, 2. txlongpoll	20:51
negronjl	roaksoax, checking the txlongpoll on logs to see if it is still there	20:51
racedo	roaksoax: but the time in the maas server is set to EST even if the BIOS has UTC, is that an issue?	20:52
racedo	roaksoax: https://pastebin.canonical.com/85328/	20:53
smoser	racedo, in ubuntu bios clock always has utc.	20:53
smoser	(i think there are some cases where if you're dual booting it will try to not use utc, but you want utc)	20:53
racedo	smoser: should i go to the BIOS and change it to match EST	20:54
racedo	?	20:54
smoser	that is fine. all checking is done on actual time.	20:54
racedo	ok	20:54
smoser	you want bios set to utc. on both boxes.	20:54
racedo	smoser: we changed it in the BIOS of the commissioning node just in case, then we are changing it back to UTC	20:55
* roaksoax brb		20:56
smoser	racedo, fwiw, i'm pretty sure you could just run 'sudo ntpdate pool.ntp.org' and reboot. and i think that will get it fixed.	20:56
smoser	(because on system shutdown, the current clock is copied to bios clock)	20:56
racedo	smoser: thats right	20:57
racedo	smoser: ok done	20:58
smoser	that will likely fix the oauth complaints, but i think you'll still see the internal server error.	21:00
racedo	smoser: you are right, now it only says internal server error	21:00
racedo	this is what i see in the maas apache error from that client when commissioning: https://pastebin.canonical.com/85331/	21:06
smoser	roaksoax, how do we get more info on that ?	21:06
negronjl	smoser, roaksoax: increasing the debug level in apache ...	21:08
smoser	not apache.	21:08
smoser	maybe in maas.	21:09
smoser	i'm pretty sure its coming from maas.	21:09
negronjl	smoser: ok	21:09
smoser	we should be able to get a maas stack trace some where.	21:09
roaksoax	maas.log	21:09
roaksoax	celery logs	21:10
roaksoax	and txlongpoll logs	21:10
roaksoax	racedo pastebin those please	21:10
racedo	roaksoax: https://pastebin.canonical.com/85334/ is maas.log	21:12
racedo	after increasing the apache log to debug nothing changed from above apache2 access log	21:13
racedo	the 500 errors are logged in the access log rather than the error log	21:14
roaksoax	racedo did you guys create any tags? that error is weird in maas.log	21:15
racedo	roaksoax: no	21:15
racedo	roaksoax: we created constraints	21:15
racedo	and actually we are not getting the maas-name constraint to match the nodes names	21:15
roaksoax	maybe thats related	21:17
roaksoax	i have little to no knowledge in the constraints system	21:18
racedo	now there's no zookeeper	21:18
racedo	no constraints, just maas	21:18
racedo	we can reinstall maas :)	21:18
roaksoax	will need to check logs to see if any upstream commit might have regressed something	21:18
roaksoax	could you file a bug with that error log?	21:19
racedo	yes	21:19
racedo	roaksoax: https://bugs.launchpad.net/maas/+bug/1131418	21:29
ubot5	Launchpad bug 1131418 in MAAS "Nodes don't go to ready, after commissioning they get a 500 error when reporting back to maas" [Undecided,New]	21:29
racedo	roaksoax: as we need to finish this, we may reinstall maas now and do exactly the same steps we were following	21:29
roaksoax	racedo: ok i'm testing this in my local virtual environment	21:29
racedo	roaksoax: I'm sharing a doc with you of the exact steps we took from the very beginning	21:30
racedo	roaksoax: i shared with you a dump of all the http traffic between the client and the maas server to debug with wireshark if that helps	21:48
roaksoax	racedo: ack thanks	22:11
roaksoax	racedo: still around?	23:22
racedo	roaksoax: yes	23:22
roaksoax	racedo: i don't think this is needed: juju set-constraints maas-name=	23:22
roaksoax	not anymore with newer maas	23:22
racedo	oh ok	23:23
racedo	but doesn't the constraint stay until wiped?	23:23
roaksoax	racedo: nah, the reason why that was done was because there was a bug , but IIRC that was fixed	23:23
roaksoax	racedo: juju add-unit --constraints doesn't work	23:24
roaksoax	?	23:24
racedo	roaksoax: what we were doing is specify every time what node we are deploying the server to	23:24
racedo	oh	23:24
racedo	i see what you mean	23:24
racedo	ok, thanks roaksoax	23:25
negronjl	roaksoax, didn't know that add-unit took constraints ... that saves us time	23:25
roaksoax	negronjl: I think that works	23:26
negronjl	roaksoax, not according to the juju help but, I'll give it a try anyway	23:26
roaksoax	let me check	23:26
negronjl	roaksoax, thx	23:27
roaksoax	negronjl: yeah it doesn't :(	23:28
roaksoax	i thought it did	23:28
negronjl	roaksoax, thx	23:28
roaksoax	racedo: so yeah after deploying swift you have to clean the constraints	23:29
roaksoax	because you are setting them globally	23:29
roaksoax	but in cases like the bootstrap I think you have to	23:29
roaksoax	because you only set them for that deployment in particular	23:29
roaksoax	racedo: ok just tested commissioning with latest from maas-maintainers/stable and it commissioned just fine	23:30
racedo	ok, with constraints and stuff?	23:30
=== kentb is now known as kentb-out
roaksoax	racedo: i'm testing that now	23:32
racedo	cool thx	23:32
roaksoax	racedo: ok bootstrap with constraint went fine. I commissioned nodes after bootstrap, also went fine	23:43
racedo	ok, juan is doing it as well here in parallel	23:48
* roaksoax waiting for the bootstrap to finish		23:52
