=== matsubara-afk is now known as matsubara | ||
=== matsubara is now known as matsubara-lunch | ||
roaksoax | rvba: howdy!! thanks for the work on the genericipaddress field | 15:06 |
---|---|---|
roaksoax | rvba: One thing though, I don't oppose merging this upstream, however, by shipping it as a patch we would have ensured that this would have only affected precise. Now that it is being merged, it will also affect quantal | 15:09 |
roaksoax | and in quantal we will not use GenericIPAddressField from django | 15:09 |
roaksoax | rvba: so just wanted to make sure you guys are aware of that | 15:09 |
rvba | roaksoax: Hi. In my code, I detect which version of Django we are using and only do the monkey patch if django.VERSION = 1.3 | 15:11 |
roaksoax | rvba: ok cool then :) | 15:11 |
rvba | So the quantal version will use the field from Django itself. | 15:11 |
rvba | I think it's better to put this upstream because it mean all that code gets tested everytime we make a change to the code (testing by the unit tests suite) | 15:12 |
roaksoax | rvba: awesome then!! | 15:12 |
rvba | means* | 15:12 |
roaksoax | rvba: for sure | 15:12 |
roaksoax | rvba: alright, so once it lands in maas/1.2 i'll prepare a new package, upload it to stable | 15:30 |
roaksoax | rvba: upload a new django to PPA as well | 15:30 |
roaksoax | and cleanup to start testing | 15:31 |
rvba | roaksoax: ok, I'll land it now and then we can start testing it. | 15:32 |
=== matsubara-lunch is now known as matsubara | ||
roaksoax | rvba: ok it is all in ppa:maas-maintainers/experimental for testing. It is building now though | 16:06 |
rvba | roaksoax: it's building in the daily ppa too :) | 16:06 |
roaksoax | ok cool :) | 16:06 |
roaksoax | rvba: once tested I'll upload to raring | 16:07 |
roaksoax | and that should be almost all we need to SRU | 16:07 |
rvba | roaksoax: the recent change only impacted the 1.2 branch. The raring package uses trunk. | 16:08 |
roaksoax | rvba: right, but not in Ubuntu archives | 16:08 |
roaksoax | rvba: ubuntu archives will have 1.2 until the SRU is done | 16:08 |
rvba | Ah ok | 16:08 |
roaksoax | once that happens, we can upload trunk to ubuntu archives | 16:08 |
rvba | roaksoax: btw, we need to fix bug 1123986 before we SRU MAAS. we're in the process of fixing it. | 16:09 |
ubot5 | bug 1123986 in MAAS 1.2 "multiple juju environments in maas" [High,Triaged] https://launchpad.net/bugs/1123986 | 16:09 |
roaksoax | rvba: ok! Note that having different juju environments won't really help much though :) | 16:10 |
roaksoax | rvba: cause the network is the same so things can still collide | 16:10 |
roaksoax | rvba: won't help much in certain scenarios | 16:10 |
rvba | roaksoax: yeah, but we definitely need to fix the file storage stuff in MAAS. | 16:10 |
roaksoax | rvba: but this is definitely cool! we were just talking about it yesterday :) | 16:10 |
roaksoax | rvba: alright. Is there an ETA on when we can start seeing these changes landing? | 16:11 |
rvba | roaksoax: I'd say sometime next week (most of the code is already done). Because we need to do some serious testing. | 16:12 |
roaksoax | rvba: ok cool | 16:13 |
roaksoax | rvba: btw.. urgency=high is a debian thing, not ubuntu :) (lp:~rvb/maas/packaging.precise.sru-high) | 16:18 |
rvba | roaksoax: haha :) | 16:18 |
Jon___ | Howdy, anyone home? | 16:37 |
rvba | roaksoax: btw, please have a look at bug 1131296 | 17:05 |
ubot5 | bug 1131296 in maas-enlist (Ubuntu) "maas-enlist uses a wrong url when enlisting nodes (/MAAS/api/1.0/nodes//MAAS/api/1.0/nodes/)" [Undecided,New] https://launchpad.net/bugs/1131296 | 17:05 |
roaksoax | rvba: will do | 17:08 |
roaksoax | rvba: could you provide a bit more background though? | 17:12 |
rvba | roaksoax: I think the way maas-enlist builds the url it uses to register nodes has a bug. | 17:14 |
rvba | roaksoax: but because of a bug in MAAS itself, it works ok :). | 17:14 |
rvba | The thing is: the bug in MAAS must be resolved. | 17:14 |
roaksoax | rvba: for sure... I wonder what it is that is being given to maas-enlist | 17:53 |
roaksoax | rvba: as in maas-enlist -s X.y.z.a/ | 17:53 |
roaksoax | or what | 17:53 |
=== kentb is now known as kentb-afk | ||
racedo | hey roaksoax we have seen a pattern that when we "release" an allocated node in maas to be ready and then redeploy it after rebooting it goes to grup rescue | 18:42 |
racedo | if we instead delete it, reenlist it and deploy something it works | 18:42 |
racedo | s/grup/grub/ | 18:42 |
roaksoax | racedo: how do you release it and how do you reboot it? | 19:05 |
roaksoax | racedo: it seems that on the reboot it is not being told to PXE boot! which is what maas tells the node to do when it tells it to start | 19:07 |
roaksoax | racedo: so if you are manually rebooting the node, it won't pxe boot unless you tell it so via IPMI | 19:07 |
racedo | roaksoax: we reboot it via ipmi | 19:07 |
roaksoax | racedo: manually? | 19:08 |
racedo | yes | 19:08 |
racedo | we dont have ilo or access now | 19:08 |
roaksoax | racedo: that's the problem then | 19:08 |
roaksoax | racedo: if you reboot manually you need to tell ipmi to PXE boot | 19:08 |
racedo | so the sequence is: enlist->commission->deploy then juju-destroy then deploy with constraints maas-name to a node then after reboot it goes to grub rescue | 19:10 |
racedo | roaksoax: ack | 19:10 |
roaksoax | racedo: so | 19:10 |
roaksoax | ipmi-chassis-config ${driver_option} -h ${power_address} -u ${power_user} -p ${power_pass} --commit --filename ${config} | 19:10 |
roaksoax | ipmipower ${driver_option} -h ${power_address} -u ${power_user} -p ${power_pass} --cycle --on-if-off | 19:10 |
roaksoax | where config is: http://paste.ubuntu.com/1700983/ | 19:10 |
racedo | yep Boot_Device PXE | 19:11 |
racedo | got that | 19:11 |
racedo | thanks roaksoax that could be it | 19:11 |
racedo | we are confirming it right now | 19:11 |
=== kentb-afk is now known as kentb | ||
negronjl | roaksoax, I have a question re: preseed when you get a chance | 19:58 |
roaksoax | negronjl: shoot :) | 19:58 |
negronjl | roaksoax, I see this line in /usr/share/maas/preseeds/preseed_master: partman/early_command string debconf-set partman-auto/disk `list-devices disk | head -n1` | 19:58 |
negronjl | roaksoax, however, when I have seen that line in the past, I have seen it as: partman/early_command string debconf-set partman-auto/disk "$(list-devices disk | head -n1)" | 19:59 |
negronjl | roaksoax, will the above affect anything ? | 19:59 |
roaksoax | negronjl: uhmmm I wouldn't know really | 20:00 |
roaksoax | i don't think it should | 20:01 |
roaksoax | negronjl: depends on the busybox shell I guess | 20:01 |
roaksoax | which i believe is posix | 20:01 |
negronjl | roaksoax, If I change the preseed file, do I need to restart any particular service ? | 20:02 |
negronjl | roaksoax, I mean if i change /usr/share/maas/preseeds/preseed_master BTW | 20:03 |
roaksoax | negronjl: no, the preseeds are rendered at exec time | 20:03 |
negronjl | roaksoax, ack ... thanks | 20:03 |
=== matsubara is now known as matsubara-afk | ||
racedo | we are getting a "No authorization header received." at the last stage of cloud init during commissioning when the nodes are accessing the maas server | 20:11 |
racedo | that prevents them from reporting back to the maas server and go to ready state and they are stuck at commissioning | 20:11 |
racedo | any clue? | 20:11 |
racedo | the address the nodes are trying to contact is http://maas/MAAS/metadata/2012-03-01/ | 20:12 |
roaksoax | racedo: the DEFAULT_MAAS_URL is incorrect | 20:13 |
racedo | roaksoax: ok, where is that? | 20:13 |
* racedo checking | 20:13 | |
roaksoax | racedo: sudo dpkg-reconfigure maas-region-controller and enter either a hostname or ip address that is addressable from the nodes that are commissioning | 20:14 |
roaksoax | racedo: /etc/maas/maas_local_settings.py | 20:14 |
racedo | it's the right one | 20:14 |
roaksoax | racedo: so 'maas' is not resolvable | 20:14 |
roaksoax | http://maas/MAAS/metadata/2012-03-01/ --> 'maas' resolves? | 20:14 |
racedo | ok | 20:14 |
racedo | no | 20:14 |
racedo | it's the ip | 20:14 |
racedo | sorry | 20:15 |
racedo | i pasted it for privacy reasons :) | 20:15 |
roaksoax | racedo: ah lol :), so is the address reachable from the commissioning server? | 20:15 |
roaksoax | racedo: as in the *same* network? | 20:15 |
roaksoax | of the nodes being deployed? | 20:15 |
racedo | if I access from my browser it says "No authorization header received." | 20:15 |
racedo | yes, they enlist, then reboot then we accept and commission | 20:15 |
racedo | then after cloud init we see that they want to report back to maas using that URL | 20:16 |
racedo | and then they get the auth error 401 | 20:16 |
racedo | and get stuck in commissioning | 20:16 |
racedo | instead of going to ready and shut down | 20:16 |
racedo | they just shut down | 20:17 |
roaksoax | racedo: if you access through the browser you wont see anything because the commissioning step does authentication | 20:23 |
racedo | ok | 20:24 |
roaksoax | racedo: maybe ntp issue? | 20:25 |
roaksoax | the clocks in the maas server and the nodes are not the same? | 20:25 |
racedo | i ssh to it during commissioning and check the date and it was fine, we went to the bios to set the right time and date too | 20:25 |
racedo | we just rebooted maas and are trying again | 20:26 |
roaksoax | ack | 20:33 |
racedo | roaksoax: i'm going to share a screenshot in a sec if that's ok | 20:33 |
roaksoax | racedo: sure | 20:34 |
racedo | roaksoax: https://docs.google.com/a/canonical.com/file/d/0BzitEgbYskgzN0Y3X21td0RKMU0/edit?usp=sharing | 20:35 |
racedo | you sho | 20:35 |
racedo | should have access :) | 20:35 |
roaksoax | racedo: yeah that's an issue with oauth, clocks not being synced | 20:35 |
roaksoax | smoser: ^^ | 20:35 |
racedo | LP 978127 ? | 20:36 |
ubot5 | Launchpad bug 978127 in MAAS "incorrect time on node causes failed oauth" [Critical,Fix released] https://launchpad.net/bugs/978127 | 20:36 |
roaksoax | racedo: that seems to be the one | 20:36 |
roaksoax | racedo: you guys are using stable ppa right? | 20:36 |
roaksoax | smoser: was the fix for this backported to maas/1.2? | 20:36 |
smoser | roaksoax, that bug (and its fix) are displayed there correctly. | 20:37 |
racedo | during commissioning i'm sshing the node and the time is right, it was 5 hours ahead this morning but not now | 20:37 |
racedo | roaksoax: yeah we use /stable | 20:37 |
smoser | racedo, and that system is 5 hours off the clock on the maas server | 20:37 |
racedo | smoser: it was | 20:37 |
racedo | not any more | 20:37 |
smoser | it was in that screenshot | 20:38 |
smoser | thats what its telling you. | 20:38 |
smoser | unless you're telling me you fixed it since that screen shot. | 20:38 |
smoser | but the INTERNAL SERVER ERROR is different. | 20:38 |
racedo | smoser: no, i ssh during commissioning and the time is right, i did right during the time we took the screenshot | 20:38 |
roaksoax | racedo: also please pastebin apache2's error log | 20:38 |
smoser | i suspect you have something in your maas logs | 20:38 |
racedo | ok | 20:39 |
roaksoax | racedo: is the MAAS server with the same time too? | 20:39 |
racedo | yeah | 20:39 |
smoser | racedo, that system and the maas server disagree on the time. by 18000 seconds. | 20:39 |
smoser | theres little doubt in my mind on that. | 20:39 |
smoser | 'date --utc' | 20:40 |
smoser | on both | 20:40 |
racedo | ok | 20:40 |
smoser | i think you have differing clocks, but i dont think thats the whole issue. the fact that the client is re-setting its clock indicates that its working around the issue. | 20:43 |
negronjl | roaksoax, smoser: apache log with errors here: https://pastebin.canonical.com/85324/ | 20:43 |
smoser | negronjl, /var/log/apache2/errors.log | 20:44 |
smoser | or something to that effect. | 20:44 |
smoser | you're showing me access log (i think) | 20:44 |
racedo | smoser: roaksoax https://pastebin.canonical.com/85326/ | 20:45 |
racedo | check lines 4 and 50 | 20:45 |
smoser | right. its 5 hours off. | 20:46 |
roaksoax | racedo: Thu Feb 21 20:43:44 UTC 2013 commissioning node: Thu Feb 21 15:43:42 UTC 2013 | 20:46 |
negronjl | smoser: https://pastebin.canonical.com/85327/ | 20:46 |
roaksoax | times are different | 20:46 |
smoser | (isnt that what i said?) | 20:46 |
racedo | oh! | 20:47 |
roaksoax | racedo: that's your issue | 20:47 |
smoser | its not the issue. | 20:47 |
racedo | i was being too slow with date then :) | 20:47 |
smoser | unless its causing fallout from maas/longpoll | 20:47 |
roaksoax | smoser: i think txlongpoll is just for UI related stuff isn't it? | 20:48 |
smoser | i dont know. but the error in the screenshot says INTERNAL ERROR | 20:49 |
smoser | and the log says INTERNAL ERROR | 20:49 |
roaksoax | negronjl: restart maas-txlongpoll | 20:50 |
negronjl | roaksoax, done | 20:51 |
roaksoax | smoser: maybe it is related... though I think we saw that too last time on the drill | 20:51 |
roaksoax | negronjl: so there's 2 things it seems. 1. the clock skew, 2. txlongpoll | 20:51 |
negronjl | roaksoax, checking the txlongpoll on logs to see if it is still there | 20:51 |
racedo | roaksoax: but the time in the maas server is set to EST even if the BIOS has UTC, is that an issue? | 20:52 |
racedo | roaksoax: https://pastebin.canonical.com/85328/ | 20:53 |
smoser | racedo, in ubuntu bios clock always has utc. | 20:53 |
smoser | (i think there are some cases where if you're dual booting it will try to not use utc, but you want utc) | 20:53 |
racedo | smoser: should i go to the BIOS and change it to match EST | 20:54 |
racedo | ? | 20:54 |
smoser | that is fine. all checking is done on actual time. | 20:54 |
racedo | ok | 20:54 |
smoser | you want bios set to utc. on both boxes. | 20:54 |
racedo | smoser: we changed it in the BIOS of the commissioning node just in case, then we are changing it back to UTC | 20:55 |
* roaksoax brb | 20:56 | |
smoser | racedo, fwiw, i'm pretty sure you could just run 'sudo ntpdate pool.ntp.org' and reboot. and i think that will get it fixed. | 20:56 |
smoser | (because on system shutdown, the current clock is copied to bios clock) | 20:56 |
racedo | smoser: thats right | 20:57 |
racedo | smoser: ok done | 20:58 |
smoser | that will likely fix the oauth complaints, but i think you'll still see the internal server error. | 21:00 |
racedo | smoser: you are right, now it only says internal server error | 21:00 |
racedo | this is what i see in the maas apache error from that client when commissioning: https://pastebin.canonical.com/85331/ | 21:06 |
smoser | roaksoax, how do we get more info on that ? | 21:06 |
negronjl | smoser, roaksoax: increasing the debug level in apache ... | 21:08 |
smoser | not apache. | 21:08 |
smoser | maybe in maas. | 21:09 |
smoser | i'm pretty sure its coming from maas. | 21:09 |
negronjl | smoser: ok | 21:09 |
smoser | we should be able to get a maas stack trace some where. | 21:09 |
roaksoax | maas.log | 21:09 |
roaksoax | celery logs | 21:10 |
roaksoax | and txlongpoll logs | 21:10 |
roaksoax | racedo pastebin those please | 21:10 |
racedo | roaksoax: https://pastebin.canonical.com/85334/ is maas.log | 21:12 |
racedo | after increasing the apache log to debug nothing changed from above apache2 access log | 21:13 |
racedo | the 500 errors are logged in the access log rather than the error log | 21:14 |
roaksoax | racedo did you guys create any tags? that error is weird in maas.log | 21:15 |
racedo | roaksoax: no | 21:15 |
racedo | roaksoax: we created constraints | 21:15 |
racedo | and actually we are not getting the maas-name constraint to match the nodes names | 21:15 |
roaksoax | maybe thats related | 21:17 |
roaksoax | i have little to no knowledge in the constraints system | 21:18 |
racedo | now there's no zookeeper | 21:18 |
racedo | no constraints, just maas | 21:18 |
racedo | we can reinstall maas :) | 21:18 |
roaksoax | will need to check logs to see if any upstream commit might have regressed something | 21:18 |
roaksoax | could you file a bug with that error log? | 21:19 |
racedo | yes | 21:19 |
racedo | roaksoax: https://bugs.launchpad.net/maas/+bug/1131418 | 21:29 |
ubot5 | Launchpad bug 1131418 in MAAS "Nodes don't go to ready, after commissioning they get a 500 error when reporting back to maas" [Undecided,New] | 21:29 |
racedo | roaksoax: as we need to finish this, we may reinstall maas now and do exactly the same steps we were following | 21:29 |
roaksoax | racedo: ok i'm testing this in my local virtual environment | 21:29 |
racedo | roaksoax: I'm sharing a doc with you of the exact steps we took from the very beginning | 21:30 |
racedo | roaksoax: i shared with you a dump of all the http traffic between the client and the maas server to debug with wireshark if that helps | 21:48 |
roaksoax | racedo: ack thanks | 22:11 |
roaksoax | racedo: still around? | 23:22 |
racedo | roaksoax: yes | 23:22 |
roaksoax | racedo: i don't think this is needed: juju set-constraints maas-name= | 23:22 |
roaksoax | not anymore with newer maas | 23:22 |
racedo | oh ok | 23:23 |
racedo | but doesn't the constraint stay until wiped? | 23:23 |
roaksoax | racedo: nah, the reason why that was done was because there was a bug , but IIRC that was fixed | 23:23 |
roaksoax | racedo: juju add-unit --constraints doesn't work | 23:24 |
roaksoax | ? | 23:24 |
racedo | roaksoax: what we were doing is specify every time what node we are deploying the server to | 23:24 |
racedo | oh | 23:24 |
racedo | i see what you mean | 23:24 |
racedo | ok, thanks roaksoax | 23:25 |
negronjl | roaksoax, didn't know that add-unit took constraints ... that saves us time | 23:25 |
roaksoax | negronjl: I think that works | 23:26 |
negronjl | roaksoax, not according to the juju help but, I'll give it a try anyway | 23:26 |
roaksoax | let me check | 23:26 |
negronjl | roaksoax, thx | 23:27 |
roaksoax | negronjl: yeah it doesn't :( | 23:28 |
roaksoax | i thought it did | 23:28 |
negronjl | roaksoax, thx | 23:28 |
roaksoax | racedo: so yeah after deploying swift you have to clean the constraints | 23:29 |
roaksoax | because you are setting them globally | 23:29 |
roaksoax | but in cases like the bootstrap I think you have to | 23:29 |
roaksoax | because you only set them for that deployment in particular | 23:29 |
roaksoax | racedo: ok just tested commissioning with latest from maas-maintainers/stable and it commissioned just fine | 23:30 |
racedo | ok, with constraints and stuff? | 23:30 |
=== kentb is now known as kentb-out | ||
roaksoax | racedo: i'm testing that now | 23:32 |
racedo | cool thx | 23:32 |
roaksoax | racedo: ok bootstrap with constraint went fine. I commissioned nodes after bootstrap, also went fine | 23:43 |
racedo | ok, juan is doing it as well here in parallel | 23:48 |
* roaksoax waiting for the bootstrap to finish | 23:52 |
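
The clock-skew diagnosis from around 20:46 (two `date --utc` outputs about five hours apart) can be checked mechanically rather than by eye. This is only a minimal sketch: it assumes both timestamps are in the `date --utc` format pasted in the log and an English/C locale, and the helper name `skew_seconds` is hypothetical, not part of MAAS.

```python
from datetime import datetime

# Format of `date --utc` output as quoted in the log, e.g.
# "Thu Feb 21 20:43:44 UTC 2013" (assumes an English/C locale).
DATE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def skew_seconds(server_time: str, node_time: str) -> float:
    """Return the absolute clock difference in seconds (hypothetical helper)."""
    server = datetime.strptime(server_time, DATE_FORMAT)
    node = datetime.strptime(node_time, DATE_FORMAT)
    return abs((server - node).total_seconds())


if __name__ == "__main__":
    # The two timestamps roaksoax quoted at 20:46.
    print(skew_seconds("Thu Feb 21 20:43:44 UTC 2013",
                       "Thu Feb 21 15:43:42 UTC 2013"))
```

A result near 18000 seconds matches the five-hour offset smoser pointed out; once both machines are synced the value should be close to zero.
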
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!
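
The version check rvba describes at 15:11 would, in practice, compare the leading components of `django.VERSION`, which is a tuple such as `(1, 3, 1, 'final', 0)`. rvba's actual code is not shown in the log, so the following is only a sketch of that kind of check; the function name `needs_field_backport` is hypothetical.

```python
def needs_field_backport(version: tuple) -> bool:
    """Return True only for Django 1.3.x (hypothetical helper).

    `version` is expected to look like django.VERSION,
    e.g. (1, 3, 1, 'final', 0).
    """
    return tuple(version[:2]) == (1, 3)


if __name__ == "__main__":
    # In real code the argument would be django.VERSION; literal tuples
    # are used here so the sketch runs without Django installed.
    print(needs_field_backport((1, 3, 1, "final", 0)))
    print(needs_field_backport((1, 4, 1, "final", 0)))
```

Comparing only the first two components keeps the check true for every 1.3.x point release while leaving later versions on Django's own field.
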