/srv/irclogs.ubuntu.com/2014/09/03/#maas.txt

=== jfarschman is now known as MilesDenver
jtvrvba: is the user-data at the end of installation only for the fast-path installer then?05:44
Valduarehows it going guys05:49
jtvHi05:52
Valduareany news on maas with arm devices05:54
=== liam_ is now known as Guest81233
rvbajtv: the user-data is requested before the f-p installation happens.  Now I'm not sure it happens in the case of d-i.06:35
=== CyberJacob|Away is now known as CyberJacob
jtvrvba: I'm trying it out...06:56
jtvdimitern: hi there — when could you talk about networking?07:35
dimiternjtv, hey, how about tomorrow or on friday?07:36
dimiternjtv, what's a good time for you?07:36
jtvAnytime before 11:00 UTC.  Except our standup is at 08:30 UTC.07:37
dimiternjtv, so how about tomorrow @ 10 UTC ?07:38
jtvYes, great.07:38
dimiternjtv, i'll send an invite, cheers07:38
dimiterni'll invite roaksoax and jam if they want to join07:39
jtvSure.07:39
dimiternjtv, actually, do you mind if we move it 30m earlier - 9:30 UTC, as it will overlap with our standup :)07:40
jtvBetter for me actually!07:40
dimiterngreat! invite sent07:43
ramonskiei have upgraded to trusty on my cluster controller and also to maas 1.5.2 but now i stumbled on to this bug https://bugs.launchpad.net/maas/+bug/130777907:44
ubot5Ubuntu bug 1307779 in MAAS "fallback from specific to generic subarch broken" [Critical,Fix released]07:44
ramonskieit seems to be fixed in 1.6 but i can't find that package. any ideas?07:45
bigjoolsramonskie: ppa:maas-maintainers/stable07:48
ramonskiethanks07:49
ramonskieokay so i upgraded and it finds an image now. only it's now stuck on the screen where i see route-info08:47
jtvrvba: hey, how about I remove that restriction where you need at least 16 bits of netmask on a managed network?  That was for the old generated zone files.09:21
rvbajtv: yep, we don't need that restriction anymore.09:22
jtv\o/09:22
jtvEasy karma.09:22
ramonskiewhen enlisting nodes it hangs on the following screen https://dl.dropboxusercontent.com/u/50671970/enlist-hang.jpg09:33
jtvramonskie: that part looks OK to me in itself... how long did you watch them hang?09:34
jtvBecause if it failed there, I'd probably expect some error output.09:34
ramonskiefor about 10 minutes now09:35
jtvIf this is all it shows on the console, I'd give it a bit longer.09:36
ramonskiejtv: thanks i will wait09:37
=== CyberJacob is now known as CyberJacob|Away
ramonskiejtv: how long should i wait....09:52
jtvramonskie: any change on the screen?09:52
ramonskienope nothing09:52
jtvThen I guess it's time to go trawling through the logs.09:53
ramonskiewhich log files do i need to check..09:53
jtvI'm looking along.  It's on the maas server, in /var/log/maas.09:53
jtvTo be honest, since we're not seeing any error message, I don't know what to look for in this case.09:53
jtvOh, one thing that might also help: shift-PageUp on the node's console might show a bit more history.09:54
jtv(Might as well keep the node doing whatever it's doing for now, in case it changes its mind)09:54
jtvFirst thing to do is a quick scan for obvious errors:09:55
jtv/var/log/maas/apache2/error.log, /var/log/maas/celery.log, /var/log/maas/maas.log09:55
jtvIf there's an error in there, chances are it'll jump right out at you.09:56
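The quick scan jtv describes could be sketched in Python roughly as follows. This is a minimal sketch, not part of MAAS: the names `find_suspect_lines` and `scan_logs` and the marker list are assumptions made for illustration; only the log paths come from the conversation.

```python
from pathlib import Path

# Markers that usually indicate trouble. This list is an assumption; tune it as needed.
MARKERS = ("ERROR", "CRITICAL", "Traceback")


def find_suspect_lines(text, markers=MARKERS):
    """Return (line_number, line) pairs for lines containing any marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(marker in line for marker in markers)
    ]


def scan_logs(paths):
    """Map each readable log file to its suspect lines; skip missing or unreadable files."""
    results = {}
    for path in map(Path, paths):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # Missing file or insufficient permissions: nothing to report for it.
            continue
        results[str(path)] = find_suspect_lines(text)
    return results
```

Calling `scan_logs` with the three paths jtv lists (`/var/log/maas/apache2/error.log`, `/var/log/maas/celery.log`, `/var/log/maas/maas.log`) returns a dictionary whose values are empty lists when nothing stands out.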
ramonskiemaas.log is empty nothing09:56
jtvThat is really odd.09:56
jtvEven if there are no requests, it's supposed to have periodic jobs in there.  Unless... which version is this?09:57
jtv(Roughly — e.g. "the one that came with 14.04")09:57
ramonskiefirst i was on ubuntu 12.04 with maas 1.4 then upgraded to ubuntu 14.04 and maas 1.5 and now upgraded to 1.609:57
jtvOK, that's pretty recent.  Good.09:58
jtvI'm not sure if you'll have /var/log/maas/apache2 then; but it's just a symlink to /var/log/apache2.09:58
ramonskiein celery i see some dhcp lease errors09:59
jtvOh?09:59
jtvCan you paste one?09:59
ramonskieERROR/MainProcess] Task provisioningserver.tasks.upload_dhcp_leases[e19c4353-92a2-499e-8f95-154b3b017950] raised unexpected: IOError()09:59
ramonskiealso need the trace?09:59
jtvThat'd be nice, thanks.  Maybe use paste.ubuntu.com.10:00
ramonskiewait let me clean all the logs and restart the controller server and start all over10:01
ramonskiein case it's unrelated stuff10:02
jtvOK10:02
jtvThanks for the review rvba.10:03
rvbajtv: np.  I just put up for review https://code.launchpad.net/~rvb/maas/revert-2872/+merge/233191.  A run in the CI shows that it fixes the problem introduced recently.10:05
* jtv looks10:05
ramonskiejtv: these are the errors after a restart in celery.log http://pastebin.com/ayRxD5v010:06
jtvThanks.10:06
jtvrvba: done.  :)10:06
rvbaTa10:06
jtvramonskie: scant consolation but this is code that's already removed from the dev version.  :)10:08
jtvNow, what seems to be going wrong is that the cluster controller is having trouble talking to the region-controller API.10:08
ramonskieits on the same server10:09
ramonskiei only have one10:09
jtvYeah, it should be easy, shouldn't it?10:09
ramonskie:P10:09
jtvYou may want to check that your DEFAULT_MAAS_URL is configured sensibly.10:09
jtvThat's in several places.  Just grep /etc/maas for it — but as root, or you won't be able to read some of those files.10:10
jtv(There's credentials in some of them.)10:10
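The "grep /etc/maas" step could also be done from Python, for instance when collecting the results in a script. The sketch below is an illustration only: `grep_tree` is a hypothetical helper, not a MAAS tool. Note that it silently skips files it cannot read, so it still needs to run as root to see the files jtv mentions.

```python
import os


def grep_tree(root, needle):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable without root; skip rather than crash.
                continue

# Example: list(grep_tree("/etc/maas", "DEFAULT_MAAS_URL"))
```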
ramonskie./maas_local_settings.py:DEFAULT_MAAS_URL = "http://172.21.42.1/MAAS" ./maas_local_settings.py.dpkg-dist:DEFAULT_MAAS_URL = "http://maas.internal.example.com/"10:10
jtvThat looks sensible — assuming 172.21.42.1 is indeed your server's IP address, and the nodes will be able to reach it.10:11
jtvYou could have a look to see if that same request shows up in the Apache error log, in case it did get through to the server.10:12
jtv(Or maybe even the Apache access log — but I doubt that)10:12
jtvOh!  Just in case, you may want to search the Apache access log for "/MAAS/MAAS"10:13
ramonskienope nothing found10:14
ramonskieonly thing i see in the apache error log is this : No such file or directory: mod_wsgi (pid=2708): Unable to change working directory to '/home/maas'10:15
ramonskiebut should not be the problem10:15
jtvNo, shouldn't be.10:15
jtvIt's dumb, but maybe you could just try making a wget request to http://172.21.42.1/MAAS from the server itself, just to make sure that gets through?10:16
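The same reachability check can be written with Python's standard library instead of wget. This is a sketch under assumptions: `check_url` is a made-up helper name, and the `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True when the request completes without an HTTP error."""
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The server was reached but answered with an error status (4xx/5xx).
        return False, f"HTTP {error.code}"
    except OSError as error:
        # Nothing answered: DNS failure, refused connection, timeout, and so on.
        # urllib.error.URLError is a subclass of OSError, so it lands here too.
        return False, f"unreachable: {error}"

# Example, run on the MAAS server itself:
#   print(check_url("http://172.21.42.1/MAAS"))
```

A result of `(False, "unreachable: ...")` would point at networking or Apache not listening, while an HTTP error status would mean the request got through and the problem is further up the stack.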
ramonskiestrange thing is i see now that i have 2 cluster controllers in clusters10:16
jtvTwo clusters?  That's interesting.10:16
jtvWhen it wakes up, the cluster registers itself with the region controller, and then just keeps polling for the region controller to say "sure, yeah, you're accepted."10:17
ramonskieyeah i think upgrade10:17
jtvThe region controller should identify them by UUID.10:17
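A toy model may help show why a changed UUID produces a second cluster. Everything below is hypothetical and heavily simplified: `ToyRegion` is not MAAS code, and the only facts taken from the conversation are that clusters register themselves, are identified by UUID, and wait to be accepted.

```python
class ToyRegion:
    """Toy stand-in for a region controller; not MAAS code."""

    def __init__(self):
        self.clusters = {}  # uuid -> "pending" or "accepted"

    def register(self, uuid):
        """Record a cluster by UUID; an unknown UUID becomes a new pending entry."""
        self.clusters.setdefault(uuid, "pending")
        return self.clusters[uuid]

    def accept(self, uuid):
        """Mark a registered cluster as accepted."""
        self.clusters[uuid] = "accepted"


region = ToyRegion()
region.register("old-uuid")
region.accept("old-uuid")

# Same machine, but the config now carries a different UUID:
# the region cannot tell it is the same cluster, so a second entry appears.
status = region.register("new-uuid")
print(status, len(region.clusters))
```

In this model, deleting the "new-uuid" entry does not help while the cluster process keeps registering with it, which matches the "pops back up in pending state" behaviour reported a few lines further down.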
ramonskiei think it's an upgrade quirk that happened in 1.510:18
ramonskiebut i gave that another set of ips and dns zone10:18
jtvNew to me...  I guess they have different UUIDs?  I guess one is "master"?10:18
ramonskieyes one is master and one is called maas10:18
ramonskiein dns zone10:18
jtvIf they're both running on the same server, that spells trouble.10:18
jtvBecause they control a DHCP server, a DNS server, iSCSI, a TFTP server, and so on.10:19
ramonskieahh well that explains a lot10:19
jtvI'm still not sure _how_ it would cause the failure in the log, but it's definitely closer to the source.10:20
ramonskiethe only problem is if i delete the newly created cluster it pops back up in pending state10:20
jtvYeah.10:20
ramonskieand the other cluster still has a set of working nodes in it10:20
jtvYou could try stopping the cluster controller, deleting the new cluster, and then in the UI updating the old one to look like the new one.10:21
jtvMind you, they'll still have different UUIDs...  so that may not be good enough.10:21
jtvI think this'll require some database surgery.10:21
ramonskiethey have different uuids10:21
jtvYeah.10:21
jtvSo the upgrade generated a new one instead of reusing the old one.10:22
ramonskiealso the old cluster has only 6 boot-images and the new one 12610:22
jtvYeah, a lot has changed there.10:22
ramonskieso can i move the nodes from the old cluster to the new one10:22
jtvLet's start by getting a good view of the situation... if you grep /etc/maas for "UUID", do you get consistent UUIDs from the various config files?10:23
ramonskieand then delete the old cluster10:23
jtvThe only way to move nodes is to delete them from one cluster and re-enlist them into the other.10:23
jtvIf that is not a problem, then I think it's the easiest way out.10:23
jtvBut it means that anything you've got running on those nodes is lost.10:23
ramonskiemaas_local_celeryconfig_cluster.py:CLUSTER_UUID = '3d245a63-2b23-42be-8977-f36cb2218b9e'10:24
ramonskieand thats the new one10:24
ramonskieyeah i can't delete those nodes, openstack is running on them with a lot of vms :(10:24
jtvBlast.10:24
ramonskieotherwise i would already have done a clean install :)10:25
jtvWell, it's going to get tricky at any rate.  First let me have a look for known bugs.10:25
ramonskieokay10:26
jtvMeanwhile, could you check that the cluster UUID in /etc/maas/maas_cluster.conf is consistent with the one you found in maas_local_celeryconfig_cluster.py?10:26
ramonskieyes they are the same10:27
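The consistency check on the UUIDs can be automated along these lines. It is a sketch under stated assumptions: the log only shows the format of `maas_local_celeryconfig_cluster.py`, so rather than guess the layout of `maas_cluster.conf` the code looks for any UUID-shaped string on a line that mentions "UUID". The helper names are made up.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def uuids_in(text):
    """Return the set of UUIDs (lower-cased) on lines that mention 'UUID'."""
    found = set()
    for line in text.splitlines():
        if "uuid" in line.lower():
            found.update(match.lower() for match in UUID_RE.findall(line))
    return found


def consistent(texts):
    """True when the given config texts together mention exactly one UUID."""
    all_uuids = set().union(*(uuids_in(text) for text in texts))
    return len(all_uuids) == 1


line = "CLUSTER_UUID = '3d245a63-2b23-42be-8977-f36cb2218b9e'"
print(uuids_in(line))
```

Feeding `consistent` the contents of both config files gives a yes/no answer to jtv's question; it says nothing about whether that one UUID is the old or the new cluster's.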
jtvOK10:27
ramonskieboth have the uuid of the new accepted cluster10:27
ramonskiecan i disable the other cluster but still let dns work10:28
jtvramonskie: safest thing to try I guess would be to set them to the old cluster.  But... first a look for known bugs.10:28
ramonskiethat should solve it10:28
jtvWell, plus a restart.  :)  And then you'd have to delete the new one.10:28
ramonskieokay but if i set it to the old cluster will the new boot images also be added?10:29
jtvShould be, yes.  Because AFAICT the two actually share everything except a process.10:31
jtvIt's the same files on disc, etc.  It may take a few minutes for the remaining cluster controller to inform the region controller of what it has.10:33
ramonskieyou already checked known bugs?10:33
ramonskieso should i try this?10:33
jtvI checked known unfixed bugs.  Let me make one more round for ones that may have been fixed later.10:34
jtvramonskie: I guess it's not bug 1344089, and that's the best candidate I found.10:38
ubot5bug 1344089 in MAAS 1.6 "IntegrityError after upgrading to 1.6beta5" [Critical,Fix released] https://launchpad.net/bugs/134408910:38
jtv(I realise you hit your problem with an earlier version)10:38
ramonskiei'm on 1.6 now10:39
jtvAnyway, assuming that's not it, we'll have to make the change.  I'd stop the cluster controllers first.10:39
jtvThen update the UUID entries in the config, to use your original cluster's UUID.10:40
jtvI'd also set the cluster interfaces to Unmanaged, just so you can re-enable the right one later.10:41
ramonskiewhat's the best and safest way to stop the cluster controller10:42
jtvThen restart, accept the right cluster controller if needed (it may be automatic), enable the right cluster interface, and see if that fixes things.10:42
jtvsudo service maas-cluster-celery stop10:42
jtvsudo service maas-pserv stop10:42
jtvThen I'd run a “ps -ef | grep maas” to check for lingering processes.10:42
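The check for lingering processes can be scripted as well. The sketch below assumes a `ps -ef` style listing with one header row; `lingering` and `list_processes` are illustrative names, not MAAS utilities.

```python
import subprocess


def lingering(ps_output, needle="maas"):
    """Return the process lines that mention needle, skipping the header row."""
    lines = ps_output.splitlines()
    return [line for line in lines[1:] if needle in line]


def list_processes():
    """Run `ps -ef` and return its output as text."""
    return subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout

# Example: for line in lingering(list_processes()): print(line)
```

Unlike the shell pipeline, this never lists the grep process itself. It does not distinguish region from cluster processes, so the output still has to be read with care before anything is killed.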
ramonskiewow there is still a lot running10:44
jtvYeah it's not a small thing.10:44
ramonskieseveral of these: /usr/bin/python /usr/bin/celeryd --logfile=/var/log/maas/celery-region.log --schedule=/var/lib/maas/celerybeat-region-schedule --loglevel=INFO --beat --queues=celery,master10:45
ramonskieshould i kill them?10:45
jtvNo, those are the region controller's celery.10:45
ramonskieand there should be 10 of them?10:45
jtvProbably not.10:46
ramonskielol10:46
jtvBut I don't know what might cause there to be more...  I do hope you don't have two region controllers as well!10:46
ramonskiei certainly hope not10:48
ramonskiethe only thing i did was what i thought a simple upgrade10:48
jtvYeah.  This clearly shouldn't have happened.10:49
ramonskieokay edited both files maas_local_celeryconfig_cluster.py and maas_cluster.conf10:49
ramonskiewith the old uuid10:50
jtvOK.10:50
jtvAnd you've set the cluster interfaces to Unmanaged in the UI?10:50
ramonskiethe new cluster?10:50
ramonskiedone for the new cluster10:51
jtvOK.  I'd do the old one as well.10:51
jtv(The only drawback is your DHCP server will be down briefly — let's keep it short)10:52
ramonskieno dhcp entries will be deleted?10:52
jtvNot as such, though there may be more confusion that will only become clear later.10:53
ramonskiebacked it up just in case10:53
jtvGood.10:54
jtvAnd then we get to restart.  A reboot would be the most comprehensive.10:55
ramonskiereboot it is10:55
ramonskieokay rebooted10:57
ramonskiei removed the new cluster now10:57
ramonskiedo i need to set managed dhcp on again?10:59
ramonskiewhat are the best next steps to take?11:00
jtvFirst: is the old cluster now Accepted?11:00
jtvIf it is, then yes, re-enable DHCP management (and DNS management I guess — you mentioned using that)11:01
ramonskieyes the old cluster is accepted and the boot-images are now also 126 instead of 611:01
jtvExcellent!11:02
jtvWant to try that node again?11:02
ramonskieyup11:02
ramonskielet me first check if everything is okay11:03
ramonskieand that the ip addresses have not changed :P11:03
jtvYeah.  Anything you can check is a plus at this point.  :)11:03
jtvIf you feel up to it, maybe a fresh look at those logs in /var/log/maas.11:03
ramonskieokay check seems okay no error for now11:05
ramonskiewill try a node now11:05
* jtv bates breath11:06
ramonskiewhooopppdidoooh11:08
ramonskieit works11:08
ramonskiemuchos kudos to you!!!11:08
jtvPhew.11:09
ramonskiethanks for helping mate11:09
jtvGlad I could help — and glad it didn't come crashing down on us.  :-)11:09
ramonskieyes i'm really glad i don't need to start over. this saved so much work11:10
ramonskieand finally the auto discover of ipmi is working :D11:10
jtvI'll have to go now, but I would really appreciate if you could file a bug about this — especially the part where you upgraded and got two cluster controllers.  That might still be in the packaging somewhere.11:10
ramonskieokay will do thanks for all the help11:12
ramonskiewhere do you want me to file the bug?11:12
jtvhttps://bugs.launchpad.net/maas11:13
jtv(You have a Launchpad account, right?)11:13
ramonskieyup11:14
ramonskieokay creating one now11:14
jtvThanks.  If we can prevent this from happening to someone else, that's wonderful.11:14
* jtv runs now11:14
jtvGood night!11:14
ramonskiei'm out too, bye11:22
ramonskiebug created https://bugs.launchpad.net/maas/+bug/136490311:22
ubot5Ubuntu bug 1364903 in MAAS "2 cluster controllers after upgrade from 1.4 > 1.5" [Undecided,New]11:22
rvbablake_r: Hi Blake.  I had to revert 2872.  See https://code.launchpad.net/~rvb/maas/revert-2872/+merge/233191 for details.11:31
rvbablake_r: Now I'm thinking that revision 2871 also introduced a problem (but a different one): one CI run failed with the nodes failing to get the images they need to boot.  This looks like the problem you diagnosed yesterday and said you were working on.12:10
rvbablake_r: I'd say this is a race condition as the CI test passed a couple of times.12:11
rvbablake_r: since you'll be up in less than an hour I'll refrain from reverting this one again.  Let's talk when you come online.12:12
blake_rrvba: yes it is possible it will pass13:05
blake_rrvba: the issue is that RPC is used for the API call but not for the image selection when a node is booting13:05
blake_rrvba: so pxeconfig will fail13:05
blake_rrvba: I have a branch that fixes pxeconfig13:06
rvbablake_r: 2871 causes the images not to be present from time to time.  2872 (which I reverted) was causing the node to fail to enlist (see my paste on the revertion MP).13:07
rvbanoeds*13:08
rvbanodes*13:08
rvbaarg13:08
blake_rrvba: its not that the images are not present, its that the images are not present in the BootImage model, which is going away13:15
rvbablake_r: right, what I meant that, as far as the node can see (and this involves the BootImage model), the images are not there.13:16
blake_rrvba: yes correct, I was going to get that branch ready and land today, looks like I will have to do all of that again, :(13:17
rvbablake_r: I just reverted 2872 (which was causing failures all the time), not 2871.13:17
blake_rrvba: okay13:18
blake_rrvba: will look at the mp in a moment, getting through email this morning13:18
rvbablake_r: I'm sorry but I was in the middle of a QA and having trunk broken like that means a lot of time wasted for me.13:18
blake_rrvba: oh I see the reason13:21
blake_rrvba: yeah it's reporting the available architectures wrong, I will work on a fix13:21
rvbablake_r: cool, ta.13:21
rvbablake_r: if you can fix the breakage introduced by 2871 first that would be great.  Because 2871 is still checked in.13:24
blake_rrvba: okay13:24
rvbaThanks.13:24
newellI am getting a 500 error when I go to acquire a commissioned node with latest trunk.  Here is the stacktrace: http://paste.ubuntu.com/8223903/14:25
newellAnyone seen this before?14:26
rvbanewell: let me have a look at this stacktrace…14:27
rvbanewell: looks like a bug in gen_dynamic_ip_addresses_with_host_maps: it should skip the ngi with no static_ip_range_low/hig14:30
rvbanewell: can you file a critical bug about this?14:30
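rvba's first guess at a fix, skipping interfaces that have no static range, could look roughly like the sketch below. It is purely illustrative: `ToyInterface`, its `range_low`/`range_high` fields, and `interfaces_with_static_range` are hypothetical names and do not reproduce the MAAS model or the function named in the chat. rvba himself questions later in the log whether this is the right fix.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class ToyInterface:
    """Hypothetical stand-in for a cluster interface ("ngi" in the chat)."""
    name: str
    range_low: Optional[str] = None   # illustrative field name, not the MAAS one
    range_high: Optional[str] = None  # illustrative field name, not the MAAS one


def interfaces_with_static_range(
    interfaces: Iterable[ToyInterface],
) -> Iterator[ToyInterface]:
    """Yield only interfaces that define both ends of a static range."""
    for interface in interfaces:
        if not interface.range_low or not interface.range_high:
            # No static range configured: nothing to generate for this interface.
            continue
        yield interface


candidates = [
    ToyInterface("eth0", "10.0.0.10", "10.0.0.50"),
    ToyInterface("eth1"),
]
print([interface.name for interface in interfaces_with_static_range(candidates)])
```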
newellyeah14:30
newellhttps://bugs.launchpad.net/maas/+bug/136499314:35
ubot5Ubuntu bug 1364993 in MAAS "gen_dynamic_ip_addresses_with_host_maps: it should skip the ngi with no static_ip_range_low/hig" [Critical,New]14:35
rvbanewell: I changed the title for this bug.  We try to explain what the problem is in the title/descriptions.  Suggestions on the possible cause or ideas on how to fix it should be put in the comments.14:37
rvbanewell: this helps triaging and lets people come up with alternative solutions.14:38
newellha I just changed the title as well before I just read this14:39
newellwonder where it stands now14:39
newellhttps://bugs.launchpad.net/maas/+bug/136499314:39
ubot5Ubuntu bug 1364993 in MAAS "500 error when trying to acquire a commissioned node" [Critical,New]14:39
newellIs that better?14:39
rvbaYep, it describes the problem.14:40
newellk, we were both in the middle of changing it and my page wasn't refreshed, that is why I didn't see that you had modified it14:40
rvbaI figured :)14:41
=== roadmr is now known as roadmr_afk
=== roadmr_afk is now known as roadmr
=== magicrob1tmonkey is now known as magicrobotmonkey
=== ming is now known as Guest50512
newellrvba, still around?16:24
rvbanewell: yep16:26
newellrvba, we need a test for that bug or is it trivial enough to just push the change you mentioned?16:26
rvbanewell: as with any non-trivial change, it's worth a test16:28
rvbanewell: now I'm not so sure my solution is the right one as test__treats_undefined_static_range_as_zero_size_network seems to test the case where ngi has no static range.16:30
newellyeah I was looking at that too16:31
newellyour change doesn't break any tests though16:31
rvbanewell: there is a bug in the test :)16:36
newellha16:37
rvbanewell: can you spot it?16:37
newelllet me take a look16:37
rvbanewell: My fix is up for review.  https://code.launchpad.net/~rvb/maas/bug-1364993/+merge/233244.  And I need to step out now. ttyl.16:43
newellsounds good I will review it, sorry wife was asking me questions and got pulled away16:43
newellha, just needed to save it16:44
=== roadmr is now known as roadmr_afk
dpb2Hi all -- I'm starting a server, but I don't see any log message for the power on attempt in celery log (just periodic dhcp refreshes, etc).  What is up?17:09
dpb2(this install had been working fine)17:09
dpb2roaksoax: ^ any ideas?17:11
roaksoaxdpb2: check whether maas-pserv is running17:15
roaksoaxdpb2: are you importing images?17:15
dpb2roaksoax: all maas services are reported as running17:16
dpb2roaksoax: let me check on the images17:16
dpb2roaksoax: actually, I'm not sure how to check that. :)17:16
roaksoaxdpb2: what MAAS version are you using?17:17
dpb21.6.1+bzr2550-0ubuntu1~ppa217:17
roaksoaxdpb2: uhmmm17:18
roaksoaxblake_r: ^^ any thoughts?17:18
dpb2roaksoax: I *could* restart all the services, but I didn't want to mask an issue17:18
roaksoaxdpb2: do please restart the services. 1.7 will completely change in that area17:19
roaksoaxdpb2: because of celery being silly and causing issues like this17:19
roaksoaxdpb2: (we are getting rid of celery)17:19
roaksoaxdpb2: can you see logs?17:20
roaksoaxdpb2: maas.log celery.log17:20
dpb2hm17:20
dpb2roaksoax: I'm seeing the logs now17:20
dpb2(just now)17:20
dpb2roaksoax: so...17:20
dpb2roaksoax: if boot images are importing, does that block power up attempts17:20
dpb2?17:20
roaksoaxdpb2: yes it can... celery blocks any other jobs if a bigger job is in progress17:21
dpb2yikes17:21
dpb2ok17:21
dpb2well, I think it's working now.  the old "try it again" fixed it17:22
dpb2thanks17:22
roaksoaxdpb2: np!17:22
Valduarehows it going guys17:39
Valduareany news on maas with arm devices?17:39
newellValduare, there is currently some arm support (i.e. arm64/armhf etc.)17:50
ValduareI have a few mk808 devices here that would be fun to be able to spin them up etc17:51
=== roadmr_afk is now known as roadmr
=== CyberJacob|Away is now known as CyberJacob
