/srv/irclogs.ubuntu.com/2017/04/25/#maas.txt

mupBug #1685963 opened: [2.2] Django signal handlers are too opaque <MAAS:New> <https://launchpad.net/bugs/1685963>00:10
catbus1Hi, seeing iscsistart: cannot make a connection to <MAAS IP>, and ipconfig: BOOTIF SIOCGIFINDEX: No such device prior, what might go wrong?00:11
mpontillocatbus1: I would check that the machine you're booting will get an IP address that can reach MAAS first.00:58
catbus1ack00:58
mpontillocatbus1: I'd also run https://gist.github.com/mpontillo/6ee4c96d8aed4d0efde66a37aa6d5af9 from a machine on the subnet where the nodes are trying to boot, and validate that the URLs look correct01:01
=== mbarnett_ is now known as mbarnett
=== degville_ is now known as degville
mupBug #1686065 opened: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>10:29
mupBug #1686065 changed: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>10:32
mupBug #1686065 opened: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>10:44
jnielsenhello, I was wondering if anyone knew how to delete a dns record from the dns tab on the maas webpage. I have some strange dns entries that look like ip addresses.17:38
jnielsenusing the cli, they are not listed in dnsresources or dnsresource-records17:39
jnielsenalso reading the domain (cli) the domain record count does not match what is shown on the maas webpage (cli:103, webpage:121)17:40
xygnalmpontillo: does maas support vlans where the deployed system does not need tags? (native vlans)18:21
roaksoaxxygnal: it does, but they are modeled as "untagged" in a fabric18:22
mpontilloxygnal: are those same VLANs tagged on the rack controller or untagged?18:23
mpontilloxygnal: if they're on the rack controller you might hit https://bugs.launchpad.net/maas/+bug/1678339 18:23
xygnalwould be tagged on controller, just not on nodes18:23
mpontilloxygnal: okay, yeah, I have that setup at home, and have the bug referenced above, but MAAS otherwise does work18:24
xygnalbut rack controllers not in same L318:24
mpontilloxygnal: is this the DHCP relay config? yeah that should be fine18:24
mpontilloxygnal: if MAAS doesn't know anything about the tags anywhere, it shouldn't get confused about it18:24
roaksoaxjnielsen: like which ones ? do you have any examples ?18:25
mupBug #1686169 opened: MAAS shows unsupported images but can't download them <MAAS:In Progress by ltrager> <https://launchpad.net/bugs/1686169>18:27
xygnalxygnal this is the one that up to now, was just single-interface untagged but multiple fabrics.  even after going full separated fabrics and ensuring no leases are still there for that MAC anymore, it still fails as described before.  We are contemplating if this will not work and if we could switch to using VLAN tagging from MAAS server then.18:28
xygnalmpontillo that would mean using tagging on the MAAS side, but not using tagging on the Node side, because the switch would be doing native vlan tagging for the node18:29
mpontilloxygnal: from my perspective, it should work fine how you currently have it configured. I would like to get to the bottom of why the interfaces become unconfigured when you deploy. I've added some logging to regiond.log to attempt to determine that in https://bugs.launchpad.net/maas/+bug/1685963 18:31
mpontilloxygnal: rather, in https://code.launchpad.net/~mpontillo/maas/signal-handlers-add-logging--bug-1685963/+merge/323094 18:31
xygnaljust now? let me see18:31
xygnaloh18:31
jnielsenroaksoax: in the maas domain it shows A records of the form name:###-###-###-### type:A Data: ###.###.###.###18:31
mpontilloxygnal: it will be released in the next release... I'm trying to figure out how it's possible for the interfaces to become unconfigured in that way18:32
xygnalmpontillo we have deployments coming up in about 3 weeks so we're growing quite concerned about getting to the bottom of it. Can this logging code be implemented in our current installed version?18:32
mpontilloxygnal: it seems to me that it /might/ happen if the VLAN the interface is on changes. so I'm wondering: if you repeat the deployment, does it always happen again? or does it "settle"?18:32
xygnalremember we had some problems with RC218:32
xygnalmpontillo happens every time.18:33
xygnalevery single deploy fails the same18:33
mpontilloxygnal: yeah, the bugs you encountered with RC2 should be fixed. okay. would it be possible for you to send me some more detailed information about your setup, so I can try to create a reproducible test case?18:33
xygnalI verified no observed IP and no living leases for the box from last deploy attempt (the day before)18:33
xygnalcan you outline what details you want? The person who knows that in and out is not me, and I need to make sure they are not missing any details you want to have18:34
xygnalmpontillo: I asked him to get started on that.  I'll pass along any specific data you ask for here to make sure18:35
mpontilloxygnal: I would like to see the output of:18:36
mpontillosudo maas-region dbshell 18:36
mpontillo\pset pager off 18:36
mpontilloselect * from maas_support__node_networking; 18:36
mpontilloxygnal: if it contains sensitive info, you can sanitize it or just email it to me18:36
mpontilloxygnal: or send me a private message to a private pastebin URL such as a secret github gist18:37
mupBug #1686171 opened: NTP and stress-ng tests fail on Trusty <MAAS:Triaged by ltrager> <https://launchpad.net/bugs/1686171>18:39
xygnalmpontillo is that enough to show you our networking config, or do you need any further diagrams outside of it?18:42
mpontilloxygnal: I have what you've posted in the other bugs, so the output of that query would be a good start18:43
mpontilloxygnal: I was looking for another query that might be useful, too18:43
mpontilloxygnal: the output of this query will also help me get a feel for how your networks are modeled in MAAS http://paste.ubuntu.com/24455722/18:47
roaksoaxjnielsen: you mean, ip address based ?18:48
mpontilloxygnal: I forgot to ask, are you using any HA features, such as multiple rack controllers?18:49
mpontilloxygnal: or, might the networks appear (to MAAS) to be on different VLANs depending on which region or rack controller they are viewed from?18:50
mpontillojnielsen: A records may be automatically generated for deployed nodes in MAAS; does that account for the difference?18:51
xygnalmpontillo we are using multiple rack controllers, yes.18:54
xygnalmpontillo only two rack controllers in this environment.  one by itself and one on the same machine as the region controller.18:55
jnielsenroaksoax: yea the names are based off the ip18:55
jnielsenmpontillo: these records look stale.18:56
xygnalmpontillo: two new sanitized attachments on bug 1685306 18:56
mpontilloxygnal: I'm now wondering if it's possible that the network configuration is re-detected on each rack controller independently, and causes changes to ripple through the system and ultimately remove the configuration on those interfaces18:56
xygnalmpontillo shouldn't HA be aware of this and avoid it?18:57
xygnalor is HA a largely untested feature in this situation?18:57
roaksoaxjnielsen: what version of MAAS are you using? And those are the addresses that MAAS auto-generates for machines in the dynamic range18:57
jnielsenmpontillo: let me verify the difference to see if they are (stale) and not duplicates, and see if they match the difference18:57
mpontilloxygnal: well, it's certainly a bug. I wouldn't say it's untested; we often test HA. but this may be an edge case that was missed18:57
jnielsenroaksoax: 2.1 18:57
mpontilloxygnal: can you try an experiment for me? shut down the rack controller you're not using on the region controller. just stop the service. and then see if the bug still occurs18:58
xygnalmpontillo hm. it's supposed to be rackd on one and region on the other. according to services, rackd, http, tftp are running. but dhcpd is not.18:58
roaksoaxxygnal: in HA mode, the secondary rack won't be running DHCP18:59
roaksoaxxygnal: until the primary dies18:59
xygnalroaksoax: mpontillo: then would I be affected as suspected, or would that not happen?18:59
mpontilloxygnal: in other words, it sounds like you're not using one of your rack controllers for DHCP, so shutting it down shouldn't hurt anything19:01
xygnalmpontillo the 'backup' rack controller is on the region controller.  shall I just stop the rackd service?19:01
mpontilloxygnal: yeah, just stop it and see if it makes anything better. that way we can characterize if this is a bug with HA or not19:01
mpontilloxygnal: maybe just shut down the secondary, that way you can keep everything on one machine and see if that helps -- reduce the number of variables we're dealing with here.19:02
xygnalmpontillo thats the plan19:03
jnielsenmpontillo: I did not verify all of them, but they seem to be stale (not duplicates) and they make up the difference between what maas <user> domain read <domain> is reporting for records19:10
jnielsenand what the webpage shows19:10
jnielseni have to run to a meeting real quick, I'll be back in an hour or so. I'll leave this open19:10
mpontillojnielsen: sure, I'd like to understand more what "stale" means. do you have enough information to file a bug for us?19:11
jnielsenmpontillo: stale means I don't have a machine using that ip19:11
jnielsenI'll investigate more19:12
mpontillojnielsen: so possibly DHCP addresses that expired but were not deleted? that might be a known (and/or fixed) bug. let me check19:13
roaksoaxjnielsen: try this19:14
roaksoaxjnielsen: sudo maas-region shell 19:14
roaksoaxfrom maasserver.models import DNSResource 19:14
roaksoaxDNSResource.objects.all() 19:14
roaksoaxjnielsen: and I think they come from this:19:15
roaksoax$GENERATE 190-253 $.90.90.10.in-addr.arpa. IN PTR 10-90-90-$.maas 19:15
roaksoaxor similar, actually19:16
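For readers unfamiliar with the BIND `$GENERATE` directive quoted above, here is a minimal Python sketch of how such a line expands into one record per address, which would explain names that look like IP addresses. It assumes only the bare `$` placeholder is used (no `${offset,width,base}` modifiers and no step value), and the function name `expand_generate` is hypothetical, not part of MAAS or BIND.

```python
def expand_generate(directive):
    """Expand a simple BIND $GENERATE line into individual records.

    Only the bare "$" placeholder is handled; "${offset,width,base}"
    modifiers and a step value ("start-stop/step") are not supported.
    """
    keyword, rng, lhs, *rest = directive.split()
    if keyword != "$GENERATE":
        raise ValueError("not a $GENERATE directive")
    start, stop = (int(part) for part in rng.split("-"))
    records = []
    for i in range(start, stop + 1):
        # Substitute the iterator value for every "$" in each field.
        fields = [lhs] + rest
        records.append(" ".join(f.replace("$", str(i)) for f in fields))
    return records


line = "$GENERATE 190-253 $.90.90.10.in-addr.arpa. IN PTR 10-90-90-$.maas"
records = expand_generate(line)
print(len(records))
print(records[0])
```

The example line covers 190 through 253 inclusive, so each address in that range gets its own PTR record pointing at a name built from the IP.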
mpontillojnielsen: is there something special about xxxmaaskhost021?19:34
xygnalwas that for me?19:35
mpontilloxygnal: ah, yes, sorry jnielsen. it seemed a little strange to me that it has an interface on both xx.xx.97.0/24 and xx.xx.186.0/2319:35
xygnalmpontillo that is a special host being used for something else right now, we have not been testing on it recently.  would it be affecting other builds?19:37
mpontilloxygnal: can you try removing those from the environment and see if that impacts the issue? my concern is that it looks like there are IP addresses from two different fabrics on those machines, which may be contributing to the VLAN flip/flop effect and causing the IP addresses to become unconfigured19:40
mpontilloxygnal: maybe give it a try after you rule out HA?19:40
mpontilloxygnal: just to confirm, have you tried recommissioning the machines before deploying, too?19:42
xygnalmpontillo: we've removed and re-commissioned them before deploying, yes19:46
xygnalxygnal in fact, for all of the -DEMO nodes I am certain of this19:47
mpontilloxygnal: ok great.. looking forward to hearing if taking HA out of the picture makes a difference19:49
xygnalmpontillo is it just that one node, or all of the nodes named like that, that need to be removed?19:49
xygnal(that have the dual-networks)19:49
mpontilloxygnal: I'd just remove them all from MAAS for good measure19:49
mpontilloxygnal: when it comes time to test that19:49
mpontilloxygnal: let's try to rule out one thing at a time =)19:50
xygnalmpontillo i'm testing the no-HA right now, just planning for after19:50
mpontilloxygnal: great. right now I'm more suspicious of controllers, since they constantly update the region about their network configuration, which might lead to flip/flops if the two controllers don't agree on what the network looks like. then every 30 seconds to one minute you might see VLAN changes that could cause IP addresses to be cleared19:51
xygnalmpontillo should I remove all of the nodes or just those xxxmaask ones? Being that the other ones have been completely removed and added since re-fabricing19:51
xygnalmpontillo just trying to avoid making the whole dev environment unavailable to my colleagues by removing all the hosts19:51
mpontilloxygnal: I was only suspicious about the xxxmaask ones for now, but honestly I doubt it will change anything, especially if those nodes are just sitting there19:51
xygnalyes, they are19:52
mpontilloxygnal: honestly I doubt they are the issue; let's focus on anything running the controller software for now19:52
mpontilloxygnal: I assumed that was xxxmaas0{1,2}19:52
vaseyhey folks, i'm tryna diagnose this issue when I try to use juju to bootstrap a MAAS controller: https://pastebin.com/Pd8ktgPm 20:00
mpontillovasey: are you using any IPv6 in your environment? is juju configured to talk to MAAS by means of an IP address or DNS name? if DNS, what does it resolve to?20:04
mpontillovasey: in case it helps, here are my notes on testing MAAS with Juju the last time I did it https://gist.github.com/mpontillo/231790806c51cf07e51cbe30e8e0b0a1 20:05
vaseympontillo: one thing i did was run 'sudo lxd init' before starting up; i now have an lxd bridge interface that i believe my juju is running from, and it's definitely not in the same subnet as my MAAS setup. how would i revert that operation?20:16
vaseympontillo: and to answer your questions, i don't believe i have ipv6 configured anywhere, how would i verify for MAAS? and juju is configured to talk to MAAS via IP address, not dns name20:20
mpontillovasey: the lxd bridge shouldn't affect things, but your containers' default profile would most likely be using the bridge, with NAT20:22
mpontillovasey: come to think of it, that error is most likely occurring when Juju selects a MAAS node to use as the juju controller, and attempts to allocate and deploy it. check your nodes' network configuration to ensure they have automatic IP addresses on the subnet(s) you expect20:23
jnielsenmpontillo: interestingly the entries with the ip-like name have a live machine associated with them (not on my list)20:24
mupBug #1686195 opened: [2.2] MAAS should include a script to test enlistment <MAAS:Triaged by mpontillo> <https://launchpad.net/bugs/1686195>20:24
jnielsenmpontillo: I'll investigate this further, thanks for the help! you too roaksoax:20:24
mpontillojnielsen: right, my guess is that they were created by DHCP leases and are safe to delete. I believe there is a bug where we do not delete them when the lease expires. I don't see a fix on the 2.1 branch20:26
vaseympontillo: that did the trick, none of my nodes had interfaces with auto-assign IPs added. is it possible to have juju run on a node that will be part of an openstack distro or is that just a terrible idea?20:26
jnielsenmpontillo: do you know how to delete them?20:26
vaseympontillo: openstack deployment***20:27
mpontillojnielsen: you can delete them using the command-line interface. probably something like: "maas $PROFILE dnsresource delete <id>".20:27
mpontillojnielsen: but no guarantees because I didn't test that command =)20:27
mpontillovasey: I think juju was designed that way for separation of concerns; if something takes down your deployment, you don't want that to affect juju itself20:28
mpontillovasey: what I have seen many customers do is use a constraint in juju to select a specific machine in MAAS; you could, for example, use the virsh support to make your juju select a manually-configured VM on the MAAS server to be the controller20:30
jnielsenmpontillo: they don't show up in my dnsresource list with maas <user> dnsresources read20:35
vaseympontillo: ahhhh that makes sense20:37
mpontillojnielsen: ah, I must have misunderstood; I thought you were saying they showed up in dnsresources read but not on the DNS page; it's the opposite, then?20:45
mpontillojnielsen: hmm, on my MAAS I see DHCP-provided hostnames if I do something like this: maas $PROFILE dnsresources read | jq -c '.[] | {fqdn:.fqdn, id:.id}'20:53
mpontillojnielsen: and I see things in the DNS details page for that domain which belong to my deployed nodes (they appear with a hyperlink so I can click them and go to the node)20:53
mpontillojnielsen: but I don't see anything that isn't a node that doesn't appear in the domain details20:54
mpontillojnielsen: I guess it would help if you can pastebin: dig @127.0.0.1 -t axfr <domain>20:54
mpontillojnielsen: and the dnsresources read with the jq above20:54
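Once both outputs are in hand, the comparison being asked for can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the sample data is invented, the only JSON field relied on is `fqdn` (as used in the jq filter above), and the zone transfer is assumed to print the usual `name TTL class type rdata` columns.

```python
import json


def names_from_dnsresources(json_text):
    """Return the FQDNs in the JSON printed by `dnsresources read`."""
    return {entry["fqdn"].rstrip(".").lower() for entry in json.loads(json_text)}


def names_from_axfr(dig_output, rtypes=("A", "AAAA")):
    """Return owner names of address records in `dig -t axfr` output."""
    names = set()
    for line in dig_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and dig's comment lines
        fields = line.split()
        # Expected layout: name TTL class type rdata...
        if len(fields) >= 5 and fields[3] in rtypes:
            names.add(fields[0].rstrip(".").lower())
    return names


# Invented sample data, standing in for real command output.
api_json = '[{"fqdn": "node1.maas", "id": 1}]'
axfr_text = """\
; <<>> DiG <<>> @127.0.0.1 -t axfr maas
node1.maas.        30 IN A 10.90.90.5
10-90-90-200.maas. 30 IN A 10.90.90.200
"""
only_in_zone = names_from_axfr(axfr_text) - names_from_dnsresources(api_json)
print(sorted(only_in_zone))
```

Names that appear in the zone transfer but not in the API listing are the candidates for the stale, IP-like entries discussed here.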
xygnalmpontillo: no luck. I even deleted the node I am building entirely and re-added. Commission worked fine, deploy fails the same.21:05
xygnalmpontillo and yes, i removed the khost nodes before even doing that21:05
vaseympontillo: if i'm using an ESXi-managed VM, how do i use virsh  as power control?21:07
vaseympontillo: if i use the VMware power options, using the UUID,  host IP, username and pass, i get this error: "No rack controllers can access the BMC of node: fluent-minnow"21:11
vaseydespite my host being able to ping that host. do i need to enable ssh in ESXi, now that i think about it?21:11
xygnalmpontillo:  updated bug report with two sanitized copies of the logs from that time period today21:12
mpontillovasey: you need to install a python package called pyvmomi to talk to the ESXi API; probably KVM is more well-tested since few people have VMware licenses handy to test with21:35
mpontillovasey: should be available as an apt package, python3-pyvmomi I think -- but there are a few issues with it, certificate checking has been a problem in the past21:35
mpontillovasey: if you get an error that no rack controllers can access the BMC, you might need to refresh the power status to see if MAAS can determine which racks can reach it21:35
mpontilloxygnal: OK, so without the HA rack enabled and without those dual-network nodes, you still have the problem? just to confirm, can you try a second deploy with one of the machines and tell me if it's any different? (that's weird.) I'll have to give this some more thought. please upgrade to the next rc build when it comes out; I've added some additional logging that may help, too21:37
xygnalmpontillo: eta on Rc3?21:39
roaksoaxxygnal: ppa:maas/next-proposed has the proposed version of rc3 21:41
mupBug #1686234 opened: [2.2] MAAS does not delete DNS records for released DHCP leases <MAAS:Triaged by mpontillo> <https://launchpad.net/bugs/1686234>23:09