[00:10] <mup> Bug #1685963 opened: [2.2] Django signal handlers are too opaque <MAAS:New> <https://launchpad.net/bugs/1685963>
[00:11] <catbus1> Hi, I'm seeing iscsistart: cannot make a connection to <MAAS IP>, and before that ipconfig: BOOTIF SIOCGIFINDEX: No such device. What might be going wrong?
[00:58] <mpontillo> catbus1: I would check that the machine you're booting will get an IP address that can reach MAAS first.
[00:58] <catbus1> ack
[01:01] <mpontillo> catbus1: I'd also run https://gist.github.com/mpontillo/6ee4c96d8aed4d0efde66a37aa6d5af9 from a machine on the subnet where the nodes are trying to boot, and validate that the URLs look correct
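The gist's contents aren't reproduced in the log; a minimal reachability probe in its spirit, run from a machine on the subnet where the nodes boot, might look like this (the address and port numbers are assumptions, not taken from the gist — adjust them to your environment):

```shell
# Probe the MAAS controller from the nodes' boot subnet.
# MAAS_IP and the port list are assumptions: 5240 = MAAS HTTP/API,
# 3260 = iSCSI target (used by ephemeral boot in MAAS <= 2.2).
MAAS_IP="${MAAS_IP:-10.0.0.2}"
for port in 5240 3260; do
    echo "would probe $MAAS_IP:$port"
    # Live check (commented so the sketch stands alone):
    # nc -z -w 3 "$MAAS_IP" "$port" && echo "  port $port reachable"
done
```

Note that TFTP (UDP/69) is also involved in PXE boot but a TCP probe like `nc -z` won't cover it.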
[10:29] <mup> Bug #1686065 opened: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>
[10:32] <mup> Bug #1686065 changed: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>
[10:44] <mup> Bug #1686065 opened: [2.2.0rc2, UI, Machine details] The icons for error in Commissioning and Hardware tabs (secondary nav) are not up to date <ui> <ux-qa-2.2> <MAAS:New> <https://launchpad.net/bugs/1686065>
[17:38] <jnielsen> hello, I was wondering if anyone knew how to delete a dns record from the dns tab on the maas webpage. I have some strange dns entries that look like ip addresses.
[17:39] <jnielsen> using the cli, they are not listed in dnsresources, or dnsrecources-records
[17:40] <jnielsen> also reading the domain (cli) the domain record count does not match what is shown on the maas webpage (cli:103, webpage:121)
[18:21] <xygnal> mpontillo: does maas support vlans where the deployed system does not need tags? (native vlans)
[18:22] <roaksoax> xygnal: it does, but they are modeled as "untagged" in a fabric
[18:23] <mpontillo> xygnal: are those same VLANs tagged on the rack controller or untagged?
[18:23] <mpontillo> xygnal: if they're on the rack controller you might hit https://bugs.launchpad.net/maas/+bug/1678339
[18:23] <xygnal> would be tagged on controller, just not on nodes
[18:24] <mpontillo> xygnal: okay, yeah, I have that setup at home, and have the bug referenced above, but MAAS otherwise does work
[18:24] <xygnal> but rack controllers not in same L3
[18:24] <mpontillo> xygnal: is this the DHCP relay config? yeah that should be fine
[18:24] <mpontillo> xygnal: if MAAS doesn't know anything about the tags anywhere, it shouldn't get confused about it
[18:25] <roaksoax> jnielsen: like which ones ? do you have any examples ?
[18:27] <mup> Bug #1686169 opened: MAAS shows unsupported images but can't download them <MAAS:In Progress by ltrager> <https://launchpad.net/bugs/1686169>
[18:28] <xygnal> mpontillo: this is the one that, up to now, was just single-interface untagged but with multiple fabrics. Even after going to fully separated fabrics and ensuring no leases remain for that MAC, it still fails as described before. We are wondering whether this will work at all, and whether we should switch to using VLAN tagging from the MAAS server instead.
[18:29] <xygnal> mpontillo that would mean using tagging on the MAAS side, but not using tagging on the Node side, because the switch would be doing native vlan tagging for the node
[18:31] <mpontillo> xygnal: from my perspective, it should work fine how you currently have it configured. I would like to get to the bottom of why the interfaces become unconfigured when you deploy. I've added some logging to regiond.log to attempt to determine that in https://bugs.launchpad.net/maas/+bug/1685963
[18:31] <mpontillo> xygnal: rather, in https://code.launchpad.net/~mpontillo/maas/signal-handlers-add-logging--bug-1685963/+merge/323094
[18:31] <xygnal> just now? let me see
[18:31] <xygnal> oh
[18:31] <jnielsen> roaksoax: in the maas domain it shows A records of the form name:###-###-###-### type:A Data: ###.###.###.###
[18:32] <mpontillo> xygnal: it will be released in the next release... I'm trying to figure out how it's possible for the interfaces to become unconfigured in that way
[18:32] <xygnal> mpontillo we have deployments coming up in about 3 weeks so we're growing quite concerned about getting to the bottom of it. Can this logging code be implemented in our currently installed version?
[18:32] <mpontillo> xygnal: it seems to me that it /might/ happen if the VLAN the interface is on changes. so I'm wondering: if you repeat the deployment, does it always happen again? or does it "settle"?
[18:32] <xygnal> remember we had some problems with RC2
[18:33] <xygnal> mpontillo happens every time.
[18:33] <xygnal> every single deploy fails the same
[18:33] <mpontillo> xygnal: yeah, the bugs you encountered with RC2 should be fixed. okay. would it be possible for you to send me some more detailed information about your setup, so I can try to create a reproducible test case?
[18:33] <xygnal> I verified no observed IP and no living leases for the box from last deploy attempt (the day before)
[18:34] <xygnal> can you outline what details you want? The person who knows this inside and out is not me, and I need to make sure they don't miss any details you want to have
[18:35] <xygnal> mpontillo: I asked him to get started on that.  I'll pass along any specific data you ask for here to make sure
[18:36] <mpontillo> xygnal: I would like to see the output of:
[18:36] <mpontillo> sudo maas-region dbshell
[18:36] <mpontillo> \pset pager off
[18:36] <mpontillo> select * from maas_support__node_networking;
[18:36] <mpontillo> xygnal: if it contains sensitive info, you can sanitize it or just email it to me
[18:37] <mpontillo> xygnal: or send me a private message to a private pastebin URL such as a secret github gist
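The three dbshell steps mpontillo dictated can be driven non-interactively; a sketch, assuming a stock MAAS 2.x region controller where the `maas_support__*` views are installed:

```shell
# Save the query to a file, then feed it to dbshell in one go.
cat > /tmp/node_networking.sql <<'EOF'
\pset pager off
select * from maas_support__node_networking;
EOF
# Requires a MAAS region controller, so left commented here:
# sudo maas-region dbshell < /tmp/node_networking.sql > node_networking.txt
echo "query saved to /tmp/node_networking.sql"
```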
[18:39] <mup> Bug #1686171 opened: NTP and stress-ng tests fail on Trusty <MAAS:Triaged by ltrager> <https://launchpad.net/bugs/1686171>
[18:42] <xygnal> mpontillo is that enough to show you our networking config, or do you need any further diagrams outside of it?
[18:43] <mpontillo> xygnal: I have what you've posted in the other bugs, so the output of that query would be a good start
[18:43] <mpontillo> xygnal: I was looking for another query that might be useful, too
[18:47] <mpontillo> xygnal: the output of this query will also help me get a feel for how your networks are modeled in MAAS http://paste.ubuntu.com/24455722/
[18:48] <roaksoax> jnielsen: you mean, ip address based ?
[18:49] <mpontillo> xygnal: I forgot to ask, are you using any HA features, such as multiple rack controllers?
[18:50] <mpontillo> xygnal: or, might the networks appear (to MAAS) to be on different VLANs depending on which region or rack controller they are viewed from?
[18:51] <mpontillo> jnielsen: A records may be automatically generated for deployed nodes in MAAS; does that account for the difference?
[18:54] <xygnal> mpontillo we are using multiple rack controllers, yes.
[18:55] <xygnal> mpontillo only two rack controllers in this environment.  one by itself and one on the same machine as the region controller.
[18:55] <jnielsen> roaksoax: yea the names are based off the ip
[18:56] <jnielsen> mpontillo: these records look stale.
[18:56] <xygnal> mpontillo:  two new sanitized attachments on the bug 1685306
[18:56] <mpontillo> xygnal: I'm now wondering if it's possible that the network configuration is re-detected on each rack controller independently, and causes changes to ripple through the system and ultimately remove the configuration on those interfaces
[18:57] <xygnal> mpontillo: shouldn't HA be aware of this and avoid it?
[18:57] <xygnal> or is HA a largely untested feature in this situation?
[18:57] <roaksoax> jnielsen: what version of MAAS are you using? And those are the addresses that MAAS auto-generates for machines in the dynamic range
[18:57] <jnielsen> mpontillo: let me verify the difference to see if they are (stale) and not duplicates, and see if they match the difference
[18:57] <mpontillo> xygnal: well, it's certainly a bug. I wouldn't say it's untested; we often test HA. but this may be an edge case that was missed
[18:57] <jnielsen> roaksoax: 2.1
[18:58] <mpontillo> xygnal: can you try an experiment for me? shut down the rack controller you're not using on the region controller. just stop the service. and then see if the bug still occurs
[18:58] <xygnal> mpontillo hm. It's supposed to be rackd on one and region on the other. According to services, rackd, http, and tftp are running, but dhcpd is not.
[18:59] <roaksoax> xygnal: in a HA mode, the secondary rack wont be running DHCP
[18:59] <roaksoax> xygnal: until the primary dies
[18:59] <xygnal> roaksoax: mpontillo: then would I be affected as suspected, or would that not happen?
[19:01] <mpontillo> xygnal: in other words, it sounds like you're not using one of your rack controllers for DHCP, so shutting it down shouldn't hurt anything
[19:01] <xygnal> mpontillo the 'backup' rack controller is on the region controller.  shall I just stop the rackd service?
[19:01] <mpontillo> xygnal: yeah, just stop it and see if it makes anything better. that way we can characterize if this is a bug with HA or not
[19:02] <mpontillo> xygnal: maybe just shut down the secondary, that way you can keep everything on one machine and see if that helps -- reduce the number of variables we're dealing with here.
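One way to carry out the experiment mpontillo suggests; the service name `maas-rackd` is an assumption based on MAAS 2.x Debian packaging, so verify it before stopping anything:

```shell
# Stop the secondary rack controller service to take HA out of the picture.
svc=maas-rackd   # assumed service name; check with: systemctl list-units 'maas*'
echo "would run: sudo systemctl stop $svc"
# Live commands (require a MAAS rack controller host):
# sudo systemctl stop "$svc"
# ...and when the experiment is done: sudo systemctl start "$svc"
```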
[19:03] <xygnal> mpontillo thats the plan
[19:10] <jnielsen> mpontillo: I did not verify all of them, but they seem to be stale (not duplicates) and they make up the difference between what maas <user> domain read <domain> is reporting for records
[19:10] <jnielsen> and what the webpage shows
[19:10] <jnielsen> i have to run to a meeting real quick, I'll be back in a hour or so. I'll leave this open
[19:11] <mpontillo> jnielsen: sure, I'd like to understand more what "stale" means. do you have enough information to file a bug for us?
[19:11] <jnielsen> mpontillo: stale means I don't have a machine using that ip
[19:12] <jnielsen> I'll investigate more
[19:13] <mpontillo> jnielsen: so possibly DHCP addresses that expired but were not deleted? that might be a known (and/or fixed) bug. let me check
[19:14] <roaksoax> jnielsen: try this
[19:14] <roaksoax> jnielsen: sudo maas-region shell
[19:14] <roaksoax> from maaserver.models import DNSResource
[19:14] <roaksoax> DNSResource.objects.all()
[19:15] <roaksoax> jnielsen: and I think they come from this:
[19:15] <roaksoax> $GENERATE 190-253 $.90.90.10.in-addr.arpa. IN PTR 10-90-90-$.maas
[19:16] <roaksoax> or similar, actually
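roaksoax's interactive steps can also be fed to the region shell as a script; a sketch (printing `id` and `name` is an assumption about the DNSResource model's fields):

```shell
# Heredoc version of the maas-region shell steps above.
cat > /tmp/list_dnsresources.py <<'EOF'
from maaserver.models import DNSResource
for r in DNSResource.objects.all():
    print(r.id, r.name)
EOF
# Requires a MAAS region controller, so left commented here:
# sudo maas-region shell < /tmp/list_dnsresources.py
echo "script saved to /tmp/list_dnsresources.py"
```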
[19:34] <mpontillo> jnielsen: is there something special about xxxmaaskhost021?
[19:35] <xygnal> was that for me?
[19:35] <mpontillo> xygnal: ah, yes, sorry jnielsen. it seemed a little strange to me that it has an interface on both xx.xx.97.0/24 and xx.xx.186.0/23
[19:37] <xygnal> mpontillo that is a special host being used for something else right now, we have not been testing on it recently.  would it be affecting other builds?
[19:40] <mpontillo> xygnal: can you try removing those from the environment and see if that impacts the issue? my concern is that it looks like there are IP addresses from two different fabrics on those machines, which may be contributing to the VLAN flip/flop effect and causing the IP addresses to become unconfigured
[19:40] <mpontillo> xygnal: maybe give it a try after you rule out HA?
[19:42] <mpontillo> xygnal: just to confirm, have you tried recommissioning the machines before deploying, too?
[19:46] <xygnal> mpontillo: we've removed and re-commissioned them before deploying, yes
[19:47] <xygnal> in fact, for all of the -DEMO nodes I am certain of this
[19:49] <mpontillo> xygnal: ok great.. looking forward to hearing if taking HA out of the picture makes a difference
[19:49] <xygnal> mpontillo is it just that one node, or all of the nodes named like that, that need to be removed?
[19:49] <xygnal> (that have the dual-networks)
[19:49] <mpontillo> xygnal: I'd just remove them all from MAAS for good measure
[19:49] <mpontillo> xygnal: when it comes time to test that
[19:50] <mpontillo> xygnal: let's try to rule out one thing at a time =)
[19:50] <xygnal> mpontillo i'm testing the no-HA right now, just planning for after
[19:51] <mpontillo> xygnal: great. right now I'm more suspicious of controllers, since they constantly update the region about their network configuration, which might lead to flip/flops if the two controllers don't agree on what the network looks like. then every 30 seconds to one minute you might see VLAN changes that could cause IP addresses to be cleared
[19:51] <xygnal> mpontillo should I remove all of the nodes or just those xxxmaask ones? Being that the other ones have been completely removed and added since re-fabricing
[19:51] <xygnal> mpontillo just trying to avoid making the whole dev environment unavailable to my colleagues by removing all the hosts
[19:51] <mpontillo> xygnal: I was only suspicious about the xxxmaask ones for now, but honestly I doubt it will change anything, especially if those nodes are just sitting there
[19:52] <xygnal> yes, they are
[19:52] <mpontillo> xygnal: honestly I doubt they are the issue; let's focus on anything running the controller software for now
[19:52] <mpontillo> xygnal: I assumed that was xxxmaas0{1,2}
[20:00] <vasey> hey folks, i'm tryna diagnose this issue when I try to use juju to bootstrap a MAAS controller: https://pastebin.com/Pd8ktgPm
[20:04] <mpontillo> vasey: are you using any IPv6 in your environment? is juju configured to talk to MAAS by means of an IP address or DNS name? if DNS, what does it resolve to?
[20:05] <mpontillo> vasey: in case it helps, here are my notes on testing MAAS with Juju the last time I did it https://gist.github.com/mpontillo/231790806c51cf07e51cbe30e8e0b0a1
[20:16] <vasey> mpontillo: one thing i did was run 'sudo lxd init' before starting up; i now have an lxd bridge interface that i believe my juju is running from, and it's definitely not in the same subnet as my MAAS setup. how would i revert that operation?
[20:20] <vasey> mpontillo: and to answer your questions, i don't believe i have ipv6 configured anywhere, how would i verify for MAAS? and juju is configured to talk to MAAS via IP address, not dns name
[20:22] <mpontillo> vasey: the lxd bridge shouldn't affect things, but your containers' default profile would most likely be using the bridge, with NAT
[20:23] <mpontillo> vasey: come to think of it, that error is most likely occurring when Juju selects a MAAS node to use as the juju controller, and attempts to allocate and deploy it. check your nodes' network configuration to ensure they have automatic IP addresses on the subnet(s) you expect
[20:24] <jnielsen> mpontillo: interestingly the entries with the ip-like name have a live machine associated with them (not on my list)
[20:24] <mup> Bug #1686195 opened: [2.2] MAAS should include a script to test enlistment <MAAS:Triaged by mpontillo> <https://launchpad.net/bugs/1686195>
[20:24] <jnielsen> mpontillo: I'll investigate this further, thanks for the help! you too roaksoax:
[20:26] <mpontillo> jnielsen: right, my guess is that they were created by DHCP leases and are safe to delete. I believe there is a bug where we do not delete them when the lease expires. I don't see a fix on the 2.1 branch
[20:26] <vasey> mpontillo: that did the trick, none of my nodes had interfaces with auto-assign IPs added. is it possible to have juju run on a node that will be part of an openstack distro or is that just a terrible idea?
[20:26] <jnielsen> mpontillo: do you know how to delete them?
[20:27] <vasey> mpontillo: openstack deployment***
[20:27] <mpontillo> jnielsen: you can delete them using the command-line interface. probably something like: "maas $PROFILE dnsresource delete <id>".
[20:27] <mpontillo> jnielsen: but no guarantees because I didn't test that command =)
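Combining `dnsresources read` with the (untested, per mpontillo) `dnsresource delete` gives a bulk-cleanup sketch. The JSON below is an invented example of the read output's shape, and the dashed-IP pattern matches the stale names jnielsen described:

```shell
# Pick out records whose name looks like a dashed IP (e.g. 10-90-90-5)
# and collect their ids for deletion. Sample JSON stands in for real output.
sample='[{"id": 7, "fqdn": "10-90-90-5.maas"}, {"id": 9, "fqdn": "node1.maas"}]'
stale_ids=$(echo "$sample" | jq -r '.[] | select(.fqdn | test("^[0-9]+(-[0-9]+){3}")) | .id')
echo "$stale_ids"
# Live deletion (requires a MAAS CLI profile, and the same caveat as above
# that the delete sub-command is untested):
# for id in $stale_ids; do maas "$PROFILE" dnsresource delete "$id"; done
```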
[20:28] <mpontillo> vasey: I think juju was designed that way for separation of concerns; if something takes down your deployment, you don't want that to affect juju itself
[20:30] <mpontillo> vasey: what I have seen many customers do is use a constraint in juju to select a specific machine in MAAS; you could, for example, use the virsh support to make your juju select a manually-configured VM on the MAAS server to be the controller
[20:35] <jnielsen> mpontillo: they don't show up in my dnsresource list with maas <user> dnsresources read
[20:37] <vasey> mpontillo: ahhhh that makes sense
[20:45] <mpontillo> jnielsen: ah, I must have misunderstood; I thought you were saying they showed up in dnsresources read but not on the DNS page; it's the opposite, then?
[20:53] <mpontillo> jnielsen: hmm, on my MAAS I see DHCP-provided hostnames if I do something like this: maas $PROFILE dnsresources read | jq -c '.[] | {fqdn:.fqdn, id:.id}'
[20:53] <mpontillo> jnielsen: and I see things in the DNS details page for that domain which belong to my deployed nodes (they appear with a hyperlink so I can click them and go to the node)
[20:54] <mpontillo> jnielsen: but I don't see anything that isn't a node that doesn't appear in the domain details
[20:54] <mpontillo> jnielsen: I guess it would help if you can pastebin: dig @127.0.0.1 -t axfr <domain>
[20:54] <mpontillo> jnielsen: and the dnsresources read with the jq above
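The comparison mpontillo asks for, zone contents versus the API's view, can be sketched like this; the sample names stand in for real `dig axfr` and `dnsresources read` output:

```shell
# Names the DNS server serves but the API doesn't know about are the
# suspicious ones. Sample data replaces the live dig/maas commands.
printf '10-90-90-5.maas\nnode1.maas\n' | sort > /tmp/axfr_names.txt
printf 'node1.maas\n' | sort > /tmp/api_names.txt
only_in_dns=$(comm -23 /tmp/axfr_names.txt /tmp/api_names.txt)
echo "$only_in_dns"
# Live versions (require MAAS; extract names before sorting):
#   dig @127.0.0.1 -t axfr <domain>
#   maas $PROFILE dnsresources read | jq -r '.[].fqdn'
```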
[21:05] <xygnal> mpontillo: no luck. I even deleted the node I am building entirely and re-added it. Commissioning worked fine; deploy fails the same.
[21:05] <xygnal> mpontillo and yes, i removed the khost nodes before even doing that
[21:07] <vasey> mpontillo: if i'm using an ESXi-managed VM, how do i use virsh as power control?
[21:11] <vasey> mpontillo: if i use the VMware power options, using the UUID,  host IP, username and pass, i get this error: "No rack controllers can access the BMC of node: fluent-minnow"
[21:11] <vasey> despite my host being able to ping that host. do i need to enable ssh in ESXi, now that i think about it?
[21:12] <xygnal> mpontillo:  updated bug report with two sanitized copies of the logs from that time period today
[21:35] <mpontillo> vasey: you need to install a python package called pyvmomi to talk to the ESXi API; probably KVM is more well-tested since few people have VMware licenses handy to test with
[21:35] <mpontillo> vasey: should be available as an apt package, python3-pyvmomi I think -- but there are a few issues with it, certificate checking has been a problem in the past
[21:35] <mpontillo> vasey: if you get an error that no rack controllers can access the BMC, you might need to refresh the power status to see if MAAS can determine which racks can reach it
[21:37] <mpontillo> xygnal: OK, so without the HA rack enabled and without those dual-network nodes, you still have the problem? just to confirm, can you try a second deploy with one of the machines and tell me if it's any different? (that's weird.) I'll have to give this some more thought. please upgrade to the next rc build when it comes out; I've added some additional logging that may help, too
[21:39] <xygnal> mpontillo: ETA on RC3?
[21:41] <roaksoax> xygnal: ppa:maas/next-proposed has the proposed version of rc3
[23:09] <mup> Bug #1686234 opened: [2.2] MAAS does not delete DNS records for released DHCP leases <MAAS:Triaged by mpontillo> <https://launchpad.net/bugs/1686234>