[03:43] Hi mpontillo
[03:48] After https://askubuntu.com/questions/873701/maas-node-unable-to-resolve-its-own-hostname was updated, I'm still trying to get commissioning working.
[03:50] the current problem is that the storage info is still missing after commissioning, even though the 00-maas-07-block-devices.out output is valid (/dev/sda is listed).
[03:50] What else can I try to diagnose the situation?
[04:46] tdihp: I would run this script on the commissioning node while SSH'd in to see where the failure is.
[04:46] https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b
[04:47] tdihp: that is, the commissioning node executes (pretty much) those commands to figure out what disks are on the system
[04:47] tdihp: so you should be able to tell where it's going wrong. I think lamont's theory was that it was timing out trying to contact an iSCSI disk specified by a name rather than an IP address
[04:50] Thanks mpontillo, I'll try.
[05:09] tdihp: updated https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b -- now also includes the 'sudo' commands
[06:25] mpontillo: I've tried the script
[06:26] It takes more than 2 minutes, with more than 6 sudo calls
[06:27] However the 00-maas-07-block-devices.err file just has 4 lines of sudo errors.
[06:27] mpontillo: Oddly though the script you provided did not reproduce those sudo error messages
[06:29] tdihp: my script redirects sudo's stderr to /dev/null =)
[06:32] tdihp: can you humor me and do two more things...
(1) tell me the output of `iscsiadm -m session`, and (2) if you can determine that open-iscsi is trying to connect to MAAS via a hostname rather than an IP address (either iscsiadm or mount should tell you), run "sudo dpkg-reconfigure -plow maas-rack-controller" on the MAAS server, and make sure the MAAS URL
[06:32] is set to an IP address and not a hostname
[06:32] mpontillo: And yes, After removing those "2>/dev/null" I saw the identical errors
[06:33] mpontillo: the iscsi output: tcp: [1] 10.9.8.1:3260,1 iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily (non-flash)
[06:34] tdihp: lamont's theory, by the way, was that the sudo errors are a red herring; I think he thought the actual issue was probing an iSCSI target by its hostname rather than an IP address
[06:37] mpontillo: I see. The iscsi target is in ip though.
[06:38] tdihp: yeah. I see that. I wonder if it was passed into the kernel that way. try:
[06:38] cat /proc/cmdline | tr ' ' '\n' | grep iscsi_target_ip
[06:39] tdihp: leave off the grep and you should also see a root= line that has the iSCSI /dev/ path in there.
[06:39] iscsi_target_ip=10.9.8.1
[06:39] ok
[06:39] tdihp: hm. so when you tried my script, did storage information come back as expected?
[06:41] mpontillo: yes, the script did output sda sda1 and sdb
[06:41] root=/dev/disk/by-path/ip-10.9.8.1:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1
[06:42] The only line(s) with any hostnames are: iscsi_initiator=pure-mammal -- ip=::::pure-mammal:BOOTIF
[06:43] tdihp: so it's curious that you don't have that same data when you commission then.
in my setup, when I browse to the node details page, scroll down to commissioning details, and click on "00-maas-07-block-devices.out", I see a JSON object which is a list of two dictionaries (each disk in the system)
[06:45] mpontillo: Yes, my "00-maas-07-block-devices.out" also lists two items, curious truly :D
[06:47] tdihp: ok, what state is the node in? perhaps the commissioning is failing at some other point?
[06:47] mpontillo: BTW I've figured a way to tap to the HTTP calls the node made to the MAAS server
[06:47] tdihp: oh yeah? what was your method? I've captured packets in the past, but that was a pain. later I noticed we had a mass_get.py script or similar which made the calls, heh
[06:48] mpontillo: the node is now in "ready", just with "storage" as 0 GB
[06:48] mpontillo: Yeah I wish I could use mitmproxy too.
[06:49] mpontillo: https://sysdig.com/blog/decode-your-http-traffic-with-sysdig/ the echo_fds bit
[06:49] tdihp: so 00-maas-07-block-devices.out -- I assume it doesn't contain sizes then?
[06:50] tdihp: do the sudo commands in my script get the block size? (just noticed a typo, the second one should be bsz: and not size64:)
[06:50] tdihp: ah nice, that tool looks interesting
[06:51] tdihp: and I didn't use a MitM method, I just used "sudo tcpdump" on the rack controller, btw
[06:51] tdihp: but I had to wade through a huge packet capture ;-)
[06:53] mpontillo: So the point was, between the "starting 00-maas-07-block-devices" request and the report, there was less than 80 secs
[06:55] mpontillo: which matches the lag of 4 sudo calls (2 devices)
[06:56] tdihp: well, the important thing to me is whether or not you're getting the data. if sudo is slow, that's not good. but I guess it's aborting before it's able to get the data. which is weird...
on my system, I still get the "unable to resolve" error, but it works immediately
[06:57] tdihp: I suspect that is because I am getting a NXDOMAIN response back from the DNS server, and it's able to fail fast, whereas in your case, for some reason, the packets don't make it, and it times out
[06:58] mpontillo: the 00-maas-07-block-devices.out does include size for sda like: "SIZE": "500107862016",
[06:59] tdihp: well that's interesting. so I assume you have a 500 GB disk then
[07:00] tdihp: if that's the case then I'm surprised it isn't being reported correctly.
[07:00] mpontillo: Yeah, MAAS still reports "No storage information"
[07:06] tdihp: is there anything interesting under /var/log/maas/rsyslog//* on the MAAS server?
[07:08] tdihp: another thing you could do is look at: curl $(cat /proc/cmdline | tr ' ' '\n' | grep cloud-config-url | cut -d= -f2-)
[07:09] tdihp: you should see under datasource: and reporting: the URLs used to communicate back to the MAAS server. though I expect they're okay given that you seem to have commissioning output.
[07:09] tdihp: at this point, I would also check /var/log/maas/regiond.log, and check for any Python tracebacks. maybe that could provide a clue why the size couldn't be calculated
[07:12] maasserver: [error] Error while calling DescribePowerTypes: Unable to get RPC connection for rack controller 'nv750' (k4am4k).
[07:12] tdihp: are the region and the rack on the same machine? is that consistent or a one-off?
[07:13] tdihp: that could happen if, for example, MAAS is restarting
[07:13] tdihp: what you might do is "tail -f" the log and try commissioning again, then see if any tracebacks occur during commissioning
[07:14] mpontillo: It's the same machine
[07:14] ok. probably a red herring then, unless you have firewall rules I don't know about ;-)
[07:15] mpontillo: and it's not a one-off
[07:16] tdihp: hm, well that is weird then. if you go to Nodes > Controllers and click your controller, is the status all green?
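The /proc/cmdline pipelines quoted above split the kernel command line on spaces and pick out a single key. A minimal Python sketch of the same idea follows. The function name `parse_cmdline` is a hypothetical helper, not part of MAAS. The sample string is assembled from values quoted in this log, except the `ro` flag and the `cloud-config-url` value, which are made-up placeholders. Like the `tr ' ' '\n'` approach, it assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a {key: value} dict.

    Bare flags (no '=') map to an empty string. Only the first '=' splits
    key from value, which mirrors `cut -d= -f2-` in the shell pipeline.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


# Values taken from this log, except `ro` and the cloud-config-url value,
# which are hypothetical placeholders.
sample = (
    "ro root=/dev/disk/by-path/ip-10.9.8.1:3260-iscsi-iqn.2004-05.com.ubuntu:"
    "maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1 "
    "iscsi_target_ip=10.9.8.1 iscsi_initiator=pure-mammal "
    "cloud-config-url=http://maas.example/config?op=get&x=1"
)
params = parse_cmdline(sample)
print(params["iscsi_target_ip"])
print(params["cloud-config-url"])
```

On a commissioning node, the same function could presumably be fed `open("/proc/cmdline").read()` instead of the sample string.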
[07:17] maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
[07:17] mpontillo: oops
[07:18] mpontillo: It seems the rack controller is offline on MAAS
[07:18] tdihp: well that's an interesting clue, however it seems DHCP is working, or you wouldn't be able to PXE boot?
[07:18] tdihp: ok, I would "tail -f rackd.log" and then "service maas-rackd restart"
[07:18] mpontillo: Yes, DHCP/TFTP seems to be working
[07:19] mpontillo: Oh, my IP has changed
[07:21] mpontillo: the rackd is fixed, I'll try to commission again
[07:21] tdihp: great. hoping it works this time =)
[07:28] mpontillo: I guess I still have to try harder
[07:31] mpontillo: is it possible that the MAAS server failed to update the node status when commissioning?
[07:32] tdihp: you can look at the node event log and find out
[07:37] mpontillo: there is a burst of events like this: pure-mammal ureadahead[463]: ureadahead:events/fs/open_exec/enable: Ignored relative path
[07:47] tdihp: that might be a red herring https://bugs.launchpad.net/maas/+bug/1643838
[07:47] tdihp: any chance you can pastebin your regiond.log and rackd.log?
[07:48] tdihp: maas.log too for completeness
[07:48] mpontillo: Sure I can.
[07:53] mpontillo: https://1drv.ms/u/s!Ap2xOCxdN0NAjCHA48CAfrmpySNc
[07:54] mpontillo: Can you download the file?
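The advice above to watch regiond.log for Python tracebacks, together with the `[error]` and `[critical]` lines quoted in this log, can be sketched as a small filter. This is only an illustration: `suspicious_lines` is a hypothetical helper, the `[info]` sample line is invented, and real MAAS log lines likely carry a timestamp prefix that the quoted lines here omit.

```python
import re

# Matches the "[error]" / "[critical]" level tags seen in the quoted log
# lines, plus the first line of a Python traceback.
SUSPICIOUS = re.compile(r"\[(error|critical)\]|Traceback \(most recent call last\)")


def suspicious_lines(lines):
    """Yield (line_number, text) for lines worth a closer look."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")


sample_log = [
    "maasserver: [info] hypothetical informational line\n",  # invented line
    "maasserver: [error] Error while calling DescribePowerTypes: Unable to "
    "get RPC connection for rack controller 'nv750' (k4am4k).\n",
    "maasserver.rack_controller: [critical] Failed configuring DHCP on rack "
    "controller 'id:1'.\n",
    "Traceback (most recent call last):\n",
]
hits = list(suspicious_lines(sample_log))
for number, text in hits:
    print(number, text)
```

Because the function accepts any iterable of lines, an open file handle for a log such as /var/log/maas/regiond.log could be passed in place of `sample_log`.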
[07:54] Oh, you mean there's a real pastebin tool
[07:55] pls wait, I'll paste there
[08:01] tdihp: no that's fine, I grabbed it and was looking through
[08:02] tdihp: haven't found anything interesting though
[08:03] Bug #1656717 opened: Juju -> MAAS [2.2+] API integration needs to account for null spaces
[08:05] Bug #1643001 changed: Moonshot iLO4 'Power HW address' prevent ipmitool from working
[08:08] mpontillo: Haven't mentioned but the commissioning did work for only one time, and I could never reproduce =)
=== frankban|afk is now known as frankban
[08:15] tdihp: when was the last time you reopened the browser tab with your MAAS UI? can you try force-reloading it? I may have just spotted the bug. it's possible that if the amount of storage changes, the UI won't refresh properly. it may be that your node commissioned just fine.
[08:15] tdihp: the fact that the node is in "Ready" state means that MAAS thinks the node is usable. it would be in "Failed Commissioning" state if MAAS thought otherwise.
[08:17] tdihp: if that doesn't work, I would try deleting all the storage devices and recommissioning the node (though it SHOULD work without doing that; MAAS tries to update existing devices though, and I'm wondering if that step somehow goes horribly wrong)
[08:19] mpontillo: Yes! Strange enough after having a new browser access, the storage size has shown!
[08:19] * mpontillo groans
[08:20] I'll try delete the node and do another check
[08:20] tdihp: glad we're finally through that =)
[08:20] * mpontillo should sleep soon; it's after midnight here
[08:20] mpontillo: Thanks really. Good night, cheers
[08:21] tdihp: happy to help; have a good day.
[08:51] Hi there
[08:51] What is the current way of creating custom CentOS images with MAAS-2.x?
[09:31] jlec_, CentOS is available by default in MAAS now
[09:32] no custom images needed
[10:09] brendand: yes I know.
But the official images provided by canonical are older versions. I need 7.2/3
[10:10] secondly I'd like to change the base image deployed.
[10:10] And lastly only 6.6 picks up an IP, but the 7.0 fails to do so
[10:55] Hi
[10:55] I see the maas-image-builder is deprecated, do you know how to make a custom image?
[12:47] Does anybody have a problem commissioning nodes using 16.04 with an offline MAAS install?
[12:49] I'm getting Hash sum mismatch, I think it's because it's renaming the files in /var/lib/apt/lists
=== dannf` is now known as dannf
[16:44] MrLeau: I have seen that before when my mirror was out of date. For me it meant that my images were more up-to-date than my mirror.
[16:44] MrLeau: MAAS installs packages from the archive during commissioning, for things like checking lldp connectivity.
[16:48] Yes, it is very likely that my mirror is out of date, but I found that if I delete everything from /var/lib/apt/lists, apt will update it, and it works
[16:49] If there was a way to clean /var/lib/apt/lists I think it would fix my problem but the commission script runs after the standard apt-get update
=== frankban is now known as frankban|afk
[20:17] hi again. now that the maas-image-builder is deprecated, what is the state of the art way to build custom images?
[20:40] jlec, i don't think there is a way afaik
[20:41] pmatulis: but how does canonical do it to provide the image to the community?
[20:42] There is something around curtin, but no real docs anywhere
=== frankban|afk is now known as frankban
[21:00] I have a small maas system with http://maas.ubuntu.com/images/ephemeral-v2/daily/ as the boot image sync url, and a while back when I first set it up, I was able to use it to boot with either xenial or zesty.
[21:00] but now if I try to boot start node with distro_series=zesty, it tells me:
[21:00] {"distro_series": ["'zesty' is not a valid distro_series. It should be one of: '', 'ubuntu/trusty', 'ubuntu/wily', 'ubuntu/xenial'."]}
[21:01] but distro_series=xenial works just fine
[21:01] brendand: iirc you had helped me with this a while back when it first worked, any ideas?
[21:32] plars, what version of maas are you using?
[21:33] pmatulis: 1.9.4+bzr459
[21:34] plars, oh. you haven't up'd to the 2.x series. i only really started using maas with 2.0 and 2.1
[21:35] pmatulis: for various reasons, I'm reluctant to update from that. We are trying to use an existing dns/dhcp server in this lab rather than letting maas handle that function. This worked for a long time, then broke in an update a while back. Folks here were able to help me convince it to (mostly) quit trying to force us down that path but warned me that
[21:35] future updates might break it completely
[21:36] and this did previously import the zesty images and let us deploy with it at one time, so I'm not sure when it stopped
[21:36] or why
[21:37] plars, maybe go directly to the URL and see if zesty is there
[21:37] pmatulis: I see it on that url
[21:37] ohh
[21:37] there was an update a few days ago it seems
[21:37] to that image
[21:38] plars, i suggest filing a bug then
=== frankban is now known as frankban|afk
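The distro_series error quoted above already lists the values this MAAS 1.9 server accepts, so it can be used to confirm which series were actually imported. A rough Python sketch follows. `allowed_series` is a hypothetical helper, and parsing a human-readable error message like this is brittle; it assumes the message keeps the exact wording shown in this log.

```python
import json
import re

# Error body copied verbatim from the log above.
error_body = """{"distro_series": ["'zesty' is not a valid distro_series. It should be one of: '', 'ubuntu/trusty', 'ubuntu/wily', 'ubuntu/xenial'."]}"""


def allowed_series(body: str) -> list:
    """Extract the single-quoted choices that follow 'It should be one of:'."""
    message = json.loads(body)["distro_series"][0]
    _, _, tail = message.partition("It should be one of:")
    return re.findall(r"'([^']*)'", tail)


allowed = allowed_series(error_body)
print(allowed)
print("ubuntu/zesty" in allowed)
```

If "ubuntu/zesty" is absent from the extracted list, the server has no usable zesty image, which would point at the image import rather than at the deploy request.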