[03:43] <tdihp> Hi mpontillo
[03:48] <tdihp> After applying the update from https://askubuntu.com/questions/873701/maas-node-unable-to-resolve-its-own-hostname, I'm still trying to get commissioning working.
[03:50] <tdihp> the current problem is that the storage info is still missing after commissioning, even though the 00-maas-07-block-devices.out output looks valid (/dev/sda is listed).
[03:50] <tdihp> What else can I try to diagnose the situation?
[04:46] <mpontillo> tdihp: I would run this script on the commissioning node while SSH'd in to see where the failure is.
[04:46] <mpontillo> https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b
[04:47] <mpontillo> tdihp: that is, the commissioning node executes (pretty much) those commands to figure out what disks are on the system
[04:47] <mpontillo> tdihp: so you should be able to tell where it's going wrong. I think lamont's theory was that it was timing out trying to contact an iSCSI disk specified by a name rather than an IP address
[04:50] <tdihp> Thanks mpontillo, I'll try.
[05:09] <mpontillo> tdihp: updated https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b -- it now also includes the 'sudo' commands
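(The gist itself isn't reproduced in this log, but based on the discussion that follows -- lsblk enumeration plus per-device blockdev calls under sudo, see [06:50] below -- the probing presumably boils down to something like this sketch; device names are illustrative:

    # enumerate physical disks, excluding ramdisks, floppies, and loop devices
    lsblk --exclude 1,2,7 -d -P -o NAME,RO,RM,MODEL,ROTA
    # per-device queries; these require root, hence the sudo noise in the .err file
    for dev in /dev/sda /dev/sdb; do
        sudo blockdev --getsize64 "$dev"   # size in bytes
        sudo blockdev --getbsz "$dev"      # block size
    done
)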
[06:25] <tdihp> mpontillo: I've tried the script
[06:26] <tdihp> It takes more than 2 minutes, with more than 6 sudo calls
[06:27] <tdihp> However, the 00-maas-07-block-devices.err file has just 4 lines of sudo errors.
[06:27] <tdihp> mpontillo: Oddly, though, the script you provided did not reproduce those sudo error messages
[06:29] <mpontillo> tdihp: my script redirects sudo's stderr to /dev/null =)
[06:32] <mpontillo> tdihp: can you humor me and do two more things... (1) tell me the output of `iscsiadm -m session`, and (2) if you can determine that open-iscsi is trying to connect to MAAS via a hostname rather than an IP address (either iscsiadm or mount should tell you), run "sudo dpkg-reconfigure -plow maas-rack-controller" on the MAAS server, and make sure the MAAS URL is set to an IP address and not a hostname
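(If you want to verify which URL the rack controller is currently using -- assuming a 2.x install, where rackd keeps its config in /etc/maas/rackd.conf -- something like:

    grep maas_url /etc/maas/rackd.conf
    # expect an IP-based URL, e.g.:  maas_url: http://10.9.8.1:5240/MAAS
)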
[06:32] <tdihp> mpontillo: And yes, after removing those "2>/dev/null" redirects I saw identical errors
[06:33] <tdihp> mpontillo: the iscsi output: tcp: [1] 10.9.8.1:3260,1 iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily (non-flash)
[06:34] <mpontillo> tdihp: lamont's theory, by the way, was that the sudo errors are a red herring; I think he thought the actual issue was probing an iSCSI target by its hostname rather than an IP address
[06:37] <tdihp> mpontillo: I see. The iSCSI target is specified by IP, though.
[06:38] <mpontillo> tdihp: yeah. I see that. I wonder if it was passed into the kernel that way. try:
[06:38] <mpontillo> cat /proc/cmdline | tr ' ' '\n' | grep iscsi_target_ip
[06:39] <mpontillo> tdihp: leave off the grep and you should also see a root= line that has the iSCSI /dev/ path in there.
[06:39] <tdihp> iscsi_target_ip=10.9.8.1
[06:39] <mpontillo> ok
[06:39] <mpontillo> tdihp: hm. so when you tried my script, did storage information come back as expected?
[06:41] <tdihp> mpontillo: yes, the script did output sda sda1 and sdb
[06:41] <tdihp> root=/dev/disk/by-path/ip-10.9.8.1:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1
[06:42] <tdihp> The only line(s) with any hostnames are: iscsi_initiator=pure-mammal -- ip=::::pure-mammal:BOOTIF
[06:43] <mpontillo> tdihp: so it's curious that you don't have that same data when you commission then. in my setup, when I browse to the node details page, scroll down to commissioning details, and click on "00-maas-07-block-devices.out", I see a JSON object which is a list of two dictionaries (each disk in the system)
[06:45] <tdihp> mpontillo: Yes, my "00-maas-07-block-devices.out" also lists two items, curious truly :D
[06:47] <mpontillo> tdihp: ok, what state is the node in? perhaps the commissioning is failing at some other point?
[06:47] <tdihp> mpontillo: BTW I've figured out a way to tap into the HTTP calls the node makes to the MAAS server
[06:47] <mpontillo> tdihp: oh yeah? what was your method? I've captured packets in the past, but that was a pain. later I noticed we had a maas_get.py script or similar which made the calls, heh
[06:48] <tdihp> mpontillo: the node is now in "ready", just with "storage" as 0 GB
[06:48] <tdihp> mpontillo: Yeah I wish I could use mitmproxy too.
[06:49] <tdihp> mpontillo: https://sysdig.com/blog/decode-your-http-traffic-with-sysdig/ the echo_fds bit
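(For reference, the echo_fds technique from that post looks roughly like the following; the IP and port here are illustrative -- 5240 is the usual MAAS region port -- and would need adjusting:

    # print ASCII buffers for any socket I/O matching the filter
    sudo sysdig -A -c echo_fds "fd.ip=10.9.8.1 and fd.port=5240"
)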
[06:49] <mpontillo> tdihp: so 00-maas-07-block-devices.out -- I assume it doesn't contain sizes then?
[06:50] <mpontillo> tdihp: do the sudo commands in my script get the block size? (just noticed a typo, the second one should be bsz: and not size64:)
[06:50] <mpontillo> tdihp: ah nice, that tool looks interesting
[06:51] <mpontillo> tdihp: and I didn't use a MitM method, I just used "sudo tcpdump" on the rack controller, btw
[06:51] <mpontillo> tdihp: but I had to wade through a huge packet capture ;-)
[06:53] <tdihp> mpontillo: So the point was, between the "starting 00-maas-07-block-devices" request and the report there was a gap of less than 80 seconds
[06:55] <tdihp> mpontillo: which matches the lag from 4 sudo calls (2 devices)
[06:56] <mpontillo> tdihp: well, the important thing to me is whether or not you're getting the data. if sudo is slow, that's not good. but I guess it's aborting before it's able to get the data. which is weird... on my system, I still get the "unable to resolve" error, but it works immediately
[06:57] <mpontillo> tdihp: I suspect that is because I am getting a NXDOMAIN response back from the DNS server, and it's able to fail fast, whereas in your case, for some reason, the packets don't make it, and it times out
[06:58] <tdihp> mpontillo: the 00-maas-07-block-devices.out does include size for sda like: "SIZE": "500107862016",
[06:59] <mpontillo> tdihp: well that's interesting. so I assume you have a 500 GB disk then
[07:00] <mpontillo> tdihp: if that's the case then I'm surprised it isn't being reported correctly.
[07:00] <tdihp> mpontillo: Yeah, MAAS still reports "No storage information"
[07:06] <mpontillo> tdihp: is there anything interesting under /var/log/maas/rsyslog/<hostname>/* on the MAAS server?
[07:08] <mpontillo> tdihp: another thing you could do is look at: curl $(cat /proc/cmdline | tr ' ' '\n' | grep cloud-config-url | cut -d= -f2-)
[07:09] <mpontillo> tdihp: you should see under datasource: and reporting: the URLs used to communicate back to the MAAS server. though I expect they're okay given that you seem to have commissioning output.
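(For context, the cloud-config returned by that curl should contain sections roughly like the following; URLs are illustrative and secrets elided:

    #cloud-config
    datasource:
      MAAS:
        metadata_url: http://10.9.8.1:5240/MAAS/metadata/
        consumer_key: ...
    reporting:
      maas:
        type: webhook
        endpoint: http://10.9.8.1:5240/MAAS/metadata/status/<system_id>
)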
[07:09] <mpontillo> tdihp: at this point, I would also check /var/log/maas/regiond.log, and check for any Python tracebacks. maybe that could provide a clue why the size couldn't be calculated
[07:12] <tdihp> maasserver: [error] Error while calling DescribePowerTypes: Unable to get RPC connection for rack controller 'nv750' (k4am4k).
[07:12] <mpontillo> tdihp: are the region and the rack on the same machine? is that consistent or a one-off?
[07:13] <mpontillo> tdihp: that could happen if, for example, MAAS is restarting
[07:13] <mpontillo> tdihp: what you might do is "tail -f" the log and try commissioning again, then see if any tracebacks occur during commissioning
[07:14] <tdihp> mpontillo: It's the same machine
[07:14] <mpontillo> ok. probably a red herring then, unless you have firewall rules I don't know about ;-)
[07:15] <tdihp> mpontillo: and it's not a one-off
[07:16] <mpontillo> tdihp: hm, well that is weird then. if you go to Nodes > Controllers and click your controller, is the status all green?
[07:17] <tdihp> maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'.
[07:17] <tdihp> mpontillo: oops
[07:18] <tdihp> mpontillo: It seems the rack controller is offline on MAAS
[07:18] <mpontillo> tdihp: well that's an interesting clue, however it seems DHCP is working, or you wouldn't be able to PXE boot?
[07:18] <mpontillo> tdihp: ok, I would "tail -f rackd.log" and then "service maas-rackd restart"
[07:18] <tdihp> mpontillo: Yes DHCP/TFTP seems working
[07:19] <tdihp> mpontillo: Oh, my IP has changed
[07:21] <tdihp> mpontillo: the rackd is fixed, I'll try commissioning again
[07:21] <mpontillo> tdihp: great. hoping it works this time =)
[07:28] <tdihp> mpontillo: I guess I still have to try harder
[07:31] <tdihp> mpontillo: is it possible that the MAAS server failed to update the node status when commissioning?
[07:32] <mpontillo> tdihp: you can look at the node event log and find out
[07:37] <tdihp> mpontillo: there is a burst of events like this: pure-mammal ureadahead[463]: ureadahead:events/fs/open_exec/enable: Ignored relative path
[07:47] <mpontillo> tdihp: that might be a red herring https://bugs.launchpad.net/maas/+bug/1643838
[07:47] <mpontillo> tdihp: any chance you can pastebin your regiond.log and rackd.log?
[07:48] <mpontillo> tdihp: maas.log too for completeness
[07:48] <tdihp> mpontillo: Sure I can.
[07:53] <tdihp> mpontillo: https://1drv.ms/u/s!Ap2xOCxdN0NAjCHA48CAfrmpySNc
[07:54] <tdihp> mpontillo: Can you download the file?
[07:54] <tdihp> Oh, you mean there's a real pastebin tool
[07:55] <tdihp> pls wait, I'll paste there
[08:01] <mpontillo> tdihp: no that's fine, I grabbed it and was looking through
[08:02] <mpontillo> tdihp: haven't found anything interesting though
[08:03] <mup> Bug #1656717 opened: Juju -> MAAS [2.2+] API integration needs to account for null spaces <juju:Triaged> <MAAS:In Progress by mpontillo> <https://launchpad.net/bugs/1656717>
[08:05] <mup> Bug #1643001 changed: Moonshot iLO4 'Power HW address' prevent ipmitool from working <MAAS:Expired> <https://launchpad.net/bugs/1643001>
[08:08] <tdihp> mpontillo: I haven't mentioned this, but the commissioning did work exactly once, and I could never reproduce it =)
[08:15] <mpontillo> tdihp: when was the last time you reopened the browser tab with your MAAS UI? can you try force-reloading it? I may have just spotted the bug. it's possible that if the amount of storage changes, the UI won't refresh properly. it may be that your node commissioned just fine.
[08:15] <mpontillo> tdihp: the fact that the node is in "Ready" state means that MAAS thinks the node is usable. it would be in "Failed Commissioning" state if MAAS thought otherwise.
[08:17] <mpontillo> tdihp: if that doesn't work, I would try deleting all the storage devices and recommissioning the node (though it SHOULD work without doing that; MAAS tries to update existing devices though, and I'm wondering if that step somehow goes horribly wrong)
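(Deleting the storage devices can also be done from the MAAS 2.x CLI rather than the UI; a sketch, assuming a logged-in profile named "admin", with the IDs illustrative:

    maas admin block-devices read <system_id>              # list the device ids
    maas admin block-device delete <system_id> <device_id> # delete one device
)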
[08:19] <tdihp> mpontillo: Yes! Strangely enough, after opening a fresh browser session, the storage size is shown!
[08:19]  * mpontillo groans
[08:20] <tdihp> I'll try deleting the node and do another check
[08:20] <mpontillo> tdihp: glad we're finally through that =)
[08:20]  * mpontillo should sleep soon; it's after midnight here
[08:20] <tdihp> mpontillo: Thanks really. Good night, cheers
[08:21] <mpontillo> tdihp: happy to help; have a good day.
[08:51] <jlec_> Hi there
[08:51] <jlec_> What is the current way of creating custom CentOS images with MAAS-2.x?
[09:31] <brendand> jlec_, CentOS is available by default in MAAS now
[09:32] <brendand> no custom images needed
[10:09] <jlec_> brendand: yes I know. But the official images provided by Canonical are older versions. I need 7.2/7.3
[10:10] <jlec_> Secondly, I'd like to change the base image that gets deployed.
[10:10] <jlec_> And lastly, only 6.6 picks up an IP, but 7.0 fails to do so
[10:55] <Yoofy> Hi
[10:55] <Yoofy> I see that maas-image-builder is deprecated; do you know how to make a custom image?
[12:47] <MrLeau> Does anybody have problems commissioning nodes using 16.04 with an offline MAAS install?
[12:49] <MrLeau> I'm getting a Hash sum mismatch; I think it's because it's renaming the files in /var/lib/apt/lists
[16:44] <mpontillo> MrLeau: I have seen that before when my mirror was out of date. For me it meant that my images were more up-to-date than my mirror.
[16:44] <mpontillo> MrLeau: MAAS installs packages from the archive during commissioning, for things like checking lldp connectivity.
[16:48] <MrLeau> Yes, it is very likely that my mirror is out of date, but I found that if I delete everything from /var/lib/apt/lists, apt re-fetches it and it works
[16:49] <MrLeau> If there were a way to clean /var/lib/apt/lists before commissioning, I think it would fix my problem, but the commissioning script runs after the standard apt-get update
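(The manual workaround MrLeau describes, run on the node over SSH:

    # wipe the stale package lists, then let apt re-fetch them
    sudo rm -rf /var/lib/apt/lists/*
    sudo apt-get update
)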
[20:17] <jlec> hi again. now that the maas-image-builder is deprecated, what is the state of the art way to build custom images?
[20:40] <pmatulis> jlec, i don't think there is a way afaik
[20:41] <jlec> pmatulis: but how does Canonical do it to provide the images to the community?
[20:42] <jlec> There is something around curtin, but no real docs anywhere
[21:00] <plars> I have a small maas system with http://maas.ubuntu.com/images/ephemeral-v2/daily/ as the boot image sync url, and a while back when I first set it up, I was able to use it to boot with either xenial or zesty.
[21:00] <plars> but now if I try to start a node with distro_series=zesty, it tells me:
[21:00] <plars> {"distro_series": ["'zesty' is not a valid distro_series.  It should be one of: '', 'ubuntu/trusty', 'ubuntu/wily', 'ubuntu/xenial'."]}
[21:01] <plars> but distro_series=xenial works just fine
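(For context, the calls plars describes presumably look something like this on the 1.9 CLI; the profile name and system ID are illustrative. Note the error message lists the accepted values with an "ubuntu/" prefix, yet the bare series name works for xenial:

    maas admin node start <system_id> distro_series=zesty    # rejected as above
    maas admin node start <system_id> distro_series=xenial   # works fine
)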
[21:01] <plars> brendand: iirc you had helped me with this a while back when it first worked, any ideas?
[21:32] <pmatulis> plars, what version of maas are you using?
[21:33] <plars> pmatulis: 1.9.4+bzr459
[21:34] <pmatulis> plars, oh. you haven't upgraded to the 2.x series. i only really started using maas with 2.0 and 2.1
[21:35] <plars> pmatulis: for various reasons, I'm reluctant to update from that. We are trying to use an existing dns/dhcp server in this lab rather than letting maas handle that function. This worked for a long time, then broke in an update a while back. Folks here were able to help me convince it to (mostly) quit trying to force us down that path, but warned me that future updates might break it completely
[21:36] <plars> and this did previously import the zesty images and let us deploy with it at one time, so I'm not sure when it stopped
[21:36] <plars> or why
[21:37] <pmatulis> plars, maybe go directly to the URL and see if zesty is there
[21:37] <plars> pmatulis: I see it on that url
[21:37] <pmatulis> ohh
[21:37] <plars> there was an update a few days ago it seems
[21:37] <plars> to that image
[21:38] <pmatulis> plars, i suggest filing a bug then