tdihp | Hi mpontillo | 03:43 |
---|---|---|
tdihp | After https://askubuntu.com/questions/873701/maas-node-unable-to-resolve-its-own-hostname updated, I'm still trying to get the commissioning working. | 03:48 |
tdihp | the current problem is that the storage info is still missing after commissioning, with a valid 00-maas-07-block-devices.out output (/dev/sda is listed). | 03:50 |
tdihp | What else can I try to diagnose the situation? | 03:50 |
mpontillo | tdihp: I would run this script on the commissioning node while SSH'd in to see where the failure is. | 04:46 |
mpontillo | https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b | 04:46 |
mpontillo | tdihp: that is, the commissioning node executes (pretty much) those commands to figure out what disks are on the system | 04:47 |
mpontillo | tdihp: so you should be able to tell where it's going wrong. I think lamont's theory was that it was timing out trying to contact an iSCSI disk specified by a name rather than an IP address | 04:47 |
tdihp | Thanks mpontillo, I'll try. | 04:50 |
mpontillo | tdihp: updated https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b -- now includes also the 'sudo' commands | 05:09 |
tdihp | mpontillo: I've tried the script | 06:25 |
tdihp | It takes more than 2 minutes, with more than 6 sudo calls | 06:26 |
tdihp | However the 00-maas-07-block-devices.err file just has 4 lines of sudo errors. | 06:27 |
tdihp | mpontillo: Oddly though the script you provided did not reproduce those sudo error messages | 06:27 |
mpontillo | tdihp: my script redirects sudo's stderr to /dev/null =) | 06:29 |
mpontillo | tdihp: can you humor me and do two more things... (1) tell me the output of `iscsiadm -m session`, and (2) if you can determine that open-iscsi is trying to connect to MAAS via a hostname rather than an IP address (either iscsiadm or mount should tell you), run "sudo dpkg-reconfigure -plow maas-rack-controller" on the MAAS server, and make sure the MAAS URL | 06:32 |
mpontillo | is set to an IP address and not a hostname | 06:32 |
tdihp | mpontillo: And yes, after removing those "2>/dev/null" redirects I saw the identical errors | 06:32 |
tdihp | mpontillo: the iscsi output: tcp: [1] 10.9.8.1:3260,1 iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily (non-flash) | 06:33 |
mpontillo | tdihp: lamont's theory, by the way, was that the sudo errors are a red herring; I think he thought the actual issue was probing an iSCSI target by its hostname rather than an IP address | 06:34 |
tdihp | mpontillo: I see. The iSCSI target is given as an IP address though. | 06:37 |
mpontillo | tdihp: yeah. I see that. I wonder if it was passed into the kernel that way. try: | 06:38 |
mpontillo | cat /proc/cmdline | tr ' ' '\n' | grep iscsi_target_ip | 06:38 |
mpontillo | tdihp: leave off the grep and you should also see a root= line that has the iSCSI /dev/ path in there. | 06:39 |
tdihp | iscsi_target_ip=10.9.8.1 | 06:39 |
mpontillo | ok | 06:39 |
mpontillo | tdihp: hm. so when you tried my script, did storage information come back as expected? | 06:39 |
tdihp | mpontillo: yes, the script did output sda sda1 and sdb | 06:41 |
tdihp | root=/dev/disk/by-path/ip-10.9.8.1:3260-iscsi-iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1 | 06:41 |
tdihp | The only line(s) with any hostnames are: iscsi_initiator=pure-mammal -- ip=::::pure-mammal:BOOTIF | 06:42 |
mpontillo | tdihp: so it's curious that you don't have that same data when you commission then. in my setup, when I browse to the node details page, scroll down to commissioning details, and click on "00-maas-07-block-devices.out", I see a JSON object which is a list of two dictionaries (each disk in the system) | 06:43 |
tdihp | mpontillo: Yes, my "00-maas-07-block-devices.out" also lists two items, curious truly :D | 06:45 |
mpontillo | tdihp: ok, what state is the node in? perhaps the commissioning is failing at some other point? | 06:47 |
tdihp | mpontillo: BTW I've figured out a way to tap into the HTTP calls the node made to the MAAS server | 06:47 |
mpontillo | tdihp: oh yeah? what was your method? I've captured packets in the past, but that was a pain. later I noticed we had a mass_get.py script or similar which made the calls, heh | 06:47 |
tdihp | mpontillo: the node is now in "ready", just with "storage" as 0 GB | 06:48 |
tdihp | mpontillo: Yeah I wish I could use mitmproxy too. | 06:48 |
tdihp | mpontillo: https://sysdig.com/blog/decode-your-http-traffic-with-sysdig/ the echo_fds bit | 06:49 |
mpontillo | tdihp: so 00-maas-07-block-devices.out -- I assume it doesn't contain sizes then? | 06:49 |
mpontillo | tdihp: do the sudo commands in my script get the block size? (just noticed a typo, the second one should be bsz: and not size64:) | 06:50 |
mpontillo | tdihp: ah nice, that tool looks interesting | 06:50 |
mpontillo | tdihp: and I didn't use a MitM method, I just used "sudo tcpdump" on the rack controller, btw | 06:51 |
mpontillo | tdihp: but I had to wade through a huge packet capture ;-) | 06:51 |
tdihp | mpontillo: So the point was, between the "starting 00-maas-07-block-devices" request and the report, there was less than 80 secs | 06:53 |
tdihp | mpontillo: which matches the lag of 4 sudo calls (2 devices) | 06:55 |
mpontillo | tdihp: well, the important thing to me is whether or not you're getting the data. if sudo is slow, that's not good. but I guess it's aborting before it's able to get the data. which is weird... on my system, I still get the "unable to resolve" error, but it works immediately | 06:56 |
mpontillo | tdihp: I suspect that is because I am getting a NXDOMAIN response back from the DNS server, and it's able to fail fast, whereas in your case, for some reason, the packets don't make it, and it times out | 06:57 |
tdihp | mpontillo: the 00-maas-07-block-devices.out does include size for sda like: "SIZE": "500107862016", | 06:58 |
mpontillo | tdihp: well that's interesting. so I assume you have a 500 GB disk then | 06:59 |
mpontillo | tdihp: if that's the case then I'm surprised it isn't being reported correctly. | 07:00 |
tdihp | mpontillo: Yeah, MAAS still reports "No storage information" | 07:00 |
mpontillo | tdihp: is there anything interesting under /var/log/maas/rsyslog/<hostname>/* on the MAAS server? | 07:06 |
mpontillo | tdihp: another thing you could do is look at: curl $(cat /proc/cmdline | tr ' ' '\n' | grep cloud-config-url | cut -d= -f2-) | 07:08 |
mpontillo | tdihp: you should see under datasource: and reporting: the URLs used to communicate back to the MAAS server. though I expect they're okay given that you seem to have commissioning output. | 07:09 |
mpontillo | tdihp: at this point, I would also check /var/log/maas/regiond.log, and check for any Python tracebacks. maybe that could provide a clue why the size couldn't be calculated | 07:09 |
tdihp | maasserver: [error] Error while calling DescribePowerTypes: Unable to get RPC connection for rack controller 'nv750' (k4am4k). | 07:12 |
mpontillo | tdihp: are the region and the rack on the same machine? is that consistent or a one-off? | 07:12 |
mpontillo | tdihp: that could happen if, for example, MAAS is restarting | 07:13 |
mpontillo | tdihp: what you might do is "tail -f" the log and try commissioning again, then see if any tracebacks occur during commissioning | 07:13 |
tdihp | mpontillo: It's the same machine | 07:14 |
mpontillo | ok. probably a red herring then, unless you have firewall rules I don't know about ;-) | 07:14 |
tdihp | mpontillo: and it's not a one-off | 07:15 |
mpontillo | tdihp: hm, well that is weird then. if you go to Nodes > Controllers and click your controller, is the status all green? | 07:16 |
tdihp | maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:1'. | 07:17 |
tdihp | mpontillo: oops | 07:17 |
tdihp | mpontillo: It seems the rack controller is offline on MAAS | 07:18 |
mpontillo | tdihp: well that's an interesting clue, however it seems DHCP is working, or you wouldn't be able to PXE boot? | 07:18 |
mpontillo | tdihp: ok, I would "tail -f rackd.log" and then "service maas-rackd restart" | 07:18 |
tdihp | mpontillo: Yes, DHCP/TFTP seem to be working | 07:18 |
tdihp | mpontillo: Oh, my IP has changed | 07:19 |
tdihp | mpontillo: the rackd is fixed, I'll try commissioning again | 07:21 |
mpontillo | tdihp: great. hoping it works this time =) | 07:21 |
tdihp | mpontillo: I guess I still have to try harder | 07:28 |
tdihp | mpontillo: is it possible that the MAAS server failed to update the node status when commissioning? | 07:31 |
mpontillo | tdihp: you can look at the node event log and find out | 07:32 |
tdihp | mpontillo: there is a burst of events like this: pure-mammal ureadahead[463]: ureadahead:events/fs/open_exec/enable: Ignored relative path | 07:37 |
mpontillo | tdihp: that might be a red herring https://bugs.launchpad.net/maas/+bug/1643838 | 07:47 |
mpontillo | tdihp: any chance you can pastebin your regiond.log and rackd.log? | 07:47 |
mpontillo | tdihp: maas.log too for completeness | 07:48 |
tdihp | mpontillo: Sure I can. | 07:48 |
tdihp | mpontillo: https://1drv.ms/u/s!Ap2xOCxdN0NAjCHA48CAfrmpySNc | 07:53 |
tdihp | mpontillo: Can you download the file? | 07:54 |
tdihp | Oh, you mean there's a real pastebin tool | 07:54 |
tdihp | pls wait, I'll paste there | 07:55 |
mpontillo | tdihp: no that's fine, I grabbed it and was looking through | 08:01 |
mpontillo | tdihp: haven't found anything interesting though | 08:02 |
mup | Bug #1656717 opened: Juju -> MAAS [2.2+] API integration needs to account for null spaces <juju:Triaged> <MAAS:In Progress by mpontillo> <https://launchpad.net/bugs/1656717> | 08:03 |
mup | Bug #1643001 changed: Moonshot iLO4 'Power HW address' prevent ipmitool from working <MAAS:Expired> <https://launchpad.net/bugs/1643001> | 08:05 |
tdihp | mpontillo: I hadn't mentioned it, but the commissioning did work exactly once, and I could never reproduce that =) | 08:08 |
=== frankban|afk is now known as frankban | ||
mpontillo | tdihp: when was the last time you reopened the browser tab with your MAAS UI? can you try force-reloading it? I may have just spotted the bug. it's possible that if the amount of storage changes, the UI won't refresh properly. it may be that your node commissioned just fine. | 08:15 |
mpontillo | tdihp: the fact that the node is in "Ready" state means that MAAS thinks the node is usable. it would be in "Failed Commissioning" state if MAAS thought otherwise. | 08:15 |
mpontillo | tdihp: if that doesn't work, I would try deleting all the storage devices and recommissioning the node (though it SHOULD work without doing that; MAAS tries to update existing devices though, and I'm wondering if that step somehow goes horribly wrong) | 08:17 |
tdihp | mpontillo: Yes! Strangely enough, after opening the UI in a fresh browser session, the storage size showed up! | 08:19 |
* mpontillo groans | 08:19 | |
tdihp | I'll try deleting the node and do another check | 08:20 |
mpontillo | tdihp: glad we're finally through that =) | 08:20 |
* mpontillo should sleep soon; it's after midnight here | 08:20 | |
tdihp | mpontillo: Thanks really. Good night, cheers | 08:20 |
mpontillo | tdihp: happy to help; have a good day. | 08:21 |
jlec_ | Hi there | 08:51 |
jlec_ | What is the current way of creating custom CentOS images with MAAS-2.x? | 08:51 |
brendand | jlec_, CentOS is available by default in MAAS now | 09:31 |
brendand | no custom images needed | 09:32 |
jlec_ | brendand: yes I know. But the official images provided by canonical are older versions. I need 7.2/3 | 10:09 |
jlec_ | Secondly, I'd like to change the base image deployed. | 10:10 |
jlec_ | And lastly, only 6.6 picks up an IP, but 7.0 fails to do so | 10:10 |
Yoofy | Hi | 10:55 |
Yoofy | I see that maas-image-builder is deprecated; do you know how to make a custom image? | 10:55 |
MrLeau | Does anybody have problem commissioning nodes using 16.04 with an offline MAAS install? | 12:47 |
MrLeau | I'm getting a "Hash sum mismatch"; I think it's because it's renaming the files in /var/lib/apt/lists | 12:49 |
=== dannf` is now known as dannf | ||
mpontillo | MrLeau: I have seen that before when my mirror was out of date. For me it meant that my images were more up-to-date than my mirror. | 16:44 |
mpontillo | MrLeau: MAAS installs packages from the archive during commissioning, for things like checking lldp connectivity. | 16:44 |
MrLeau | Yes, it is very likely that my mirror is out of date, but I found that if I delete everything from /var/lib/apt/lists, apt will update it and it works | 16:48 |
MrLeau | If there was a way to clean /var/lib/apt/lists I think it would fix my problem, but the commissioning script runs after the standard apt-get update | 16:49 |
=== frankban is now known as frankban|afk | ||
jlec | hi again. now that the maas-image-builder is deprecated, what is the state of the art way to build custom images? | 20:17 |
pmatulis | jlec, i don't think there is a way afaik | 20:40 |
jlec | pmatulis: but how does Canonical do it to provide the images to the community? | 20:41 |
jlec | There is something around curtin, but no real docs anywhere | 20:42 |
=== frankban|afk is now known as frankban | ||
plars | I have a small maas system with http://maas.ubuntu.com/images/ephemeral-v2/daily/ as the boot image sync url, and a while back when I first set it up, I was able to use it to boot with either xenial or zesty. | 21:00 |
plars | but now if I try to boot start node with distro_series=zesty, it tells me: | 21:00 |
plars | {"distro_series": ["'zesty' is not a valid distro_series. It should be one of: '', 'ubuntu/trusty', 'ubuntu/wily', 'ubuntu/xenial'."]} | 21:00 |
plars | but distro_series=xenial works just fine | 21:01 |
plars | brendand: iirc you had helped me with this a while back when it first worked, any ideas? | 21:01 |
pmatulis | plars, what version of maas are you using? | 21:32 |
plars | pmatulis: 1.9.4+bzr459 | 21:33 |
pmatulis | plars, oh. you haven't up'd to the 2.x series. i only really started using maas with 2.0 and 2.1 | 21:34 |
plars | pmatulis: for various reasons, I'm reluctant to update from that. We are trying to use an existing dns/dhcp server in this lab rather than letting maas handle that function. This worked for a long time, then broke in an update a while back. Folks here were able to help me convince it to (mostly) quit trying to force us down that path but warned me that | 21:35 |
plars | future updates might break it completely | 21:35 |
plars | and this did previously import the zesty images and let us deploy with it at one time, so I'm not sure when it stopped | 21:36 |
plars | or why | 21:36 |
pmatulis | plars, maybe go directly to the URL and see if zesty is there | 21:37 |
plars | pmatulis: I see it on that url | 21:37 |
pmatulis | ohh | 21:37 |
plars | there was an update a few days ago it seems | 21:37 |
plars | to that image | 21:37 |
pmatulis | plars, i suggest filing a bug then | 21:38 |
=== frankban is now known as frankban|afk |
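
Note on the 06:38 to 06:42 exchange: the check mpontillo asked for (is the iSCSI target passed to the kernel as an IP address or as a hostname?) can also be scripted. What follows is only a rough sketch in Python, not anything MAAS itself runs. The helper names `parse_cmdline` and `is_ip_address` are made up for illustration, and the sample string is assembled from the values tdihp pasted in the log rather than from a complete /proc/cmdline. It splits on whitespace only, so it assumes no quoted values containing spaces.

```python
import ipaddress


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into key/value pairs.

    Tokens without '=' map to an empty string. Only the first '=' is
    treated as the separator, so values such as URLs stay intact
    (the same effect as `cut -d= -f2-` in the log).
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def is_ip_address(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Sample assembled from the values pasted in the log; on a live node,
# read the real thing with open("/proc/cmdline").read() instead.
sample = (
    "iscsi_target_ip=10.9.8.1 "
    "iscsi_initiator=pure-mammal "
    "root=/dev/disk/by-path/ip-10.9.8.1:3260-iscsi-"
    "iqn.2004-05.com.ubuntu:maas:ephemeral-ubuntu-amd64-generic-xenial-daily-lun-1"
)

params = parse_cmdline(sample)
target = params.get("iscsi_target_ip", "")
print("iscsi_target_ip:", target)
print("target is an IP address:", is_ip_address(target))
print("initiator is an IP address:", is_ip_address(params.get("iscsi_initiator", "")))
```

In this log the target turned out to be an IP address, which is why the hostname-lookup theory was set aside and the investigation moved on to the rack controller and, eventually, the stale browser tab.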