Problem Description

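At a customer site we used the vSAN Cluster Shutdown wizard, a new feature in 7.0u3c, to shut down a vSAN cluster for maintenance. The cluster consists of four Dell R940xa nodes, and vCenter runs on a host outside the vSAN cluster. The shutdown passed all pre-checks, and the vSAN hosts had shut down cleanly before power was disconnected. After the vSAN cluster was powered back on, every vSAN virtual machine was listed as inaccessible and could not be seen when browsing the datastore (in either the GUI or the command line), even though the vSAN capacity looked normal.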

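The Restart Cluster button was missing, so our engineers followed the KB article and restarted the cluster manually from the command line. However, the recovery script timed out: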

[root@esxi-ip21:/tmp] python /usr/lib/vmware/vsan/bin/reboot_helper.py recover
Begin to recover the cluster...
Time among connected hosts are synchronized.
Scheduled vSAN cluster restore task.
Waiting for the scheduled task...(18s left)
Checking network status...
Recovery is not ready, retry after 10s...
Recovery is not ready, retry after 10s...
Recovery is not ready, retry after 10s...
Timeout, please try again later
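The manual procedure above amounts to re-running the recover step and reading its output. If you want to repeat it unattended, a small wrapper like the one below is one option. This is a minimal sketch, not part of the KB procedure: it assumes, based only on the log above, that the helper prints `Timeout` when it gives up, and because the script's exit code on timeout is not shown here, it matches on the output text instead. `run_recover_with_retries` and its parameters are hypothetical names.

```python
import subprocess
import time

# Assumption (from the log above only): the recover step prints a line
# starting with "Timeout" when it gives up. Its exit code on timeout is not
# shown, so this sketch checks the output text instead of the return code.
TIMEOUT_MARKER = "Timeout"


def run_recover_with_retries(cmd, attempts=3, delay_s=60.0):
    """Run `cmd` up to `attempts` times, pausing `delay_s` seconds between runs.

    Returns True as soon as one run's output does not contain the timeout
    marker, and False if every attempt timed out. True does NOT prove the
    cluster recovered; it only means the timeout message was absent.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        print(output, end="")  # keep the helper's own progress messages visible
        if TIMEOUT_MARKER not in output:
            return True
        print(f"attempt {attempt}/{attempts} timed out")
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

On an ESXi host it could be called as `run_recover_with_retries(["python", "/usr/lib/vmware/vsan/bin/reboot_helper.py", "recover"])`. In this incident repeated attempts on every node kept timing out, so retrying only automates what was done by hand; it does not address the underlying problem.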

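Running the script on each of the other vSAN nodes in turn produced the same timeout, even though the cluster appeared to have re-formed correctly: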

[root@esxi-ip24:~] esxcli vsan health cluster list -w
Health Test Name                                                       Status
---------------------------------------------------------------------  ------
Overall health                                                         red (vSAN Object health)
Data                                                                   red
  vSAN object health (objecthealth)                                    red
  vSAN object format health (objectformat)                             green
Performance service                                                    red
  Stats DB object (statsdb)                                            green
  Stats primary election (masterexist)                                 red
Network                                                                green
  Hosts with connectivity issues (hostconnectivity)                    green
  vSAN cluster partition (clusterpartition)                            green
  All hosts have a vSAN vmknic configured (vsanvmknic)                 green
  vSAN: Basic (unicast) connectivity check (smallping)                 green
  vSAN: MTU check (ping with large packet size) (largeping)            green
  vMotion: Basic (unicast) connectivity check (vmotionpingsmall)       green
  vMotion: MTU check (ping with large packet size) (vmotionpinglarge)  green
  Network latency check (hostlatencycheck)                             green
Physical disk                                                          green
  Operation health (physdiskoverall)                                   green
  Disk capacity (physdiskcapacity)                                     green
  Congestion (physdiskcongestion)                                      green
  Component limit health (physdiskcomplimithealth)                     green
  Component metadata health (componentmetadata)                        green
  Memory pools (heaps) (lsomheap)                                      green
  Memory pools (slabs) (lsomslab)                                      green
Cluster                                                                green
  Advanced vSAN configuration in sync (advcfgsync)                     green
  vSAN daemon liveness (clomdliveness)                                 green
  vSAN Disk Balance (diskbalance)                                      green
  Resync operations throttling (resynclimit)                           green
  Software version compatibility (upgradesoftware)                     green
  Disk format version (upgradelowerhosts)                              green
Capacity utilization                                                   green
  Storage space (diskspace)                                            green
  Read cache reservations (rcreservation)                              green
  Component (nodecomponentlimit)                                       green
  What if the most consumed host fails (limit1hf)                      green
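In the report above only two individual checks are actually red, `objecthealth` and `masterexist`; the red category rows just summarise them. When comparing this report across hosts or over time, it can help to pull out the failing checks programmatically. This is a sketch under assumptions: it treats indented rows as individual checks, assumes columns are separated by at least two spaces, and only recognises the status words green, yellow and red. `failing_checks` is a hypothetical helper name.

```python
import re

# Assumed row layout: optional indent, check name, 2+ spaces, status word,
# optionally followed by a parenthesised reason
# (as in "Overall health ... red (vSAN Object health)").
ROW_RE = re.compile(r"^(?P<indent>\s*)(?P<name>\S.*?)\s{2,}(?P<status>green|yellow|red)\b")


def failing_checks(report: str) -> list:
    """Return (check name, status) for every indented (leaf) row that is not green.

    Unindented rows are category headings such as "Data" or "Network";
    their colour only summarises their children, so they are skipped.
    """
    failures = []
    for line in report.splitlines():
        m = ROW_RE.match(line)
        if m and m.group("indent") and m.group("status") != "green":
            failures.append((m.group("name"), m.group("status")))
    return failures
```

Applied to the full report above, it would return the `objecthealth` and `masterexist` rows.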

[root@esxi-ip21:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2022-04-05T06:59:46Z
   Local Node UUID: 61706d49-8294-acd0-d16d-0c42a188a480
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
   Sub-Cluster Backup UUID: 61630325-2430-662c-2398-0c42a188cf94
   Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
   Sub-Cluster Membership Entry Revision: 5
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324, 61706d49-8294-acd0-d16d-0c42a188a480
   Sub-Cluster Member HostNames: esxi-ip23, esxi-ip24, esxi-ip22, esxi-ip21
   Sub-Cluster Membership UUID: 80c34b62-e2af-95a3-3cbc-0c42a18906f4
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.244
   Mode: REGULAR

[root@esxi-ip22:~] localcli vsan cluster get
Cluster Information:
   Enabled: true
   Current Local Time: 2022-04-05T11:53:07Z
   Local Node UUID: 61630d69-99d0-a086-19d6-0c42a188d324
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
   Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
   Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
   Sub-Cluster Membership Entry Revision: 3
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
   Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
   Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.237
   Mode: REGULAR

[root@esxi-ip23:~] localcli vsan cluster get
Cluster Information:
   Enabled: true
   Current Local Time: 2022-04-05T11:53:15Z
   Local Node UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
   Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
   Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
   Sub-Cluster Membership Entry Revision: 3
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
   Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
   Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.240
   Mode: REGULAR

[root@esxi-ip24:~] localcli vsan cluster get
Cluster Information:
   Enabled: true
   Current Local Time: 2022-04-05T11:53:22Z
   Local Node UUID: 61630325-2430-662c-2398-0c42a188cf94
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
   Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
   Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
   Sub-Cluster Membership Entry Revision: 3
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
   Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
   Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.238
   Mode: REGULAR
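Whether the cluster has "re-formed correctly" comes down to all hosts agreeing on membership. One way to check that is to parse each host's `vsan cluster get` output and compare the fields that should be identical across members at the same moment. This is a sketch under assumptions: the set of compared fields is our choice, member UUIDs are compared as sets because their order differs between hosts (as seen above), and `parse_cluster_get` / `membership_mismatches` are hypothetical helper names.

```python
# Fields we assume every member of one partition should report identically.
CONSISTENT_FIELDS = (
    "Sub-Cluster UUID",
    "Sub-Cluster Master UUID",
    "Sub-Cluster Member Count",
    "Sub-Cluster Membership UUID",
)


def parse_cluster_get(text: str) -> dict:
    """Turn `vsan cluster get` output into {field: value}; lines without ': ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def membership_mismatches(outputs: dict) -> dict:
    """outputs maps hostname -> raw `cluster get` text.

    Returns {field: {host: value}} for every compared field whose value
    differs between hosts; an empty dict means all hosts agree.
    """
    parsed = {host: parse_cluster_get(text) for host, text in outputs.items()}
    mismatches = {}
    for field in CONSISTENT_FIELDS:
        values = {host: p.get(field) for host, p in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    # Member UUID order differs between hosts, so compare them as sets.
    members = {
        host: frozenset(u.strip() for u in p.get("Sub-Cluster Member UUIDs", "").split(","))
        for host, p in parsed.items()
    }
    if len(set(members.values())) > 1:
        mismatches["Sub-Cluster Member UUIDs"] = members
    return mismatches
```

Fed the four outputs above, this check would flag only `Sub-Cluster Membership UUID`: esxi-ip21 reports `80c34b62-e2af-95a3-3cbc-0c42a18906f4`, while the other three report `3eed4b62-8a16-340c-ee17-0c42a18906f4`. The esxi-ip21 output shows a Current Local Time of 06:59Z versus about 11:53Z for the others, so it was probably captured at a different moment, and the difference may simply reflect a membership change in between.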

Checking the vSAN storage still turned up no errors:

[root@esxi-ip24:~] localcli vsan storage list | grep CMMDS
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
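Scanning a dozen identical lines by eye works here, but on hosts with more disks it is easier to count. The helper below (a hypothetical name, `cmmds_summary`) counts how many `In CMMDS:` entries in `vsan storage list` output are `true`. It is a sketch that relies only on that one field as shown above.

```python
def cmmds_summary(storage_list: str) -> tuple:
    """Return (disks reporting In CMMDS: true, disks reporting the field at all)."""
    flags = [
        line.split(":", 1)[1].strip().lower()
        for line in storage_list.splitlines()
        if line.strip().startswith("In CMMDS:")
    ]
    return sum(flag == "true" for flag in flags), len(flags)
```

For the grep output above it would return `(12, 12)`.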

Next we checked the network on each node, and again found no problems:

[root@esxi-ip24:~] esxcli vsan network list
Interface
   VmkNic Name: vmk1
   IP Protocol: IP
   Interface UUID: 52b76165-9ffc-ef69-93ca-b0a31b7caf98
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Data-in-Transit Encryption Key Exchange Port: 0
   Multicast TTL: 5
   Traffic Type: vsan

[root@esxi-ip24:~] vmkping -I vmk1 192.168.90.22
PING 192.168.90.22 (192.168.90.22): 56 data bytes
64 bytes from 192.168.90.22: icmp_seq=0 ttl=64 time=0.215 ms
64 bytes from 192.168.90.22: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 192.168.90.22: icmp_seq=2 ttl=64 time=0.094 ms
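When pinging every vSAN peer rather than one address, it helps to summarise each `vmkping` result the same way. The sketch below only parses output text; running `vmkping -I vmk1 <peer>` for each peer is left to the caller, and the other hosts' vSAN IP addresses are not given here. `summarize_vmkping` is a hypothetical name, and the regular expression assumes the reply-line format shown above.

```python
import re
from statistics import mean

# Matches reply lines such as
# "64 bytes from 192.168.90.22: icmp_seq=0 ttl=64 time=0.215 ms"
REPLY_RE = re.compile(
    r"bytes from (?P<ip>[\d.]+): icmp_seq=\d+ ttl=\d+ time=(?P<ms>[\d.]+) ms"
)


def summarize_vmkping(output: str):
    """Return (replies received, average round-trip time in ms, or None if no replies)."""
    rtts = [float(m.group("ms")) for m in REPLY_RE.finditer(output)]
    return len(rtts), (mean(rtts) if rtts else None)
```

For the output above this gives three replies with an average of roughly 0.139 ms.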

Solution

Refer to the following VMware KB article and VMware Communities thread:

https://kb.vmware.com/s/article/87350

https://communities.vmware.com/t5/VMware-vSAN-Discussions/vCenter-7-0u3-shutdown-vSAN-cluster-results-in-broken-cluster/m-p/2885089

Summary

If you run hyperconverged infrastructure, always buy a support contract. Always buy a support contract. Always buy a support contract!

Contact Us

400-0512-768

Email: support@sworditsys.com

Business hours: Monday to Friday, 8:00 to 21:00
