Hadoop components are rack-aware. For example, HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
Hadoop master daemons obtain the rack id of the cluster workers by invoking either an external script or a Java class, as specified in the configuration files. Whether a Java class or an external script is used for topology, the output must adhere to the Java org.apache.hadoop.net.DNSToSwitchMapping interface. The interface expects a one-to-one correspondence to be maintained, with topology information in the format of '/myrack/myhost', where '/' is the topology delimiter, 'myrack' is the rack identifier, and 'myhost' is the individual host. Assuming a single /24 subnet per rack, one could use the format '/192.168.100.0/192.168.100.5' as a unique rack-host topology mapping.
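As an illustration of that rack-host format, here is a minimal Python sketch (not part of Hadoop; the function name and the one-/24-subnet-per-rack assumption are ours) that derives a unique '/rack/host' string from an IP address:

```python
import ipaddress

def rack_host_topology(ip, prefix=24):
    # Hypothetical helper: assumes each rack is its own /24 subnet, as in the
    # '/192.168.100.0/192.168.100.5' example above. The network address serves
    # as the rack identifier and the full IP as the host identifier.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return f"/{network.network_address}/{ip}"

print(rack_host_topology("192.168.100.5"))  # /192.168.100.0/192.168.100.5
```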
To use the Java class for topology mapping, the class name is specified by the net.topology.node.switch.mapping.impl parameter in the configuration file. An example, NetworkTopology.java, is included with the Hadoop distribution and can be customized by the Hadoop administrator. Using a Java class instead of an external script has a performance benefit in that Hadoop doesn't need to fork an external process when a new worker node registers itself.
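For reference, a class-based mapping might be wired up in core-site.xml along these lines (com.example.MyTopologyMapping is a placeholder, not a class shipped with Hadoop):

```xml
<property>
  <name>net.topology.node.switch.mapping.impl</name>
  <!-- placeholder class name; it must implement org.apache.hadoop.net.DNSToSwitchMapping -->
  <value>com.example.MyTopologyMapping</value>
</property>
```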
If implementing an external script, it is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, the external topology script is not included with the Hadoop distribution and must be provided by the administrator. Hadoop sends multiple IP addresses to ARGV when forking the topology script. The number of IP addresses sent per invocation is controlled with net.topology.script.number.args and defaults to 100. If net.topology.script.number.args were changed to 1, a topology script would be forked for each IP submitted by DataNodes and/or NodeManagers.
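The effect of net.topology.script.number.args can be sketched as simple batching arithmetic (an illustration of the forking behavior, not Hadoop code; the function name is ours):

```python
def script_invocations(ips, number_args=100):
    # Each fork of the topology script receives at most number_args IP
    # addresses as ARGV, so resolving len(ips) addresses takes
    # ceil(len(ips) / number_args) forks.
    return [ips[i:i + number_args] for i in range(0, len(ips), number_args)]

ips = [f"10.0.0.{n}" for n in range(250)]
print(len(script_invocations(ips)))      # 3 forks with the default of 100
print(len(script_invocations(ips, 1)))   # 250 forks: one per IP
```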
If neither net.topology.script.file.name nor net.topology.node.switch.mapping.impl is set, the rack id '/default-rack' is returned for any passed IP address. While this behavior may appear desirable, it can cause problems with HDFS block replication: the default policy is to write one replicated block off-rack, which is impossible when there is only a single rack named '/default-rack'.
```python
#!/usr/bin/python3
# This script makes assumptions about the physical environment.
#  1) each rack is its own layer 3 network with a /24 subnet, which
#     could be typical where each rack has its own
#     switch with uplinks to a central core router.
#
#                  +-----------+
#                  |core router|
#                  +-----------+
#                 /             \
#   +-----------+               +-----------+
#   |rack switch|               |rack switch|
#   +-----------+               +-----------+
#   | data node |               | data node |
#   +-----------+               +-----------+
#   | data node |               | data node |
#   +-----------+               +-----------+
#
#  2) the topology script gets a list of IPs as input, calculates the
#     network address, and prints '/network_address' as the rack id.

import netaddr
import sys

sys.argv.pop(0)                              # discard name of topology script from argv list as we just want IP addresses

netmask = '255.255.255.0'                    # set netmask to what's being used in your environment. The example uses a /24

for ip in sys.argv:                          # loop over list of datanode IPs
    address = '{0}/{1}'.format(ip, netmask)  # format address string so it looks like 'ip/netmask' to make netaddr work
    try:
        network_address = netaddr.IPNetwork(address).network  # calculate and print network address
        print("/{0}".format(network_address))
    except netaddr.AddrFormatError:
        print("/rack-unknown")               # print catch-all value if unable to calculate network address
```
```bash
#!/usr/bin/env bash
# Here's a bash example to show just how simple these scripts can be.
# Assuming we have a flat network with everything on a single switch, we can fake a rack topology.
# This could occur in a lab environment where we have limited nodes, like 2-8 physical machines on an unmanaged switch.
# This may also apply to multiple virtual machines running on the same physical hardware.
# The number of machines isn't important, but that we are trying to fake a network topology when there isn't one.
#
#       +----------+    +--------+
#       |jobtracker|    |datanode|
#       +----------+    +--------+
#              \        /
#  +--------+  +--------+  +--------+
#  |datanode|--| switch |--|datanode|
#  +--------+  +--------+  +--------+
#              /        \
#       +--------+    +--------+
#       |datanode|    |namenode|
#       +--------+    +--------+
#
# With this network topology, we are treating each host as a rack. This is done by taking the last octet
# of the datanode's IP and prepending it with the word '/rack-'. The advantage of doing this is that HDFS
# can create its 'off-rack' block copy.
#  1) 'echo $@' will echo all ARGV values to xargs.
#  2) 'xargs' will enforce that we print a single argv value per line.
#  3) 'awk' will split fields on dots and append the last field to the string '/rack-'. If awk
#     fails to split on four dots, it will still print '/rack-' followed by the last field value.

echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'
```