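Using FPGA On YARN

Prerequisites

YARN supports FPGA resources, but currently ships with only the "IntelFpgaOpenclPlugin".
The vendor drivers must be pre-installed on the YARN NodeManager hosts, and the required environment variables must be configured.
Docker is not supported yet.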

Configuration

FPGA scheduling

In resource-types.xml

Add the following properties:

<configuration>
  <property>
     <name>yarn.resource-types</name>
     <value>yarn.io/fpga</value>
  </property>
</configuration>

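For the Capacity Scheduler, DominantResourceCalculator MUST be configured to enable FPGA scheduling/isolation. Set the following property in capacity-scheduler.xml to configure DominantResourceCalculator:

Property	Required value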
yarn.scheduler.capacity.resource-calculator	org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
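
As a minimal sketch, the property in the table above could be written in capacity-scheduler.xml as follows. It assumes the fragment is placed inside the file's existing `<configuration>` element; nothing else in the file needs to change for this step.

```xml
<!-- capacity-scheduler.xml: make the Capacity Scheduler account for all
     resource types (including yarn.io/fpga), not only memory. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```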

FPGA Isolation

In `yarn-site.xml`

  <property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/fpga</value>
  </property>

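This enables the FPGA isolation module on the NodeManager side.

By default, YARN automatically detects and configures FPGAs when the above config is set. The following configs need to be set in yarn-site.xml only if the admin has specialized requirements.

1) Allowed FPGA Devices

Property	Default value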
yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices	auto

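Specifies the FPGA devices that the YARN NodeManager can manage, as a comma-separated list. The number of FPGA devices is reported to the RM so that it can make scheduling decisions. Set to auto (the default) to let YARN discover FPGA resources from the system automatically.

Specify FPGA devices manually if the admin wants YARN to manage only a subset of them. At present, because only one major device number can be configured in container-executor.cfg (c-e.cfg), FPGA devices are identified by their minor device numbers. For Intel devices, a common way to get an FPGA's minor device number is to run "aocl diagnose" and then check the uevent file for the reported device name.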
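
To illustrate the manual case described above, here is a hedged sketch of a yarn-site.xml entry that restricts YARN to two devices. The minor numbers 0 and 1 are illustrative assumptions, not values taken from a real system; substitute the minor numbers found on your own hosts.

```xml
<!-- yarn-site.xml: let YARN manage only the FPGA devices with these minor
     device numbers. "0,1" is a hypothetical example value. -->
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices</name>
  <value>0,1</value>
</property>
```

Leaving the property unset, or setting it to auto, keeps the default behavior of discovering every device.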

2) Executable to discover FPGAs

Property	Default value
yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables	(empty)

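When yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices=auto is specified, the YARN NodeManager needs to run an FPGA discovery binary (only IntelFpgaOpenclPlugin is supported for now) to get FPGA information. When the value is empty (the default), the YARN NodeManager tries to locate the discovery executable according to the vendor plugin's preference. For instance, the "IntelFpgaOpenclPlugin" tries to find "aocl" in the directory given by the environment variable "ALTERAOCLSDKROOT".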
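
If the discovery binary cannot be found through the environment, the path can be set explicitly. The fragment below is only a sketch: the path shown is a hypothetical install location chosen for illustration, so replace it with wherever "aocl" actually lives on the NodeManager host.

```xml
<!-- yarn-site.xml: point the NodeManager at the FPGA discovery binary.
     The path below is a hypothetical example, not a documented default. -->
<property>
  <name>yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables</name>
  <value>/opt/intelFPGA_pro/hld/bin/aocl</value>
</property>
```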

3) FPGA plugin to use

Property	Default value
yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class	org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin

For now, only the Intel OpenCL SDK for FPGA is supported. The IP program (.aocx file) that runs on the FPGA should be written with OpenCL for the Intel platform.

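4) CGroups mount

FPGA isolation uses the CGroup devices controller to isolate each FPGA device. Add the following config to yarn-site.xml to mount the CGroup devices subsystem automatically; otherwise the admin has to create the devices subfolder manually in order to use this feature.

Property	Value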
yarn.nodemanager.linux-container-executor.cgroups.mount	true
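
Written out as a yarn-site.xml entry, the setting above might look like the sketch below. It assumes the rest of the CGroups setup is already configured as described in the CGroups guide; this fragment covers only the automatic mount.

```xml
<!-- yarn-site.xml: let the container executor mount the CGroup devices
     subsystem itself instead of relying on a manually created folder. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
```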

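For more details on YARN CGroups configuration, see Using CGroups with YARN.

In `container-executor.cfg`

In general, the following config needs to be added to container-executor.cfg. Both fpga.major-device-number and fpga.allowed-device-minor-numbers are optional; they restrict which devices are allowed.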

[fpga]
module.enabled=true
fpga.major-device-number=## Major device number of FPGA, by default is 246. Strongly recommend setting this
fpga.allowed-device-minor-numbers=## Comma separated allowed minor device numbers, empty means all FPGA devices managed by YARN.

When users need to run FPGA applications in a non-Docker environment:

[cgroups]
# Root of system cgroups (Cannot be empty or "/")
root=/cgroup
# Parent folder of YARN's CGroups
yarn-hierarchy=yarn

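Use it

Distributed-shell + FPGA

Distributed shell currently supports specifying additional resource types other than memory and vcores.

Run distributed shell without a Docker container (the .bashrc contains some SDK-related environment variables):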

yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command "source /home/yarn/.bashrc && aocl diagnose" \
  -container_resources memory-mb=2048,vcores=2,yarn.io/fpga=1 \
  -num_containers 1

You should see output like the following:

aocl diagnose: Running diagnose from /home/fpga/intelFPGA_pro/17.0/hld/board/nalla_pcie/linux64/libexec

------------------------- acl0 -------------------------
Vendor: Nallatech ltd

Phys Dev Name  Status   Information

aclnalla_pcie0 Passed   nalla_pcie (aclnalla_pcie0)
                       PCIe dev_id = 2494, bus:slot.func = 02:00.00, Gen3 x8
                       FPGA temperature = 54.4 degrees C.
                       Total Card Power Usage = 32.4 Watts.
                       Device Power Usage = 0.0 Watts.

DIAGNOSTIC_PASSED
---------------------------------------------------------

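Specify the IP that YARN should configure before launching the container

For FPGA resources, a container can set the environment variable "REQUESTED_FPGA_IP_ID" to have YARN download and flash an IP for it before launch. For instance, REQUESTED_FPGA_IP_ID="matrix_mul" makes YARN search the container's local directory for an IP file (a ".aocx" file) whose name contains "matrix_mul"; the application should distribute that file first. Currently, only flashing one IP for all devices is supported. If the user does not set this environment variable, YARN assumes that the application can find the IP file by itself. Note that having YARN pre-download and reprogram the IP is not required, because the OpenCL application may find the IP file and reprogram the device on the fly. Letting YARN do it for the container, however, gives the quickest reprogramming path.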
