LINSTOR用户指南
请先看这个
本指南旨在为软件定义存储解决方案LINSTOR的用户提供最终参考指南和手册。
本指南始终假设您正在使用最新版本的LINSTOR和相关工具。
本指南的组织如下:
- 基本管理任务/设置 deals with LINSTOR's basic functionality and gives you an insight into common administrative tasks. You can also use this chapter as a step-by-step guide to deploy LINSTOR in its basic setup.
- 在高阶LINSTOR任务中,介绍了各种高级和重要的LINSTOR任务及配置,以便以更复杂的方式配置LINSTOR。
- 章节Kubernetes的LINSTOR卷、Proxmox VE中的LINSTOR卷、OpenNebula中的LINSTOR卷、OpenStack中的LINSTOR卷、Docker中的LINSTOR卷说明了如何通过API在Kubernetes、Proxmox、OpenNebula、OpenStack和Docker中使用基于LINSTOR的存储。
LINSTOR
1. 基本管理任务/设置
LINSTOR是一个用于Linux系统上存储的配置管理系统。它管理节点集群上的LVM逻辑卷和/或ZFS ZVOL。它利用DRBD在不同节点之间进行复制,并为用户和应用程序提供块存储设备。它还管理快照、加密,并可通过bcache用SSD缓存HDD数据。
1.1. 概念和术语
本节将介绍一些核心概念和术语,您需要熟悉这些概念和术语才能理解LINSTOR是如何工作和部署存储的。本节按照 “自底向上(ground up)” 的方式组织。
1.1.1. 可安装组件
linstor-controller
LINSTOR设置至少需要一个活动的controller和一个或多个satellites。
The linstor-controller relies on a database that holds all configuration information for the whole cluster. It makes all decisions that need to have a view of the whole cluster. Multiple controllers can be used for LINSTOR but only one can be active.
linstor-satellite
linstor-satellite 运行在每个LINSTOR使用本地存储或为服务提供存储的节点上。它是无状态的;它从controller接收所需的所有信息。它运行 lvcreate 和 drbdadm 等程序,其行为方式就像一个节点代理。
linstor-client
linstor-client 是一个命令行实用程序,用于向系统发送命令并检查系统的状态。
1.1.2. 对象
对象是LINSTOR呈现给最终用户或应用程序的最终结果,例如Kubernetes/OpenShift、复制块设备(DRBD)、NVMeOF目标等。
节点
节点是参与LINSTOR集群的服务器或容器。 Node 属性定义如下:
-
确定节点参与哪个LINSTOR集群
-
设置节点的角色:Controller, Satellite, Auxiliary
-
网络接口 对象定义了节点的网络连接
网络接口
顾名思义,这就是定义节点网络接口的接口/地址的方式。
Definitions
Definitions定义了一个对象的属性,它们可以被看作是概要文件或模板。创建的对象将继承definitions中定义的配置。必须在创建关联对象之前定义definitions。例如,在创建 Resource 之前,必须先创建 ResourceDefinition 。
- StoragePoolDefinition
  - 定义存储池的名称
- ResourceDefinition
  Resource definitions定义资源的以下属性:
  - DRBD资源的名称
  - 用于资源连接的DRBD TCP端口
- VolumeDefinition
  Volume definitions定义如下内容:
  - DRBD资源的卷
  - 卷的大小
  - DRBD资源中卷的卷号
  - 卷的元数据属性
  - 与DRBD卷关联的DRBD设备的次要设备号(minor number)
StoragePool
StoragePool 用于标识LINSTOR上下文中的存储。它定义了:
-
特定节点上存储池的配置
-
用于群集节点上的存储池的存储后端驱动程序(LVM、ZFS等)
-
要传递给存储备份驱动程序的参数和配置
Resource
LINSTOR现在已经扩展了它的能力,在DRBD之外管理更广泛的存储技术。 Resource 包括:
-
表示在 ResourceDefinition 中定义的DRBD资源的位置
-
在群集中的节点上放置资源
-
定义节点上 ResourceDefinition 的位置
Volume
Volumes是 Resource 的子集。一个 Resource 可能有多个volumes,例如,您可能希望将数据库存储在比MySQL集群中的日志慢的存储上。通过将 volumes 保持在单个 resource 下,实际上就是在创建一致性组。 Volume 属性还可以定义更细粒度级别的属性。
1.2. 更广泛的背景
虽然LINSTOR可以用来使DRBD的管理更加方便,但它通常与更高层次的软件栈集成。这种集成已经存在于Kubernetes、OpenStack、OpenNebula和Proxmox中。本指南中包含了在这些环境中部署LINSTOR的特定章节。
LINSTOR使用的南向驱动程序是LVM、thinLVM和ZFS。
1.3. 包
LINSTOR以.rpm和.deb两种包分发:
-
linstor-client 包含命令行客户端程序。这取决于通常已经安装的python。在RHEL8系统中,需要创建python符号链接
-
linstor-controller 和 linstor-satellite 都包含服务的系统单元文件。它们依赖于Java runtime environment(JRE)1.8版(headless)或更高版本。
有关这些软件包的更多详细信息,请参见上面的Installable Components部分。
如果您订阅了LINBIT的支持,那么您将可以通过我们的官方仓库访问我们经过认证的二进制文件。
1.4. 安装
如果要在容器中使用LINSTOR,请跳过此主题并使用下面的 “容器” 部分进行安装。
1.4.1. Ubuntu Linux
如果您想选择使用DRBD创建复制存储,则需要安装 drbd-dkms 和 drbd-utils 。这些包需要安装在所有节点上。您还需要选择一个卷管理器,ZFS或LVM,在本例中我们使用的是LVM。
# apt install -y drbd-dkms drbd-utils lvm2
根据节点是LINSTOR controller、satellite还是两者(组合)将确定该节点上需要哪些包。对于组合型节点,我们需要controller和satellite 的LINSTOR包。
组合节点:
# apt install linstor-controller linstor-satellite linstor-client
这将使我们剩余的节点成为我们的Satellites,因此我们需要在它们上安装以下软件包:
# apt install linstor-satellite linstor-client
1.4.2. SUSE Linux企业服务器
SLES高可用性扩展(HAE)包括DRBD。
On SLES, DRBD is normally installed via the software installation component of YaST2. It comes bundled with the High Availability package selection.
当我们下载DRBD的最新模块时,我们可以检查LVM工具是否也是最新的。喜欢命令行安装的用户可用以下命令获得最新的DRBD和LVM版本:
# zypper install drbd lvm2
根据节点是LINSTOR controller、satellite还是两者(组合)将确定该节点上需要哪些包。对于组合型节点,我们需要controller和satellite 的LINSTOR包。
组合节点:
# zypper install linstor-controller linstor-satellite linstor-client
这将使我们剩余的节点成为我们的Satellites,因此我们需要在它们上安装以下软件包:
# zypper install linstor-satellite linstor-client
1.4.3. CentOS
CentOS从第5版开始就提供DRBD 8。对于DRBD 9,您需要使用EPEL或类似的源。或者,如果您与LINBIT签订了支持合同,则可以使用我们的RHEL 8仓库。DRBD可以使用 yum 安装。我们也可以顺便检查LVM工具是否为最新版本。
如果要复制存储,LINSTOR 需要 DRBD 9。这需要配置外部存储库,可以是LINBIT的,也可以是第三方的。
# yum install drbd kmod-drbd lvm2
根据节点是LINSTOR controller、satellite还是两者(组合)将确定该节点上需要哪些包。对于组合型节点,我们需要controller和satellite 的LINSTOR包。
在RHEL8系统上,需要安装python2才能让linstor客户端工作。
组合节点:
# yum install linstor-controller linstor-satellite linstor-client
这将使我们剩余的节点成为我们的Satellites,因此我们需要在它们上安装以下软件包:
# yum install linstor-satellite linstor-client
1.5. 升级
LINSTOR不支持滚动升级,controller和satellites必须具有相同的版本,否则controller将丢弃具有 版本不匹配
的satellite。但这不是问题,因为satellite不会做任何动作,只要它没有连接到controller,DRBD就不会被任何方式中断。
如果您使用的是嵌入的H2数据库,并且升级了linstor-controller包,那么将在默认的 /var/lib/linstor
目录中创建数据库的自动备份文件。如果由于任何原因linstor-controller数据库迁移失败,则此文件是一个很好的还原点,建议将错误报告给Linbit并还原旧的数据库文件并降级到以前的controller版本。
如果使用任何外部数据库或etcd,建议手动备份当前数据库以获得还原点。
因此,首先升级controller主机上的 linstor-controller
和 linstor-client
包,然后重新启动
linstor-controler
, controller应启动,其所有客户端应显示
OFFLINE(VERSION_MISMATCH)
。之后,您可以继续升级所有卫星节点上的`linstor-satellite`,
并重新启动它们,在短时间重新连接后,它们都应再次显示 ONLINE
,并且您的升级已完成。
1.6. 容器
LINSTOR也可以在容器中运行。基础镜像可以在LINBIT的容器registry drbd.io 中找到。
要访问这些镜像,首先必须登录registry(访问sales@linbit.com获取凭据):
# docker login drbd.io
此repo中可用的容器包括:
-
drbd.io/drbd9-rhel8
-
drbd.io/drbd9-rhel7
-
drbd.io/drbd9-sles15sp1
-
drbd.io/drbd9-bionic
-
drbd.io/drbd9-focal
-
drbd.io/linstor-csi
-
drbd.io/linstor-controller
-
drbd.io/linstor-satellite
-
drbd.io/linstor-client
通过在浏览器中打开 http://drbd.io ,可以获取可用镜像的最新列表。请确保通过 “http” 访问该主机,因为registry中的镜像本身是通过 “https” 提供的。
To load the kernel module, needed only for LINSTOR satellites, you’ll need to run a drbd9-$dist container in privileged mode. The kernel module containers either retrieve an official LINBIT package from a customer repository, use shipped packages, or they try to build the kernel modules from source. If you intend to build from source, you need to have the according kernel headers (e.g., kernel-devel) installed on the host. There are four ways to execute such a module load container:
-
Building from shipped source
-
Using a shipped/pre-built kernel module
-
指定LINBIT节点哈希和发行版。
-
绑定装入现有仓库配置。
从已分发代码生成的示例(基于RHEL):
# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
    -v /usr/src:/usr/src:ro \
    drbd.io/drbd9-rhel7
Example using a module shipped with the container, which is enabled by not
bind-mounting /usr/src
:
# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
    drbd.io/drbd9-rhel8
Example using a hash and a distribution (rarely used):
# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
    -e LB_DIST=rhel7.7 -e LB_HASH=ThisIsMyNodeHash \
    drbd.io/drbd9-rhel7
Example using an existing repo config (rarely used):
# docker run -it --rm --privileged -v /lib/modules:/lib/modules \
    -v /etc/yum.repos.d/linbit.repo:/etc/yum.repos.d/linbit.repo:ro \
    drbd.io/drbd9-rhel7
In both cases (hash + distribution, as well as bind-mounting a repo) the hash or config has to be from a node that has a special property set. Feel free to contact our support, and we will set this property.
直到最近(即DRBD 9版本 9.0.17 之前),您必须使用容器化的DRBD内核模块,而不是将内核模块加载到主机系统上。如果要使用容器,则不应在主机系统上安装DRBD内核模块。对于DRBD版本9.0.17或更高版本,可以像往常一样在主机系统上安装内核模块,但需要确保使用 usermode_helper=disabled 参数加载模块(例如 modprobe drbd usermode_helper=disabled )。
然后以守护进程的身份运行LINSTOR satellite容器, 也需具有特权:
# docker run -d --name=linstor-satellite --net=host -v /dev:/dev --privileged drbd.io/linstor-satellite
net=host 是容器化的 drbd-utils 通过netlink与主机内核通信所必需的。
要将LINSTOR controller容器作为守护进程运行,请将主机上的端口 3370 、 3376 和 3377 映射到该容器:
# docker run -d --name=linstor-controller -p 3370:3370 -p 3376:3376 -p 3377:3377 drbd.io/linstor-controller
要与容器化LINSTOR集群交互,可以使用通过包安装在系统上的LINSTOR客户端,也可以通过容器化LINSTOR客户端。要使用LINSTOR客户端容器,请执行以下操作:
# docker run -it --rm -e LS_CONTROLLERS=<controller-host-IP-address> drbd.io/linstor-client node list
从这里开始,您可以使用LINSTOR客户端初始化集群,并开始使用典型的LINSTOR模式创建资源。
要停止并删除守护的容器和映像,请执行以下操作:
# docker stop linstor-controller
# docker rm linstor-controller
1.7. 初始化集群
我们假设在 所有 集群节点上已完成以下步骤:
- DRBD9内核模块已安装并加载
- 已安装 drbd-utils
- 已安装 LVM 工具
- 已安装 linstor-controller 和/或 linstor-satellite 的依赖项
- linstor-client 已安装在 linstor-controller 节点上
在已安装linstor-controller的主机上启动并启用该服务:
# systemctl enable --now linstor-controller
如果您确定linstor控制器服务在安装时自动启用,则还可以使用以下命令:
# systemctl start linstor-controller
1.8. 使用LINSTOR客户端
无论何时运行LINSTOR命令行客户端,它都需要知道linstor-controller的运行位置。如果不指定,它将尝试访问本地运行的linstor-controller,后者默认侦听IP 127.0.0.1 的端口 3376 。这样,我们就可以在与 linstor-controller 相同的主机上直接使用 linstor-client 。
linstor-satellite 需要端口3366和3367。 linstor-controller 需要端口3376和3377。请确保防火墙允许这些端口。
# linstor node list
应该输出一个空列表,而不是一条错误信息。
您可以在任何其他机器上使用 linstor
命令,但随后您需要告诉客户端如何找到linstor-controller。如下所示,可以将其指定为命令行选项、环境变量或全局文件:
# linstor --controllers=alice node list
# LS_CONTROLLERS=alice linstor node list
或者,您可以创建 /etc/linstor/linstor-client.conf
文件,并按如下方式填充它。
[global]
controllers=alice
如果配置了多个linstor控制器,只需在逗号分隔的列表中指定它们即可。linstor客户机会按照列出的顺序进行尝试。
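例如(仅为示意,节点名为假设值),可以在 /etc/linstor/linstor-client.conf 中这样指定多个controller:
[global]
controllers=alice,bravo,charlie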
linstor-client命令也可以通过只写参数的起始字母来快速方便地使用,例如: linstor node list → linstor n l
1.9. 向集群添加节点
The next step is to add nodes to your LINSTOR cluster.
# linstor node create bravo 10.43.70.3
If the IP is omitted, the client will try to resolve the given node-name as host-name by itself.
LINSTOR will automatically detect the node's local uname -n, which is later used for the DRBD resource.
使用 `linstor node list`时,将看到新节点标记为脱机。现在启动并启用该节点上的 linstor-satellite,以便服务在重新启动时也启动:
# systemctl enable --now linstor-satellite
如果您确定该服务已默认启用并在重新启动时启动,则还可以使用 systemctl start linstor-satellite
。
大约10秒后,您将看到 linstor node list
中的状态变为联机。当然,在controller知道satellite节点的存在之前,satellite进程已经启动完毕。
如果承载controller的节点也应该为LINSTOR集群提供存储空间,则必须将其添加为节点并启动linstor-satellite。
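例如(IP与 --node-type 取值仅为示意),可以在创建节点时将其注册为组合(Combined)节点,并在该主机上启动satellite服务:
# linstor node create alpha 10.43.70.2 --node-type Combined
# systemctl enable --now linstor-satellite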
If you want to have other services wait until the linstor-satellite had a chance to create the necessary devices (i.e. after a boot), you can update the corresponding .service file and change Type=simple to Type=notify. This will cause the satellite to delay sending the READY=1 message to systemd until the controller connects, sends all required data to the satellite and the satellite at least tried once to get the devices up and running.
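A minimal sketch of such an override, assuming a systemd drop-in created with systemctl edit (the unit name may differ on your distribution):
# systemctl edit linstor-satellite
[Service]
Type=notify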
1.10. 存储池
StoragePools在LINSTOR上下文中用于标识存储。要对多个节点的存储池进行分组,只需在每个节点上使用相同的名称。例如,一种有效的方法是给所有的ssd取一个名字,给所有的hdd取另一个名字。
在每个提供存储的主机上,您需要创建LVM VG或ZFS zPool。使用同一个LINSTOR存储池名称标识的VG和zPool在不同主机上可以有不同的VG或zPool名称,但为了便于管理,建议在所有节点上使用相同的VG或zPool名称。
# vgcreate vg_ssd /dev/nvme0n1 /dev/nvme1n1 [...]
然后需要向LINSTOR注册:
# linstor storage-pool create lvm alpha pool_ssd vg_ssd
# linstor storage-pool create lvm bravo pool_ssd vg_ssd
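如果后端是ZFS,思路相同。下面是一个示意性示例(zpool名称与设备仅为假设):
# zpool create zfs_ssd mirror /dev/nvme0n1 /dev/nvme1n1
# linstor storage-pool create zfs alpha pool_zfs zfs_ssd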
存储池名称和公共元数据称为 存储池定义 。上面列出的命令隐式创建了存储池定义,使用 linstor storage-pool-definition list 可以看到这一点。也可以显式创建存储池定义,但不是必需的。
要列出您可以使用的存储池,请执行以下操作:
# linstor storage-pool list
或者使用短版本
# linstor sp l
如果由于附加的资源或快照(其中一些卷位于另一个仍在工作的存储池中)而无法删除存储池,则会在相应list命令(例如 linstor resource list )的 status 列中给出提示。手动删除丢失的存储池中的LINSTOR对象后,可以再次执行lost命令,以确保完全删除该存储池及其剩余对象。
1.10.1. 每个后端设备的存储池
在只有一种存储类型并且能够热修复存储设备的集群中,可以选择为每个物理后端设备创建一个存储池的模式。此模式的优点是将故障域限制在单个存储设备上。
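下面是一个示意性示例(节点名、VG与设备名均为假设),为每个物理设备分别创建并注册一个存储池:
# vgcreate vg_nvme0 /dev/nvme0n1
# vgcreate vg_nvme1 /dev/nvme1n1
# linstor storage-pool create lvm alpha pool_nvme0 vg_nvme0
# linstor storage-pool create lvm alpha pool_nvme1 vg_nvme1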
1.10.2. Physical storage command
Since linstor-server 1.5.2 and a recent linstor-client, LINSTOR can create LVM/ZFS pools on a satellite for you. The linstor-client has the following commands to list possible disks and create storage pools, but such LVM/ZFS pools are not managed by LINSTOR and there is no delete command, so such action must be done manually on the nodes.
# linstor physical-storage list
Will give you a list of available disks grouped by size and rotational(SSD/Magnetic Disk).
It will only show disks that pass the following filters:
- The device size must be greater than 1GiB
- The device is a root device (not having children), e.g.: /dev/vda, /dev/sda
- The device does not have any file-system or other blkid marker (wipefs -a might be needed)
- The device is no DRBD device
With the create-device-pool command you can create an LVM pool on a disk and also directly add it as a storage-pool in LINSTOR.
# linstor physical-storage create-device-pool --pool-name lv_my_pool LVMTHIN node_alpha /dev/vdc --storage-pool newpool
If the --storage-pool option was provided, LINSTOR will create a storage-pool with the given name.
For more options and exact command usage please check the linstor-client help.
1.11. 资源组
资源组是资源定义的父对象,其中对资源组所做的所有属性更改都将由其资源定义的子级继承。资源组还存储自动放置规则的设置,并可以根据存储的规则生成资源定义。
简单地说,资源组就像模板,定义从它们创建的资源的特性。对这些伪模板的更改将应用于从资源组中创建的所有资源,并具有追溯性。
使用资源组定义资源配置方式应被视为部署由LINSTOR配置的卷的典型方法。后面描述从 资源定义 和 卷定义 创建每个 资源 的章节应仅在特殊情况下使用。
即使您选择不在LINSTOR集群中创建和使用 资源组 ,从 资源定义 和 卷定义 创建的所有资源都将存在于 DfltRscGrp 资源组 中。
使用资源组部署资源的简单模式如下:
# linstor resource-group create my_ssd_group --storage-pool pool_ssd --place-count 2
# linstor volume-group create my_ssd_group
# linstor resource-group spawn-resources my_ssd_group my_ssd_res 20G
上述命令会创建一个名为 my_ssd_res 的资源,其中一个20GB的卷将在具有名为 pool_ssd 的存储池的节点上自动配置出两个副本。
一个更有用的模式是创建一个资源组,其中包含您认为最适合您用例的设置。例如,如果您必须对卷进行夜间在线一致性校验,则可以创建一个已设置了所选 verify-alg 的资源组,这样从该组生成的资源就会预先配置好 verify-alg :
# linstor resource-group create my_verify_group --storage-pool pool_ssd --place-count 2
# linstor resource-group drbd-options --verify-alg crc32c my_verify_group
# linstor volume-group create my_verify_group
# for i in {00..19}; do
    linstor resource-group spawn-resources my_verify_group res$i 10G
  done
上述命令会创建20个10GiB的资源,每个资源都预先配置了 crc32c 作为其 verify-alg 。
您可以通过在相应的 资源定义 或 卷定义 上设置选项,来调整从资源组生成的单个资源或卷的设置。例如,如果上面示例中的 res11 被一个接收大量小随机写入的、非常活跃的数据库使用,则可能需要增加该特定资源的 al-extents :
# linstor resource-definition drbd-options --al-extents 6007 res11
If you configure a setting in a resource-definition that is already configured on the resource-group it was spawned from, the value set in the resource-definition will override the value set on the parent resource-group. For example, if the same ‘res11’ was required to use the slower but more secure ‘sha256’ hash algorithm in its verifications, setting the ‘verify-alg’ on the resource-definition for ‘res11’ would override the value set on the resource-group:
# linstor resource-definition drbd-options --verify-alg sha256 res11
继承设置的层次结构的一个经验法则是 “更接近” 资源或卷的值优先: 卷定义 设置优先于 卷组 设置, 资源定义 设置优先于 资源组 设置。
1.12. 集群配置
1.12.1. 可用的存储插件
LINSTOR支持如下的存储插件:
-
Thick LVM
-
Thin LVM with a single thin pool
-
Thick ZFS
-
Thin ZFS
1.13. 创建和部署资源/卷
在下面的场景中,我们假设目标是创建一个名为 backups 、大小为 500 GB 的资源,该资源在三个集群节点之间复制。
首先,我们创建一个新的资源定义:
# linstor resource-definition create backups
其次,我们在该资源定义中创建一个新的卷定义:
# linstor volume-definition create backups 500G
如果要更改卷定义的大小,只需执行以下操作:
# linstor volume-definition set-size backups 0 100G
参数 0 是资源 backups 中的卷号。必须提供此参数,因为一个资源可以有多个卷,它们由所谓的卷号标识。这个卷号可以通过列出卷定义来找到。
只有在没有部署资源的情况下,才能减小卷定义的大小;不过,即使部署了资源,也可以增大卷定义的大小。
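要查看卷号,可以列出卷定义(输出内容因版本而异,此处仅为示意):
# linstor volume-definition list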
到目前为止,我们只在LINSTOR的数据库中创建了对象,没有在存储节点上创建一个LV。现在你可以选择将安置任务派发给LINSTOR,或者自己做。
1.13.1. 手动放置
使用 resource create
命令,可以将资源定义显式分配给命名节点。
# linstor resource create alpha backups --storage-pool pool_hdd
# linstor resource create bravo backups --storage-pool pool_hdd
# linstor resource create charlie backups --storage-pool pool_hdd
1.13.2. 自动放置
autoplace之后的值告诉LINSTOR您想要多少个副本。storage-pool 选项应该很明显。
# linstor resource create backups --auto-place 3 --storage-pool pool_hdd
可能不太明显的是,您可以省略 --storage-pool 选项,这样LINSTOR会自己选择一个存储池。选择遵循以下规则:
-
忽略当前用户无权访问的所有节点和存储池
-
忽略所有无磁盘存储池
-
忽略没有足够可用空间的所有存储池
The remaining storage pools will be rated by different strategies. LINSTOR currently has four strategies:
MaxFreeSpace: This strategy maps the rating 1:1 to the remaining free space of the storage pool. However, this strategy only considers the actually allocated space (in case of a thinly provisioned storage pool this might grow with time without creating new resources).
MinReservedSpace: Unlike "MaxFreeSpace", this strategy considers the reserved space. That is the space that a thin volume can grow to before reaching its limit. The sum of reserved spaces might exceed the storage pool's capacity, which is known as overprovisioning.
MinRscCount: Simply the count of resources already deployed in a given storage pool.
MaxThroughput: For this strategy, the storage pool's Autoplacer/MaxThroughput property is the base of the score, or 0 if the property is not present. Every volume deployed in the given storage pool will subtract its defined sys/fs/blkio_throttle_read and sys/fs/blkio_throttle_write property-values from the storage pool's max throughput. The resulting score might be negative.
The scores of the strategies will be normalized, weighted and summed up, where the scores of minimizing strategies will be converted first to allow an overall maximization of the resulting score.
The weights of the strategies can be configured with
linstor controller set-property Autoplacer/Weights/$name_of_the_strategy $weight
where the strategy-names are listed above and the weight can be an arbitrary decimal.
To keep the behaviour of the autoplacer similar to the old one (for compatibility), all strategies have a default-weight of 0, except MaxFreeSpace, which has a weight of 1.
Neither a score of 0 nor a negative score will prevent a storage pool from being selected; such pools are just considered later.
Finally LINSTOR tries to find the best matching group of storage pools
meeting all requirements. This step also considers other autoplacement
restrictions as --replicas-on-same
, --replicas-on-different
and others.
These two arguments, --replicas-on-same
and --replicas-on-different
expect the name of a property within the Aux/
namespace. The following
example shows that the client automatically prefixes the testProperty
with
the Aux/
namespace.
linstor resource-group create testRscGrp --replicas-on-same testProperty
SUCCESS:
    Description:
        New resource group 'testRscGrp' created.
    Details:
        Resource group 'testRscGrp' UUID is: 35e043cb-65ab-49ac-9920-ccbf48f7e27d

linstor resource-group list
+-----------------------------------------------------------------------------+
| ResourceGroup | SelectFilter                         | VlmNrs | Description |
|-============================================================================|
| DfltRscGrp    | PlaceCount: 2                        |        |             |
|-----------------------------------------------------------------------------|
| testRscGrp    | PlaceCount: 2                        |        |             |
|               | ReplicasOnSame: ['Aux/testProperty'] |        |             |
+-----------------------------------------------------------------------------+
如果一切顺利,DRBD资源现在已经由LINSTOR创建。这时可以使用 lsblk 命令查找DRBD块设备来确认,设备名应该类似于 drbd0000 。
现在我们就可以挂载资源的块设备并开始使用LINSTOR了。
2. 高阶LINSTOR任务
2.1. DRBD客户端
通过使用 --drbd-diskless 选项而不是 --storage-pool ,可以在节点上拥有永久无盘的DRBD设备。这意味着资源将显示为块设备,并且可以在没有本地存储设备的情况下挂载文件系统。资源的数据通过网络从具有相同资源的其他节点访问。
# linstor resource create delta backups --drbd-diskless
选项 --diskless 已过时,不推荐使用。请改用 --drbd-diskless 或 --nvme-initiator 。
2.2. LINSTOR – DRBD一致性组/多个卷
所谓的一致性组是DRBD的一个特性。由于LINSTOR的主要功能之一是使用DRBD管理存储集群,因此本用户指南中提到了这一点。 单个资源中的多个卷是一个一致性组。
这意味着一个资源的不同卷上的更改在其他Satellites上以相同的时间顺序复制。
因此,如果资源中的不同卷上有相互依赖的数据,也不必担心时间问题。
要在LINSTOR资源中部署多个卷,必须创建两个同名的卷定义。
# linstor volume-definition create backups 500G
# linstor volume-definition create backups 100G
2.3. 一个资源到不同存储池的卷
这可以通过在将资源部署到节点之前,在卷定义上设置 StorPoolName 属性来实现:
# linstor resource-definition create backups
# linstor volume-definition create backups 500G
# linstor volume-definition create backups 100G
# linstor volume-definition set-property backups 0 StorPoolName pool_hdd
# linstor volume-definition set-property backups 1 StorPoolName pool_ssd
# linstor resource create alpha backups
# linstor resource create bravo backups
# linstor resource create charlie backups
由于使用 volume-definition create 命令时没有使用 --vlmnr 选项,因此LINSTOR将从0开始分配卷号。在随后的两行中,0和1指的就是这些自动分配的卷号。
这里的 resource create 命令不需要 --storage-pool 选项。在这种情况下,LINSTOR使用 “fallback” 存储池。为了找到该存储池,LINSTOR按以下顺序查询下列对象的属性:
-
卷定义
-
Resource
-
资源定义
-
节点
如果这些对象都不包含 StorPoolName 属性,则controller将回退到硬编码的 DfltStorPool 字符串作为存储池名称。这还意味着,如果在部署资源之前忘记定义存储池,则会收到一条错误消息,提示LINSTOR找不到名为 DfltStorPool 的存储池。
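例如(仅为示意,假设在资源定义层级设置该属性即可满足需求),可以这样预先指定回退存储池:
# linstor resource-definition set-property backups StorPoolName pool_hdd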
2.4. 无DRBD的LINSTOR
LINSTOR也可以在没有DRBD的情况下使用。没有DRBD,LINSTOR能够从LVM和ZFS支持的存储池中配置卷,并在LINSTOR集群中的各个节点上创建这些卷。
目前,LINSTOR支持创建LVM和ZFS卷,并且可以选择在这些卷之上叠加LUKS、DRBD和/或NVMe-oF/NVMe-TCP等层的某些组合。
例如,我们在LINSTOR集群中定义了一个Thin LVM支持的存储池,名为 thin-lvm
:
# linstor --no-utf8 storage-pool list
+--------------------------------------------------------------+
| StoragePool | Node      | Driver   | PoolName          | ... |
|--------------------------------------------------------------|
| thin-lvm    | linstor-a | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-b | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-c | LVM_THIN | drbdpool/thinpool | ... |
| thin-lvm    | linstor-d | LVM_THIN | drbdpool/thinpool | ... |
+--------------------------------------------------------------+
我们可以使用LINSTOR在 linstor-d
上创建一个100GiB大小的精简LVM,使用以下命令:
# linstor resource-definition create rsc-1
# linstor volume-definition create rsc-1 100GiB
# linstor resource create --layer-list storage \
  --storage-pool thin-lvm linstor-d rsc-1
你应该看到 linstor-d
上有一个新的瘦LVM。通过使用 --machine-readable
标志集列出LINSTOR资源,可以从LINSTOR提取设备路径:
# linstor --machine-readable resource list | grep device_path
  "device_path": "/dev/drbdpool/rsc-1_00000",
如果要将DRBD放置在此卷上(这是LINSTOR中ZFS或LVM支持的卷的默认 --layer-list
选项),可以使用以下资源创建模式:
# linstor resource-definition create rsc-1
# linstor volume-definition create rsc-1 100GiB
# linstor resource create --layer-list drbd,storage \
  --storage-pool thin-lvm linstor-d rsc-1
然后,您将看到有一个新的 Thin LVM被创建出来用于支撑 linstor-d
上的DRBD卷:
# linstor --machine-readable resource list | grep -e device_path -e backing_disk
  "device_path": "/dev/drbd1000",
  "backing_disk": "/dev/drbdpool/rsc-1_00000",
下表显示了哪个层可以后跟哪个子层:

Layer | Child layer
---|---
DRBD | CACHE, WRITECACHE, NVME, LUKS, STORAGE
CACHE | WRITECACHE, NVME, LUKS, STORAGE
WRITECACHE | CACHE, NVME, LUKS, STORAGE
NVME | CACHE, WRITECACHE, LUKS, STORAGE
LUKS | STORAGE
STORAGE | –

一个层只能在层列表中出现一次。
有关 luks 层的先决条件的信息,请参阅本用户指南的加密卷部分。
2.4.1. NVME-OF/NVME-TCP Linstor Layer
NVMe-oF/NVMe-TCP允许LINSTOR将无盘资源连接到具有相同资源的节点,数据存储在NVMe结构上。这就带来了这样一个优势:通过网络访问数据,无需使用本地存储就可以装载资源。在这种情况下,LINSTOR不使用DRBD,因此不会复制LINSTOR提供的NVMe资源,数据存储在一个节点上。
NVMe-oF仅在支持RDMA的网络上工作,NVMe-TCP可以在任何能承载IP流量的网络上工作。如果您想了解有关NVMe-oF/NVMe-TCP的更多信息,请访问https://www.linbit.com/en/nvme-linstor-swordfish/。
要将NVMe-oF/NVMe-TCP与LINSTOR一起使用,需要在每个充当satellite并将为资源使用NVMe-oF/NVMe-TCP的节点上安装 nvme-cli 包:
如果不使用Ubuntu,请使用适合您操作系统的命令安装软件包 – SLES: zypper – CentOS: yum
# apt install nvme-cli
要使资源使用NVMe-oF/NVMe-TCP,必须在创建资源定义时提供一个附加参数:
# linstor resource-definition create nvmedata -l nvme,storage
默认情况下,使用DRBD时 -l (layer-stack)参数设置为 drbd,storage 。如果要创建既不使用NVMe也不使用DRBD的LINSTOR资源,则必须将 -l 参数设置为仅 storage 。
为我们的资源创建卷定义:
# linstor volume-definition create nvmedata 500G
在节点上创建资源之前,必须知道数据将在本地存储在哪里,以及哪个节点通过网络访问它。
首先,我们在存储数据的节点上创建资源:
# linstor resource create alpha nvmedata --storage-pool pool_ssd
在将通过网络访问资源数据的节点上,必须将资源定义为无盘:
# linstor resource create beta nvmedata -d
-d
参数将此节点上的资源创建为无盘。
现在,您可以在一个节点上装载资源 nvmedata
。
如果您的节点有多个NIC,您应该为NVMe-oF/NVMe-TCP强制指定它们之间的路由,否则多个NIC可能会导致问题。
2.4.2. OpenFlex™ Layer
Since version 1.5.0 the additional Layer openflex
can be used in LINSTOR.
From LINSTOR’s perspective, the
OpenFlex
Composable Infrastructure takes the role of a combined layer acting as a
storage layer (like LVM) and also providing the allocated space as an NVMe
target. OpenFlex has a REST API which is also used by LINSTOR to operate
with.
As OpenFlex combines concepts of LINSTORs storage as well as NVMe-layer,
LINSTOR was added both, a new storage driver for the storage pools as well
as a dedicated openflex
layer which uses the mentioned REST API.
In order for LINSTOR to communicate with the OpenFlex-API, LINSTOR needs
some additional properties, which can be set once on controller
level to
take LINSTOR-cluster wide effect:
-
StorDriver/Openflex/ApiHost
specifies the host or IP of the API entry-point -
StorDriver/Openflex/ApiPort
this property is glued with a colon to the previous to form the basichttp://ip:port
part used by the REST calls -
StorDriver/Openflex/UserName
the REST username -
StorDriver/Openflex/UserPassword
the password for the REST user
Once that is configured, we can now create LINSTOR objects to represent the OpenFlex architecture. The theoretical mapping of LINSTOR objects to OpenFlex objects are as follows: Obviously an OpenFlex storage pool is represented by a LINSTOR storage pool. As the next thing above a LINSTOR storage pool is already the node, a LINSTOR node represents an OpenFlex storage device. The OpenFlex objects above storage device are not mapped by LINSTOR.
When using NVMe, LINSTOR was designed to run on both sides, the NVMe target as well as on the NVMe initiator side. In the case of OpenFlex, LINSTOR cannot (or even should not) run on the NVMe target side as that is completely managed by OpenFlex. As LINSTOR still needs nodes and storage pools to represent the OpenFlex counterparts, the LINSTOR client was extended with special node create commands since 1.0.14. These commands not only accept additionally needed configuration data, but also starts a “special satellite” besides the already running controller instance. This special satellites are completely LINSTOR managed, they will shutdown when the controller shuts down and will be started again when the controller starts. The new client command for creating a “special satellite” representing an OpenFlex storage device is:
$ linstor node create-openflex-target ofNode1 192.168.166.7 000af795789d
The arguments are as follows:
-
ofNode1
is the node name which is also used by the standardlinstor node create
command -
192.168.166.7
is the address on which the provided NVMe devices can be accessed. As the NVMe devices are accessed by a dedicated network interface, this address differs from the address specified with the propertyStorDriver/Openflex/ApiHost
. The latter is used for the management / REST API. -
000af795789d
is the identifier for the OpenFlex storage device.
The last step of the configuration is the creation of LINSTOR storage pools:
$ linstor storage-pool create openflex ofNode1 sp0 0
-
ofNode1
andsp0
are the node name and storage pool name, respectively, just as usual for the LINSTORs create storage pool command -
The last
0
is the identifier of the OpenFlex storage pool within the previously defined storage device
Once all necessary storage pools are created in LINSTOR, the next steps are similar to the usage of using an NVMe resource with LINSTOR. Here is a complete example:
# set the properties once
linstor controller set-property StorDriver/Openflex/ApiHost 10.43.7.185
linstor controller set-property StorDriver/Openflex/ApiPort 80
linstor controller set-property StorDriver/Openflex/UserName myusername
linstor controller set-property StorDriver/Openflex/UserPassword mypassword

# create a node for openflex storage device "000af795789d"
linstor node create-openflex-target ofNode1 192.168.166.7 000af795789d

# create a usual linstor satellite. later used as nvme initiator
linstor node create bravo

# create a storage pool for openflex storage pool "0" within storage device "000af795789d"
linstor storage-pool create openflex ofNode1 sp0 0

# create resource- and volume-definition
linstor resource-definition create backupRsc
linstor volume-definition create backupRsc 10G

# create openflex-based nvme target
linstor resource create ofNode1 backupRsc --storage-pool sp0 --layer-list openflex

# create openflex-based nvme initiator
linstor resource create bravo backupRsc --nvme-initiator --layer-list openflex
In case a node should access the OpenFlex REST API through a different host than specified with linstor controller set-property StorDriver/Openflex/ApiHost 10.43.7.185, you can always use LINSTOR's inheritance mechanism for properties. That means simply define the same property on the node-level where you need it, i.e. linstor node set-property ofNode1 StorDriver/Openflex/ApiHost 10.43.8.185
2.4.3. 写入缓存层
一个 DM writecache 设备由两个设备、即一个存储设备和一个缓存设备组成。LINSTOR可以设置这样一个writecache设备,但是需要一些额外的信息,比如存储池和缓存设备的大小。
# linstor storage-pool create lvm node1 lvmpool drbdpool
# linstor storage-pool create lvm node1 pmempool pmempool
# linstor resource-definition create r1
# linstor volume-definition create r1 100G
# linstor volume-definition set-property r1 0 Writecache/PoolName pmempool
# linstor volume-definition set-property r1 0 Writecache/Size 1%
# linstor resource create node1 r1 --storage-pool lvmpool --layer-list WRITECACHE,STORAGE
The two properties set in the examples are mandatory, but can also be set on
controller level which would act as a default for all resources with
WRITECACHE
in their --layer-list
. However, please note that the
Writecache/PoolName
refers to the corresponding node. If the node does not
have a storage-pool named pmempool
you will get an error message.
DM writecache 所需的4个必需参数要么通过属性配置,要么由LINSTOR计算。上述链接中列出的可选属性也可以通过属性进行设置。有关 Writecache/* 属性键的列表,请参见 linstor controller set-property --help 。
使用 --layer-list DRBD,WRITECACHE,STORAGE
将DRBD配置为使用外部元数据时,只有备份设备将使用writecache,而不是保存外部元数据的设备。
2.4.4. Cache Layer
LINSTOR can also setup a
DM-Cache
device, which is very similar to the DM-Writecache from the previous
section. The major difference is that a cache device is composed by three
devices: one storage device, one cache device and one meta device. The
LINSTOR properties are quite similar to those of the writecache but are
located in the Cache
namespace:
# linstor storage-pool create lvm node1 lvmpool drbdpool
# linstor storage-pool create lvm node1 pmempool pmempool
# linstor resource-definition create r1
# linstor volume-definition create r1 100G
# linstor volume-definition set-property r1 0 Cache/CachePool pmempool
# linstor volume-definition set-property r1 0 Cache/Size 1%
# linstor resource create node1 r1 --storage-pool lvmpool --layer-list CACHE,STORAGE
Instead of Writecache/PoolName (as when configuring the Writecache layer), the Cache layer's only required property is called Cache/CachePool. The reason for this is that the Cache layer also has a Cache/MetaPool, which can be configured separately or defaults to the value of Cache/CachePool.
Please see linstor controller set-property --help
for a list of Cache/*
property-keys and default values for omitted properties.
Using --layer-list DRBD,CACHE,STORAGE
while having DRBD configured to use
external metadata, only the backing device will use a cache, not the device
holding the external metadata.
2.4.5. Storage Layer
For some storage providers LINSTOR has special properties:
-
StorDriver/LvcreateOptions
: The value of this property is appended to everylvcreate …
call LINSTOR executes. -
StorDriver/ZfscreateOptions
: The value of this property is appended to everyzfs create …
call LINSTOR executes. -
StorDriver/WaitTimeoutAfterCreate
: If LINSTOR expects a device to appear after creation (for example after calls oflvcreate
,zfs create
,…), LINSTOR waits per default 500ms for the device to appear. These 500ms can be overridden by this property. -
StorDriver/dm_stats
: If set totrue
LINSTOR callsdmstats create $device
after creation anddmstats delete $device --allregions
after deletion of a volume. Currently only enabled for LVM and LVM_THIN storage providers.
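A hedged sketch of setting one of these properties; here it is applied on a storage pool created in an earlier example, and the lvcreate option value is purely illustrative:
# linstor storage-pool set-property node1 lvmpool StorDriver/LvcreateOptions "--zero y"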
2.5. 管理网络接口卡
LINSTOR可以在一台机器上处理多个网络接口卡(NICs),在LINSTOR中称为 netif
。
创建satellite节点时,将隐式创建第一个名为 default 的 netif 。使用 node create 命令的 --interface-name 选项,可以给它指定一个不同的名称。
其他NIC的创建方式如下:
# linstor node interface create alpha 100G_nic 192.168.43.221
# linstor node interface create alpha 10G_nic 192.168.43.231
NIC仅由IP地址标识,名称是任意的,与Linux使用的接口名称 无关 。可以将NIC分配给存储池,以便每当在这样的存储池中创建资源时,DRBD通信通过其指定的NIC路由。
# linstor storage-pool set-property alpha pool_hdd PrefNic 10G_nic
# linstor storage-pool set-property alpha pool_ssd PrefNic 100G_nic
FIXME 描述了如何通过特定的 netif
路由controller <-> 客户端通信。
2.6. 加密卷
LINSTOR可以处理drbd卷的透明加密。dm-crypt用于对存储设备中提供的存储进行加密。
In order to use dm-crypt, please make sure to have cryptsetup installed before you start the satellite.
使用加密的基本步骤:
-
在控制器上禁用用户安全性(一旦身份验证生效,这将被废弃)
-
创建主密码
-
Add
luks
to the layer-list. Note that all plugins (e.g., Proxmox) require a DRBD layer as the top most layer if they do not explicitly state otherwise. -
不要忘记在controller重新启动后重新输入主密码。
2.6.1. 禁用用户安全
在 Linstor
controller上禁用用户安全性是一次操作,之后会被持久化。
-
通过
systemctl stop linstor-controller
停止正在运行的linstor控制器` -
在调试模式下启动linstor控制器:
/usr/share/linstor-server/bin/Controller -c /etc/linstor -d
-
在调试控制台中输入:
setSecLvl secLvl(NO_SECURITY)
-
使用调试关闭命令停止linstor控制器:
shutdown
-
使用systemd重新启动控制器:
systemctl start linstor-controller
2.6.2. 加密命令
下面是有关命令的详细信息。
在LINSTOR可以加密任何卷之前,需要创建主密码。这可以通过linstor客户端完成。
# linstor encryption create-passphrase
`crypt create-passphrase`将等待用户输入初始主密码(因为所有其他crypt命令都没有参数)。
如果您想更改主密码,可以使用以下方法:
# linstor encryption modify-passphrase
luks 层可以在创建资源定义或资源本身时添加;建议使用前一种方法,因为它会自动应用于从该资源定义创建的所有资源。
# linstor resource-definition create crypt_rsc --layer-list luks,storage
要输入主密码(在controller重新启动后),请使用以下命令:
# linstor encryption enter-passphrase
无论何时重新启动linstor controller,用户都必须向controller发送主密码,否则LINSTOR无法重新打开或创建加密卷。
2.6.3. Automatic Passphrase
It is possible to automate the process of creating and re-entering the master passphrase.
To use this, either an environment variable called MASTER_PASSPHRASE
or an
entry in /etc/linstor/linstor.toml
containing the master passphrase has to
be created.
The required linstor.toml
looks like this:
[encrypt]
passphrase="example"
If either one of these is set, then every time the controller starts it will check whether a master passphrase already exists. If there is none, it will create a new master passphrase as specified. Otherwise, the controller enters the passphrase.
If a master passphrase is already configured, and it is not the same one as specified in the environment variable or linstor.toml, the controller will be unable to re-enter the master passphrase and will react as if the user had entered a wrong passphrase. This can only be resolved through manual input from the user, using the same commands as if the controller was started without the automatic passphrase.
In case the master passphrase is set in both an environment variable and the linstor.toml, only the master passphrase from the linstor.toml will be used.
2.7. 检查集群状态
LINSTOR提供各种命令来检查集群的状态。这些命令以 list-
前缀开头,并提供各种筛选和排序选项。--groupby
选项可用于按多个维度对输出进行分组和排序。
# linstor node list
# linstor storage-pool list --groupby Size
2.8. 管理快照
精简LVM和ZFS存储池支持快照。
2.8.1. 创建快照
假设在某些节点上放置了名为 resource1
的资源定义,则可以按如下方式创建快照:
# linstor snapshot create resource1 snap1
这将在资源所在的所有节点上创建快照。LINSTOR将确保即使在资源处于活动使用状态时也创建一致的快照。
If the resource-definition property AutoSnapshot/RunEvery is set, LINSTOR will automatically create snapshots every X minutes. The optional property AutoSnapshot/Keep can be used to clean up old snapshots which were created automatically. No manually created snapshot will be cleaned up / deleted. If AutoSnapshot/Keep is omitted (or <= 0), LINSTOR will keep the last 10 snapshots by default.
# linstor resource-definition set-property AutoSnapshot/RunEvery 15
# linstor resource-definition set-property AutoSnapshot/Keep 5
2.8.2. 还原快照
以下步骤将快照还原到新资源。即使原始资源已从创建快照的节点中删除,也可能发生这种情况。
首先使用与快照中的卷匹配的卷定义新资源:
# linstor resource-definition create resource2
# linstor snapshot volume-definition restore --from-resource resource1 --from-snapshot snap1 --to-resource resource2
此时,如果需要,可以应用其他配置。然后,准备好后,根据快照创建资源:
# linstor snapshot resource restore --from-resource resource1 --from-snapshot snap1 --to-resource resource2
这将在快照所在的所有节点上放置新资源。也可以显式选择放置资源的节点;请参阅帮助( linstor snapshot resource restore -h )。
2.8.3. 回滚快照
LINSTOR可以将资源回滚到快照状态。回滚时资源不能被使用。也就是说,它不能安装在任何节点上。如果资源正在使用中,请考虑是否可以通过restoring the snapshot来实现您的目标。
回滚执行如下:
# linstor snapshot rollback resource1 snap1
资源只能回滚到最近的快照。要回滚到旧快照,请首先删除中间快照。
2.8.4. 删除快照
可以按如下方式删除现有快照:
# linstor snapshot delete resource1 snap1
2.8.5. Shipping a snapshot
Both, the source as well as the target node have to have the resource for snapshot shipping deployed. Additionally, the target resource has to be deactivated.
# linstor resource deactivate nodeTarget resource1
A resource with DRBD in its layer-list that has been deactivated can NOT be reactivated again. However, a successfully shipped snapshot of a DRBD resource can still be restored into a new resource.
To manually start the snapshot-shipping, use:
# linstor snapshot ship --from-node nodeSource --to-node nodeTarget --resource resource1
By default, the snapshot-shipping uses TCP ports from the range 12000-12999. To change this range, the property SnapshotShipping/TcpPortRange, which accepts a to-from range, can be set on the controller:
# linstor controller set-property SnapshotShipping/TcpPortRange 10000-12000
A resource can also be periodically shipped. To accomplish this, it is
mandatory to set the properties SnapshotShipping/TargetNode
as well as
SnapshotShipping/RunEvery
on the resource-definition.
SnapshotShipping/SourceNode
can also be set, but if omitted LINSTOR will
choose an active resource of the same resource-definition.
To allow incremental snapshot-shipping, LINSTOR has to keep at least the last shipped snapshot on the target node. The property SnapshotShipping/Keep can be used to specify how many snapshots LINSTOR should keep. If the property is not set (or <= 0), LINSTOR will keep the last 10 shipped snapshots by default.
# linstor resource-definition set-property resource1 SnapshotShipping/TargetNode nodeTarget
# linstor resource-definition set-property resource1 SnapshotShipping/SourceNode nodeSource
# linstor resource-definition set-property resource1 SnapshotShipping/RunEvery 15
# linstor resource-definition set-property resource1 SnapshotShipping/Keep 5
2.9. 设置资源选项
DRBD选项是使用LINSTOR命令设置的。将忽略诸如 /etc/drbd.d/global_common.conf
文件中未由LINSTOR管理的配置。以下命令显示用法和可用选项:
# linstor controller drbd-options -h
# linstor resource-definition drbd-options -h
# linstor volume-definition drbd-options -h
# linstor resource drbd-peer-options -h
例如,很容易为名为 backups
的资源设置DRBD协议:
# linstor resource-definition drbd-options --protocol C backups
2.10. 添加和删除磁盘
LINSTOR可以在无盘和有盘之间转换资源。这是通过 resource toggle-disk 命令实现的,该命令的语法类似于 resource create 。
例如,将磁盘添加到 alpha
上的无盘资源 backups
:
# linstor resource toggle-disk alpha backups --storage-pool pool_ssd
再次删除此磁盘:
# linstor resource toggle-disk alpha backups --diskless
2.10.1. 迁移磁盘
为了在节点之间移动资源而不在任何时刻降低冗余度,可以使用LINSTOR的磁盘迁移功能。首先在目标节点上创建一个无盘资源,然后使用 --migrate-from 选项添加磁盘。该命令会等待数据同步到新磁盘,然后删除源磁盘。
例如,要将资源 backups
从 alpha
迁移到 bravo
:
# linstor resource create bravo backups --drbd-diskless
# linstor resource toggle-disk bravo backups --storage-pool pool_ssd --migrate-from alpha
2.11. LINSTOR的DRBD代理
LINSTOR希望DRBD代理在相关连接所涉及的节点上运行。它目前不支持通过单独节点上的DRBD代理进行连接。
假设我们的集群由本地网络中的节点 alpha
和 bravo
以及远程站点上的节点 charlie
组成,每个节点都部署了名为
backups
的资源定义。然后,可以为 charlie
的连接启用DRBD Proxy,如下所示:
# linstor drbd-proxy enable alpha charlie backups
# linstor drbd-proxy enable bravo charlie backups
DRBD代理配置可以通过以下命令定制:
# linstor drbd-proxy options backups --memlimit 100000000
# linstor drbd-proxy compression zlib backups --level 9
LINSTOR不会自动优化远程复制的DRBD配置,因此您可能需要设置一些配置选项,例如协议:
# linstor resource-connection drbd-options alpha charlie backups --protocol A
# linstor resource-connection drbd-options bravo charlie backups --protocol A
请与LINBIT联系以获得优化配置的帮助。
2.11.1. Automatically enable DRBD Proxy
LINSTOR can also be configured to automatically enable the above mentioned Proxy connection between two nodes. For this automation, LINSTOR first needs to know on which site each node is.
# linstor node set-property alpha Site A
# linstor node set-property bravo Site A
# linstor node set-property charlie Site B
As the Site
property might also be used for other site-based decisions in
future features, the DrbdProxy/AutoEnable
also has to be set to true
:
# linstor controller set-property DrbdProxy/AutoEnable true
This property can also be set on node, resource-definition, resource and resource-connection level (from left to right in increasing priority, whereas the controller is the left-most, i.e. least prioritized level)
Once this initialization steps are completed, every newly created resource will automatically check if it has to enable DRBD proxy to any of its peer-resources.
2.12. 外部数据库
可以让LINSTOR与外部数据库提供程序(如Postgresql、MariaDB)一起工作,从版本1.1.0开始支持ETCD键值存储。
To use an external database there are a few additional steps to configure. You have to create a DB/schema and a user for LINSTOR, and configure this in /etc/linstor/linstor.toml.
2.12.1. Postgresql
Postgresql linstor.toml
示例如下:
[db]
  user = "linstor"
  password = "linstor"
  connection_url = "jdbc:postgresql://localhost/linstor"
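作为示意(用户、密码与数据库名取自上面的示例配置),可以用如下方式创建对应的PostgreSQL用户和数据库:
# sudo -u postgres psql -c "CREATE USER linstor WITH PASSWORD 'linstor'"
# sudo -u postgres psql -c "CREATE DATABASE linstor OWNER linstor"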
2.12.2. MariaDB/Mysql
MariaDB linstor.toml
示例如下:
[db]
  user = "linstor"
  password = "linstor"
  connection_url = "jdbc:mariadb://localhost/LINSTOR?createDatabaseIfNotExist=true"
LINSTOR schema/数据库会以 LINSTOR 为名创建,因此请确保MariaDB连接字符串引用了 LINSTOR schema,如上面的示例所示。
2.12.3. ETCD
ETCD是一个分布式键值存储,它使得在HA设置中保持LINSTOR数据库的分布式变得容易。ETCD驱动程序已经包含在LINSTOR控制器包中,只需要在
LINSTOR.toml
中进行配置。
有关如何安装和配置ETCD的更多信息,请访问:https://etcd.io/docs[ETCD docs]
下面是 linstor.toml
中的[db]部分示例:
[db]
## only set user/password if you want to use authentication, only since LINSTOR 1.2.1
# user = "linstor"
# password = "linstor"
## for etcd
## do not set user field if no authentication required
connection_url = "etcd://etcdhost1:2379,etcdhost2:2379,etcdhost3:2379"
## if you want to use TLS, only since LINSTOR 1.2.1
# ca_certificate = "ca.pem"
# client_certificate = "client.pem"
## if you want to use client TLS authentication too, only since LINSTOR 1.2.1
# client_key_pkcs8_pem = "client-key.pkcs8"
## set client_key_password if private key has a password
# client_key_password = "mysecret"
2.13. LINSTOR REST-API
To make LINSTOR's administrative tasks more accessible and also available for web-frontends, a REST-API has been created. The REST-API is embedded in the linstor-controller and since LINSTOR 0.9.13 is configured via the linstor.toml configuration file.
[http]
  enabled = true
  port = 3370
  listen_addr = "127.0.0.1"  # to disable remote access
如果您想使用REST-API,可以在以下链接中找到当前文档:https://app.swaggerhub.com/apis-docs/Linstor/Linstor/
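例如(仅为示意,假设controller在本机3370端口提供v1版REST-API),可以这样快速验证REST-API是否可用:
# curl http://localhost:3370/v1/nodes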
2.13.1. LINSTOR REST-API HTTPS
HTTP REST-API也可以通过HTTPS运行,如果您使用任何需要授权的功能,则强烈建议您使用它。所以你必须用一个有效的证书创建一个java密钥库文件,这个证书将用于加密所有的HTTPS通信。
下面是一个简单的示例,说明如何使用java运行时中包含的 keytool
创建自签名证书:
keytool -keyalg rsa -keysize 2048 -genkey -keystore ./keystore_linstor.jks\ -alias linstor_controller\ -dname "CN=localhost, OU=SecureUnit, O=ExampleOrg, L=Vienna, ST=Austria, C=AT"
keytool
将要求密码来保护生成的密钥库文件,这是LINSTOR controller配置所需的。在 linstor.toml
文件中,必须添加以下部分:
[https] keystore = "/path/to/keystore_linstor.jks" keystore_password = "linstor"
现在(重新)启动 linstor controller
,端口3371上应该可以使用HTTPS REST-API。
有关如何导入其他证书的更多信息,请访问:https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html
当启用HTTPS时,对HTTP /v1/ REST-API的所有请求都将被重定向到HTTPS。
LINSTOR REST-API HTTPS受限客户端访问
可以通过在控制器上使用SSL信任库来限制客户端访问。基本上,您为您的客户机创建一个证书并将其添加到信任库中,然后客户机使用该证书进行身份验证。
首先创建客户端证书:
keytool -keyalg rsa -keysize 2048 -genkey -keystore client.jks\ -storepass linstor -keypass linstor\ -alias client1\ -dname "CN=Client Cert, OU=client, O=Example, L=Vienna, ST=Austria, C=AT"
然后我们将此证书导入controller信任库:
keytool -importkeystore\ -srcstorepass linstor -deststorepass linstor -keypass linstor\ -srckeystore client.jks -destkeystore trustore_client.jks
并在 linstor.toml
配置文件中启用信任库:
[https] keystore = "/path/to/keystore_linstor.jks" keystore_password = "linstor" truststore = "/path/to/trustore_client.jks" truststore_password = "linstor"
现在重新启动controller,如果没有正确的证书,将无法再访问控制器API。
LINSTOR客户机需要PEM格式的证书,因此在使用之前,我们必须将java密钥库证书转换为PEM格式。
# Convert to pkcs12 keytool -importkeystore -srckeystore client.jks -destkeystore client.p12\ -storepass linstor -keypass linstor\ -srcalias client1 -srcstoretype jks -deststoretype pkcs12 # use openssl to convert to PEM openssl pkcs12 -in client.p12 -out client_with_pass.pem
为了避免一直输入PEM文件密码,可以方便地删除密码。
openssl rsa -in client_with_pass.pem -out client1.pem openssl x509 -in client_with_pass.pem >> client1.pem
现在,这个PEM文件可以很容易地在客户端使用:
linstor --certfile client1.pem node list
--certfile
参数也可以添加到客户端配置文件中,有关详细信息,请参见使用LINSTOR客户端。
2.14. Logging
Linstor uses SLF4J with
Logback as binding. This gives Linstor the
possibility to distinguish between the log levels ERROR
, WARN
, INFO
,
DEBUG
and TRACE
(in order of increasing verbosity). In the current
linstor version (1.1.2) the user has the following four methods to control
the logging level, ordered by priority (first has highest priority):
-
TRACE
mode可以使用调试控制台, 通过enabled
或disabled
分别指明:Command ==> SetTrcMode MODE(enabled) SetTrcMode Set TRACE level logging mode New TRACE level logging mode: ENABLED
-
启动controller或satellite时,可以传递命令行参数:
java ... com.linbit.linstor.core.Controller ... --log-level INFO java ... com.linbit.linstor.core.Satellite ... --log-level INFO
-
建议放在 /etc/linstor/linstor.toml 文件中的 logging 部分:
[logging]
  level="INFO"
-
由于Linstor使用Logback作为实现,还可以使用
/usr/share/Linstor server/lib/Logback.xml
。目前只有这种方法支持不同组件的不同日志级别,如下例所示:<?xml version="1.0" encoding="UTF-8"?> <configuration scan="false" scanPeriod="60 seconds"> <!-- Values for scanPeriod can be specified in units of milliseconds, seconds, minutes or hours https://logback.qos.ch/manual/configuration.html --> <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender"> <!-- encoders are assigned the type ch.qos.logback.classic.encoder.PatternLayoutEncoder by default --> <encoder> <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern> </encoder> </appender> <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender"> <file>${log.directory}/linstor-${log.module}.log</file> <append>true</append> <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder"> <Pattern>%d{yyyy_MM_dd HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</Pattern> </encoder> <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy"> <FileNamePattern>logs/linstor-${log.module}.%i.log.zip</FileNamePattern> <MinIndex>1</MinIndex> <MaxIndex>10</MaxIndex> </rollingPolicy> <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy"> <MaxFileSize>2MB</MaxFileSize> </triggeringPolicy> </appender> <logger name="LINSTOR/Controller" level="INFO" additivity="false"> <appender-ref ref="STDOUT" /> <!-- <appender-ref ref="FILE" /> --> </logger> <logger name="LINSTOR/Satellite" level="INFO" additivity="false"> <appender-ref ref="STDOUT" /> <!-- <appender-ref ref="FILE" /> --> </logger> <root level="WARN"> <appender-ref ref="STDOUT" /> <!-- <appender-ref ref="FILE" /> --> </root> </configuration>
有关 logback.xml
的更多详细信息,请参见https://logback.qos.ch/manual/index.html[Logback
Manual]。
如果没有使用上述任何配置方法,Linstor将默认为 INFO
日志级别。
2.15. Monitoring
Since LINSTOR 1.8.0, a Prometheus /metrics
HTTP
path is provided with LINSTOR and JVM specific exports.
The /metrics
path also supports 3 GET arguments to reduce LINSTOR’s
reported data:
-
resource
-
storage_pools
-
error_reports
These all default to true; to disable, e.g., error-report data:
2.15.1. Health check
The LINSTOR-Controller also provides a /health
HTTP path that will simply
return HTTP-Status 200 if the controller can access its database and all
services are up and running. Otherwise it will return HTTP error status code
500 Internal Server Error
.
2.16. 安全Satellite连接
可以让LINSTOR在controller和satellites连接之间使用SSL安全TCP连接。在不深入讨论java的SSL引擎如何工作的情况下,我们将向您提供使用java运行时环境中的
keytool
的命令行片段,介绍如何使用安全连接配置3节点设置。节点设置如下所示:
节点 alpha
只充当controller。节点 bravo
和节点 charlie
只充当卫星。
下面是生成这样一个密钥库设置的命令,当然你应该根据您的环境编辑值。
# create directories to hold the key files mkdir -p /tmp/linstor-ssl cd /tmp/linstor-ssl mkdir alpha bravo charlie # create private keys for all nodes keytool -keyalg rsa -keysize 2048 -genkey -keystore alpha/keystore.jks\ -storepass linstor -keypass linstor\ -alias alpha\ -dname "CN=Max Mustermann, OU=alpha, O=Example, L=Vienna, ST=Austria, C=AT" keytool -keyalg rsa -keysize 2048 -genkey -keystore bravo/keystore.jks\ -storepass linstor -keypass linstor\ -alias bravo\ -dname "CN=Max Mustermann, OU=bravo, O=Example, L=Vienna, ST=Austria, C=AT" keytool -keyalg rsa -keysize 2048 -genkey -keystore charlie/keystore.jks\ -storepass linstor -keypass linstor\ -alias charlie\ -dname "CN=Max Mustermann, OU=charlie, O=Example, L=Vienna, ST=Austria, C=AT" # import truststore certificates for alpha (needs all satellite certificates) keytool -importkeystore\ -srcstorepass linstor -deststorepass linstor -keypass linstor\ -srckeystore bravo/keystore.jks -destkeystore alpha/certificates.jks keytool -importkeystore\ -srcstorepass linstor -deststorepass linstor -keypass linstor\ -srckeystore charlie/keystore.jks -destkeystore alpha/certificates.jks # import controller certificate into satellite truststores keytool -importkeystore\ -srcstorepass linstor -deststorepass linstor -keypass linstor\ -srckeystore alpha/keystore.jks -destkeystore bravo/certificates.jks keytool -importkeystore\ -srcstorepass linstor -deststorepass linstor -keypass linstor\ -srckeystore alpha/keystore.jks -destkeystore charlie/certificates.jks # now copy the keystore files to their host destinations ssh root@alpha mkdir /etc/linstor/ssl scp alpha/* root@alpha:/etc/linstor/ssl/ ssh root@bravo mkdir /etc/linstor/ssl scp bravo/* root@bravo:/etc/linstor/ssl/ ssh root@charlie mkdir /etc/linstor/ssl scp charlie/* root@charlie:/etc/linstor/ssl/ # generate the satellite ssl config entry echo '[netcom] type="ssl" port=3367 server_certificate="ssl/keystore.jks" trusted_certificates="ssl/certificates.jks" key_password="linstor" keystore_password="linstor" truststore_password="linstor" ssl_protocol="TLSv1.2" ' | ssh root@bravo "cat > /etc/linstor/linstor_satellite.toml" echo '[netcom] type="ssl" port=3367 server_certificate="ssl/keystore.jks" trusted_certificates="ssl/certificates.jks" key_password="linstor" keystore_password="linstor" truststore_password="linstor" ssl_protocol="TLSv1.2" ' | ssh root@charlie "cat > /etc/linstor/linstor_satellite.toml"
现在只需启动controller和satellites,并添加带有 --通信类型SSL
的节点。
2.17. Automatisms for DRBD-Resources
2.17.1. 自动仲裁策略
LINSTOR在资源上自动配置仲裁策略 当仲裁可实现时 。这意味着,每当您有至少两个磁盘和一个或多个无磁盘资源分配,或三个或多个磁盘资源分配时,LINSTOR将自动为您的资源启用仲裁策略。
相反,当达到仲裁所需的资源分配少于最小值时,LINSTOR将自动禁用仲裁策略。
这是通过 DrbdOptions/auto-quorum
属性控制的,该属性可应用于 linstor-controller
、
resource-group
和 resource-definition
。 DrbdOptions/auto-quorum
,
属性的接受值为 disabled
、 suspend io
和 io error
。
将 DrbdOptions/auto-quorum
属性设置为 disabled
将允许您手动或更细粒度地控制资源的仲裁策略(如果您愿意)。
DrbdOptions/auto-quorum 的默认策略是 quorum majority 和 on-no-quorum io-error 。有关DRBD仲裁功能及其行为的更多信息,请参阅DRBD用户指南的quorum章节。
如果未禁用 DrbdOptions/auto-quorum ,则 DrbdOptions/auto-quorum 策略将覆盖任何手动配置的属性。
例如,要手动设置名为 my_ssd_group
的 resource-group 的仲裁策略,可以使用以下命令:
# linstor resource-group set-property my_ssd_group DrbdOptions/auto-quorum disabled
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/quorum majority
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/on-no-quorum suspend-io
您可能希望完全禁用DRBD的仲裁功能。要做到这一点,首先需要在相应的LINSTOR对象上禁用 DrbdOptions/auto-quorum ,然后相应地设置DRBD quorum特性。例如,使用以下命令完全禁用 my_ssd_group 这个 resource-group 上的仲裁:
# linstor resource-group set-property my_ssd_group DrbdOptions/auto-quorum disabled
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/quorum off
# linstor resource-group set-property my_ssd_group DrbdOptions/Resource/on-no-quorum
在上面的命令中,将 DrbdOptions/Resource/on-no-quorum 设置为空值会将该属性从对象中完全删除。
2.17.2. Auto-Evict
If a satellite is offline for a prolonged period of time, LINSTOR can be configured to declare that node as evicted. This triggers an automated reassignment of the affected DRBD-resources to other nodes to ensure a minimum replica count is kept.
This feature uses the following properties to adapt the behaviour.
DrbdOptions/AutoEvictMinReplicaCount
sets the number of replicas that
should always be present. You can set this property on the controller to
change a global default, or on a specific resource-definition or
resource-group to change it only for that resource-definiton or
resource-group. If this property is left empty, the place-count set for the
auto-placer of the corresponding resource-group will be used.
DrbdOptions/AutoEvictAfterTime
describes how long a node can be offline in
minutes before the eviction is triggered. You can set this property on the
controller to change a global default, or on a single node to give it a
different behavior. The default value for this property is 60 minutes.
DrbdOptions/AutoEvictMaxDisconnectedNodes
sets the percentage of nodes
that can be not reachable (for whatever reason) at the same time. If more
than the given percent of nodes are offline at the same time, the auto-evict
will not be triggered for any node , since in this case LINSTOR assumes
connection problems from the controller. This property can only be set for
the controller, and only accepts a value between 0 and 100. The default
value is 34. If you wish to turn the auto-evict-feature off, simply set this
property to 0. If you want to always trigger the auto-evict, regardless of
how many satellites are unreachable, set it to 100.
DrbdOptions/AutoEvictAllowEviction
is an additional property that can stop
a node from being evicted. This can be useful for various cases, for example
if you need to shut down a node for maintenance. You can set this property
on the controller to change a global default, or on a single node to give it
a different behavior. It accepts true and false as values and per default is
set to true on the controller. You can use this property to turn the
auto-evict feature off by setting it to false on the controller, although
this might not work completely if you already set different values for
individual nodes, since those values take precedence over the global
default.
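For example (a sketch using the property described above; the node name is illustrative), a node can be protected from eviction during planned maintenance like this:
# linstor node set-property alpha DrbdOptions/AutoEvictAllowEviction false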
After the linstor-controller loses the connection to a satellite, aside from
trying to reconnect, it starts a timer for that satellite. As soon as that
timer exceeds DrbdOptions/AutoEvictAfterTime
and all of the
DRBD-connections to the DRBD-resources on that satellite are broken, the
controller will check whether or not
DrbdOptions/AutoEvictMaxDisconnectedNodes
has been met. If it hasn’t, and
DrbdOptions/AutoEvictAllowEviction
is true for the node in question, the
satellite will be marked as EVICTED. At the same time, the controller will
check for every DRBD-resource whether the number of resources is still above
DrbdOptions/AutoEvictMinReplicaCount
. If it is, the resource in question
will be marked as DELETED. If it isn’t, an auto-place with the settings from
the corresponding resource-group will be started. Should the auto-place
fail, the controller will try again later when changes that might allow a
different result, such as adding a new node, have happened. Resources where
an auto-place is necessary will only be marked as DELETED if the
corresponding auto-place was successful.
The evicted satellite itself will not be able to reestablish connection with
the controller. Even if the node is up and running, a manual reconnect will
fail. It is also not possible to delete the satellite, even if it is working
as it should be. Should you wish to get rid of an evicted node, you need to
use the node lost
command. The satellite can, however, be restored. This
will remove the EVICTED-flag from the satellite and allow you to use it
again. Previously configured network interfaces, storage pools, properties
and similar entities as well as non-DRBD-related resources and resources
that could not be autoplaced somewhere else will still be on the
satellite. To restore a satellite, use
# linstor node restore [nodename]
2.18. QoS设置
2.18.1. 系统
LINSTOR能够设置以下Sysfs设置:
SysFs | Linstor property
---|---
/sys/fs/cgroup/blkio/blkio.throttle.read_bps_device | sys/fs/blkio_throttle_read
/sys/fs/cgroup/blkio/blkio.throttle.write_bps_device | sys/fs/blkio_throttle_write
If a LINSTOR volume is composed of multiple “stacked” volume (for example
DRBD with external metadata will have 3 devices: backing (storage) device,
metadata device and the resulting DRBD device), setting a sys/fs/\*
property for a Volume, only the bottom-most local “data”-device will receive
the corresponding /sys/fs/cgroup/…
setting. That means, in case of the
example above only the backing device will receive the setting. In case a
resource-definition has an nvme-target as well as an nvme-initiator
resource, both bottom-most devices of each node will receive the setting. In
case of the target the bottom-most device will be the volume of LVM or ZFS,
whereas in case of the initiator the bottom-most device will be the
connected nvme-device, regardless which other layers are stacked ontop of
that.
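As a hedged example (resource name, volume number and the bytes-per-second value are illustrative; the property key is the one referenced in the autoplacer section above), a read throttle could be set on a volume definition like this:
# linstor volume-definition set-property r1 0 sys/fs/blkio_throttle_read 10485760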
2.19. 得到帮助
2.19.1. 从命令行
在命令行中列出可用命令的一种快速方法是键入 linstor
。
有关子命令(例如列表节点)的更多信息可以通过两种方式检索:
# linstor node list -h # linstor help node list
当LINSTOR以交互模式( linstor interactive )执行时,使用 help 子命令尤其有用。
LINSTOR最有用的特性之一是它丰富的tab-completion,它基本上可以用来完成LINSTOR所知道的每个对象(例如,节点名、IP地址、资源名等等)。在下面的示例中,我们将展示一些可能的完成及其结果:
# linstor node create alpha 1<tab> # completes the IP address if hostname can be resolved
# linstor resource create b<tab> c<tab> # linstor assign-resource backups charlie
如果制表符完成不正常,请尝试获取相应的文件:
# source /etc/bash_completion.d/linstor
# or
# source /usr/share/bash_completion/completions/linstor
对于zsh shell用户,linstor客户机可以生成一个zsh编译文件,该文件基本上支持命令和参数完成。
# linstor gen-zsh-completer > /usr/share/zsh/functions/Completion/Linux/_linstor
2.19.2. SOS-Report
If something goes wrong and you need help finding the cause of the issue, you can use
# linstor sos-report create
The command above will create a new sos-report in
/var/log/linstor/controller/
on the controller node. Alternatively you can
use
# linstor sos-report download
which will create a new sos-report and additionally download that report to your current working directory on the local machine.
This sos-report contains logs and useful debug information from several
sources (LINSTOR logs, dmesg
, versions of external tools used by LINSTOR,
ip a
, database dump and many more). This information is stored for each
node in plaintext in the resulting .tar.gz
file.
2.19.3. 来自社区
如需社区帮助,请订阅我们的邮件列表: https://lists.linbit.com/listinfo/drbd-user
2.19.4. Github
要提交bug或功能请求,请查看我们的GitHub页面 https://GitHub.com/linbit
2.19.5. 有偿支持和开发
或者,如果您希望购买远程安装服务、24/7支持、访问认证存储库或功能开发,请联系我们:+1-877-454-6248(1-877-4LINBIT),国际电话:+43-1-8178292-0 | sales@linbit.com
3. Kubernetes的LINSTOR卷
本章描述了在Kubernetes中由操作员管理的LINSTOR的使用,以及使用 LINSTOR CSI plugin 配置卷的情况。
3.1. Kubernetes 概述
Kubernetes is a container orchestrator. Kubernetes defines the behavior of
containers and related services via declarative specifications. In this
guide, we’ll focus on using kubectl
to manipulate .yaml
files that
define the specifications of Kubernetes objects.
3.2. 在Kubernetes上部署LINSTOR
3.2.1. 使用LINSTOR Operator部署
LINBIT为商业支持客户提供LINSTOR Operator。Operator通过安装DRBD、管理Satellite/Controller Pods以及其他相关功能,简化了LINSTOR在Kubernetes上的部署。
Operator用Helm v3安装,如下所示:
-
Create a kubernetes secret containing your my.linbit.com credentials:
kubectl create secret docker-registry drbdiocred --docker-server=drbd.io --docker-username=<YOUR_LOGIN> --docker-email=<YOUR_EMAIL> --docker-password=<YOUR_PASSWORD>
By default, the name of this secret must match the name
drbdiocred
specified in the Helm values. -
为LINSTOR etcd实例配置存储。有多种选项可用于为LINSTOR配置etcd实例:
-
使用具有默认
StorageClass
的现有存储资源调配器。 -
Disable persistence for basic testing. This can be done by adding
--set etcd.persistentVolume.enabled=false
to thehelm install
command below.
-
-
Read the storage guide and configure a basic storage setup for LINSTOR
-
Read the section on securing the deployment and configure as needed.
-
在最后一步中,使用带有
helm install
命令的--set
选择适当的内核模块注入。-
Choose the injector according to the distribution you are using. Select the latest version from one of
drbd9-rhel7
,drbd9-rhel8
,… from http://drbd.io/ as appropriate. The drbd9-rhel8 image should also be used for RHCOS (OpenShift). For the SUSE CaaS Platform use the SLES injector that matches the base system of the CaaS Platform you are using (e.g.,drbd9-sles15sp1
). For example:operator.satelliteSet.kernelModuleInjectionImage=drbd.io/drbd9-rhel8:v9.0.24
-
Only inject modules that are already present on the host machine. If a module is not found, it will be skipped.
operator.satelliteSet.kernelModuleInjectionMode=DepsOnly
-
Disable kernel module injection if you are installing DRBD by other means. Deprecated by
DepsOnly
operator.satelliteSet.kernelModuleInjectionMode=None
-
-
最后创建一个名为
linstor-op
的Helm部署,它将设置所有内容。helm repo add linstor https://charts.linstor.io helm install linstor-op linstor/linstor
Further deployment customization is discussed in the advanced deployment section
LINSTOR etcd hostPath
持久存储
You can use the pv-hostpath
Helm templates to create hostPath
persistent
volumes. Create as many PVs as needed to satisfy your configured etcd
replicas
(default 1).
创建 hostPath
持久卷,在 nodes=
选项中相应地替换为集群节点名称:
helm repo add linstor https://charts.linstor.io
helm install linstor-etcd linstor/pv-hostpath --set "nodes={<NODE0>,<NODE1>,<NODE2>}"
Persistence for etcd is enabled by default.
Using an existing database
LINSTOR can connect to an existing PostgreSQL, MariaDB or etcd database. For instance, for a PostgreSQL instance with the following configuration:
POSTGRES_DB: postgresdb POSTGRES_USER: postgresadmin POSTGRES_PASSWORD: admin123
The Helm chart can be configured to use this database instead of deploying an etcd cluster by adding the following to the Helm install command:
--set etcd.enabled=false --set "operator.controller.dbConnectionURL=jdbc:postgresql://postgres/postgresdb?user=postgresadmin&password=admin123"
3.2.2. Configuring storage
The LINSTOR operator can automate some basic storage set up for LINSTOR.
Configuring storage pool creation
The LINSTOR operator can be used to create LINSTOR storage pools. Creation is under control of the LinstorSatelliteSet resource:
$ kubectl get LinstorSatelliteSet.linstor.linbit.com linstor-op-ns -o yaml
kind: LinstorSatelliteSet
metadata:
..
spec:
..
storagePools:
lvmPools:
- name: lvm-thick
volumeGroup: drbdpool
lvmThinPools:
- name: lvm-thin
thinVolume: thinpool
volumeGroup: drbdpool
zfsPools:
- name: my-linstor-zpool
zPool: for-linstor
thin: true
At install time
At install time, set the value of
operator.satelliteSet.storagePools
when running helm install.
First create a file with the storage configuration like:
operator:
satelliteSet:
storagePools:
lvmPools:
- name: lvm-thick
volumeGroup: drbdpool
This file can be passed to the helm installation like this:
helm install -f <file> linstor linstor/linstor
After install
On a cluster with the operator already configured (i.e. after helm
install
), you can edit the LinstorSatelliteSet configuration like this:
$ kubectl edit LinstorSatelliteSet.linstor.linbit.com <satellitesetname>
The storage pool configuration can be updated like in the example above.
Preparing physical devices
By default, LINSTOR expects the referenced VolumeGroups, ThinPools and so on
to be present. You can use the devicePaths: []
option to let LINSTOR
automatically prepare devices for the pool. Eligible for automatic
configuration are block devices that:
-
Are a root device (no partition)
-
do not contain partition information
-
have more than 1 GiB
To enable automatic configuration of devices, set the devicePaths
key on
storagePools
entries:
storagePools:
lvmPools:
- name: lvm-thick
volumeGroup: drbdpool
devicePaths:
- /dev/vdb
lvmThinPools:
- name: lvm-thin
thinVolume: thinpool
volumeGroup: linstor_thinpool
devicePaths:
- /dev/vdc
- /dev/vdd
Currently, this method supports creation of LVM and LVMTHIN storage pools.
lvmPools
configuration
-
name
name of the LINSTOR storage pool. Required
-
volumeGroup
name of the VG to create. Required
-
devicePaths
devices to configure for this pool. Must be empty and >= 1 GiB to be recognized. Optional
-
raidLevel
LVM raid level. Optional
-
vdo
Enable [VDO] (requires VDO tools in the satellite). Optional
-
vdoLogicalSizeKib
Size of the created VG (expected to be bigger than the backing devices by using VDO). Optional
-
vdoSlabSizeKib
Slab size for VDO. Optional
lvmThinPools
configuration
-
name
name of the LINSTOR storage pool. Required
-
volumeGroup
VG to use for the thin pool. If you want to use devicePaths, you must set this to "". This is required because LINSTOR does not allow configuration of the VG name when preparing devices.
-
thinVolume
name of the thinpool. Required
-
devicePaths
devices to configure for this pool. Must be empty and >= 1 GiB to be recognized. Optional
-
raidLevel
LVM raid level. Optional
The volume group created by LINSTOR for LVMTHIN pools will always follow the scheme “linstor_$THINPOOL”. |
zfsPools
configuration
-
name
name of the LINSTOR storage pool. Required
-
zPool
name of the zpool to use. Must already be present on all machines. Required
-
thin
true to use thin provisioning, false otherwise. Required
Using automaticStorageType
(DEPRECATED)
ALL eligible devices will be prepared according to the value of
operator.satelliteSet.automaticStorageType
, unless they are already
prepared using the storagePools
section. Devices are added to a storage
pool based on the device name (i.e. all /dev/nvme1
devices will be part of
the pool autopool-nvme1
)
The possible values for operator.satelliteSet.automaticStorageType
:
-
None
no automatic set up (default) -
LVM
create a LVM (thick) storage pool -
LVMTHIN
create a LVM thin storage pool -
ZFS
create a ZFS based storage pool (UNTESTED)
3.2.3. Securing deployment
This section describes the different options for enabling security features available when using this operator. The following guides assume the operator is installed using Helm
Secure communication with an existing etcd instance
Secure communication to an etcd
instance can be enabled by providing a CA
certificate to the operator in form of a kubernetes secret. The secret has
to contain the key ca.pem
with the PEM encoded CA certificate as value.
The secret can then be passed to the controller by passing the following
argument to helm install
--set operator.controller.dbCertSecret=<secret name>
Authentication with etcd
using certificates
If you want to use TLS certificates to authenticate with an etcd
database,
you need to set the following option on helm install:
--set operator.controller.dbUseClientCert=true
If this option is active, the secret specified in the above section must
contain two additional keys:
* client.cert
PEM formatted certificate presented to etcd
for
authentication
* client.key
private key in PKCS8 format, matching the above client
certificate. Keys can be converted into PKCS8 format using openssl
:
openssl pkcs8 -topk8 -nocrypt -in client-key.pem -out client-key.pkcs8
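As a sketch, a secret containing all three expected keys could then be created like this. The secret name linstor-db-tls is a placeholder, and the file names follow the openssl example above; substitute your own CA, certificate and key files:
kubectl create secret generic linstor-db-tls --from-file=ca.pem=ca.pem --from-file=client.cert=client.cert --from-file=client.key=client-key.pkcs8
The secret name is then passed to helm install via --set operator.controller.dbCertSecret=linstor-db-tls together with --set operator.controller.dbUseClientCert=true as described above.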
Configuring secure communication between LINSTOR components
The default communication between LINSTOR components is not secured by TLS. If this is needed for your setup, follow these steps:
-
Create private keys in the java keystore format, one for the controller, one for all satellites:
keytool -keyalg rsa -keysize 2048 -genkey -keystore satellite-keys.jks -storepass linstor -alias satellite -dname "CN=XX, OU=satellite, O=Example, L=XX, ST=XX, C=X"
keytool -keyalg rsa -keysize 2048 -genkey -keystore control-keys.jks -storepass linstor -alias control -dname "CN=XX, OU=control, O=Example, L=XX, ST=XX, C=XX"
-
Create a trust store with the public keys that each component needs to trust:
-
Controller needs to trust the satellites
-
Nodes need to trust the controller
keytool -importkeystore -srcstorepass linstor -deststorepass linstor -srckeystore control-keys.jks -destkeystore satellite-trust.jks
keytool -importkeystore -srcstorepass linstor -deststorepass linstor -srckeystore satellite-keys.jks -destkeystore control-trust.jks
-
Create kubernetes secrets that can be passed to the controller and satellite pods
kubectl create secret generic control-secret --from-file=keystore.jks=control-keys.jks --from-file=certificates.jks=control-trust.jks
kubectl create secret generic satellite-secret --from-file=keystore.jks=satellite-keys.jks --from-file=certificates.jks=satellite-trust.jks
-
Pass the names of the created secrets to
helm install
--set operator.satelliteSet.sslSecret=satellite-secret --set operator.controller.sslSecret=control-secret
It is currently NOT possible to change the keystore password. LINSTOR
expects the passwords to be linstor . This is a current limitation of
LINSTOR.
|
Configuring secure communications for the LINSTOR API
Various components need to talk to the LINSTOR controller via its REST interface. This interface can be secured via HTTPS, which automatically includes authentication. For HTTPS+authentication to work, each component needs access to:
-
A private key
-
A certificate based on the key
-
A trusted certificate, used to verify that other components are trustworthy
The next sections will guide you through creating all required components.
Creating the private keys
Private keys can be created using java’s keytool
keytool -keyalg rsa -keysize 2048 -genkey -keystore controller.pkcs12 -storetype pkcs12 -storepass linstor -ext san=dns:linstor-op-cs.default.svc -dname "CN=XX, OU=controller, O=Example, L=XX, ST=XX, C=X" -validity 5000
keytool -keyalg rsa -keysize 2048 -genkey -keystore client.pkcs12 -storetype pkcs12 -storepass linstor -dname "CN=XX, OU=client, O=Example, L=XX, ST=XX, C=XX" -validity 5000
The clients need private keys and certificates in a different format, so we need to convert them
openssl pkcs12 -in client.pkcs12 -passin pass:linstor -out client.cert -clcerts -nokeys
openssl pkcs12 -in client.pkcs12 -passin pass:linstor -out client.key -nocerts -nodes
The alias specified for the controller key (i.e. -ext
san=dns:linstor-op-cs.default.svc ) has to exactly match the service name
created by the operator. When using helm , this is always of the form
<release-name>-cs.<release-namespace>.svc .
|
It is currently NOT possible to change the keystore password. LINSTOR expects the passwords to be linstor. This is a current limitation of LINSTOR |
Create the trusted certificates
For the controller to trust the clients, we can use the following command to create a truststore, importing the client certificate
keytool -importkeystore -srcstorepass linstor -srckeystore client.pkcs12 -deststorepass linstor -deststoretype pkcs12 -destkeystore controller-trust.pkcs12
For the client, we have to convert the controller certificate into a different format
openssl pkcs12 -in controller.pkcs12 -passin pass:linstor -out ca.pem -clcerts -nokeys
Create Kubernetes secrets
Now you can create secrets for the controller and for clients:
kubectl create secret generic http-controller --from-file=keystore.jks=controller.pkcs12 --from-file=truststore.jks=controller-trust.pkcs12
kubectl create secret generic http-client --from-file=ca.pem=ca.pem --from-file=client.cert=client.cert --from-file=client.key=client.key
The names of the secrets can be passed to helm install
to configure all
clients to use https.
--set linstorHttpsControllerSecret=http-controller --set linstorHttpsClientSecret=http-client
Automatically set the passphrase for encrypted volumes
Linstor can be used to create encrypted volumes using LUKS. The passphrase used when creating these volumes can be set via a secret:
kubectl create secret generic linstor-pass --from-literal=MASTER_PASSPHRASE=<password>
On install, add the following arguments to the helm command:
--set operator.controller.luksSecret=linstor-pass
终止Helm部署
To protect the storage infrastructure of the cluster from accidentally deleting vital components, it is necessary to perform some manual steps before deleting a Helm deployment.
-
Delete all volume claims managed by LINSTOR components. You can use the following command to get a list of volume claims managed by LINSTOR. After checking that none of the listed volumes still hold needed data, you can delete them using the generated kubectl delete command.
$ kubectl get pvc --all-namespaces -o=jsonpath='{range .items[?(@.metadata.annotations.volume\.beta\.kubernetes\.io/storage-provisioner=="linstor.csi.linbit.com")]}kubectl delete pvc --namespace {.metadata.namespace} {.metadata.name}{"\n"}{end}'
kubectl delete pvc --namespace default data-mysql-0
kubectl delete pvc --namespace default data-mysql-1
kubectl delete pvc --namespace default data-mysql-2
These volumes, once deleted, cannot be recovered. -
Delete the LINSTOR controller and satellite resources.
Deployment of LINSTOR satellite and controller is controlled by the LinstorSatelliteSet and LinstorController resources. You can delete the resources associated with your deployment using kubectl
kubectl delete linstorcontroller <helm-deploy-name>-cs
kubectl delete linstorsatelliteset <helm-deploy-name>-ns
After a short wait, the controller and satellite pods should terminate. If they continue to run, you can check the above resources for errors (they are only removed after all associated pods terminate)
-
Delete the Helm deployment.
If you removed all PVCs and all LINSTOR pods have terminated, you can uninstall the helm deployment
helm uninstall linstor-op
Due to Helm’s current policy, the Custom Resource Definitions named LinstorController and LinstorSatelliteSet will not be deleted by the command. More information regarding Helm’s current position on CRDs can be found here.
3.2.4. Advanced deployment options
The helm charts provide a set of further customization options for advanced use cases.
global:
imagePullPolicy: IfNotPresent # empty pull policy means k8s default is used ("always" if tag == ":latest", "ifnotpresent" else) (1)
setSecurityContext: true # Force non-privileged containers to run as non-root users
# Dependency charts
etcd:
persistentVolume:
enabled: true
storage: 1Gi
replicas: 1 # How many instances of etcd will be added to the initial cluster. (2)
resources: {} # resource requirements for etcd containers (3)
image:
repository: gcr.io/etcd-development/etcd
tag: v3.4.9
csi-snapshotter:
enabled: true # <- enable to add k8s snapshotting CRDs and controller. Needed for CSI snapshotting
image: k8s.gcr.io/sig-storage/snapshot-controller:v3.0.2
replicas: 1 (2)
resources: {} # resource requirements for the cluster snapshot controller. (3)
stork:
enabled: true
storkImage: docker.io/openstorage/stork:2.5.0
schedulerImage: k8s.gcr.io/kube-scheduler-amd64
schedulerTag: ""
replicas: 1 (2)
storkResources: {} # resources requirements for the stork plugin containers (3)
schedulerResources: {} # resource requirements for the kube-scheduler containers (3)
podsecuritycontext: {}
csi:
enabled: true
pluginImage: "drbd.io/linstor-csi:v0.11.0"
csiAttacherImage: k8s.gcr.io/sig-storage/csi-attacher:v3.0.2
csiLivenessProbeImage: k8s.gcr.io/sig-storage/livenessprobe:v2.1.0
csiNodeDriverRegistrarImage: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.0.1
csiProvisionerImage: k8s.gcr.io/sig-storage/csi-provisioner:v2.0.4
csiSnapshotterImage: k8s.gcr.io/sig-storage/csi-snapshotter:v3.0.2
csiResizerImage: k8s.gcr.io/sig-storage/csi-resizer:v1.0.1
controllerReplicas: 1 (2)
nodeAffinity: {} (4)
nodeTolerations: [] (4)
controllerAffinity: {} (4)
controllerTolerations: [] (4)
enableTopology: false
resources: {} (3)
priorityClassName: ""
drbdRepoCred: drbdiocred
linstorHttpsControllerSecret: "" # <- name of secret containing linstor server certificates+key.
linstorHttpsClientSecret: "" # <- name of secret containing linstor client certificates+key.
controllerEndpoint: "" # <- override to the generated controller endpoint. use if controller is not deployed via operator
psp:
privilegedRole: ""
unprivilegedRole: ""
operator:
replicas: 1 # <- number of replicas for the operator deployment (2)
image: "drbd.io/linstor-operator:v1.3.1"
affinity: {} (4)
tolerations: [] (4)
resources: {} (3)
podsecuritycontext: {}
controller:
enabled: true
controllerImage: "drbd.io/linstor-controller:v1.11.1"
luksSecret: ""
dbCertSecret: ""
dbUseClientCert: false
sslSecret: ""
affinity: {} (4)
tolerations: (4)
- key: node-role.kubernetes.io/master
operator: "Exists"
effect: "NoSchedule"
resources: {} (3)
replicas: 1 (2)
satelliteSet:
enabled: true
satelliteImage: "drbd.io/linstor-satellite:v1.11.1"
storagePools: {}
sslSecret: ""
automaticStorageType: None
affinity: {} (4)
tolerations: [] (4)
resources: {} (3)
kernelModuleInjectionImage: "drbd.io/drbd9-rhel7:v9.0.27"
kernelModuleInjectionMode: ShippedModules
kernelModuleInjectionResources: {} (3)
haController:
enabled: true
image: drbd.io/linstor-k8s-ha-controller:v0.1.3
affinity: {} (4)
tolerations: [] (4)
resources: {} (3)
replicas: 1 (2)
1 | Sets the pull policy for all images. |
2 | Controls the number of replicas for each component. |
3 | Set container resource requests and limits. See
the
kubernetes docs. Most containers need a minimal amount of resources,
except for:
|
4 | Affinity and toleration determine where pods are scheduled on the
cluster. See the
kubernetes docs on
affinity and toleration. This may be especially important for the
operator.satelliteSet and csi.node* values. To schedule a pod using a
LINSTOR persistent volume, the node requires a running LINSTOR satellite and
LINSTOR CSI pod. |
High Availability Deployment
To create a High Availability deployment of all components, take a look at the upstream guide. The default values are chosen so that scaling the components to multiple replicas ensures that the replicas are placed on different nodes. This ensures that a single node failure will not interrupt the service.
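As a sketch, scaling the most relevant components to multiple replicas could look like the following helm upgrade; the replica counts are only example values, and the available *.replicas settings are those shown in the values reference above:
helm upgrade linstor-op linstor/linstor \
  --set etcd.replicas=3 \
  --set operator.replicas=2 \
  --set operator.controller.replicas=2 \
  --set csi.controllerReplicas=2 \
  --set haController.replicas=2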
3.2.5. Deploying with an external LINSTOR controller
The operator can configure the satellites and CSI plugin to use an existing LINSTOR setup. This can be useful in cases where the storage infrastructure is separate from the Kubernetes cluster. Volumes can be provisioned in diskless mode on the Kubernetes nodes while the storage nodes will provide the backing disk storage.
To skip the creation of a LINSTOR Controller deployment and configure the
other components to use your existing LINSTOR Controller, use the following
options when running helm install
:
-
operator.controller.enabled=false
This disables creation of theLinstorController
resource -
operator.etcd.enabled=false
Since no LINSTOR Controller will run on Kubernetes, no database is required. -
controllerEndpoint=<url-of-linstor-controller>
The HTTP endpoint of the existing LINSTOR Controller. For example: http://linstor.storage.cluster:3370/
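Put together, a helm install invocation using these options might look like the following sketch. The controller URL is the example endpoint from the list above, and depending on your chart version the etcd toggle may instead be spelled etcd.enabled=false, as used in the existing-database example earlier in this chapter:
helm install linstor-op linstor/linstor \
  --set operator.controller.enabled=false \
  --set operator.etcd.enabled=false \
  --set controllerEndpoint=http://linstor.storage.cluster:3370/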
After all pods are ready, you should see the Kubernetes cluster nodes as satellites in your LINSTOR setup.
Your kubernetes nodes must be reachable using their IP by the controller and storage nodes. |
Create a storage class referencing an existing storage pool on your storage nodes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linstor-on-k8s
provisioner: linstor.csi.linbit.com
parameters:
autoPlace: "3"
storagePool: existing-storage-pool
resourceGroup: linstor-on-k8s
You can provision new volumes by creating PVCs using your storage class. The volumes will first be placed only on nodes with the given storage pool, i.e. your storage infrastructure. Once you want to use the volume in a pod, LINSTOR CSI will create a diskless resource on the Kubernetes node and attach it over the network to the diskful resource.
3.2.6. 使用Piraeus Operator部署
The community-supported edition of LINSTOR deployments in Kubernetes is called Piraeus. The Piraeus project provides an operator for deployment.
3.3. 在Kubernetes中与LINSTOR互动
Controller pod包括LINSTOR客户端,使得直接与LINSTOR交互变得容易。例如:
kubectl exec linstor-op-cs-controller-<deployment-info> -- linstor storage-pool list
这应该只是调研问题和访问高级功能所必需的。常规操作(如创建卷)应通过Kubernetes integration实现。
3.4. LINSTOR CSI插件部署
The LINSTOR CSI plugin is deployed for you by the operator Helm chart, so if you used that, you can skip this section.
If you are integrating LINSTOR using a different method, you will need to install the LINSTOR CSI plugin. Instructions for deploying the CSI plugin can be found on the project’s github. This will result in a linstor-csi-controller Deployment and a linstor-csi-node DaemonSet running in the kube-system namespace.
NAME READY STATUS RESTARTS AGE IP NODE linstor-csi-controller-ab789 5/5 Running 0 3h10m 191.168.1.200 kubelet-a linstor-csi-node-4fcnn 2/2 Running 0 3h10m 192.168.1.202 kubelet-c linstor-csi-node-f2dr7 2/2 Running 0 3h10m 192.168.1.203 kubelet-d linstor-csi-node-j66bc 2/2 Running 0 3h10m 192.168.1.201 kubelet-b linstor-csi-node-qb7fw 2/2 Running 0 3h10m 192.168.1.200 kubelet-a linstor-csi-node-zr75z 2/2 Running 0 3h10m 192.168.1.204 kubelet-e
3.5. 基本配置和部署
一旦所有linstor-csi Pods 都启动并运行,我们就可以使用通常的Kubernetes工作流创建卷。
Configuring the behavior and properties of LINSTOR volumes deployed via Kubernetes is accomplished via the use of StorageClasses.
the “resourceGroup” parameter is mandatory. Usually you want it to be unique and the same as the storage class name. |
Below is the simplest practical StorageClass that can be used to deploy volumes:
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
# The name used to identify this StorageClass.
name: linstor-basic-storage-class
# The name used to match this StorageClass with a provisioner.
# linstor.csi.linbit.com is the name that the LINSTOR CSI plugin uses to identify itself
provisioner: linstor.csi.linbit.com
parameters:
# LINSTOR will provision volumes from the drbdpool storage pool configured
# On the satellite nodes in the LINSTOR cluster specified in the plugin's deployment
storagePool: "drbdpool"
resourceGroup: "linstor-basic-storage-class"
# Setting a fstype is required for "fsGroup" permissions to work correctly.
# Currently supported: xfs/ext4
csi.storage.k8s.io/fstype: xfs
DRBD options can be set as well in the parameters section. Valid keys are
defined in the LINSTOR
REST-API (e.g., DrbdOptions/Net/allow-two-primaries: "yes"
).
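For example, the parameters section of the basic StorageClass above could be extended with that option like this (a sketch; only the option already shown as an example is added):
parameters:
  storagePool: "drbdpool"
  resourceGroup: "linstor-basic-storage-class"
  csi.storage.k8s.io/fstype: xfs
  DrbdOptions/Net/allow-two-primaries: "yes"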
我们可以使用以下命令创建 StorageClasses :
kubectl create -f linstor-basic-sc.yaml
现在,我们的存储类已经创建,我们现在可以创建一个 PersistentVolumeClaim ,它可以用来提供Kubernetes和LINSTOR都知道的卷:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: my-first-linstor-volume
spec:
storageClassName: linstor-basic-storage-class
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Mi
我们可以使用以下命令创建 PersistentVolumeClaim :
kubectl create -f my-first-linstor-volume-pvc.yaml
This creates a PersistentVolumeClaim known to Kubernetes, which will be bound to a
PersistentVolume. In addition, LINSTOR will now create this volume according to the configuration defined in the linstor-basic-storage-class
StorageClass. The name of the LINSTOR volume will be a UUID prefixed with csi-
The volume can be observed with the usual linstor resource
list
command. Once the volume is created, we can attach it to a pod. The following Pod
spec will spawn a Fedora container with our volume attached that busy-waits, so it is not unscheduled before we interact with it:
apiVersion: v1
kind: Pod
metadata:
name: fedora
namespace: default
spec:
containers:
- name: fedora
image: fedora
command: [/bin/bash]
args: ["-c", "while true; do sleep 10; done"]
volumeMounts:
- name: my-first-linstor-volume
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: my-first-linstor-volume
persistentVolumeClaim:
claimName: "my-first-linstor-volume"
我们可以使用以下命令创建 Pod :
kubectl create -f my-first-linstor-volume-pod.yaml
运行 kubectl describe pod fedora
可用于确认 Pod 调度和卷附加成功。
要删除卷,请确保没有pod正在使用它,然后通过 kubectl
删除 PersistentVolumeClaim
。例如,要删除我们刚刚创建的卷,请运行以下两个命令,注意在删除 PersistentVolumeClaim 之前必须取消调度该 Pod :
kubectl delete pod fedora # unschedule the pod.
kubectl get pod -w # wait for pod to be unscheduled
kubectl delete pvc my-first-linstor-volume # remove the PersistentVolumeClaim, the PersistentVolume, and the LINSTOR Volume.
3.5.1. Available parameters in a StorageClass
The following storage class contains all currently available parameters to configure the provisioned storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: full-example
provisioner: linstor.csi.linbit.com
parameters:
# CSI related parameters
csi.storage.k8s.io/fstype: xfs
# LINSTOR parameters
autoPlace: "2"
placementCount: "2"
resourceGroup: "full-example"
storagePool: "my-storage-pool"
disklessStoragePool: "DfltDisklessStorPool"
layerList: "drbd,storage"
placementPolicy: "AutoPlace"
allowRemoteVolumeAccess: "true"
encryption: "true"
nodeList: "diskful-a,diskful-b"
clientList: "diskless-a,diskless-b"
replicasOnSame: "zone=a"
replicasOnDifferent: "rack"
disklessOnRemaining: "false"
doNotPlaceWithRegex: "tainted.*"
fsOpts: "nodiscard"
mountOpts: "noatime"
postMountXfsOpts: "extsize 2m"
# DRBD parameters
DrbdOptions/*: <x>
3.5.2. csi.storage.k8s.io/fstype
Sets the file system type to create for volumeMode: FileSystem
PVCs. Currently supported are:
-
ext4
(default) -
xfs
3.5.3. autoPlace
autoPlace
is an integer that determines the number of replicas a volume of
this StorageClass will have. For instance, autoPlace: "3"
will produce
volumes with three-way replication. If neither autoPlace
nor nodeList
are set, volumes will be automatically placed on one
node.
如果使用此选项,则不能使用nodeList。 |
You have to use quotes, otherwise Kubernetes will complain about a malformed StorageClass. |
此选项(以及影响自动放置行为的所有选项)修改将在其上配置卷的底层存储的LINSTOR节点的数量,并且与可从中访问这些卷的 kubelets 正交。 |
3.5.4. placementCount
placementCount
is an alias for autoPlace
3.5.5. resourceGroup
The LINSTOR Resource Group (RG) to associate with this StorageClass. If not set, a new RG will be created for each new PVC.
3.5.6. storagePool
storagePool
是用于为新创建的卷提供存储的LINSTOR storage pool的名称。
Only nodes that have a storage pool of this name configured are considered for autoplacement. Likewise, when nodeList is used, all nodes specified in that list must have this storage pool configured on them. |
3.5.7. disklessStoragePool
disklessStoragePool
是一个可选参数,它只影响作为客户端无磁盘分配给 kubelets
的LINSTOR卷。如果在LINSTOR中定义了自定义 diskless storage pool,请在此处指定。
3.5.8. layerList
A comma-separated list of layers to use for the created volumes. The
available layers and their order are described towards the end of
this section. Defaults to drbd,storage.
3.5.9. placementPolicy
Select from one of the available volume schedulers:
-
AutoPlace
, the default: Use LINSTOR autoplace, influenced by replicasOnSame and replicasOnDifferent -
FollowTopology
: Use CSI Topology information to place at least one volume in each “preferred” zone. Only usable if CSI Topology is enabled.
Manual
: Use only the nodes listed innodeList
andclientList
. -
Balanced
: EXPERIMENTAL Place volumes across failure domains, using the least used storage pool on each selected node.
3.5.10. allowRemoteVolumeAccess
Set to "false" to disable remote access to volumes. This implies that volumes can only be accessed from the initial set of nodes selected on creation. CSI Topology processing is required to place pods on the correct nodes.
3.5.11. encryption
encryption
is an optional parameter that determines whether to encrypt
volumes. LINSTOR must be configured for
encryption for this to work properly.
3.5.12. nodeList
nodeList
是要分配给卷的节点列表。这将把卷分配给每个节点,并在所有节点之间进行复制。这也可以用于按主机名选择单个节点,但使用replicasOnSame选择单个节点更灵活。
如果使用此选项,则不能使用autoPlace。 |
This option determines on which LINSTOR nodes the underlying storage for volumes will be provisioned, and it is orthogonal to which kubelets these volumes will be accessible from. |
3.5.13. clientList
clientList
is a list of nodes for diskless volumes to be assigned to. Use
in conjunction with nodeList.
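A sketch of a StorageClass parameters section using manual placement follows. The node names are the placeholder names from the full example above, and the resourceGroup and storagePool names are hypothetical:
parameters:
  resourceGroup: "manual-placement-example"
  storagePool: "my-storage-pool"
  placementPolicy: "Manual"
  nodeList: "diskful-a,diskful-b"
  clientList: "diskless-a,diskless-b"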
3.5.14. replicasOnSame
replicasOnSame
is a list of key
or key=value
items used as
autoplacement selection labels when autoplace is
used to determine where to provision storage. These labels correspond to
LINSTOR node properties.
LINSTOR node properties are different from kubernetes node labels. You can
see the properties of a node by running linstor node list-properties
<nodename> . You can also set additional properties (“auxiliary
properties”): linstor node set-property <nodename> --aux <key> <value> .
|
Let’s explore this behavior with examples assuming a LINSTOR cluster such
that node-a
is configured with the following auxiliary property zone=z1
and role=backups
, while node-b
is configured with only zone=z1
.
If we configure a StorageClass with autoPlace: "1" and replicasOnSame: "zone=z1 role=backups", then all volumes created from that
StorageClass
will be provisioned on node-a
, since that is the only node in the LINSTOR cluster that has all of the correct key=value pairs. This is the most flexible way to select a single node for provisioning.
This guide assumes LINSTOR CSI version 0.10.0 or newer. All properties
referenced in replicasOnSame and replicasOnDifferent are interpreted as
auxiliary properties. If you are using an older version of LINSTOR CSI, you
need to add the Aux/ prefix to all property names. So replicasOnSame:
"zone=z1" would be replicasOnSame: "Aux/zone=z1" Using Aux/ manually
will continue to work on newer LINSTOR CSI versions.
|
If we configure a StorageClass with autoPlace: "1" and replicasOnSame: "zone=z1", then volumes will be provisioned on either node-a
or node-b
, since both have the zone=z1
auxiliary property.
If we configure a StorageClass with autoPlace: "2"
and replicasOnSame:
"zone=z1 role=backups"
, then provisioning will fail, as there are not two
or more nodes that have the appropriate auxiliary properties.
If we configure a StorageClass with autoPlace: "2"
and replicasOnSame: "zone=z1"
, then volumes will be provisioned on both
node-a
and node-b
, since both have the zone=z1
auxiliary property.
You can also use a property key without providing a value to ensure all
replicas are placed on nodes with the same property value, without caring about
the particular value. Assuming there are 4 nodes, node-a1
and node-a2
are configured with zone=a
. node-b1
and node-b2
are configured with
zone=b
. Using autoPlace: "2"
and replicasOnSame: "zone"
will place on
either node-a1
and node-a2
OR on node-b1
and node-b2
.
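For example, the auxiliary properties used in the scenarios above could be set like this (a sketch reusing the hypothetical nodes node-a and node-b). A StorageClass with autoPlace: "2" and replicasOnSame: "zone=z1" would then place both replicas on these two nodes:
# linstor node set-property node-a --aux zone z1
# linstor node set-property node-a --aux role backups
# linstor node set-property node-b --aux zone z1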
3.5.15. replicasOnDifferent
replicasOnDifferent
takes a list of properties to consider, same as
replicasOnSame. There are two modes of
using replicasOnDifferent
:
-
Preventing volume placement on specific nodes:
If a value is given for the property, the nodes which have that property-value pair assigned will be considered last.
Example:
replicasOnDifferent: "no-csi-volumes=true"
will place no volume on any node with propertyno-csi-volumes=true
unless there are not enough other nodes to fulfill theautoPlace
setting. -
Distribute volumes across nodes with different values for the same key:
If no property value is given, LINSTOR will place the volumes across nodes with different values for that property if possible.
Example: Assuming there are 4 nodes,
node-a1
andnode-a2
are configured withzone=a
.node-b1
andnode-b2
are configured withzone=b
. Using a StorageClass withautoPlace: "2"
andreplicasOnDifferent: "zone"
, LINSTOR will create one replica on eithernode-a1
ornode-a2
and one replica on eithernode-b1
ornode-b2
.
3.5.16. disklessOnRemaining
Create a diskless resource on all nodes that were not assigned a diskful resource.
3.5.17. doNotPlaceWithRegex
Do not place the resource on a node which has a resource with a name matching the regex.
3.5.18. fsOpts
fsOpts
is an optional parameter that passes options to the volume's filesystem at creation time.
3.5.19. mountOpts
mountOpts
is an optional parameter that passes options to the volume's filesystem at mount time.
3.5.20. postMountXfsOpts
Extra arguments to pass to xfs_io
, which gets called right before the
first use of the volume.
3.5.21. DrbdOptions/*: <x>
Advanced DRBD options to pass to LINSTOR. For example, to change the
replication protocol, use DrbdOptions/Net/protocol: "A"
.
The full list of options is available here
3.6. 快照
Creating snapshots and creating new volumes from snapshots is done via the use of VolumeSnapshots, VolumeSnapshotClasses, and PVCs.
3.6.1. Adding snapshot support
LINSTOR supports the volume snapshot feature, which is currently in beta. To use it, you need to install a cluster wide snapshot controller. This is done either by the cluster provider, or you can use the LINSTOR chart.
By default, the LINSTOR chart will install its own snapshot controller. This can lead to conflict in some cases:
-
the cluster already has a snapshot controller
-
the cluster does not meet the minimal version requirements (>= 1.17)
In such a case, installation of the snapshot controller can be disabled:
--set csi-snapshotter.enabled=false
3.6.2. Using volume snapshots
Then we can create our VolumeSnapshotClass:
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
name: my-first-linstor-snapshot-class
driver: linstor.csi.linbit.com
deletionPolicy: Delete
使用 kubectl
创建 VolumeSnapshotClass :
kubectl create -f my-first-linstor-snapshot-class.yaml
现在,我们将为上面创建的卷创建卷快照。这是用 VolumeSnapshot 完成的:
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
name: my-first-linstor-snapshot
spec:
volumeSnapshotClassName: my-first-linstor-snapshot-class
source:
persistentVolumeClaimName: my-first-linstor-volume
使用 kubectl
创建 VolumeSnapshot :
kubectl create -f my-first-linstor-snapshot.yaml
You can check that the snapshot creation was successful
kubectl describe volumesnapshots.snapshot.storage.k8s.io my-first-linstor-snapshot ... Spec: Source: Persistent Volume Claim Name: my-first-linstor-snapshot Volume Snapshot Class Name: my-first-linstor-snapshot-class Status: Bound Volume Snapshot Content Name: snapcontent-b6072ab7-6ddf-482b-a4e3-693088136d2c Creation Time: 2020-06-04T13:02:28Z Ready To Use: true Restore Size: 500Mi
最后,我们将使用 PVC 从快照创建一个新卷。
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-first-linstor-volume-from-snapshot
spec:
storageClassName: linstor-basic-storage-class
dataSource:
name: my-first-linstor-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Mi
用 kubectl
创建 PVC :
kubectl create -f my-first-linstor-volume-from-snapshot.yaml
3.7. 卷可访问性
LINSTOR volumes can typically be accessed both locally and over the network.
By default, the CSI plugin will attach volumes directly if the Pod happens to be scheduled on a kubelet that also holds the underlying storage. However, Pod scheduling does not currently take volume locality into account. If locally attached volumes are required, the replicasOnSame parameter can be used to restrict where the underlying storage is provisioned.
See placementPolicy to see how this default behavior can be modified.
3.8. Volume Locality Optimization using Stork
Stork is a scheduler extender plugin for Kubernetes which allows a storage driver to give the Kubernetes scheduler hints about where to place a new pod so that it is optimally located for storage performance. You can learn more about the project on its GitHub page.
The next Stork release will include the LINSTOR driver by default. In the meantime, you can use a custom-built Stork container by LINBIT which includes a LINSTOR driver, available on Docker Hub
3.8.1. Using Stork
By default, the operator will install the components required for Stork, and
register a new scheduler called stork
with Kubernetes. This new scheduler
can be used to place pods near to their volumes.
apiVersion: v1
kind: Pod
metadata:
name: busybox
namespace: default
spec:
schedulerName: stork (1)
containers:
- name: busybox
image: busybox
command: ["tail", "-f", "/dev/null"]
volumeMounts:
- name: my-first-linstor-volume
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: my-first-linstor-volume
persistentVolumeClaim:
claimName: "test-volume"
1 | Add the name of the scheduler to your pod. |
Deployment of the scheduler can be disabled using
--set stork.enabled=false
3.9. Fast workload fail over using the High Availability Controller
The LINSTOR High Availability Controller (HA Controller) will speed up the fail over process for stateful workloads using LINSTOR for storage. It is deployed by default, and can be scaled to multiple replicas:
$ kubectl get pods -l app.kubernetes.io/name=linstor-ha-controller
NAME READY STATUS RESTARTS AGE
linstor-ha-controller-f496c5f77-fr76m 1/1 Running 0 89s
linstor-ha-controller-f496c5f77-jnqtc 1/1 Running 0 89s
linstor-ha-controller-f496c5f77-zcrqg 1/1 Running 0 89s
In the event of node failures, Kubernetes is very conservative in rescheduling stateful workloads. This means it can take more than 15 minutes for Pods to be moved from unreachable nodes. With the information available to DRBD and LINSTOR, this process can be sped up significantly.
The HA Controller enables fast fail over for Pods that meet the following requirements:
-
The Pod uses DRBD-backed PersistentVolumes. The DRBD resources must make use of the quorum functionality; LINSTOR will configure this automatically for volumes with 2 or more replicas in clusters with at least 3 nodes.
-
The workload does not use any external resources in a way that could lead to a conflicting state if two instances try to use the external resource at the same time. While DRBD can ensure that only one instance can have write access to the storage, it cannot provide the same guarantee for external resources.
-
The Pod is marked with the
linstor.csi.linbit.com/on-storage-lost: remove
label.
3.9.1. Example
The following StatefulSet uses the HA Controller to manage fail over of a pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: my-stateful-app
spec:
serviceName: my-stateful-app
selector:
matchLabels:
app.kubernetes.io/name: my-stateful-app
template:
metadata:
labels:
app.kubernetes.io/name: my-stateful-app
linstor.csi.linbit.com/on-storage-lost: remove
...
Deploy the set and wait for the pod to start
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-stateful-app-0 1/1 Running 0 5m 172.31.0.1 node01.ha.cluster <none> <none>
Then one of the nodes becomes unreachable. Shortly after, Kubernetes will
mark the node as NotReady
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01.ha.cluster Ready master 12d v1.19.4
master02.ha.cluster Ready master 12d v1.19.4
master03.ha.cluster Ready master 12d v1.19.4
node01.ha.cluster NotReady compute 12d v1.19.4
node02.ha.cluster Ready compute 12d v1.19.4
node03.ha.cluster Ready compute 12d v1.19.4
After about 45 seconds, the Pod will be removed by the HA Controller and re-created by the StatefulSet
$ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES my-stateful-app-0 0/1 ContainerCreating 0 3s 172.31.0.1 node02.ha.cluster <none> <none> $ kubectl get events --sort-by=.metadata.creationTimestamp -w ... 0s Warning ForceDeleted pod/my-stateful-app-0 pod deleted because a used volume is marked as failing 0s Warning ForceDetached volumeattachment/csi-d2b994ff19d526ace7059a2d8dea45146552ed078d00ed843ac8a8433c1b5f6f volume detached because it is marked as failing ...
3.10. Upgrading a LINSTOR Deployment on Kubernetes
A LINSTOR deployment on Kubernetes can be upgraded to a new release using Helm.
Before upgrading to a new release, you should ensure you have an up-to-date backup of the LINSTOR database. If you are using the Etcd database packaged in the LINSTOR Chart, see here
Upgrades using the LINSTOR Etcd deployment require etcd to use persistent
storage. Only follow these steps if Etcd was deployed using
etcd.persistentVolume.enabled=true
|
Upgrades will update to new versions of the following components:
-
LINSTOR operator deployment
-
LINSTOR Controller
-
LINSTOR Satellite
-
LINSTOR CSI Driver
-
Etcd
-
Stork
Some versions require special steps; please take a look here. The main command to upgrade to a new LINSTOR operator version is:
helm repo update
helm upgrade linstor-op linstor/linstor
If you used any customizations on the initial install, pass the same options
to helm upgrade
. For example:
helm install linstor-op linstor/linstor -f <file>
would become
helm upgrade linstor-op linstor/linstor -f <file>
This triggers the rollout of new pods. After a short wait, all pods should be running and ready. Check that no errors are listed in the status section of LinstorControllers, LinstorSatelliteSets and LinstorCSIDrivers.
During the upgrade process, provisioning of volumes and attach/detach operations might not work. Existing volumes and volumes already in use by a pod will continue to work without interruption. |
3.10.1. Upgrade instructions for specific versions
Some versions require special steps, see below.
Upgrade to v1.3
No additional steps necessary.
Upgrade to v1.2
LINSTOR operator v1.2 is supported on Kubernetes 1.17+. If you are using an older Kubernetes distribution, you may need to change the default settings, for example [the CSI provisioner](https://kubernetes-csi.github.io/docs/external-provisioner.html).
There is a known issue when updating the CSI components: the pods will not
be updated to the newest image and the errors
section of the
LinstorCSIDrivers resource shows an error updating the DaemonSet. In this
case, manually delete deployment/linstor-op-csi-controller
and
daemonset/linstor-op-csi-node
. They will be re-created by the operator.
3.10.2. Creating Etcd Backups
To create a backup of the Etcd database and store it on your control host, run:
kubectl exec linstor-op-etcd-0 -- etcdctl snapshot save /tmp/save.db
kubectl cp linstor-op-etcd-0:/tmp/save.db save.db
These commands will create a file save.db
on the machine you are running
kubectl
from.
4. LINSTOR Volumes in Openshift
This chapter describes the usage of LINSTOR in Openshift as managed by the operator and with volumes provisioned using the LINSTOR CSI plugin.
4.1. Openshift Overview
OpenShift is the official Red Hat developed and supported distribution of Kubernetes. As such, you can easily deploy Piraeus or the LINSTOR operator using Helm or via example yamls as mentioned in the previous chapter, Kubernetes的LINSTOR卷.
Some of the value of Red Hat’s Openshift is that it includes its own registry of supported and certified images and operators, in addition to a default and standard web console. This chapter describes how to install the Certified LINSTOR operator via these tools.
4.2. Deploying LINSTOR on Openshift
4.2.1. Before you Begin
LINBIT provides a certified LINSTOR operator via the RedHat marketplace. The operator eases deployment of LINSTOR on Kubernetes by installing DRBD, managing Satellite and Controller pods, and other related functions.
The operator itself is available from the Red Hat Marketplace.
Unlike deployment via the helm chart, the certified Openshift operator does not deploy the needed etcd cluster. You must deploy this yourself ahead of time. We do this via the etcd operator available on operatorhub.io.
It is advised that the etcd deployment uses persistent storage of some
type. Either use an existing storage provisioner with a default
StorageClass or simply use hostPath volumes.
|
Read the storage guide and configure a basic storage setup for LINSTOR.
Read the section on securing the deployment and configure as needed.
4.2.2. Deploying the operator pod
Once etcd and storage has been configured, we are now ready to install the LINSTOR operator. You can find the LINSTOR operator via the left-hand control pane of Openshift Web Console. Expand the “Operators” section and select “OperatorHub”. From here you need to find the LINSTOR operator. Either search for the term “LINSTOR” or filter only by “Marketplace” operators.
The LINSTOR operator can only watch for events and manage custom resources that are within the same namespace it is deployed within (OwnNamespace). This means the LINSTOR Controller, LINSTOR Satellites, and LINSTOR CSI Driver pods all need to be deployed in the same namespace as the LINSTOR Operator pod. |
Once you have located the LINSTOR operator in the Marketplace, click the “Install” button and install it as you would any other operator.
At this point you should have just one pod, the operator pod, running.
Next we need to configure the remaining provided APIs.
4.2.3. Deploying the LINSTOR Controller
Again, navigate to the left-hand control pane of the Openshift Web Console. Expand the “Operators” section, but this time select “Installed Operators”. Find the entry for the “Linstor Operator”, then select the “LinstorController” from the “Provided APIs” column on the right.
From here you should see a page that says “No Operands Found” and will feature a large button on the right which says “Create LinstorController”. Click the “Create LinstorController” button.
Here you will be presented with options to configure the LINSTOR
Controller. Either via the web-form view or the YAML View. Regardless of
which view you select, make sure that the dbConnectionURL
matches the
endpoint provided from your etcd deployment. Otherwise, the defaults are
usually fine for most purposes.
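If you prefer the YAML view, a minimal manifest might look like the following sketch. The field names mirror the LinstorSatelliteSet example later in this chapter, and the dbConnectionURL value is only a placeholder for the client endpoint of your own etcd deployment:
apiVersion: linstor.linbit.com/v1
kind: LinstorController
metadata:
  name: linstor
  namespace: default
spec:
  dbConnectionURL: 'etcd://example-etcd-cluster-client.default.svc:2379'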
Lastly hit “Create”, you should now see a linstor-controller pod running.
4.2.4. Deploying the LINSTOR Satellites
Next we need to deploy the Satellites Set. Just as before navigate to the left-hand control pane of the Openshift Web Console. Expand the “Operators” section, but this time select “Installed Operators”. Find the entry for the “Linstor Operator”, then select the “LinstorSatelliteSet” from the “Provided APIs” column on the right.
From here you should see a page that says “No Operands Found” and will feature a large button on the right which says “Create LinstorSatelliteSet”. Click the “Create LinstorSatelliteSet” button.
Here you will be presented with the options to configure the LINSTOR
Satellites. Either via the web-form view or the YAML View. One of the first
options you’ll notice is the automaticStorageType
. If set to “NONE” then
you’ll need to remember to configure the storage pools yourself at a later
step.
Another option you’ll notice is kernelModuleInjectionMode
. I usually
select “Compile” for portability’s sake, but selecting “ShippedModules” will
be faster as it will install pre-compiled kernel modules on all the worker
nodes.
Make sure the controllerEndpoint
matches what is available in the
kubernetes endpoints. The default is usually correct here.
Below is an example manifest:
apiVersion: linstor.linbit.com/v1
kind: LinstorSatelliteSet
metadata:
  name: linstor
  namespace: default
spec:
  satelliteImage: ''
  automaticStorageType: LVMTHIN
  drbdRepoCred: ''
  kernelModuleInjectionMode: Compile
  controllerEndpoint: 'http://linstor:3370'
  priorityClassName: ''
status:
  errors: []
Lastly hit “Create”, you should now see a linstor-node pod running on every worker node.
4.2.5. Deploying the LINSTOR CSI driver
Last bit left is the CSI pods to bridge the layer between the CSI and LINSTOR. Just as before navigate to the left-hand control pane of the Openshift Web Console. Expand the “Operators” section, but this time select “Installed Operators”. Find the entry for the “Linstor Operator”, then select the “LinstorCSIDriver” from the “Provided APIs” column on the right.
From here you should see a page that says “No Operands Found” and will feature a large button on the right which says “Create LinstorCSIDriver”. Click the “Create LinstorCSIDriver” button.
Again, you will be presented with the options. Make sure that the
controllerEndpoint
is correct. Otherwise the defaults are fine for most use
cases.
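As with the controller, a minimal YAML-view manifest could be sketched like this; the controllerEndpoint value follows the satellite example above and must be adjusted to match your actual controller endpoint:
apiVersion: linstor.linbit.com/v1
kind: LinstorCSIDriver
metadata:
  name: linstor
  namespace: default
spec:
  controllerEndpoint: 'http://linstor:3370'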
Lastly hit “Create”. You will now see a single “linstor-csi-controller” pod, as well as a “linstor-csi-node” pod on all worker nodes.
4.3. Interacting with LINSTOR in Openshift
The Controller pod includes a LINSTOR Client, making it easy to interact directly with LINSTOR. For instance:
oc exec deployment/linstor-cs-controller -- linstor storage-pool list
This should only be necessary for investigating problems and accessing advanced functionality. Regular operation such as creating volumes should be achieved via the Kubernetes integration.
4.4. Configuration and deployment
Once the operator and all the needed pods are deployed, provisioning volumes simply follows the usual Kubernetes workflows.
As such, please see the previous chapter’s section on Basic Configuration and Deployment.
4.5. Deploying additional components
Some additional components are not included in the OperatorHub version of the LINSTOR Operator when compared to the Helm deployment. Most notably, this includes setting up Etcd and deploying the STORK integration.
Etcd can be deployed by using the Etcd Operator available in the OperatorHub.
4.5.1. Stork
To deploy STORK, you can use the single YAML deployment available at:
https://charts.linstor.io/deploy/stork.yaml Download the YAML and replace
every instance of MY-STORK-NAMESPACE
with your desired namespace for
STORK. You also need to replace MY-LINSTOR-URL
with the URL of your
controller. This value depends on the name
you chose when
creating the LinstorController
resource. By default this would be
http://linstor.<operator-namespace>.svc:3370
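For example, the substitution could be scripted like this (a sketch; the namespace stork and the controller URL with the default operator namespace are example values and should be replaced with your own):
curl -fsSL https://charts.linstor.io/deploy/stork.yaml -o stork.yaml
sed -i -e 's/MY-STORK-NAMESPACE/stork/g' \
       -e 's|MY-LINSTOR-URL|http://linstor.default.svc:3370|g' stork.yaml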
To apply the YAML to Openshift, either use oc apply -f <filename>
from the
command line or find the “Import YAML” option in the top right of the
Openshift Web Console.
4.5.2. High Availability Controller
To deploy our High Availability Controller, you can use the single YAML deployment available at: https://charts.linstor.io/deploy/ha-controller.yaml
Download the YAML and replace:
-
MY-HA-CTRL-NAMESPACE
: with your preferred project name / namespace. -
MY-LINSTOR-URL
: url of the LINSTOR controller, for example:http://linstor.linstor-namespace.svc:3370/
To apply the YAML to Openshift, either use oc apply -f <filename>
from the
command line or find the “Import YAML” option in the top right of the
Openshift Web Console.
4.5.3. Deploying via Helm on openshift
Alternatively, you can deploy the LINSTOR Operator using Helm instead. Take a look at the Kubernetes guide. Openshift requires changing some of the default values in our Helm chart.
If you chose to use Etcd with hostpath volumes for persistence (see
here), you need to enable selinux
relabelling. To do this pass --set selinux=true
to the pv-hostpath
install command.
For the LINSTOR Operator chart itself, you should change the following values:
global:
setSecurityContext: false (1)
csi-snapshotter:
enabled: false (2)
stork:
schedulerTag: v1.18.6 (3)
etcd:
podsecuritycontext:
supplementalGroups: [1000] (4)
operator:
satelliteSet:
kernelModuleInjectionImage: drbd.io/drbd9-rhel8:v9.0.25 (5)
1 | Openshift uses SCCs to manage security contexts. |
2 | The cluster wide CSI Snapshot Controller is already installed by Openshift. |
3 | Automatic detection of the Kubernetes Scheduler version fails in Openshift, you need to set it manually. Note: the tag does not have to match Openshift’s Kubernetes release. |
4 | If you choose to use Etcd deployed via Helm and use the pv-hostpath chart,
Etcd needs to run as member of group 1000 to access the persistent volume. |
5 | The RHEL8 kernel injector also supports RHCOS. |
Other overrides, such as storage pool configuration, HA deployments and more, are available and documented in the Kubernetes guide.
5. Proxmox VE中的LINSTOR卷
本章描述了通过 LINSTOR Proxmox Plugin 实现的Proxmox VE中的DRBD。
5.1. Proxmox VE概述
proxmox VE是一个易于使用的、完整的服务器虚拟化环境,具有KVM、Linux容器和HA。
linstor-proxmox
是proxmox的一个Perl插件,它与LINSTOR结合,允许在多个proxmox
VE节点上复制VM磁盘。这允许在几秒钟内实时迁移活动vm,而且不需要中央SAN,因为数据已经复制到多个节点。
5.2. Upgrades
If this is a fresh installation, skip this section and continue with Proxmox插件安装.
5.2.1. From 4.x to 5.x
Version 5 of the plugin drops compatibility with the legacy configuration options “storagepool” and “redundancy”. Version 5 requires a “resourcegroup” option, and obviously a LINSTOR resource group. The old options should be removed from the config.
Configuring LINSTOR is described in Section LINSTOR配置, a typical example follows: Let’s assume the pool was set to “mypool”, and redundancy to 3.
# linstor resource-group create --storage-pool=mypool --place-count=3 drbdMypoolThree
# linstor volume-group create drbdMypoolThree
# vi /etc/pve/storage.cfg
drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup drbdMypoolThree
5.3. Proxmox插件安装
LINBIT为Proxmox VE用户提供了一个专用的公共仓库。这个存储库不仅包含Proxmox插件,还包含整个DRBD-SDS堆栈,包括DRBD SDS内核模块和用户空间实用程序。
DRBD9内核模块是作为dkms软件包(即 drbd-dkms
)安装的,因此,您必须先安装 pve-headers
软件包,然后才能从LINBIT的存储库中设置/安装软件包。 按照该顺序,确保内核模块将为您的内核正确构建。如果您不打算安装最新的Proxmox内核,则必须安装与当前运行的内核匹配的内核头文件(例如 pve-headers-$(uname -r)
)。如果您错过了这一步,那么仍然可以通过输入 apt install --reinstall drbd-dkms
命令, 针对当前内核重建dkms软件包(必须预先安装内核头文件)。
LINBIT’s repository can be enabled as follows, where “$PVERS” should be set to your Proxmox VE major version (e.g., “6”, not “6.1”):
# wget -O- https://packages.linbit.com/package-signing-pubkey.asc | apt-key add -
# PVERS=6 && echo "deb http://packages.linbit.com/proxmox/ proxmox-$PVERS drbd-9.0" > \
   /etc/apt/sources.list.d/linbit.list
# apt update && apt install linstor-proxmox
5.4. LINSTOR配置
5.5. Proxmox插件配置
最后一步是为Proxmox本身提供配置。这可以通过在 /etc/pve/storage.cfg
文件中添加一个条目来完成,其内容类似于以下内容。
drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup defaultpool
The “drbd” entry is fixed and you are not allowed to modify it, as it tells Proxmox to use DRBD as the storage backend. The “drbdstorage” entry can be modified and is used as a friendly name that will be shown in the PVE web GUI to locate the DRBD storage. The “content” entry is also fixed, so do not change it. The redundancy (specified in the resource group) specifies how many replicas of the data will be stored in the cluster. The recommendation is to set it to 2 or 3 depending on your setup. The data is accessible from all nodes, even if some of them do not have local copies of the data. For example, in a 5 node cluster, all nodes will be able to access 3 copies of the data, no matter where they are stored. The “controller” parameter must be set to the IP of the node that runs the LINSTOR controller service. Only one node can be set to run as the LINSTOR controller at any given time. If that node fails, start the LINSTOR controller on another node and change that value to its IP address.
插件的最新版本允许定义多个不同的存储池。这样的配置如下:
drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup defaultpool

drbd: fastdrbd
   content images,rootdir
   controller 10.11.12.13
   resourcegroup ssd

drbd: slowdrbd
   content images,rootdir
   controller 10.11.12.13
   resourcegroup backup
现在,您应该可以通过Proxmox的web GUI创建vm,方法是选择 “drbdstorage” , 或者选择任何其他已定义的池作为存储位置。
Starting from version 5 of the plugin one can set the option “preferlocal yes”. If it is set, the plugin tries to create a diskful assignment on the node that issued the storage create command. With this option one can make sure the VM gets local storage if possible. Without that option LINSTOR might place the storage on nodes ‘B’ and ‘C’, while the VM is initially started on node ‘A’. This would still work as node ‘A’ then would get a diskless assignment, but having local storage might be preferred.
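A sketch of such an entry, extending the example configuration above with the option:
drbd: drbdstorage
   content images,rootdir
   controller 10.11.12.13
   resourcegroup defaultpool
   preferlocal yes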
此时,您可以尝试实时迁移虚拟机 – 因为所有节点(甚至在无盘节点上)都可以访问所有数据 – 只需几秒钟。如果VM负载不足,并且有很多RAM一直被污染,那么整个过程可能需要更长的时间。但是在任何情况下,停机时间都应该是最少的,而且您不会看到任何中断。
5.6. Making the Controller Highly-Available (optional)
This section describes how the controller can be made highly available, but this is not a must. Please read the entire section before you start, and then decide if the increased complexity, and the limitations are worth it. Or if you are better off by taking regular backups of the LINSTOR controller database and starting a (temporary) controller with the database backup on one of the remaining satellites if the current controller is beyond repair.
对于本指南的其余部分,我们假设您安装了LINSTOR和Proxmox插件,如LINSTOR配置中所述。
基本思想是在一个由Proxmox及其HA特性控制的VM中执行LINSTOR控制器,其中存储驻留在由LINSTOR本身管理的DRBD上。
第一步是为VM分配存储空间:像往常一样创建一个VM,并在 “OS” 部分选择 “Do not use any media”。硬盘当然应该位于DRBD上(例如 “drbdstorage” )。2GB的磁盘空间应该足够了,对于RAM,我们选择了1GB。这些是LINBIT为其客户提供的设备的最低要求(见下文)。如果您希望设置自己的控制器虚拟机,并且您有足够的可用硬件资源,则可以增加这些最小值。在下面的用例中,我们假设控制器VM是用ID 100创建的,但是如果这个VM是在以后创建的,换成一个不同的ID,就可以了。
LINBIT为其客户提供了一个设备,可用于填充创建的存储。为了让设备工作,我们首先创建一个 “Serial Port”。首先点击 \:Hardware” ,然后点击 “Add” ,最后点击 “Serial Port” :

如果一切正常,VM定义应该如下所示:

The next step is to copy the VM appliance to the VM disk storage. This can be done with qemu-img.
确保用正确的VM ID替换VM ID。 |
# qemu-img dd -O raw if=/tmp/linbit-linstor-controller-amd64.img \
   of=/dev/drbd/by-res/vm-100-disk-1/0
Once completed, you can start the VM and connect to it via the Proxmox VNC viewer. The default user name and password are both “linbit”. Note that we kept the default configuration for the ssh server, so you will not be able to log in to the VM via ssh with user name/password. If you want to enable that (and/or “root” login), enable these settings in /etc/ssh/sshd_config and restart the ssh service. As this VM is based on “Ubuntu Bionic”, you should change the network settings (e.g., a static IP) in /etc/netplan/config.yaml. After that you should be able to ssh to the VM:

In the next step the controller VM is added to the existing cluster:
# linstor node create --node-type Controller \
  linstor-controller 10.43.7.254
As the controller VM will be handled in a special way by the Proxmox storage plugin (compared to the rest of the VMs), we must make sure all hosts have access to its backing storage before PVE HA starts the VM, otherwise the VM will fail to start. See below for the details on how to achieve this. |
In our test cluster the controller VM disk was created on DRBD storage and initially assigned to one host (use linstor resource list to check the assignments). Then we used the linstor resource create command to create additional resource assignments for this VM on the other nodes of the cluster. In our lab, consisting of four nodes, we created all resource assignments as diskful, but diskless assignments are fine as well. As a rule of thumb, keep the redundancy count at “3” (more usually does not make sense) and assign the rest as diskless.
As the storage for this controller VM has to be made available on all PVE hosts in some way, we have to make sure the drbd.service is enabled on all hosts (given that it is not under LINSTOR control at this stage):
# systemctl enable drbd
# systemctl start drbd
By default, the linstor-satellite service deletes all of its resource files (*.res) on startup and regenerates them. This conflicts with the drbd service, which needs these resource files to start the controller VM. It is good enough to first bring up the resources via drbd.service and to make sure that the linstor-satellite.service, which brings up the controller resource, never deletes the corresponding res file. To make the necessary changes, create a drop-in for the linstor-satellite.service via systemctl (do not edit the file directly):
systemctl edit linstor-satellite

[Service]
Environment=LS_KEEP_RES=vm-100-disk

[Unit]
After=drbd.service
Of course, you have to adjust the name of the controller VM in the LS_KEEP_RES variable. Note that the value given is interpreted as a regex, so you do not have to specify the exact name.
Do not forget to restart the linstor-satellite.service.
After that, it is time for the final steps: switching from the existing controller (residing on the physical host) to the new one in the VM. So stop the old controller service on the physical host and copy the LINSTOR controller database to the VM host:
# systemctl stop linstor-controller
# systemctl disable linstor-controller
# scp /var/lib/linstor/* root@10.43.7.254:/var/lib/linstor/
Finally, we can enable the controller in the VM:
# systemctl start linstor-controller   # in the VM
# systemctl enable linstor-controller  # in the VM
To check that everything works as expected, you can query the cluster nodes on a physical PVE host by asking the controller in the VM:
linstor --controllers=10.43.7.254 node list
The controller (which is just a Controller and not a “Combined” host) is shown as “OFFLINE”; this is fine and everything still works. This might change in the future to something more reasonable.
As the last — but crucial — step, you need to add the “controllervm”
option to /etc/pve/storage.cfg
, and change the controller IP address to
the IP address of the Controller VM:
drbd: drbdstorage
   content images,rootdir
   resourcegroup defaultpool
   controller 10.43.7.254
   controllervm 100
Note the additional setting “controllervm”. This setting is very important, as it tells PVE to handle the controller VM differently from the other VMs stored on DRBD storage. Specifically, it instructs PVE NOT to use the LINSTOR storage plugin when handling the controller VM, but other methods instead. The reason is that at this stage the LINSTOR backend is simply not available. Once the controller VM is up and running (and the associated LINSTOR controller service in the VM), the PVE hosts will be able to start the rest of the VMs stored on DRBD storage by using the LINSTOR storage plugin. Please make sure to set the correct VM ID in the controllervm setting. In this case it is set to 100, which is the ID assigned to our controller VM.
It is very important to make sure that the controller VM is always up and running and that you back it up at regular intervals (mostly when you make modifications to the LINSTOR cluster). Once the VM is gone and there are no backups, the LINSTOR cluster has to be recreated from scratch.
To prevent accidental deletion of the VM, you can go to the “Options” tab of the VM in the PVE GUI and enable the “Protection” option. If you nevertheless accidentally delete the VM, such requests are ignored by the storage plugin, so the VM disk will NOT be deleted from the LINSTOR cluster. It is therefore possible to recreate the VM with the same ID as before (simply recreate the VM configuration file in PVE and assign the same DRBD storage device used by the old VM). The plugin will just return “OK”, and the old VM with the old data can be used again. In general, be careful not to delete the controller VM and “protect” it accordingly.
Currently we have the controller executing as a VM, but we should make sure that one instance of the VM is always started. For that we use Proxmox's HA feature. Click on the VM, then on “More”, and then on “Manage HA”. We set the following parameters for our controller VM:

As long as there are surviving nodes in your Proxmox cluster, everything should be fine: in case the node hosting the controller VM is shut down or lost, Proxmox HA makes sure the controller is started on another host. Obviously the IP of the controller VM should not change. It is up to you as the administrator to make sure this is the case (e.g., setting a static IP, or always providing the same IP via DHCP on the bridged interface).
It is important to mention at this point that if you want to use a dedicated network for the LINSTOR cluster, you must make sure that the network interfaces configured for cluster communication are configured as bridges (i.e. vmbr1, vmbr2, etc.) on the PVE hosts. If they are configured as direct interfaces (i.e. eth0, eth1, etc.), you will not be able to set up the controller VM vNIC to communicate with the rest of the LINSTOR nodes in the cluster, because you cannot assign direct network interfaces to a VM, only bridged interfaces.
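As an illustration only (the interface name, address, and bridge port below are assumptions about your environment, not part of this guide), a dedicated bridge on a PVE host is typically declared in /etc/network/interfaces along these lines:

# hypothetical dedicated LINSTOR bridge on a PVE host
auto vmbr1
iface vmbr1 inet static
    address 10.43.7.2/24
    bridge-ports eth1
    bridge-stp off
    bridge-fd 0

The controller VM's vNIC can then be attached to vmbr1 so that it can reach the other LINSTOR nodes on the dedicated network.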
One limitation that is not fully handled by this setup is a complete cluster outage (for example a common power failure) with a restart of all cluster nodes. Proxmox is unfortunately quite limited in this regard. You can enable the “HA Feature” for a VM, and you can define “Start and Shutdown Order” constraints, but both are completely separated from each other. Therefore it is hard/impossible to guarantee that the controller VM will be up and running before all other VMs are started.
It might be possible to work around that by delaying VM startup in the Proxmox plugin itself until the controller VM is up (i.e., when asked to start the controller VM the plugin does so, otherwise it waits and pings the controller). While a nice idea, this could fail badly in a serialized, non-concurrent VM start/plugin call event stream where some VM should be started (and then blocks) before the controller VM is started. That would obviously result in a deadlock.
We will discuss these options with Proxmox, but we think the current solution is valuable in most typical use cases, especially compared to the complexity of a pacemaker setup. Use cases where one can expect that not the whole cluster goes down at the same time are covered. And even if that happens, only the automatic start of VMs when the whole cluster is booted does not work. In such a scenario the administrator just has to wait until the Proxmox HA service starts the controller VM. After that, all VMs can be started manually on the command line, or in bulk via a script.
6. LINSTOR Volumes in OpenNebula
This chapter describes DRBD in OpenNebula via the usage of the LINSTOR storage driver addon.
Detailed installation and configuration instructions can be found in the README.md file of the driver's source.
6.1. OpenNebula Overview
OpenNebula is a flexible and open source cloud management platform which allows its functionality to be extended via the use of addons.
The LINSTOR addon allows the deployment of virtual machines with highly available images backed by DRBD and attached across the network via DRBD's own transport protocol.
6.2. OpenNebula Addon Installation
Installation of the LINSTOR storage addon for OpenNebula requires a working OpenNebula cluster as well as a working LINSTOR cluster.
With access to LINBIT's customer repositories you can install the addon with
# apt install linstor-opennebula
or
# yum install linstor-opennebula
Without access to LINBIT's prepared packages you will need to fall back to the instructions on its GitHub page.
A DRBD cluster with LINSTOR can be installed and configured by following the instructions in this guide, see Initializing your cluster.
The OpenNebula and DRBD clusters can be somewhat independent of one another, with the following exception: OpenNebula's Front-End and Host nodes must be included in both clusters.
Host nodes do not need a local LINSTOR storage pool, as virtual machine images are attached to them across the network. [1]
6.3. Deployment Options
It is recommended to use LINSTOR resource groups to configure the deployment how you like it, see OpenNebula Resource Group below. The previous auto-place and deployment-nodes modes are deprecated and should no longer be used.
6.4. Configuration
6.4.1. Adding the Driver to OpenNebula
Modify the following sections of /etc/one/oned.conf:
Add linstor to the list of drivers in the TM_MAD and DATASTORE_MAD sections:
TM_MAD = [
  executable = "one_tm",
  arguments = "-t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,linstor"
]
DATASTORE_MAD = [
    EXECUTABLE = "one_datastore",
    ARGUMENTS = "-t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter,linstor -s shared,ssh,ceph,fs_lvm,qcow2,linstor"
]
Add new TM_MAD_CONF and DS_MAD_CONF sections:
TM_MAD_CONF = [
    NAME = "linstor", LN_TARGET = "NONE", CLONE_TARGET = "SELF", SHARED = "yes",
    ALLOW_ORPHANS = "yes",
    TM_MAD_SYSTEM = "ssh,shared", LN_TARGET_SSH = "NONE", CLONE_TARGET_SSH = "SELF",
    DISK_TYPE_SSH = "BLOCK",
    LN_TARGET_SHARED = "NONE", CLONE_TARGET_SHARED = "SELF", DISK_TYPE_SHARED = "BLOCK"
]
DS_MAD_CONF = [
    NAME = "linstor", REQUIRED_ATTRS = "BRIDGE_LIST", PERSISTENT_ONLY = "NO",
    MARKETPLACE_ACTIONS = "export"
]
After making those changes, restart the opennebula service.
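A minimal sketch, assuming a systemd-based OpenNebula installation (the unit name may differ on your distribution):

# reload the modified driver lists
sudo systemctl restart opennebula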
6.4.2. Configuring the Nodes
The Front-End node issues commands to the Storage and Host nodes via LINSTOR.
Storage nodes hold the disk images of VMs locally.
Host nodes are responsible for running instantiated VMs and typically have the storage for the images they need attached across the network via LINSTOR diskless mode.
All nodes must have DRBD9 and LINSTOR installed. This process is detailed in the DRBD9 User's Guide.
It is possible to have Front-End and Host nodes act as Storage nodes in addition to their primary role, as long as they meet all the requirements for both roles.
Front-End Configuration
Please verify that the control node(s) that you hope to communicate with are reachable from the Front-End node. linstor node list for locally running LINSTOR controllers and linstor --controllers "<IP:PORT>" node list for remotely running LINSTOR controllers are handy ways to test this.
Host Configuration
Host nodes must have the LINSTOR satellite process running on them and be members of the same LINSTOR cluster as the Front-End and Storage nodes; they can optionally have storage locally. If the oneadmin user is able to ssh passwordlessly between hosts, live migration is possible even with the ssh system datastore.
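As a hedged sketch of what passwordless ssh means here (the host names are placeholders, not part of this guide), the oneadmin user's public key is distributed to the other nodes:

# run as the oneadmin user on the Front-End
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519   # skip if a key already exists
ssh-copy-id oneadmin@host-node-1
ssh-copy-id oneadmin@storage-node-1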
Storage Node Configuration
Only the Front-End and Host nodes require OpenNebula to be installed, but the oneadmin user must be able to passwordlessly access Storage nodes. Refer to the OpenNebula install guide for your distribution on how to manually configure the oneadmin user account.
The Storage nodes must use storage pools created with a driver that is capable of making snapshots, such as the thin LVM plugin.
In this example preparation of thin-provisioned storage using LVM for LINSTOR, you must create a volume group and a thinLV using LVM on each storage node.
This example uses two physical volumes (/dev/sdX and /dev/sdY) and generic names for the volume group and thinpool. Make sure to set the thinLV's metadata volume to a reasonable size; once it becomes full it can be difficult to resize:
pvcreate /dev/sdX /dev/sdY
vgcreate drbdpool /dev/sdX /dev/sdY
lvcreate -l 95%VG --poolmetadatasize 8g -T /dev/drbdpool/drbdthinpool
Then create the storage pool(s) on LINSTOR using this as the backing storage.
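A hedged sketch based on the volume group and thinpool created above (the node names alice, bob, and charlie as well as the storage pool name are assumptions; adapt them to your cluster):

# register the thin pool as a LINSTOR storage pool on each storage node
linstor storage-pool create lvmthin alice   opennebula-storagepool drbdpool/drbdthinpool
linstor storage-pool create lvmthin bob     opennebula-storagepool drbdpool/drbdthinpool
linstor storage-pool create lvmthin charlie opennebula-storagepool drbdpool/drbdthinpool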
If you are using ZFS storage pools or thick LVM, please use LINSTOR_CLONE_MODE copy, otherwise you will have problems deleting LINSTOR resources because of ZFS parent-child snapshot relationships.
|
6.4.3. Permissions for Oneadmin
The oneadmin user must have passwordless sudo access to the mkfs command on the Storage nodes:
oneadmin ALL=(root) NOPASSWD: /sbin/mkfs
Groups
Be sure to consider the groups that oneadmin should be added to in order to gain access to the devices and programs needed to access storage and instantiate VMs. For this addon, the oneadmin user must belong to the disk group on all nodes in order to access the DRBD devices where images are held:
usermod -a -G disk oneadmin
6.4.4. Creating a New LINSTOR Datastore
Create a datastore configuration file named ds.conf and use the onedatastore tool to create a new datastore based on that configuration. There are two mutually exclusive deployment options: LINSTOR_AUTO_PLACE and LINSTOR_DEPLOYMENT_NODES. If both are configured, LINSTOR_AUTO_PLACE is ignored. For both options, BRIDGE_LIST must be a space-separated list of all storage nodes in the LINSTOR cluster.
6.4.5. OpenNebula Resource Group
Since version 1.0.0, LINSTOR supports resource groups. A resource group is a centralized point for settings that all resources linked to that resource group share.
Create a resource group and volume group for your datastore. It is mandatory to specify a storage pool within the resource group, otherwise space monitoring for OpenNebula will not work. Here we create one with 2-node redundancy and use the opennebula-storagepool created earlier:
linstor resource-group create OneRscGrp --place-count 2 --storage-pool opennebula-storagepool
linstor volume-group create OneRscGrp
Now add the OpenNebula datastore using the LINSTOR plugin:
cat >ds.conf <<EOI
NAME = linstor_datastore
DS_MAD = linstor
TM_MAD = linstor
TYPE = IMAGE_DS
DISK_TYPE = BLOCK
LINSTOR_RESOURCE_GROUP = "OneRscGrp"
COMPATIBLE_SYS_DS = 0
BRIDGE_LIST = "alice bob charlie"  #node names
EOI

onedatastore create ds.conf
6.4.6. Plugin Attributes
LINSTOR_CONTROLLERS
LINSTOR_CONTROLLERS can be used to pass a comma-separated list of controller IPs and ports to the LINSTOR client in the case where a LINSTOR controller process is not running locally on the Front-End, e.g.:
LINSTOR_CONTROLLERS = "192.168.1.10:8080,192.168.1.11:6000"
LINSTOR_CLONE_MODE
LINSTOR supports two different clone modes, which are set via the LINSTOR_CLONE_MODE attribute (a configuration sketch follows the list below):
-
snapshot
The default mode is snapshot. It uses a LINSTOR snapshot and restores a new resource from that snapshot, which is then a clone of the image. This mode is usually faster than using the copy mode, as snapshots are cheap copies.
-
copy
The second mode is copy. It creates a new resource with the same size as the original and copies the data with dd to the new resource. This mode will be slower than snapshot, but is more robust as it doesn't rely on any snapshot mechanism. It is also used if you clone an image into a different LINSTOR datastore.
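A minimal sketch of selecting the non-default mode (choosing copy here is only an assumption for illustration); the attribute is set in the datastore template, for example in ds.conf:

# in ds.conf, next to LINSTOR_RESOURCE_GROUP and the other attributes
LINSTOR_CLONE_MODE = "copy"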
6.4.7. Deprecated Attributes
The following attributes are deprecated and will be removed in versions after 1.0.0.
LINSTOR_STORAGE_POOL
The LINSTOR_STORAGE_POOL attribute is used to select the LINSTOR storage pool your datastore should use. If resource groups are used, this attribute isn't needed, as the storage pool can be selected by the auto-select filter options of the resource group. If LINSTOR_AUTO_PLACE or LINSTOR_DEPLOYMENT_NODES is used and LINSTOR_STORAGE_POOL is not set, it will fall back to the DfltStorPool in LINSTOR.
LINSTOR_AUTO_PLACE
The LINSTOR_AUTO_PLACE option takes a level of redundancy, which is a number between one and the total number of storage nodes. Resources are assigned to the storage nodes automatically based on the level of redundancy.
LINSTOR_DEPLOYMENT_NODES
Using LINSTOR_DEPLOYMENT_NODES allows you to select a group of nodes that resources will always be assigned to. Please note that the bridge list still contains all of the storage nodes in the LINSTOR cluster.
6.4.8. LINSTOR as a System Datastore
The LINSTOR driver can also be used as a system datastore. The configuration is pretty similar to a normal datastore, but there are a few changes:
cat >system_ds.conf <<EOI
NAME = linstor_system_datastore
TM_MAD = linstor
TYPE = SYSTEM_DS
LINSTOR_RESOURCE_GROUP = "OneSysRscGrp"
BRIDGE_LIST = "alice bob charlie"  # node names
EOI

onedatastore create system_ds.conf
Also add the new system datastore ID to the COMPATIBLE_SYS_DS attribute of your image datastores (comma-separated), otherwise the scheduler will ignore them.
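A hedged sketch of updating the image datastore (the datastore name linstor_datastore, the IDs 0 and 101, and the availability of the --append flag are assumptions about your setup):

# append the new system datastore ID to the image datastore template
cat >update_image_ds.txt <<EOI
COMPATIBLE_SYS_DS = "0,101"
EOI
onedatastore update --append linstor_datastore update_image_ds.txt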
If you want to use live migration with volatile disks, you need to enable the --unsafe option for KVM, see: opennebula doc
6.5. Live Migration
Live migration is supported even with the use of the ssh system datastore, as well as the nfs shared system datastore.
6.6. Free Space Reporting
Free space is calculated differently depending on whether resources are deployed automatically or per node.
For datastores which place per node, free space is reported based on the most restrictive storage pools from all nodes where resources are deployed. For example, the capacity of the node with the smallest amount of total storage space is used to determine the total size of the datastore, and the node with the least free space is used to determine the remaining space in the datastore.
For a datastore which uses automatic placement, size and remaining space are determined based on the aggregate storage pool used by the datastore as reported by LINSTOR.
7. LINSTOR Volumes in OpenStack
This chapter describes DRBD in OpenStack for persistent, replicated, and high-performance block storage with the https://github.com/LINBIT/openstack-cinder/tree/stein-linstor[LINSTOR Driver].
7.1. OpenStack Overview
OpenStack consists of a wide range of individual services; the two most relevant to DRBD are Cinder and Nova. Cinder is the block storage service, while Nova is the compute node service responsible for making volumes available for virtual machines.
The LINSTOR driver for OpenStack manages DRBD/LINSTOR clusters and makes them available within the OpenStack environment, especially within Nova compute instances. LINSTOR-backed Cinder volumes seamlessly provide all the features of DRBD/LINSTOR while allowing OpenStack to handle all of their deployment and management. The driver allows OpenStack to create and delete persistent LINSTOR volumes as well as to manage and deploy volume snapshots and raw volume images.
Aside from using the kernel-native DRBD protocols for replication, the LINSTOR driver also allows using iSCSI with LINSTOR cluster(s) to provide maximum compatibility. For more information on these two options, see Choosing the Transport Protocol.
7.2. LINSTOR for OpenStack Installation
The initial installation and configuration of DRBD and LINSTOR must be completed prior to installing the OpenStack driver. Each LINSTOR node in a cluster should also have a storage pool defined. Details about the LINSTOR installation can be found here.
7.2.1. Here is a synopsis of a quick setup of a LINSTOR cluster on Ubuntu:
Install DRBD and LINSTOR on the Cinder node as the LINSTOR Controller node:
# First, set up LINBIT repository per support contract

# Install DRBD and LINSTOR packages
sudo apt update
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-controller linstor-satellite linstor-client
sudo apt install -y drbdtop

# Start both LINSTOR Controller and Satellite Services
systemctl enable linstor-controller.service
systemctl start linstor-controller.service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# For Diskless Controller, skip the following two 'sudo' commands
# For Diskful Controller, create backend storage for DRBD/LINSTOR by creating
# a Volume Group 'drbdpool' and specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool
OpenStack measures storage size in GiBs. |
Install DRBD and LINSTOR on the other node(s) of the LINSTOR cluster:
# First, set up LINBIT repository per support contract

# Install DRBD and LINSTOR packages
sudo apt update
sudo apt install -y drbd-dkms lvm2
sudo apt install -y linstor-satellite
sudo apt install -y drbdtop

# Start only the LINSTOR Satellite service
systemctl enable linstor-satellite.service
systemctl start linstor-satellite.service

# Create backend storage for DRBD/LINSTOR by creating a Volume Group 'drbdpool'
# Specify appropriate volume location (/dev/vdb)
sudo vgcreate drbdpool /dev/vdb

# Create a Logical Volume 'thinpool' within 'drbdpool'
# Specify appropriate thin volume size (64G)
sudo lvcreate -L 64G -T drbdpool/thinpool
Lastly, from the Cinder node, create the LINSTOR Satellite node(s) and storage pool(s):
# Create a LINSTOR cluster, including the Cinder node as one of the nodes
# For each node, specify node name, its IP address, volume type (diskless) and
# volume location (drbdpool/thinpool)

# Create the controller node as combined controller and satellite node
linstor node create cinder-node-name 192.168.1.100 --node-type Combined

# Create the satellite node(s)
linstor node create another-node-name 192.168.1.101
# repeat to add more satellite nodes in the LINSTOR cluster

# Create a LINSTOR Storage Pool on each node
# For each node, specify node name, its IP address,
# storage pool name (DfltStorPool),
# volume type (diskless / lvmthin) and node type (Combined)

# Create diskless Controller node on the Cinder controller
linstor storage-pool create diskless cinder-node-name DfltStorPool

# Create diskful Satellite nodes
linstor storage-pool create lvmthin another-node-name DfltStorPool drbdpool/thinpool
# repeat to add a storage pool to each node in the LINSTOR cluster
7.2.2. Installing the LINSTOR Driver File
The LINSTOR driver will be officially available starting with the OpenStack Stein release. The latest release is located at the LINBIT OpenStack Repo. It is a single Python file called linstordrv.py. Depending on your OpenStack installation, its destination may vary.
Place the driver (linstordrv.py) in an appropriate location within your OpenStack Cinder node (a copy sketch follows the path list below).
For Devstack:
/opt/stack/cinder/cinder/volume/drivers/linstordrv.py
For Ubuntu:
/usr/lib/python2.7/dist-packages/cinder/volume/drivers/linstordrv.py
For RDO Packstack:
/usr/lib/python2.7/site-packages/cinder/volume/drivers/linstordrv.py
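A minimal copy sketch, assuming linstordrv.py has already been downloaded to the current directory and you are on a Devstack installation (pick the matching target path from the list above for other installations):

install -m 644 linstordrv.py /opt/stack/cinder/cinder/volume/drivers/linstordrv.py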
7.3. Cinder Configuration
7.3.1. Edit the Cinder configuration file cinder.conf in /etc/cinder/ as follows:
Enable the LINSTOR driver by adding linstor to the enabled backends:
[DEFAULT]
...
enabled_backends=lvm, linstor
...
Add the following configuration options at the end of cinder.conf:
[linstor]
volume_backend_name = linstor
volume_driver = cinder.volume.drivers.linstordrv.LinstorDrbdDriver
linstor_default_volume_group_name=drbdpool
linstor_default_uri=linstor://localhost
linstor_default_storage_pool_name=DfltStorPool
linstor_default_resource_size=1
linstor_volume_downsize_factor=4096
7.3.2. Update the Python Libraries for the Driver
sudo pip install google --upgrade
sudo pip install protobuf --upgrade
sudo pip install eventlet --upgrade
7.3.3. Create a New Backend Type for LINSTOR
Run these commands from the Cinder node once the environment variables are configured for OpenStack command-line operation.
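What "configured for OpenStack command-line operation" typically means is sourcing a credentials file first; a sketch under assumed file locations (they depend on how your cloud was deployed):

# Devstack (assumed default location)
source ~/devstack/openrc admin admin
# RDO Packstack (assumed default location)
source ~/keystonerc_admin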
cinder type-create linstor
cinder type-key linstor set volume_backend_name=linstor
7.3.4. Restart the Cinder Services to Finalize
For Devstack:
sudo systemctl restart devstack@c-vol.service
sudo systemctl restart devstack@c-api.service
sudo systemctl restart devstack@c-sch.service
For RDO Packstack:
sudo systemctl restart openstack-cinder-volume.service
sudo systemctl restart openstack-cinder-api.service
sudo systemctl restart openstack-cinder-scheduler.service
For a full OpenStack:
sudo systemctl restart cinder-volume.service
sudo systemctl restart cinder-api.service
sudo systemctl restart cinder-scheduler.service
7.3.5. Verify a Proper Installation:
Once the Cinder services are restarted, a new Cinder volume with the LINSTOR backend can be created using the Horizon GUI or the command line. Use the following as a guide for creating a volume with the command line:
# Check to see if there are any recurring errors with the driver.
# Occasional 'ERROR' keyword associated with the database is normal.
# Use Ctrl-C to stop the log output to move on.
sudo journalctl -f -u devstack@c-* | grep error

# Create a LINSTOR test volume.  Once the volume is created, the volume list
# command should show one new Cinder volume.  The 'linstor' command then
# should list actual resource nodes within the LINSTOR cluster backing that
# Cinder volume.
openstack volume create --type linstor --size 1 --availability-zone nova linstor-test-vol
openstack volume list
linstor resource list
7.3.6. Additional Configuration
More to come.
7.4. Choosing the Transport Protocol
There are two main ways to run DRBD/LINSTOR with Cinder:
-
via iSCSI exports, and
-
via the DRBD transport protocol, using LINSTOR.
These are not exclusive; you can define multiple backends, have some of them use iSCSI, and others the DRBD protocol.
7.4.1. iSCSI Transport
The default way to export Cinder volumes is via iSCSI. This brings the advantage of maximum compatibility: iSCSI can be used with every hypervisor, be it VMware, Xen, HyperV, or KVM.
The drawback is that all data has to be sent to a Cinder node, to be processed by a (userspace) iSCSI daemon; that means the data needs to pass the kernel/userspace border, and these transitions will cost some performance.
7.4.2. DRBD/LINSTOR Transport
The alternative is to get the data to the VMs by using DRBD as the transport protocol. This means that DRBD 9 [2] needs to be installed on the Cinder node as well.
As OpenStack only functions in Linux, using the DRBD/LINSTOR transport currently restricts deployments to Linux hosts with KVM. |
One advantage of that solution is that the storage access requests of the VMs can be sent via the DRBD kernel module to the storage nodes, which can then directly access the allocated LVs; this means no kernel/userspace transitions on the data path, and therefore better performance. Combined with RDMA-capable hardware you should get about the same performance as with VMs accessing an FC backend directly.
Another advantage is that you implicitly benefit from the HA background of DRBD: using multiple storage nodes, possibly available over different network connections, means redundancy and avoids a single point of failure.
The default configuration options for the Cinder driver assume that the Cinder node is a Diskless LINSTOR node. If the node is a Diskful node, the corresponding driver option has to be changed accordingly.
|
7.4.3. Configuring the Transport Protocol
In the LINSTOR section of cinder.conf you can define which transport protocol to use. The initial setup described at the beginning of this chapter is set to use the DRBD transport. You can configure it as shown below, as needed. Horizon [3] should then offer these storage backends at volume creation time.
-
To use iSCSI with LINSTOR:
volume_driver=cinder.volume.drivers.drbdmanagedrv.DrbdManageIscsiDriver
-
To use the DRBD kernel module with LINSTOR:
volume_driver=cinder.volume.drivers.drbdmanagedrv.DrbdManageDrbdDriver
The old class name DrbdManageDriver is being kept for the time being because of compatibility reasons; it is just an alias to the iSCSI driver.
To summarize:
-
You will need the LINSTOR Cinder driver 0.1.0 or later, and LINSTOR 0.6.5 or later.
-
The DRBD transport protocol should be preferred whenever possible; iSCSI will not offer any locality benefits.
-
Take care to not run out of disk space, especially with thin volumes.
8. LINSTOR Volumes in Docker
This chapter describes LINSTOR volumes in Docker as managed by the https://github.com/LINBIT/linstor-docker-volume-go[LINSTOR Docker Volume Plugin].
8.1. Docker Overview
Docker is a platform for developing, shipping, and running applications in the form of Linux containers. For stateful applications that require data persistence, Docker supports the use of persistent volumes and volume drivers.
The LINSTOR Docker Volume Plugin is a volume driver that provisions persistent volumes from a LINSTOR cluster for Docker containers.
8.2. LINSTOR Plugin for Docker Installation
To install the linstor-docker-volume
plugin provided by LINBIT, you’ll
need to have a working LINSTOR cluster. After that the plugin can be
installed from the public docker hub.
# docker plugin install linbit/linstor-docker-volume
8.3. LINSTOR Plugin for Docker Configuration
As the plugin has to communicate with the LINSTOR controller via the LINSTOR Python library, we must tell the plugin where to find the LINSTOR controller node in its configuration file:
# cat /etc/linstor/docker-volume.conf
[global]
controllers = linstor://hostnameofcontroller
A more extensive example could look like this:
# cat /etc/linstor/docker-volume.conf
[global]
storagepool = thin-lvm
fs = ext4
fsopts = -E discard
size = 100MB
replicas = 2
8.4. Example Usage
The following are some examples of how to use the LINSTOR Docker Volume Plugin. In the following we expect a cluster consisting of three nodes (alpha, bravo, and charlie).
8.4.1. Example 1 - A Typical Docker Pattern
On node alpha:
$ docker volume create -d linstor \
    --opt fs=xfs --opt size=200 lsvol
$ docker run -it --rm --name=cont \
    -v lsvol:/data --volume-driver=linstor busybox sh
$ root@cont: echo "foo" > /data/test.txt
$ root@cont: exit
On node bravo:
$ docker run -it --rm --name=cont \
    -v lsvol:/data --volume-driver=linstor busybox sh
$ root@cont: cat /data/test.txt
foo
$ root@cont: exit
$ docker volume rm lsvol
8.4.2. Example 2 - One Diskful Assignment by Name, Two Nodes Diskless
$ docker volume create -d linstor --opt nodes=bravo lsvol
8.4.3. Example 3 - One Diskful Assignment, No Matter Where, Two Nodes Diskless
$ docker volume create -d linstor --opt replicas=1 lsvol
8.4.4. Example 4 - Two Diskful Assignments by Name, charlie Diskless
$ docker volume create -d linstor --opt nodes=alpha,bravo lsvol
8.4.5. Example 5 - Two Diskful Assignments, No Matter Where, One Node Diskless
$ docker volume create -d linstor --opt replicas=2 lsvol