����λ�ã���ҳ > �����̳� > �̳� > Intel HDSLB �������IJ㸺�ؾ����� �� ����ԭ���Ͳ�������

Intel HDSLB �������IJ㸺�ؾ����� �� ����ԭ���Ͳ�������

��Դ������������|��ʱ�䣺2024-05-27 08:46:07 |���Ķ���95��|�� ��ǩ�� T ��ԭ El S in ���� ������ Intel �� |����������

��ƪ��Ҫ������ Intel HDSLB �Ļ�������ԭ���Ͳ������õķ�ʽ��ϣ���ܹ�����������˳���İ� HDSLB-DPVS ��Ŀ ���桱 ������

ǰ��

����һƪ�� Intel HDSLB �������IJ㸺�ؾ����� �� �������ź�Ӧ�ó��� ���У��������ؽ����� HDSLB��High Density Scalable Load Balancer�����ܶȿ���չ�ĸ��ؾ���������Ϊ��һ���������IJ㸺�ؾ�����������λ�������� HDSLB ���Ƽ���ͱ�Ե����Ӧ�ó����е��������ƣ��Լ������ HDSLB �����ܲ������ݡ�

�ٽ�һ���ģ��ڱ�ƪ��������Ҫ��ע HDSLB �Ļ�������ԭ���Ͳ������÷�ʽ����������ʵ�ʵIJ�����Ϊ���ø��㷺�Ŀ������Ƕ��ܹ���ݷ���Ķ� HDSLB չ���о��������ڱ�ƪ�л���� HDSLB-DPVS ��Դ�汾�����н��ܡ�

HDSLB-DPVS �Ļ���ԭ��

����˼�壬HDSLB-DPVS �ǻ��� DPVS ���ж��ο�������Ŀ���� DPVS���ֳ�Ϊ DPDK-LVS����һ���ο��� LVS �ں�̬�IJ㸺�ؾ��������ԭ�������� DPDK �û�̬��������ٿ�ܽ��п������IJ㸺�ؾ��������ɼ���HDSLB-DPVS �ļ�����ջ��Ҫ������ 4 ��������ɣ�

  1. LVS
  2. DPDK
  3. DPVS
  4. HDSLB-DPVS

Ҫ���������� HDSLB-DPVS �Ļ���ʵ��ԭ����������Ҫ��ͷ��ʼ����

LVS

LVS��Linux Virtual Server��Linux �������������һ�������� 1998 ����IJ㸺�ؾ�������Դ��Ŀ����Ŀ����ʹ�� Local Balancer ������ Server Cluster ������ʵ��һ���������ÿ������ԣ�Scalability�����ɿ��ԣ�Reliability���Ϳɹ����ԣ�Manageability���� Virtual Server��

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

������������Ȼ LVS ���� Kernel ʵ�ֵ������������Ѿ�����ʱ�ˣ������߼��ܹ�����Ʋ��棬LVS �ĺ������������������񣬰�����

  • VS��Virtual Server������������� ��VS ���� DS �� RS ��Ϲ��ɵ�һ���߼����VS ����ͨ��һ�� VIP ���ⲿ Clients �ṩ����
  • DS��Director Server���������ȷ������� ���dz䵱 LB ������ڵķ������������ؾ�����Ե�ִ�к������ַ�������Ҳ��Ϊ FE��ǰ�˷���������
  • RS��Real Server����ʵ�������� ��RS ���������ڴ������������ķ�������Ҳ��Ϊ BE����˷���������
  • VIP��Virtual IP������ IP ��ַ�� ��VS ���ⲿ Client �ṩ����� IP ��ַ��
  • DIP��Director IP��ֱ�� IP ��ַ�� ��Director Server ���ڲ��� RS ����ͨ�ŵ� IP ��ַ��
  • RIP��Real IP����ʵ IP ��ַ�� ��RS �� DS ��ͨ�� IP ��ַ��
  • CIP��Client IP���ͻ��˵� IP ��ַ��
  • NAT �������ת��ģʽ
  • IP Tunneling ͸��ת��ģʽ
  • DR ��������ת��ģʽ
  • �ȵ�

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

���� LVS ����ϸ�����ݣ��Ƽ��Ķ����� LVS & Keepalived ʵ�� L4 �߿��ø��ؾ����� ��

DPDK

���� 2010 �꣬IEEE 802.3 ��׼ίԱ�ᷢ���� 40GbE �� 100GbE 802.3ba ��̫����׼������������ʽ������ 100G ʱ��������ʱ��Linux �ں�Э��ջ�����紦�����ܾ�һֱ������ս���ȿ��������ݣ�

  • CPU ���� Main Memory ����Ҫ��ʱ��Ϊ 65 ���롣
  • �� NUMA node �� Main Memory ���� Copy ����Ҫ��ʱ��Ϊ 40 ���롣
  • CPU ����һ��Ӳ���ж�����Ҫ��ʱ��Ϊ 100 ΢�롣

��ʵ���ϣ�100G ��������Ϊ 2 �� PPS����ÿ����������ʱ�䲻�ܳ��� 50 ���롣

�ɼ������� Kernel ���������Ѿ��ߵ��˹յ㣬DPDK Ϊ�˶�������ͨ�����м��ټ���ʵ���� 100G ����ת����

  1. ʹ���û�̬Э��ջ�����ں�Э��ջ��Kernel by-pass (user space implementation).
  2. ʹ����ѵ�����жϣ�Polling instead of interrupt.
  3. ʹ�ö�˱�̴�����̣߳�Share-nothing, per-CPU for key data (lockless).
  4. �� CPU ����ͨ�ţ�Lockless message for high performance IPC.
  5. RX Steering and CPU affinity (avoid context switch).
  6. Zero Copy (avoid packet copy and syscalls).
  7. Batching TX/RX.
  8. etc...

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

���� DPDK ����ϸ�����ݣ��Ƽ��Ķ����� DPDK �� ���ݼ��ٷ����ĺ���˼�� ��

DPVS

���ϣ����� LVS ����������һ�� Linux Kernel Module��ipvs�����������޷������ִ����������Թ��ڹ�˾ iqiyi ���� DPDK ������ DPVS��ֵ��һ����ǣ����� DPVS ��Ŀ�ɹ��ڹ�˾��Դ��ά���������俪Դ���������Ŀ�����Ҳ������Ѻá�

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

�������ܷ�����Ż�֮�⣬�ڹ��ܲ��棬DPVS Ҳ�ṩ�˸��ḻ��������������

  • L4 Load Balancer, including FNAT, DR, Tunnel, DNAT modes, etc.
  • SNAT mode for Internet access from internal network.
  • NAT64 forwarding in FNAT mode for quick IPv6 adaptation without application changes.
  • Different schedule algorithms like RR, WLC, WRR, MH(Maglev Hashing), Conhash(Consistent Hashing) etc.
  • User-space Lite IP stack (IPv4/IPv6, Routing, ARP, Neighbor, ICMP ...).
  • Support KNI, VLAN, Bonding, Tunneling for different IDC environment.
  • Security aspect, support TCP syn-proxy, Conn-Limit, black-list�� white-list.
  • QoS: Traffic Control.

�������ܹ����棬DPVS ���������ݷ���ܹ��ͻ��� Keepalived �� Master-Backup �߿��üܹ���

  • ipvsadm������ VS��RS ���߼���Դ����Ĺ�����
  • dpip������ IP��Route �Ȼ���������Դ�Ĺ�����
  • keepalived�������ṩ���� VRRP Э��������߿��á�

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

HDSLB-DPVS

HDSLB-DPVS �� DPVS ��������Ϊ�����ܸ��ؾ���������ô���ߵı���������ʲô�أ��𰸾��Ǹ�ǿ������ܣ�

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

ͨ���ģ����ǿ���ʹ�� RBP��Ratio of Bandwidth and Performance growth rate�������������ٱȣ�������������������ٱ��� CPU ���ܵ����٣�����RBP=BW GR/Perf. GR��

����ͼ��ʾ��2010 ��ǰ������Ĵ����껯������Լ�� 30%���� 2015 �������� 35%��Ȼ���ڽ���ﵽ 45%�����Ӧ�ģ�CPU ������������ 10 ��ǰ�� 23%���½��� 12%�����ڽ���ֱ�ӽ��͵� 3.5%������ 3 ��ʱ����ڣ�RBP ָ��� RBP��1 ������I/O ѹ����δ���ֳ������������� RBP��3�����ڽ��곬���� RBP��10��

�ɼ���CPU �����Ѿ��޷�ֱ��Ӧ��������������١���Χ�� CPU ���д��������ٵ� DPDK �������������ս��

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

�ص� DPVS �� HDSLB-DPVS �ı�������������⡣��������Ʋ��棬DPVS ��Ŀ���ǻ��� DPDK ���ʵ������������ļ��٣��� HDSLB-DPVS �����һ���Ľ����ּ������뵽 CPU �� NIC �����ϵ�Ӳ��ƽ̨֮�ϣ���ʵ���� �����ܶȡ� �� ������չ�� �� 2 ��Ŀ�꣺

  • ���ܶ� ��ָ���ǵ��� HDSLB �ڵ�� TCP ���������������������ر�ߡ�
  • ����չ ��ָ���������ܿ������� CPU Core ������������Դ���������Ӷ�������չ��

ʵ�����棬�������͵� Intel Xeon CPU��e.g. 3rd & 4th generation���� E810 100G NIC Ӳ��ƽ̨�ϣ�ʵ���ˣ�

  • Concurrent Session : 100M level / Node
  • Throughput : > 8Mpps / Core @FNAT
  • TCP Session Est. Rate > 800K / Core
  • Linear growth

�Դˣ������ڡ� Intel HDSLB �������IJ㸺�ؾ����� �� �������ź�Ӧ�ó��� �������Ѿ��� HDSLB-DPVS ��Խǰ�����������ݽ����˷��������ﲻ��׸����

HDSLB �IJ�������

Ӳ��Ҫ��

������뵽ʵ�����ڣ���Ҫ��ע HDSLB-DPVS �ı��롢��������á�Ϊ�˽��Ϳ������ż������Ա�����Ҫʹ���˿��������ż����������в���͵��ԡ�

�������Ի������Ƽ� ���⿪�������ż��Ƽ�
CPU �ܹ� Intel Xeon CPU �Ĵ� ֧�� AVX512 ϵ��ָ��� Intel CPU �ͺţ����磺Skylake ��
CPU ��Դ 2NUMA���رճ��߳� 1NUMA��4C
Memory ��Դ 128G 16G
NIC �ͺ� Intel E810 100G VirtI/O ������֧�ֶ����

���� CPU ��Ϣ��

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2root@l4lb:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 57 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz
Stepping:                        6
CPU MHz:                         2599.994
BogoMIPS:                        5199.98
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       96 KiB
L1i cache:                       64 KiB
L2 cache:                        2.5 MiB
L3 cache:                        48 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_ts
                                 c arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_tim
                                 er aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb ibrs_enhanced fsgsbase tsc_adjust bm
                                 i1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xge
                                 tbv1 xsaves arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid arch_capabilities

����Ҫ��

  • OS��Ubuntu 20.04.3
  • Kernel��5.4.0-110-generic
  • GCC��9.4.0
  • DPDK��20.08

ֵ��ע����ǣ�Ubuntu /boot ����Ҫ���� 2G����������ں������������⡣�ο����ã� https://askubuntu.com/questions/1207958/error-24-write-error-cannot-write-compressed-block

���� OS ��Ϣ��

# ����ϵͳ
$ sudo apt-get update -y && sudo apt-get upgrade -y
# Dev
$ sudo apt-get install git vim wget patch unzip -y
# popt
$ sudo apt-get install libpopt-dev -y
# NUMA
$ sudo apt-get install libnuma-dev -y
$ sudo apt-get install numactl -y
# Pcap
$ sudo apt-get install libpcap-dev -y 
# SSL
$ sudo apt-get install libssl-dev -y


# Kernel 5.4.0-136
$ uname -r
5.4.0-136-generic

$ ll /boot/vmlinuz*
lrwxrwxrwx 1 root root       25 Dec 27  2022 /boot/vmlinuz -> vmlinuz-5.4.0-136-generic
-rw------- 1 root root 13660416 Aug 10  2022 /boot/vmlinuz-5.4.0-125-generic
-rw------- 1 root root 13668608 Nov 24  2022 /boot/vmlinuz-5.4.0-136-generic
-rw------- 1 root root 11657976 Apr 21  2020 /boot/vmlinuz-5.4.0-26-generic
lrwxrwxrwx 1 root root       25 Dec 27  2022 /boot/vmlinuz.old -> vmlinuz-5.4.0-125-generic

$ dpkg -l | egrep "linux-(signed|modules|image|headers)" | grep $(uname -r)
ii  linux-headers-5.4.0-136-generic       5.4.0-136.153                     amd64        Linux kernel headers for version 5.4.0 on 64 bit x86 SMP
ii  linux-image-5.4.0-136-generic         5.4.0-136.153                     amd64        Signed kernel image generic
ii  linux-modules-5.4.0-136-generic       5.4.0-136.153                     amd64        Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
ii  linux-modules-extra-5.4.0-136-generic 5.4.0-136.153                     amd64        Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP

# GCC 9.4.0
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# �������
$ ulimit -n 655350
$ echo "ulimit -n 655350" >> /etc/rc.local
$ chmod a+x /etc/rc.local

���밲װ DPDK

DPDK ��װ�������ϸ���ݣ��Ƽ��Ķ����� DPDK �� ��װ���� ����

$ cd /root/
$ git clone https://github.com/intel/high-density-scalable-load-balancer hdslb

$ wget http://fast.dpdk.org/rel/dpdk-20.08.tar.xz
$ tar vxf dpdk-20.08.tar.xz

# �򲹶�
$ cp hdslb/patch/dpdk-20.08/*.patch dpdk-20.08/
$ cd dpdk-20.08/
$ patch -p 1 < 0002-support-large_memory.patch
$ patch -p 1 < 0003-net-i40e-ice-support-rx-markid-ofb.patch

# ����
$ make config T=x86_64-native-linuxapp-gcc MAKE_PAUSE=n
$ make MAKE_PAUSE=n -j 4

���밲װ HDSLB-DPVS

HDSLB-DPVS �ı��밲װ��������Ҫ�������� CPU Ӳ������ָ����磺AVX2��AVX512 �ȵȡ�Ҫ����ɹ����� 2 �����Ҫ��

  1. Ҫ�� CPU Ӳ��֧�֣��Ƽ�ʹ�� Intel Xeon ��������ϵ�У����磺Intel Xeon Gold��
  2. Ҫ�� GCC �汾֧�֣��Ƽ����ð汾�ϸߵ� GCC�����籾���е� 9.4.0��
$ cd dpdk-20.08/
$ export RTE_SDK=$PWD

$ cd hdslb/
$ chmod +x tools/keepalived/configure

# ���밲װ
$ make -j 4
$ make install

���ô�ҳ�ڴ�

�����������Ի����У���ҳ�ڴ�Ӧ�þ����ܵĸ���HDSLB �� LB connect pool ��Ҫ����������ڴ棬����ʵ�ʵ����ܹ����ֱ�ӹ�ϵ��

$ mkdir /mnt/huge_1GB
$ mount -t hugetlbfs nodev /mnt/huge_1GB

$ vim /etc/fstab
nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0

$ # for NUMA machine
$ echo 15 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

$ vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="${GRUB_CMDLINE_LINUX_DEFAULT} default_hugepagesz=1G hugepagesz=1G hugepages=15" 

$ sudo update-grub
$ init 6

$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      13
HugePages_Free:       13
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        13631488 kB

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi        13Gi       2.0Gi       2.0Mi       408Mi       2.2Gi
Swap:            0B          0B          0B

��������

$ modprobe vfio-pci
$ modprobe vfio enable_unsafe_noiommu_mode=1 # https://stackoverflow.com/questions/75840973/dpdk20-11-3-cannot-bind-device-to-vfio-pci
$ echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

$ cd dpdk-20.08/
$ export RTE_SDK=$PWD
$ insmod ${RTE_SDK}/build/kmod/rte_kni.ko

$ ${RTE_SDK}/usertools/dpdk-devbind.py --status-dev net
Network devices using kernel driver
===================================
0000:01:00.0 'Virtio network device 1000' if=eth0 drv=virtio-pci unused=vfio-pci *Active*
0000:03:00.0 'Virtio network device 1000' if=eth1 drv=virtio-pci unused=vfio-pci
0000:04:00.0 'Virtio network device 1000' if=eth2 drv=virtio-pci unused=vfio-pci


$ ifconfig eth1 down # 0000:03:00.0
$ ifconfig eth2 down # 0000:04:00.0
$ ${RTE_SDK}/usertools/dpdk-devbind.py -b vfio-pci 0000:03:00.0 0000:04:00.0

$ ${RTE_SDK}/usertools/dpdk-devbind.py --status-dev net
Network devices using DPDK-compatible driver
============================================
0000:03:00.0 'Virtio network device 1000' drv=vfio-pci unused=
0000:04:00.0 'Virtio network device 1000' drv=vfio-pci unused=
Network devices using kernel driver
===================================
0000:01:00.0 'Virtio network device 1000' if=eth0 drv=virtio-pci unused=vfio-pci *Active*

���� HDSLB-DPVS

$ cp conf/hdslb.conf.sample /etc/hdslb.conf

# ���ý���
$ cat /etc/hdslb.conf
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This is hdslb default configuration file.
!
! The attribute "" denotes the configuration item at initialization stage. Item of
! this type is configured oneshoot and not reloadable. If invalid value configured in the
! file, hdslb would use its default value.
!
! Note that hdslb configuration file supports the following comment type:
!   * line comment: using '#" or '!'
!   * inline range comment: using '<' and '>', put comment in between
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

! global config
global_defs {
    log_level   DEBUG # �������
    ! log_file    /var/log/hdslb.log
    ! log_async_mode    on
}

! netif config
netif_defs {
     pktpool_size     1048575
     pktpool_cache    256
    # LAN Interface ����
     device dpdk0 {
        rx {
            queue_number        3
            descriptor_number   1024
            ! rss                 all
        }
        tx {
            queue_number        3
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        ! promisc_mode
        kni_name                dpdk0.kni
    }
    # WAN Interface ����
     device dpdk1 {
        rx {
            queue_number        3
            descriptor_number   1024
            ! rss                 all
        }
        tx {
            queue_number        3
            descriptor_number   1024
        }
        fdir {
            mode                perfect
            pballoc             64k
            status              matched
        }
        ! promisc_mode
        kni_name                dpdk1.kni
    }

    !  bonding bond0 {
    !    mode        0
    !    slave       dpdk0
    !    slave       dpdk1
    !    primary     dpdk0
    !    kni_name    bond0.kni
    !}
}

! worker config (lcores)
worker_defs {
    # control plane CPU
     worker cpu0 {
        type    master
        cpu_id  0
    }
    # data plane CPU
    # dpdk0��1 �� 2 �� Port ��ͬһ���շ����й���ͬһ�� CPU
     worker cpu1 {
        type    slave
        cpu_id  1
        port    dpdk0 {
            rx_queue_ids     0
            tx_queue_ids     0
            ! isol_rx_cpu_ids  9
            ! isol_rxq_ring_sz 1048576
        }
        port    dpdk1 {
            rx_queue_ids     0
            tx_queue_ids     0
            ! isol_rx_cpu_ids  9
            ! isol_rxq_ring_sz 1048576
        }
    }
     worker cpu2 {
        type    slave
        cpu_id  2
        port    dpdk0 {
            rx_queue_ids     1
            tx_queue_ids     1
            ! isol_rx_cpu_ids  10
            ! isol_rxq_ring_sz 1048576
        }
        port    dpdk1 {
            rx_queue_ids     1
            tx_queue_ids     1
            ! isol_rx_cpu_ids  10
            ! isol_rxq_ring_sz 1048576
        }
    }
     worker cpu3 {
        type    slave
        cpu_id  3
        port    dpdk0 {
            rx_queue_ids     2
            tx_queue_ids     2
            ! isol_rx_cpu_ids  11
            ! isol_rxq_ring_sz 1048576
        }
        port    dpdk1 {
            rx_queue_ids     2
            tx_queue_ids     2
            ! isol_rx_cpu_ids  11
            ! isol_rxq_ring_sz 1048576
        }
    }

}

! timer config
timer_defs {
    # cpu job loops to schedule dpdk timer management
    schedule_interval    500
}

! hdslb neighbor config
neigh_defs {
     unres_queue_length  128
     timeout             60
}

! hdslb ipv4 config
ipv4_defs {
    forwarding                 off
     default_ttl         64
    fragment {
         bucket_number   4096
         bucket_entries  16
         max_entries     4096
         ttl             1
    }
}

! hdslb ipv6 config
ipv6_defs {
    disable                     off
    forwarding                  off
    route6 {
         method           hlist
        recycle_time            10
    }
}

! control plane config
ctrl_defs {
    lcore_msg {
         ring_size                4096
        sync_msg_timeout_us             30000000
        priority_level                  low
    }
    ipc_msg {
         unix_domain /var/run/hdslb_ctrl
    }
}

! ipvs config
ipvs_defs {
    conn {
         conn_pool_size       2097152
         conn_pool_cache      256
        conn_init_timeout           30
        ! expire_quiescent_template
        ! fast_xmit_close
        !  redirect           off
    }

    udp {
        ! defence_udp_drop
        uoa_mode        opp
        uoa_max_trail   3
        timeout {
            normal      300
            last        3
        }
    }

    tcp {
        ! defence_tcp_drop
        timeout {
            none        2
            established 90
            syn_sent    3
            syn_recv    30
            fin_wait    7
            time_wait   7
            close       3
            close_wait  7
            last_ack    7
            listen      120
            synack      30
            last        2
        }
        synproxy {
            synack_options {
                mss             1452
                ttl             63
                sack
                ! wscale
                ! timestamp
            }
            ! defer_rs_syn
            rs_syn_max_retry    3
            ack_storm_thresh    10
            max_ack_saved       3
            conn_reuse_state {
                close
                time_wait
                ! fin_wait
                ! close_wait
                ! last_ack
           }
        }
    }
}

! sa_pool config
sa_pool {
    pool_hash_size   16
}

���� HDSLB-DPVS

$ cd hdslb/
$ ./bin/hdslb
current thread affinity is set to F
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:01:00.0 (socket 0)
EAL:   Invalid NUMA socket, default to 0
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:03:00.0 (socket 0)
EAL:   using IOMMU type 8 (No-IOMMU)
EAL: Ignore mapping IO port bar(0)
EAL:   Invalid NUMA socket, default to 0
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:04:00.0 (socket 0)
EAL: Ignore mapping IO port bar(0)
EAL: No legacy callbacks, legacy socket not created
DPVS: HDSLB version: , build on 2024.05.24.14:37:02
CFG_FILE: Opening configuration file '/etc/hdslb.conf'.
CFG_FILE: log_level = WARNING
NETIF: dpdk0:rx_queue_number = 3
NETIF: dpdk1:rx_queue_number = 3
NETIF: worker cpu1:dpdk0 rx_queue_id += 0
NETIF: worker cpu1:dpdk0 tx_queue_id += 0
NETIF: worker cpu1:dpdk1 rx_queue_id += 0
NETIF: worker cpu1:dpdk1 tx_queue_id += 0
NETIF: worker cpu2:dpdk0 rx_queue_id += 1
NETIF: worker cpu2:dpdk0 tx_queue_id += 1
NETIF: worker cpu2:dpdk1 rx_queue_id += 1
NETIF: worker cpu2:dpdk1 tx_queue_id += 1
NETIF: worker cpu3:dpdk0 rx_queue_id += 2
NETIF: worker cpu3:dpdk0 tx_queue_id += 2
NETIF: worker cpu3:dpdk1 rx_queue_id += 2
NETIF: worker cpu3:dpdk1 tx_queue_id += 2
Kni: kni_add_dev: fail to set mac FA:27:00:00:0A:02 for dpdk0.kni: Timer expired
Kni: kni_add_dev: fail to set mac FA:27:00:00:0B:F6 for dpdk1.kni: Timer expired

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

HDSLB-DPVS ���������󣬿��Կ��� 2 �� DPDK Port �Ͷ�Ӧ�� 2 �� KNI Interface������ DPDK Port ���� LB ������ת������ KNI ������ Keepalived HA ����ģʽ��

$ cd hdslb/bin/
$ ./dpip link show
1: dpdk0: socket 0 mtu 1500 rx-queue 3 tx-queue 3
    UP 10000 Mbps half-duplex auto-nego
    addr FA:27:00:00:0A:02 OF_TX_IP_CSUM
2: dpdk1: socket 0 mtu 1500 rx-queue 3 tx-queue 3
    UP 10000 Mbps half-duplex auto-nego
    addr FA:27:00:00:0B:F6 OF_TX_IP_CSUM


$ ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether fa:27:00:00:00:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.4/25 brd 192.168.0.127 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f827:ff:fe00:c0/64 scope link
       valid_lft forever preferred_lft forever
71: dpdk0.kni:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:27:00:00:0a:02 brd ff:ff:ff:ff:ff:ff
72: dpdk1.kni:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:27:00:00:0b:f6 brd ff:ff:ff:ff:ff:ff

���� HDSLB-DPVS Two-arm Full-NAT ģʽ

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

  • HDSLB-DPVS
$ cd hdslb/bin

# add VIP to WAN interface
$ ./dpip addr add 10.0.0.100/32 dev dpdk1

# route for WAN/LAN access
$ ./dpip route add 10.0.0.0/16 dev dpdk1
$ ./dpip route add 192.168.100.0/24 dev dpdk0

# add routes for other network or default route if needed.
$ ./dpip route show
inet 10.0.0.100/32 via 0.0.0.0 src 0.0.0.0 dev dpdk1 mtu 1500 tos 0 scope host metric 0 proto auto
inet 192.168.100.0/24 via 0.0.0.0 src 0.0.0.0 dev dpdk0 mtu 1500 tos 0 scope link metric 0 proto auto
inet 10.0.0.0/16 via 0.0.0.0 src 0.0.0.0 dev dpdk1 mtu 1500 tos 0 scope link metric 0 proto auto

# add service  to forwarding, scheduling mode is RR.
$ ./ipvsadm -A -t 10.0.0.100:80 -s rr

# add two RS for service, forwarding mode is FNAT (-b)
$ ./ipvsadm -a -t 10.0.0.100:80 -r 192.168.100.2 -b
$ ./ipvsadm -a -t 10.0.0.100:80 -r 192.168.100.3 -b

# add at least one Local-IP (LIP) for FNAT on LAN interface
$ ./ipvsadm --add-laddr -z 192.168.100.200 -t 10.0.0.100:80 -F dpdk0

# Check
$  ./ipvsadm -Ln
IP Virtual Server version 0.0.0 (size=0)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.100:80 rr
  -> 192.168.100.2:80             FullNat 1      0          0
  -> 192.168.100.3:80             FullNat 1      0          0
  • Server
$ python -m SimpleHTTPServer 80
  • Client
$ curl 10.0.0.100

�������

���� 1 ��hdslb/tools/keepalived/configure û��ִ��Ȩ�ޡ�

make[1]: Leaving directory '/root/hdslb/src'
make[1]: Entering directory '/root/hdslb/tools'
if [ ! -f keepalived/Makefile ]; then \
        cd keepalived && \
        ./configure && \
        cd -; \
fi
/bin/sh: 3: ./configure: Permission denied
make[1]: *** [Makefile:29: keepalived_conf] Error 126
make[1]: Leaving directory '/root/hdslb/tools'
make: *** [Makefile:33: all] Error 1

# ���
$ chmod +x /root/hdslb/tools/keepalived/configure

���� 2 ��ȱ�������ļ�

Cause: ports in DPDK RTE (2) != ports in dpvs.conf(0)

# ���
$ cp conf/hdslb.conf.sample /etc/hdslb.conf

���� 3 �������� 2MB hugepage size ̫С

Cause: Cannot init mbuf pool on socket 0

# ������� hugepagesize ����Ϊ 1G
# ref��https://stackoverflow.com/questions/51630926/cannot-create-mbuf-pool-with-dpdk

���� 4 ��ȱ�� rte_kni ģ��

Cause: add KNI port fail, exiting...

# ���
$ insmod ${RTE_SDK}/build/kmod/rte_kni.ko

���� 5 ����������ҳ�ڴ治��

Kni: kni_add_dev: fail to set mac FA:27:00:00:07:AA for dpdk0.kni: Timer expired
Kni: kni_add_dev: fail to set mac FA:27:00:00:00:E1 for dpdk1.kni: Timer expired
IPSET: ipset_init: lcore 3: nothing to do.
IPVS: dp_vs_conn_init: lcore 3: nothing to do.
IPVS: fail to init synproxy: no memory
Segmentation fault (core dumped)

# ��������ݵ� 15G��

���� 6 ��������������֧�� HDSLB-DPVS ��Ҫ�� hardware offloads ���ܡ�

Kni: kni_add_dev: fail to set mac FA:27:00:00:0A:02 for dpdk0.kni: Timer expired
Kni: kni_add_dev: fail to set mac FA:27:00:00:0B:F6 for dpdk1.kni: Timer expired
Ethdev port_id=0 requested Rx offloads 0x62f doesn't match Rx offloads capabilities 0xa1d in rte_eth_dev_configure()
NETIF: netif_port_start: fail to config dpdk0
EAL: Error - exiting with code: 1
  Cause: Start dpdk0 failed, skipping ...

# ������޸� netif ģ�飬��������֧�ֵ� offloads ���ܡ�
static struct rte_eth_conf default_port_conf = {
    .rxmode = {
......
        .offloads = 0,
        //.offloads = DEV_RX_OFFLOAD_CHECKSUM | DEV_RX_OFFLOAD_VLAN,
    },
......
    .txmode = {
......
        .offloads = 0,
        //.offloads = DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_MBUF_FAST_FREE,
    },

NOTE������ DPDK ���ĵ���offloads mask ��ÿ�� bit ���������ض���ж�ع��ܡ����� 0-15bit ��Ӧ�� Features��

  1. DEV_RX_OFFLOAD_VLAN_STRIP
  2. DEV_RX_OFFLOAD_IPV4_CKSUM
  3. DEV_RX_OFFLOAD_UDP_CKSUM
  4. DEV_RX_OFFLOAD_TCP_CKSUM
  5. DEV_RX_OFFLOAD_TCP_LRO
  6. DEV_RX_OFFLOAD_QINQ_STRIP
  7. DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM
  8. DEV_RX_OFFLOAD_MACSEC_STRIP
  9. DEV_RX_OFFLOAD_VLAN_FILTER
  10. DEV_RX_OFFLOAD_VLAN_EXTEND
  11. DEV_RX_OFFLOAD_SCATTER
  12. DEV_RX_OFFLOAD_TIMESTAMP
  13. DEV_RX_OFFLOAD_SECURITY
  14. DEV_RX_OFFLOAD_KEEP_CRC
  15. DEV_RX_OFFLOAD_SCTP_CKSUM
  16. DEV_RX_OFFLOAD_OUTER_UDP_CKSUM

���� 7 �����������粻֧�� RSS ����С�valid value: 0x0 ��ʾ��ǰ������֧���κ� RSS ��ϣ������


Kni: kni_add_dev: fail to set mac FA:27:00:00:0A:02 for dpdk0.kni: Timer expired
Kni: kni_add_dev: fail to set mac FA:27:00:00:0B:F6 for dpdk1.kni: Timer expired
Ethdev port_id=0 invalid rss_hf: 0x3afbc, valid value: 0x0
NETIF: netif_port_start: fail to config dpdk0
EAL: Error - exiting with code: 1
  Cause: Start dpdk0 failed, skipping ...

# �����ʽ 1��ʹ��֧�� multi-queues �� RSS hash ��������
# �����ʽ 2���޸� netif ģ�飬������ multi-queues �� RSS hash ���ܡ�
static struct rte_eth_conf default_port_conf = {
    .rxmode = {
        //.mq_mode        = ETH_MQ_RX_RSS,
        .mq_mode        = ETH_MQ_RX_NONE,
......
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            //.rss_hf  = /*ETH_RSS_IP*/ ETH_RSS_TCP,
            .rss_hf  = 0
        },
    },
......
port->dev_conf.rx_adv_conf.rss_conf.rss_hf = 0;    

���� 8 ����֧�ֶಥ��ַ����

Kni: kni_add_dev: fail to set mac FA:27:00:00:0A:02 for dpdk0.kni: Timer expired
Kni: kni_add_dev: fail to set mac FA:27:00:00:0B:F6 for dpdk1.kni: Timer expired
NETIF: multicast address add failed for device dpdk0
EAL: Error - exiting with code: 1
  Cause: Start dpdk0 failed, skipping ...

# ������رնಥ����
    //ret = idev_add_mcast_init(port);
    //if (ret != EDPVS_OK) {
    //    RTE_LOG(WARNING, NETIF, "multicast address add failed for device %s\n", port->name);
    //    return ret;
    //}

���� 9 ��LB connect pool �ڴ�̫С����������˳���

$ ./ipvsadm -A -t 10.0.0.100:80 -s rr
[sockopt_msg_recv] socket msg header recv error -- 0/88 recieved  

IPVS: lb_conn_hash_table_init: lcore 0: create conn_hash_tbl failed. Requested size: 1073741824 bytes. LB_CONN_CACHE_LINES_DEF: 1, LB_CONN_TBL_SIZE: 16777216

# �����ʽ 1�������Ӵ�ҳ�ڴ浽ʵ����Ҫ�Ĵ�С��
# �����ʽ 2��
#	1�����ͷ�һ�� lcore �Ĵ�ҳ�ڴ�
#	2������С DPVS_CONN_POOL_SIZE_DEF �� 2097152 ���ٵ� 1048576
//#define DPVS_CONN_POOL_SIZE_DEF     2097152
#define DPVS_CONN_POOL_SIZE_DEF     1048576

���� 10 ���������汾��ȱ�ٱ���ָ�

error: inlining failed in call to always_inline   "'_mm256_cmpeq_epi64_mask':"  : target specific option mismatch

# �����
# 1������ GCC �汾�� 9.4.0
# 2��ȷ�� CPU ֧��ָ���ref��https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#expand=3828,301,2553&text=_mm256_cmpeq_epi64_mask&ig_expand=872

Intel HDSLB ¸ßÐÔÄÜËÄ²ã¸ºÔØ¾ùºâÆ÷ ¡ª »ù±¾Ô­ÀíºÍ²¿ÊðÅäÖÃ

���

ֵ��ע��������������¼�DZ����ڵ��俪�����е��Գ���ʱ�����������⣬ʵ������һ����Դ������������Ի���ͨ���������������Դ���㵼�µĴ󲿷����⡣

��󣬱�ƪ��Ҫ������ Intel HDSLB �Ļ�������ԭ���Ͳ������õķ�ʽ��ϣ���ܹ�����������˳���İ� HDSLB-DPVS ��Ŀ ���桱 ���������棬���ǽ��ٴο����������Ļ���֮�ϣ�ͨ����Intel HDSLB �������IJ㸺�ؾ����� �� �߼����Ժʹ��������������������ھ� HDSLB-DPVS �ĸ߼����ԡ������ܹ������ʹ���������������ڴ�������

С���Ƽ��Ķ�

�������������Ľ�Ϊ������Ϣ����������������ͬ���޹۵��֤ʵ��������

�����Ƶ����

����

ͬ������

����

ɨ��ά�����������ֻ��汾��

ɨ��ά����������΢�Ź��ںţ�

��վ�������������������ϴ��������ַ���İ�Ȩ���뷢�ʼ�[email protected]

��ICP��2022002427��-10 �湫��������43070202000427��© 2013~2025 haote.com ������