Re: DAOS server start failed(NVMe Scan Failed: privileged binary execution failed)

JiangYu
 

Thanks. Here is the CPU information. How do I choose a suitable CPU?

[root@Rocky-1 ~]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
BIOS Model name:           Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Stepping:            4
CPU MHz:             3600.000
CPU max MHz:         3600.0000
CPU min MHz:         1200.0000
BogoMIPS:            5599.96
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            25600K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
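For anyone landing here with the same question: the crash log in this thread shows `daos_admin` dying on the instruction bytes `c4 e2 69 f7 c0`, which encode `SHLX`, a BMI2 instruction. The Flags line above lists `avx` and `f16c` but not `bmi2`. It also lacks `avx2`, `bmi1`, `fma`, `movbe` and `abm` (LZCNT), which are the other extensions that arrived with Haswell and make up roughly the x86-64-v3 feature level. A CPU that reports all of them should at least not trip over this particular instruction. Whether the DAOS 2.2 packages actually target x86-64-v3 is an assumption here; the log only proves that BMI2 is needed. Below is a minimal Python sketch that checks a Flags line. The helper name `missing_flags` and the flag set `HASWELL_FLAGS` are hypothetical, not part of DAOS.

```python
# Haswell-era extensions checked here: roughly the x86-64-v3 feature level.
# Linux reports LZCNT as "abm"; OSXSAVE is not shown in /proc/cpuinfo, so it is skipped.
HASWELL_FLAGS = ("avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe")


def missing_flags(flags_line, wanted=HASWELL_FLAGS):
    """Return the entries of `wanted` that do not appear in an lscpu/cpuinfo flags line."""
    # Accept "Flags: ...", "flags\t\t: ..." or a bare space-separated flag list.
    if ":" in flags_line:
        flags_line = flags_line.split(":", 1)[1]
    present = set(flags_line.split())
    return [flag for flag in wanted if flag not in present]


# Excerpt of the Flags line from the E5-2680 v2 above.
ivy_bridge = "Flags: sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand"
print(missing_flags(ivy_bridge))  # ['avx2', 'bmi1', 'bmi2', 'fma', 'abm', 'movbe']
```

On a live system, pass it the `Flags:` line from `lscpu` or the `flags` line from `/proc/cpuinfo`. An empty list means none of these extensions is missing.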


Re: DAOS server start failed(NVMe Scan Failed: privileged binary execution failed)

Huang, Lei
 

ERROR: /usr/bin/daos_admin SIGILL: illegal instruction

 

Your CPU does not support certain instructions used by the daos_admin process. Could you please attach the output of “lscpu” from your computer? Thank you!

 

-lei
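The SIGILL can be traced to a specific CPU feature, which also answers whether NVMe or CPU instructions are meant. The fault is raised while the CPU executes native C code that `daos_admin` calls through cgo (`_Cfunc_nvme_discover`, the SPDK-based NVMe discovery), not inside the NVMe drive. The `instruction bytes:` line of the crash log starts with `c4 e2 69 f7 c0`. This is a three-byte VEX prefix selecting the 0F38 opcode map with an implied 66 prefix and W=0, followed by opcode F7 and ModRM C0. That is `SHLX eax, eax, edx`, a BMI2 instruction, and `bmi2` is missing from the Flags line of the E5-2680 v2 posted in this thread. The Python sketch below (hypothetical helper `decode_vex3`) decodes just enough of the encoding to show this. It only handles register operands of the 0F38 F7 family (BEXTR, SHLX, SARX, SHRX) and is not a general disassembler.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
         "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# VEX.0F38 opcode F7: the implied prefix (VEX.pp) selects the instruction.
F7_BY_PP = {
    0b00: ("bextr", "bmi1"),  # no prefix
    0b01: ("shlx", "bmi2"),   # 66
    0b10: ("sarx", "bmi2"),   # F3
    0b11: ("shrx", "bmi2"),   # F2
}


def decode_vex3(code):
    """Decode a register-form VEX.0F38 F7 instruction; return (asm text, cpu flag)."""
    if len(code) < 5 or code[0] != 0xC4:
        raise ValueError("not a three-byte VEX (C4) encoding")
    p1, p2, opcode, modrm = code[1], code[2], code[3], code[4]
    # P1: inverted R, X, B extension bits, then the 5-bit opcode map selector.
    r_ext = 0 if p1 & 0x80 else 8
    b_ext = 0 if p1 & 0x20 else 8
    opcode_map = p1 & 0x1F
    # P2: W, inverted vvvv (extra source register), L, pp (implied prefix).
    w = p2 >> 7
    vvvv = ((p2 >> 3) & 0xF) ^ 0xF
    pp = p2 & 0x3
    if opcode_map != 0b00010 or opcode != 0xF7:
        raise ValueError("only the VEX.0F38 F7 family is handled here")
    if modrm >> 6 != 0b11:
        raise ValueError("memory operands are not handled here")
    reg = ((modrm >> 3) & 0x7) + r_ext
    rm = (modrm & 0x7) + b_ext
    regs = REG64 if w else REG32
    mnemonic, flag = F7_BY_PP[pp]
    # Intel operand order: destination (ModRM.reg), source (ModRM.rm), count/control (vvvv).
    return f"{mnemonic} {regs[reg]}, {regs[rm]}, {regs[vvvv]}", flag


# "instruction bytes:" line copied from the daos_admin crash log in this thread.
log_bytes = "0xc4 0xe2 0x69 0xf7 0xc0 0x41 0x89 0x85 0xe0 0x19 0x0 0x0 0xe8 0xb1 0x98 0xfe"
code = bytes(int(tok, 16) for tok in log_bytes.split())
print(decode_vex3(code))  # ('shlx eax, eax, edx', 'bmi2')
```

For an independent check, write the bytes to a file and disassemble it with GNU objdump, for example `objdump -D -b binary -m i386:x86-64 insn.bin`.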

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of lnsyyj@...
Sent: Wednesday, November 2, 2022 10:26 PM
To: daos@daos.groups.io
Subject: [daos] DAOS server start failed(NVMe Scan Failed: privileged binary execution failed)

 


 


Re: DAOS server start failed(NVMe Scan Failed: privileged binary execution failed)

JiangYu
 

Does this refer to NVMe instructions or CPU instructions? Is my device not supported?


DAOS server start failed(NVMe Scan Failed: privileged binary execution failed)

JiangYu
 

Hello everyone,
When I start the DAOS server, I get the following output. How can I fix this?


[root@Rocky-1 ~]# /usr/share/spdk/scripts/setup.sh
0000:44:00.0 (1d78 1512): nvme -> vfio-pci
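The `setup.sh` output shows the SSD at 0000:44:00.0 being rebound from the kernel `nvme` driver to `vfio-pci`, which lets SPDK drive it from user space. A quick way to confirm the binding before starting the server is to read the device's `driver` symlink in sysfs. The Python sketch below does that. The function name `bound_driver` and its `sysfs_root` parameter (added so it can be exercised against a fake tree) are illustrative, not part of DAOS or SPDK.

```python
import os
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to a PCI device, or None if it is unbound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        # No driver symlink: the device exists but no driver is bound (or the address is wrong).
        return None
    # The symlink target ends in .../bus/pci/drivers/<driver-name>.
    return os.path.basename(os.readlink(link))
```

After `setup.sh` has run, `bound_driver("0000:44:00.0")` should return `"vfio-pci"`; before it, the same call should return `"nvme"`.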

[root@Rocky-1 ~]# cat /etc/daos/daos_server.yml 
name: daos_server
access_points: ['Rocky-1']
port: 10001
transport_config:
  allow_insecure: false
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
provider: ofi+sockets
socket_dir: /var/run/
nr_hugepages: 4096
control_log_mask: DEBUG
control_log_file: /var/log/daos_server.log
helper_log_file: /var/log/daos_admin.log
 
engines:
-
  targets: 8
  nr_xs_helpers: 0
  fabric_iface: enp3s0f1
  fabric_iface_port: 31316
  log_mask: INFO
  log_file: /var/log/daos_engine_0.log
  env_vars:
      - CRT_TIMEOUT=30
  storage:
  -
    class: ram
    scm_mount: /mnt/daos0
    scm_size: 2 #gb to allocate for tmpfs to emulate SCM
  -
    class: nvme
    bdev_list: ["0000:44:00.0"]
 

[root@Rocky-1 ~]# /usr/bin/daos_server start
DAOS Server config loaded from /etc/daos/daos_server.yml
/usr/bin/daos_server logging to file /var/log/daos_server.log
DEBUG 11:17:31.720878 start.go:90: Switching control log level to DEBUG
DEBUG 11:17:31.721131 defaults.go:92: failed to load library: unable to open a handle to the library
ERROR: unable to open a handle to the library
DEBUG 11:17:31.721209 fabric.go:875: waiting for fabric interfaces to become ready...
DEBUG 11:17:31.721299 fabric.go:892: fabric interface "enp3s0f1" is ready
DEBUG 11:17:31.721372 provider.go:87: getting topology with hwloc version 0x20100
DEBUG 11:17:31.769773 provider.go:145: adding device found at "/sys/class/net/eno1" (type network interface, NUMA node 0)
DEBUG 11:17:31.769933 provider.go:145: adding device found at "/sys/class/net/eno2" (type network interface, NUMA node 0)
DEBUG 11:17:31.770081 provider.go:145: adding device found at "/sys/class/net/eno3" (type network interface, NUMA node 0)
DEBUG 11:17:31.770212 provider.go:145: adding device found at "/sys/class/net/eno4" (type network interface, NUMA node 0)
DEBUG 11:17:31.770357 provider.go:145: adding device found at "/sys/class/net/enp3s0f0" (type network interface, NUMA node 0)
DEBUG 11:17:31.770485 provider.go:145: adding device found at "/sys/class/net/enp3s0f1" (type network interface, NUMA node 0)
DEBUG 11:17:31.770537 provider.go:125: failed to read net device: open /sys/class/net/lo/device/net: no such file or directory
DEBUG 11:17:31.770749 provider.go:264: adding virtual device at "/sys/devices/virtual/net/lo"
DEBUG 11:17:31.886150 provider.go:83: found fabric interfaces:
enp3s0f1 (providers: ofi+sockets, ofi+tcp, ofi+tcp;ofi_rxm, udp, udp;ofi_rxd)
lo (providers: ofi+sockets, ofi+tcp, ofi+tcp;ofi_rxm, udp, udp;ofi_rxd)
shm (providers: shm)
DEBUG 11:17:31.886239 provider.go:292: no cxi subsystem in sysfs
DEBUG 11:17:31.886338 fabric.go:441: unable to open a handle to the library
DEBUG 11:17:31.886419 fabric.go:511: ignoring fabric interface "shm" (shm) not found in topology
DEBUG 11:17:31.886534 fabric.go:793: discovered 2 fabric interfaces:
enp3s0f1 (interface: enp3s0f1) (providers: ofi+sockets, ofi+tcp, ofi+tcp;ofi_rxm, udp, udp;ofi_rxd)
lo (interface: lo) (providers: ofi+sockets, ofi+tcp, ofi+tcp;ofi_rxm, udp, udp;ofi_rxd)
DEBUG 11:17:31.886645 server.go:750: detected NUMA affinity 0 for engine 0
DEBUG 11:17:31.886675 server.go:757: enabling single-engine legacy core allocation algorithm
DEBUG 11:17:31.886703 server.go:420: validating config file read from "/etc/daos/daos_server.yml"
DEBUG 11:17:31.886742 server.go:443: vfio=true hotplug=false vmd=true requested in config
WARNING: Configuration includes only one access point. This provides no redundancy in the event of an access point failure.
DEBUG 11:17:31.886841 server.go:549: engine 0 fabric numa 0, storage numa 0
DEBUG 11:17:31.887914 server_utils.go:148: setting OFI_DOMAIN=enp3s0f1 for enp3s0f1
DEBUG 11:17:31.889170 server.go:377: active config saved to /var/run/.daos_server.active.yml (read-only)
DEBUG 11:17:31.889251 server.go:525: fault domain: /rocky-1
DEBUG 11:17:31.889862 server.go:236: setting core dump filter to 0x13
DEBUG 11:17:31.890615 database.go:280: set db replica addr: 192.168.1.215:10001
DEBUG 11:17:31.891076 server.go:164: time to init network: 242.45µs
DEBUG 11:17:31.891195 server_utils.go:260: allocating 4098 hugepages on each of these numa nodes: [0]
DEBUG 11:17:31.891267 ctl_storage.go:53: calling bdev provider prepare: {ForwardableRequest:{Forwarded:false} HugePageCount:4098 HugeNodes:0 CleanHugePagesOnly:false PCIAllowList: PCIBlockList: TargetUser:root Reset_:false DisableVFIO:false EnableVMD:true}
DEBUG 11:17:32.224164 server.go:164: time to prepare bdev storage: 332.967644ms
DEBUG 11:17:32.224261 ctl_storage.go:59: calling bdev provider scan: {ForwardableRequest:{Forwarded:false} DeviceList:0000:44:00.0 VMDEnabled:false BypassCache:true}
ERROR: /usr/bin/daos_admin SIGILL: illegal instruction
PC=0x7fbc78755c0e m=0 sigcode=2
signal arrived during cgo execution
instruction bytes: 0xc4 0xe2 0x69 0xf7 0xc0 0x41 0x89 0x85 0xe0 0x19 0x0 0x0 0xe8 0xb1 0x98 0xfe
 
goroutine 1 [syscall]:
runtime.cgocall(0x92049d, 0xc0001d1c20)
/usr/src/runtime/cgocall.go:158 +0x5c fp=0xc0001d1bf8 sp=0xc0001d1bc0 pc=0x408b1c
github.com/daos-stack/daos/src/control/lib/spdk._Cfunc_nvme_discover()
_cgo_gotypes.go:321 +0x49 fp=0xc0001d1c20 sp=0xc0001d1bf8 pc=0x904fc9
github.com/daos-stack/daos/src/control/lib/spdk.(*NvmeImpl).Discover(0xc0000bcd00?, {0xb267f8, 0xc000184300})
/builddir/build/BUILD/daos-2.2.0/src/control/lib/spdk/nvme.go:127 +0x54 fp=0xc0001d1cd8
ERROR: /usr/bin/daos_admin  sp=0xc0001d1c20 pc=0x9059b4
github.com/daos-stack/daos/src/control/server/storage/bdev.(*spdkBackend).Scan(0xc0000bcce0, {{0x56?}, 0xc0001d5030?, 0x20?, 0x1?})
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev/backend.go:341 +0x1b7 fp=0xc0001d1da8 sp=0xc0001d1cd8 pc=0x909f37
github.com/daos-stack/daos/src/control/server/storage/bdev.(*Provider).Scan(...)
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev/provider.go:54
main.(*bdevScanHandler).Handle(0xc000014788, {0xb267f8?, 0xc000184300}, 0xc0002e4240)
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_admin/handler.go:175 +0x27a fp=0xc0001d1e08 sp=0xc0001d1da8 pc=0x91d1fa
github.com/daos-stack/daos/src/control/pbin.(*App).handleRequest(0xc0000caae0, 0xc0002e4240)
/builddir/build/BUILD/daos-2.2.0/src/control/pbin/app.go:214 +0x62 fp=0xc0001d1e58 sp=0xc0001d1e08 pc=0x5949c2
github.com/daos-stack/daos/src/control/pbin.(*App).Run(0xc0000caae0)
/builddir/build/BUILD/daos-2.2.0/src/control/pbin/app.go:155 +0x2ed fp=0xc0001d1f50 sp=0xc0001d1e58 pc=0x59448d
main.main()
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_admin/main.go:25 +0xaf fp=0xc0001d1f80 sp=0xc0001d1f50 pc=0x91de6f
runtime.main()
/usr/src/runtime/proc.go:250 +0x212 fp=0xc0001d1fe0 sp=0xc0001d1f80 pc=0x43dd32
runtime.goexit
ERROR: /usr/bin/daos_admin ()
/usr/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0001d1fe8 sp=0xc0001d1fe0 pc=0x46b9c1
 
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?
ERROR: /usr/bin/daos_admin , 0x0?, 0x0?)
/usr/src/runtime/proc.go:363 +0xd6 fp=0xc00009efb0 sp=0xc00009ef90 pc=0x43e0f6
ERROR: /usr/bin/daos_admin runtime.goparkunlock(...)
ERROR: /usr/bin/daos_admin /usr/src/runtime/proc.go:369
runtime.forcegchelper()
 
ERROR: /usr/bin/daos_admin /usr/src/runtime/proc.go:302 +0xad fp=0xc00009efe0 sp=0xc00009efb0 pc=0x43df8d
runtime.goexit()
/usr/src/runtime/asm_amd64.s
ERROR: /usr/bin/daos_admin :1594 +0x1 fp=0xc00009efe8 sp=0xc00009efe0 pc=0x46b9c1
created by 
ERROR: /usr/bin/daos_admin runtime.init.6
/usr/src/runtime/proc.go:290 +0x25
ERROR: /usr/bin/daos_admin 
goroutine 3 [GC sweep wait]:
runtime.gopark(0x0
ERROR: /usr/bin/daos_admin ?, 0x0?, 0x0?, 0x0?
ERROR: /usr/bin/daos_admin , 0x0?)
/usr/src/runtime/proc.go:363 +
ERROR: /usr/bin/daos_admin 0xd6 fp=0xc00009f790 sp=0xc00009f770 pc=0x43e0f6
ERROR: /usr/bin/daos_admin runtime.goparkunlock(...)
/usr/src/runtime/proc.go:369
runtime.bgsweep(0x0?)
 
ERROR: /usr/bin/daos_admin /usr/src/runtime/mgcsweep.go:278 +0x8e fp=0xc00009f7c8 sp=0xc00009f790 pc=0x429c2e
runtime.gcenable.func1()
/usr/src/runtime/mgc.go:178 +
ERROR: /usr/bin/daos_admin 0x26 fp=0xc00009f7e0 sp=0xc00009f7c8 pc=0x41e8c6
runtime.goexit()
/usr/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00009f7e8 sp=0xc00009f7e0 pc=0x46b9c1
created by runtime.gcenable
/usr/src/runtime/mgc.go:
ERROR: /usr/bin/daos_admin 178 +0x6b
 
goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc0000c6000?, 0xb1e1e8?
ERROR: /usr/bin/daos_admin , 0x1?, 0x0?, 0x0?)
/usr/src/runtime/proc.go:363 +
ERROR: /usr/bin/daos_admin 0xd6 fp=0xc00009ff70 sp=0xc00009ff50 pc=0x43e0f6
runtime.goparkunlock(...)
ERROR: /usr/bin/daos_admin /usr/src/runtime/proc.go:369
runtime.(*scavengerState).park(0x10a3a20)
/usr/src/runtime/mgcscavenge.go:389 +0x53 fp=
ERROR: /usr/bin/daos_admin 0xc00009ffa0 sp=0xc00009ff70 pc=0x427cd3
runtime.bgscavenge(0x0?)
/usr/src/runtime/mgcscavenge.go:
ERROR: /usr/bin/daos_admin 617 +0x45 fp=0xc00009ffc8 sp=0xc00009ffa0 pc=0x4282a5
runtime.gcenable.func2
ERROR: /usr/bin/daos_admin ()
/usr/src/runtime/mgc.go:179 +0x26 fp=0xc00009ffe0 sp=0xc00009ffc8 pc=0x41e866
ERROR: /usr/bin/daos_admin 
runtime.goexit()
/usr/src/runtime/asm_amd64.s:1594 +0x1 fp=
ERROR: /usr/bin/daos_admin 0xc00009ffe8 sp=0xc00009ffe0 pc=0x46b9c1
created by runtime.gcenable
/usr/src/runtime/mgc.go:179
ERROR: /usr/bin/daos_admin  +0xaa
 
goroutine 5 [finalizer wait]:
runtime.gopark(0x10a4520?, 
ERROR: /usr/bin/daos_admin 0xc000007860?, 0x0?, 0x0?, 0xc00009e770?)
/usr/src/runtime/proc.go:363
ERROR: /usr/bin/daos_admin  +0xd6 fp=0xc00009e628 sp=0xc00009e608 pc=0x43e0f6
runtime.goparkunlock(...)
 
ERROR: /usr/bin/daos_admin /usr/src/runtime/proc.go:369
runtime.runfinq()
/usr/src/runtime/mfinal.go:
ERROR: /usr/bin/daos_admin 180 +0x10f fp=0xc00009e7e0 sp=0xc00009e628 pc=0x41d9cf
runtime.goexit()
/usr/src/runtime/asm_amd64.s:
ERROR: /usr/bin/daos_admin 1594 +0x1 fp=0xc00009e7e8 sp=0xc00009e7e0 pc=0x46b9c1
created by runtime.createfing
/usr/src/runtime/mfinal.go:157 +0x45
ERROR: /usr/bin/daos_admin 
rax    0x1
rbx    0x2492240
rcx    0x7fbc78da4e60
rdx    0x0
rdi    0x2492240
rsi    
ERROR: /usr/bin/daos_admin 0x7fbc78da1af0
rbp    0x2000003e7240
rsp    0x7ffc5180b120
r8     0x7fbc78da2460
r9     0x0
r10    0x70000000004
r11    0x0
ERROR: /usr/bin/daos_admin 
r12    0x202001000000
r13    0x2000003e7240
r14    0x7ffc5180b150
r15    0x0
rip    0x7fbc78755c0e
rflags 
ERROR: /usr/bin/daos_admin 0x13246
cs     0x33
fs     0x0
gs     0x0
DEBUG 11:17:32.627353 exec.go:188: discarding garbage response ""
DEBUG 11:17:32.627423 exec.go:188: discarding garbage response ""
DEBUG 11:17:32.627466 exec.go:188: discarding garbage response ""
DEBUG 11:17:32.627498 exec.go:188: discarding garbage response ""
DEBUG 11:17:32.627541 exec.go:188: discarding garbage response ""
ERROR: NVMe Scan Failed: privileged binary execution failed: Unable to decode response after 5 attempts
DEBUG 11:17:32.627658 server.go:164: time to scan bdev storage: 403.426657ms
DEBUG 11:17:32.627726 pubsub.go:259: stopping event loop
DEBUG 11:17:32.627853 main.go:69: Unable to decode response after 5 attempts
github.com/daos-stack/daos/src/control/pbin.ExecReq
/builddir/build/BUILD/daos-2.2.0/src/control/pbin/exec.go:197
github.com/daos-stack/daos/src/control/pbin.(*Forwarder).SendReq
/builddir/build/BUILD/daos-2.2.0/src/control/pbin/forwarding.go:100
github.com/daos-stack/daos/src/control/server/storage.(*BdevAdminForwarder).SendReq
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev.go:579
github.com/daos-stack/daos/src/control/server/storage.(*BdevAdminForwarder).Scan
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev.go:586
github.com/daos-stack/daos/src/control/server/storage.scanBdevs
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/provider.go:483
github.com/daos-stack/daos/src/control/server/storage.(*Provider).ScanBdevs
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/provider.go:493
github.com/daos-stack/daos/src/control/server.(*StorageControlService).NvmeScan
/builddir/build/BUILD/daos-2.2.0/src/control/server/ctl_storage.go:60
github.com/daos-stack/daos/src/control/server.scanBdevStorage
/builddir/build/BUILD/daos-2.2.0/src/control/server/server_utils.go:297
github.com/daos-stack/daos/src/control/server.(*server).addEngines
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:306
github.com/daos-stack/daos/src/control/server.Start
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:549
main.(*startCmd).Execute
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/start.go:147
main.parseOpts.func1
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:126
github.com/jessevdk/go-flags.(*Parser).ParseArgs
/builddir/build/BUILD/daos-2.2.0/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314
main.parseOpts
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:134
main.main
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:151
runtime.main
/usr/src/runtime/proc.go:250
runtime.goexit
/usr/src/runtime/asm_amd64.s:1594
privileged binary execution failed
github.com/daos-stack/daos/src/control/pbin.(*Forwarder).SendReq
/builddir/build/BUILD/daos-2.2.0/src/control/pbin/forwarding.go:105
github.com/daos-stack/daos/src/control/server/storage.(*BdevAdminForwarder).SendReq
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev.go:579
github.com/daos-stack/daos/src/control/server/storage.(*BdevAdminForwarder).Scan
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/bdev.go:586
github.com/daos-stack/daos/src/control/server/storage.scanBdevs
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/provider.go:483
github.com/daos-stack/daos/src/control/server/storage.(*Provider).ScanBdevs
/builddir/build/BUILD/daos-2.2.0/src/control/server/storage/provider.go:493
github.com/daos-stack/daos/src/control/server.(*StorageControlService).NvmeScan
/builddir/build/BUILD/daos-2.2.0/src/control/server/ctl_storage.go:60
github.com/daos-stack/daos/src/control/server.scanBdevStorage
/builddir/build/BUILD/daos-2.2.0/src/control/server/server_utils.go:297
github.com/daos-stack/daos/src/control/server.(*server).addEngines
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:306
github.com/daos-stack/daos/src/control/server.Start
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:549
main.(*startCmd).Execute
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/start.go:147
main.parseOpts.func1
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:126
github.com/jessevdk/go-flags.(*Parser).ParseArgs
/builddir/build/BUILD/daos-2.2.0/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314
main.parseOpts
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:134
main.main
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:151
runtime.main
/usr/src/runtime/proc.go:250
runtime.goexit
/usr/src/runtime/asm_amd64.s:1594
NVMe Scan Failed
github.com/daos-stack/daos/src/control/server.scanBdevStorage
/builddir/build/BUILD/daos-2.2.0/src/control/server/server_utils.go:302
github.com/daos-stack/daos/src/control/server.(*server).addEngines
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:306
github.com/daos-stack/daos/src/control/server.Start
/builddir/build/BUILD/daos-2.2.0/src/control/server/server.go:549
main.(*startCmd).Execute
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/start.go:147
main.parseOpts.func1
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:126
github.com/jessevdk/go-flags.(*Parser).ParseArgs
/builddir/build/BUILD/daos-2.2.0/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314
main.parseOpts
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:134
main.main
/builddir/build/BUILD/daos-2.2.0/src/control/cmd/daos_server/main.go:151
runtime.main
/usr/src/runtime/proc.go:250
runtime.goexit
/usr/src/runtime/asm_amd64.s:1594
ERROR: NVMe Scan Failed: privileged binary execution failed: Unable to decode response after 5 attempts
 


[DUG'22] Save the date & call for presentations!

Kudryavtsev, Andrey O <andrey.o.kudryavtsev@...>
 

Greetings DAOS Community!

 

SC22 is around the corner and it means we have a special event coming again. The Intel DAOS team invites you to join us for the 6th annual DAOS User Group (DUG22). This will be the first in-person user group since the pandemic.

 

The agenda is not yet finalized, and we’re inviting community members to submit presentation proposals. Please send brief submissions to daos-info@daos.groups.io and copy me.

If you have any feedback to share (what you most want to see, the types of presentations, the areas to cover, or the speakers you would like to hear), don’t hesitate to contact me directly. This is the event we make for you!

 

The event will take place on November 14th from 9am until 1pm. We did our best to avoid overlapping with other activities, which is why Monday was selected. We hope it fits your plans and doesn’t conflict with other workshops and tutorials that day.

 

Event Location: Venetian Room, Fairmont Hotel (1717 N Akard St, Dallas, TX 75201), which is within one mile of the Kay Bailey Hutchison Convention Center.

 

Additional details will be shared once the agenda is finalized. We hope to see you all in person.

 

Best Regards,

Andrey, Kelsey, Johann and the rest of the DAOS team. 

 

-- 

Andrey Kudryavtsev, 

DAOS Product Manager

Intel Corp. 

 


Community Roadmap Update

Lombardi, Johann
 

Hi there,

 

Please note that the DAOS community roadmap has been updated on the wiki. Those changes were required to accelerate support for the “Non-PMem phase 1 and phase 2” I/O path (labelled md_on_ssd in jira, see here for more info) and also better align with our upcoming deployments/projects. Please let us know if you have any questions or comments.

 

Best regards,

Johann

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 5 208 026.16 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


DAOS Community Update / Oct'22

Lombardi, Johann
 

Hi there,

Please find below the DAOS community newsletter for October 2022. A copy of this newsletter is also available on the wiki.

Past Events

Upcoming Events

Release

  • Current stable release is 2.2.0 released on Oct 21. See https://docs.daos.io/v2.2/ and https://packages.daos.io/v2.2/ for more information. Please see the release notes for more details.
  • With the release of 2.2.0, 2.0.x releases are declared end-of-life.
  • Branches:
    • release/2.2 is the release branch for the stable 2.2 release. Latest bug fix release is 2.2.0 (v2.2.0 tag).
    • Master is the development branch for the future 2.4 release. Latest test build is 2.3.101 (v2.3.101-tb tag) including the EC rotation feature.
  • Major recent changes on release/2.2 (future 2.2 release):
    • Fix VMD domain parsing
    • Fix PS replica leaks
    • Fix 2.0/2.2 interoperability issue with pool RF
    • Fix assertion failure in dc_cont_free()
    • Fix race condition in cart
    • Address memory corruption during key_query
    • Several fixes for EC migration
    • Check and reset NONEXIST in iter_next and probe
    • Bump protobuf-java from 3.16.1 to 3.16.3
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Fix a bug in key enumeration associated with ads[0].kd_key_len
    • Add support for rf_lvl to cont create api on pydaos
    • Enable EC parity rotation by default
    • Add missing void in dfs_init/fini declaration
    • Remove RPC post increment restriction preventing extra RPC handles from being posted upon exhaustion
    • Re-enable custom RPC timeout in RDB
    • Remove ability to build w/o stdatomic.h
    • Add bulk and vos latency to metrics
    • Skip reclaim job during merge
    • Fix some DTX visibility issues
    • Allow daos_server network scan to run w/o config
    • Update DAOS to use UCX 1.13 and disable UCX multi-rail support
    • Don't hold lock for d_hhash_link_get/putref
    • Add dmg system exclude
    • Fix auto object class selection for RP hints for arrays
    • Don't set pool destroy state if service is not up
    • Improve PS reconfigurations
    • Add IOPS info to daos pool autotest
    • Fix swim paranoia
    • Reject invalid number of pool create ranks
    • Add config option to agent to ignore interfaces
    • Several fixes to EC parity rotation
    • Add support for pull request template
    • Fix a number of python flake issues
    • Add ability to run server under valgrind
    • Add NUMA affinity to tmpfs mount options
    • Add pool svc list to property query
    • Bypass checks in pool evict rdb tx update
    • Several IV fixes
    • Remove CentOS7 leftovers
    • Add DFS readdirplus API
    • Several checksum scrubbing upgrade fixes
    • Rename privileged helper from daos_admin to daos_server_helper
    • Rename rf and rf_level properties to rd_fac and rd_lvl
    • Add rebuild version to pool query
    • Bump garbage collection ULT stack size
  • What is coming:
    • 2.2.1 bug fix release
    • 2.4.0 feature freeze

R&D

  • Major features under development:
    • VOS on SPDK blob
      • Detailed design documented here: Metadata on SSDs, including the WAL layout (Meta blob and WAL blob layout)
      • All development and testing tasks are tracked under DAOS-11040 for phase 1.
      • Changes to the yaml file implemented. WAL infrastructure and metadata blob creation landed.
      • PMDK-based allocator extracted and integrated into DAOS. Early performance evaluation in progress.
      • Branch: feature/vos-on-blob
      • Target release: 2.4 (phase 1 preview)
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI APPs
      • FUSE version updated for EL8 for readdir caching support; this is not needed on Leap, whose FUSE version was already recent enough.
      • FUSE kernel readdir is now enabled; dfuse readdir is still a work in progress.
      • PR: https://github.com/daos-stack/daos/pull/6776
      • Target release: 2.4
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • Tests for ddb (low level debugger utility similar to debugfs for ext4) landed
      • Testing for the dmg checker landed.
      • Testing for pass 3 and 4 under development.
      • Pass 4 for container recovery completed.
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims at supporting multiple network providers in the engine
      • Branch is feature complete now and testing is underway
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • Fix to avoid putting shards on the same domain landed
      • Branch: feature/perf_dom
      • Target release: 2.8
  • Pathfinding:
    • DAOS Pipeline API for active storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at IXPUG conference.
    • OPX provider support in collaboration with Cornelis Networks
      • OPX provider merged upstream in libfabric
      • Provider supported in latest mercury version
      • Changes to DAOS to enable OPX as part of the build in progress
    • GPU data path optimizations
  • I/O Middleware / Framework Support:

News

  • Congratulations to the Seagate team for the integration of the DAOS backend to the Rados Gateway (RGW)!
  • Updated DAOS roadmap including changes for the md_on_ssd phase 1 and phase 2 project to be available soon.

 



Announcement: DAOS 2.2 is generally available

Poddubnyy, Ivan
 

The DAOS team would like to announce the release of DAOS Version 2.2.

 

It is a major release containing the following new features and improvements:

 

  • Rocky Linux 8 and Alma Linux 8 support has been added
  • CentOS Linux 8 support is removed
  • Support for the libfabric/tcp provider is added. It replaces libfabric/sockets
  • UCX support has been added (Technology Preview)
  • Interoperability of DAOS 2.2 with DAOS 2.0
  • Intel VMD devices are now supported in the control plane
  • POSIX containers (DFS) now support file modification time (mtime)

 

The release also contains a number of bug fixes and stability improvements.

 

With the release of DAOS 2.2, the previous version – DAOS 2.0.3 – is now declared End-Of-Life.

 

The complete list of changes can be found here: https://docs.daos.io/v2.2/release/release_notes/

 

There are several resources available for the release:

 

RPM Repositories: https://packages.daos.io/v2.2/

Admin Guide: https://docs.daos.io/v2.2/admin/hardware/

User Guide: https://docs.daos.io/v2.2/user/workflow/

Architecture Overview: https://docs.daos.io/v2.2/overview/architecture/

Source Code: https://github.com/daos-stack/daos/releases/

 

As always, feel free to report any issues you find with the release on this mailing list, in our JIRA bug tracking system at https://daosio.atlassian.net/jira, or on our Slack channel at https://daos-stack.slack.com.

 

 

Thank you,

 

Ivan Poddubnyy

DAOS Customer Enablement and Support Manager

Super Compute Storage Architecture and Development Division

Intel

 


Re: How to install DAOS on ARM64 platform

Groot
 

Yes, /root/huzj/daos/install/lib64/daos_srv/librdb.so exists.
The environment variables are set as described in https://docs.daos.io/v2.0/QSG/build_from_scratch/#environment-setup:
export daospath=/root/huzj/daos
export CPATH=${daospath}/install/include/:$CPATH
export PATH=${daospath}/install/bin/:${daospath}/install/sbin:$PATH
The server config file is:
#For a single-server system
 
name: daos_server
access_points: ['master']
port: 10001
 
 
provider: ofi+sockets
control_log_file: /tmp/daos_server.log
transport_config:
  allow_insecure: false
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
 
telemetry_port: 9191
 
engines:
  -
    rank: 1
    pinned_numa_node: 0
    targets: 2
    nr_xs_helpers: 4
    fabric_iface: enp3s0
    fabric_iface_port: 31416
    log_file: /tmp/daos_engine.0.log
 
    env_vars:
      - FI_SOCKETS_MAX_CONN_RETRY=1
      - FI_SOCKETS_CONN_TIMEOUT=2000
    # Storage definitions (one per tier)
    storage:
      -
        # When scm_class is set to ram, tmpfs will be used to emulate SCM.
        # The size of ram is specified by scm_size in GB units.
        class: ram
        scm_size: 2
        scm_mount: /mnt/daos
 
Thanks.
Groot


Re: How to install DAOS on ARM64 platform

Faccini, Bruno
 

Can you check if /root/huzj/daos/install/lib64/daos_srv/librdb.so exists?

If not, is a log for this build available?

Also, what are the environment variables for the session you are using to start the server/engine?

Lastly, can you attach your server/engine config file?

Thanks in advance for your help,

Bruno.

 

From: <daos@daos.groups.io> on behalf of Groot <kukougu@...>
Reply to: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday 30 September 2022 at 10:48
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to install DAOS on ARM64 platform

 

Since I built DAOS from source on the ARM64 platform, I use daos_server start to start the DAOS server. It failed with the error below; after I created the /var/run/daos_server directory with mkdir, daos_server started successfully.
$ daos_server start 
ERROR: dRPC server setup: missing socket directory /var/run/daos_server: stat /var/run/daos_server: no such file or directory

However, daos_engine.0 reports an error in /tmp/daos_engine.0.log after formatting the storage:

09/30-16:15:24.84 slave1 DAOS[1213401/-1/0] server ERR  src/engine/module.c:90 dss_module_load() cannot load librdb.so: /root/huzj/daos/install/bin/../lib64/daos_srv/librdb.so: undefined symbol: ds_obj_enum_pack

09/30-16:15:24.84 slave1 DAOS[1213401/-1/0] server ERR  src/engine/init.c:231 modules_load() Failed to load module rdb: -1003

Thanks a lot.
Groot



Re: How to install DAOS on ARM64 platform

Groot
 

Since I built DAOS from source on the ARM64 platform, I use daos_server start to start the DAOS server. It failed with the error below; after I created the /var/run/daos_server directory with mkdir, daos_server started successfully.
$ daos_server start 
ERROR: dRPC server setup: missing socket directory /var/run/daos_server: stat /var/run/daos_server: no such file or directory

However, daos_engine.0 reports an error in /tmp/daos_engine.0.log after formatting the storage:
09/30-16:15:24.84 slave1 DAOS[1213401/-1/0] server ERR  src/engine/module.c:90 dss_module_load() cannot load librdb.so: /root/huzj/daos/install/bin/../lib64/daos_srv/librdb.so: undefined symbol: ds_obj_enum_pack
09/30-16:15:24.84 slave1 DAOS[1213401/-1/0] server ERR  src/engine/init.c:231 modules_load() Failed to load module rdb: -1003
Thanks a lot.
Groot


Re: How to install DAOS on ARM64 platform

samir.raval@...
 

Hello Groot,

Can you check whether the servers are ready using "dmg system query"? It looks like the DAOS engine is not up.

Please also provide /tmp/daos_server.log and /tmp/daos_engine.0.log.

Thank You
SAMIR


Re: How to install DAOS on ARM64 platform

Groot
 

Thanks a lot. We compiled successfully using the master branch.
But we face another problem when using RAM (tmpfs) to emulate SCM. We set up the server config file exactly as in https://github.com/daos-stack/daos/blob/master/utils/config/examples/daos_server_local.yml
and get the following error when creating a pool:
$ dmg pool create --size 1G Pool1
Creating DAOS pool with automatic storage allocation: 1.0 GB total, 6,94 tier ratio
ERROR: dmg: pool create failed: rpc error: code = Unknown desc = pool request contains zero target ranks
We also get the same error when using RAM to emulate SCM on an x86 platform.
Any ideas?
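For reference when reading the dmg output above: "6,94 tier ratio" is the split of the requested --size between the SCM and NVMe tiers, expressed as percentages (6% SCM, 94% NVMe). The Go arithmetic sketch below assumes that "1.0 GB" means 10^9 bytes and uses an illustrative function name, splitTiers; it only explains the numbers and does not by itself diagnose the "zero target ranks" error.

```go
package main

import "fmt"

// splitTiers divides a total pool size between the SCM and NVMe tiers,
// given the SCM share as a whole percentage (the "6" in a "6,94" ratio).
// The NVMe tier receives whatever remains, so the two always sum to total.
func splitTiers(totalBytes, scmPercent uint64) (scm, nvme uint64) {
	scm = totalBytes * scmPercent / 100
	nvme = totalBytes - scm
	return scm, nvme
}

func main() {
	// 1 GB treated as 10^9 bytes for this illustration.
	scm, nvme := splitTiers(1_000_000_000, 6)
	fmt.Printf("SCM: %d bytes, NVMe: %d bytes\n", scm, nvme)
}
```

With these assumptions the program prints "SCM: 60000000 bytes, NVMe: 940000000 bytes", i.e. roughly 60 MB of the 1 GB request lands on SCM and the rest is requested from NVMe.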

Thanks.
Groot


Re: How to install DAOS on ARM64 platform

Nabarro, Tom
 

Hello Groot,

 

What version of DAOS source are you compiling from?

Please try with current master branch if you are not already.

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Groot
Sent: Sunday, September 25, 2022 2:02 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to install DAOS on ARM64 platform

 

Thanks a lot.
But I can't install ipmctl on the ARM64 platform. I tried to compile it from source but got an error about a missing cpuid.h file.
I also get a missing nvm_management.h error while compiling DAOS, as shown below:

github.com/daos-stack/daos/src/control/lib/ipmctl

lib/ipmctl/nvm.go:17:10: fatal error: nvm_management.h: No such file or directory

So how do I compile DAOS on the ARM64 platform? Could you give some details?
Thanks.
Groot


Re: How to install DAOS on ARM64 platform

Groot
 

Thanks a lot.
But I can't install ipmctl on the ARM64 platform. I tried to compile it from source but got an error about a missing cpuid.h file.
I also get a missing nvm_management.h error while compiling DAOS, as shown below:
github.com/daos-stack/daos/src/control/lib/ipmctl
lib/ipmctl/nvm.go:17:10: fatal error: nvm_management.h: No such file or directory

So how do I compile DAOS on the ARM64 platform? Could you give some details?
Thanks.
Groot


Re: How to install DAOS on ARM64 platform

Lombardi, Johann
 

Hi there,

 

Yes, the process is the same except that we don’t provide RPMs. Please use the master branch which is regularly built and (basically) tested on ARM64.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Groot <kukougu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday 23 September 2022 at 11:22
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] How to install DAOS on ARM64 platform

 

How do I install DAOS on the ARM64 platform? Is it just like the process on the x86 platform?

Thanks a lot.
Groot



How to install DAOS on ARM64 platform

Groot
 

How do I install DAOS on the ARM64 platform? Is it just like the process on the x86 platform?

Thanks a lot.
Groot


Re: system fault testing

Lombardi, Johann
 

Hi Chuck,

 

We are slowly moving all our (internal) design documentation and test plans to the public wiki. Let me share with you the one related to system fault testing.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "Tuffli, Chuck" <chuck.tuffli@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Saturday 3 September 2022 at 00:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] system fault testing

 

The DAOS documentation has a good overview of its fault model. We are starting to experiment with various types of failures (pull a drive, pull a network cable, pull a power cord) and are curious what testing has been done in this area. Is there a test plan someone could share?

 

--chuck



DAOS Community Update / Sep'22

Lombardi, Johann
 

Hi there,

Please find below the DAOS community newsletter for September 2022. A copy of this newsletter is also available on the wiki.

Past Events

  • Flash Memory Summit’22: 3rd Workshop on Extreme-Scale Storage and Analysis (August 2nd-4th)
    Requirements and Challenges Associated with the World's Fastest Storage Platform
    https://www.flashmemorysummit.com
    Jeff Olivier (Intel)

Upcoming Events

  • IXPUG Annual Conference 2022 (Sep 29)
    The Evolution of Storage and Memory and the DAOS Role in It
    Kevin Harms (ANL)
    Andrey Kudryavtsev (Intel)
  • SuperCheck-SC'22 (Nov 14)
    DAOS: Nextgen Storage Stack for HPC and AI
    Johann Lombardi (Intel)
  • SC'22 BoF (Nov 15-17)
    DAOS Storage Community BoF
    Kevin Harms (ANL)
    Michael Hennecke (Intel)
    Dean Hildebrand (Google)
    Panagiotis Adamidis (DKRZ)
  • SC'22 BoF (Nov 15-17)
    The Storage Tower of Babel? ... Not! Actually, maybe?
    Philippe Deniel (CEA)
    John Bent (Seagate)
    Tiago Quintino (ECMWF)
    Johann Lombardi (Intel)
  • SC'22 Tutorial (Nov 13-14)
    Emerging Storage Interfaces: DAOS and PMDK
    Adrian Jackson (EPCC)
    Mohamad Chaarawi (Intel)
    Johann Lombardi (Intel) 
  • 6th annual DAOS User Group (Nov/Dec'22)

Release

  • Current stable release is 2.0.3. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
    2.0.3 includes several fixes for ARM64 support, erasure code and pool operations. Please see the release notes for more details.
  • Branches:
    • release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.3 (v2.0.3 tag).
    • release/2.2 is the development branch for the future 2.2 release. The first release candidate has been created (v2.2.0-rc1 tag).
    • Master is the development branch for the future 2.4 release. Latest test build is 2.3.100 (v2.3.100-tb tag). New build including EC parity rotation feature imminent.
  • Major recent changes on release/2.0 (bugfix release):
    • Several Coverity fixes
    • Fix incorrect assertion failure hit when running soak testing with LAMMPS application
    • Bump hadoop-common version to 3.3.3
    • Several documentation fixes
    • Several test fixes.
  • Major recent changes on release/2.2 (future 2.2 release):
    • All patches listed in the 2.0 section above.
    • Update mercury to 2.2.0
    • Update pmdk to 1.12.1
    • Trigger DTX reindex before DTX resync
    • Fix issue with srx_disabled config field
    • Fix mtime set to not rely on DAOS HLC
    • Improve DAOS build preprocessing steps
    • Fix java jar build instructions
    • Reduce lock contention on hash lock in libdaos to increase multi-thread performance
    • Set UCX_IB_FORK_INIT env var in the engine
    • Add new metrics to track EC full stripe and partial updates
    • Improve dfs_setattr to re-sample mtime on file size changes
    • Add UCX documentation
    • Do not use stable epoch for reclaim
    • Fix dfs_open for directories without O_EXCL
    • Add support for 2.0/2.2 agent interoperability
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Add prefix to notice logging in the control plane
    • Add githook install script
    • Move NLT and unit tests to el8
    • Fix a race in dc_tx_get_epoch
    • Fix name match in daos_oclass_name2id()
    • Add ability for engine to manage its own ABT stack via mmap() to pro-actively detect stack overrun
    • Limit number of outstanding I/Os to NVMe device
    • Remove indirect link for ISA-L
    • Store scan objects target ID during rebuild to avoid excessive iteration when sending object list
    • Create a single bulk handle per DMA chunk and share the same handle for all bulk transfer against the same DMA chunk.
    • Retry map_fresh on more errors
    • Refactor daos_server standalone command surface
    • Reject read/write hole in bio
    • Run NLT on ARM64 self-hosted runners
    • Fix gap in EC rotation patch in tx classify
    • Replace SWIM D_CIRCLEQ with a hash table.
    • Fix VMD domain parsing
    • Accept positional args in dfuse command to support mtab entries
    • Set EC cell alignment to 32 bytes
    • Disallow IP address with negative port in the control plane
  • What is coming:
    • 2.2.0 GA
    • 2.4.0 feature freeze

R&D

  • Major features under development:
    • VOS on SPDK blob
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI APPs
      • FUSE version updated for EL8 for readdir caching support; this is not needed on Leap, whose FUSE version was already recent enough.
      • FUSE kernel readdir is now enabled; dfuse readdir is still a work in progress.
      • PR: https://github.com/daos-stack/daos/pull/6776
      • Target release: 2.4
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • Tests for ddb (low level debugger utility similar to debugfs for ext4) under review
      • Testing for the dmg checker under development
      • Pass 4 for container recovery completed.
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims at supporting multiple network providers in the engine
      • Branch is feature complete now and testing is underway
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • Fix to avoid putting shards on the same domain landed
      • Branch: feature/perf_dom
      • Target release: 2.8 
  • Pathfinding:
    • DAOS Pipeline API for active storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at IXPUG conference.
    • OPX provider support in collaboration with Cornelis Networks
      • OPX provider merged upstream in libfabric
      • Provider supported in latest mercury version
      • Changes to DAOS to enable OPX as part of the build in progress
    • GPU data path optimizations
  • I/O Middleware / Framework Support

News

  • In addition to building on the ARM platform on Ubuntu 22.04, AlmaLinux 8 and Leap 15, some basic tests (called NLT, which stands for Node Local Tests) are now run on every PR landing. See this link for more information. Thanks again to Linaro and Croit for their support. The next step is to run unit tests.
  • Congrats to Croit and DenisB for merging the SPDK DAOS bdev upstream!
  • The DAOS community BoF for SC'22 has been accepted!

 



system fault testing

Tuffli, Chuck
 

The DAOS documentation has a good overview of its fault model. We are starting to experiment with various types of failures (pull a drive, pull a network cable, pull a power cord) and are curious what testing has been done in this area. Is there a test plan someone could share?

--chuck
