hayashi-erika@...
$ daos_server start -o daos/utils/config/examples/daos_server_local.yml
DAOS Server config loaded from /home/USER/daos/utils/config/examples/daos_server_local.yml
daos_server logging to file /tmp/daos_server.log
DEBUG 18:18:04.617736 start.go:89: Switching control log level to DEBUG
DEBUG 18:18:04.742295 netdetect.go:279: 2 NUMA nodes detected with 18 cores per node
DEBUG 18:18:04.743438 netdetect.go:284: initDeviceScan completed. Depth -5, numObj 11, systemDeviceNames [lo enp94s0f0 enp94s0f1 eno1 eno2 ib0 ib1 virbr0 virbr0-nic], hwlocDeviceNames [eno1 eno2 card0 controlD64 ib0 mlx5_0 enp94s0f0 enp94s0f1 sda ib1 mlx5_1]
DEBUG 18:18:04.743534 netdetect.go:913: Calling ValidateProviderConfig with ib0, ofi+verbs;ofi_rxm
DEBUG 18:18:04.743598 netdetect.go:964: Input provider string: ofi+verbs;ofi_rxm
DEBUG 18:18:04.744605 netdetect.go:995: There are 0 hfi1 devices in the system
DEBUG 18:18:04.744674 netdetect.go:928: Device ib0 supports provider: ofi+verbs;ofi_rxm
DEBUG 18:18:04.744740 netdetect.go:913: Calling ValidateProviderConfig with ib1, ofi+verbs;ofi_rxm
DEBUG 18:18:04.744775 netdetect.go:964: Input provider string: ofi+verbs;ofi_rxm
DEBUG 18:18:04.745406 netdetect.go:995: There are 0 hfi1 devices in the system
DEBUG 18:18:04.745486 netdetect.go:928: Device ib1 supports provider: ofi+verbs;ofi_rxm
DEBUG 18:18:04.746310 server.go:401: Active config saved to /home/USER/daos/utils/config/examples/.daos_server.active.yml (read-only)
DEBUG 18:18:04.746397 server.go:113: fault domain: /10.0.0.0_104810
DEBUG 18:18:04.746757 server.go:163: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:128 PCIWhitelist: PCIBlacklist: TargetUser:USER ResetOnly:false DisableVFIO:true DisableVMD:true}
DEBUG 18:18:12.057940 database.go:246: set db replica addr: 127.0.0.1:10001
DEBUG 18:18:12.190386 netdetect.go:279: 2 NUMA nodes detected with 18 cores per node
DEBUG 18:18:12.191763 netdetect.go:284: initDeviceScan completed. Depth -5, numObj 11, systemDeviceNames [lo enp94s0f0 enp94s0f1 eno1 eno2 ib0 ib1 virbr0 virbr0-nic], hwlocDeviceNames [eno1 eno2 card0 controlD64 ib0 mlx5_0 enp94s0f0 enp94s0f1 sda ib1 mlx5_1]
DEBUG 18:18:12.191921 netdetect.go:669: Searching for a device alias for: ib0
DEBUG 18:18:12.192029 netdetect.go:693: Device alias for ib0 is mlx5_0
DEBUG 18:18:12.192225 class.go:196: spdk : bdev_list empty in config, no nvme.conf generated for server
DEBUG 18:18:12.192444 netdetect.go:669: Searching for a device alias for: ib1
DEBUG 18:18:12.192561 netdetect.go:693: Device alias for ib1 is mlx5_1
DEBUG 18:18:12.192653 class.go:196: spdk : bdev_list empty in config, no nvme.conf generated for server
DAOS Control Server v1.1.3 (pid 59341) listening on 0.0.0.0:10001
DEBUG 18:18:15.403381 instance_exec.go:35: instance 0: checking if storage is formatted
Checking DAOS I/O Engine instance 0 storage ...
DEBUG 18:18:15.403441 instance_exec.go:35: instance 1: checking if storage is formatted
DEBUG 18:18:15.403477 instance_storage.go:74: /mnt/daos: checking formatting
Checking DAOS I/O Engine instance 1 storage ...
DEBUG 18:18:15.403535 instance_storage.go:74: /mnt/daos1: checking formatting
DEBUG 18:18:19.835749 instance_storage.go:90: /mnt/daos1 (dcpm) needs format: false
DEBUG 18:18:19.835871 instance_storage.go:121: instance 1: no SCM format required; checking for superblock
DEBUG 18:18:19.835961 instance_superblock.go:90: /mnt/daos1: checking superblock
DEBUG 18:18:19.837041 instance_storage.go:127: instance 1: superblock not needed
DEBUG 18:18:19.837116 instance_exec.go:62: instance start()
DEBUG 18:18:19.837154 class.go:223: skip bdev conf file generation as no path set
SCM @ /mnt/daos1: 799 GB Total/783 GB Avail
DEBUG 18:18:19.837460 instance_exec.go:79: instance 1: awaiting DAOS I/O Engine init
DEBUG 18:18:19.837696 exec.go:72: daos_engine:1 args: [-t 1 -x 0 -f 17 -g daos_server -d /var/run/daos_server -s /mnt/daos1 -I 1]
DEBUG 18:18:19.837800 exec.go:73: daos_engine:1 env: [CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=0 D_LOG_MASK=DEBUG D_LOG_FILE=/tmp/daos_engine.1.log CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm OFI_INTERFACE=ib1 OFI_PORT=31417 OFI_DOMAIN=mlx5_1]
Starting I/O server instance 1: /home/USER/daos/install/bin/daos_engine
DEBUG 18:18:19.846734 instance_storage.go:90: /mnt/daos (dcpm) needs format: false
DEBUG 18:18:19.846816 instance_storage.go:121: instance 0: no SCM format required; checking for superblock
DEBUG 18:18:19.846873 instance_superblock.go:90: /mnt/daos: checking superblock
DEBUG 18:18:19.847415 instance_storage.go:127: instance 0: superblock not needed
DEBUG 18:18:19.847502 database.go:334: system db start: isReplica: true, isBootstrap: true
DEBUG 18:18:19.848912 api.go:556: initial configuration: index=1 servers=[%+v [{Suffrage:Voter ID:127.0.0.1:10001 Address:127.0.0.1:10001}]]
DEBUG 18:18:19.849019 raft.go:154: isBootstrap: true, newDB: false
DEBUG 18:18:19.849079 instance_exec.go:62: instance start()
DEBUG 18:18:19.849118 class.go:223: skip bdev conf file generation as no path set
SCM @ /mnt/daos: 799 GB Total/783 GB Avail
DEBUG 18:18:19.849239 raft.go:152: entering follower state: follower=Node at 127.0.0.1:10001 [Follower] leader=
DEBUG 18:18:19.849341 instance_exec.go:79: instance 0: awaiting DAOS I/O Engine init
DEBUG 18:18:19.849527 exec.go:72: daos_engine:0 args: [-t 1 -x 0 -g daos_server -d /var/run/daos_server -s /mnt/daos -I 0]
DEBUG 18:18:19.849575 exec.go:73: daos_engine:0 env: [OFI_DOMAIN=mlx5_0 D_LOG_MASK=DEBUG D_LOG_FILE=/tmp/daos_engine.0.log CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm OFI_INTERFACE=ib0 OFI_PORT=31416 CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=0]
Starting I/O server instance 0: /home/USER/daos/install/bin/daos_engine
daos_engine:1 Using legacy core allocation algorithm
daos_engine:0 Using legacy core allocation algorithm
ERROR: daos_engine:1 *** Process 60471 received signal 11 ***
Associated errno: Success (0)
Failing for address: 0x7fb154a65000
ERROR: daos_engine:1 /lib64/libpthread.so.0(+0xf630)[0x7fb1555f4630]
ERROR: daos_engine:1 /lib64/libc.so.6(+0x156918)[0x7fb154ac2918]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x3842f)[0x7fb14e59642f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x3872e)[0x7fb14e59672e]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x37f06)[0x7fb14e595f06]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x39b7a)[0x7fb14e597b7a]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x5f10e)[0x7fb14e5bd10e]
ERROR: daos_engine:1 /home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x7088b)[0x7fb14e5ce88b]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libna.so.2(+0xdab9)[0x7fb1537bdab9]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libna.so.2(NA_Initialize_opt+0x3af)[0x7fb1537b422f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(+0xd03f)[0x7fb1539df03f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(HG_Core_init_opt+0xa)[0x7fb1539e4fda]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(HG_Init_opt+0x7b)[0x7fb1539d79bb]
ERROR: daos_engine:1 /home/USER/daos/install/lib64/libcart.so.4(+0x4c92a)[0x7fb15646c92a]
/home/USER/daos/install/lib64/libcart.so.4(crt_hg_ctx_init+0x388)[0x7fb15646dce8]
/home/USER/daos/install/lib64/libcart.so.4(crt_context_create+0x40a)[0x7fb15643a6ca]
/home/USER/daos/install/bin/daos_engine[0x420b58]
ERROR: daos_engine:1 /home/USER/daos/install/bin/../prereq/release/argobots/lib/libabt.so.0(+0x1317b)[0x7fb1553d617b]
/home/USER/daos/install/bin/../prereq/release/argobots/lib/libabt.so.0(+0x13851)[0x7fb1553d6851]
ERROR: daos_engine:0 *** Process 60472 received signal 11 ***
Associated errno: Success (0)
Failing for address: 0x7f2093174000
ERROR: daos_engine:0 /lib64/libpthread.so.0(+0xf630)[0x7f2093d03630]
ERROR: daos_engine:0 /lib64/libc.so.6(+0x156918)[0x7f20931d1918]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x3842f)[0x7f208cca542f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x3872e)[0x7f208cca572e]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x37f06)[0x7f208cca4f06]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x39b7a)[0x7f208cca6b7a]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x5f10e)[0x7f208cccc10e]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/../../ofi/lib/libfabric.so.1(+0x7088b)[0x7f208ccdd88b]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libna.so.2(+0xdab9)[0x7f2091eccab9]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libna.so.2(NA_Initialize_opt+0x3af)[0x7f2091ec322f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(+0xd03f)[0x7f20920ee03f]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(HG_Core_init_opt+0xa)[0x7f20920f3fda]
/home/USER/daos/install/lib64/../prereq/release/mercury/lib/libmercury.so.2(HG_Init_opt+0x7b)[0x7f20920e69bb]
ERROR: daos_engine:0 /home/USER/daos/install/lib64/libcart.so.4(+0x4c92a)[0x7f2094b7b92a]
/home/USER/daos/install/lib64/libcart.so.4(crt_hg_ctx_init+0x388)[0x7f2094b7cce8]
/home/USER/daos/install/lib64/libcart.so.4(crt_context_create+0x40a)[0x7f2094b496ca]
/home/USER/daos/install/bin/daos_engine[0x420b58]
/home/USER/daos/install/bin/../prereq/release/argobots/lib/libabt.so.0(+0x1317b)[0x7f2093ae517b]
/home/USER/daos/install/bin/../prereq/release/argobots/lib/libabt.so.0(+0x13851)[0x7f2093ae5851]
instance 0 exited: instance 0 exited prematurely: /home/USER/daos/install/bin/daos_engine (instance 0) exited: signal: segmentation fault (core dumped)
ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
DEBUG 18:18:20.292988 system.go:237: forwarding engine_status_down event to MS access points [localhost:10001] (seq: 1)
&&& RAS EVENT id: [engine_status_down] ts: [2021-04-19T18:18:20.292879+0900] host: [10.0.0.0_104810] type: [STATE_CHANGE] sev: [ERROR] msg: [DAOS rank exited unexpectedly] pid: [59341] rank: [0]
DEBUG 18:18:20.294097 system.go:202: DAOS cluster event request: sequence:1 event:<id:2 msg:"DAOS rank exited unexpectedly" timestamp:"2021-04-19T18:18:20.292879+0900" type:1 severity:3 hostname:"10.0.0.0_104810" proc_id:59341 rank_state_info:<errored:true error:"instance 0 exited prematurely: /home/USER/daos/install/bin/daos_engine (instance 0) exited: signal: segmentation fault (core dumped)" > >
DEBUG 18:18:20.294315 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:20.299667 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 0s
DEBUG 18:18:20.299845 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:20.301829 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 1.25s
instance 1 exited: instance 1 exited prematurely: /home/USER/daos/install/bin/daos_engine (instance 1) exited: signal: segmentation fault (core dumped)
ERROR: removing socket file: removing instance 1 socket file: no dRPC client set (data plane not started?)
DEBUG 18:18:21.174777 system.go:237: forwarding engine_status_down event to MS access points [localhost:10001] (seq: 2)
&&& RAS EVENT id: [engine_status_down] ts: [2021-04-19T18:18:21.174668+0900] host: [10.0.0.0_104810] type: [STATE_CHANGE] sev: [ERROR] msg: [DAOS rank exited unexpectedly] pid: [59341] rank: [1]
DEBUG 18:18:21.175125 system.go:202: DAOS cluster event request: sequence:2 event:<id:2 msg:"DAOS rank exited unexpectedly" timestamp:"2021-04-19T18:18:21.174668+0900" type:1 severity:3 hostname:"10.0.0.0_104810" rank:1 proc_id:59341 rank_state_info:<instance:1 errored:true error:"instance 1 exited prematurely: /home/USER/daos/install/bin/daos_engine (instance 1) exited: signal: segmentation fault (core dumped)" > >
DEBUG 18:18:21.175325 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:21.177949 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 0s
DEBUG 18:18:21.178188 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:21.182118 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 1.75s
DEBUG 18:18:21.552634 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:21.555054 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 1.75s
DEBUG 18:18:22.933217 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:22.935642 rpc.go:380: MS request error: not the DAOS Management Service leader (try or one of ); retrying after 2.75s
DEBUG 18:18:23.157591 raft.go:214: heartbeat timeout reached, starting election: last-leader=
DEBUG 18:18:23.157720 raft.go:250: entering candidate state: node=Node at 127.0.0.1:10001 [Candidate] term=4
DEBUG 18:18:23.158132 raft.go:268: votes: needed=1
DEBUG 18:18:23.158193 raft.go:287: vote granted: from=127.0.0.1:10001 term=4 tally=1
DEBUG 18:18:23.158236 raft.go:292: election won: tally=1
DEBUG 18:18:23.158289 raft.go:363: entering leader state: leader=Node at 127.0.0.1:10001 [Leader]
DEBUG 18:18:23.158506 database.go:414: node 127.0.0.1:10001 gained MS leader state
MS leader running on 10.0.0.0_104810
DEBUG 18:18:23.158614 mgmt_system.go:148: starting joinLoop
DEBUG 18:18:23.305370 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:23.307429 membership.go:451: processing RAS event "DAOS rank exited unexpectedly" from rank 0 on host "10.0.0.0_104810"
ERROR: updating member states: unable to find member with rank 0
DEBUG 18:18:25.686152 rpc.go:196: request hosts: [localhost:10001]
DEBUG 18:18:25.689836 membership.go:451: processing RAS event "DAOS rank exited unexpectedly" from rank 1 on host "10.0.0.0_104810"
ERROR: updating member states: unable to find member with rank 1