Re: Pool creation fails with "instance is not an access point"
4felgenh@...
Hi,
thanks for the tip! It seems that a regular "dmg storage format" doesn't do the trick, but appending "--reformat" resolves the issue. It runs fine now.
Kind regards
Ruben
On 14.05.20 at 09:02, Lombardi, Johann wrote:
|
|
Re: Pool creation fails with "instance is not an access point"
Lombardi, Johann
Hi,
I assume that you have run dmg storage format after starting the server and before creating the pool, right? If you don’t want to emulate any SSD, you should also comment out the bdev_* options in the yaml file.
Cheers, Johann
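The order of operations discussed in this thread (start the server, format storage, then create the pool) could be sketched as a small shell helper. This is only a sketch under assumptions: the run and format_then_create_pool function names and the DRY_RUN and DMG_ADDR variables are made up for illustration, and combining the "-i -l localhost:10001" options from the original report with "storage format --reformat" is an assumption rather than something confirmed in the thread. By default the helper only prints the commands.

```shell
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them against a running daos_server.
DMG_ADDR="${DMG_ADDR:-localhost:10001}"   # address taken from the original report

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

format_then_create_pool() {
    # Assumes daos_server was already started in another terminal.
    # Format (with --reformat, as reported to work) before creating the pool.
    run dmg -i -l "$DMG_ADDR" storage format --reformat &&
    run dmg -i -l "$DMG_ADDR" pool create -s 1G
}
```

Calling format_then_create_pool prints the two dmg commands in order; with DRY_RUN=0 it runs them and stops if the format step fails.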
|
Re: after formatting scm , no dRPC client set problem
Lombardi, Johann
Hi there,
From your log:
snode3: Starting I/O server instance 0: /usr/bin/daos_io_server
snode2: daos_io_server:0 05/12-06:28:47.96 snode2 Using legacy core allocation algorithm
snode2: instance 0 exited: instance 0 exited prematurely: /usr/bin/daos_io_server (instance 0) exited: exit status 1
After the format, the I/O engine failed to start. Could you please look into the server logs under /tmp (e.g. /tmp/server.log)?
Cheers, Johann
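When the start-up output of several nodes is interleaved, as in the clush output posted in this thread, it may help to filter for the failure lines first and group them per node. The following is a minimal sketch, not an official DAOS tool: failure_lines is a hypothetical helper name, the pattern only covers the two kinds of failure lines visible in the posted output, and the stable sort flag (-s) assumes GNU or BSD sort. Note that the posted environment shows D_LOG_FILE=/tmp/server0.log and D_LOG_MASK=ERR, so the engine log itself is probably at that path.

```shell
# Reads clush-style "node: message" lines on stdin and keeps only the
# lines that report a failure, grouped by node name.
failure_lines() {
    # Keep "node: ERROR..." and "node: instance N exited..." lines,
    # then sort by the node prefix while preserving per-node order.
    grep -E '^[^ :]+: (ERROR|instance [0-9]+ exited)' | sort -s -t: -k1,1
}
```

For example, after saving the start-up output to a file (start_output.txt is a made-up name), running failure_lines < start_output.txt lists the errors of snode1, then snode2, then snode3.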
|
Re: Message looks serious?
Wang, Di
If 40 does not exist, it should not be able to connect to the pool at all, i.e. it should output something like "failed to connect to pool: …".
These logs seem to suggest that the pool connection did happen. Would you please collect the client-side DAOS log (by setting "export D_LOG_FILE=xxx")? It might tell us what really happened.
Thanks
WangDi
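One way to collect the client-side log for a single command, without leaving D_LOG_FILE exported in the shell, is sketched below. This is an illustration only: with_client_log is a hypothetical helper name and the log path is a placeholder; only the D_LOG_FILE variable itself comes from the message above.

```shell
# Run one client command with D_LOG_FILE set just for that command.
# $1: path of the log file to write; remaining arguments: the command.
with_client_log() {
    log_file=$1
    shift
    D_LOG_FILE="$log_file" "$@"
}
```

For example: with_client_log /tmp/daos_client.log daos pool list-cont --pool a68b3845-fe78-481e-aa84-164e851d5f52 --svc 40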
From: <daos@daos.groups.io> on behalf of Colin Ngam <cngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, May 13, 2020 at 10:08 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Message looks serious?
Greetings,
Executing the command: daos pool list-cont --pool a68b3845-fe78-481e-aa84-164e851d5f52 --svc 40
Note that 40 does not exist.
We did not get an error from the daos command.
In the log:
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
My guess is that ds_pool_tgt_map_update() should not even be called?
Cheers,
Colin
|
|
Message looks serious?
Colin Ngam <cngam@...>
Greetings,
Executing the command: daos pool list-cont --pool a68b3845-fe78-481e-aa84-164e851d5f52 --svc 40
Note that 40 does not exist.
We did not get an error from the daos command.
In the log:
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
My guess is that ds_pool_tgt_map_update() should not even be called?
Cheers,
Colin
|
|
after formatting scm , no dRPC client set problem
timehuang88@...
Hi there,
I am trying to run DAOS in a real physical environment, but ran into a problem. Below is the output when I try to start the DAOS system with 3 storage nodes. Can anybody help me? If you need any information, please let me know. Thanks.
By the way, another question: in the daos_server.yml file, which files should I put into the client_cert_ dir (/etc/daos/clients), or should I just leave it empty?
[root@client ~]# clush -w snode[1-3] daos_server start -o /etc/daos/daos_server.yml
snode2: daos_server logging to file /tmp/daos_control.log
snode2: DEBUG 06:28:33.811928 start.go:105: Switching control log level to DEBUG
snode2: DEBUG 06:28:33.812236 netdetect.go:829: Calling ValidateProviderConfig with eno3, ofi+verbs;ofi_rxm
snode2: DEBUG 06:28:33.812276 netdetect.go:880: Input provider string: ofi+verbs;ofi_rxm
snode1: daos_server logging to file /tmp/daos_control.log
snode1: DEBUG 17:27:30.107559 start.go:105: Switching control log level to DEBUG
snode1: DEBUG 17:27:30.107776 netdetect.go:829: Calling ValidateProviderConfig with eno3, ofi+verbs;ofi_rxm
snode1: DEBUG 17:27:30.107811 netdetect.go:880: Input provider string: ofi+verbs;ofi_rxm
snode3: daos_server logging to file /tmp/daos_control.log
snode3: DEBUG 17:28:01.164972 start.go:105: Switching control log level to DEBUG
snode3: DEBUG 17:28:01.165212 netdetect.go:829: Calling ValidateProviderConfig with eno3, ofi+verbs;ofi_rxm
snode3: DEBUG 17:28:01.165251 netdetect.go:880: Input provider string: ofi+verbs;ofi_rxm
snode2: DEBUG 06:28:34.026543 netdetect.go:912: There are 0 hfi1 devices in the system
snode2: DEBUG 06:28:34.026642 netdetect.go:844: Device eno3 supports provider: ofi+verbs;ofi_rxm
snode2: DEBUG 06:28:34.027690 config.go:391: Active config saved to /etc/daos/.daos_server.active.yml (read-only)
snode2: DEBUG 06:28:34.028020 server.go:137: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:4096 PCIWhitelist: TargetUser:root ResetOnly:false}
snode1: DEBUG 17:27:30.326383 netdetect.go:912: There are 0 hfi1 devices in the system
snode1: DEBUG 17:27:30.326540 netdetect.go:844: Device eno3 supports provider: ofi+verbs;ofi_rxm
snode1: DEBUG 17:27:30.327572 config.go:391: Active config saved to /etc/daos/.daos_server.active.yml (read-only)
snode1: DEBUG 17:27:30.327889 server.go:137: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:4096 PCIWhitelist: TargetUser:root ResetOnly:false}
snode3: DEBUG 17:28:01.386394 netdetect.go:912: There are 0 hfi1 devices in the system
snode3: DEBUG 17:28:01.386496 netdetect.go:844: Device eno3 supports provider: ofi+verbs;ofi_rxm
snode3: DEBUG 17:28:01.387691 config.go:391: Active config saved to /etc/daos/.daos_server.active.yml (read-only)
snode3: DEBUG 17:28:01.387992 server.go:137: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:4096 PCIWhitelist: TargetUser:root ResetOnly:false}
snode2: DEBUG 06:28:42.730225 netdetect.go:591: Searching for a device alias for: eno3
snode2: DEBUG 06:28:42.961947 netdetect.go:334: There are 2 children of this parent node.
snode2: DEBUG 06:28:42.962026 netdetect.go:616: Device alias for eno3 is i40iw0
snode3: DEBUG 17:28:10.257330 netdetect.go:591: Searching for a device alias for: eno3
snode2: ERROR: /usr/bin/daos_admin EAL: No available hugepages reported in hugepages-1048576kB
snode3: DEBUG 17:28:10.492866 netdetect.go:334: There are 2 children of this parent node.
snode3: DEBUG 17:28:10.492945 netdetect.go:616: Device alias for eno3 is i40iw1
snode3: ERROR: /usr/bin/daos_admin EAL: No available hugepages reported in hugepages-1048576kB
snode1: DEBUG 17:27:40.062619 netdetect.go:591: Searching for a device alias for: eno3
snode1: DEBUG 17:27:40.294456 netdetect.go:334: There are 2 children of this parent node.
snode1: DEBUG 17:27:40.294538 netdetect.go:616: Device alias for eno3 is i40iw1
snode1: ERROR: /usr/bin/daos_admin EAL: No available hugepages reported in hugepages-1048576kB
snode2: DAOS Control Server (pid 10183) listening on 0.0.0.0:10001
snode2: DEBUG 06:28:45.672802 instance_exec.go:55: instance 0: checking if storage is formatted
snode2: Waiting for DAOS I/O Server instance 0 storage to be ready...
snode2: DEBUG 06:28:45.672850 instance_storage.go:88: /mnt/daos: checking formatting
snode3: DAOS Control Server (pid 11997) listening on 0.0.0.0:10001
snode3: DEBUG 17:28:12.955111 instance_exec.go:55: instance 0: checking if storage is formatted
snode3: Waiting for DAOS I/O Server instance 0 storage to be ready...
snode3: DEBUG 17:28:12.955181 instance_storage.go:88: /mnt/daos: checking formatting
snode1: DAOS Control Server (pid 14644) listening on 0.0.0.0:10001
snode1: DEBUG 17:27:42.844751 instance_exec.go:55: instance 0: checking if storage is formatted
snode1: Waiting for DAOS I/O Server instance 0 storage to be ready...
snode1: DEBUG 17:27:42.844827 instance_storage.go:88: /mnt/daos: checking formatting
snode2: DEBUG 06:28:47.811976 instance_storage.go:104: /mnt/daos (dcpm) needs format: false
snode2: DEBUG 06:28:47.812056 instance_storage.go:135: instance 0: no SCM format required; checking for superblock
snode2: DEBUG 06:28:47.812109 superblock.go:112: /mnt/daos: checking superblock
snode2: DEBUG 06:28:47.813721 instance_storage.go:141: instance 0: superblock not needed
snode2: SCM @ /mnt/daos: 532 GB Total/528 GB Avail
snode2: DEBUG 06:28:47.814325 instance_exec.go:93: instance 0: awaiting DAOS I/O Server init
snode2: DEBUG 06:28:47.814536 exec.go:115: daos_io_server:0 args: [-t 8 -x 0 -f 1 -g daos_server -d /tmp/daos_sockets -s /mnt/daos -n /mnt/daos/daos_nvme.conf -i 10184 -I 0]
snode2: DEBUG 06:28:47.814617 exec.go:116: daos_io_server:0 env: [OFI_INTERFACE=eno3 CRT_TIMEOUT=0 DAOS_MD_CAP=1024 CRT_CTX_SHARE_ADDR=0 CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm D_LOG_FILE=/tmp/server0.log OFI_PORT=31416 FI_SOCKETS_MAX_CONN_RETRY=1 FI_SOCKETS_CONN_TIMEOUT=2000 OFI_DOMAIN=i40iw0 D_LOG_MASK=ERR]
snode2: Starting I/O server instance 0: /usr/bin/daos_io_server
snode3: DEBUG 17:28:15.086695 instance_storage.go:104: /mnt/daos (dcpm) needs format: false
snode3: DEBUG 17:28:15.086764 instance_storage.go:135: instance 0: no SCM format required; checking for superblock
snode3: DEBUG 17:28:15.086822 superblock.go:112: /mnt/daos: checking superblock
snode3: DEBUG 17:28:15.088428 instance_storage.go:141: instance 0: superblock not needed
snode3: SCM @ /mnt/daos: 532 GB Total/528 GB Avail
snode3: DEBUG 17:28:15.089161 instance_exec.go:93: instance 0: awaiting DAOS I/O Server init
snode3: DEBUG 17:28:15.089521 exec.go:115: daos_io_server:0 args: [-t 8 -x 0 -f 1 -g daos_server -d /tmp/daos_sockets -s /mnt/daos -n /mnt/daos/daos_nvme.conf -i 11998 -I 0]
snode3: DEBUG 17:28:15.089610 exec.go:116: daos_io_server:0 env: [OFI_DOMAIN=i40iw1 DAOS_MD_CAP=1024 FI_SOCKETS_MAX_CONN_RETRY=1 D_LOG_MASK=ERR CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm OFI_INTERFACE=eno3 OFI_PORT=31416 CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=0 FI_SOCKETS_CONN_TIMEOUT=2000 D_LOG_FILE=/tmp/server0.log]
snode3: Starting I/O server instance 0: /usr/bin/daos_io_server
snode2: daos_io_server:0 05/12-06:28:47.96 snode2 Using legacy core allocation algorithm
snode2: instance 0 exited: instance 0 exited prematurely: /usr/bin/daos_io_server (instance 0) exited: exit status 1
snode2: ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
snode3: daos_io_server:0 05/11-17:28:15.24 snode3 Using legacy core allocation algorithm
snode3: instance 0 exited: instance 0 exited prematurely: /usr/bin/daos_io_server (instance 0) exited: exit status 1
snode3: ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
snode1: DEBUG 17:27:44.741324 instance_storage.go:104: /mnt/daos (dcpm) needs format: false
snode1: DEBUG 17:27:44.741373 instance_storage.go:135: instance 0: no SCM format required; checking for superblock
snode1: DEBUG 17:27:44.741404 superblock.go:112: /mnt/daos: checking superblock
snode1: DEBUG 17:27:44.742695 instance_storage.go:141: instance 0: superblock not needed
snode1: SCM @ /mnt/daos: 532 GB Total/528 GB Avail
snode1: DEBUG 17:27:44.743511 instance.go:382: instance 0: bootstrapping system member: rank 0, addr 10.158.24.33:10001
snode1: DEBUG 17:27:44.743543 instance_exec.go:93: instance 0: awaiting DAOS I/O Server init
snode1: DEBUG 17:27:44.743998 exec.go:115: daos_io_server:0 args: [-t 8 -x 0 -f 1 -g daos_server -d /tmp/daos_sockets -s /mnt/daos -n /mnt/daos/daos_nvme.conf -i 14645 -I 0]
snode1: DEBUG 17:27:44.744066 exec.go:116: daos_io_server:0 env: [FI_SOCKETS_CONN_TIMEOUT=2000 OFI_DOMAIN=i40iw1 DAOS_MD_CAP=1024 D_LOG_MASK=ERR CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm OFI_PORT=31416 CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=0 FI_SOCKETS_MAX_CONN_RETRY=1 D_LOG_FILE=/tmp/server0.log OFI_INTERFACE=eno3]
snode1: Starting I/O server instance 0: /usr/bin/daos_io_server
snode1: daos_io_server:0 05/11-17:27:44.89 snode1 Using legacy core allocation algorithm
snode1: instance 0 exited: instance 0 exited prematurely: /usr/bin/daos_io_server (instance 0) exited: exit status 1
snode1: ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
|
|
Pool creation fails with "instance is not an access point"
4felgenh@...
Hello,
My setup and issues are very similar to what has been described in https://daos.groups.io/g/daos/message/317. I have tried every suggested fix from the corresponding thread and so far, none have worked.
I'd like to set up a very simple DAOS server for testing purposes on a server with 120 GB of regular RAM, no NVMe SSDs attached, and only a single server instance. However, I'm not using Docker. Hence, I installed DAOS as described in the admin guide and used the config file from daos/utils/config/examples/daos_server_local.yml, which uses a RAM disk to emulate SCM.
After starting the DAOS server with
daos_server --debug --config=$basepath/daos_server_local.yml start
I'd like to create a pool in a second terminal with
dmg -i -l localhost:10001 pool create -s 1G
This fails with:
localhost:10001: connected
Pool-create command FAILED: rpc error: code = Unknown desc = instance is not an access point
ERROR: dmg: rpc error: code = Unknown desc = instance is not an access point
Did I miss some step that I have to execute beforehand?
|
DAOS command no longer giving output on failure
Farrell, Patrick Arthur <patrick.farrell@...>
Good morning,
I'm running commit:
commit 53b0c5ff3d45c8addfee11cbfd6dd49e6f88dc3e
And the daos container create command is no longer giving any output on failure:
[daos@hl-d102 ~]$ daos container create --pool=d885cc1f-9363-4342-a859-ec32b3230c08 --svc=1 16G
[daos@hl-d102 ~]$ daos container create --pool=d885cc1f-9363-4342-a859-ec32b3230c08 --svc=1
[daos@hl-d102 ~]$ daos container create --pool=d885cc1f-9363-4342-a859-ec32b3230c08
[daos@hl-d102 ~]$ daos container create --pool=d885cc1f-9363-4342-a859-ec32b3230c0
All of those failed. I am not completely sure why they failed, and I'm capable of using debug to determine why (it's probably some sort of config issue), and obviously the first command is wrong.
The key point is:
None of them gave *any* output. This makes troubleshooting a little tricky. I'm not sure at what commit this last worked, but it definitely did a few weeks ago.
Thanks,
Patrick
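Until the missing error output is restored, a thin wrapper that reports the exit status can make silent failures visible. This is a sketch under an assumption the report does not confirm, namely that the daos tool still returns a non-zero exit status when it fails; report_status is a hypothetical helper name.

```shell
# Run a command unchanged; if it fails, print its exit status on stderr.
report_status() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status: $*" >&2
    fi
    return "$status"
}
```

For example: report_status daos container create --pool=d885cc1f-9363-4342-a859-ec32b3230c08 --svc=1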
|
|
Re: Dead definition?
Li, Wei G
Hi Colin,
This part could use some improvements indeed. The “in” and “out” structs are generated by macros from src/container/rpc.h:
CRT_RPC_DECLARE(cont_snap_create, DAOS_ISEQ_CONT_EPOCH_OP, DAOS_OSEQ_CONT_EPOCH_OP)
CRT_RPC_DECLARE(cont_snap_destroy, DAOS_ISEQ_CONT_EPOCH_OP, DAOS_OSEQ_CONT_EPOCH_OP)
I think ds_cont_snap_{create,destroy} should instead use cont_snap_{create,destroy}_{in,out} structs that differ from cont_epoch_op_{in,out}. For “create”, cei_epoch doesn’t really apply (see DAOS-4453); for “destroy”, neither does ceo_epoch. We should make this RPC format change early, if possible.
Thanks,
liwei
On May 8, 2020, at 8:12 AM, Colin Ngam <colin.ngam@...> wrote:
|
Re: missing protoc-gen-c
Zhang, Jiafu
Thank you, Nabarro.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
Nabarro, Tom
That’s the one. Thanks.
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Colin Ngam
This one? doc/dev/development.md
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Apologies, but that link is dead. Does anyone know where the contents got moved to? I can’t find any references to relevant content in the docs.
The protobuf-c package is installed during the build (I’m using master) and gets pulled in from <daos>/utils/sl/components/__init__.py, but we don’t build the protoc* binaries/full compiler, to avoid overhead for something that is only required very occasionally for developer purposes (we supply the --disable-protoc configure option in the build).
For the moment you will need to build the compiler and plug-in yourself, but we plan on providing a build flag for developers who need it.
Tom
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Nabarro, Tom
This is a development tool so maybe it’s not pulled in by the default build.
I installed as follows:
git clone https://github.com/protobuf-c/protobuf-c
cd protobuf-c/
./autogen.sh
./configure --prefix=/home/tanabarr/protobuf/install PKG_CONFIG_PATH=/home/tanabarr/protobuf/install/lib/pkgconfig
make && make install
It is not a plug-in that ships with the stock protobuf compiler package (which instead ships with the C++ plug-in).
See the <daos>/src/proto/Makefile for some details; it will point you to the following doc if the compiler is missing.
https://github.com/daos-stack/daos/blob/master/doc/development.md#protobuf-compiler
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
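After building protobuf-c as described above, a quick check for the plug-in binary can save a failed protoc run. The sketch below assumes the plug-in is installed as bin/protoc-gen-c under the configure prefix (consistent with the path in the README command quoted in this thread); locate_protoc_gen_c is a hypothetical helper name.

```shell
# Print the path of protoc-gen-c, looking first under an install prefix
# and then on PATH. $1: install prefix passed to ./configure --prefix.
locate_protoc_gen_c() {
    if [ -x "$1/bin/protoc-gen-c" ]; then
        echo "$1/bin/protoc-gen-c"
    elif command -v protoc-gen-c >/dev/null 2>&1; then
        command -v protoc-gen-c
    else
        echo "protoc-gen-c not found; build protobuf-c first" >&2
        return 1
    fi
}
```

The result can then be passed to the README command, for example: plugin=$(locate_protoc_gen_c /home/tanabarr/protobuf/install) && protoc -I mgmt --c_out=../mgmt mgmt/srv.proto --plugin="$plugin"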
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Zhang, Jiafu
Hi Guys,
In https://github.com/daos-stack/daos/tree/master/src/proto/README.md, it uses “protoc -I mgmt --c_out=../mgmt mgmt/srv.proto --plugin=/opt/potobuf/install/bin/protoc-gen-c” to generate C code. In recent DAOS, I cannot find “protoc-gen-c”.
Do you know why?
Thanks.
|
Dead definition?
Colin Ngam
Greetings,
struct cont_snap_destroy_in {
        struct cont_op_in       cei_op;
        daos_epoch_t            cei_epoch;
};
struct cont_snap_destroy_out {
        struct cont_op_out      ceo_op;
        daos_epoch_t            ceo_epoch;
};
Looks dead as the routine ds_cont_snap_destroy() seems to use:
struct cont_epoch_op_in {
        struct cont_op_in       cei_op;
        daos_epoch_t            cei_epoch;
};
struct cont_epoch_op_out {
        struct cont_op_out      ceo_op;
        daos_epoch_t            ceo_epoch;
};
Thanks.
Colin
|
|
Re: missing protoc-gen-c
Broken links fixed on master https://github.com/daos-stack/daos/pull/2654
|
Re: missing protoc-gen-c
That’s the one. Thanks.
|
Re: missing protoc-gen-c
Colin Ngam
This one? doc/dev/development.md
|
Re: missing protoc-gen-c
Apologies, that link is dead. Does anyone know where the contents were moved to? I can’t find any reference to the relevant content in the docs.
The protobuf-c package is installed during the build (I’m using master) and gets pulled in from <daos>/utils/sl/components/__init__.py, but we don’t build the protoc* binaries/full compiler, to avoid overhead for something that is only required very occasionally for developer purposes (we pass the --disable-protoc configure option in the build).
For the moment you will need to build the compiler and plug-in yourself, but we plan on providing a build flag for developers who need it.
Tom
|
Re: missing protoc-gen-c
This is a development tool so maybe it’s not pulled in by the default build.
I installed as follows:
    git clone https://github.com/protobuf-c/protobuf-c
    cd protobuf-c/
    ./autogen.sh
    ./configure --prefix=/home/tanabarr/protobuf/install PKG_CONFIG_PATH=/home/tanabarr/protobuf/install/lib/pkgconfig
    make && make install
It is not a plug-in that ships with the stock protobuf compiler package (which ships with the C++ plug-in instead).
See <daos>/src/proto/Makefile for some details; it will point you to the following doc if the compiler is missing.
https://github.com/daos-stack/daos/blob/master/doc/development.md#protobuf-compiler
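Once the plug-in has been built, it can help to confirm that it is actually reachable before running protoc. The following is only a minimal sketch, not part of the DAOS build: the helper names and the PROTOBUF_C_PREFIX environment variable are hypothetical, and the default arguments simply mirror the protoc invocation from src/proto/README.md that started this thread. It assumes the plug-in was installed under <prefix>/bin, as the configure step above would do.

```python
import os
import shutil


def find_plugin(prefix=None, name="protoc-gen-c"):
    """Return the path to the plug-in, or None if it cannot be found.

    `prefix` is a hypothetical install prefix (the --prefix given to
    ./configure); when it is omitted, only PATH is searched.
    """
    if prefix is not None:
        candidate = os.path.join(prefix, "bin", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)


def build_protoc_cmd(plugin, proto="mgmt/srv.proto", include="mgmt", out="../mgmt"):
    """Build the protoc argument list for generating C code with the plug-in."""
    return [
        "protoc",
        "-I", include,
        "--c_out=" + out,
        # NAME=PATH form of --plugin, so protoc does not need the plug-in on PATH
        "--plugin=protoc-gen-c=" + plugin,
        proto,
    ]


if __name__ == "__main__":
    # PROTOBUF_C_PREFIX is an illustrative variable name, not a DAOS setting.
    plugin = find_plugin(os.environ.get("PROTOBUF_C_PREFIX"))
    if plugin is None:
        print("protoc-gen-c not found; build protobuf-c first")
    else:
        print(" ".join(build_protoc_cmd(plugin)))
```

The script only prints the command rather than running it, so it is safe to try from any directory.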
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
|
missing protoc-gen-c
Zhang, Jiafu
Hi Guys,
In https://github.com/daos-stack/daos/tree/master/src/proto/README.md, the command “protoc -I mgmt --c_out=../mgmt mgmt/srv.proto --plugin=/opt/potobuf/install/bin/protoc-gen-c” is used to generate C code. In recent DAOS, I cannot find “protoc-gen-c”.
Do you know why?
Thanks. |
|
Re: Cart corp operations
Liu, Xuezhao
Hi,
As the comments of co_post_reply (in cart’s api.h):
/**
 * Collective RPC post-reply callback.
 * This is an optional callback. If specified, it will execute after
 * reply is sent to parent (after co_aggregate executes).
 */
For example, suppose the bcast traverses a sub-tree like node_a (parent) -> node_b (self) -> node_c and node_d (children). In the request forward phase, node_b gets the request from node_a and forwards it to node_c and node_d. In the reply aggregate phase, node_b sends its reply to node_a, but only after it has received and aggregated the replies from its children (this is what co_aggregate is for). Once the aggregated reply has been sent to the parent, node_b calls co_post_reply, if one is provided, to do some cleanup work (free memory or whatever else). So co_aggregate runs before co_post_reply.
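To make the ordering concrete, here is a small toy simulation of the node_b sub-tree. It is a sketch under loud assumptions: it does not use the CaRT API at all, the Node class and bcast function are made up for illustration, children are visited one after another rather than in parallel, and the "reply" is just a count of nodes reached. Only the relative order of the recorded events is meant to match the description.

```python
class Node:
    """A node in the (hypothetical) broadcast tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def bcast(node, events):
    """Handle a collective request at `node` and return its aggregated reply."""
    # Request forward phase: pass the request down before replying.
    replies = []
    for child in node.children:
        events.append(f"{node.name}: forward to {child.name}")
        replies.append(bcast(child, events))

    # Reply aggregate phase: fold each child's reply into our own.
    result = 1  # this node's own contribution (a simple count)
    for child, reply in zip(node.children, replies):
        events.append(f"{node.name}: co_aggregate({child.name})")
        result += reply

    # The reply goes to the parent only after all children are aggregated.
    events.append(f"{node.name}: send reply")
    # Optional cleanup hook, run after the reply has been sent.
    events.append(f"{node.name}: co_post_reply")
    return result


def run_example():
    """Run the node_b -> (node_c, node_d) case; node_a is the caller."""
    node_b = Node("node_b", [Node("node_c"), Node("node_d")])
    events = []
    total = bcast(node_b, events)
    return total, events


if __name__ == "__main__":
    total, events = run_example()
    for line in events:
        print(line)
    print("aggregated reply:", total)
```

In the recorded events for node_b, both co_aggregate entries come first, then the reply is sent, and co_post_reply is last.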
Thanks, Xuezhao
|
Cart corp operations
Colin Ngam
Greetings,
Is the “co_aggregate” operation called before or after the “co_post_reply” operation?
Thanks.
Colin |
|
Re: Anyone seen this DAX oops before?
Kevan Rehm
I am not sure; unfortunately, we didn’t notice when it first happened. We will keep an eye out in case it happens again.
Kevan
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
It has been a while since I last saw a kernel backtrace 😊 Never seen this before. I assume that this happens during pool deletion or disconnect? Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
We had an oops occur today; the backtrace is below. When I started looking, I saw that there have been 6 of these over the last month, and the backtrace is always the same. It seems to be happening in DAX. Does this look familiar?
Thanks, Kevan
# cat backtrace
WARNING: CPU: 18 PID: 43720 at fs/dax.c:419 dax_disassociate_entry+0xdb/0x130
Modules linked in: ext4 mbcache jbd2 vfio_pci virtio_pci virtio_ring virtio nfsv3 nfs_acl socwatch2_11(OE) vtsspp(OE) sep5(OE) socperf3(OE) pax(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_en(OE) mlx4_core(OE) sunrpc iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm vfat fat irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev mei_me mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem device_dax acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c nd_pmem nd_btt ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul mlx5_core(OE) crct10dif_common crc32c_intel drm libahci mlxfw(OE) ptp pps_core libata vfio_mdev(OE) vfio_iommu_type1 vfio nvme mdev(OE) devlink nvme_core mlx_compat(OE) drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
CPU: 18 PID: 43720 Comm: daos_sys_0 Tainted: G OE ------------ T 3.10.0-1127.el7.x86_64 #1
Hardware name: Cray Inc. SYS-2029UZ-TN20R25M/X11DPU-Z+, BIOS 3.2 10/22/2019
Call Trace:
[<ffffffffb5b7ff85>] dump_stack+0x19/0x1b
[<ffffffffb549bd18>] __warn+0xd8/0x100
[<ffffffffb549be5d>] warn_slowpath_null+0x1d/0x20
[<ffffffffb56a6d0b>] dax_disassociate_entry+0xdb/0x130
[<ffffffffb56a7868>] __dax_invalidate_mapping_entry+0x68/0x120
[<ffffffffb56a9f47>] dax_delete_mapping_entry+0x17/0x50
[<ffffffffb55ce2fc>] truncate_exceptional_entry.part.13+0x1c/0x40
[<ffffffffb55ceaa2>] truncate_inode_pages_range+0x192/0x750
[<ffffffffb55cf0cf>] truncate_inode_pages_final+0x4f/0x60
[<ffffffffc0eeea0f>] ext4_evict_inode+0x10f/0x470 [ext4]
[<ffffffffb566b674>] evict+0xb4/0x180
[<ffffffffb566ba9c>] iput+0xfc/0x190
[<ffffffffb5666438>] __dentry_kill+0x158/0x1d0
[<ffffffffb5666ad5>] dput+0xb5/0x1a0
[<ffffffffb564f4dd>] __fput+0x18d/0x230
[<ffffffffb564f66e>] ____fput+0xe/0x10
[<ffffffffb54c31cb>] task_work_run+0xbb/0xe0
[<ffffffffb542cc65>] do_notify_resume+0xa5/0xc0
[<ffffffffb5b9322f>] int_signal+0x12/0x17
[root@delphi-002 oops-2020-05-05-11:59:34-44218-0]#
|