New daos setup
BASDEN, ALASTAIR G.
Hi,
I am trying to set up a new DAOS system, and could do with a bit of help. It is a simple test setup, with a single server. The server has 2 CPUs, each with DCPM non-volatile memory attached, 1.6TB per CPU (12x 256GB DIMMs), and 12x NVMe drives, each 1.6TB.

My /etc/daos/daos_server.yml file looks like this (the server name is daos):

    name: daos1
    access_points: ['daos']
    transport_config:
      allow_insecure: true
      client_cert_dir: /etc/daos/daosCA/certs
      ca_cert: /etc/daos/daosCA/certs/daosCA.crt
      cert: /etc/daos/daosCA/certs/server.crt
      key: /etc/daos/daosCA/certs/server.key
    fabric_ifaces: [ib0]
    provider: ofi+verbs;ofi_rxm
    servers:
    - rank: 0
      targets: 6
      first_core: 0
      fabric_iface: ib0
      fabric_iface_port: 20000
      scm_mount: /mnt/daos/1
      scm_class: dcpm
      scm_list: [/dev/pmem0]
      bdev_class: nvme
      rank: 1
      targets: 6
      first_core: 24
      fabric_iface: ib0
      fabric_iface_port: 20001
      scm_mount: /mnt/daos/2
      scm_class: dcpm
      scm_list: [/dev/pmem1]
      bdev_class: nvme

To start it, I am running "daos_server". "dmg -i storage scan" then returns:

    daos:10001: connected
    Hosts SCM Total             NVMe Total
    ----- ---------             ----------
    daos  3.2 TB (2 namespaces) 19 TB (12 controllers)

First question: what are rank 0 and rank 1? Can I do it all in a single rank? (I read that scm_list can only have 1 entry, hence the reason for putting 2 ranks.)

If 2 ranks is correct, do I need to do something to start another one? Currently, only /dev/pmem1 is mounted, not /dev/pmem0, so it seems that rank: 0 is being ignored.

I then run "daos_agent", and the daos_agent.yml file has:

    name: daos1
    access_points: ['daos']
    ca_cert: /etc/daos/daosCA/certs/daosCA.crt
    cert: /etc/daos/daosCA/certs/agent.crt
    key: /etc/daos/daosCA/certs/agent.key

If I then run:

    dfuse --svc=0 --mountpoint=/mnt/daosusermnt

it returns without any complaints, but I don't have anything mounted. How do I proceed from here to get it mounted as a file system?

Thanks,
Alastair.
Jacque, Kristin
Hi Alastair,
Running with two ranks is required if you want to use all your DCPM, in your case. The assumption is that each rank is running on a particular CPU, and uses the storage associated with that CPU.

I can offer a few tips for getting going as well.

First: if you're using "allow_insecure: true", you should ensure it's set in all of the yml config files, and comment out or remove all of the other CA/certificate-related options. If this is inconsistent amongst the different components, they'll be unable to talk to each other.

In the daos_server.yml, you may also need to add the bdev_list of NVMe addresses for the SSDs for each rank. Not sure if DAOS can auto-populate that yet if it's not supplied by the config. You can try without it, but if that doesn't work, you may need to add them to the server config and reformat.

Are you doing a dmg storage format after you start the daos_server? It should be necessary the first time you start up. This will set up your SCM mount points. If you haven't done this yet, it's probably why your ranks haven't started up. You might have to unmount your SCM beforehand if it's already mounted.

Hopefully this will help you get started.

Thanks,
Kris
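To make the tips above concrete, here is a rough sketch of how the two config files from the original message might look with "allow_insecure: true" as the only transport setting and a bdev_list added per rank. It is an illustration, not a verified working config: the bdev_list entries are placeholders to be replaced with the real NVMe addresses, the split of SSDs between the two ranks is an assumption, and putting allow_insecure under transport_config in daos_agent.yml is assumed by analogy with the server file. Each rank is written as its own list entry under servers. All other values are copied from the original message.

```yaml
# /etc/daos/daos_server.yml (sketch: certificate options removed)
name: daos1
access_points: ['daos']
transport_config:
  allow_insecure: true
fabric_ifaces: [ib0]
provider: ofi+verbs;ofi_rxm
servers:
- rank: 0
  targets: 6
  first_core: 0
  fabric_iface: ib0
  fabric_iface_port: 20000
  scm_mount: /mnt/daos/1
  scm_class: dcpm
  scm_list: [/dev/pmem0]
  bdev_class: nvme
  # Placeholder: replace with the addresses of the SSDs attached to the first CPU
  bdev_list: ["<NVMe address 1>", "<NVMe address 2>"]
- rank: 1
  targets: 6
  first_core: 24
  fabric_iface: ib0
  fabric_iface_port: 20001
  scm_mount: /mnt/daos/2
  scm_class: dcpm
  scm_list: [/dev/pmem1]
  bdev_class: nvme
  # Placeholder: replace with the addresses of the SSDs attached to the second CPU
  bdev_list: ["<NVMe address 3>", "<NVMe address 4>"]
---
# /etc/daos/daos_agent.yml (sketch: same insecure setting, no certificates)
# Assumption: the agent file accepts the same transport_config layout as the server file.
name: daos1
access_points: ['daos']
transport_config:
  allow_insecure: true
```

With files along these lines, the first start of daos_server would be followed by "dmg -i storage format", after unmounting any SCM that is already mounted.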
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of BASDEN, ALASTAIR G.
Sent: Wednesday, January 27, 2021 10:23 AM
To: daos@daos.groups.io
Subject: [daos] New daos setup